[Dxspider-support] BadIP files - duplicate data.

charges.larder0p at icloud.com
Fri Feb 24 05:46:03 GMT 2023


I looked at the excellent suggestion for keeping the bad IP lists up to date,
namely having this in the crontab:

30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torexit')
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torrelay')
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.global')
31 * * * * run_cmd('load/badip')


However, the source files contain many duplicates, which should be removed.

cd /tmp
wget -qN http://www.dxspider.net/download/badip.torexit

Counting the lines in this file with "wc -l badip.torexit" gives 1658.
Running it through a basic de-dupe, "sort badip.torexit | uniq | wc -l", gives 1173.
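A minimal sketch for checking any of the badip files the same way, assuming one entry (IP address) per line, as the counts above imply. dup_report is a hypothetical helper name, and the sample addresses are from the documentation ranges, not real badip data. sort -u gives the same result as sort | uniq.

```shell
#!/bin/sh
# Hypothetical helper: report total, unique and duplicate line counts for a
# badip file. Assumes one entry per line; trailing spaces or CR characters
# would make otherwise identical entries count as different.
dup_report() {
    file=$1
    # tr strips the padding some wc implementations add to the count
    total=$(wc -l < "$file" | tr -d ' ')
    # sort -u is equivalent to sort | uniq
    unique=$(sort -u "$file" | wc -l | tr -d ' ')
    echo "$(basename "$file"): total=$total unique=$unique duplicates=$((total - unique))"
}

# Demo on a made-up sample (documentation addresses, not real badip data)
demo_dir=$(mktemp -d)
printf '192.0.2.1\n192.0.2.2\n192.0.2.1\n198.51.100.7\n192.0.2.2\n' > "$demo_dir/badip.sample"
dup_report "$demo_dir/badip.sample"
rm -rf "$demo_dir"
```

The demo prints "badip.sample: total=5 unique=3 duplicates=2". Running dup_report on the downloaded badip.torexit or badip.torrelay gives the same kind of summary for the real files.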


It would be better if this filtering were done on www.dxspider.net itself (he asked nicely).
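A sketch of the filtering suggested above, assuming entries are one per line and that their order does not matter to load/badip. dedupe_in_place is a hypothetical name; it could run on the server before the files are published or, as a local stop-gap, after the wget lines in the crontab.

```shell
#!/bin/sh
# Hypothetical helper: de-duplicate badip files in place.
# Assumes one entry per line and that entry order does not matter.
dedupe_in_place() {
    for f in "$@"; do
        [ -f "$f" ] || { echo "skipping missing file: $f" >&2; continue; }
        # POSIX sort allows -o to name the input file itself;
        # LC_ALL=C gives plain byte ordering, independent of locale
        LC_ALL=C sort -u -o "$f" "$f"
    done
}
```

For example: cd /spider/local_data && dedupe_in_place badip.torexit badip.torrelay badip.global. Done locally, this changes each file's size, so the next wget -N downloads the whole file again instead of skipping it as unchanged; filtering at the source avoids that.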

sort badip.torrelay | wc -l
9450
sort badip.torrelay | uniq | wc -l
8115

badip.global already has no duplicates and contains very few records.

I'm not sure who can action this suggestion...

  regards

    Tim, DU3TW

