[Dxspider-support] BadIP files - duplicate data.

Kin ea3cv at cronux.net
Fri Feb 24 10:01:49 GMT 2023


Hi Tim,

 

I think Dirk will be able to solve the problem.

 

But for those who want to remove the duplicates in the meantime, just run:

sort -u -o badip.torrelay badip.torrelay

 

Or modify the line in crontab by putting the following:

 

30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torrelay; sort -u -o badip.torrelay badip.torrelay')
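
Writing the result back onto the file that is being read needs some care: a plain shell redirection onto the input file truncates it before sort has opened it, which can leave the file empty. A minimal sketch that goes through a temporary file instead is below. The function name dedupe_in_place is only an illustration and is not part of DXSpider.

```shell
#!/bin/sh
# dedupe_in_place FILE
# Hypothetical helper (not part of DXSpider): sorts FILE, drops duplicate
# lines, and overwrites FILE only if the sort itself succeeded.
dedupe_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write the de-duplicated list to a scratch file first, so the
    # original is still intact while sort is reading it.
    if sort -u "$file" > "$tmp"; then
        # Copy the contents back rather than moving the scratch file,
        # which keeps the original file's permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

It would be called as dedupe_in_place /spider/local_data/badip.torrelay, assuming the file lives in the directory used by the crontab lines.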

 

Thanks for the info.

 

Kin EA3CV

 

From: Dxspider-support <dxspider-support-bounces at tobit.co.uk> On behalf of du3tw via Dxspider-support
Sent: Friday, 24 February 2023 6:46
To: dxspider-support at tobit.co.uk
CC: charges.larder0p at icloud.com
Subject: [Dxspider-support] BadIP files - duplicate data.

 

I looked at the excellent suggestion for updating the BadIP files, namely having this in the crontab:

 

30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torexit')

30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torrelay')

30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.global')

31 * * * * run_cmd('load/badip')

 

 

However, the source files contain many duplicates, which should be removed.

 

cd /tmp

wget -qN http://www.dxspider.net/download/badip.torexit

 

The number of lines in this file, counted with "wc -l badip.torexit", is 1658.

Running it through a basic de-dupe, "sort badip.torexit | uniq | wc -l", gives 1173.

 

 

It would be better if this filtering were done on www.dxspider.net (he asked nicely).

 

sort badip.torrelay | wc -l

9450

sort badip.torrelay | uniq | wc -l

8115
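
The same check can be run over all three files in one go. The following is only a sketch: it assumes the files have already been downloaded into the current directory, and count_dupes is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# count_dupes FILE
# Illustrative helper: prints the total number of lines, the number of
# distinct lines, and the difference between the two.
count_dupes() {
    total=$(wc -l < "$1" | tr -d ' ')
    unique=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "$1 total=$total unique=$unique duplicates=$((total - unique))"
}

# Report on whichever of the three lists are present here.
for f in badip.torexit badip.torrelay badip.global; do
    if [ -f "$f" ]; then
        count_dupes "$f"
    fi
done
```

Going by the counts quoted in this message, that works out to 485 duplicate lines in badip.torexit (1658 - 1173) and 1335 in badip.torrelay (9450 - 8115).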

 

badip.global already has no duplicates, as it has very few records in it.

 

Not sure who can act on this suggestion ...

 

  regards

 

    Tim, DU3TW
