[Dxspider-support] BadIP files - duplicate data.
Kin
ea3cv at cronux.net
Fri Feb 24 10:01:49 GMT 2023
Hi Tim,
I think Dirk will be able to solve the problem.
But for those who want to remove duplicates for the time being, just run:
sort -u -o badip.torrelay badip.torrelay
(Do not redirect with "> badip.torrelay": the shell truncates the file before sort reads it, which leaves it empty.)
Or modify the line in crontab by putting the following:
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torrelay; sort -u -o badip.torrelay badip.torrelay')
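As a rough sketch of a safe in-place de-dupe, the following runs against a throwaway sample file rather than the real list. The scratch directory, the file name badip.sample, and the addresses in it are made up for illustration.

```shell
# Sketch: de-duplicate a list file in place, using a throwaway sample file.
workdir=$(mktemp -d)            # scratch directory, not /spider/local_data
list="$workdir/badip.sample"    # hypothetical stand-in for badip.torrelay

# Sample data containing two duplicated addresses (documentation-range IPs).
printf '%s\n' 192.0.2.10 198.51.100.7 192.0.2.10 203.0.113.5 198.51.100.7 > "$list"

before=$(wc -l < "$list" | tr -d ' ')

# sort reads all of its input before it opens the -o file for writing,
# so naming the same file for input and output is safe.
sort -u -o "$list" "$list"

after=$(wc -l < "$list" | tr -d ' ')
echo "before=$before after=$after"
```

With the sample data above this prints before=5 after=3. A shell redirect to the same file would truncate it first, so -o is the safer way to write the result back.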
Thanks for the info.
Kin EA3CV
From: Dxspider-support <dxspider-support-bounces at tobit.co.uk> On behalf of du3tw via Dxspider-support
Sent: Friday, 24 February 2023 6:46
To: dxspider-support at tobit.co.uk
CC: charges.larder0p at icloud.com
Subject: [Dxspider-support] BadIP files - duplicate data.
I looked at the excellent suggestion for updating the badip files,
namely having this as a crontab:
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torexit')
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.torrelay')
30 * * * * spawn('cd /spider/local_data; wget -qN http://www.dxspider.net/download/badip.global')
31 * * * * run_cmd('load/badip')
However, the source files contain many duplicates, which should be removed.
cd /tmp
wget -qN http://www.dxspider.net/download/badip.torexit
The number of lines in this file, counted with "wc -l badip.torexit", is 1658.
Running it through a basic de-dupe, "sort badip.torexit | uniq | wc -l", outputs 1173.
It would be better if this filtering were done on www.dxspider.net (he asked nicely).
sort badip.torrelay | wc -l
9450
sort badip.torrelay | uniq | wc -l
8115
badip.global already has no duplicates and has very few records in it.
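The counts above can be gathered in one go with a small helper. This is only a sketch: the function name dupe_report is made up, and the demonstration uses a temporary sample file with invented addresses rather than the downloaded lists.

```shell
# Sketch: report total, unique, and duplicate line counts for a list file.
dupe_report() {
    # $1: path to a file with one entry per line
    total=$(wc -l < "$1" | tr -d ' ')
    unique=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "$(basename "$1"): total=$total unique=$unique duplicates=$((total - unique))"
}

# Demonstration on a throwaway sample (documentation-range IPs, one repeated).
sample=$(mktemp)
printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.1 192.0.2.3 > "$sample"
report=$(dupe_report "$sample")
echo "$report"
```

Against the real files, something like "for f in badip.torexit badip.torrelay badip.global; do dupe_report "$f"; done" in the download directory should give the same totals as the commands above. By those figures, badip.torexit had 485 duplicate lines and badip.torrelay had 1335.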
Not sure who can process this suggestion ….
regards
Tim, DU3TW