[Dxspider-support] Unicode support?
Dirk Koopman G1TLH
gb7tlh at dxcluster.org
Sat Oct 23 11:21:47 BST 2010
On 22/10/10 23:55, Brendan Minish wrote:
> I just did a look at talk using the cluster console application but I
> get the same output via telnet from my linux desktop when sysop
> sh/talk and see some output that looks like the following
> 21Oct2010 at 03:38:06 RZ4FA -> UA4HBW (>WR3D) � ������ �� ����
> 21Oct2010 at 03:39:08 UA4HBW -> RZ4FA (<S50DXS) �� ����� ��� �� � ����� ������ T IK2HKT �� ���������, �� �� ��������, � �� ����� ����� ���������
> 21Oct2010 at 03:39:30 RZ4FA -> UA4HBW (>WR3D) �� �������� �������� -)
> I presume that this output is a local character set issue and that the
> talk messages are in Cyrillic, do I need to do something to add Unicode
> support or does spider not use unicode ?
> It appears that my server is using unicode but do I need to do anything
> to add unicode support to perl?
> It would be nice if talk was consistent in how it displays
Ah, that new fangled character representation wot we are all using - not?
In answer to your question, I would absolutely *love* to go over to
utf8. All modern perls support it. There are only a few teensy problems
that need to be ironed out:
* There is not (and has never been) any standard.
* Not all DXSpider nodes run utf8 as their native encoding.
There are still several iso-8859 or even codepage character sets out
there. This is either because nodes are running on old Linuxes / Windows
XP implementations or old versions of perl.
* What does one do with stuff coming in which is not utf8?
Because it *will* happen. Either for the reasons set out below or just
because. My policy has always been to pass stuff coming in - out
unaltered. I try to extract the text as best I can for de-duping, but it is
imperfect at best.
* And then there is the competition.
Only Lee's stuff is AFAIK still being developed in any meaningful way, AR
Cluster seems to have shuddered to a halt and the author's site/node is
unavailable a lot of the time. Oh and he refuses to talk to me (possibly
because he thinks I ruined his business model). I would *love* to be
able to interface to his stuff better in any case.
In theory all modern(ish) Windows software is UTF (but I am prepared to
be recalibrated on that) so they should be easy to fix. However, there
are nodes running on earlier versions of Windows that seem to emit
non-utf8 characters and there seems to be no standard on what software
does to text arriving in non-local character sets.
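
To make the "what does one do with stuff coming in which is not utf8?"
problem concrete, here is a minimal sketch in Python (not DXSpider code;
the function name and the choice of iso-8859-1 as the fallback are
assumptions for illustration). It tries strict UTF-8 first and only falls
back to a legacy single-byte charset when that fails:

```python
from typing import Tuple


def decode_best_effort(raw: bytes, fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Return (text, encoding_used) for one line of incoming bytes.

    Hypothetical helper: strict UTF-8 first, then a single-byte fallback.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte value to a code point, so this cannot
        # fail, but the result is only readable if the guess was right.
        return raw.decode(fallback, errors="replace"), fallback


cyrillic = "привет"
as_utf8 = cyrillic.encode("utf-8")    # what a utf8 node would send
as_koi8 = cyrillic.encode("koi8-r")   # what an older Cyrillic setup might send

print(decode_best_effort(as_utf8))    # decodes cleanly as utf-8
print(decode_best_effort(as_koi8))    # falls back; text is mojibake, bytes are preserved
```

Note that the fallback can only guess: nothing in the byte stream says
whether it was KOI8-R, a Windows codepage or one of the iso-8859 family,
which is exactly why an agreed standard is needed. Re-encoding the
fallback result with the same charset gives back the original bytes, so
passing the line on unaltered remains possible.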
This leads to some of the duplicates that one sometimes sees, because
some software alters the text when it re-emits a PCxx sentence. Dunno
whether it is one package or all of them.
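
One way to picture "extracting the text as best I can" for de-duping is
the following Python sketch. It is an illustration under my own
assumptions, not what DXSpider actually does: it throws away every byte
outside printable ASCII, so the key no longer depends on how the
non-ASCII part was encoded or mangled in transit.

```python
import re


def dedup_key(raw: bytes) -> bytes:
    """Reduce a comment to a comparison key (hypothetical helper).

    Runs of bytes outside printable ASCII become a single space, then
    case and whitespace are normalised.
    """
    ascii_only = re.sub(rb"[^\x20-\x7e]+", b" ", raw)
    return b" ".join(ascii_only.upper().split())


# The same comment as two different nodes might re-emit it.
original = "cq привет 73".encode("utf-8")
altered = "cq привет 73".encode("koi8-r")

print(dedup_key(original))
print(dedup_key(altered))
print(dedup_key(original) == dedup_key(altered))
```

The trade-off is the imperfection mentioned above: two comments that
differ only in their non-ASCII text collapse to the same key, so a key
like this would have to be used alongside the other fields of the
sentence rather than on its own.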
It is, frankly, a mess. Unless we authors can a) agree a standard b)
implement it and c) get all our users to upgrade, I don't see a viable
way to get to universal utf8.