[Dxspider-support] Unicode support?

Sat Oct 23 11:21:47 BST 2010

On 22/10/10 23:55, Brendan Minish wrote:
> I just did a look at talk using the cluster console application but I
> get the same output via telnet from my linux desktop when sysop
>
> sh/talk and see some output that looks like the following
>
> 21Oct2010 at 03:38:06 RZ4FA ->  UA4HBW (>WR3D) � ������ �� ����
> 21Oct2010 at 03:39:08 UA4HBW ->  RZ4FA (<S50DXS) �� ����� ��� �� � ����� ������ T IK2HKT �� ���������, �� �� ��������, � �� ����� ����� ���������
> 21Oct2010 at 03:39:30 RZ4FA ->  UA4HBW (>WR3D) �� �������� �������� -)
>
> I presume that this output is a local character set issue and that the
> talk messages are in Cyrillic. Do I need to do something to add Unicode
> support, or does spider not use Unicode?
>
> It appears that my server is using unicode but do I need to do anything
> to add unicode support to perl?
> It would be nice if talk was consistent in how it displays
>

Ah, that new fangled character representation wot we are all using - not?

In answer to your question, I would absolutely *love* to go over to 
utf8. All modern perls support it. There are only a few teensy problems 
that need to be ironed out:

* There is not (and never has been) any standard.

* Not all DXSpider nodes run utf8 as their native encoding.

There are still several iso-8859 or even codepage character sets out 
there. This is either because nodes are running on old Linuxes / Windows 
XP implementations or old versions of perl.

* What does one do with stuff coming in which is not utf8?

Because it *will* happen. Either for the reasons set out below or just 
because. My policy has always been to pass incoming stuff out 
unaltered. I try to extract the text as best I can for de-duping, but it 
is imperfect at best.
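
A minimal sketch of that pass-through policy, written in Python for illustration rather than in DXSpider's Perl, and not the actual DXSpider code. The names `dedupe_key` and `handle`, the Latin-1 fallback and the whitespace/case folding are all assumptions. The incoming bytes are forwarded untouched; only a separate comparison key is normalised.

```python
import re

def dedupe_key(raw: bytes) -> str:
    """Build a comparison key from incoming text without touching the original bytes."""
    try:
        text = raw.decode("utf-8")      # strict: raises on invalid UTF-8
    except UnicodeDecodeError:
        text = raw.decode("latin-1")    # never fails: each byte maps to one code point
    # Collapse whitespace and case so trivially different copies compare equal.
    return re.sub(r"\s+", " ", text).strip().lower()

seen = set()

def handle(raw: bytes):
    """Return the bytes to forward (unaltered), or None if the text was already seen."""
    key = dedupe_key(raw)
    if key in seen:
        return None
    seen.add(key)
    return raw
```

The key is only a best effort: the same Cyrillic text arriving once as utf8 and once in a legacy codepage still produces two different keys, which is one reason this kind of de-duping stays imperfect.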

* And then there is the competition.

Only Lee's stuff is AFAIK still being developed in any meaningful way. AR 
Cluster seems to have shuddered to a halt and the author's site/node is 
unavailable a lot of the time. Oh and he refuses to talk to me (possibly 
because he thinks I ruined his business model). I would *love* to be 
able to interface to his stuff better in any case.

In theory all modern(ish) Windows software is UTF (but I am prepared to 
be recalibrated on that) so they should be easy to fix. However, there 
are nodes running on earlier versions of Windows that seem to emit 
non-utf8 characters, and there seems to be no standard for what software 
does with text coming in in non-local character sets.

This leads to some of the duplicates that one sometimes sees, because 
some software alters the text when it re-emits a PCxx sentence. Dunno 
whether it is one package or all of them.
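
To illustrate why altered text defeats duplicate detection, here is a small Python sketch. The sample word and the three encodings are assumptions chosen for the example; nothing here says which encodings any particular cluster package actually emits. It also shows how legacy Cyrillic bytes turn into replacement characters when a utf8 terminal tries to display them.

```python
text = "привет"  # hypothetical talk text, not taken from a real node

as_utf8 = text.encode("utf-8")
as_koi8 = text.encode("koi8-r")
as_cp1251 = text.encode("cp1251")

# Same message, three different byte strings on the wire.
distinct = {as_utf8, as_koi8, as_cp1251}

def same_message(a: bytes, b: bytes) -> bool:
    """Naive byte-for-byte duplicate check."""
    return a == b

# A node that re-emits the text in another encoding makes it look new.
is_dup = same_message(as_utf8, as_koi8)

# Decoding legacy bytes as utf8 yields U+FFFD replacement characters.
shown = as_koi8.decode("utf-8", errors="replace")
```

Here `is_dup` is false even though both byte strings carry the same word, and `shown` contains U+FFFD where the Cyrillic letters should be.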

It is, frankly, a mess. Unless we authors can a) agree a standard b) 
implement it and c) get all our users to upgrade, I don't see a viable 
way to get to universal utf8.

Dirk