Mark Ellis
2008-12-23 17:14:26 UTC
Hi Guys
Putting this out to a wider audience because I haven't dealt with this
much before. David's sharing the pain in the bug, but I thought why
should you lot miss out :)
The two bugs referenced in the title relate to segfaults caused by
non-ASCII characters being passed to wstr_to_ascii, resulting in an
unchecked NULL return.
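
To make the failure mode concrete, here is a minimal sketch of the pattern. narrow_to_ascii is a hypothetical stand-in written for this example, not the real wstr_to_ascii, and the 16-bit wide string type is an assumption. It returns NULL on any non-ASCII input, and the caller checks that result instead of dereferencing it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a wide-to-ASCII converter: returns a newly
 * allocated string, or NULL if any code unit is outside 7-bit ASCII
 * (or if allocation fails). */
static char *narrow_to_ascii(const uint16_t *wide)
{
    size_t len = 0;
    while (wide[len] != 0)
        len++;

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        if (wide[i] > 0x7F) {          /* not representable in ASCII */
            free(out);
            return NULL;
        }
        out[i] = (char)wide[i];
    }
    out[len] = '\0';
    return out;
}

/* Caller that checks the result before using it.
 * Returns the converted length, or -1 if the conversion failed. */
static long ascii_length_or_error(const uint16_t *wide)
{
    char *s = narrow_to_ascii(wide);
    if (s == NULL)
        return -1;                     /* report failure, no segfault */

    long n = 0;
    while (s[n] != '\0')
        n++;
    free(s);
    return n;
}
```

Checking the return only stops the crash; the non-ASCII name is still lost, which is the real problem.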
The immediate response: wstr_*_ascii is evil :)
However, the solution isn't as obvious, at least to me. I see two
options.
1) Change all wstr_* to utf8, and assume the system can deal with utf8.
Advantage: it's quite easy. Disadvantage: it's a big assumption, and
it will blow up wherever that assumption doesn't hold.
2) Use utf8 internally, so we never lose anything in conversions from
either the host or the device. We then have to convert to the locale
when going out to the host. Much more work, but more reliable.
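
As a rough illustration of the "lose nothing" half of option 2, the sketch below converts a device-side wide string to UTF-8. It assumes the wide string is NUL-terminated UTF-16 in host byte order; utf16_to_utf8 is a name made up for this example, not an existing library function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a NUL-terminated UTF-16 string (host byte order) to a newly
 * allocated UTF-8 string. Unpaired surrogates become U+FFFD.
 * Returns NULL only on allocation failure. */
static char *utf16_to_utf8(const uint16_t *in)
{
    size_t n = 0;
    while (in[n] != 0)
        n++;

    /* A single code unit needs at most 3 bytes; a surrogate pair
     * (2 units) needs 4, so 3 bytes per unit is always enough. */
    unsigned char *out = malloc(n * 3 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            if (i + 1 < n && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                i++;                                   /* consume low half */
            } else {
                cp = 0xFFFD;                           /* unpaired */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     /* stray low surrogate */
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

Every valid UTF-16 string has a UTF-8 form, so this direction never has to fail or drop characters.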
Ok, so 2) sounds good. When getting input from the host we convert from
locale to utf8 and lose nothing. However, what I can't see is what to do
if, for instance, we get a wstr from the device and then have to convert
it to a locale that can't represent the characters in it. Substitute
something like "__untranslatable__"?
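
One possible answer, sketched below, is to substitute per character rather than replace the whole string. The example uses plain ASCII as a stand-in for a limited target locale and writes one substitute character for each UTF-8 character it cannot represent. utf8_to_ascii_subst is a hypothetical helper for illustration only, and it assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy a UTF-8 string into a newly allocated 7-bit ASCII string,
 * writing `subst` once for every character ASCII cannot represent.
 * Returns NULL on allocation failure. */
static char *utf8_to_ascii_subst(const char *utf8, char subst)
{
    size_t len = strlen(utf8);
    char *out = malloc(len + 1);       /* output is never longer than input */
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)utf8[i];

        if (b < 0x80) {
            out[o++] = (char)b;        /* ASCII passes through unchanged */
        } else if ((b & 0xC0) != 0x80) {
            out[o++] = subst;          /* lead byte: one substitute per character */
        }
        /* Continuation bytes (10xxxxxx) are skipped, so a multi-byte
         * character produces exactly one substitute. */
    }
    out[o] = '\0';
    return out;
}
```

With '?' as the substitute, a name such as "café" comes out as "caf?". That is lossy, but the readable part of the name survives and the caller never sees a NULL.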
Mark