Lightning Talks- UNICODE.html

What about Unicode?

Perl represents Unicode strings internally
(approximately) as utf8, but a string read from
outside is just a string of bytes until you
decode it into characters.

Perl does have a good trick for this: the
regular expression feature \X matches an
extended grapheme cluster, i.e. what a reader
sees as one character, even if it's made of
several code points (and many bytes).

So you can easily load up an stp with ctps
of multibyte unicode characters...
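
A minimal sketch of the \X trick. The sample
string and variable names are made up for
illustration; it only splits a string into
clusters and says nothing about the stp itself:

```perl
use strict;
use warnings;

# Hypothetical sample: "cafe" + COMBINING ACUTE ACCENT + "!".
# Written with an escape so the source file stays plain ASCII.
my $text = "cafe\x{301}!";

# In list context, /(\X)/g returns one element per extended
# grapheme cluster, so "e" and its accent stay together.
my @clusters = $text =~ /(\X)/g;

printf "%d code points, %d clusters\n",
    length($text), scalar @clusters;
# prints "6 code points, 5 clusters"
```

The fourth element of @clusters holds two code
points, which is the case a plain split(//, ...)
would get wrong.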

Further: a lot of the built-in perl functions
trip over unicode: length?  substr?                Or at least, that was
On undecoded input they think in terms of          my impression.
bytes; on decoded strings they count code          The new "perlunicook"
points, which still isn't what a reader sees       seems to say otherwise.
as characters.

If you really need to work with text, and
increasingly that means unicode, then you've
got to get used to using a lot of library code
instead of the standard built-ins.
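
A rough illustration of where the "bytes"
impression comes from, assuming the input
arrives as UTF-8 bytes and is decoded with the
core Encode module (the sample word is invented):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes as they might arrive from a file or socket:
# "naive" with i-diaeresis, encoded as the two bytes C3 AF.
my $bytes = "na\xC3\xAFve";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

printf "undecoded: length %d\n", length $bytes;   # prints 6
printf "decoded:   length %d\n", length $chars;   # prints 5

# substr on the decoded string cuts between characters,
# never in the middle of a multibyte sequence.
my $prefix = substr($chars, 0, 3);
printf "prefix re-encodes to %d bytes\n",
    length encode('UTF-8', $prefix);              # prints 4
```

Even on the decoded string, length counts code
points, so a base letter plus a combining accent
still counts as two.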

Which starts making something like a
Text::Properties module look more plausible.

(Damn.)