|
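If you look at the pattern of dedicated marker bits in UTF-8 lead bytes,
you might wonder whether you could keep going. You can, and that was
the original design: early UTF-8 allowed sequences of up to six bytes,
covering 31 bits, before RFC 3629 capped it at four bytes (U+10FFFF).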
Interestingly, perl still lets you use code points in the 21- to
31-bit range internally, and you can even save them to files,
provided you use perl's "lax" form of UTF-8 (the `utf8` encoding, as
opposed to the strict `UTF-8`). It's hard to think of a good reason
to do this, though. Tom Christiansen has an interesting suggestion
in the 4th edition of the Camel book: an idiom where you mark which
portion of a string you've already handled by shifting its code
points way up, then shifting them back down when you're done.
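To make the "keep going" pattern concrete, here is a minimal Python sketch of the original six-byte scheme. The function name `encode_utf8_extended` is hypothetical, not from any library. Each longer form adds one more leading 1-bit to the lead byte and one more `10xxxxxx` continuation byte carrying 6 payload bits.

```python
# (exclusive upper bound, lead-byte marker) for 2- to 6-byte sequences
_LEAD = [
    (0x800, 0xC0),       # 110xxxxx + 1 continuation byte: 11 bits
    (0x10000, 0xE0),     # 1110xxxx + 2 continuation bytes: 16 bits
    (0x200000, 0xF0),    # 11110xxx + 3 continuation bytes: 21 bits
    (0x4000000, 0xF8),   # 111110xx + 4 continuation bytes: 26 bits
    (0x80000000, 0xFC),  # 1111110x + 5 continuation bytes: 31 bits
]


def encode_utf8_extended(cp: int) -> bytes:
    """Encode cp (0 .. 2**31 - 1) with the original, pre-RFC 3629 layout.

    Sketch only: no surrogate or noncharacter checks are performed.
    """
    if not 0 <= cp < 0x80000000:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # 0xxxxxxx: plain ASCII, one byte
    # Pick the shortest form whose payload capacity fits cp.
    for n_cont, (limit, marker) in enumerate(_LEAD, start=1):
        if cp < limit:
            break
    tail = []
    for _ in range(n_cont):
        tail.append(0x80 | (cp & 0x3F))  # low 6 bits -> 10xxxxxx
        cp >>= 6
    # Whatever high bits remain go into the lead byte after its marker.
    return bytes([marker | cp] + tail[::-1])
```

For code points up to U+10FFFF (other than surrogates) this produces the same bytes as standard UTF-8; above that it emits the old 5- and 6-byte forms (lead bytes 0xF8 to 0xFD), which strict decoders reject. That rejection is why perl needs its lax encoding to round-trip such characters.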
|