multilanguage_web_site

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: multilanguage site
From: "Francesco Pasqualini" <f.pasqualini@cpsinformatica.it>
Date: Tue, 29 Aug 2000 14:58:13 +0200

Can someone suggest the best way to build a multilanguage web site
(English, French, ...)?
I'm using Apache + mod_perl + Apache::ASP (for applications).

Could XML/XSL with AxKit be useful?
Are there any examples or guidelines?

===

Subject: Re: multilanguage site
From: Matt Sergeant <matt@sergeant.org>
Date: Tue, 29 Aug 2000 14:15:41 +0100 (BST)

On Tue, 29 Aug 2000, Francesco Pasqualini wrote:

> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
> 
> Could XML/XSL with AxKit be useful?
> Are there any examples or guidelines?

This month's Web Techniques is all about this (albeit in a
framework-independent manner). I suggest you try as hard as you can to get
a copy, as it covers far more than I could possibly type here.

Also look up content negotiation in the Apache docs.
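Content negotiation starts from the browser's Accept-Language header. Below is a minimal Python sketch of turning that header into an ordered preference list. It is not Apache's own code. It assumes the HTTP/1.1 q-value syntax, and the function name is hypothetical:

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first.

    Sketch only: tags are lower-cased, ties keep header order, and tags
    with q=0 ("not acceptable") or an unparsable q value are dropped.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((-q, position, tag))
    # Sort by descending q, then by position in the header.
    return [tag for _, _, tag in sorted(prefs)]


print(parse_accept_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"))
# ['fr-ch', 'fr', 'en', 'de', '*']
```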

===

Subject: RE: multilanguage site
From: Jerrad Pierce <Jerrad.Pierce@networkengines.com>
Date: Tue, 29 Aug 2000 09:24:36 -0400

Try this:
http://webtechniques.com/archives/2000/09/yunker/
and perhaps this:
http://webtechniques.com/archives/2000/09/lagon/

===

Subject: Re: multilanguage site
From: David Hodgkinson <daveh@hodgkinson.org>
Date: 29 Aug 2000 14:29:34 +0100

"Francesco Pasqualini" <f.pasqualini@cpsinformatica.it> writes:

> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
> 
> Could XML/XSL with AxKit be useful?
> Are there any examples or guidelines?

I'm interested in this too :-) The Deep Purple site just went vaguely
multilingual, but I'm doing this with straight Apache MultiViews
(which _are_ honoured by SSI, which is nice) and I can see this
becoming a huge headache.
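With MultiViews, Apache chooses among sibling files such as index.html.en and index.html.fr. The Python sketch below imitates that choice outside Apache, to make the fallback policy explicit. The helper name, the primary-subtag fallback and the "en" default are assumptions of this sketch. It is not Apache's exact algorithm, which also weighs LanguagePriority and other negotiation dimensions:

```python
def choose_variant(available, base, languages, default="en"):
    """Pick "<base>.<lang>" from the available file names, MultiViews-style.

    available: file names in the directory (e.g. from os.listdir()).
    languages: client preferences, most preferred first (e.g. ["fr-ch", "en"]).
    After each tag, also tries its primary subtag ("fr" for "fr-ch"),
    then finally `default`. Returns None if no variant matches.
    """
    names = set(available)
    candidates = []
    for lang in languages:
        candidates.append(lang)
        primary = lang.split("-")[0]
        if primary != lang:
            candidates.append(primary)
    candidates.append(default)
    for lang in candidates:
        name = f"{base}.{lang}"
        if name in names:
            return name
    return None


files = ["index.html.en", "index.html.fr", "index.html.it"]
print(choose_variant(files, "index.html", ["fr-ch", "en"]))  # index.html.fr
print(choose_variant(files, "index.html", ["de"]))           # index.html.en
```

The explicit default is what keeps a missing translation from turning into a 404 (or Apache's 406 "Not Acceptable") instead of an English page.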

I'd like to do it with the Template Toolkit if at all possible.


===

Subject: Re: multilanguage site
From: Joshua Chamas <joshua@chamas.com>
Date: Tue, 29 Aug 2000 13:10:46 -0700

Francesco Pasqualini wrote:
> 
> Can someone suggest the best way to build a multilanguage web site
> (English, French, ...)?
> I'm using Apache + mod_perl + Apache::ASP (for applications).
> 
> Could XML/XSL with AxKit be useful?
> Are there any examples or guidelines?
> 

The approach used by Paul at Red Hat seems to have been
to wrap internationalized messages with <tag>message</tag>,
where <tag> is an XMLSub that would do a lookup at runtime
into a message catalog for the right message, based on what
language the client was set to.  I'm sure it's much more
complicated than that, but that was the gist of it.
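The gist described above can be sketched in a few lines of Python. This is not Apache::ASP's XMLSubs mechanism. It is a regex stand-in that shows the lookup-with-fallback idea. The CATALOGS dictionary and the localize() name are hypothetical:

```python
import re

# Hypothetical catalogs keyed by language; a real site would load these
# from its compiled message catalog rather than hard-coding them.
CATALOGS = {
    "fr": {"Welcome": "Bienvenue", "Log out": "Déconnexion"},
}

MSG_RE = re.compile(r"<msg>(.*?)</msg>", re.DOTALL)


def localize(page, lang):
    """Replace each <msg>text</msg> with its translation for `lang`.

    Untranslated messages (or unknown languages) fall back to the
    original text. The replacement is inserted as-is, without HTML escaping.
    """
    catalog = CATALOGS.get(lang, {})
    return MSG_RE.sub(lambda m: catalog.get(m.group(1), m.group(1)), page)


print(localize("<h1><msg>Welcome</msg></h1>", "fr"))  # <h1>Bienvenue</h1>
```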

===

Subject: Re: multilanguage site
From: Paul Lindner <plindner@redhat.com>
Date: Tue, 29 Aug 2000 13:18:26 -0700

On Tue, Aug 29, 2000 at 01:10:46PM -0700, Joshua Chamas wrote:
> Francesco Pasqualini wrote:
> > 
> > Can someone suggest the best way to build a multilanguage web site
> > (English, French, ...)?
> > I'm using Apache + mod_perl + Apache::ASP (for applications).
> > 
> > Could XML/XSL with AxKit be useful?
> > Are there any examples or guidelines?
> > 
> 
> The approach used by Paul at Red Hat seems to have been
> to wrap internationalized messages with <tag>message</tag>,
> where <tag> is an XMLSub that would do a lookup at runtime
> into a message catalog for the right message, based on what
> language the client was set to.  I'm sure it's much more
> complicated than that, but that was the gist of it.

Yeah, it's more complicated than that.  :-)

Basically there are four tools that we use, based on a hacked version
of Locale::PGetText, and the standard .po file format provided by GNU
gettext.  The tools are:


XText      - extracts <msg>xxx</msg> text and Apps::gettext() strings into
             messages.po

... then we cp messages.po to messages.<LANGCODE>.po and convert

MsgProcess - processes messages.<LANGCODE>.po into messages.db

msgmerge   - standard GNU gettext stuff.
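Here is a rough Python equivalent of the XText extraction step. It assumes messages are marked only with <msg>...</msg>. The Apps::gettext() calls mentioned above are not handled, and a real project would normally build the template with GNU xgettext. The extract_po() and po_quote() names are hypothetical:

```python
import re

MSG_RE = re.compile(r"<msg>(.*?)</msg>", re.DOTALL)


def po_quote(s):
    """Quote a string for a .po file, escaping backslashes, quotes, newlines."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def extract_po(sources):
    """Collect unique <msg>...</msg> strings, in first-seen order,
    into a .po template with empty msgstr entries."""
    seen = {}  # dicts keep insertion order, so this dedupes in order
    for text in sources:
        for m in MSG_RE.finditer(text):
            seen.setdefault(m.group(1), None)
    return "\n".join(f'msgid {po_quote(s)}\nmsgstr ""\n' for s in seen)


print(extract_po(["<msg>Hello</msg> <msg>Bye</msg>", "<msg>Hello</msg>"]))
```

The output is then copied per language and translated, and msgmerge keeps older translations in step when the template changes.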


At runtime the code dynamically looks up the message text in the local
messages.db file.
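The MsgProcess and runtime half can be sketched the same way. The sketch parses a translated messages.<LANGCODE>.po into a lookup table, then falls back to the original string when no translation exists. It keeps the table in a Python dict; persisting it to a DBM file like messages.db is left out. Only single-line msgid/msgstr pairs are handled, and the function names are hypothetical:

```python
def po_unquote(s):
    """Strip the surrounding quotes from a .po string and undo its backslash escapes."""
    body = s.strip()[1:-1]
    escapes = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(escapes.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def parse_po(text):
    """Build a msgid -> msgstr dict from a .po file (single-line strings only).

    Skips the header entry (empty msgid) and untranslated entries
    (empty msgstr); comments, plurals and msgctxt are ignored.
    """
    catalog, msgid = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = po_unquote(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = po_unquote(line[len("msgstr "):])
            if msgid and msgstr:
                catalog[msgid] = msgstr
            msgid = None
    return catalog


def lookup(catalog, msgid):
    """Runtime lookup: fall back to the untranslated text if missing."""
    return catalog.get(msgid, msgid)


po_text = '''msgid "Hello"
msgstr "Bonjour"

msgid "Bye"
msgstr ""
'''
catalog = parse_po(po_text)
print(lookup(catalog, "Hello"), lookup(catalog, "Bye"))  # Bonjour Bye
```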

Let me know if anyone is interested in this stuff.  It's a bit rough
at this point but works quite well for us.


===

Subject: Re: multilanguage site
From: "Eric L. Brine" <ebrine@home.com>
Date: Fri, 01 Sep 2000 23:18:13 -0400

> As far as I can tell there's no way in HTML to indicate to the browser 
> that a chunk of content is in an encoding other than the one 
> specified in the headers or meta tag. There's no <span charset=...> 
> attribute or anything like that. This seems to make truly multilingual 
> pages really awkward.

> You basically must use an encoding like UTF-8 which can reach the
> entire Unicode character set or else you cannot mix languages.

Not quite. To display characters not in the current character set, use
"&...;" character references, such as "&eacute;" and "&#9999;" (where
9999 is the decimal Unicode code point).
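A Python sketch of what Eric describes: each character outside ASCII is written as a decimal numeric character reference, so the page bytes themselves stay pure ASCII. The function name is hypothetical. As noted later in the thread, the browser still has to understand Unicode to render the result:

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a decimal &#N; reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


s = "caf\u00e9 \u270f"   # "café" plus U+270F PENCIL (decimal 9999)
print(to_char_refs(s))   # caf&#233; &#9999;

# Python's built-in "xmlcharrefreplace" error handler does the same job:
print(s.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```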

===

Subject: Re: multilanguage site
From: Matt Sergeant <matt@sergeant.org>
Date: Sat, 2 Sep 2000 08:50:34 +0100 (BST)

On 1 Sep 2000, Greg Stark wrote:

> 
> > >> can someone suggest me the best way to build a multilanguage web site
> > >> (english, french, ..).
> > >> I'm using Apache + mod_perl + Apache::asp (for applications)
> 
> I'm really interested in what other people are doing here. We've just released
> our first cut at i18n and it's going fairly well. But so far we haven't dealt
> with the big bugaboo, character encoding. 
> 
> One major problem I anticipate is what to do when individual include files are
> not available in the local language. For ISO-8859-1 encoded languages that's
> not a major hurdle, as we can simply use the English text until it's
> translated. But for other encodings, does it make sense to include English
> text? 
> 
> If we use UTF-8, all the ASCII characters would display properly, but do most
> browsers support UTF-8 now? Or do people still use Big5, EUC, etc.? 

My experience has been really good. With 4.x+ browsers UTF-8 displays just
fine, with the obvious caveat that you have to be using the right
fonts. Generally the people you are displaying to have the right fonts
(otherwise they wouldn't be able to use their computers!).

I had only two problems: 1. Title bars in Linux just displayed
junk. This was probably both an encoding/window manager issue and a font
issue. 2. People don't want their content in UTF-8; they want it in the
character set they are used to, like ISO-8859-2. So I added support in
AxKit for alternate output encodings.
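Serving content in the character set readers expect, such as ISO-8859-2, amounts to transcoding at output time. Below is a Python sketch of that step, not AxKit's implementation. The function name is hypothetical. Falling back to &#N; references for characters the target charset cannot represent is a choice of this sketch:

```python
def encode_for_client(html, charset):
    """Encode a Unicode page in the client's preferred charset.

    Characters the charset cannot represent become &#N; references
    rather than being dropped. Returns (headers, body_bytes).
    """
    body = html.encode(charset, errors="xmlcharrefreplace")
    headers = {"Content-Type": f"text/html; charset={charset}"}
    return headers, body


# "Łódź" fits in ISO-8859-2; the euro sign does not.
headers, body = encode_for_client("\u0141\u00f3d\u017a \u20ac", "iso-8859-2")
print(headers["Content-Type"])  # text/html; charset=iso-8859-2
print(body)
```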

Of course, being XML-based, AxKit handles different character sets in included
files just fine: everything is UTF-8 internally to AxKit.

> As far as I can tell there's no way in HTML to indicate to the browser that a
> chunk of content is in an encoding other than the one specified in
> the headers or meta tag. There's no <span charset=...> attribute or anything
> like that.

Yes, there is.

> This seems to make truly multilingual pages really awkward. You
> basically must use an encoding like UTF-8 which can reach the entire unicode
> character set or else you cannot mix languages.

Or use AxKit ;-)

===

Subject: Re: multilanguage site
From: "Eric L. Brine" <ebrine@home.com>
Date: Sat, 02 Sep 2000 13:08:05 -0400

> > As far as I can tell there's no way in HTML to indicate to the
> > browser that a chunk of content is in an encoding other
> > than the one specified in the headers or meta tag. There's no
> > <span charset=...> attribute or anything like that.
> 
> Yes, there is.

None exists in the HTML 4 standard, as the DTD excerpt below shows, and I
don't see anything in CSS either.

<!ELEMENT SPAN - - (%inline;)*    -- generic language/style container -->
<!ATTLIST SPAN
  %attrs;                         -- %coreattrs, %i18n, %events --
  %reserved;                      -- reserved for possible future use --
  >

<!ENTITY % attrs "%coreattrs; %i18n; %events;">

<!ENTITY % coreattrs
 "id       ID             #IMPLIED  -- document-wide unique id --
  class    CDATA          #IMPLIED  -- space-separated list of classes --
  style    %StyleSheet;   #IMPLIED  -- associated style info --
  title    %Text;         #IMPLIED  -- advisory title --"
  >

<!ENTITY % i18n
 "lang     %LanguageCode; #IMPLIED  -- language code --
  dir      (ltr|rtl)      #IMPLIED  -- direction for weak/neutral text --"
  >

<!ENTITY % events
 "onclick     %Script;  #IMPLIED  -- a pointer button was clicked --
  ondblclick  %Script;  #IMPLIED  -- a pointer button was double clicked--
  onmousedown %Script;  #IMPLIED  -- a pointer button was pressed down --
  onmouseup   %Script;  #IMPLIED  -- a pointer button was released --
  onmouseover %Script;  #IMPLIED  -- a pointer was moved onto --
  onmousemove %Script;  #IMPLIED  -- a pointer was moved within --
  onmouseout  %Script;  #IMPLIED  -- a pointer was moved away --
  onkeypress  %Script;  #IMPLIED  -- a key was pressed and released --
  onkeydown   %Script;  #IMPLIED  -- a key was pressed down --
  onkeyup     %Script;  #IMPLIED  -- a key was released --"
  >

===

Subject: Re: multilanguage site
From: Matt Sergeant <matt@sergeant.org>
Date: Sun, 3 Sep 2000 07:41:46 +0100 (BST)

On Sat, 2 Sep 2000, Eric L. Brine wrote:

> 
> > > As far as I can tell there's no way in HTML to indicate to the
> > > browser that a chunk of content is in an encoding other
> > > than the one specified in the headers or meta tag. There's no
> > > <span charset=...> attribute or anything like that.
> > 
> > Yes, there is.
> 
> None exists in the standard, as seen below, and I don't see anything in
> CSS either.

My bad. I was confusing it with the accept-charset attribute on HTML forms.

===

Subject: Re: multilanguage site
From: Ričardas Čepas <rch@richard.eu.org>
Date: Sun, 3 Sep 2000 06:27:38 +0200

On Fri Sep  1 23:18:13 2000 -0400 Eric L. Brine wrote:

> 
> > You basically must use an encoding like UTF-8 which can reach the
> > entire unicode character set or else you cannot mix languages.
> 
> Not quite. To display characters not in the current character set, use
> "&...;" character references, such as "&eacute;" and "&#9999;" (where
> 9999 is the decimal Unicode code point).
> 
        This would require a Unicode-capable browser anyway. Moreover,
Netscape 4 doesn't display these escapes unless you set the encoding to UTF-8.

===

Subject: Re: [OT] multilanguage site
From: "G.W. Haywood" <ged@www.jubileegroup.co.uk>
Date: Sun, 3 Sep 2000 08:49:12 +0100 (BST)

Hi all,

On Sun, 3 Sep 2000, [UTF-8] Ri
