This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
From: JS Bangs <jaspax@u.washington.edu>
Subject: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules
Date: Thu, 24 Jul 2003 15:00:22 -0700
Organization: University of Washington

In the continuing development of Lingua::Phonology, I'm starting to consider what the benefits would be of moving my file-parsing formats to XML from the current custom format. Currently, two of the sub-modules do some form of file-parsing, and the formats they use are described at:

http://search.cpan.org/author/JASPAX/Lingua-Phonology-0.25/Phonology/Features.pm#loadfile
http://search.cpan.org/author/JASPAX/Lingua-Phonology-0.25/Phonology/Symbols.pm#loadfile

The existing formats are concise and human-readable, but completely custom. As I'm thinking of adding file-parsing to Lingua::Phonology::Rules (and perhaps other modules), I was looking for something more reusable, general, and powerful (especially since the Rules submodule will require some fairly complex parsing rules). If I use XML, I can pass parsing duties off to XML::Whatever, but I'm concerned that the costs (in terms of verbosity) will outweigh the benefits of portability and extensibility.

For example, I can currently write the following line in a file to be parsed by Lingua::Phonology::Symbols:

d +anterior -distributed voice

In XML, this might have to be as verbose as:

<symbol label="d">
  <feature name="anterior" value="+" />
  <feature name="distributed" value="-" />
  <feature name="voice" />
</symbol>

Which is significantly heavier and less clear. I'm rather torn on this, so I was wondering what insight the minds here have to offer. Many thanks--

===
From: Rich <scriptyrich@yahoo.co.uk>
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules
Followup-To: comp.lang.perl.misc
Date: Thu, 24 Jul 2003 23:41:01 +0000
Reply-To: scriptyrich@yahoo.co.uk

JS Bangs wrote:

snip

> In XML, this might have to be as verbose as:
>
> <symbol label="d">
>   <feature name="anterior" value="+" />
>   <feature name="distributed" value="-" />
>   <feature name="voice" />
> </symbol>
>
> Which is significantly heavier and less clear. I'm rather torn on this, so
> I was wondering what insight the minds here have to offer. Many thanks--

I'd consider YAML whenever you need XML-like structures that poor old humans might have to read/edit.

The slight downer is that YAML seems to be developing at a pace similar to p6, though in both cases it'll be worth the wait.

===
From: usenet@megazone.org (MegaZone)
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules
Date: 24 Jul 2003 23:51:03 GMT
Organization: WPI Discordian Society, Undocumented Cabal of the Accursed Saint Shiranto Joe

JS Bangs <jaspax@u.washington.edu> shaped the electrons to say:
>d +anterior -distributed voice
>
>In XML, this might have to be as verbose as:
>
><symbol label="d">
> <feature name="anterior" value="+" />
> <feature name="distributed" value="-" />
> <feature name="voice" />
></symbol>

<symbol label="d" anterior="+" distributed="-" voice="+" />

Something like that is just as valid in XML, and XML::LibXML works well. I've been using it for a few months now and I'm really starting to like it now that it has sunk into my brain so I don't have to keep looking things up. :-)

Since XML requires attributes to have values I just used "+" for voice, but you could do things like voice="voice", etc.
It really depends on what you're looking to use the data for - I just created a file like:

<?xml version="1.0" encoding="UTF-8"?>
<CurrencyTable xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="pcCurrencyTable.xsd">
  <Version>1.0</Version>
  <CurrencyShift number="840" name="USD">2</CurrencyShift>
</CurrencyTable>

(there were a lot more CurrencyShift elements...)

Using XPath you can find the element based on one attribute and get the value of another - as in this:

---
use strict;
use warnings;
use XML::LibXML 1.0053;

my $xmlFile;
my $parser = XML::LibXML->new();

# Slurp the whole table into a string.
open (XMLCONF, "<./pcCurrencyTable.xml") ||
  die "Can't open table: $!";
while (<XMLCONF>) {
  $xmlFile .= $_;
}
close (XMLCONF);

my $dom = $parser->parse_string($xmlFile);
# Select the name attribute of the CurrencyShift whose number is 840.
my $xpath = "//CurrencyTable/CurrencyShift[\@number='840']/\@name";
print( ($dom->findnodes($xpath))[0]->textContent() . "\n");

----

That prints "USD". (And I make no claim that is the most elegant way to do that, just what came to me first.)

===
From: Bren <iambrenNOSPAM@sympatico.ca>
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules,comp.text.xml
Date: Fri, 25 Jul 2003 18:41:55 -0400
Organization: Newsfeeds.com http://www.newsfeeds.com 100,000+ UNCENSORED Newsgroups.

On Fri, 25 Jul 2003 14:16:04 -0700, JS Bangs <jaspax@u.washington.edu> wrote:

>> print( ($dom->findnodes($xpath))[0]->textContent() . "\n");
>>
>> ----
>>
>> That prints "USD".

Actually, that would print "USD
" ;-)

===
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules,comp.text.xml
Date: 25 Jul 2003 23:48:48 GMT
Organization: WPI Discordian Society, Undocumented Cabal of the Accursed Saint Shiranto Joe

Bren <iambrenNOSPAM@sympatico.ca> shaped the electrons to say:
>Actually, that would print "USD
>"

Yes. Point.
:-) I actually changed my mind when writing my production code, which the test file was prep for, and used the ->getAttribute() method, since I could first check ->hasAttribute() in an if clause and otherwise fall back to some default value, etc. More than one way. ;-)

===
From: JS Bangs <jaspax@u.washington.edu>
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules,comp.text.xml
Date: Fri, 25 Jul 2003 14:16:04 -0700
Organization: University of Washington

I've added comp.text.xml to the cross-posting for this, since it's probably more concerned with XML than anything else at this point. So far, we've been discussing whether it's worth the trouble to move a custom file format for the Perl module Lingua::Phonology over to XML. I pointed out an original example line like:

> >d +anterior -distributed voice

Which would have to become:

> >In XML, this might have to be as verbose as:
> >
> ><symbol label="d">
> > <feature name="anterior" value="+" />
> > <feature name="distributed" value="-" />
> > <feature name="voice" />
> ></symbol>

To which MegaZone suggested the shorter version:

> <symbol label="d" anterior="+" distributed="-" voice="+" />
>
> Something like that is just as valid in XML,

My response:

The example you gave is *well-formed* XML, which is different from *valid* XML. The problem is that your example could never be valid XML, because the attributes needed to define a given <symbol> cannot be known ahead of time in the module. Rather, the list of feature names is given in a separate <featureset></featureset> section. True, one could make the featureset declaration into a DTD, but that would require the users of my module to write their own DTDs, which is too much work for them. I'd rather leave the validation of features against the featureset to the application--which I'm also writing, so it's not much of a problem.
I could go your way, but it would require all XML files parsed by my module to run in standalone mode, and would prevent writing any DTD that could validate all such files.

> Using XPath you can find the element based on one attribute and get
> the value of another - as in this:
> ---
> use strict;
> use warnings;
> use XML::LibXML 1.0053;
>
> my $xmlFile;
> my $parser = XML::LibXML->new();
>
> # Slurp the whole table into a string.
> open (XMLCONF, "<./pcCurrencyTable.xml") ||
>   die "Can't open table: $!";
> while (<XMLCONF>) {
>   $xmlFile .= $_;
> }
> close (XMLCONF);
>
> my $dom = $parser->parse_string($xmlFile);
> # Select the name attribute of the CurrencyShift whose number is 840.
> my $xpath = "//CurrencyTable/CurrencyShift[\@number='840']/\@name";
> print( ($dom->findnodes($xpath))[0]->textContent() . "\n");
>
> ----
>
> That prints "USD".

Something like this could provide an elegant way for the Lingua::Phonology module to check that a given file doesn't contain errors (i.e. that all attributes or feature names given for a <symbol> match some feature declared in the <featureset> section). Once I've decided on my format, I'll have to consider exactly how to do this.

===
From: "Julian Scarfe" <julian@avbrief.com>
Subject: Re: XML or home-grown format?
Newsgroups: comp.lang.perl.misc,comp.lang.perl.modules
Date: Sat, 26 Jul 2003 16:50:58 +0100
Organization: ntl Cablemodem News Service

"JS Bangs" <jaspax@u.washington.edu> wrote in message news:Pine.A41.4.56.0307241439550.111292@dante03.u.washington.edu...

> For example, I can currently write the following line in a file to be
> parsed by Lingua::Phonology::Symbols:
>
> d +anterior -distributed voice
>
> In XML, this might have to be as verbose as:
>
> <symbol label="d">
>   <feature name="anterior" value="+" />
>   <feature name="distributed" value="-" />
>   <feature name="voice" />
> </symbol>
>
> Which is significantly heavier and less clear. I'm rather torn on this, so
> I was wondering what insight the minds here have to offer.
> Many thanks--

My guess is that you find this less clear because you're used to reading the current format. However:

<symbol label="d">
  <feature name="anterior" value="true" />
  <feature name="distributed" value="false" />
  <feature name="voice" />
</symbol>

means a great deal more to me than trying to work out what your +s and -s mean. The structure is immediately clear and it's not hard to edit using an XML editor or even a simple text editor.

I'd check out XML Schema (rather than playing with DTDs) if you haven't already.

===
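The trade-off debated in this thread can be made concrete with a short sketch that reads one line of the compact format and emits the verbose XML form. This is only an illustration in core Perl, not Lingua::Phonology's actual parser: the helper names parse_symbol_line, xml_escape and symbol_to_xml are hypothetical, and it assumes a bare feature name such as "voice" is written without a value attribute, as in the original example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line of the compact format,
# e.g. "d +anterior -distributed voice", into a label and a feature list.
sub parse_symbol_line {
    my ($line) = @_;
    my ($label, @tokens) = split ' ', $line;
    my @features;
    for my $token (@tokens) {
        # An optional leading + or - is the value; the rest is the feature name.
        my ($value, $name) = $token =~ /^([+-]?)(.+)$/;
        push @features, { name => $name, value => $value };
    }
    return { label => $label, features => \@features };
}

# Escape the characters that are special inside a double-quoted XML attribute.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: render the parsed symbol in the verbose XML layout.
# Assumption: a feature with no +/- prefix gets no value attribute at all.
sub symbol_to_xml {
    my ($symbol) = @_;
    my $xml = sprintf qq{<symbol label="%s">\n}, xml_escape($symbol->{label});
    for my $feature (@{ $symbol->{features} }) {
        $xml .= sprintf qq{  <feature name="%s"}, xml_escape($feature->{name});
        $xml .= sprintf qq{ value="%s"}, xml_escape($feature->{value})
            if length $feature->{value};
        $xml .= " />\n";
    }
    return $xml . "</symbol>\n";
}

print symbol_to_xml(parse_symbol_line('d +anterior -distributed voice'));
```

A converter along these lines would let existing files keep the concise one-line form while the module itself handles only XML. Checking feature names against a <featureset> would still be left to the application, as discussed above.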