perl_dos_line_endings

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



From: Chris Stith <mischief@velma.motion.net>
Subject: Re: create a text file in DOS format.
Date: Sat, 10 Mar 2001 15:04:46 -0000

Renato Santos <Bing@home.com> wrote:
> How do I create a text file in UNIX perl that
> is in DOS format rather than UNIX?

DOS and Windows use the network line ending.
End all your lines with qq{\r\l} or preferably
qq{\015\012}.
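A minimal sketch of that advice, assuming the lines are already in a list: append the octal CR LF pair yourself instead of relying on "\n". The helper name `dos_text` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: end every line with an explicit CR LF pair.
# "\015\012" is octal for carriage return (13) and line feed (10),
# so it names the same two bytes on every platform.
sub dos_text {
    my @lines = @_;
    return join '', map { $_ . "\015\012" } @lines;
}

my $text = dos_text('first line', 'second line');
# length($text) is 25: 10 + 2 bytes, then 11 + 2 bytes.
```

The result can then be printed to the output filehandle as-is.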

You could also send the files via FTP in ASCII
mode, but if you transfer them in compressed
archive files it might be easier to write them
the way Windows wants them in the first place.

> I tried the
> system("/bin/UNIX2DOS $file1 $file2"); but it
> keeps giving me a server error.

I don't have this program, but are you sure it's
all uppercase? I also don't think it'd be a
client/server program.

> I'm using the perl script to create a text
> file from a user input and the file has to
> be read by an NT machine.

You could convert it there if you had ActiveState
Perl available.

> I also tried using the binmode function after
> I open the filehandle but that did not work
> either.

That only makes a difference _on_ DOS, Windows,
or some other system where there is a distinction
between text and binary files. In Unix, a text
file is simply a file you can read, and a line
ending is whatever character certain libraries and
the text editor treat as one. On DOS, the whole OS
treats binary and text files differently. Since
you're on a Unix flavor when writing the file with
your program, binmode() has no effect.
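As an aside for newer perls (5.8 and later, so after this thread was written): binmode() with no layer argument still changes nothing on Unix, but the PerlIO `:crlf` layer can be requested explicitly, and it translates each "\n" to CR LF on output even on Unix. A sketch, assuming Perl 5.8+ and a scratch file from File::Temp; the sub names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write lines whose "\n" becomes a CR LF pair on disk, by opening
# the file with the :crlf PerlIO layer (Perl 5.8 or later).
sub write_crlf_file {
    my ($path, @lines) = @_;
    open(my $out, '>:crlf', $path) or die "Cannot write $path: $!";
    print {$out} "$_\n" for @lines;
    close($out) or die "Cannot close $path: $!";
}

# Read the raw bytes back, with no translation, to see what was written.
sub slurp_raw {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close($in);
    return $bytes;
}

my (undef, $path) = tempfile(UNLINK => 1);
write_crlf_file($path, 'one', 'two');
```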

> The file looks fine in UNIX but when I open
> it in WindowsNT it is all just one long line.
> The carriage return for each line is ignored.

It's not ignored. DOS just doesn't treat a lone
linefeed, which is what Unix writes, as a newline.
A newline to DOS is a carriage return and a
linefeed back to back, just like on an old teletype.
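To check which convention a file actually uses, a small diagnostic can count the endings. This sketch (the sub name is hypothetical) counts CR LF as one DOS ending, a lone LF as a Unix ending, and a lone CR separately.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: tally the kinds of line ending in a string.
# The alternation tries CR LF first, so a DOS ending is not counted
# as a separate CR and LF.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, lf => 0, cr => 0);
    while ($text =~ /(\015\012|\012|\015)/g) {
        if    ($1 eq "\015\012") { $count{crlf}++ }
        elsif ($1 eq "\012")     { $count{lf}++ }
        else                     { $count{cr}++ }
    }
    return \%count;
}
```

A file that shows up as one long line on NT would typically report only `lf` endings here.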

The standard DOS editor, EDIT.COM, might fix
this if you open the file and save it with that
on DOS/Windows/NT. It has been known to change
\r, \l, or \l\r into \r\l in certain versions.
This is a workaround rather than a fix, but it's
handy to know.
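The same repair can be done in Perl before the file leaves Unix. A sketch that mirrors the EDIT.COM behaviour described above, rewriting CR, LF, LF CR, or CR LF endings to CR LF (the sub name is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: normalize every line ending to CR LF.
# The two-byte pairs are listed first so that CR LF and LF CR are
# each replaced once, not turned into two line breaks.
sub to_dos_endings {
    my ($text) = @_;
    $text =~ s/\015\012|\012\015|\015|\012/\015\012/g;
    return $text;
}
```

Treating LF CR as one ending follows the description above; drop that alternative if your data could contain a Unix line that begins with a bare CR.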

> Any clue?  I'm really a newbie with PERL so
> please excuse me if this question is too trivial.

This doesn't have much to do with Perl or with perl,
and I'm not sure how it applies to PERL, since that
looks like an acronym, and Perl isn't one. ;-) This
is a portability issue between operating systems.
It can be handled in Perl, but it applies to other
languages too.

Chris

-- 
Christopher E. Stith
Product shown enlarged to make you think you're getting more.


From: "Alan J. Flavell" <flavell@mail.cern.ch>
Newsgroups: comp.lang.perl.misc
Subject: Re: create a text file in DOS format.
Date: Sat, 10 Mar 2001 16:07:32 +0100
Organization: Knights of the Round Tuit

On Sat, 10 Mar 2001, Tad McClellan wrote:

> Just write Unix files on Unix and DOS files on DOS.
>
> Use "text mode" when transferring the files with FTP, and the
> ftp program will convert the line endings for you!

Don't forget that "DOS" format implies a DOS 8-bit coding (CP850 in
most Latin-1 locales - apparently it's often still CP437 in the USA,
which I discovered to my surprise recently), whereas (again in Latin 1
locales) unix would use iso-8859-1.  So, conversion between the two
formats, in general, calls for an 8-bit transcoding in addition to the
resolution of newline representations.
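Where the 8-bit transcoding does matter, later perls (5.8 and up) ship the Encode module, whose from_to() converts a byte string between encodings in place. A sketch, assuming iso-8859-1 input with Unix (LF-only) line endings and the DOS code page CP850 as the target ('cp437' would be the US code page); the sub name is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: transcode iso-8859-1 bytes to DOS code page 850,
# then turn each LF into CR LF. Assumes the input has LF-only endings.
sub latin1_to_dos {
    my ($bytes) = @_;
    from_to($bytes, 'iso-8859-1', 'cp850');    # modifies $bytes in place
    $bytes =~ s/\012/\015\012/g;
    return $bytes;
}
```

For example, e-acute is byte 0xE9 in iso-8859-1 but 0x82 in CP850, so `latin1_to_dos("caf\xE9\012")` yields `"caf\x82\015\012"`.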

I've met half a dozen implementations of tools called dos2unix and
unix2dos.  About half of them performed this 8-bit transcoding, and
half of them didn't.  About two thirds of them never even mentioned
the issue in their man pages.  So beware.

The same issue comes up with implementations of FTP.  And again, some
of them do and some of them don't - as a (sometimes optional) addition
to handling the newline representation issue.  Again, beware (and
Windows brings yet another area of concern, namely the characters in
the range 128 to 159 decimal, which are displayable characters in
Windows-1252 which don't exist in iso-8859-1).

Newlines are almost trivial in comparison with the problems caused by
miscoding of 8-bit characters   ;-}

-- 

         This .sig only acknowledges that the message was displayed on
         the recipient's machine. There is no guarantee that the
         content has been read or understood.


From: Bart Lateur <bart.lateur@skynet.be>
Newsgroups: comp.lang.perl.misc
Subject: Re: create a text file in DOS format.
Organization: MediaMind
Date: Sat, 10 Mar 2001 15:21:27 GMT

Renato Santos wrote:

>How do I create a text file in UNIX perl that is in DOS format rather
>than UNIX?

>I also tried using the binmode function after I open the filehandle but
>that did not work either.

No, that works to make a Unix compatible file on Windows.

Try doing

	s/\n/\015\012/g;

before printing the output.
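In context, that substitution might look like this. A sketch, assuming the text already sits in a scalar with Unix newlines; it writes to an in-memory filehandle (Perl 5.8+) only so the result can be inspected, and a real script would open the output file instead.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Turn each Unix newline into a CR LF pair just before output.
(my $dos = $text) =~ s/\n/\015\012/g;

# In-memory filehandle (Perl 5.8+) standing in for the real output file.
open(my $out, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
binmode($out);
print {$out} $dos;
close($out);
```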

Or, set $\ to "\015\012", and NEVER print a newline yourself, but rather
rely on perl appending it after every single print statement. Thus,
a print() prints one line.
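A sketch of the $\ approach. Wrapping it in a block with `local` keeps the changed separator from leaking into other print statements in the program; the in-memory filehandle (Perl 5.8+) and the line data are only for illustration.

```perl
use strict;
use warnings;

open(my $out, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
binmode($out);

{
    # $\ is the output record separator: perl appends it to every print.
    local $\ = "\015\012";
    print {$out} 'first line';     # no "\n" here; $\ supplies the ending
    print {$out} 'second line';
}
close($out);
```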

-- 
	Bart.

From: Bart Lateur <bart.lateur@skynet.be>
Newsgroups: comp.lang.perl.misc
Subject: Re: create a text file in DOS format.
Organization: MediaMind
Date: Sat, 10 Mar 2001 15:36:00 GMT

Alan J. Flavell wrote:

>Don't forget that "DOS" format implies a DOS 8-bit coding (CP850 in
>most Latin-1 locales - apparently it's often still CP437 in the USA,
>which I discovered to my surprise recently), whereas (again in Latin 1
>locales) unix would use iso-8859-1. 

But the OP said he wants to read it on NT. NT is mostly ISO-8859-1
compatible. So actually, he doesn't really want a DOS file, but only a
text file with DOS line endings.

-- 
	Bart.

From: "Alan J. Flavell" <flavell@mail.cern.ch>
Newsgroups: comp.lang.perl.misc
Subject: Re: create a text file in DOS format.
Date: Sat, 10 Mar 2001 18:43:43 +0100
Organization: Knights of the Round Tuit

On Sat, 10 Mar 2001, Bart Lateur wrote:

> >Don't forget that "DOS" format implies a DOS 8-bit coding (CP850 in
> >most Latin-1 locales - apparently it's often still CP437 in the USA,
> >which I discovered to my surprise recently), whereas (again in Latin 1
> >locales) unix would use iso-8859-1.
>
> But the OP said he wants to read it on NT.

Yes: I had, after all, remarked later in the posting that it's
different under Windows.

> NT is mostly ISO-8859-1 compatible.

Their Latin-1 coding is registered at IANA as Windows-1252.  It
coincides exactly[1] with the displayable characters of iso-8859-1,
but it uses the range 128-159 decimal (which the iso-8859-* codes
reserve for control functions) for additional displayable characters.
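The difference is easy to see with the Encode module (Perl 5.8+): byte 0x80 is the euro sign in Windows-1252 but a C1 control character in iso-8859-1, while byte 0xE9 decodes to the same character in both. A small sketch; the sub name is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode one byte as Windows-1252 and as iso-8859-1, and return the
# Unicode code point each encoding assigns to it, as "U+XXXX" strings.
sub code_points {
    my ($byte) = @_;
    return map {
        my $copy = $byte;               # leave $byte untouched for the next encoding
        my $char = decode($_, $copy);
        sprintf('U+%04X', ord $char);
    } ('cp1252', 'iso-8859-1');
}

my @euro   = code_points("\x80");   # ('U+20AC', 'U+0080')
my @eacute = code_points("\xE9");   # ('U+00E9', 'U+00E9')
```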

> So actually, he doesn't really want a DOS file, but only a
> text file with DOS line endings.

Yes, it seems you're right in this case.  Which adds another reason
for being careful ;-)

all the best


[1] Note that this property is a peculiarity of Latin-1 Windows.
When you compare iso-8859-7 with Greek Windows coding, or the iso
Baltic codings with Windows Baltic, etc., it turns out that they
aren't exactly the same, not even in the range 160-255 decimal.
Sorry for that digression...


From: "Alan J. Flavell" <flavell@mail.cern.ch>
Newsgroups: comp.lang.perl.misc
Subject: Re: create a text file in DOS format.
Date: Mon, 12 Mar 2001 19:42:59 +0100
Organization: Knights of the Round Tuit

On Mon, 12 Mar 2001, Philip Newton wrote:

> On Sat, 10 Mar 2001 15:04:46 -0000, Chris Stith
> <mischief@velma.motion.net> wrote:
>
> > End all your lines with qq{\r\l}
>
> ITYM qq{\r\n}.

  A common misconception in socket programming is that \n eq \012 everywhere.
  When using protocols such as common Internet protocols, \012 and \015 are
  called for specifically, and the values of the logical \n and \r (carriage
  return) are not reliable.

      print SOCKET "Hi there, client!\r\n";      # WRONG
      print SOCKET "Hi there, client!\015\012";  # RIGHT

(quoted from perlport).

My vote goes for perlport, _if_ DOS/Windows newlines are precisely
what you're trying to achieve (from any platform).  This corresponds
very closely to the socket scenario that perlport was discussing.

Don't forget binmode() for portability.
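Putting the perlport advice and binmode() together, a sketch that writes CR LF lines from any platform and then reads the raw bytes back to confirm them. The scratch file comes from File::Temp here; in real code it would be whatever file the NT machine reads, and the sub name is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write lines with explicit octal CR LF endings.
# binmode() stops a DOS/Windows perl from turning "\012" into
# "\015\012" a second time; on Unix it is harmless.
sub write_dos_lines {
    my ($path, @lines) = @_;
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    binmode($out);
    print {$out} "$_\015\012" for @lines;
    close($out) or die "Cannot close $path: $!";
}

my (undef, $path) = tempfile(UNLINK => 1);
write_dos_lines($path, 'Hi there, client!', 'Second line');

# Read it back without any translation to check the bytes on disk.
open(my $in, '<', $path) or die "Cannot read $path: $!";
binmode($in);
my $bytes = do { local $/; <$in> };
close($in);
```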

cheers


