dbi-csv_to_sql_conversion

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



To: dbi-users@perl.org
From: Scott Taylor <scott@dctchambers.com>
Subject: CSV-SQL conversion
Date: Wed, 17 Oct 2001 14:13:43 -0700

Hello,

Does anyone have a routine to convert from a CSV file to an SQL database 
that works?  I'm having a hard time with it.

Here is what I have so far (it almost works):

#!/usr/bin/perl -w
# <csv2sql.pl>

use DBI;
require Text::CSV_XS;

my $csv = Text::CSV_XS->new;
# bunch of my$ stuff and dbi:InterBase:db connection stuff
# dialect=3
...

open(InFile, $infile) || die "Can not open text file: $!\n";

while ($line = <InFile>){
     if ($csv->parse($line)) {
         $newrec = '';
         my @field = $csv->fields;
         $SQL = qq[INSERT INTO rdx_data VALUES (
                   '$field[0]',  '$field[1]',  '$field[2]',  '$field[3]',
                   '$field[4]',  '$field[5]',  '$field[6]',  '$field[7]',
                   '$field[8]',  '$field[9]',  '$field[10]', '$field[11]',
                   '$field[12]', '$field[13]', '$field[14]', '$field[15]',
                   '$field[16]', '$field[17]', '$field[18]', '$field[19]',
                   '$field[20]')];
         $cursor = $dbh->prepare($SQL) or die print $SQL, "\n";
         $cursor->execute;
         }
}
$cursor->finish;
$dbh->disconnect;
close(InFile);

#EOF

Most of the data gets into the InterBase DB, but some is missing, and I get 
a bunch of these error messages:
DBD::InterBase::st execute failed: Arithmetic overflow or division by zero 
has occurred.
-arithmetic exception, numeric overflow, or string truncation

Is my punctuation bad or what?  I thought it would put text into text 
fields.  Why would it be doing any math or string manipulation at this point?

Flamage for being a lousy, newbie, Perl wannabe hacker, and posting in this 
forum, is acceptable.  But any help is really appreciated.

Oh, I'm also having a hard time with the Text::CSV_XS module not quite 
following all the rules.  Is there a better way, or another module to use 
instead?  (It's the most up-to-date from CPAN, v0.23.)  What I can't get it 
to do is recognize a multi-line field, i.e.:
field1_text, field2_text, "field3_line1^M
field3_line2^M
field3_line3",field4_text,...
I even tried stripping out the doze <CR> (^M).  However, I'm not so 
concerned about that part, just in case someone has some insight on that.

===

To: "Scott Taylor" <scott@dctchambers.com>,
<dbi-users@perl.org>
From: "Dan Horne" <dhorne@xtra.co.nz>
Subject: RE: CSV-SQL conversion
Date: Thu, 18 Oct 2001 10:18:47 +1300

Try DBD::CSV. You can "select" from the text file using SQL
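A minimal sketch of what I mean (untested; DBD::CSV maps a table name to a
file in f_dir, so this assumes your data lives in a file named to match, and
the rdx_data name is just borrowed from your script):

```perl
#!/usr/bin/perl -w
use strict;
use DBI;

# dbi:CSV:f_dir=. -- treat CSV files in the current directory as tables.
my $dbh = DBI->connect("dbi:CSV:f_dir=.", undef, undef,
                       { RaiseError => 1, PrintError => 0 });

# "Select" straight out of the text file, then insert wherever you like.
my $sth = $dbh->prepare("SELECT * FROM rdx_data");
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    print join(", ", @row), "\n";
}
$dbh->disconnect;
```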

Dan

===
To: dbi-users@perl.org
From: Sam Roberts <sroberts@certicom.com>
Subject: Re: CSV-SQL conversion
Date: Wed, 17 Oct 2001 18:17:44 -0400

Quoting Scott Taylor <scott@dctchambers.com>, who wrote:
> Hello,
> 
> Does anyone have a routine to convert from a csv file to an sql database, 
> that works.  I'm having a hard time with it.

Below is what I've done.  I don't know how it would work with other
database types, but it works well for me so far.

It's excerpted code, so it might not run standalone.

Good luck,
Sam

  -- 
Sam Roberts <sroberts@certicom.com>

$debug = 0;
$db = "DB";
$cmd = "import";

$dbh = DBI->connect("DBI:CSV:f_dir=$db", "user", "passwd",
		{
			PrintError => 0,	# Don't print errors,
			RaiseError => 1		# die instead.
		}
	);

[...]

# Import a CSV file.

$cmd =~ /^import$/ && do {

	local $table = shift;
	local $file = shift;

	if( defined($file) && $file ne "-" ) {
		close STDIN;
		open( STDIN, $file ) or die "open( $file ) failed: $!\n";
	}

	local $sth;
	local $sql;

	while( <STDIN> ) {

		chomp;

		# Strip whitespace around the "," separators.

		$_ =~ s/\s*,\s*/,/g;

		local @_ = split(/,/, $_);

		# Create $sth based on the number of values in the first line.
		if( ! defined($sth) ) {
			# We need VALUES to look like VALUES( ?,?,?... ) with one ? for every value we
			# will be inserting: @_ "?"s, split into an array, and joined with commas.

			$sql = "INSERT INTO $table VALUES ( " . join(" , ", split(//, "?" x @_) ) . " )";

			$sth = $dbh->prepare($sql);
		}

		local $" = ",";

		$debug && print "$sql @_\n";

		$sth->execute( @_ );
	}

	$dbh->disconnect();
};


===

To: Sam Roberts <sroberts@certicom.com>
From: Joe Brenner <doom@kzsu.stanford.edu>
Subject: Re: CSV-SQL conversion 
In-Reply-To: Your message of "Wed, 17 Oct 2001 18:17:44 EDT." <20011017181744.B11185@certicom.com> 

Sam Roberts <sroberts@certicom.com> wrote: 

> Quoting Scott Taylor <scott@dctchambers.com>, who wrote:

> > Does anyone have a routine to convert from a csv file to an sql database, 
> > that works.  I'm having a hard time with it.
> 
> Below is what I've done, don't know how it would work with other
> database types, but works well for me so far.


>   # Strip whitespace around the "," separators.
>
>   $_ =~ s/\s*,\s*/,/g;
>
>   local @_ = split(/,/, $_);

You appear to be assuming that there are no commas inside
the data itself.  

CSV is in my experience a very annoying "standard".  It
looks so simple that everyone just hacks their own
conversions, and consequently there are minor variations
floating around... e.g. I had to deal with files that had
whitespace after the comma separators, and Text::CSV_XS does
not seem to allow that.  

DBD::CSV just uses Text::CSV_XS, by the way, so you don't
gain anything by switching to the DBD version, except
possibly a syntax consistent with what you're already
familiar with as a database programmer.

Going back to Scott's original questions: 

>    What I can't get it to do 
>    is recognize a multi-line field, ie:
>    field1_text, field2_text, "field3_line1^M
>    field3_line2^M
>    field3_line3",field4_text,...
>    I even tried stripping out the doze <CR> (^M).  However, I'm not so 
>    concerned about that part, just in case someone has some insight on that.


The following is from 
   http://www.perldoc.com/cpan/Text/CSV_XS.html

      binary

                If this attribute is TRUE, you may use
                binary characters in quoted fields,
                including line feeds, carriage returns and
                NUL bytes. (The latter must be escaped as
                "0.) By default this feature is off.

    [...]

    To sum it up,

    $csv = Text::CSV_XS->new();
   
             is equivalent to
   
    $csv = Text::CSV_XS->new({
        'quote_char'  => '"',
        'escape_char' => '"',
        'sep_char'    => ',',
        'binary'      => 0
    });


So just doing *this* is Scott's mistake: 

  my $csv = Text::CSV_XS->new;

You should probably be doing something like this:

   my $csv = Text::CSV_XS->new({ 'binary'      => 1 });

Note that "binary" is also necessary if you've got text with
iso8859 extended characters in it.
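
To tie this back to the multi-line field problem: parsing one physical line
at a time can't work, because the record doesn't end until the closing
quote.  A sketch that reads whole CSV records instead (assuming your
Text::CSV_XS is recent enough to have getline(); the filename and field
index are just placeholders):

```perl
#!/usr/bin/perl -w
use strict;
use Text::CSV_XS;

# binary => 1 allows CR/LF (and other binary bytes) inside quoted fields.
my $csv = Text::CSV_XS->new({ binary => 1 });

open(my $fh, '<', 'rdx.csv') or die "Can not open rdx.csv: $!\n";

# getline() reads a full CSV record, even one spanning several
# physical lines, and returns the fields as an array ref.
while (my $row = $csv->getline($fh)) {
    my @field = @$row;
    $field[2] =~ s/\015//g;    # strip the doze <CR>s from the multi-line field
    print "field3 = $field[2]\n";
}
close($fh);
```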

===

To: dbi-users@perl.org
From: Sam Roberts <sroberts@certicom.com>
Subject: Re: CSV-SQL conversion
Date: Thu, 18 Oct 2001 10:22:12 -0400

Quoting Joe Brenner <doom@kzsu.stanford.edu>, who wrote:
> Sam Roberts <sroberts@certicom.com> wrote: 
> >   # Strip whitespace around the "," separators.
> >
> >   $_ =~ s/\s*,\s*/,/g;
> >
> >   local @_ = split(/,/, $_);
> 
> You appear to be assuming that there are no commas inside
> the data itself.  

Sort of, except it's not an assumption, it is a requirement of
our test suites.

> DBD::CSV just uses Text::CSV_XS, by the way, so you don't
> gain anything by switching to the DBD version, except
> possibly a syntax consistent with what you're already
> familiar with as a database programmer.

But no! I gain EVERYTHING by using an SQL database. Putting data into
any particular format is easy, flexibly querying it is harder. This is
an excerpt from a generic tool to generate graphs and html and latex
reports of the benchmark and interoperability test results for our
toolkits, and learning SQL has allowed me to not have to invent my own
data querying language.

Cheers,
Sam

===

To: "dbi-users" <dbi-users@perl.org>
From: "Jeff Zucker" <jeff@vpservices.com>
Subject: Re: CSV-SQL conversion
Date: 18 Oct 2001 07:25:51 -0700

Joe Brenner wrote:
> 
> CSV is in my experience a very annoying "standard".  It
> looks so simple that everyone just hacks their own
> conversions, and consequently there are minor variations
> floating around... e.g. I had to deal with files that had
> whitespace after the comma separators, and Text::CSV_XS does
> not seem to allow that.

DBD::AnyData does, just set trim=>1 and do everything else as you would
with DBD::CSV.

> DBD::CSV just uses Text::CSV_XS, by the way, so you don't
> gain anything by switching to the DBD version, except
> possibly a syntax consistent with what you're already
> familiar with as a database programmer.

And you can use SQL, and it supports flock on systems that have it, and
you don't have to write your own file-opening error code, and you can
address things by field names rather than numbers... 

===

