This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
To: dbi-users@perl.org From: Scott Taylor <scott@dctchambers.com> Subject: CSV-SQL convertion Date: Wed, 17 Oct 2001 14:13:43 -0700 Hello, Does anyone have a routine to convert from a csv file to an sql database, that works. I'm having a hard time with it. here is what I have so far: (almost works) #!/usr/bin/perl -w # <csv2sql.pl> use DBI; require Text::CSV_XS; my $csv = Text::CSV_XS->new; # bunch of my$ stuff and dbi:InterBase:db connection stuff # dialect=3 ... open(InFile, $infile) || die "Can not open text file: $!\n"; while ($line = <InFile>){ if ($csv->parse($line)) { $newrec = ''; my @field = $csv->fields; $SQL = qq[INSERT INTO rdx_data VALUES ( '$field[0]', '$field[1]', '$field[2]', '$field[3]', '$field[4]', '$field[5]', '$field[6]', '$field[7]', '$field[8]', '$field[9]', '$field[10]', '$field[11]', '$field[12]', '$field[13]', '$field[14]', '$field[15]', '$field[16]', '$field[17]', '$field[18]', '$field[19]', '$field[20]')]; $cursor = $dbh->prepare($SQL) or die print $SQL, "\n"; $cursor->execute; } } $cursor->finish; $dbh->disconnect; close(InFile); #EOF most of the data gets in the Interbase DB but some is missing, and I get a bunch of these error messages: DBD::InterBase::st execute failed: Arithmetic overflow or division by zero has occurred. -arithmetic exception, numeric overflow, or string truncation Is my punctuation bad or what? I thought it would put text into text fields. Why would it be doing any math or string manipulation at this point? Flamage for being a lousy, newbie, Perl wannabe hacker, and posting in this forum, is acceptable. But any help is really appreciated. Oh, also having a hard time with the Text::CSV_XS module not quite following all the rules, is there a better way, or another module to use instead (it's most up-to-date from CPAN v0.23). What I can't get it to do is recognize a multi-line field, ie: field1_text, field2_text, "field3_line1^M field3_line2^M field3_line3",field4_text,... I even tried stripping out the doze <CR> (^M). However, I'm not so concerned about that part, just in case someone has some insight on that. === To: "Scott Taylor" <scott@dctchambers.com>, <dbi-users@perl.org> From: "Dan Horne" <dhorne@xtra.co.nz> Subject: RE: CSV-SQL convertion Date: Thu, 18 Oct 2001 10:18:47 +1300 Try DBD::CSV. You can "select" from the text file using SQL Dan === To: dbi-users@perl.org From: Sam Roberts <sroberts@certicom.com> Subject: Re: CSV-SQL convertion Date: Wed, 17 Oct 2001 18:17:44 -0400 Quoting Scott Taylor <scott@dctchambers.com>, who wrote: > Hello, > > Does anyone have a routine to convert from a csv file to an sql database, > that works. I'm having a hard time with it. Below is what I've done, don't know how it would work with other database types, but works well for me so far. It's execerpted code, so might not run standalone. Good luck, Sam -- Sam Roberts <sroberts@certicom.com> $debug = 0; $db = "DB"; $cmd = "import"; $dbh = DBI->connect("DBI:CSV:f_dir=$dbn", "user", "passwd", { PrintError => 0, # Don't print errors, RaiseError => 1 # die instead. } ); [...] # Import a CSV file. $cmd =~ /^import$/ && do { local $table = shift; local $file = shift; if( defined($file) && $file ne "-" ) { close STDIN; open( STDIN, $file ) or die "open( $file ) failed: $!\n"; } local $sth; while( <STDIN> ) { chomp; # Strip whitespace around the "," seperators. $_ =~ s/\s*,\s*/,/g; local @_ = split(/,/, $_); local $sth; local $sql; # Create $sth based on the number of values in the first line. if( ! defined($sth) ) { # We need VALUES to look like VALUES( ?,?,?...) with one ? for every argument we # will be inserting. Thus, @_ "?"s, split into an array, and joined with a column. $sql = "INSERT INTO $table VALUES ( " . join(" , ", split(//, "?" x @_) ) . " )"; $sth = $dbh->prepare($sql); } local $" = ","; $debug && print "$sql @_\n"; $sth->execute( @_ ); } $dbh->disconnect(); }; === To: Sam Roberts <sroberts@certicom.com> Subject: Re: CSV-SQL convertion In-Reply-To: Your message of "Wed, 17 Oct 2001 18:17:44 EDT." <20011017181744.B11185@certicom.com> -------- Sam Roberts <sroberts@certicom.com> wrote: > Quoting Scott Taylor <scott@dctchambers.com>, who wrote: > > Does anyone have a routine to convert from a csv file to an sql database, > > that works. I'm having a hard time with it. > > Below is what I've done, don't know how it would work with other > database types, but works well for me so far. > # Strip whitespace around the "," seperators. > > $_ =~ s/\s*,\s*/,/g; > > local @_ = split(/,/, $_); You appear to be assuming that there are no commas inside the data itself. CSV is in my experience a very annoying "standard". It looks so simple that everyone just hacks there own conversions, and consequently there are minor variations floating around... e.g. I had to deal with files that had whitespace after the comma separators, and Text::CSV_XS does not seem to allow that. DBD::CSV just uses Text::CSV_XS, by the way, so you don't gain anything by switching to the DBD version, except possibly a syntax consistent with what you're already familiar with as a database programmer. Going back to Scott's original questions: > What I can't get it to do > is recognize a multi-line field, ie: > field1_text, field2_text, "field3_line1^M > field3_line2^M > field3_line3",field4_text,... > I even tried stripping out the doze <CR> (^M). However, I'm not so > concerned about that part, just in case someone has some insight on that. The following is from http://www.perldoc.com/cpan/Text/CSV_XS.html binary If this attribute is TRUE, you may use binary characters in quoted fields, including line feeds, carriage returns and NUL bytes. (The latter must be escaped as "0.) By default this feature is off. [...] To sum it up, $csv = Text::CSV_XS->new(); is equivalent to $csv = Text::CSV_XS->new({ 'quote_char' => '"', 'escape_char' => '"', 'sep_char' => ',', 'binary' => 0 }); So just doing *this* is Scott's mistake: my $csv = Text::CSV_XS->new; You should probably be doing something like this: my $csv = Text::CSV_XS->new({ 'binary' => 1 }); Note that "binary" is also necessary if you've got text with iso8859 extended characters in it. === To: dbi-users@perl.org From: Sam Roberts <sroberts@certicom.com> Subject: Re: CSV-SQL convertion Date: Thu, 18 Oct 2001 10:22:12 -0400 Quoting Joe Brenner <doom@kzsu.stanford.edu>, who wrote: > Sam Roberts <sroberts@certicom.com> wrote: > > # Strip whitespace around the "," seperators. > > > > $_ =~ s/\s*,\s*/,/g; > > > > local @_ = split(/,/, $_); > > You appear to be assuming that there are no commas inside > the data itself. Sort of, except it's not an assumption, it is a requirement of our test suites. > DBD::CSV just uses Text::CSV_XS, by the way, so you don't > gain anything by switching to the DBD version, except > possibly a syntax consistent with what you're already > familiar with as a database programmer. But no! I gain EVERYTHING by using an SQL database. Putting data into any particular format is easy, flexibly querying it is harder. This is an excerpt from a generic tool to generate graphs and html and latex reports of the benchmark and interoperability test results for our toolkits, and learning SQL has allowed me to not have to invent my own data querying language. Cheers, Sam === To: "dbi-users" <dbi-users@perl.org> From: "Jeff Zucker" <jeff@vpservices.com> Subject: Re: CSV-SQL convertion Date: 18 Oct 2001 07:25:51 -0700 Joe Brenner wrote: > > CSV is in my experience a very annoying "standard". It > looks so simple that everyone just hacks there own > conversions, and consequently there are minor variations > floating around... e.g. I had to deal with files that had > whitespace after the comma separators, and Text::CSV_XS does > not seem to allow that. DBD::AnyData does, just set trim=>1 and do everything else as you would with DBD::CSV. > DBD::CSV just uses Text::CSV_XS, by the way, so you don't > gain anything by switching to the DBD version, except > possibly a syntax consistent with what you're already > familiar with as a database programmer. And you can use SQL, and it supports flock on systems that have it, and you don't have to write your own file-opening error code, and you can address things by field names rather numbers... ===