dbi_untainting_philosophy

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Newsgroups: comp.infosystems.www.authoring.cgi

From: James Taylor <james@nospam.demon.co.uk>
Subject: Re: Untainting with DBI
Date: 22 Sep 2000 03:43:41 -0600

In article <8qeake$qs5$1@nntp.Stanford.EDU>, Joe Brenner wrote:
: 
: My main question is: What kind of patterns would be
: recommended to do the untainting?  As far as I know, this
: information isn't being passed through the shell, so is
: there any reason to worry about shell metacharacters?  Are
: there any characters that are dangerous to put in a 
: database? 

I'm not a security expert but my understanding is that many of the
databases you might use (such as MySQL) can take arbitrary binary data
if need be, so there is no need to worry about the data at the time
you store it (always assuming you use the correct DBI mechanisms for
quoting and inserting the data, of course). However, you must be
mindful of all the other uses the data might be put to later, and
ensure that it will be safe in those situations.
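
A minimal sketch of the "correct DBI mechanisms" mentioned above, assuming an already-connected handle and a table called comments with columns author and body (the helper name, table and columns are all made up for illustration). Placeholders let the driver do the quoting, so the values are never spliced into the SQL text:

```perl
use strict;
use warnings;

# Hypothetical helper: store one comment through an already-connected
# DBI database handle. Table and column names are assumptions.
sub store_comment {
    my ($dbh, $author, $body) = @_;

    # The "?" placeholders are bound by the driver at execute time,
    # so the values are never interpolated into the SQL string.
    my $sth = $dbh->prepare(
        'INSERT INTO comments (author, body) VALUES (?, ?)'
    );
    return $sth->execute($author, $body);
}
```

Here $dbh would come from DBI->connect with whichever driver you use. This only covers storing the data safely; it says nothing about whether the data is safe for later uses.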

So for example, in the case of an email address that will later be
passed to a mailer on the command line, make sure your untainting
regex admits only characters that are harmless there: no shell
metacharacters, and no leading hyphen that the mailer could mistake
for an option.
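
As a rough sketch of such a regex (the helper name untaint_email is hypothetical, and the pattern is deliberately much narrower than what the mail RFCs allow), one could accept only a small set of characters and insist that the address starts with a letter or digit:

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of an address, or
# nothing if it does not match. This is an allowlist of characters
# that are harmless on a command line, not a full RFC address parser.
sub untaint_email {
    my ($input) = @_;
    return unless defined $input;

    # The first character must be alphanumeric, so the result can never
    # be mistaken for a command-line option. \z (rather than $) also
    # rejects a trailing newline.
    if ($input =~ /\A([A-Za-z0-9][A-Za-z0-9._+-]*\@[A-Za-z0-9][A-Za-z0-9.-]*)\z/) {
        return $1;    # captured text is considered untainted under -T
    }
    return;
}
```

Even with an address that passed this check, calling the mailer with the list form of system or exec keeps the shell out of the picture altogether.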

As another example, let's say you have a plain text message that is to
be included in the body of an email sent to an external mailer. You
might think that there is nothing to worry about, but some mailers
will interpret tilde escapes, and I know that sendmail will stop
reading if it sees a dot on a line by itself. So either make sure you
check for such
things in your untainting regex, or use the appropriate flags to
disable these dangerous features in your chosen mailer.
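
For the sendmail case, a sketch of the "appropriate flags" approach might look like this. It assumes a sendmail-compatible binary at a path you supply, and the helper names are made up. As I understand the sendmail options, -oi tells it not to treat a line containing only a dot as the end of the message, and -t makes it take the recipients from the message headers instead of the command line:

```perl
use strict;
use warnings;

# Hypothetical helper: build the command list for the mailer.
#   -oi  do not treat a lone "." line as end of input
#   -t   read recipients from the To:/Cc:/Bcc: headers
sub sendmail_command {
    my ($path) = @_;
    return ($path, '-oi', '-t');
}

# Hypothetical helper: pipe a complete message (headers, blank line,
# body) to the mailer. The list form of open bypasses the shell.
sub send_message {
    my ($path, $message) = @_;
    open(my $mail, '|-', sendmail_command($path))
        or die "cannot run $path: $!\n";
    print {$mail} $message;
    close($mail) or die "mailer exited with status $?\n";
    return 1;
}
```

A call would be something like send_message('/usr/sbin/sendmail', $message), where the path is only an assumption about your system. With -t the header values end up selecting the recipients, so they need the same careful checking as anything else.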

===

mod_perl list:

Subject: Re: Untainting with DBI
From: Jim Britain <jbritain@home.com>
Date: Thu, 05 Oct 2000 02:00:14 -0700

[mailed and posted]
On 21 Sep 2000 17:55:18 -0600, doom@kzsu.stanford.edu (Joe Brenner)
wrote:

>It's pretty common to have a CGI script, with some submitted
>data that you want to put into a database, and if you're
>using perl, you're probably going to talk to your database
>using DBI.  Now of course, if you're using taint mode, perl
>will complain about every commit, unless you've untainted
>the data first, which you do by using a regexp to
>extract what you want to put in your database.
>
>My main question is: What kind of patterns would be
>recommended to do the untainting?  As far as I know, this
>information isn't being passed through the shell, so is
>there any reason to worry about shell metacharacters?  Are
>there any characters that are dangerous to put in a 
>database? 
>
>(I realize that the recommended untainting philosophy is to
>look for what you want rather than to try and screen out
>what you don't want, but it seems to me that there are many
>cases where you'd really rather avoid limiting the system
>unless there's a good reason to do so.)

Any data entered into any database needs to be filtered down to the
legal set of characters allowed in that field or variable, and
ideally to the minimal set necessary to express the full range of
values for that field.
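
One way to sketch this "minimal set per field" idea is a table of patterns, one per field, checked as soon as the input arrives. The field names and patterns here are invented for illustration; real ones depend on your schema:

```perl
use strict;
use warnings;

# Hypothetical field definitions: each field gets the narrowest
# pattern that still covers every legitimate value.
my %FIELD_PATTERN = (
    username => qr/\A([a-z][a-z0-9_]{2,15})\z/,
    zip_code => qr/\A(\d{5}(?:-\d{4})?)\z/,
    quantity => qr/\A([1-9]\d{0,3})\z/,
);

# Return the validated (and, under -T, untainted) value, or die so
# that bad input is rejected where it enters the program.
sub clean_field {
    my ($field, $raw) = @_;
    my $pattern = $FIELD_PATTERN{$field}
        or die "no pattern defined for field '$field'\n";
    (defined($raw) && $raw =~ $pattern)
        or die "invalid value for field '$field'\n";
    return $1;
}
```

A field with no entry in the table is refused outright, so adding a new field forces you to decide what it may contain.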

It makes sense to design accordingly in the CGI or Perl code too.

"Wider range of characters for unforeseen circumstances" is your enemy.
A wider range of unbounded values leads to a wider variety of bugs.

All of that is your good reason to do so.  It seems like more work in
the beginning, but ultimately you become more familiar with the data,
and more trusting in what's actually contained in the variables.

Filter the data when it first enters the program -- that way, when
something goes wrong, you're fixing a bug in the program rather than
chasing the effects of bad input... (spent two weeks doing that
once, because the original author didn't check the input properly).

Bogus data needs to be caught before you spend a week crunching on it.


===



doom@kzsu.stanford.edu