This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.
Newsgroups: comp.infosystems.www.authoring.cgi
From: James Taylor <james@nospam.demon.co.uk>
Subject: Re: Untainting with DBI
Date: 22 Sep 2000 03:43:41 -0600

In article <8qeake$qs5$1@nntp.Stanford.EDU>, Joe Brenner wrote:
:
: My main question is: What kind of patterns would be
: recommended to do the untainting? As far as I know, this
: information isn't being passed through the shell, so is
: there any reason to worry about shell metacharacters? Are
: there any characters that are dangerous to put in a
: database?

I'm not a security expert, but my understanding is that many of the databases you might use (such as MySQL) can take arbitrary binary data if need be, so there is no need to worry about the data at the time you store it (always assuming you use the correct DBI mechanisms for quoting and inserting the data, of course).

However, you must be mindful of all the other uses the data might be put to later, and ensure that it will be safe in those situations. So for example, in the case of an email address that will later be passed to a mailer on the command line, make sure your untainting regex also rules out anything that the shell or the mailer could misinterpret.

As another example, let's say you have a plain text message that is to be included in the body of an email handed to an external mailer. You might think that there is nothing to worry about, but some mailers will interpret tilde escapes, and I know that sendmail will stop reading if it sees a dot on a line by itself. So either make sure you check for such things in your untainting regex, or use the appropriate flags to disable these dangerous features in your chosen mailer.
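As a rough illustration of the two cases above, here is a minimal Perl sketch. The function names untaint_email and neutralize_body are made up for this example, and the address pattern is a deliberately narrow assumption rather than a full address parser; it will reject some legal addresses. The body filter changes the message slightly (it adds a leading space), which may or may not be acceptable for your data.

```perl
use strict;
use warnings;

# Return a clean copy of $addr if it matches a deliberately narrow
# pattern, or nothing otherwise. The pattern is an assumption made for
# this sketch, not a complete definition of a legal address.
sub untaint_email {
    my ($addr) = @_;
    return unless defined $addr;
    # The first character must be alphanumeric, so the value can never
    # look like a command-line option (no leading "-"). \z (not $)
    # refuses a trailing newline as well.
    if ($addr =~ /\A([A-Za-z0-9][A-Za-z0-9._+-]*\@[A-Za-z0-9][A-Za-z0-9.-]*\.[A-Za-z]{2,})\z/) {
        return $1;    # a captured substring is untainted under -T
    }
    return;
}

# Prefix any body line that starts with "." or "~" with a space, so a
# mailer reading the message on stdin sees neither a lone dot nor a
# tilde escape at the start of a line.
sub neutralize_body {
    my ($body) = @_;
    $body =~ s/^(?=[.~])/ /mg;
    return $body;
}

my $to = untaint_email('user@example.com');
print defined $to ? "accepted: $to\n" : "rejected\n";
print neutralize_body("Hello\n.\n~!ls\nBye\n");
```

The alternative mentioned above is to switch the feature off in the mailer instead. For sendmail that is usually done with the -i (or -oi) option, which tells it not to treat a lone dot as end of input; check your own mailer's documentation before relying on it.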
===

mod_perl list:
Subject: Re: Untainting with DBI
From: Jim Britain <jbritain@home.com>
Date: Thu, 05 Oct 2000 02:00:14 -0700

[mailed and posted]

On 21 Sep 2000 17:55:18 -0600, doom@kzsu.stanford.edu (Joe Brenner) wrote:

>It's pretty common to have a CGI script, with some submitted
>data that you want to put into a database, and if you're
>using perl, you're probably going to talk to your database
>using DBI. Now of course, if you're using taint mode, perl
>will complain about every commit, unless you've untainted
>the data first, which you do by using a regexp to
>extract what you want to put in your database.
>
>My main question is: What kind of patterns would be
>recommended to do the untainting? As far as I know, this
>information isn't being passed through the shell, so is
>there any reason to worry about shell metacharacters? Are
>there any characters that are dangerous to put in a
>database?
>
>(I realize that the recommended untainting philosophy is to
>look for what you want rather than to try and screen out
>what you don't want, but it seems to me that there are many
>cases where you'd really rather avoid limiting the system
>unless there's a good reason to do so.)

Any data entered into any database needs to be filtered down to the legal set of characters allowed in that field or variable, and ideally to the minimal set necessary to express the full range of values for that field. It makes sense to design the CGI and Perl code accordingly, too.

"Wider range of characters for unforeseen circumstances" is your enemy. A wider range of unbounded values leads to a wider variety of bugs. All that is your good reason to do so.

It seems like more work in the beginning, but ultimately you become more familiar with the data, and more trusting of what's actually contained in the variables.

Filter the data when it first enters the program -- that way, when something goes wrong, you're fixing a bug in the program -- rather than trying to work out what the program did with bad input... (spent two weeks doing that once, because the original author didn't check the input properly).

Bogus data needs to be caught before you spend a week crunching on it.

===
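The advice in the reply above (one narrow pattern per field, applied as soon as the data arrives) might look something like the following Perl sketch. The field names, the patterns, and the validate_input helper are all hypothetical, invented for this example; real fields need patterns that match their own legal values.

```perl
use strict;
use warnings;

# Hypothetical fields and patterns, chosen only for illustration: each
# pattern is the narrowest set that still covers the field's legal values.
my %FIELD_PATTERN = (
    username => qr/\A([a-z][a-z0-9_]{2,15})\z/,
    zip_code => qr/\A([0-9]{5})\z/,
    quantity => qr/\A([1-9][0-9]{0,3})\z/,
);

# Check every expected field when the input first enters the program.
# Returns a hash ref of accepted values (untainted under -T, since they
# come from regex captures) and an array ref of the fields that failed.
sub validate_input {
    my ($raw) = @_;
    my (%clean, @bad);
    for my $field (sort keys %FIELD_PATTERN) {
        my $value = $raw->{$field};
        if (defined $value && $value =~ $FIELD_PATTERN{$field}) {
            $clean{$field} = $1;
        }
        else {
            push @bad, $field;
        }
    }
    return (\%clean, \@bad);
}

my ($clean, $bad) = validate_input({
    username => 'alice_01',
    zip_code => '94305',
    quantity => '3; DROP TABLE orders',
});
print "rejected: @$bad\n" if @$bad;

# Only if nothing was rejected would the values go on to the database,
# still through DBI placeholders rather than string interpolation, e.g.
# (table and column names are hypothetical):
#   my $sth = $dbh->prepare(
#       'INSERT INTO orders (username, zip_code, quantity) VALUES (?, ?, ?)');
#   $sth->execute(@{$clean}{qw(username zip_code quantity)});
```

Rejecting the whole request when any field fails, rather than storing the fields that passed, keeps bad input from getting as far as the processing stage.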