Chapter 10: Security Issues

Filtering String Data

Here's a place where PHP borrows some of Perl's magic. Perl makes it easy to use "regular expressions" to filter unwanted data and check for complex matches. PHP provides several functions, all with names beginning with preg_, that accept Perl-compatible regular expressions. Here's the one that I find to be the most useful:

preg_replace(pattern, replacement, text)

Example:

<?php
# assume the user typed in some "garbage" after their name on a form...so that...
$name = "James S. Humphrey ;This stuff: ?~!@#$%^&*(_)+| should not be here";

# call the preg_replace function and assign its output back to $name
$name = preg_replace("|;.*|","",$name);

# show off your results
print("$name");
?>

Explanation: Look at the call to the preg_replace function in the example. The argument list (everything between the parentheses after the function's name) has 3 items. The first 2 arguments are in double quotes, so PHP knows they are strings. The 3rd argument is a variable, and PHP already knows what type of data it holds (a string).

The first argument is pattern. This is the regular expression for what you want to find and replace. Expressions are surrounded by "delimiters." Here I've used the "pipe" character ( | ) as the delimiter. You can use any character except a letter, a number, a backslash ( \ ) or whitespace as a delimiter. Be sure to use the same delimiter at the start AND end of the regular expression.
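
To make the delimiter rule concrete, here is a small sketch. The sample strings and variable names are my own, not part of the example above. It runs the same pattern with three different delimiters, then shows that a delimiter character appearing inside the pattern has to be escaped with a backslash.

```php
<?php
# hypothetical sample input, made up for this sketch
$text = 'Jane Doe ;extra stuff';

# the same regular expression written with three different delimiters
$a = preg_replace('|;.*|', '', $text);
$b = preg_replace('/;.*/', '', $text);
$c = preg_replace('#;.*#', '', $text);

# all three calls return the same filtered string
var_dump($a === $b && $b === $c);

# if the delimiter also appears inside the pattern, escape it with a backslash
$d = preg_replace('|\|.*|', '', 'left side | right side');
print($d);
```

Picking a delimiter that never appears in the pattern avoids the need for escaping altogether.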

|;.*| is the regular expression. It matches a semicolon followed by any number of characters of any kind (other than a line break). Anything after (and including) a semicolon should be removed because the semicolon separates commands in shells and in SQL. If the string ever reaches one of them unescaped, whatever follows the semicolon could be executed as a command (very bad mojo!).

The second argument is replacement. Here it's an empty string. What we're doing is removing some characters by replacing them with nothing.

The third argument is text. This is the string of text you are going to clean up. In this example it's $name which is the name that a visitor typed into your form.

After preg_replace() does its thing, its output (the filtered string) is assigned to $name. This replaces the original (unsafe) string with the filtered (safe) string produced by preg_replace.
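
One detail worth knowing: in a regular expression the dot does not match a line break unless you add the s modifier after the closing delimiter. The sketch below uses a made-up two-line input to show the difference. It also trims the space left behind in front of the semicolon. trim() is a standard PHP function, but using it here is my addition, not part of the example above.

```php
<?php
# hypothetical input with a line break after the semicolon
$name = "James S. Humphrey ;junk\nmore junk";

# without the s modifier, .* stops at the end of the first line
$partial = preg_replace('|;.*|', '', $name);

# with the s modifier, .* also matches line breaks, so everything after the semicolon goes
$full = preg_replace('|;.*|s', '', $name);

# trim() removes the leftover space that sat in front of the semicolon
$full = trim($full);

print($full);
```

For a single-line form field the two patterns usually behave the same. The s modifier matters mainly for multi-line inputs such as a textarea.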

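Note: Other strings such as the value of Address, Zip/Postal Code or City can be cleaned up in a similar way. But for State and Country (or any other field where there's a limited number of options), use a drop-down list instead of a text input. It's less work in the long run. It is only safe, though, if your script still checks that the submitted value is one of the listed options, because a visitor can send any value regardless of what the form displays.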
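
For a field with a limited set of options, the script can compare the submitted value against that same list before using it. Here is a minimal sketch. The $allowed_states list, the clean_state() helper and its empty-string fallback are names and choices I made up for illustration.

```php
<?php
# hypothetical list of the values offered in the drop-down
$allowed_states = array('CO', 'NM', 'UT', 'WY');

# return the value if it is one of the allowed options, otherwise an empty string
function clean_state($value, $allowed)
{
    # the third argument (true) makes in_array() compare types as well as values
    return in_array($value, $allowed, true) ? $value : '';
}

print(clean_state('CO', $allowed_states));
```

Because the value is only ever accepted when it matches an entry exactly, no regular expression is needed for this kind of field.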

Tip: More info about Perl regular expressions can be found at this web page:
http://piglet.uccs.edu/~cs301/perl/re.htm and in this book:
Mastering Regular Expressions by Jeff Friedl, O'Reilly & Associates


Filtering numeric data

We don't want any scripts, commands or other "junk" coming in through a numeric item. We can use the first filter and add another to remove anything that isn't a digit, a letter or a dash from this Zip Code. Letters are allowed so that the filter will work with many non-US postal codes, too, although any spaces in them are removed. Here's the code I use:

<?php
$zip = preg_replace("|;.*|","",$zip);
$zip = preg_replace("|[^0-9A-Za-z\-]+|","",$zip);
?>
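
Here is how those two lines behave on a few sample values. The clean_zip() wrapper and the sample inputs are my own, added only to make the filter easy to try out.

```php
<?php
# hypothetical wrapper around the two filters shown above
function clean_zip($zip)
{
    $zip = preg_replace("|;.*|", "", $zip);
    $zip = preg_replace("|[^0-9A-Za-z\-]+|", "", $zip);
    return $zip;
}

print(clean_zip("80918-1234") . "\n");      # ZIP+4 passes through unchanged
print(clean_zip("K1A 0B1") . "\n");         # the space is removed: K1A0B1
print(clean_zip("80918;rm -rf /") . "\n");  # everything from the semicolon on is dropped: 80918
```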

To filter a phone or fax number, you could use this regular expression. It matches anything that's not a digit, a dash, whitespace or a parenthesis:

<?php
$phone = preg_replace("|;.*|","",$phone);
$phone = preg_replace("|[^0-9\-\s\(\)]+|","",$phone);
?>
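
A quick sketch of the phone filter in action, using made-up numbers and a hypothetical clean_phone() wrapper. Note that letters and the plus sign are stripped, so an extension written as "x12" loses its "x" and an international prefix loses its "+".

```php
<?php
# hypothetical wrapper around the two filters shown above
function clean_phone($phone)
{
    $phone = preg_replace("|;.*|", "", $phone);
    $phone = preg_replace("|[^0-9\-\s\(\)]+|", "", $phone);
    return $phone;
}

print(clean_phone("(719) 555-0100 x12") . "\n");  # the "x" is removed, the digits stay
print(clean_phone("+1 719-555-0100") . "\n");     # the "+" is removed
```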

Filtering street addresses

Here we have to accept numbers, letters, spaces and possibly a period. We still reject commas, semicolons, angle brackets (because they could enclose JavaScript) and most other symbols. Here's a filter for them:

<?php
$address = preg_replace("|;.*|","",$address);
$address = preg_replace("|[^0-9A-Za-z\.\s]+|","",$address);
?>
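
To see what this filter does to a hostile value, here is a small sketch. The clean_address() wrapper and both sample strings are assumptions of mine. Notice that the filter removes the angle brackets but leaves the letters that were between them, and that it also strips characters such as "#" and the apostrophe that real addresses sometimes contain.

```php
<?php
# hypothetical wrapper around the two filters shown above
function clean_address($address)
{
    $address = preg_replace("|;.*|", "", $address);
    $address = preg_replace("|[^0-9A-Za-z\.\s]+|", "", $address);
    return $address;
}

# the brackets, parentheses and slash are removed; the words inside them remain
print(clean_address("123 Main St. <script>alert(1)</script>") . "\n");

# the apostrophe and the # sign are removed as well
print(clean_address("45 O'Brien Rd. #7") . "\n");
```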


Filtering email addresses

Email addresses can contain a lot of characters you don't usually see in other strings but they do follow a well-defined pattern. Using what is known about them in general, it's possible to make a filter that will reject anything that simply can't be part of an email address. Here's one you can use:

<?php
$email = preg_replace("|;.*|","",$email);
$email = preg_replace("|,.*|","",$email);
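# keep letters, digits, underscores, dashes, periods and the @ sign; remove everything else
$email = preg_replace("|[^0-9A-Za-z_\-\.\@]+|","",$email);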
?>

The second line of this filter removes a comma and all the characters that follow it. That prevents people from entering more than one email address. They can type several in if there's room, but the filter will throw the extra ones away.
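
A character filter cannot tell whether what is left has the overall shape of an email address. One option, which is a different technique from the filters above, is PHP's built-in filter_var() function with the FILTER_VALIDATE_EMAIL flag (available since PHP 5.2). The sketch below first drops a comma or semicolon and everything after it, then checks the shape of what remains. The check_email() helper name is hypothetical.

```php
<?php
# hypothetical helper: returns the cleaned address, or false if it is not well-formed
function check_email($email)
{
    # keep only the first address: drop a semicolon or comma and everything after it
    $email = preg_replace("|[;,].*|s", "", $email);
    $email = trim($email);

    # filter_var() returns the string when it looks like an email address, false otherwise
    return filter_var($email, FILTER_VALIDATE_EMAIL);
}

var_dump(check_email("jane@example.com, bob@example.com"));
var_dump(check_email("not an address"));
```

A value that passes this check is only well-formed. It says nothing about whether the mailbox actually exists.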

There are millions and millions of possible combinations of letters and symbols that would pass this filter but are not legitimate addresses. Sometimes the domain doesn't exist. Other times the username is bogus. The account may have existed at one time but is now closed or inactive. Still others may be addresses you want to block for some reason. In Appendix C, I'll show you a filter I use to get rid of email addresses I don't want to process and in Chapter 12, I'll explain how it was developed.

Copyright © 2004 Steve Humphrey