javod.com
Jul 06, 2009
Sanitize PHP input for MySQL queries

Sanitizing data in PHP is nothing short of an art form. I love the language, don't get me wrong, and while some people rail (see Ruby ::wink wink::) against its inconsistent naming conventions and automatic casting of variables as evidence of poor design, I think there's no better language for beginners to start producing viable code quickly.

I wanted to present a simple, easy-to-understand sanitizing function for PHP strings that will be inserted into a MySQL database. This particular function serves a very specific purpose: it escapes the user input, strips out HTML tags except for those we specify, and then attempts to find any malicious code within our allowed HTML tags.

If any code is found that we don't want, we can either exit out, or insert some generic message into our database. In the comments of the code block, I explain why I don't simply strip out the malicious code. Many of the sanitize functions you'll see online recommend using htmlentities() to remove the risk of malicious code running, and that's fine if you don't want the end user to have any usable tags. Otherwise, it doesn't do us much good, because when we decode the entities, the scripts will remain.
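To make that last point concrete, here is a minimal sketch of the encode/decode round trip. The sample string and variable names are my own assumptions, chosen only for illustration.

```php
<?php
// Hypothetical input, chosen only for illustration.
$input = "<script>alert(1)</script>";

// Encoding turns the angle brackets into entities, so a browser
// displays the text instead of running it.
$encoded = htmlentities($input);
echo $encoded, "\n";   // &lt;script&gt;alert(1)&lt;/script&gt;

// Decoding hands back exactly what the user typed, script tag and all.
$decoded = html_entity_decode($encoded);
echo $decoded, "\n";   // <script>alert(1)</script>
```

So encoding is only a safeguard for as long as the data stays encoded. The moment we decode it to get our allowed tags back, everything else comes back with them.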


	// sanitize helps clean a given variable before it is passed to a mysql query, with allowable tags
	
	function sanitize($var){
		// First let's escape the content. Because we are using mysql_real_escape_string() 
		// we need to make sure that we sanitize our variable AFTER we have connected to 
		// our database. Otherwise we will get an error. We are prepending backslashes
		// to a certain subset of characters here.
		$var = mysql_real_escape_string($var); 
		
		// Let's strip out any tags we don't want, and leave the ones 
		// that are okay for the user to place in. (This call is yours to make)
		$var = strip_tags($var, '<p><a>');
		
		// There's a weakness in strip_tags that we need to address. If we allow
		// any tags, there's a chance that a malicious user could use XSS to exploit 
		// the code. We search the string for malicious code and kill the input if 
		// found. Some people choose to strip the offending material and still insert, but
		// my thoughts on the subject are: if someone is trying to inject XSS, why
		// would we want them to post anything? 

		if(preg_match("/(<.*?(javascript|script|style|onmouseover|onmousedown).*?>)/i", $var)){
			$var = "Illegal input, data removed";

			// Instead of falling through to 'return $var', we can also exit here,
			// preventing our script from running any further.
			// We simply uncomment the following line:
			// exit();
		}

		return $var;
	}
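The weakness in strip_tags() mentioned in the comments is easy to see in isolation: tags on the allow list are kept along with every attribute they carry, and the text between removed tags stays behind. The sketch below needs no database connection. The input string is a made-up example of mine.

```php
<?php
// Hypothetical input: an allowed <a> tag carrying an event handler,
// followed by a <script> block that is not on the allow list.
$input = '<a href="#" onclick="alert(1)">click me</a><script>evil()</script>';

// Same allow list as in sanitize() above.
$output = strip_tags($input, '<p><a>');

echo $output, "\n";
// <a href="#" onclick="alert(1)">click me</a>evil()
```

The <script> tags are gone, but the onclick attribute on the allowed <a> tag survives untouched, which is why sanitize() needs a second check after strip_tags().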

Let's take a quick look at the regular expression in preg_match() to see what we're doing.

  /(<.*?(javascript|script|style|onmouseover|onmousedown).*?>)/i

First, we're looking for a < in the submitted variable.
Then, using .*? we skip over as few characters as possible until we hit javascript OR script OR style OR...
We skip over anything following those keywords, up to the next closing >.
/i means that we're ignoring case. So if someone types JAVAscript or javaScRiPt, we don't care.
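A quick way to get a feel for the pattern is to run it against a handful of strings on its own, without a database connection. This is only a sketch: the pattern is copied from sanitize(), but the sample inputs and their labels are assumptions of mine.

```php
<?php
// The pattern from sanitize(), unchanged.
$pattern = "/(<.*?(javascript|script|style|onmouseover|onmousedown).*?>)/i";

// Hypothetical inputs, chosen only for illustration.
$samples = array(
	'handler'   => '<a href="#" onmousedown="alert(1)">x</a>',
	'uppercase' => '<A HREF="#" ONMOUSEOVER="x()">hi</A>',
	'clean'     => '<a href="http://www.javod.com">No XSS here!</a>',
	'prose'     => '<p>I like JavaScript</p>',
	'onclick'   => '<a href="#" onclick="alert(1)">x</a>',
);

// preg_match() returns 1 when the pattern matches and 0 when it doesn't.
$results = array();
foreach ($samples as $name => $html) {
	$results[$name] = preg_match($pattern, $html);
	echo $name, ': ', $results[$name] ? 'rejected' : 'accepted', "\n";
}
// handler: rejected
// uppercase: rejected
// clean: accepted
// prose: rejected
// onclick: accepted
```

The last two results are worth noticing. 'prose' is rejected even though it's harmless, because . also matches >, so the search runs past the end of the <p> tag and finds the keyword in the text. 'onclick' is accepted because that handler simply isn't in our keyword list.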

Let's look at an example:

	// Remember, we need to connect to our database first because we use mysql_real_escape_string().
	mysql_connect($host,$user,$pass) or die("Could not make a connection");
	mysql_select_db($database_name) or die("Could not select database"); 	
	
	// In this example, we've saved our sanitize function in a separate file called sanitize.php
	include ("sanitize.php");
	
	// We declare our first variable with offending code.
	$var1 = "<a href=\"http://www.javod.com\" onmousedown=\"javascript:alert('XSS');\">XSS Attack Alert!</a>";
	
	// Our second variable 
	$var2="<a href=\"http://www.javod.com\">No XSS here!</a>";
	
	// Display our variables before sanitizing.
	echo "$var1 - (var1 before sanitizing)";
	echo "$var2 - (var2 before sanitizing)";
	
	// Display our variables after sanitizing.
	$var1 = sanitize($var1);	
	echo "$var1 - (var1 after sanitizing)";
	
	$var2 = sanitize($var2);
	echo "$var2 - (var2 after sanitizing)";

Viewed in a browser, this will display the following (the links are rendered, so only their text shows):

XSS Attack Alert! - (var1 before sanitizing)
No XSS here! - (var2 before sanitizing)

Illegal input, data removed - (var1 after sanitizing)
No XSS here! - (var2 after sanitizing)

Is this impenetrable... ABSOLUTELY NOT! This function is a good start, and gives an idea of what we need to do, but by no means is it unbreakable. Once again, feel free to leave any questions or comments you have.

 