Sanitation Class for PHP

Sanitation

Everyone loves a clean environment... but no one loves to clean.  I am no exception, but I understand the need.  After working with PHP for years and finding no really good libraries for data sanitation, I have written my own.  (I tried to use OWASP's ESAPI, but gave up.)

Definition:

According to Google, sanitation is "Conditions relating to public health, esp. the provision of clean drinking water and adequate sewage disposal" or "the state of being clean and conducive to health".  Well, I'm talking about computers and data here, so I would like to modify the definition for "data" to mean this: "The removal of unwanted segments of data for the purpose of application health and wellbeing."

Purpose:

This class was created to remove unwanted "parts" from data, leaving only the wanted parts.  This DOES NOT PERFORM ANY VALIDATION WHATSOEVER!  It simply removes unwanted "CHARACTERS" from data, leaving you with data "hopefully" suitable for your needs.  It does not prevent all forms of attack, such as injection or overflows.  It does, however, allow only the characters you specify to pass.

Download:

Current Version 10: TaggedZi's Sanitation Library v10 (md5: 39589e0dab35cbe0815fa8b589388fde)

Install:

This library/class was created to function as a pure PHP library/class.  It CAN be very easily plugged into other frameworks (like CodeIgniter).  To use this library, extract it, place it in your path, and then call it in your PHP like so:

include_once("Sanitize.php");
$sanitize = new Sanitize;

Usage in CodeIgniter:

If you want to use this in CodeIgniter, extract the library to your "application/libraries" folder and then load it like this:

$this->load->library('sanitize');

All of the examples given here use standard PHP. To use them in CodeIgniter, simply replace "$sanitize" with "$this->sanitize".

 

Documentation and Examples:

I will start with the most important function in this class:

/**
* white_list_cleaner
*
* This is a "white list" cleaner. Given an input string it removes ALL
* characters that are not on the "white list". Thereby cleaning the text
* from unwanted/unexpected characters.
* @param string $input
* @param string $allowable
* @param string $replacement
* @access public
* @return mixed (Boolean FALSE on failure, Clean String on success)
*/
public function white_list_cleaner($input = '', $allowable = self::ALPHA_DASH, $replacement = '')

This function/method walks character by character through the $input, and ONLY allows characters that are in the $allowable string. If a character is found that is not on the allowable list, it replaces that character with whatever is in the $replacement field. So as an example, let's say I wanted to sanitize a text field from an HTML form that is supposed to collect a HEX color value.  I want the values to be in upper case ONLY. This is what I would do:

try
{
    // Only upper-case hex digits are allowed through; everything else is removed.
    $clean_input = $sanitize->white_list_cleaner($_POST['hex_color'], '0123456789ABCDEF');
    var_dump($clean_input);
}
catch (Exception $e)
{
    echo 'There was a programming error: ' . $e->getMessage();
}

This would return a "string" that only has the characters specified.  This does not perform any range checking or validation.  It does, however, clean all unwanted characters out.
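To make the character-by-character idea concrete, here is a minimal sketch of a white list filter. It is NOT the library's actual source: the function name sketch_white_list is made up for illustration, it works byte by byte (so multibyte characters are not handled), and it only mirrors the behaviour described above.

```php
<?php
// Hypothetical stand-in for illustration only; not the library's real code.
function sketch_white_list(string $input, string $allowable, string $replacement = ''): string
{
    $clean  = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        // Keep the character only if it appears in the allowed set,
        // otherwise substitute the replacement (empty by default).
        $clean .= (strpos($allowable, $char) !== false) ? $char : $replacement;
    }
    return $clean;
}

echo sketch_white_list('#ff00AA;', '0123456789ABCDEF'), "\n";      // prints "00AA"
echo sketch_white_list('#ff00AA;', '0123456789ABCDEF', '?'), "\n"; // prints "???00AA?"
```

Note that the lower-case "ff" is removed rather than converted, because only upper-case letters are on the list. If lower-case input should be accepted, it would have to be upper-cased (for example with strtoupper) before filtering.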

Because this would be a pain to do for every type, I have created helper functions to make this easier and return a valid type:

clean_integer

/**
* clean_integer
*
* This takes an input string, runs it through the white list
* filter for "integers" and removes all non-integer characters.
* If it passes the cleaner, it then forces a typecast to integer.
*
* @param string $input
* @access public
* @return integer
*/
public function clean_integer($input = "0")

This function takes a "string" input and returns an integer if possible. It uses the "white_list_cleaner" from above, allowing only the following characters: "0123456789-".  Once all invalid characters are removed, it attempts to convert what is left over to an integer and returns the result.

So as an example, let's say I have a form field that is supposed to collect an integer and I want to clean it. Here is what I would do:

try
{
    $integer = $sanitize->clean_integer($_POST['int_form']);
    var_dump($integer);
}
catch (Exception $e)
{
   echo $e->getMessage();
}

Given that example, if I input string("1254a") it would return int(1254).  If I input string('how to skin a cat') it would return int(0).  If I input string('1357-9a') it would return int(1357).  This may be unexpected to some people, but the reason is this: "-" is allowed in "type integer" variables, so the sanitizing function allows it.  (Remember, this does not check whether the input is valid; it only forces what it is given to contain valid characters AND then tries to typecast (force) it to be of that type.)  So "white_list_cleaner" returns "1357-9", THEN PHP tries to convert the string into an integer.  PHP does NOT evaluate the math; it reads up to the first character that cannot be part of the number and stops, leaving "1357".  Please understand, this function is meant to return an integer, not to find out whether the value is presented how you want it.  That is the job of validation, and you can use the validation class on this site to perform your validation.
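The filter-then-cast behaviour can be reproduced with plain PHP. The sketch below is an assumption about how such a helper might look, not the library's code: sketch_clean_integer is a made-up name, and it uses preg_replace in place of the character walk to keep the example short.

```php
<?php
// Hypothetical helper for illustration; the real clean_integer may differ.
function sketch_clean_integer(string $input): int
{
    // Drop every character that is not a digit or a minus sign.
    $filtered = preg_replace('/[^0-9-]/', '', $input);
    // The cast reads a leading integer and ignores whatever follows it.
    return (int) $filtered;
}

var_dump(sketch_clean_integer('1254a'));             // int(1254)
var_dump(sketch_clean_integer('how to skin a cat')); // int(0)
var_dump(sketch_clean_integer('1357-9a'));           // int(1357)
```

The third call shows the case discussed above: the filtered string is "1357-9", and the cast stops at the second "-" instead of subtracting.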

clean_hex

/**
* clean_hex
*
* @param string $input
* @param string $case (Allowed: MIXED, UPPER, LOWER)
* @access public
* @return string This returns a STRING that only contains VALID hex characters.
*/
public function clean_hex($input, $case = 'MIXED')

Back to our original example, we could use this function instead of the white_list_cleaner, to make sure we had a hex input. This function strips all non "hex" characters, and converts what is left over into a "hex" string. 
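As a rough sketch of what a hex cleaner with a case option could look like: the three case names come from the docblock above, but the body is an assumption (in particular, that UPPER and LOWER convert the surviving characters rather than filter by case), and sketch_clean_hex is a made-up name.

```php
<?php
// Hypothetical sketch; how the library really treats $case is an assumption.
function sketch_clean_hex(string $input, string $case = 'MIXED'): string
{
    // Keep only characters that are valid hex digits in either case.
    $hex = preg_replace('/[^0-9a-fA-F]/', '', $input);

    if ($case === 'UPPER') {
        return strtoupper($hex);
    }
    if ($case === 'LOWER') {
        return strtolower($hex);
    }
    return $hex; // MIXED: leave the case as typed
}

echo sketch_clean_hex('#ff00Aa;'), "\n";          // prints "ff00Aa"
echo sketch_clean_hex('#ff00Aa;', 'UPPER'), "\n"; // prints "FF00AA"
echo sketch_clean_hex('#ff00Aa;', 'LOWER'), "\n"; // prints "ff00aa"
```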

clean_octal

/**
* clean_octal
*
* This function takes an input string and REMOVES all invalid octal chars.
* @param mixed $input
* @access public
* @return string This returns a STRING representation of an octal value.
*/
public function clean_octal($input)

This takes an input string, removes all non-octal characters, and returns a string representation of that number (for example, "0352").

clean_float

/**
* clean_float
*
* This method takes a string and removes any characters that are not
* valid float characters from the string.
* @param string $input
* @access public
* @return string.
*/
public function clean_float($input = "0.0")

This takes in a string, strips all non-float characters, and then performs a typecast to "force" it to be a float.
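A minimal sketch of the same strip-then-cast idea for floats follows. The allowed character set is not stated here, so digits, "." and "-" are an assumption, and sketch_clean_float is a made-up name, not the library's method.

```php
<?php
// Hypothetical sketch; the library's actual allowed characters may differ.
function sketch_clean_float(string $input): float
{
    // Assumed white list: digits, decimal point, and minus sign.
    $filtered = preg_replace('/[^0-9.-]/', '', $input);
    // The cast reads a leading number and ignores the rest.
    return (float) $filtered;
}

var_dump(sketch_clean_float('$12.50 USD')); // float(12.5)
var_dump(sketch_clean_float('1.2.3'));      // float(1.2)
var_dump(sketch_clean_float('abc'));        // float(0)
```

As with integers, the cast does no validation: "1.2.3" survives the filter unchanged and the cast simply stops at the second decimal point.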

clean_string

/**
* clean_string
*
* @param string $input
* @access public
* @return void
*/
public function clean_string($input = '')

This is a shortcut to "white_list_cleaner" that allows only the characters 0-9, a-z, A-Z, -, _, and " " (space).

black_list_cleaner

/**
* black_list_cleaner
*
* This function takes a string input, and goes character by character through
* the string. IF a character is on the "black_list" then it is removed and
* replaced with a specified character (or string).
*
* Note: It is not recommended to use this function for USER input data.
* Whenever possible it is recommended to use the white_list_cleaner function
* above. "black_list_cleaner" was added because there are cases where this behavior
* is desirable. But under most circumstances it is not the most secure method.
* Make sure you understand the difference, and security implications of these
* methods BEFORE you use them.
*
* @param string $input
* @param string $banned_chars
* @param string $replacement
* @access public
* @return string
*/
public function black_list_cleaner($input = '', $banned_chars = '', $replacement = '')

This is the opposite of the "white_list_cleaner". Rather than requiring that a character be on the "allowed" list, it checks to see if a character is on the "banned" list.  So if you want to allow "everything except a few characters", this is the function you want to use.
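For comparison, here is a minimal sketch of the inverted check. As before, this is not the library's source: sketch_black_list is a made-up name, and it works byte by byte.

```php
<?php
// Hypothetical stand-in for illustration only; not the library's real code.
function sketch_black_list(string $input, string $banned_chars, string $replacement = ''): string
{
    $clean = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $char = $input[$i];
        // Keep everything EXCEPT characters found in the banned set.
        $clean .= (strpos($banned_chars, $char) === false) ? $char : $replacement;
    }
    return $clean;
}

echo sketch_black_list('a<b>c', '<>'), "\n";      // prints "abc"
echo sketch_black_list('a<b>c', '<>', '_'), "\n"; // prints "a_b_c"
```

The security difference is visible in the condition: anything you forgot to ban passes straight through, whereas with a white list anything you forgot to allow is removed.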

Batch Processing

I have created a function to process entire batches of strings at once using the white list cleaner:

/**
* white_list_array
*
* This method is a "helper" to allow for input of arrays of strings to be "batch
* processed" at a single time. This does PRESERVE the original array keys.
* However BECAUSE of that, keys are not cleaned in any way. Be aware if you
* are using keys that can be produced by a user... they do not get filtered HERE.
* You must manually clean your keys.
* @param array $input
* @param mixed $allowed_chars
* @param string $replacement
* @access public
* @return array
*/
public function white_list_array($input = array(), $allowed_chars = self::ALPHA_DASH, $replacement = '')

To use this you would input an array of strings instead of a single string.  The allowed chars are re-used for each string in the array.
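The key-preserving behaviour can be sketched as follows. Both function names are made up for illustration, the values are assumed to be strings (or castable to strings), and this is not the library's implementation.

```php
<?php
// Hypothetical per-string filter used by the batch sketch below.
function sketch_keep_allowed(string $input, string $allowed_chars, string $replacement = ''): string
{
    $clean = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $clean .= (strpos($allowed_chars, $input[$i]) !== false) ? $input[$i] : $replacement;
    }
    return $clean;
}

// Hypothetical batch helper: the same allowed list is applied to every value.
function sketch_white_list_array(array $input, string $allowed_chars, string $replacement = ''): array
{
    $clean = [];
    foreach ($input as $key => $value) {
        // Keys are copied through untouched; only the values are filtered.
        $clean[$key] = sketch_keep_allowed((string) $value, $allowed_chars, $replacement);
    }
    return $clean;
}

$result = sketch_white_list_array(
    ['zip' => '12345-6789', 'phone<x>' => '(555) 010-9999'],
    '0123456789'
);
var_dump($result['zip']);      // string(9) "123456789"
var_dump($result['phone<x>']); // string(10) "5550109999"
```

The second key still contains "<x>" after the call, which is exactly the caveat from the docblock: user-controlled keys have to be cleaned separately.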
