Friday, June 11, 2010

The need for differentiation of filter routines in multi-module taint analysis

Last time we discussed three limitations of the taint analysis vulnerability model. The first two can be overcome by introducing classes of tainted data and by tracking data dependencies through persistent storage (e.g. a database).
Let us assume that we have implemented such a multi-module, class-aware taint analysis, and consider the following example of an inter-module data dependency:

addpost.php
1: $text = $_GET['text'];
2: $text = addslashes($text);
3: $author = $_GET['author'];
4: $author = addslashes($author);
5: mysql_connect();
6: mysql_select_db("myDB");
7: $query = "INSERT INTO posts (author, text) VALUES ('".$author."', '".$text."')";
8: mysql_query($query);
// do redirect


getstats.php
1: mysql_connect();
2: mysql_select_db("myDB");
3: $result = mysql_query("SELECT author FROM posts ORDER by author");
4: while ($row = mysql_fetch_assoc($result)) {
5: $author = $row['author'];
6: $count = mysql_query("SELECT count(text) FROM posts WHERE author = '".$author."'");
7: $row = mysql_fetch_array($count);
8: echo "Author ".$author." has ".$row[0]." posts; ";
}

//suboptimal schema, suboptimal queries, I know...

Let us apply taint analysis to this code. Input data can take several paths to reach critical operations. Let us follow one of them:

addpost.php:3
$author = $_GET['author'];
taint analysis: data is received from an untrusted source, so we associate all taint classes with it; $author is marked with (sqli, xss, osi, rfi).

addpost.php:4
$author = addslashes($author);
taint analysis: data is passed through a filter function, so we remove the corresponding taint flag (sqli) from it; $author is marked with (xss, osi, rfi).

addpost.php:7
$query = "INSERT INTO posts (author, text) VALUES ('".$author."', '".$text."')";
taint analysis: data is used to initialize another variable, so we copy the taint flags as well; $query is marked with (xss, osi, rfi).

addpost.php:8
mysql_query($query);
taint analysis: the critical operation receives data with flags (xss, osi, rfi). These flags do not include sqli, so everything is fine: no vulnerability. In addition, the database field posts.author is marked as dependent on variable $author.

getstats.php:3
$result = mysql_query("SELECT author FROM posts ORDER by author");
taint analysis: the critical operation is called with a constant value, so everything is fine. In addition, variable $result is marked as dependent on the database field posts.author and, consequently, on variable $author from module addpost.php.

getstats.php:4
while ($row = mysql_fetch_assoc($result)) {
taint analysis: variable $row is marked as dependent on variable $result and, consequently, on variable $author from module addpost.php.

getstats.php:5
$author = $row['author'];
taint analysis: variable $author is initialized from a database field that depends on variable $author from module addpost.php. Thus, the local variable $author is associated with taint flags (xss, osi, rfi).

getstats.php:6
$count = mysql_query("SELECT count(text) FROM posts WHERE author = '".$author."'");
taint analysis: the critical operation receives data with flags (xss, osi, rfi). These flags do not include sqli, so everything is fine: no vulnerability.

getstats.php:7
$row = mysql_fetch_array($count);
taint analysis: not interesting

getstats.php:8
echo "Author ".$author." has ".$row[0]." posts; ";
taint analysis: the critical operation is called with flags (xss, osi, rfi). These flags include xss, so a vulnerability (stored XSS) is detected.
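
The trace above can be replayed with a few lines of Python. This is only a minimal sketch of the naive rule "a filter removes its flag for good"; the tables `FILTERS` and `SINKS` and the helper names are assumptions made for illustration, not part of any real analyzer.

```python
# Minimal sketch of class-aware taint flags (all names are illustrative).
ALL_FLAGS = frozenset({"sqli", "xss", "osi", "rfi"})

# Hypothetical filter table: which flag each filter function clears.
FILTERS = {"addslashes": "sqli", "htmlspecialchars": "xss"}

# Hypothetical sink table: which flag makes each critical operation unsafe.
SINKS = {"mysql_query": "sqli", "echo": "xss"}


def source():
    """Untrusted input carries every taint class."""
    return ALL_FLAGS


def apply_filter(name, flags):
    """Naive rule: a filter permanently removes its flag."""
    return flags - {FILTERS[name]}


def check_sink(name, flags):
    """Return True when the sink receives data tainted for its class."""
    return SINKS[name] in flags


# addpost.php:3-4
author = apply_filter("addslashes", source())
# addpost.php:7-8: $query inherits the flags of $author
insert_vulnerable = check_sink("mysql_query", author)
# The database field posts.author inherits the flags unchanged.
db_posts_author = author
# getstats.php:5-6
select_vulnerable = check_sink("mysql_query", db_posts_author)
# getstats.php:8
echo_vulnerable = check_sink("echo", db_posts_author)

print(sorted(author))  # ['osi', 'rfi', 'xss']
print(insert_vulnerable, select_vulnerable, echo_vulnerable)  # False False True
```

The second `False` is the interesting one: under this rule the SELECT in getstats.php is reported as safe.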

Alas, we missed a multi-module (second-order) SQL injection vulnerability. This happened because taint analysis, by default, ignores the semantics of filter functions. Indeed, if we look at the methods of input validation, we arrive at five types:
1. Typecasting.
2. Encoding.
3. Removal of bad characters/keywords.
4. Escaping.
5. Branching. Cannot be addressed by taint analysis (see limitation #3).
The first three types of validation remove the injection pattern once and for all. This means that the common workflow get input -> validate input -> store in database -> get from database -> use in critical operation will not contain a vulnerability.
But escaping for database queries is different. Escaping neutralizes the injection pattern only for the first query. The database stores the data as-is (unescaped), so the data will contain the injection pattern again when it is retrieved. Hence, the workflow get input -> escape input -> store in database -> get from database -> use in SQL query will contain a multi-module vulnerability.
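
The round trip is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL, and a hypothetical `escape()` helper that doubles single quotes (SQLite's escaping convention) in place of `addslashes()`. The payload is an illustrative assumption.

```python
import sqlite3


def escape(value):
    # Stand-in for addslashes(): SQLite escapes a quote by doubling it.
    return value.replace("'", "''")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, text TEXT)")

payload = "x' OR '1'='1"
escaped = escape(payload)

# First query: the escaped value stays inside the string literal.
conn.execute(
    "INSERT INTO posts (author, text) VALUES ('" + escaped + "', 'hello')"
)
conn.execute("INSERT INTO posts (author, text) VALUES ('bob', 'hi')")

# The database stores the unescaped original, quotes included.
stored = conn.execute(
    "SELECT author FROM posts WHERE text = 'hello'"
).fetchone()[0]
print(stored == payload)  # True

# Second query: the stored value is concatenated without escaping.
injected_count = conn.execute(
    "SELECT count(text) FROM posts WHERE author = '" + stored + "'"
).fetchone()[0]
print(injected_count)  # 2: the OR clause matches every row

# For comparison, a parameterized query counts only the real author.
safe_count = conn.execute(
    "SELECT count(text) FROM posts WHERE author = ?", (stored,)
).fetchone()[0]
print(safe_count)  # 1
```

The escaping did its job for the INSERT, yet the value that comes back from the table is the raw payload again.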
Therefore, multi-module taint analysis should take into account the types of filter functions used in web applications and assign taint flags accordingly.
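
One possible way to model this is sketched below: escaping filters do not delete a flag but only suspend it, and a pass through persistent storage restores every suspended flag. The `Taint` class, the `FILTERS` classification and the `through_storage()` helper are hypothetical names chosen for the example. Treating `intval` as clearing all four classes is likewise an assumption.

```python
from dataclasses import dataclass

ALL_FLAGS = frozenset({"sqli", "xss", "osi", "rfi"})

# Hypothetical classification: filter -> (flags it handles, filter type).
FILTERS = {
    "intval": (ALL_FLAGS, "typecast"),
    "htmlspecialchars": (frozenset({"xss"}), "encoding"),
    "addslashes": (frozenset({"sqli"}), "escaping"),
}


@dataclass(frozen=True)
class Taint:
    active: frozenset                   # flags that are dangerous right now
    suspended: frozenset = frozenset()  # flags hidden only by escaping


def apply_filter(name, taint):
    flags, kind = FILTERS[name]
    if kind == "escaping":
        # Escaping protects only the next query, so remember the flag.
        return Taint(taint.active - flags,
                     taint.suspended | (taint.active & flags))
    # Typecasting, encoding and removal change the stored data itself.
    return Taint(taint.active - flags, taint.suspended - flags)


def through_storage(taint):
    """Storing and reloading undoes escaping: suspended flags come back."""
    return Taint(taint.active | taint.suspended)


author = apply_filter("addslashes", Taint(ALL_FLAGS))
print("sqli" in author.active)   # False: the INSERT in addpost.php is safe

loaded = through_storage(author)
print("sqli" in loaded.active)   # True: the SELECT in getstats.php is flagged

number = through_storage(apply_filter("intval", Taint(ALL_FLAGS)))
print(sorted(number.active))     # []: typecasting survives the round trip
```

With this rule the example from addpost.php and getstats.php yields both the stored XSS and the second-order SQL injection.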