Tuesday, April 20, 2010

Limitations of taint propagation vulnerability model

In the previous post we discussed the two most popular vulnerability models used by static and runtime analyzers to detect security flaws in web applications. This time I will discuss some limitations of these models. I'd like to emphasize that these limitations come from the assumptions and definitions at the basis of the models themselves. Thus we will not consider limitations imposed by the approach (static or dynamic analysis) chosen to implement a model.

Let us start from the taint analysis vulnerability model. Here is the vulnerability definition used in the model.
1. All data originating from web application users is untrusted. To track this data most analyzers associate a special "taint" mark with it.
2. All local data (file system, database, etc.) is trusted (we do not want to address local threats).
3. Untrusted data can be made trusted through special kinds of processing, which we will call sanitization. Thus, if the analyzer detects that certain data marked with the "taint" flag is passed through a sanitization routine, the flag will be removed.
4. Untrusted data cannot reach security critical operations like database queries, HTTP response generation, evals, etc. The violation of this rule is called a vulnerability.

A security analyst who is going to use this model will be required to:
1. Compile a list of language constructs that return user input. In some technologies (e.g. PHP) these constructs are built-in; in others (e.g. Python) they are framework-dependent (consider obtaining HTTP parameters in mod_python vs WSGI).
2. Compile a list of sanitization constructs (built-in, 3rd-party libraries, or even implemented by web application developer).
3. Compile a list of critical operations (these are mostly built-in).
This becomes a configuration for the taint analysis.
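As a sketch, such a configuration for PHP might look like the following. The format and the names are illustrative only, not taken from any real tool, and the lists are far from complete:

```php
<?php
// Illustrative taint-analysis configuration (format and names are hypothetical).
$config = [
    // 1. Constructs that return user input (taint sources).
    'sources'    => ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'],
    // 2. Sanitization routines that remove the "taint" flag.
    'sanitizers' => ['htmlspecialchars', 'addslashes', 'intval', 'escapeshellarg'],
    // 3. Security-critical operations (taint sinks).
    'sinks'      => ['mysql_query', 'echo', 'eval', 'system'],
];
```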

Let us point out some limitations of the model.
1. The first is a minor limitation, which can be easily overcome. The basic definition does not support classes of untrusted data. This means that the following code snippet will not produce a vulnerability warning:
$skip = $_GET['skip'];
$skip = htmlspecialchars($skip);
$query = "SELECT text FROM news LIMIT 10 OFFSET ".$skip;
$result = mysql_query($query);

Indeed, htmlspecialchars should be listed as a sanitization routine, so the "taint" flag associated with the data on the first line will be removed. But the program does have an SQL injection vulnerability. This undetected vulnerability adds to false negatives and affects the completeness of the analysis.
The limitation can be overcome by introducing "taint" classes: "SQLI-untrusted", "XSS-untrusted", "Shell-injection-untrusted", etc.
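The idea can be illustrated with a toy simulation (not a real analyzer): a value carries a set of taint classes, and each sanitizer clears only the class it is responsible for, so htmlspecialchars leaves the SQLI class intact:

```php
<?php
// Toy illustration of taint classes (not a real analyzer).
// A value carries a set of taint classes; a sanitizer clears only its own class.
$taint = ['SQLI', 'XSS', 'SHELL'];                   // taint of $_GET['skip']
$taint = array_values(array_diff($taint, ['XSS']));  // htmlspecialchars() clears XSS only
// String concatenation into an SQL query is a SQLI sink:
$vulnerable = in_array('SQLI', $taint);              // still SQLI-tainted, warning raised
```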

2. The second limitation lies in the assumption that all local data is trusted. This results in an inability to detect multi-module vulnerabilities (aka second-order injections). Let us consider the following example:

$text = $_GET['text'];
$text = addslashes($text);
$query = "INSERT INTO posts (text) VALUES ('".$text."')";
mysql_query($query);
// do redirect

$skip = $_GET['skip'] + 0;
$query = "SELECT text FROM posts LIMIT 10 OFFSET ".$skip;
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
   echo $row['text'];
}
This code demonstrates the simplest stored XSS vulnerability. However, due to the second assumption of the model, the analyzer will treat text data returned from the database as trusted. This limitation also leads to undetected vulnerabilities and affects the completeness of the analysis.
The limitation can be overcome by introduction of inter-module data dependency analysis. This approach was described in some papers:
[2007] Multi-Module Vulnerability Analysis of Web-based Applications
[2008] Detecting Security Vulnerabilities in Web Applications Using Dynamic Analysis with Penetration Testing

It is likely that most commercial static analyzers have adopted this approach by now.
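At the application level, the stored XSS above can be eliminated independently of the analyzer by encoding data at the output boundary, i.e. treating everything read back from the database as untrusted. A minimal sketch using the same style as the snippets above:

```php
<?php
// Encode at the output boundary: data read back from the database is
// treated as untrusted and escaped for the HTML context before echoing.
function render_post($text) {
    return htmlspecialchars($text, ENT_QUOTES);
}
$stored = "<script>alert(1)</script>";  // what the attacker stored via the first module
echo render_post($stored);              // inert text, not executable markup
```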

3. The third limitation lies in the third rule, which states that sanitization should be performed through special routines that always return "good" data. But what about input validation performed through conditional checks? This limitation cannot be overcome without integration with other vulnerability models. Let us consider the following example:
$email = $_GET['email'];
$valid_email_pattern = "..."; //complex reg exp from RFC 822
if (preg_match($valid_email_pattern, $email)) {
   // do processing
} else {
   echo "You have entered an invalid email address: ".$email; //XSS
}

It is unclear how to "untaint" variables that are sanitized via such checks. In the code above, the variable $email should be "untainted" if the call to preg_match returns true. Could we use this as a rule to untaint all variables passed through preg_match? Obviously not! Let's take a look at another example:
$email = $_GET['email'];
$invalid_email_pattern = "..."; //negated complex reg exp from RFC 822
if (preg_match($invalid_email_pattern, $email)) {
   echo "You have entered an invalid email address: ".$email; //XSS
} else {
   // do processing
}

So, in general we cannot determine in which branch we should "untaint" variables validated via conditional statements. Let us examine how analyzers could handle this issue. Basically, there are two options:
- preserve "taint" flag in both branches. This leads to false positives and affects precision of the analysis.
- remove "taint" flag from both branches. This leads to false negatives and affects completeness of the analysis.
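A practical workaround for application developers is to wrap conditional validation in a routine that returns the validated value (or nothing), so that it fits the model's notion of a sanitization routine and can be listed in the configuration. A sketch with a hypothetical helper and a deliberately simplified pattern:

```php
<?php
// Hypothetical wrapper: validation expressed as a sanitization routine
// (which an analyzer can whitelist) instead of an if/else branch.
function validated_email($input) {
    // Deliberately simplified pattern; a real one would follow RFC 822.
    if (preg_match('/^[^@\s]+@[^@\s]+\.[^@\s]+$/', $input)) {
        return $input;   // the analyzer removes the taint flag here
    }
    return null;         // invalid input never reaches the caller
}
```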

4. The last minor drawback is the implicit trust placed in sanitization routines and in the analyst compiling the configuration. If any sanitization routine contains an error (e.g. is incomplete, does not perform normalization, or is susceptible to bypass techniques), the inherent vulnerability will not be detected. Also, the analysis can become either incomplete or imprecise if the configuration lists of input, sanitization, and critical routines were compiled with errors or omissions.
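For example, a home-grown sanitizer like the following would be trusted by the analyzer once listed in the configuration, yet it is trivially bypassed (hypothetical code; the flaw is intentional):

```php
<?php
// Intentionally broken sanitizer: once listed in the analyzer configuration
// it is trusted and the taint flag is removed, but the filter is case-sensitive.
function bad_sanitize($input) {
    return str_replace('<script>', '', $input);
}
$payload  = '<SCRIPT>alert(1)</SCRIPT>';
$bypassed = (bad_sanitize($payload) === $payload);  // the filter changed nothing
```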

I hope you have found this blog post useful and I’m always interested in hearing any feedback you have.

P.S. In the next post we will discuss the limitations of the other vulnerability model, the parse-tree model.

Tuesday, April 13, 2010

Vulnerability models in web applications

I've seen much confusion in the understanding and use of terms such as "taint propagation", "static analysis for vulnerabilities", and "dynamic analysis". So I decided to clarify some important things here (particularly with respect to web applications).
First of all, we should distinguish a model of program behavior and a model of vulnerability.
Every model of program behavior introduces its own terms for behavior specification. Depending on the level of abstraction, those terms could specify low-level (e.g. how the program uses CPU registers) or high-level (e.g. dataflow dependencies) behavior.
The trivial models of program behavior are specifications in terms of source code and byte- or binary code. Here are some other popular models: the CFG (control flow graph), various dependence graphs, program slices, pre- and postconditions, and the set of possible values for each variable at every program point (hello, so-called string analysis).
What is most important is that these models have nothing to do with static or dynamic analysis. Static and dynamic analysis are the means to construct a model of program behavior. Of course, static analysis tends to yield complete but imprecise models, while dynamic analysis yields precise but incomplete models. There are concepts of both static and dynamic slices, and of pre- and postconditions in both static and runtime settings.

Now let us move to vulnerability models. Every vulnerability model is tightly bound to the model of program behavior. Indeed, the existence of vulnerability should be specified in terms of program behavior!
There is a widely accepted vulnerability model at the moment: the non-interference model and its extensions. In simple words, it tells us that untrusted input should not interfere with (e.g. change the intended behavior of) critical operations. Okay, we have this requirement, so how do we check it in our program?

The simplest way is to follow the very definition of non-interference: let us mark untrusted input as a taint source and critical operations as taint sinks. Now if a taint sink depends on a taint source, and the input data is not sanitized (i.e. made trusted), we shall raise a vulnerability warning. We can see that this re-formulation of non-interference is perfectly expressed in terms of dependencies. Hence, automated tools that utilize this definition should be able to make program slices and/or build dependence graphs (whether statically or at runtime does not matter).
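A toy runtime sketch of this idea: values carry a taint bit that propagates through string operations, and a sink checks it before executing (real engines instrument the interpreter or virtual machine instead):

```php
<?php
// Toy taint tracking: a string value plus a taint bit propagated on concatenation.
class TaintedString {
    public $value;
    public $tainted;
    public function __construct($value, $tainted) {
        $this->value   = $value;
        $this->tainted = $tainted;
    }
    public function concat(TaintedString $other) {
        // The result is tainted if either operand was tainted.
        return new TaintedString($this->value . $other->value,
                                 $this->tainted || $other->tainted);
    }
}
$prefix = new TaintedString("SELECT text FROM news LIMIT 10 OFFSET ", false);
$skip   = new TaintedString("0; DROP TABLE news", true);  // came from $_GET
$query  = $prefix->concat($skip);
// A sink such as mysql_query() would check $query->tainted and raise a warning.
```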

A more subtle (and, in my opinion, beautiful) way to express interference formally was proposed in the early 2000s. Let us consider what SQL injection (or any other injection) is. It occurs when a user can mix the control and data channels of a query, or in other terms, change its syntactic structure. So, the definition of non-interference can be re-formulated as follows: we should raise a vulnerability warning if the syntactic structure of a critical query depends on user input. We can see that this re-formulation of non-interference is perfectly expressed in terms of the parse trees of string values that program variables may hold at every execution point. Hence, automated tools that utilize this definition should be able to perform string analysis statically (at runtime there is no difficulty in obtaining the values of the needed variables).
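A crude sketch of the parse-tree idea: build the query once with a benign placeholder and once with the actual input, tokenize both, and flag a vulnerability if the token structure differs. Here a toy whitespace tokenizer stands in for a full SQL grammar:

```php
<?php
// Toy syntactic check: the query built from user input must have the same
// token structure as the query built from a benign placeholder.
function token_count($query) {
    return count(preg_split('/\s+/', trim($query)));
}
function query_for($skip) {
    return "SELECT text FROM news LIMIT 10 OFFSET " . $skip;
}
$expected = token_count(query_for("1"));                   // shape of the template
$actual   = token_count(query_for("1; DROP TABLE news"));  // shape under attack
$injection_detected = ($actual !== $expected);             // syntactic structure changed
```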

And finally, we can see now that taint propagation is not a method to detect vulnerabilities. Taint propagation is one of the means to determine data dependencies at runtime.

Some references:
Basic dependency model (aka taint propagation):
[2006] Pixy - Technical Report
[2006] Static Detection of Security Vulnerabilities in Scripting Languages
[2009] TAJ Effective Taint Analysis of Web Applications

[2004] Analysis of Perl Taint Mode
[2005] Automatically hardening web applications using precise tainting
[2005] Dynamic Taint Propagation for Java

Parse tree validation model:
[2005] Static Approximation of Dynamically Generated Web Pages
[2007] Sound and precise analysis of web applications for injection vulnerabilities

[2005] Combining Static Analysis and Runtime Monitoring to Counter SQL-Injection Attacks
[2005] Using Parse Tree Validation to Prevent SQL Injection Attacks
[2006] The Essence of Command Injection Attacks in Web Applications
[2006] Using Positive Tainting and Syntax-Aware Evaluation to Counter SQL Injection Attacks

In the next post we will discuss some limitations of these models. Stay tuned!

Sunday, April 4, 2010

Bushwhackers won bronze at RusCrypto CTF!

Congratulations! The team of students from our «Information Security» profile workshop at Moscow State University won bronze at RusCrypto CTF.
The final standings are:
1. CIT (Saint Petersburg State University of Information Technologies, Mechanics and Optics)
2. HackerDom (Ural State University)
3. Bushwhackers (Moscow State University)
4. SiBears (Tomsk State University)
5. [Censored] (Immanuel Kant State University of Russia)
6. Huge Ego Team (Moscow Engineering Physics Institute)