Wednesday, August 11, 2010

Enhancing black-box scanning with dynamic analysis

As its very name suggests, a black-box scanner doesn't know how the web application handles each input HTTP parameter. From an attacker's perspective, for each input parameter we would like to know:
- the list of sensitive operations that are called with arguments derived from the input value (is the parameter "name" passed to echo, to mysql_query, or to both?)
- the syntactic structure of sensitive queries
- the URLs that return an HTTP response containing the results of processing the input parameter (there we will check for evidence of successful exploitation).
In the general case it is impossible to derive this kind of information precisely with black-box analysis. That's why, in order to demonstrate good results, black-box scanners have to:
- sequentially inject attacks of every type into each input parameter (XSS, SQLI, command injection, XPath injection, RFI, and many more)
- within a given attack type (e.g. SQLI), guess the control characters (for SQL: quotes, number of brackets, etc.) that must be prepended to the injected attack vector so that the resulting query is left syntactically correct (assuming the web application handles exceptions properly)
- in order to detect multi-module vulnerabilities (which make second-order attacks possible), crawl every web application interface after each injection. Indeed, in the general case we do not know which web application interface contains the results of processing the poisoned parameter.

Let us call this way of testing "undirected fuzzing". If only we could obtain the data listed in the first list above, we could perform "directed fuzzing". We would inject into each input parameter only those attack vectors that are relevant to the processing path of the data. Besides, we would inject only attack vectors that leave the queries generated by the web application syntactically correct. As such, we could focus on filter bypassing techniques. Finally, after each injection we would know which URLs to check for evidence of successful exploitation.
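To make the difference concrete, here is a minimal Python sketch of how the two kinds of fuzzing plans could be built. Everything in it is an assumption made for illustration: the VECTORS catalogue, the shape of the knowledge dictionary (parameter -> sink, prefix of control characters, URLs to check) and the sample URLs are hypothetical and do not come from any real scanner.

```python
# Hypothetical attack-vector catalogue, keyed by the sensitive operation it targets.
VECTORS = {
    "mysql_query": ["OR 1=1", "UNION SELECT NULL"],
    "echo": ["<script>alert(1)</script>"],
    "system": ["; id"],
}


def undirected_plan(params):
    """Undirected fuzzing: every vector of every type against every parameter."""
    return [(p, v) for p in params for vectors in VECTORS.values() for v in vectors]


def directed_plan(knowledge):
    """Directed fuzzing: only vectors relevant to the sinks a parameter reaches.

    knowledge maps a parameter name to a list of
    (sink, prefix, check_urls) tuples (an assumed format).
    """
    plan = []
    for param, sinks in knowledge.items():
        for sink, prefix, check_urls in sinks:
            for vector in VECTORS.get(sink, []):
                # The prefix keeps the resulting query syntactically correct.
                plan.append((param, prefix + vector, check_urls))
    return plan


# Assumed knowledge about two parameters of a sample application.
knowledge = {
    "author": [("mysql_query", "' ", ["/getstats.php"]),
               ("echo", "", ["/getstats.php"])],
    "skip": [("mysql_query", "0 ", ["/viewpost.php"])],
}

plan = directed_plan(knowledge)
```

With this sample data the directed plan contains fewer injections than the undirected one, and each entry already carries the URLs to inspect afterwards.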

All of this useful information can be gathered by means of dynamic analysis. Let us suppose we have two components: a scanner and a server-side module (implemented either as part of an interpreter (Python, Perl, PHP, Ruby) or as web application instrumentation/hooks (.NET, Java)). Here's one possible workflow:
1. The scanner component performs crawling with form submission and authentication. At the same time, the server-side module builds traces of web application execution for each HTTP request.
2. Once crawling completes, the execution traces are sent to the scanner for further analysis. At this stage the scanner builds dependence graphs and analyses them. The goal is to infer:
- the list of sensitive operations, which are called with arguments derived from the input value
- the syntactic structure of sensitive queries
- the URLs that return an HTTP response showing the results of processing the input parameter.
This information is used to create a plan for directed fuzzing.
3. The scanner performs directed fuzzing while the server-side module collects execution traces. Once fuzzing completes, the collected traces are sent to the scanner for the final analysis. The scanner uses different checks to detect whether each attack was successful:
- response analysis (good for XSS detection);
- parse tree analysis (see explanation here);
- control flow analysis (detect exceptional conditions if any).
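Step 2 of the workflow above could look roughly like the following Python sketch. Note the simplification: instead of building real dependence graphs, it assumes the crawler submitted a unique marker value for each parameter and simply searches for that marker in the recorded arguments. The trace format (dictionaries with "url", "op" and "arg" keys, with "response" standing for the HTTP response body) and the marker values are hypothetical.

```python
def infer_plan(markers, trace):
    """Infer, per parameter, the sinks it reaches and the URLs that show it.

    markers: parameter name -> unique marker value submitted by the crawler.
    trace:   list of events {"url": ..., "op": ..., "arg": ...} (assumed format).
    """
    plan = {p: {"sinks": set(), "quoted": False, "urls": set()} for p in markers}
    for event in trace:
        for param, marker in markers.items():
            pos = event["arg"].find(marker)
            if pos < 0:
                continue  # this event does not depend on the parameter
            if event["op"] == "response":
                plan[param]["urls"].add(event["url"])
            else:
                plan[param]["sinks"].add(event["op"])
                # Crude syntactic context: is the value inside a quoted literal?
                if event["op"] == "mysql_query" and pos > 0 and event["arg"][pos - 1] == "'":
                    plan[param]["quoted"] = True
    return plan


markers = {"author": "zqAUTHORqz"}
trace = [
    {"url": "/addpost.php", "op": "mysql_query",
     "arg": "INSERT INTO posts (author, text) VALUES ('zqAUTHORqz', 'hi')"},
    {"url": "/getstats.php", "op": "mysql_query",
     "arg": "SELECT count(text) FROM posts WHERE author = 'zqAUTHORqz'"},
    {"url": "/getstats.php", "op": "response",
     "arg": "Author zqAUTHORqz has 1 posts; "},
]
result = infer_plan(markers, trace)
```

Here the scanner would learn that "author" reaches mysql_query inside a quoted literal and that its value is shown by /getstats.php, which is exactly the input a directed fuzzing plan needs.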

This is just an illustration of how one could leverage dynamic analysis to substantially enhance black-box scanning results (and time!).

I hope you have found this blog post useful and I'm always interested in hearing any feedback you have.

Friday, June 11, 2010

The need for differentiation of filter routines in multi-module taint analysis

Last time we talked about three limitations of the taint analysis vulnerability model. The first two of those limitations can be overcome by introducing classes of tainted data and by tracking data dependencies through persistent storage (e.g. a database).
Let us assume that we have implemented such a multi-module, class-aware taint analysis. Let us consider the following example of an inter-module data dependency:

addpost.php
1: $text = $_GET['text'];
2: $text = addslashes($text);
3: $author = $_GET['author'];
4: $author = addslashes($author);
5: mysql_connect();
6: mysql_select_db("myDB");
7: $query = "INSERT INTO posts (author, text) VALUES ('".$author."', '".$text."')";
8: mysql_query($query);
// do redirect


getstats.php
1: mysql_connect();
2: mysql_select_db("myDB");
3: $result = mysql_query("SELECT author FROM posts ORDER by author");
4: while ($row = mysql_fetch_assoc($result)) {
5: $author = $row['author'];
6: $count = mysql_query("SELECT count(text) FROM posts WHERE author = '".$author."'");
7: $row = mysql_fetch_array($count);
8: echo "Author ".$author." has ".$row[0]." posts; ";
}

//suboptimal schema, suboptimal queries, I know...

Let us apply taint analysis to this code. There are several paths that input data take to reach critical operations. Let us consider one of them:

addpost.php:3
$author = $_GET['author'];
taint analysis: data is received from the untrusted source, so we will associate all taint classes with it; $author is marked with (sqli, xss, osi, rfi).

addpost.php:4
$author = addslashes($author);
taint analysis: data is passed through a filter function, so we will remove the corresponding taint flag from it; $author is marked with (xss, osi, rfi).

addpost.php:7
$query = "INSERT INTO posts (author, text) VALUES ('".$author."', '".$text."')";
taint analysis: data is used to initialize another variable, so we will copy taint flags as well; $query is marked with (xss, osi, rfi).

addpost.php:8
mysql_query($query);
taint analysis: critical operation receives data with flags (xss, osi, rfi). These flags do not contain sqli flag, so everything is fine, no vulnerability. Also, database field posts.author is marked to have a dependency upon variable $author.

getstats.php:3
$result = mysql_query("SELECT author FROM posts ORDER by author");
taint analysis: critical operation is called with constant value, everything is fine. Also, variable $result is marked to have a dependence upon database field posts.author and, consequently, upon variable $author from module addpost.php.

getstats.php:4
while ($row = mysql_fetch_assoc($result)) {
taint analysis: Variable $row is marked to have a dependence upon variable $result and, consequently, upon variable $author from module addpost.php.

getstats.php:5
$author = $row['author'];
taint analysis: Variable $author is initialized from database field, which depends upon variable $author from module addpost.php. Thus, local variable $author is associated with taint flags (xss, osi, rfi).

getstats.php:6
$count = mysql_query("SELECT count(text) FROM posts WHERE author = '".$author."'");
taint analysis: critical operation receives data with flags (xss, osi, rfi). These flags do not contain sqli flag, so everything is fine, no vulnerability.

getstats.php:7
$row = mysql_fetch_array($count);
taint analysis: not interesting

getstats.php:8
echo "Author ".$author." has ".$row[0]." posts; ";
taint analysis: critical operation is called with flags (xss, osi, rfi). These flags do contain xss flag, so vulnerability (Stored XSS) is detected.

Alas, we missed a multi-module (second-order) SQL injection vulnerability. This happened because taint analysis by default does not take the semantics of filter functions into account. Indeed, if we analyze the methods of input validation, we arrive at five types:
1. Typecasting.
2. Encoding.
3. Removing of bad chars/keywords.
4. Escaping.
5. Branching. Cannot be addressed by taint analysis (see limitation #3).
The first three types of validation remove the injection pattern once and for all. This means that the common workflow get input -> validate input -> store in database -> get from database -> use in critical operation will not contain a vulnerability.
But escaping for database queries is different. Escaping neutralizes the injection pattern only for the first query. The data is stored in the database as-is, and consequently it will contain the injection pattern when retrieved from the database. Hence, the workflow get input -> escape input -> store in database -> get from database -> use in SQL query will contain a multi-module vulnerability.
As such, multi-module taint analysis should take into account the types of filter functions used in web applications and assign taint flags accordingly.
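One possible way to encode this distinction is sketched below in Python. It is only an illustration under stated assumptions: the FILTERS table, the two filter kinds ("permanent" and "escaping"), the Value class and the store() helper are hypothetical names, and the choice of which classes each PHP function removes is my own simplification.

```python
ALL = frozenset({"sqli", "xss", "osi", "rfi"})

# Assumed classification of filters:
#   "permanent": the injection pattern is removed for good
#                (typecasting, encoding, removal of bad characters)
#   "escaping":  the pattern is neutralized only for the next query
FILTERS = {
    "intval": ("permanent", ALL),
    "htmlspecialchars": ("permanent", frozenset({"xss"})),
    "addslashes": ("escaping", frozenset({"sqli"})),
}


class Value:
    def __init__(self, flags, escaped=frozenset()):
        self.flags = frozenset(flags)      # classes that are still dangerous
        self.escaped = frozenset(escaped)  # classes neutralized by escaping only


def apply_filter(name, value):
    kind, classes = FILTERS[name]
    if kind == "escaping":
        # Remember what was merely escaped, so it can come back later.
        return Value(value.flags - classes, value.escaped | (value.flags & classes))
    return Value(value.flags - classes, value.escaped - classes)


def store(value):
    """Round trip through the database: escaping is consumed by the query."""
    return Value(value.flags | value.escaped)


def is_vulnerable(value, taint_class):
    return taint_class in value.flags


# The addpost.php / getstats.php path for $author:
author = apply_filter("addslashes", Value(ALL))  # addpost.php:3-4
first_query = is_vulnerable(author, "sqli")      # addpost.php:8
from_db = store(author)                          # posts.author -> getstats.php:5
second_query = is_vulnerable(from_db, "sqli")    # getstats.php:6
stored_xss = is_vulnerable(from_db, "xss")       # getstats.php:8
```

With this bookkeeping the first query is still considered safe, while the value read back from the database carries the sqli flag again, so the second-order injection at getstats.php:6 is reported along with the stored XSS.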

Tuesday, April 20, 2010

Limitations of taint propagation vulnerability model

In the previous post we discussed the two most popular vulnerability models used by static and runtime analyzers to detect security flaws in web applications. This time I decided to discuss some limitations of the models. I'd like to emphasize that these limitations come from the assumptions and definitions that form the basis of the models. Thus we will not consider limitations imposed by the approach (static or dynamic analysis) chosen to implement a model.

Let us start from the taint analysis vulnerability model. Here is the vulnerability definition used in the model.
1. All data originating from web application users is untrusted. To track this data most analyzers associate a special "taint" mark with it.
2. All local data (file system, database, etc.) is trusted (we do not want to address local threats).
3. Untrusted data can be made trusted through special kinds of processing, which we will call sanitization. Thus, if analyzer detects that certain data marked with "taint" flag is passed through a sanitization routine, the flag will be removed.
4. Untrusted data must not reach security critical operations like database queries, HTTP response generation, evals, etc. The violation of this rule is called a vulnerability.

A security analyst who is going to use this model will be required to:
1. Compile a list of language constructs that return user input. In some technologies (e.g. PHP) these constructs are built-in, in other technologies (e.g. Python) they are framework-dependent (consider obtaining HTTP parameters in mod_python vs WSGI).
2. Compile a list of sanitization constructs (built-in, 3rd-party libraries, or even implemented by web application developer).
3. Compile a list of critical operations (these are mostly built-in).
This becomes a configuration for the taint analysis.

Let us point out some limitations of the model.
1. The first is a minor limitation, which can be easily overcome. The basic definition does not support classes of untrusted data. This means that the following code snippet will not produce a vulnerability warning:
$skip = $_GET['skip'];
$skip = htmlspecialchars($skip);
mysql_connect();
mysql_select_db("myDB");
$query = "SELECT text FROM news LIMIT 10 OFFSET ".$skip;
$result = mysql_query($query);

Indeed, htmlspecialchars would be listed as a sanitization routine, so the "taint" flag associated with $skip at line 1 will be removed. But the program does have an SQL injection vulnerability. This undetected vulnerability adds to the false negatives and affects the completeness of the analysis.
The limitation can be overcome by introducing "taint" classes: "SQLI-untrusted", "XSS-untrusted", "Shell-injection untrusted", etc.

2. The second limitation lies within the assumption that all local data is trusted. This results in inability to detect multi-module vulnerabilities (aka second order injections). Let us consider the following example:

addpost.php
$text = $_GET['text'];
$text = addslashes($text);
mysql_connect();
mysql_select_db("myDB");
$query = "INSERT INTO posts (text) VALUES ('".$text."')";
mysql_query($query);
// do redirect

viewpost.php
$skip = $_GET['skip'] + 0;
mysql_connect();
mysql_select_db("myDB");
$query = "SELECT text FROM posts LIMIT 10 OFFSET ".$skip;
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
   echo $row['text'];
}

This code demonstrates the simplest stored XSS vulnerability. However, due to the second assumption of the model, the analyzer will treat text data returned from the database as trusted. This limitation also leads to undetected vulnerabilities and affects the completeness of the analysis.
The limitation can be overcome by introduction of inter-module data dependency analysis. This approach was described in some papers:
[2007] Multi-Module Vulnerability Analysis of Web-based Applications
[2008] Detecting Security Vulnerabilities in Web Applications Using Dynamic Analysis with Penetration Testing

It is likely that most commercial static analyzers have adopted this approach by now.

3. The third limitation lies within the third rule, which states that sanitization should be performed through special routines that always return "good" data. But what about input validation through conditioning? This limitation cannot be overcome without integration with other vulnerability models. Let us consider the following example:
$email = $_GET['email'];
$valid_email_pattern = "..."; //complex reg exp from RFC 822
if (preg_match($valid_email_pattern, $email)) {
   // do processing
} else {
   echo "You have entered an invalid email address: ".$email; //XSS
   exit;
}


It is unclear how to "untaint" variables, which are sanitized via such checks. In the code above variable $email should be "untainted" if the call to preg_match returns true. Could we use this as a rule to untaint all variables passed through preg_match? Obviously no! Let's take a look at the other example:
$email = $_GET['email'];
$invalid_email_pattern = "..."; //negated complex reg exp from RFC 822
if (preg_match($invalid_email_pattern, $email)) {
   echo "You have entered an invalid email address: ".$email; //XSS
   exit;
} else {
   // do processing
}


So, in general we cannot determine in which branch we should "untaint" variables validated via conditional statements. Let us examine how analyzers could handle this issue. Basically, there are two options:
- preserve "taint" flag in both branches. This leads to false positives and affects precision of the analysis.
- remove "taint" flag from both branches. This leads to false negatives and affects completeness of the analysis.

4. The last minor drawback is the implicit trust placed in the sanitization routines and in the operator who compiles the configuration. If any sanitization routine contains an error (e.g. it is incomplete, does not perform normalization, or is susceptible to bypassing techniques), the resulting vulnerability will not be detected. Also, the analysis can become either incomplete or imprecise if the configuration lists of sanitization, input and critical routines were compiled with errors or omissions.
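To illustrate the configuration lists and the first limitation together, here is a small Python sketch of a straight-line taint check that can run either in the basic single-flag mode or with taint classes. The representation of a program as a list of (kind, name) steps, the analyze() function and the particular contents of SANITIZERS and SINKS are assumptions made for this example, not the configuration of any real analyzer.

```python
# Hypothetical configuration: taint classes, sanitizers and critical operations.
SOURCE_CLASSES = frozenset({"sqli", "xss", "shell"})
SANITIZERS = {
    "htmlspecialchars": {"xss"},
    "addslashes": {"sqli"},
    "escapeshellarg": {"shell"},
}
SINKS = {"mysql_query": "sqli", "echo": "xss", "system": "shell"}


def analyze(steps, class_aware=True):
    """Track one variable through a straight-line list of steps.

    steps: list of ("source" | "sanitize" | "sink", name) tuples.
    Returns a list of (sink, taint_class) warnings.
    """
    flags = frozenset()
    warnings = []
    for kind, name in steps:
        if kind == "source":
            flags = SOURCE_CLASSES
        elif kind == "sanitize":
            # Basic model: any sanitizer clears the single "taint" mark.
            removed = SANITIZERS[name] if class_aware else SOURCE_CLASSES
            flags = flags - removed
        elif kind == "sink":
            needed = SINKS[name]
            if needed in flags:
                warnings.append((name, needed))
    return warnings


# The $skip snippet from limitation 1: $_GET -> htmlspecialchars -> mysql_query.
snippet = [("source", "$_GET"), ("sanitize", "htmlspecialchars"), ("sink", "mysql_query")]
basic = analyze(snippet, class_aware=False)
with_classes = analyze(snippet, class_aware=True)
```

In the basic mode the snippet produces no warning, whereas the class-aware mode reports the SQL injection. The sketch also makes limitation 4 visible: every result depends entirely on what the operator put into SANITIZERS and SINKS.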

I hope you have found this blog post useful and I’m always interested in hearing any feedback you have.

p.s. In the next post we will discuss limitations of the other vulnerability model, namely the parse-tree model.

Tuesday, April 13, 2010

Vulnerability models in web applications

I've seen much confusion in understanding and using such terms as "taint propagation", "static analysis for vulnerabilities", and "dynamic analysis". So I decided to clarify some important things here (particularly with respect to web applications).
First of all, we should distinguish between a model of program behavior and a model of vulnerability.
Every model of program behavior introduces its own terms for behavior specification. Depending on the level of abstraction, those terms could specify low-level (e.g. how the program uses CPU registers) or high-level (e.g. dataflow dependencies) behavior.
The trivial models of program behavior are specifications in terms of source code and bytecode or binary code. Here are some other popular models: the CFG (control flow graph), various dependence graphs, program slices, pre- and postconditions, and the set of possible values for each variable at every program point (hello, so-called string analysis).
What is most important is that these models have nothing to do with static or dynamic analysis. Static and dynamic analysis are the means to construct the model of program behavior. Of course, static analysis tends to yield complete but imprecise models, while dynamic analysis yields precise but incomplete models. You can find concepts of both static and dynamic slices; likewise, pre- and postconditions exist both statically and at runtime.

Now let us move on to vulnerability models. Every vulnerability model is tightly bound to the model of program behavior. Indeed, the existence of a vulnerability should be specified in terms of program behavior!
There is a widely accepted model of vulnerabilities at the moment: the non-interference model and its extensions. In simple words, it tells us that untrusted input should not interfere with (i.e. change the intended behavior of) critical operations. OK, we have this requirement, so how do we check it in our program?

The simplest way is to follow the very definition of non-interference: let us mark untrusted input as a taint source and critical operations as taint sinks. Now if a taint sink depends on a taint source, and the input data is not sanitized (i.e. made trusted), we shall raise a vulnerability warning. We can see that this re-formulation of non-interference is perfectly expressed in terms of dependencies. Hence, automated tools that utilize this definition should be able to compute program slices and/or build dependence graphs (whether at runtime or statically does not matter).
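Expressed over a dependence graph, this check boils down to reachability. The Python sketch below assumes the graph is already available as a list of (from, to) data-dependence edges; the node names and the tainted_sinks() helper are hypothetical and only serve the illustration.

```python
from collections import deque


def tainted_sinks(edges, sources, sinks, sanitizers):
    """Return the sinks reachable from a source along paths that avoid sanitizers."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = set(sources)
    queue = deque(sources)
    while queue:  # breadth-first traversal of the data dependencies
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in seen or nxt in sanitizers:
                continue  # a sanitizer node cuts the dependency
            seen.add(nxt)
            queue.append(nxt)
    return sorted(n for n in sinks if n in seen)


# Assumed dependence edges for a tiny program with two inputs.
edges = [
    ("$_GET['email']", "$email"), ("$email", "echo"),
    ("$_GET['id']", "intval"), ("intval", "$id"), ("$id", "mysql_query"),
]
warnings = tainted_sinks(
    edges,
    sources=["$_GET['email']", "$_GET['id']"],
    sinks=["echo", "mysql_query"],
    sanitizers={"intval"},
)
```

Only echo is reported here, because every path to mysql_query passes through the sanitizer node.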

A more subtle (and in my opinion beautiful) way to express interference formally was proposed in the early 2000s. Let us look at what an SQL (or any other) injection is. It happens when a user can mix the control and data channels of a query or, in other terms, change its syntactic structure. So the definition of non-interference can be re-formulated as follows: we should raise a vulnerability warning if the syntactic structure of a critical query depends on the user input. We can see that this re-formulation of non-interference is perfectly expressed in terms of parse trees of the string values that program variables may hold at every execution point. Hence, automated static tools that utilize this definition should be able to perform string analysis (at runtime there is no difficulty in obtaining the values of the needed variables).
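A runtime flavor of this check can be sketched in Python as follows. Instead of a real SQL parser it uses a deliberately crude tokenizer and compares the sequence of token kinds of the query built from the actual input with the one built from a benign value, so it should be read as an approximation of parse tree comparison, not as the method of the papers listed below. The TOKEN pattern, shape() and structure_changed() are hypothetical names, and the benign value has to be chosen to fit the context (a word for quoted contexts, a number for numeric ones).

```python
import re

# Crude SQL tokenizer (an assumption of this sketch, not a real SQL grammar).
TOKEN = re.compile(r"""
    (?P<string>'(?:[^'\\]|\\.)*')     # quoted literal with backslash escapes
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<op>[^\sA-Za-z_0-9])           # any other single character, incl. a stray quote
""", re.VERBOSE)


def shape(query):
    """Token-kind sequence: keywords and operators keep their text, literals do not."""
    out = []
    for match in TOKEN.finditer(query):
        kind = match.lastgroup
        out.append(match.group().upper() if kind in ("word", "op") else kind)
    return out


def structure_changed(build_query, user_input, benign="x"):
    """True if the user input alters the syntactic shape of the query."""
    return shape(build_query(user_input)) != shape(build_query(benign))


def count_posts(author):
    # Query template taken from getstats.php.
    return "SELECT count(text) FROM posts WHERE author = '" + author + "'"


plain = structure_changed(count_posts, "alice")
attack = structure_changed(count_posts, "x' OR '1'='1")
escaped = structure_changed(count_posts, "x\\' OR \\'1\\'=\\'1")
```

The plain value and the backslash-escaped payload stay inside one string literal, so the shape is unchanged; the unescaped payload adds OR and a comparison to the query, and the check flags it.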

And finally, we can see now that taint propagation is not a method to detect vulnerabilities. Taint propagation is one of the means to determine data dependencies at runtime.

Some references:
Basic dependency model (aka taint propagation)
-Static
[2006] Pixy - Technical Report
[2006] Static Detection of Security Vulnerabilities in Scripting Languages
[2009] TAJ: Effective Taint Analysis of Web Applications

-Runtime
[2004] Analysis of Perl Taint Mode
[2005] Automatically hardening web applications using precise tainting
[2005] Dynamic Taint Propagation for Java

Parse tree validation model:
-Static
[2005] Static Approximation of Dynamically Generated Web Pages
[2007] Sound and precise analysis of web applications for injection vulnerabilities

-Runtime
[2005] Combining Static Analysis and Runtime Monitoring to Counter SQL-Injection Attacks
[2005] Using Parse Tree Validation to Prevent SQL Injection Attacks
[2006] The Essence of Command Injection Attacks in Web Applications
[2006] Using Positive Tainting and Syntax-Aware Evaluation to Counter SQL Injection Attacks

In the next post we will discuss some limitations of these models. Stay tuned!

Sunday, April 4, 2010

Bushwhackers won bronze at RusCrypto CTF!

Congratulations! The team of students from our «Information Security» profile workshop at Moscow State University won bronze at RusCrypto CTF.
The final standings are:
1. CIT (Saint Petersburg State University of Information Technologies, Mechanics and Optics)
2. HackerDom (Ural State University)
3. Bushwhackers (Moscow State University)
4. SiBears (Tomsk State University)
5. [Censored] (Immanuel Kant State University of Russia)
6. Huge Ego Team (Moscow Engineering Physics Institute)

Sunday, March 7, 2010

Thursday, January 14, 2010

Thoughts on building a taxonomy of vulnerabilities, pt.1

Classifications, taxonomies... Web security...
I decided to make a number of posts discussing these topics.
The important questions are:
- what is a classification; what is a taxonomy;
- what to classify: flaws or vulnerabilities; and what for;
- what are the existing taxonomies/classifications;
- why it is easier to build a taxonomy of attacks/threats than of flaws or vulnerabilities;
- finally, is it possible to build a useful taxonomy of web application vulnerabilities.

First of all, I'd like to introduce the notion of a taxonomy, just to make sure that readers who come across these posts are on the same page as me.
In everyday usage the terms taxonomy and classification are usually used interchangeably. Further, people often use the word “classification” to refer to some process of breaking a set of items into groups. Let us introduce some order here.
Marradi (“Classification, typology, taxonomy”) defines the process of classification as one of the following:

  • an intellectual operation, whereby the extension of a concept at a given level of generality is subdivided into several narrower extensions corresponding to as many concepts at a lower level of generality;

  • an operation whereby the objects or events in a given set are divided into two or more subsets according to the perceived similarities of one or several properties; or

  • an operation whereby objects or events are assigned to classes or types that have been previously defined.


As we see, there are two approaches to building a classification. I will call them a model-based (from the general to the particular, or some might say deductive) approach and an instance-based (from the particular to the general, or some might say inductive) approach respectively.
OK, so what is a taxonomy, anyway?
According to Simpson (“Principles of animal taxonomy”), a taxonomy is a “classification, including bases, principles, procedures and rules”. Thus, a taxonomy is more than a classification – it states the principles, according to which the classification is done, and procedures to be followed in order to classify new objects.
Taxonomies should have taxonomic categories with the following characteristics:

  1. Mutually exclusive: the categories do not overlap.

  2. Exhaustive: taken together, the categories include all the possibilities.

  3. Unambiguous: clear and precise so that classification is not uncertain, regardless of who is classifying.

  4. Repeatable: repeated applications result in the same classification, regardless of who is classifying.

  5. Accepted: logical and intuitive so that categories could become generally approved.

  6. Useful: could be used to gain insight into the field of inquiry.


As the authors of “A Vulnerability Taxonomy Methodology applied to the Web Services” point out, there is no such thing as the ultimate taxonomy. Rather, each taxonomy is designed for a specific intended usage. Hence, the value and usefulness of each taxonomy should be considered along with the viewpoint and the scope that its authors intended. Moreover, the authors of a taxonomy should explicitly state its intended usage, scope and viewpoint.

In the next post I am going to say a few words on taxonomies as applied to web security. The important questions are: what to classify, flaws or vulnerabilities? And what for?

Monday, January 11, 2010

NULL pointer dereference flaw -> vulnerability

NULL pointer dereference is a common implementation-time flaw. Sometimes this flaw becomes a vulnerability. Strangely enough, this happens because of bad design.


To illustrate this, let us consider a network application. Suppose it dereferences a NULL pointer under certain circumstances. In order for this flaw to become a vulnerability, a single user must be able to interrupt the application's services by exploiting the NULL pointer dereference. Obviously, allowing this is a bad design decision.


This is how an implementation-time flaw becomes a design-time vulnerability.

Wednesday, January 6, 2010

Hello, World!

I am going to post here some thoughts on web application security and oil prices (ha-ha). All comments are really welcome!