
XSS Defense: No Silver Bullets

Kevin W. Wall edited this page Jul 30, 2022 · 12 revisions

DRAFT - Work in Progress - this is an opinion piece of Kevin Wall

Note: The opinions expressed herein are my own and may not reflect the rest of OWASP or other AppSec professionals. -kevin wall

Introduction

Cross-Site Scripting (XSS) defense is hard. That is one reason why it is so pervasive. Unfortunately, for those who have not done a deep dive into XSS vulnerabilities, it appears alarmingly simple to solve. But these naive "solutions" inevitably turn out to be incomplete at best and completely wrong at worst. This wiki page explores some of these naive approaches and examines why there is "No Silver Bullet", as Frederick Brooks so aptly put it (all the way back in 1986), for XSS defense.

I shall not discuss how to defend against XSS on this wiki page. The section "How to Avoid Cross-site scripting Vulnerabilities" on the aforementioned OWASP XSS page already provides some excellent resources. Take time to read them!

Instead, I am going to explore two common [anti-patterns](https://en.wikipedia.org/wiki/Anti-pattern) that I've observed several times a year. I will start with the less common and less egregious one.

Content-Security-Policy (CSP) as an Anti-pattern

First, let me be clear: I am a strong proponent of CSP when it is used properly. What I am against is a blanket CSP policy for the entire enterprise. Generally, this fails for the following reasons:

  • There is an assumption that all of your customers' browsers support all the CSP constructs that your blanket CSP policy uses, and this is generally done without checking the User-Agent request header to see whether a given browser actually supports them. Because, let's face it, most businesses don't want to turn away customers just because they are using an outdated browser that doesn't support some CSP Level 2 or Level 3 constructs. (Almost all browsers support CSP Level 1, unless you are worried about Grandpa pulling out his old Windows 98 laptop and using some ancient version of Internet Explorer to access your site.)
  • Mandatory universal enterprise-wide CSP response headers are inevitably going to break some web applications, especially legacy ones. This causes the business to push back against AppSec guidelines and inevitably results in AppSec issuing waivers and/or security exceptions until the application code can be patched up. But these security exceptions open cracks in your XSS armor, and even if the cracks are temporary, they can still impact your business.

What works, and works well, is when CSP response headers are customized for each web application and reviewed by your AppSec team for effectiveness. What works even better, though, is when CSP is used as a defense-in-depth mechanism alongside appropriate contextual output encoding.
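As a purely illustrative example (the exact directives would depend on the application), a tailored policy for an application that serves all of its own scripts and styles, uses no plugins, and should never be framed might be as tight as:

```http
Content-Security-Policy: default-src 'self'; object-src 'none'; frame-ancestors 'none'
```

A policy like this can only be written and verified per application; a blanket enterprise policy must be loose enough not to break anything, which usually means too loose to stop much.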

Interceptors as an Anti-pattern

The other common anti-pattern that I have observed is the attempt to handle validation and/or output encoding in some sort of interceptor, such as a Spring Interceptor that implements org.springframework.web.servlet.HandlerInterceptor, or a JavaEE servlet filter that implements javax.servlet.Filter. While this can be successful for very specific applications (for instance, if you validate that all request inputs that are ever rendered contain only alphanumeric data), it violates the major tenet of XSS defense: perform output encoding as close as possible to where the data is rendered. Generally, the HTTP request is examined for query and POST parameters, but other parts of the request that might be rendered, such as HTTP request headers or cookie data, are not examined. The common approach that I've seen is to call either ESAPI.validator().getValidSafeHTML() or ESAPI.encoder().canonicalize() and, depending on the results, either redirect to an error page or call something like ESAPI.encoder().encodeForHTML(). Aside from the fact that this approach often misses tainted input such as request headers or "extra path information" in a URI, it completely ignores the fact that the output encoding is non-contextual. For example, how does a servlet filter know that an input query parameter is going to be rendered in an HTML context (i.e., between HTML tags) rather than in a JavaScript context, such as within a <script> tag or a JavaScript event handler attribute? It doesn't. And because JavaScript and HTML encoding are not interchangeable, you are still left open to XSS attacks.
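To make the non-contextual encoding problem concrete, here is a minimal, self-contained sketch (the class and method names are my own, and both the encoder and the browser behavior are deliberately simplified) of why filter-applied HTML entity encoding fails when the value lands in a JavaScript event handler attribute: the browser HTML-decodes attribute values before the JavaScript engine ever parses them, undoing the filter's work.

```java
public class ContextMismatch {

    // Simplified HTML entity encoder, like what a naive filter might apply.
    static String htmlEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }

    // Simplified model of what a browser does to an HTML attribute value
    // before handing it to the JavaScript engine.
    static String htmlAttributeDecode(String s) {
        return s.replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String tainted = "\");alert(1);//";        // attacker-supplied parameter
        String filtered = htmlEncode(tainted);     // the filter thinks this is now "safe"

        // Template renders it into: <button onclick='show("FILTERED")'>
        // The browser decodes the attribute value, so the JS engine sees:
        String jsSeen = "show(\"" + htmlAttributeDecode(filtered) + "\")";
        System.out.println(jsSeen);                // show("");alert(1);//")
    }
}
```

The HTML encoding survives the trip through the filter but not through the browser's attribute parsing: the attacker's quote comes back and terminates the JavaScript string, and alert(1) runs. Contextual JavaScript encoding at the render site would have escaped it correctly.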

Unless your filter or interceptor has full knowledge of your application, and specifically an awareness of how your application uses each parameter for a given request, it can't succeed for all the possible edge cases. And I would contend that it never will using this approach: providing that additional required context makes for far too complex a design, and accidentally introducing some other vulnerability (possibly one whose impact is far worse than XSS) is almost inevitable if you attempt it.

This naive approach usually has three main problems. The first is improper encoding that can still allow exploitable XSS in certain contexts. An example might be a 'lastname' form parameter from a POST that is normally displayed between HTML tags, so that HTML encoding is sufficient, but there may be an edge case or two where lastname is actually rendered as part of a JavaScript block, where HTML encoding is not sufficient and the page therefore remains vulnerable to XSS attacks.

A second problem with this approach is that it can result in incorrect or double encoding. For example, suppose that in the previous example a developer has done proper output encoding for the JavaScript rendering of lastname. If the value has already been HTML output encoded as well, then when it is rendered, a legitimate last name like "O'Hara" might come out as "O&#39;Hara".
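One simple way this shows up can be sketched as follows (class name and simplified encoder are my own): the filter HTML-encodes the value, the page's own rendering encodes it again, and the browser, which decodes only one layer, displays the inner entities literally.

```java
public class DoubleEncoding {

    // Simplified HTML entity encoder, same idea as a blanket filter would use.
    static String htmlEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }

    public static void main(String[] args) {
        String lastname = "O'Hara";
        String fromFilter = htmlEncode(lastname);  // filter encodes: O&#39;Hara
        String rendered = htmlEncode(fromFilter);  // page encodes again: O&amp;#39;Hara
        System.out.println(rendered);              // O&amp;#39;Hara
        // The browser decodes only one layer, so the user sees: O&#39;Hara
    }
}
```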

While this second case is not strictly a security problem, if it happens often enough it can result in business push-back against the use of the filter. The business may then decide to disable the filter, or demand a way to specify exceptions for certain pages or parameters, which in turn will weaken any XSS defense the filter was providing.

The third problem is that this approach is not effective against DOM-based XSS. To address that, an interceptor or filter would have to scan all the JavaScript content going out as part of an HTTP response, try to figure out which outputs are tainted, and determine whether they are susceptible to DOM-based XSS. That simply is not practical.
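One concrete illustration of why server-side interception cannot reach DOM-based XSS: a common source for such attacks is the URL fragment (for example, a page whose script does something like document.write(location.hash)), and browsers never transmit the fragment to the server at all. This sketch (the class name and URL are hypothetical) shows what a server-side filter could and could not inspect for such a URL:

```java
import java.net.URI;

public class FragmentDemo {
    public static void main(String[] args) {
        // Hypothetical URL carrying an XSS payload in the fragment.
        URI uri = URI.create(
            "https://example.com/page#name=%3Cimg%20src=x%20onerror=alert(1)%3E");

        // The request target a server-side filter can inspect is just the
        // path (plus query string, if any); the fragment is never sent.
        String serverVisible = uri.getRawPath()
                + (uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "");

        System.out.println("Server sees:  " + serverVisible);     // /page
        System.out.println("Client only:  " + uri.getFragment()); // name=<img src=x onerror=alert(1)>
    }
}
```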

One final note: if deploying interceptors or filters were a useful defense against XSS attacks, don't you think it would be incorporated into all commercial Web Application Firewalls (WAFs) and be an approach that OWASP recommends in the OWASP Cross-Site Scripting Prevention Cheat Sheet?