CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic

Description
This attack is a specific variation on leveraging alternate encodings to bypass validation logic. This attack leverages the possibility to encode potentially harmful input in UTF-8 and submit it to applications not expecting or effective at validating this encoding standard making input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early version of the UTF-8 specification got some entries wrong (in some cases it permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. According to the RFC 3629, a particularly subtle form of this attack can be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters.
Extended Description

A URL may contain special character that need special syntax handling in order to be interpreted. Special characters are represented using a percentage character followed by two digits representing the octet code of the original character (%HEX-CODE).

For instance US-ASCII space character would be represented with %20. This is often referred as escaped ending or percent-encoding. Since the server decodes the URL from the requests, it may restrict the access to some URL paths by validating and filtering out the URL requests it received. An adversary will try to craft an URL with a sequence of special characters which once interpreted by the server will be equivalent to a forbidden URL.

It can be difficult to protect against this attack since the URL can contain other format of encoding such as UTF-8 encoding, Unicode-encoding, etc. The adversary could also subvert the meaning of the URL string request by encoding the data being sent to the server through a GET request. For instance an adversary may subvert the meaning of parameters used in a SQL request and sent through the URL string (See Example section).

Severity :

High

Possibility :

High

Type :

Detailed
Prerequisites

This table shows the other attack patterns and high level categories that are related to this attack pattern.

  • The application's UTF-8 decoder accepts and interprets illegal UTF-8 characters or non-shortest format of UTF-8 encoding.
  • Input filtering and validating is not done properly leaving the door open to harmful characters for the target host.
Skills required

This table shows the other attack patterns and high level categories that are related to this attack pattern.

  • Low An attacker can inject different representation of a filtered character in UTF-8 format.
  • Medium An attacker may craft subtle encoding of input data by using the knowledge that they have gathered about the target host.
Taxonomy mappings

Mappings to ATT&CK, OWASP and other frameworks.

Visit http://capec.mitre.org/ for more details.