CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic

Attack Pattern Details

Description

This attack is a specific variation on leveraging alternate encodings to bypass validation logic. It exploits the possibility of encoding potentially harmful input in UTF-8 and submitting it to applications that either do not expect this encoding standard or do not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. For example, "/" (0x2F) can also be written as the overlong two-byte sequence 0xC0 0xAF. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters.
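
The following is a minimal Python sketch of the decoder behavior described above. `naive_decode` is a hypothetical lenient decoder, limited to one- and two-byte sequences for brevity, that computes a code point without checking whether a shorter encoding exists. Python's built-in `bytes.decode("utf-8")` serves as the strict counterpart, since it rejects overlong forms. The byte-level search for `/` stands in for a validator that inspects the encoded form, as in the RFC 3629 scenario.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical lenient UTF-8 decoder (1- and 2-byte sequences only).

    It never checks that a 2-byte sequence encodes a code point >= 0x80,
    so overlong forms such as 0xC0 0xAF are silently accepted.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # single-byte ASCII
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # 5 payload bits from the lead byte, 6 from the continuation byte.
            cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F)
            out.append(chr(cp))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return "".join(out)


def strict_decode_ok(data: bytes) -> bool:
    """True if `data` is well-formed UTF-8 according to Python's codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


payload = b"..\xc0\xaf..\xc0\xafetc"  # "/" written as the overlong pair C0 AF

print(b"/" in payload)            # False: a byte-level check finds no 0x2F
print(naive_decode(payload))      # ../../etc
print(strict_decode_ok(payload))  # False
```

The validator sees no slash, yet the lenient decoder produces a path traversal string; the strict codec refuses the input outright.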

Extended Description

A URL may contain special characters that need special syntax handling in order to be interpreted. Special characters are represented using a percent character followed by two hexadecimal digits representing the octet code of the original character (%HEX-CODE).

For instance, the US-ASCII space character is represented as %20. This is often referred to as escape encoding or percent-encoding. Since the server decodes the URL from the request, it may restrict access to some URL paths by validating and filtering the URL requests it receives. An adversary will try to craft a URL with a sequence of special characters that, once interpreted by the server, is equivalent to a forbidden URL.
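
As a minimal sketch of the validate-before-decode flaw (CWE-180), the hypothetical handler below checks the raw, still percent-encoded path for `../` and only then decodes it with Python's `urllib.parse.unquote`. The function name and the single forbidden substring are illustrative assumptions, not taken from any real server.

```python
from urllib.parse import unquote

FORBIDDEN = "../"  # hypothetical denylist entry


def handle_request_path(raw_path: str) -> str:
    """Hypothetical handler: validates the encoded path, then decodes it."""
    if FORBIDDEN in raw_path:
        raise PermissionError("blocked by filter")
    # Decoding happens after the check, so %2e%2e%2f ("../") is never inspected.
    return unquote(raw_path)


print(unquote("%20") == " ")                              # True
print(handle_request_path("/files/%2e%2e%2fsecret.txt"))  # /files/../secret.txt
```

The literal path "/files/../secret.txt" is blocked, while its encoded equivalent passes the filter and decodes to the forbidden form. Moving the check after decoding removes this particular bypass.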

It can be difficult to protect against this attack since the URL can contain other encoding formats, such as UTF-8 encoding, Unicode encoding, etc. The adversary could also subvert the meaning of the URL string request by encoding the data sent to the server through a GET request. For instance, an adversary may subvert the meaning of parameters used in a SQL request and sent through the URL string (see the Example section).
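
The GET-parameter case can be sketched the same way. Here a hypothetical filter rejects a literal single quote in the raw query string, but the quote arrives percent-encoded as %27, and Python's `urllib.parse.parse_qs` decodes it before the value would reach a SQL statement. The parameter name `user` and the payload are illustrative only.

```python
from urllib.parse import parse_qs


def raw_query_is_clean(query: str) -> bool:
    """Hypothetical filter that inspects the raw, still-encoded query string."""
    return "'" not in query


query = "user=admin%27%20OR%20%271%27%3D%271"
print(raw_query_is_clean(query))  # True: no literal quote before decoding

user = parse_qs(query)["user"][0]
print(user)                       # admin' OR '1'='1
print(raw_query_is_clean(user))   # False: the decoded value contains quotes
```

Applying the same check to the decoded value, rather than to the raw string, closes this gap.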

Severity: High

Possibility: High

Type: Detailed

Relationships with other CAPECs

This table shows the other attack patterns and high level categories that are related to this attack pattern.

CAPEC-64: Using Slashes and URL Encoding Combined to Bypass Validation Logic
CAPEC-71: Using Unicode Encoding to Bypass Validation Logic
CAPEC-267: Leverage Alternate Encoding

Prerequisites

The application's UTF-8 decoder accepts and interprets illegal UTF-8 characters or non-shortest-form UTF-8 encodings.
Input filtering and validation are not done properly, leaving the door open to characters that are harmful to the target host.
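
Both prerequisites can be avoided by canonicalizing before validating. One possible sketch, assuming a Python service that serves files under a hypothetical root `/var/www`: percent-decode to raw bytes with `urllib.parse.unquote_to_bytes`, decode those bytes as strict UTF-8 (Python's codec rejects overlong and other ill-formed sequences), and only then check the normalized path. The function names are hypothetical.

```python
import posixpath
from urllib.parse import unquote_to_bytes


def canonicalize_path(raw_path: str) -> str:
    """Percent-decode to bytes, then decode as strict UTF-8."""
    data = unquote_to_bytes(raw_path)
    # Raises UnicodeDecodeError on overlong or otherwise ill-formed
    # sequences such as b"\xc0\xaf".
    return data.decode("utf-8", errors="strict")


def is_allowed(raw_path: str, root: str = "/var/www") -> bool:
    """Validate only the canonical form; reject anything that cannot be decoded."""
    try:
        path = canonicalize_path(raw_path)
    except UnicodeDecodeError:
        return False
    full = posixpath.normpath(posixpath.join(root, path.lstrip("/")))
    return full == root or full.startswith(root + "/")


print(is_allowed("/index.html"))                     # True
print(is_allowed("/%2e%2e%2fetc/passwd"))            # False: escapes the root
print(is_allowed("/%c0%ae%c0%ae%c0%afetc/passwd"))   # False: overlong UTF-8
```

Rejecting ill-formed input outright, instead of trying to repair it, keeps the validator and the decoder in agreement about what the request means.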

Skills required

Low: An attacker can inject different representations of a filtered character in UTF-8 format.
Medium: An attacker may craft subtle encodings of input data by using the knowledge they have gathered about the target host.
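
To illustrate what "different representations" of a filtered character means at the byte level, the hypothetical helper below encodes a code point in a fixed number of bytes (2 to 4), producing the overlong forms that a strict decoder must reject, plus their percent-encoded spelling for use in a URL. For "/" (U+002F) it yields C0 AF, E0 80 AF and F0 80 80 AF.

```python
def overlong_utf8(cp: int, length: int) -> bytes:
    """Encode code point `cp` in exactly `length` bytes (2-4), even if fewer suffice.

    Whenever `length` exceeds the shortest form, the result is ill-formed UTF-8.
    """
    leads = {2: (0xC0, 11), 3: (0xE0, 16), 4: (0xF0, 21)}  # lead bits, payload capacity
    if length not in leads:
        raise ValueError("length must be 2, 3 or 4")
    lead, capacity = leads[length]
    if not 0 <= cp < (1 << capacity):
        raise ValueError("code point does not fit in the requested length")
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])


def percent_encode(data: bytes) -> str:
    """Spell every byte as %HH, as it would appear in a URL."""
    return "".join(f"%{b:02X}" for b in data)


for n in (2, 3, 4):
    form = overlong_utf8(ord("/"), n)
    try:
        form.decode("utf-8")
        verdict = "accepted"
    except UnicodeDecodeError:
        verdict = "rejected"
    print(n, percent_encode(form), verdict)
```

Python's strict codec rejects all three forms; a decoder that accepts any of them turns each back into "/".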

Taxonomy mappings

Mappings to ATT&CK, OWASP and other frameworks.

Related CWE

A Related Weakness relationship associates a weakness with this attack pattern. Each association implies a weakness that must exist for a given attack to be successful.

CWE-20: Improper Input Validation

CWE-73: External Control of File Name or Path

CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')

CWE-172: Encoding Error

CWE-173: Improper Handling of Alternate Encoding

CWE-180: Incorrect Behavior Order: Validate Before Canonicalize

CWE-181: Incorrect Behavior Order: Validate Before Filter

CWE-692: Incomplete Denylist to Cross-Site Scripting

CWE-697: Incorrect Comparison

Visit https://capec.mitre.org/ for more details.