CWE-20: Improper Input Validation


The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

Submission Date : July 19, 2006

Modification Date : October 26, 2023

Organization :

Extended Description

Input validation is a frequently-used technique for checking potentially dangerous inputs in order to ensure that the inputs are safe for processing within the code, or when communicating with other components. When software does not validate input properly, an attacker is able to craft the input in a form that is not expected by the rest of the application. This will lead to parts of the system receiving unintended input, which may result in altered control flow, arbitrary control of a resource, or arbitrary code execution.

Input validation is not the only technique for processing input, however. Other techniques attempt to transform potentially dangerous input into something safe, such as filtering (CWE-790), which attempts to remove dangerous inputs, or encoding/escaping (CWE-116), which attempts to ensure that the input is not misinterpreted when it is included in output to another component. Other techniques exist as well (see CWE-138 for more examples).
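The difference between the three techniques can be sketched in Java as follows. This is a minimal illustration, not a complete implementation: the class name InputHandling, the method names, and the username allowlist (letters, digits, underscore, 1 to 32 characters) are all assumptions made for the example.

```java
final class InputHandling {
    private InputHandling() {}

    // Validation: accept or reject the input as-is; never modify it.
    // The allowlist pattern is a hypothetical rule chosen for illustration.
    static boolean isValidUsername(String input) {
        return input != null && input.matches("[A-Za-z0-9_]{1,32}");
    }

    // Filtering: drop every character that is outside the allowlist.
    static String filterUsername(String input) {
        return input.replaceAll("[^A-Za-z0-9_]", "");
    }

    // Encoding/escaping: keep every character, but encode the ones that
    // are significant when the value is written into HTML output.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

For the input <b>bob</b>, validation rejects the value outright, filtering silently turns it into bbobb, and escaping preserves it as &lt;b&gt;bob&lt;/b&gt;. Which behavior is appropriate depends on where the value is used.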

Input validation can be applied to:

  • raw data - strings, numbers, parameters, file contents, etc.
  • metadata - information about the raw data, such as headers or size

Data can be simple or structured. Structured data can be composed of many nested layers, composed of combinations of metadata and raw data, with other simple or structured data.

Many properties of raw data or metadata may need to be validated upon entry into the code, such as:

  • specified quantities such as size, length, frequency, price, rate, number of operations, time, etc.
  • implied or derived quantities, such as the actual size of a file instead of a specified size
  • indexes, offsets, or positions into more complex data structures
  • symbolic keys or other elements into hash tables, associative arrays, etc.
  • well-formedness, i.e. syntactic correctness - compliance with expected syntax
  • lexical token correctness - compliance with rules for what is treated as a token
  • specified or derived type - the actual type of the input (or what the input appears to be)
  • consistency - between individual data elements, between raw data and metadata, between references, etc.
  • conformance to domain-specific rules, e.g. business logic
  • equivalence - ensuring that equivalent inputs are treated the same
  • authenticity, ownership, or other attestations about the input, e.g. a cryptographic signature to prove the source of the data

Implied or derived properties of data must often be calculated or inferred by the code itself. Errors in deriving properties may be considered a contributing factor to improper input validation.
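As one illustration of checking a specified quantity against a derived one, the sketch below validates a length-prefixed message in Java. The message layout (a 4-byte big-endian length header followed by the payload), the class name RecordValidator, and the MAX_PAYLOAD limit are hypothetical and chosen only for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class RecordValidator {
    private RecordValidator() {}

    static final int HEADER_SIZE = 4;    // assumed layout: 4-byte length header
    static final int MAX_PAYLOAD = 1024; // hypothetical upper limit

    // Returns the payload only if the declared length (metadata) is in range
    // and matches the number of bytes actually present (derived quantity).
    static byte[] parsePayload(byte[] message) {
        if (message == null || message.length < HEADER_SIZE) {
            throw new IllegalArgumentException("missing length header");
        }
        // ByteBuffer reads big-endian by default.
        int declared = ByteBuffer.wrap(message, 0, HEADER_SIZE).getInt();
        if (declared < 0 || declared > MAX_PAYLOAD) {
            throw new IllegalArgumentException("declared length out of range");
        }
        int actual = message.length - HEADER_SIZE;
        if (declared != actual) {
            throw new IllegalArgumentException("declared length does not match actual length");
        }
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }
}
```

The check trusts neither the header nor the buffer size alone: it accepts the message only when the two agree and both fall inside the expected range.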

Note that "input validation" has very different meanings to different people, or within different classification schemes. Caution must be used when referencing this CWE entry or mapping to it. For example, some weaknesses might involve inadvertently giving control to an attacker over an input when they should not be able to provide an input at all, but sometimes this is referred to as input validation.

Finally, it is important to emphasize that the distinctions between input validation and output escaping are often blurred, and developers must be careful to understand the difference. In particular, input validation is not always sufficient to prevent vulnerabilities, especially when less stringent data types, such as free-form text, must be supported. Consider a SQL injection scenario in which a person's last name is inserted into a query. The name "O'Reilly" would likely pass the validation step since it is a common last name in the English language. However, this valid name cannot be directly inserted into the database because it contains the "'" apostrophe character, which would need to be escaped or otherwise transformed. In this case, removing the apostrophe might reduce the risk of SQL injection, but it would produce incorrect behavior because the wrong name would be recorded.
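The "O'Reilly" scenario can be sketched in Java as follows. The class name NameHandling, the method names, and the validation rule are assumptions made for illustration. The escaping shown doubles the apostrophe, which is the convention for standard SQL string literals; some databases have additional escaping rules, and in practice a parameterized query (for example, JDBC's PreparedStatement) is usually the safer way to pass such a value.

```java
final class NameHandling {
    private NameHandling() {}

    // Validation (hypothetical rule): starts with a letter, then letters,
    // spaces, hyphens, or apostrophes; 1 to 64 characters in total.
    static boolean isPlausibleLastName(String name) {
        return name != null && name.matches("[\\p{L}][\\p{L} '\\-]{0,63}");
    }

    // Filtering: removing the apostrophe changes the recorded name.
    static String stripApostrophes(String name) {
        return name.replace("'", "");
    }

    // Escaping for a standard SQL string literal: double each apostrophe
    // so the value is preserved and not read as the end of the literal.
    static String escapeSqlLiteral(String value) {
        return value.replace("'", "''");
    }
}
```

Here the name passes validation, filtering records the wrong name (OReilly), and escaping yields O''Reilly, which a database reads back as the original value.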

Example Vulnerable Code

Example - 1

This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.

    ...
    public static final double price = 20.00;
    int quantity = currentUser.getAttribute("quantity");
    double total = price * quantity;
    chargeUser(total);
    ...

The user has no control over the price variable; however, the code does not prevent a negative value from being specified for quantity. If an attacker were to provide a negative value, the total would be negative and the user would have their account credited instead of debited.
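One way to close this gap is to check the quantity against an explicit range before computing the total. The following Java sketch assumes a hypothetical Checkout class and a hypothetical upper limit of 100 items per order; neither appears in the original example.

```java
final class Checkout {
    private Checkout() {}

    static final double PRICE = 20.00;
    static final int MAX_QUANTITY = 100; // hypothetical business limit

    // Computes the total only for quantities inside the allowed range,
    // so a negative quantity can no longer produce a negative charge.
    static double totalFor(int quantity) {
        if (quantity < 1 || quantity > MAX_QUANTITY) {
            throw new IllegalArgumentException("quantity out of range: " + quantity);
        }
        return PRICE * quantity;
    }
}
```

Rejecting the request, rather than silently correcting the quantity, keeps the behavior predictable for both the user and the billing logic.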

Example - 2

This example asks the user for a height and width of an m X n game board with a maximum dimension of 100 squares.

    ...
    #define MAX_DIM 100
    ...
    /* board dimensions */
    int m, n, error;
    board_square_t *board;

    printf("Please specify the board height: \n");
    error = scanf("%d", &m);
    if ( EOF == error ) {
        die("No integer passed: Die evil hacker!\n");
    }
    printf("Please specify the board width: \n");
    error = scanf("%d", &n);
    if ( EOF == error ) {
        die("No integer passed: Die evil hacker!\n");
    }
    if ( m > MAX_DIM || n > MAX_DIM ) {
        die("Value too large: Die evil hacker!\n");
    }
    board = (board_square_t*) malloc( m * n * sizeof(board_square_t));
    ...

While this code checks to make sure the user cannot specify large, positive integers and consume too much memory, it does not check for negative values supplied by the user. As a result, an attacker can perform a resource consumption (CWE-400) attack against this program by specifying two large negative values whose product does not overflow, resulting in a very large memory allocation (CWE-789) and possibly a system crash. Alternatively, an attacker can provide very large negative values that cause an integer overflow (CWE-190), with unexpected behavior that depends on how the values are treated in the remainder of the program.
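The arithmetic behind the first attack, and a check that covers both bounds, can be sketched as follows. The sketch is written in Java rather than C, and the class and method names are hypothetical. Note one difference between the languages: Java int arithmetic wraps on overflow, whereas signed overflow in C is undefined behavior.

```java
final class BoardDimensions {
    private BoardDimensions() {}

    static final int MAX_DIM = 100;

    // Mirrors the upper-bound-only check from the example above.
    static boolean passesUpperBoundOnly(int m, int n) {
        return !(m > MAX_DIM || n > MAX_DIM);
    }

    // Checks the lower bound as well as the upper bound.
    static boolean isValid(int m, int n) {
        return m >= 1 && m <= MAX_DIM && n >= 1 && n <= MAX_DIM;
    }

    // Number of squares to allocate; computed only for validated
    // dimensions, so the product is at most MAX_DIM * MAX_DIM.
    static int cellCount(int m, int n) {
        if (!isValid(m, n)) {
            throw new IllegalArgumentException("dimensions must be between 1 and " + MAX_DIM);
        }
        return m * n;
    }
}
```

For example, m = -30000 and n = -30000 pass the upper-bound-only check, yet their product is 900,000,000, which fits in a 32-bit signed integer and would request a very large allocation. The two-sided check rejects both values before any multiplication takes place.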

Example - 3

The following example shows a PHP application in which the programmer attempts to display a user's birthday and homepage.

    $birthday = $_GET['birthday'];
    $homepage = $_GET['homepage'];
    echo "Birthday: $birthday<br>Homepage: <a href=$homepage>click here</a>";

The programmer intended for $birthday to be in a date format and $homepage to be a valid URL. However, since the values are derived from an HTTP request, if an attacker can trick a victim into clicking a crafted URL with ...

Latest DB Update: Jul. 16, 2024 9:52