Server-Side Validation (Web Database Applications with PHP & MySQL)

7.2.1. Case Study: Customer Validation in the Winestore

In this section, we show how to validate selected winestore customer <form> data, including examples of the validation checks required for mandatory fields, field lengths, and data types. Many functions—including the regular expression and string functions—are discussed in detail in Chapter 2.

Our system requirements in Chapter 1 note the following validation requirements:

A user must provide a surname, first name, one address line, a city, a state, a zip code, a country, a birth date, an email address, and a password.
The user may also optionally provide a middle initial, a title, two additional address lines, a state, a telephone number, and a fax number.

Testing whether mandatory fields have been entered is straightforward, and we have implemented this in our examples in Chapter 6. For example, to test if the user's surname has been entered, use the following approach:

// Validate the Surname
if (empty($formVars["surname"]))
    // the user's surname cannot be a null string
    $errorString .=
        "\n<br>The surname field cannot be blank.";

For optional fields, omit this check.
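
When several mandatory fields need the same test, the check can be driven from a list. The following is a minimal sketch rather than the winestore code: the helper name checkMandatoryFields( ) and the field labels are assumptions made for illustration. It compares the trimmed value against the empty string instead of calling empty( ), because empty( ) also treats the string "0" as blank.

```php
<?php
// Hypothetical helper: returns an error string naming each mandatory
// field that is missing or blank in $formVars (values assumed to be strings).
function checkMandatoryFields(array $formVars, array $required): string
{
    $errorString = "";
    foreach ($required as $field => $label) {
        // trim() so that a value made only of spaces counts as blank
        if (!isset($formVars[$field]) || trim($formVars[$field]) === "") {
            $errorString .= "\n<br>The $label field cannot be blank.";
        }
    }
    return $errorString;
}

// Field names and labels below are illustrative assumptions
$required = ["surname" => "surname", "firstname" => "first name", "city" => "city"];
$errors = checkMandatoryFields(["surname" => "Smith", "city" => " "], $required);
```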

While it isn't specified in the brief system requirements, it's reasonable to assume that the fields provided by the user should be validated using additional checks. For example, telephone and fax numbers should be numeric and conform to a well-known template. Email addresses should meet the requirements of the RFC-2822 document available from http://www.ietf.org or at least a reasonable approximation; moreover, the domain part of the email address—such as webdatabasebook.com—should be an actual, existing domain. We describe additional validation steps in this section; the complete code for the customer <form> validation is listed in Chapter 10.

7.2.1.1. Validating dates

Dates of birth, expiry dates, order dates, and other dates are commonly entered by users. Most dates require checks to see if the date is valid and also if it's in a required range. In the customer <form>, the user is required to provide a date of birth. We validate this date of birth to check it has been entered, and to check its format, its validity, and whether it's within a range; the range of valid dates in the example begins with the user being alive—we assume alive users are born after 1890—and ends with the user being at least 18 years of age.

Date-of-birth checking is implemented with the following code fragment:

// Validate Date of Birth
if (empty($formVars["dob"]))
    // the user's date of birth cannot be a null string
    $errorString .= "You must supply a date of birth.";

elseif (!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$",
              $formVars["dob"], $parts))
    // Check the format
    $errorString .=
        "The date of birth is not a valid date in the " .
        "format DD/MM/YYYY";

elseif (!checkdate($parts[2], $parts[1], $parts[3]))
    // checkdate( ) expects month, day, year in that order
    $errorString .= "The date of birth is invalid. " .
        "Please check that the month is between 1 and 12, " .
        "and the day is valid for that month.";

elseif (intval($parts[3]) < 1890)
    // Make sure that the user has a reasonable birth year
    $errorString .=
        "You must be alive to use this service.";

// Check whether the user is 18 years old.
// If all the following are NOT true,
// then report an error.
elseif
       // Were they born 19 or more calendar years ago?
     (!((intval($parts[3]) < (intval(date("Y")) - 18)) ||

       // No, so were they born exactly 18 years ago, and
       // has the month they were born in passed?
     (intval($parts[3]) == (intval(date("Y")) - 18) &&
     (intval($parts[2]) < intval(date("m")))) ||

       // No, so were they born exactly 18 years ago
       // in this month, and was the day today or earlier
       // in the month?
     (intval($parts[3]) == (intval(date("Y")) - 18) &&
     (intval($parts[2]) ==  intval(date("m"))) &&
     (intval($parts[1]) <= intval(date("d"))))))

    $errorString .=
        "You must be 18+ years of age to use this service.";

If any date test fails, an error string is appended to the $errorString, and no further checks of the date are made. A valid date passes all the tests.

The first check tests if a date has been entered. The second check uses a regular expression to check whether the date consists of numbers and if it matches the template DD/MM/YYYY:

(!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$",
$formVars["dob"], $parts))

When the match succeeds, the expression also explodes the date into the array $parts, so that the component that matches the first grouped expression ([0-9]{2}) is found in $parts[1], the second grouped expression in $parts[2], and the third grouped expression in $parts[3]. The ereg( ) function stores the string matching the complete expression in $parts[0]. The overall result of processing a date that matches the template is that the day of the month is accessible as $parts[1], the month as $parts[2], and the year as $parts[3].

The third check uses the exploded data stored in the array $parts and the function checkdate( ) to test if the date is a valid calendar date. For example, the date 31/02/1970 would fail this test.
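
The format and calendar checks can be combined in one small function. This is a sketch under stated assumptions, not the winestore code: parseDob( ) is a hypothetical name, and preg_match( ) stands in for ereg( ), which was removed from PHP in version 7.0. The D modifier makes $ match only at the very end of the string, so a trailing newline is rejected.

```php
<?php
// Hypothetical helper: splits a DD/MM/YYYY string into [day, month, year]
// integers, or returns null if the format or the calendar date is invalid.
function parseDob(string $dob): ?array
{
    // Same template as the ereg() call, written for preg_match()
    if (!preg_match('#^([0-9]{2})/([0-9]{2})/([0-9]{4})$#D', $dob, $parts)) {
        return null; // wrong format
    }
    $day   = intval($parts[1]);
    $month = intval($parts[2]);
    $year  = intval($parts[3]);
    // checkdate() takes its arguments in month, day, year order
    if (!checkdate($month, $day, $year)) {
        return null; // for example 31/02/1970
    }
    return [$day, $month, $year];
}
```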

The fourth check tests that the year is not earlier than 1890. The function intval( ) converts a string to an integer. A test such as if ($parts[3] < 1890) may not work as desired, because $parts[3] is a string (which can be unreliably converted to an integer, as discussed in Chapter 2) and 1890 is an integer. Both the PHP functions intval( ), to convert strings to integers for comparisons, and strval( ), to convert integers to strings, are useful tools in range-checking <form> fields.

The fifth and final check tests if the user is 18 years of age or older. There are many ways to do this, with perhaps the most obvious being finding the difference between the date of birth and the current date using library functions, and checking that this difference is at least 18 years. The strtotime( ) function converts a date string in the format MM/DD/YYYY to a large numeric Unix timestamp value that represents the number of seconds since January 1, 1970. This can be cast to a float to ensure reliable comparison as discussed in Chapter 2.
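
As an illustration of the library-based approach, the sketch below computes an age in whole years. It swaps in PHP's DateTimeImmutable class in place of strtotime( ), because that class can read the DD/MM/YYYY format directly; ageInYears( ) is a hypothetical name. It assumes the date has already passed the checkdate( ) test, since createFromFormat( ) rolls an impossible date such as 31/02/1970 forward into March instead of rejecting it.

```php
<?php
// Hypothetical helper: age in whole years on $today for a DD/MM/YYYY
// birth date, or null if the string cannot be parsed at all.
function ageInYears(string $dob, DateTimeImmutable $today): ?int
{
    // "!" resets unparsed fields, so the birth date is taken at midnight
    $born = DateTimeImmutable::createFromFormat('!d/m/Y', $dob, $today->getTimezone());
    if ($born === false) {
        return null;
    }
    // diff() returns a DateInterval; its y property is the whole years
    return $born->diff($today)->y;
}

// Usage: the user passes the age test if the result is 18 or more
$age = ageInYears("17/05/1980", new DateTimeImmutable("today"));
```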

However, our approach here to validating if a user is over 18 years of age uses only logic, and the intval( ) and date( ) functions:

// Check whether the user is 18 years old.
// If all the following are NOT true,
// then report an error.
elseif
       // Were they born 19 or more calendar years ago?
     (!((intval($parts[3]) < (intval(date("Y")) - 18)) ||

       // No, so were they born exactly 18 years ago, and
       // has the month they were born in passed?
     (intval($parts[3]) == (intval(date("Y")) - 18) &&
     (intval($parts[2]) < intval(date("m")))) ||

       // No, so were they born exactly 18 years ago
       // in this month, and was the day today or earlier
       // in the month?
     (intval($parts[3]) == (intval(date("Y")) - 18) &&
     (intval($parts[2]) ==  intval(date("m"))) &&
     (intval($parts[1]) <= intval(date("d"))))))

    $errorString .=
        "You must be 18+ years of age to use this service.";

First, we check if the user's date of birth is 19 or more years ago; if this is the case, there is no error. Second, we check if the user was born exactly 18 years ago in a month earlier than the current month; if this is the case, again there is no error. Last, we check if the user was born exactly 18 years ago, in the current month, and on a day less than or equal to the current day; yet again, if this is true, there is no error. The parameters to the function date( ) are discussed in Chapter 2.
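
The three cases just described can also be written as a function that takes the current date as parameters, which makes the logic easy to test against fixed dates. This is a sketch only; isAtLeast18( ) and its parameter names are hypothetical.

```php
<?php
// Hypothetical helper: true if someone born on $day/$month/$year is at
// least 18 years old on the date $nowDay/$nowMonth/$nowYear.
function isAtLeast18(int $day, int $month, int $year,
                     int $nowDay, int $nowMonth, int $nowYear): bool
{
    $cutoffYear = $nowYear - 18;
    if ($year < $cutoffYear) {
        return true;                // born 19 or more calendar years ago
    }
    if ($year > $cutoffYear) {
        return false;               // born fewer than 18 calendar years ago
    }
    if ($month != $nowMonth) {
        return $month < $nowMonth;  // has the birthday month already passed?
    }
    return $day <= $nowDay;         // birthday is today or earlier this month
}

// In a validation script, "today" would come from date()
$ok = isAtLeast18(17, 5, 1980,
                  intval(date("d")), intval(date("m")), intval(date("Y")));
```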

There are other approaches to checking differences between dates. For example, one approach is to use the MySQL functions described in Chapter 3 through an SQL query. The query need not use a database; that is, SQL can be used as a simple calculator. This approach is perhaps less desirable than the approach we have described, because there is no database activity involved in our example, and database activity adds unnecessary overhead. However, if one or more dates are extracted in the script from a database, MySQL date and time functions are a useful alternative.

7.2.1.2. Validating numeric fields

Checking that values are numeric, are within a range, or have the correct format is another common validation task. For winestore customers, there are three numeric fields: the zip code, and the fax and telephone numbers.

We validate zip codes using a regular expression:

// Validate Zipcode
if (!ereg("^([0-9]{4,5})$", $formVars["zipcode"]))
   $errorString .= 
      "The zipcode must be 4 or 5 digits in length";

This permits a zip code of either four or five digits in length; this works for both U.S. zip codes and Australian postcodes, but it's unsuitable for many other countries. Another common validation check with zip codes is to check that they match the city or state using a database table, but we don't consider this approach here.
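
For readers on PHP 7.0 or later, where ereg( ) is no longer available, the same test can be sketched with preg_match( ). The function name isValidZipcode( ) is an assumption for illustration.

```php
<?php
// Hypothetical helper: true if $zipcode is exactly four or five digits.
function isValidZipcode(string $zipcode): bool
{
    // The D modifier stops $ from matching before a trailing newline
    return preg_match('/^[0-9]{4,5}$/D', $zipcode) === 1;
}
```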

The optional phone and fax numbers are also validated using regular expressions:

// Phone is optional, but if it is entered it must have
// correct format
$validPhoneExpr = "^([0-9]{2,3}[ ]?)?[0-9]{4}[ ]?[0-9]{4}$";
                
if (!empty($formVars["phone"]) && 
    !ereg($validPhoneExpr, $formVars["phone"]))
   $errorString .= 
      "The phone number must be 8 digits in length, " .
      "with an optional 2 or 3 digit area code";

The if statement contains two clauses: a check as to whether the field contains data and, if that is true, a check of the contents of the field using ereg( ). As discussed in Chapter 2—as in many other programming languages—the second clause is checked only if the first clause is true when an AND (&&) expression is evaluated. If the variable is empty, the ereg( ) expression isn't evaluated.

The ereg( ) expression works as follows:

The expression ^([0-9]{2,3}[ ]?)? matches either zero or one occurrence of the bracketed expression at the beginning of the value. Inside the brackets, the expression that is matched is either two or three digits and an optional single space character (represented as [ ]?). For example, a string "03 " matches, as does "013 ", "03", and "013".
The rest of the expression [0-9]{4}[ ]?[0-9]{4}$ matches exactly four digits, followed by an optional space, followed by another four digits, and then the end of the string is expected. For example, the strings 1234 1234 and 12341234 both match the expression.
The entire expression matches the following classes of strings: 03 1234 1234, 013 1234 1234, 1234 1234, 0312341234, 01312341234, 03 12341234, 013 12341234, 12341234, 0131234 1234, and 031234 1234.
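
The behavior described above can be checked by running the pattern over a few sample strings. The sketch below uses preg_match( ) in place of ereg( ), which is no longer available in PHP 7.0 and later; the helper name isValidPhone( ) and the sample values are assumptions for illustration.

```php
<?php
// Same pattern as $validPhoneExpr, wrapped in delimiters for preg_match()
$validPhonePattern = '/^([0-9]{2,3}[ ]?)?[0-9]{4}[ ]?[0-9]{4}$/D';

// Hypothetical helper: true if $phone matches the whole pattern
function isValidPhone(string $phone, string $pattern): bool
{
    return preg_match($pattern, $phone) === 1;
}

// Report which of the sample strings are accepted
$samples = ["03 1234 1234", "12341234", "0131234 1234", "1234-1234", "123 1234"];
foreach ($samples as $sample) {
    echo $sample, ": ",
         isValidPhone($sample, $validPhonePattern) ? "ok" : "rejected", "\n";
}
```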

7.2.1.3. Validating email addresses

Email addresses are another common data entry item that requires format checking. There is a standard maintained by the Internet Engineering Task Force (IETF) called RFC-2822 that defines what a valid email address can be, and it's much more complex than might be expected. For example, an address such as the following is valid:

" <test> "@webdatabasebook.com

We use the following complex regular expression and network functions to validate an email address:

$validEmailExpr = 
    "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*" .
    "@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";

if (empty($formVars["email"]))
    // the user's email cannot be a null string
    $errorString .= "You must supply an email address.";

elseif (!eregi($validEmailExpr, $formVars["email"]))
    // The email must match the above regular expression
    $errorString .= 
    "The email address must be in the name@domain format.";

elseif (strlen($formVars["email"]) > 50)
    // The length cannot exceed 50 characters
    $errorString .=
        "The email address can be no longer than 50 characters.";

elseif (!(getmxrr(substr(strstr($formVars["email"], '@'), 1), $temp) ||
  checkdnsrr(gethostbyname(substr(strstr($formVars["email"], '@'), 1)),"ANY")))
    // There must be a Domain Name Server (DNS) record 
    // for the domain name
    $errorString .= "The domain does not exist.";

If any email test fails, an error string is appended to the $errorString, and no further checks of the email value are made. A valid email passes all tests.

The first check tests to make sure that an email address has been entered. If it's omitted, an error is generated. It then uses a regular expression to check if the email address matches a template. It isn't RFC-2822-compliant but works reasonably for most email addresses:

It uses eregi( ), so both upper- and lowercase letters are matched by a-z.
It expects the string to begin with a character from the set 0-9, a-z, and ~!#$%&_-. There has to be at least one character from this set at the beginning of the email address for it to be valid.
After the first character matches, there is an optional bracketed expression:
```
([.]?[0-9a-z~!#$%&_-])*
```
This expression is optional since it's suffixed with the * operator. However, if it does match, it matches any number of the characters specified. Full stops are allowed, but never two in a row, because the expression [.]? permits at most one full stop before each character. The expression, for example, matches the string fred.williams.test% but not fred..williams.
After the initial part of the email address, an @ character is expected. The @ has to occur after the first word for the string to be valid; our regular expression rejects email addresses that have only the initial or local component such as fred.
Our validation expects there to be another word of at least length 1 after the @ symbol, and this can be followed by any combination of the permitted characters. Strings of permitted characters can be separated by a single full-stop.
The expression is imperfect. It accepts some illegal email addresses and rejects many that are legal but unusual.
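
To see the template at work, the sketch below applies the same pattern to a few addresses. It uses preg_match( ) with the i modifier as a stand-in for eregi( ), which is no longer available in PHP 7.0 and later; matchesEmailTemplate( ) is a hypothetical name, and the sample addresses are chosen only for illustration.

```php
<?php
// The pattern from $validEmailExpr, delimited for preg_match();
// i makes it case-insensitive, D anchors $ at the very end of the string
$validEmailPattern =
    '/^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*' .
    '@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$/iD';

// Hypothetical helper: true if $email fits the name@domain template
function matchesEmailTemplate(string $email, string $pattern): bool
{
    return preg_match($pattern, $email) === 1;
}

$samples = [
    "fred.williams@webdatabasebook.com",   // ordinary address
    "Fred@WebDatabaseBook.COM",            // case does not matter
    "fred..williams@webdatabasebook.com",  // two full stops in a row
    "fred",                                // local part only
    "fred+wine@webdatabasebook.com",       // + is not in the permitted set
];
foreach ($samples as $sample) {
    echo $sample, ": ",
         matchesEmailTemplate($sample, $validEmailPattern) ? "ok" : "rejected", "\n";
}
```

The last sample shows the kind of legal but unusual address that the template turns away.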

The third step is to check the length of the email address. If it exceeds 50 characters, an error is generated. The fourth and final step is to check whether the domain of the email address actually exists:

elseif (!(getmxrr(substr(strstr($formVars["email"], '@'), 1), $temp) ||
  checkdnsrr(gethostbyname(substr(strstr($formVars["email"], '@'), 1)),"ANY")))
    // There must be a Domain Name Server (DNS) record 
    // for the domain name
    $errorString .= "The domain does not exist.";

The function getmxrr( ) queries an Internet domain name server (DNS) to check if there is a record of the email domain as a mail exchanger (MX). If the domain isn't an MX, the domain is checked with the DNS using the checkdnsrr( ) function, after converting the domain name to a numeric IP address with the gethostbyname( ) function. The second parameter to checkdnsrr( ) is the type of record to check; ANY means that a record of any type is accepted. If both tests fail, the domain of the email address isn't valid and we reject the email address.
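
The domain test is easier to read if the domain is extracted once and the lookups are kept in their own function. The following is a simplified sketch, not the winestore code: emailDomain( ) and domainExists( ) are hypothetical names, the domain is taken from the last @ in the address, and the lookup asks checkdnsrr( ) for an MX record and then an A record instead of going through gethostbyname( ). The result of domainExists( ) depends on network access and on the resolver in use.

```php
<?php
// Hypothetical helper: the part of an address after the last "@",
// or null if there is no "@" or nothing follows it.
function emailDomain(string $email): ?string
{
    $at = strrpos($email, '@');
    if ($at === false || $at === strlen($email) - 1) {
        return null;
    }
    return substr($email, $at + 1);
}

// Hypothetical helper: true if DNS lists a mail exchanger (MX) or an
// address (A) record for the domain. Requires a working DNS resolver.
function domainExists(string $domain): bool
{
    return checkdnsrr($domain, "MX") || checkdnsrr($domain, "A");
}
```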

7.2.2. Processing <form> Data on the Server Side

In this section, we discuss the validation peculiarities of the HTML <form> environment and what is actually submitted from a <form> in an HTTP request.

7.2.2.1. Processing <form> controls with the MULTIPLE attribute

Simple <form> elements, such as the <input> element, allow only one value to be associated with them. For example, an <input> element with the name attribute surname may have an associated value of Smith; in a URL query string, this association is represented as surname=Smith. Indeed, all the controls included in <form> examples in previous chapters have only one associated value. However, the <select multiple> element allows the association of more than one value with a variable in a <form>.

The <select multiple> element allows users to select zero or more items from a list. When the selected values are sent through using the GET or POST methods, each selected item has the same variable name but a different value. For example, consider what happens when the user selects options b and c in the following HTML <form>:

<form method="GET" action="click.php"> 
<select multiple name="choice">
<option>a</option>
<option>b</option>
<option>c</option>
<option>d</option>
</select>
<br><input type="submit">
</form>

When the user clicks Submit, the following URL is requested:

http://localhost/click.php?choice=b&choice=c

From a PHP perspective, this means that the variable $choice—which has the same name as that of the <select multiple>—is overwritten as the request is decoded, and an echo $choice prints the last value that was selected. In this case, echo $choice outputs c.

There are at least two solutions to this problem in PHP. First, it's possible to add more complex processing of the two automatically initialized arrays, $HTTP_GET_VARS or $HTTP_POST_VARS, to detect duplicate variable names and handle these for generic processing. Second, more elegantly and simply, you can use a PHP feature, which is described next.

The second approach works as follows. You modify the <form> and replace the name of the <select multiple> with an array-like structure, name="choice[]". The PHP interpreter then treats the variable as an array and stores the multiple values into $choice[0], $choice[1], etc. In the previous example, the <select multiple> element is renamed as choice[]:

<html><form method="GET"> 
<select multiple name="choice[]">
<option>a</option>
<option>b</option>
<option>c</option>
<option>d</option>
</select>
<br><input type="submit">
</form></html>

If the user selects options b and c, the following PHP fragment prints out all selected values, in this case both b and c:

foreach($choice as $value)
  echo $value;
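
The difference between the two naming styles can be reproduced without a browser. The sketch below uses parse_str( ), which parses a string as if it were the query string of a URL; the variable names $plain and $bracketed are chosen for illustration.

```php
<?php
// Decode the two query strings discussed above into arrays
parse_str("choice=b&choice=c", $plain);
parse_str("choice[]=b&choice[]=c", $bracketed);

// Without brackets the second value overwrites the first
var_dump($plain["choice"]);

// With brackets every selected value is kept, in order
var_dump($bracketed["choice"]);
```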

The bracket array notation in a <form> can cause some problems with client-side scripts—such as those written in JavaScript—and such <form> elements should be referenced wrapped in single quotes in a JavaScript script. Client-side JavaScript for validation is discussed later in this chapter.

Interestingly, <textarea> and <input> elements can also be suffixed with brackets to put values into an array, should the need arise.

7.2.2.2. Other <form> issues

Checkbox elements in a <form> have the following format:

<form method="GET" action="click.php">
<input type="checkbox" name="check">
<input type="submit">
</form>

A checkbox has two states, on and off, and is usually rendered as a small clickable square in a graphical web browser. If the checkbox in the example is clicked, and the <form> submitted, the following URL is requested:

http://localhost/click.php?check=on

However, if the checkbox isn't clicked, the URL requested is as follows:

http://localhost/click.php

The important difference is that a checkbox is never off from the server perspective. If the checkbox isn't clicked, no variable or value is submitted to the server. Therefore, in a PHP script, a checkbox should be tested with the following fragment:

if ($check == "on")
   echo "Checkbox is on";
else
   echo "Checkbox is off";
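
Because an unchecked checkbox sends nothing at all, a test for the presence of the variable is a common alternative to comparing its value with "on". The sketch below is one way to write it under stated assumptions: isChecked( ) is a hypothetical helper, and $request stands for whichever array holds the decoded request variables.

```php
<?php
// Hypothetical helper: true if the checkbox named $name was ticked.
// $request is an array of decoded request variables.
function isChecked(array $request, string $name): bool
{
    // An unchecked box submits nothing, so its key is simply absent
    return isset($request[$name]);
}

// Simulate the two URLs shown above
parse_str("check=on", $clicked);
parse_str("", $notClicked);
echo isChecked($clicked, "check") ? "Checkbox is on" : "Checkbox is off";
```

A presence test also works when the checkbox carries its own value attribute, in which case the submitted value is no longer "on".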

Additionally, in the previous example, if the checkbox isn't clicked, it isn't possible to determine whether the <form> has been submitted or has never been displayed. An easy solution is to add a name attribute to the submit <input> element as follows:

<form method="GET" action="click.php">
<input type="checkbox" name="check">
<input type="submit" name="submit" value="Submit Query">
</form>

If this <form> is submitted with the checkbox in the off state, the following URL is requested:

http://localhost/click.php?submit=Submit+Query

Testing whether the variable $submit is empty( ) can then distinguish between the initial display of the <form> and a subsequent submission of the <form> with the checkbox in the off state. The following script skeleton performs this check:

if (!empty($submit))
{
  // carry out processing
}
else
{
  // display the <form>
}

In addition, the naming of submit <input> elements permits more than one submit button to be added to a <form>. This allows two or more different types of submission that may have different validation or other behavior. For example, both Save and Cancel buttons may be present in the <form> as two different types of submission process. We use this approach in the winestore and discuss it further in Chapter 11.
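
A script can decide which button was pressed by testing which named submit variable arrived. The following is a minimal sketch, assuming two submit <input> elements named save and cancel; those names and the function requestedAction( ) are hypothetical and are not taken from the winestore code.

```php
<?php
// Hypothetical helper: works out what a request is asking for, given
// submit buttons named "save" and "cancel" in the <form>.
function requestedAction(array $request): string
{
    if (isset($request["cancel"])) {
        return "cancel";   // abandon changes, skip validation
    }
    if (isset($request["save"])) {
        return "save";     // validate and store the data
    }
    return "display";      // no button pressed: show the <form>
}

// Only the button that was clicked appears in the request
parse_str("surname=Smith&save=Save", $request);
echo requestedAction($request);
```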

<select multiple> elements have the same property as checkboxes; if no item in the list is selected, no variable or value is submitted to the server.