Testing Your Server Setup

Started by sukishan, Aug 22, 2009, 06:52 PM

sukishan

Testing Your Server Setup
Some hosts do not have mod_rewrite enabled (by default it is not enabled). You can find out if your server has mod_rewrite enabled by creating a PHP script with one simple line of PHP code:

<?php phpinfo(); ?>

Load the script in a browser and look for the "Loaded Modules" entry in the Apache section of the output (the list appears only when PHP runs as an Apache module). If mod_rewrite isn't listed there, you'll have to ask your host to enable it -- or find a "good host". Most hosts will have it enabled, so you'll be good to go.

The Magic of mod_rewrite
Here's a simple example for you: create three text files named test.html, test.php, and .htaccess.

In the test.html file, enter the following:

<h1>This is the HTML file.</h1>

In the test.php file, add this:

<h1>This is the PHP file.</h1>

Create the third file, .htaccess, with the following:

RewriteEngine on
RewriteRule ^/?test\.html$ test.php [L]

Upload all three files (in ASCII mode) to a directory on your server, and type:

http://www.example.com/path/to/test.html

into the location box -- using your own domain and directory path of course! If the page shows "This is the PHP file", it's working properly! If it shows "This is the HTML file," something's gone wrong.

If your test worked, you'll notice that the test.html URI has remained in the browser's location box, yet we've seen the contents of the test.php file. You've just witnessed the magic of mod_rewrite!

mod_rewrite Regular Expressions
Now we can begin rewriting your URIs! Let's imagine we have a web site that displays city information. The city is selected via the URI like this:

http://www.example.com/display.php?country=USA&state=California&city=San_Diego

Our problem is that this is far too long and unfriendly to users. We'd much prefer it if visitors could use:

http://www.example.com/USA/California/San_Diego

We need to be able to tell Apache to rewrite the latter URI into the former. In order for the display.php script to read and parse the query string, we'll need to use regular expressions to tell mod_rewrite how to match the two URIs. If you're not familiar with regular expressions (regex), many sites provide excellent tutorials. At the end of this article, I've listed the best pages I've found on the topic. If you're not able to follow my explanations, I recommend reviewing the first two of those links.

A very common approach is to use the expression (.*). This expression combines two metacharacters: the dot character, which matches ANY character, and the asterisk character, which specifies zero or more of the preceding character. Thus, (.*) matches everything in the %{REQUEST_URI} string. %{REQUEST_URI} is the part of the URI that follows the domain, up to but not including the ? character of a query string, and it is the only thing a rewrite rule's pattern attempts to match; the query string itself is never part of the match.

Wrapping the expression in parentheses stores it in an "atom," which is a variable that allows the matched characters to be reused within the rule. Thus, the expression above would store USA/California/San_Diego in the atom. To solve our problem, we'd need three of these atoms, separated by the subdirectory slashes (/), so the regex would become:

(.*)/(.*)/(.*)

Given the above expression, the regex engine will match (and save) three values separated by two slashes anywhere in the %{REQUEST_URI} string. To solve our specific problem, though, we'll need to restrict this somewhat -- after all, the first and last atoms above could match anything!
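To see how loosely this expression matches, here is a minimal sketch that uses Python's re module as a stand-in for Apache's regex engine (an assumption made purely for experimenting outside the server; the helper name loose_groups and the sample paths are hypothetical). Because .* is greedy and the pattern is unanchored, the first atom swallows any extra path segments:

```python
import re

# Unanchored pattern from the text: three "match anything" atoms.
LOOSE = re.compile(r"(.*)/(.*)/(.*)")

def loose_groups(path):
    """Return the three captured atoms, or None if nothing matches."""
    match = LOOSE.search(path)
    return match.groups() if match else None

# The intended case works...
print(loose_groups("USA/California/San_Diego"))
# ...but so do paths we never meant to accept.
print(loose_groups("a/b/USA/California/San_Diego"))
print(loose_groups("../etc/passwd"))
```

The second call returns 'a/b/USA' as the first atom, which is the kind of surprise the tighter patterns are meant to rule out.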

To begin with, we can add the start and end anchor characters. The ^ character anchors the match to the start of the string, and the $ character anchors it to the end of the string.

^(.*)/(.*)/(.*)$

This expression specifies that the whole string must be matched by our regex; there cannot be anything else before or after it.

However, this approach still allows too many matches. We're storing our matches as atoms, and will be passing them to a query string, so we have to be able to trust what we match. Matching anything with (.*) is too much of a potential security hazard, and, when used inappropriately, could even cause mod_rewrite to get stuck in a loop!

To avoid unnecessary problems, let's change the atoms to specify precisely the characters that we will allow. Because the atoms represent location names, we should limit the matched characters to upper and lowercase letters from A to Z, and because we use it to represent spaces in the name, the underscore character (_) should also be allowed. We specify a set using square brackets, and a range using the - character. So the set of allowed characters is written as [a-zA-Z_]. And because we want to avoid matching blank names, we add the + metacharacter, which specifies a match only on one or more of the preceding character. Thus, our regex is now:

^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
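A quick way to check what this regex accepts and rejects is to try it outside Apache. The sketch below uses Python's re module as a stand-in for the server's regex engine (an assumption for testing only; match_city and the sample paths are hypothetical names):

```python
import re

# Anchored pattern with an explicit character set for each atom.
CITY_PATTERN = re.compile(r"^([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

def match_city(path):
    """Return (country, state, city) if the path is acceptable, else None."""
    match = CITY_PATTERN.match(path)
    return match.groups() if match else None

print(match_city("USA/California/San_Diego"))   # three atoms
print(match_city("USA//San_Diego"))             # None: blank state name
print(match_city("USA/California/San Diego"))   # None: space is not allowed
print(match_city("a/b/USA/California"))         # None: too many segments
```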

The %{REQUEST_URI} string starts with a / character, but whether a rule's pattern sees that slash depends on where the rule lives: in the main server or virtual host configuration the matched path begins with /, while in a .htaccess file the directory prefix, including the leading slash, is stripped before matching. We can satisfy both cases by making the leading slash optional with the expression ^/? (? is the metacharacter for zero or one of the preceding character). So now we have:

^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$
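As a rough check of the optional slash, the following sketch (Python's re module standing in for Apache's engine, which is an assumption; the helper name atoms is hypothetical) shows that the path matches with or without one leading slash, but not with two:

```python
import re

# Same pattern as above, with the leading slash made optional.
OPTIONAL_SLASH = re.compile(r"^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

def atoms(path):
    """Return the captured atoms, or None when the path is rejected."""
    match = OPTIONAL_SLASH.match(path)
    return match.groups() if match else None

print(atoms("/USA/California/San_Diego"))   # matches with the slash
print(atoms("USA/California/San_Diego"))    # matches without it
print(atoms("//USA/California/San_Diego"))  # None: only one slash is optional
```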

With regex in hand, we can now map the atoms to the query string:

display.php?country=$1&state=$2&city=$3

$1 is the first (country) atom, $2 is the second (state) atom and $3 is the third (city) atom. Note that only nine atoms can be referenced, $1 through $9, and they are numbered in the order in which their opening parentheses appear in the regular expression.
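The mapping from atoms to query string can be simulated before touching the server. This is only a sketch under stated assumptions: Python's re module stands in for mod_rewrite, Python writes backreferences as \1, \2, \3 where mod_rewrite uses $1, $2, $3, and the rewrite function is a hypothetical helper, not anything Apache provides:

```python
import re

RULE = re.compile(r"^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$")
# Python's spelling of display.php?country=$1&state=$2&city=$3
SUBSTITUTION = r"display.php?country=\1&state=\2&city=\3"

def rewrite(path):
    """Return the rewritten target, or the path unchanged if the rule does not match."""
    match = RULE.match(path)
    return match.expand(SUBSTITUTION) if match else path

print(rewrite("USA/California/San_Diego"))
print(rewrite("images/logo.png"))  # left alone: the rule does not match
```

The first call yields display.php?country=USA&state=California&city=San_Diego, which is the long URI from the start of this section.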

We're almost there! Create a new .htaccess file with the text:

RewriteEngine on
RewriteRule ^/?([a-zA-Z_]+)/([a-zA-Z_]+)/([a-zA-Z_]+)$ display.php?country=$1&state=$2&city=$3 [L]

Save this to the directory in which display.php resides. The rewrite rule must go on one line with one space between the RewriteRule statement, the regex, and the redirection (and before any optional flags). We've used the [L], or 'last' flag, which is the terminating flag (more on flags later).

Our rewrite rule is now complete! The atom values are being extracted from the request string and added to the query string of our rewritten URI. The display.php script will likely extract these values from the query string and use them in a database query or something similar.

If, however, you have only a short list of allowable countries, it might be best to avoid potential database problems by specifying the acceptable values within the regex. Here's an example:

^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$
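To try out the country list, here is a small sketch with Python's re module standing in for Apache's regex engine (an assumption; the function name allowed and the sample paths are made up for illustration):

```python
import re

# The first atom is limited to an explicit list of countries.
COUNTRY_RULE = re.compile(r"^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$")

def allowed(path):
    """True when the path names one of the listed countries."""
    return COUNTRY_RULE.match(path) is not None

print(allowed("Canada/Ontario/Toronto"))      # True
print(allowed("France/Ile_de_France/Paris"))  # False: not in the list
print(allowed("usa/california/san_diego"))    # False: matching is case-sensitive
```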

If you're concerned about capitalization because the values in your database are strictly lowercase, you can make the regex engine ignore the case by adding the No Case flag, [NC], after the rewritten URI. Just don't forget to convert the values to lowercase in your script after you obtain the $_GET array.
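The effect of [NC], and the lowercase conversion that follows it, can be sketched like this. The assumptions are that Python's re.IGNORECASE behaves like [NC] for these patterns and that the hypothetical lowercase_atoms helper plays the part of the normalization your own script would do on the $_GET values:

```python
import re

# re.IGNORECASE plays the role of the [NC] flag here.
NOCASE_RULE = re.compile(
    r"^/?(USA|Canada|Mexico)/([a-zA-Z_]+)/([a-zA-Z_]+)$",
    re.IGNORECASE,
)

def lowercase_atoms(path):
    """Match regardless of case, then normalize the atoms to lowercase."""
    match = NOCASE_RULE.match(path)
    if match is None:
        return None
    return tuple(value.lower() for value in match.groups())

print(lowercase_atoms("usa/California/SAN_DIEGO"))
print(lowercase_atoms("france/bretagne/brest"))  # None: still not in the list
```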

If you want to use numbers (0, 1, ... 9) for, say, Congressional Districts, then you'll need to change an atom's specification from ([a-zA-Z_]+) to ([0-9]) to match a single digit, ([0-9]{1,2}) to match one or two digits (0 through 99), or ([0-9]+) for one or more digits, which is useful for database IDs.
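The three digit patterns differ only in how many digits they accept. A minimal comparison, using Python's re module as a stand-in for Apache's engine (an assumption) and a hypothetical helper named which_match:

```python
import re

# Each pattern is anchored so the whole value must consist of digits.
DIGIT_PATTERNS = {
    "single": re.compile(r"^([0-9])$"),
    "one_or_two": re.compile(r"^([0-9]{1,2})$"),
    "one_or_more": re.compile(r"^([0-9]+)$"),
}

def which_match(value):
    """Return the names of the patterns that accept the value."""
    return [name for name, pattern in DIGIT_PATTERNS.items() if pattern.match(value)]

print(which_match("7"))     # all three patterns accept a single digit
print(which_match("42"))    # too long for the single-digit pattern
print(which_match("2009"))  # only the one-or-more pattern accepts it
print(which_match("4a"))    # none: letters are not digits
```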

dhoni

This server setup is easy to follow from this site.
It should help anyone looking for information on server setup.