Since I didn’t really delve into how to use mod_rewrite to do something useful in my last post, I will now.
The main reason I got mod_rewrite going was to improve my search ranking for a few sites, so I’ll show how to do this with a dynamic PHP site.
What I primarily aimed at was to get $_GET variables to look like directories.
The .htaccess file from the previous post was:
RewriteEngine on
RewriteRule ^/?test\.html$ test.php [L]
The first line turns on the rewrite engine for the directory the .htaccess file sits in. .htaccess files can be used for other stuff too. Check here for an example of a different use of a .htaccess file.
The second line rewrites test.html to test.php. The text before the space is the pattern, a regular expression that is matched against the requested URL path. The text after the space is the substitution, which is what a matching path gets replaced with.
This rule is really simple. The special characters that denote the regular expression are:
‘^’ This symbol, the caret, signifies the start of the URL path.
In a .htaccess file the path is relative to the directory that holds the file. Think of it like the ‘~’ character on the command line.
If your site is http://www.quick-content.com then
the ‘^’ in these regular expressions is the equivalent of that URL
(as long as the .htaccess file is in the root directory for that site).
‘$’ This symbol, the dollar sign, signifies the end of the URL.
‘\.’ This is just a period. There is nothing special about it because it is ‘escaped’ by the backslash.
The backslash tells Apache to treat it like a normal period when searching.
This is necessary because an unescaped period has a special meaning, and in this case we just
want to look for a period and not any character (which is the normal meaning of ‘.’).
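To tie the symbols above together, here is the same rule again with each piece of the pattern spelled out in comments. It is only an annotated copy of the rule from the previous post, so nothing here is new.

```apache
# Turn the rewrite engine on for this directory.
RewriteEngine on

# Pattern, piece by piece:
#   ^      start of the path
#   /?     an optional leading slash
#   test   the literal text "test"
#   \.     a literal period (escaped with a backslash)
#   html   the literal text "html"
#   $      end of the path
# A request for test.html is answered by test.php.
RewriteRule ^/?test\.html$ test.php [L]
```

Because the period is escaped, a request for something like testXhtml would not match.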
It’s great and all to rewrite one file to another using mod_rewrite and Apache, but it’s not that helpful for SEO.
Here is another search and replace rewrite rule:
RewriteRule ^posts/([^/\.]+)\.html$ single_post.php?post_name=$1 [L]
This rule isn’t that simple.
First we want to match all URLs that start with ‘posts/’.
Then we want to capture the characters that come after ‘posts/’ but before ‘.html’ and use them as the GET variable for the single_post PHP script.
This will rewrite pages like posts/somepost.html to single_post.php?post_name=somepost.
The ‘()’ parentheses tell Apache to take whatever is matched inside of them and put it in a temporary location that can be accessed by the replacement string.
In this case the string ‘somepost’ is stored as $1. If you had multiple sets of parentheses then $2, $3, and so on would be used as well.
The square brackets define a character class, which matches any single character from a set. [0-9] will match any digit between 0 and 9. [^0-9] will match any character that is not a digit between 0 and 9.
So [^/\.]+ matches one or more characters that are not a slash or a dot.
After that the ‘\.html$’ matches the .html file extension at the end, so that the URL will look like a plain old HTML file.
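Here is a small sketch of how the captures line up with $1 and $2. The first rule is the one from above. The second is a made-up variant: the ‘archive/’ prefix and the ‘year’ parameter are assumptions for illustration only, not something my site actually uses.

```apache
RewriteEngine on

# One capture group:
# posts/somepost.html -> single_post.php?post_name=somepost
RewriteRule ^posts/([^/\.]+)\.html$ single_post.php?post_name=$1 [L]

# Two capture groups (hypothetical URL layout and parameter name):
# archive/2008/somepost.html -> single_post.php?year=2008&post_name=somepost
RewriteRule ^archive/([0-9]+)/([^/\.]+)\.html$ single_post.php?year=$1&post_name=$2 [L]
```

The groups are numbered from left to right, so the first set of parentheses always fills $1 and the second fills $2.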
A general overview of the structure of a mod_rewrite RewriteRule:
RewriteRule Pattern Substitution [OptionalFlags]
Fairly simple right? RewriteRules are dissected as follows:
RewriteRule – This is just the name of the directive for Apache.
Pattern – This is a regular expression which will be applied to the current URL.
Substitution – Substitution occurs in the same way as it does in Perl or PHP.
You can include backreferences and server variable names in the substitution. Backreferences to this RewriteRule should be written as $N, whereas backreferences to the last matched RewriteCond should be written as %N.
A special substitution is -. This substitution tells Apache to not perform any substitution.
OptionalFlags – Any flags should be surrounded in square brackets and comma separated. The most useful are:
F – Forbidden. The user will receive a 403 error.
L – Last Rule. No more rules will be processed if this one was successful.
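R[=code] – Redirect. The user’s web browser will be visibly redirected to the substituted URL. If no code is given, a 302 is sent.
If you use this flag, the safest substitution is a full URL starting with http://www.site.com/, although a path starting with a slash also works because Apache fills in the host name.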
There are more flags but I haven’t had a use for them yet.
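A rough sketch of the pieces described above, all in one file. Every file name and parameter here (secret.html, old.html, new.html, page.php, id, lang) is made up for illustration, so treat it as an example of the syntax rather than rules to copy as they are.

```apache
RewriteEngine on

# "-" leaves the URL unchanged; the F flag then answers with 403 Forbidden.
RewriteRule ^secret\.html$ - [F]

# R=301 sends a visible, permanent redirect to the browser.
# The substitution is written as a full URL.
RewriteRule ^old\.html$ http://www.site.com/new.html [R=301,L]

# %1 is the group captured by the RewriteCond just above the rule,
# $1 is the group captured by the RewriteRule pattern itself.
# page/5?lang=en -> page.php?id=5&lang=en
RewriteCond %{QUERY_STRING} ^lang=([a-z]+)$
RewriteRule ^page/([0-9]+)$ page.php?id=$1&lang=%1 [L]
```

The RewriteCond only applies to the single RewriteRule that follows it, which is why the two lines sit together.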
I did up a few other more complex rewrites for one of my dynamic sites.
RewriteRule ^([0-9]+)/([0-9]+)/?$ /index.php?num_posts=$1&start=$2 [L]
This one is pretty straight-forward if you understood the previous example.
It is looking for two numerical strings separated by a slash. It then takes those two values and passes them as the
$_GET['num_posts'] and $_GET['start'] variables. The ‘/?’ at the end allows for an optional trailing slash.
[0-9]+ means: match one or more characters that are between 0 and 9.
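To make that concrete, here is the same rule with a couple of sample paths in the comments. The numbers 10 and 20 are just example values.

```apache
RewriteEngine on

# 10/20 and 10/20/ both become /index.php?num_posts=10&start=20
# ten/20 is left alone, because "ten" has no digits for [0-9]+ to match
RewriteRule ^([0-9]+)/([0-9]+)/?$ /index.php?num_posts=$1&start=$2 [L]
```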
The main problem that I ran into was that after a rewrite none of the images and includes would work, because I’m lazy and use relative paths.
So when I type a URL like www.quick-content.com/10/11023/ I don’t get any images or style sheets, because the browser is trying to find images
at www.quick-content.com/10/11023/images/. I had to write two rules to make this work properly:
RewriteRule ^.*/?.+/images/(.+)$ /images/$1 [L]
RewriteRule ^.*/?.+/includes/(.+)$ /includes/$1 [L]
These rules check for one or more directories before the images or includes directory, and if they exist the URL is rewritten to either /images or /includes.
The ‘.*/?’ means 0 or more characters before an optional slash (the ‘?’ following the slash means that it is optional).
‘.+/images/’ and ‘.+/includes/’ try to match one or more characters before the images or includes directory.
The ‘.’ means any character. So ‘.*’ means 0 or more of any character, which is essentially any string.
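Since ‘.*/?’ can match nothing at all, the front of each pattern boils down to ‘.+’. As a sketch, and not what I actually run, the two rules could probably be folded into one by capturing the directory name as well. The file name logo.png in the comment is made up, and this should behave the same as the two rules for ordinary paths like the one shown.

```apache
RewriteEngine on

# $1 is the directory name (images or includes), $2 is the rest of the path.
# 10/11023/images/logo.png -> /images/logo.png
RewriteRule ^.+/(images|includes)/(.+)$ /$1/$2 [L]
```

A request that already starts with images/ or includes/ is not touched, because ‘.+/’ needs at least one character and a slash in front of the directory name.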
Hopefully that wasn’t too confusing. There are a lot of great resources out there for learning regular expressions if you still don’t get it.