Hosting Corner / Understanding Rewrite Rules: Part I
Jeff Dunn is President of Pulse Web Ventures Inc, parent company of XXXstorage.com. This article is a snippet of his seemingly endless knowledge about Web sites, hosting, and Unix, that he passes to his clients on a daily basis.
A Rewrite rule has the power to automatically funnel traffic from one Web page to another, all without the surfer’s knowledge. In this first article of a three-part series on Rewrite rules, Jeff from xxxStorage.com explains how you can reduce hot-linking and bandwidth theft by adding a few simple lines to the appropriate text files on your Web server.
A successful adult Webmaster is always aware of the bandwidth his or her site uses and tries to reduce it any way possible to cut down on the amount of money paid to the host. One cause for using more bandwidth than necessary is hot-linking and other forms of bandwidth stealing, often perpetrated by other Webmasters, either evil or ignorant. Using “Rewrite” rules is the most common tip given to adult Webmasters for reducing hot-linking.
Note: Many novice Web users will just say “use .htaccess.” Although you may use an .htaccess file to hold your Rewrite rules, it is not the .htaccess file itself that blocks hot-linking, so that terminology is in error. It is the contents of the .htaccess file (which can be many different things), in this case the Rewrite rules, that perform the operation. I will not be covering .htaccess files in general, as their usage is outside the scope of this article. In short: all the examples in this tutorial can be placed into a file titled ‘.htaccess’ and uploaded to the directory where you want the Rewrite rules to apply.
Let’s get right into it. These rewrite statements will block access to your files from any domain but your own. This is useful in situations where you want a certain directory to only be accessed directly by your own site. Note: I’m using this example because the syntax seems to be floating around the adult Webmaster community. It is overly precise and can be simplified, which I will do later.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://yourdomain.com.*$ [NC]
RewriteRule /* http://www.yourdomain.com/console.html [R,L]
This is about as simple as Rewrite gets. Rewrite is a mechanism for changing the URL in the surfer’s browser. If the surfer requests a URL that is wrong (or that you don’t want him to have), the Rewrite rule will change the URL for him. This code looks at the referring URL of the surfer, analyzes it, and then decides whether to leave the URL that the surfer has requested or change it. Let’s break this down:
RewriteEngine on
If this statement baffles you, don’t proceed! It just says, “Hey, let’s Rewrite some URLs.”
RewriteRule /* http://www.yourdomain.com/console.html [R,L]
I’m explaining this one first because this is where the magic is done. The RewriteEngine statement and this statement are all you need to Rewrite some URLs. ‘RewriteRule’ tells the server to look at the URL and adjust it accordingly. It accepts two arguments and then, optionally, some flags:
RewriteRule <test url> <new url> [flags]
The first argument, <test url>, is an expression against which the URL the surfer is requesting is tested. If the URL matches, the surfer is rewritten to the <new url>. In our example, the test URL is ‘/*’. This matches any URL, so we are saying, “Rewrite all URLs to http://www.yourdomain.com/console.html.” So why does ‘/*’ mean all URLs? The ‘*’ does not mean what you might think. In Windows it is a wildcard, but in RewriteRule, which uses regular expressions, it means “match 0 or more of the preceding character.” So in this case, the surfer’s URL matches as long as it contains “0 or more” of ‘/’, and since zero slashes counts, that effectively matches everything.
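To see why ‘/*’ matches everything, here is a small sketch in Python. Python’s re module stands in for the server’s regular expression engine (for a pattern this simple the two should behave the same), and the request paths are made up for illustration.

```python
import re

# The RewriteRule test pattern from the example, treated as a regular expression.
pattern = re.compile(r"/*")

# Hypothetical request paths, chosen only for illustration.
paths = ["/images/pic1.jpg", "index.html", ""]

for path in paths:
    # search() succeeds if the pattern matches anywhere in the string.
    # "Zero or more slashes" is satisfied even by an empty string.
    matched = pattern.search(path) is not None
    print(repr(path), matched)
```

Every path prints True, including the one with no slash at all, which is why this pattern acts as a catch-all.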
The <new url> can be any full valid URL or relative URL or file. For simplicity’s sake, always write your URL out completely. I will delve into this a bit more in my advanced Rewriting article.
The flags at the end of the RewriteRule specify certain actions to be taken. Our example has these flags: [R,L]. No, this does not mean right and left. The ‘R’ tells the server to “redirect” and specifies the code 302 (MOVED TEMPORARILY). Because the new URL is written out in full, the server will usually redirect anyway, so the R flag is often not needed. One caveat: when the new URL points at the same host that is serving the request, Apache may treat it as an internal rewrite rather than a redirect, so keep the R if you want the URL in the surfer’s browser to change. There are more uses for the R flag, which I will go into in the advanced article. The L flag tells the server “that is the Last rule.” In cases where you might have more Rewrite rules below, the server will drop out immediately at this point. I always recommend using the ‘L’ flag for novice adult Webmasters.
RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com.*$ [NC]
So, you’ve seen the RewriteRule command and you know that it will Rewrite URLs. That is useful, but what about situations where you want to rewrite some URLs but not others, and you want to base the decision on properties of the surfer? The RewriteCond statement is used to say, “If this statement is true, follow the RewriteRule listed below.” The RewriteCond statement has this format:
RewriteCond %{<test variable>} <test pattern> [flags]
The first argument is a variable that describes a property of the client or the server. This can be anything from cookies to time of day. Most adult Webmasters will only use the referrer variable, so I’ll stick to that for this article. The variable that holds the referring URL is always named “HTTP_REFERER” (spelled with a single ‘R’ in the middle).
The test pattern is an expression that the statement looks for in the test variable. In the example above, we are looking for “!^http://www.yourdomain.com.*$” in “HTTP_REFERER.”
!^http://www.yourdomain.com.*$
Look like ancient hieroglyphics to you? It did to me too the first time I looked at it, but it is easy to break down. First, let’s consider the obvious: we are looking for occurrences of http://www.yourdomain.com. The ‘^’ before the URL means that it must occur at the beginning. For example, if the HTTP_REFERER started with anything other than http://www.yourdomain.com (or had anything in front of it), it would NOT match. The ‘$’ means the opposite: the end of the variable. ‘.*’ is a little trickier. Remember what ‘*’ means? OK. But we are not looking for 0 or more occurrences of a literal ‘.’, since a period has a special meaning: it means “any character.” So, we are looking for 0 or more occurrences of anything. It is essentially a wildcard.
So far, we have:
^http://www.yourdomain.com.*$
It means: look for “http://www.yourdomain.com” at the beginning of the variable and anything following it until reaching the end of the variable. If the variable matches, the statement is “true” and the RewriteRule is executed.
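Here is a rough Python sketch of how that anchored pattern behaves. Again, Python’s re module is only a stand-in for the server’s engine, and the function name and the referring URLs are invented for the example.

```python
import re

# The pattern from the article, without the leading '!'.
anchored = re.compile(r"^http://www.yourdomain.com.*$")

def starts_at_your_site(referer: str) -> bool:
    """True when the referring URL begins with http://www.yourdomain.com."""
    return anchored.search(referer) is not None

# A page on your own site: matches.
print(starts_at_your_site("http://www.yourdomain.com/gallery/page1.html"))

# Someone else's site: does not match.
print(starts_at_your_site("http://www.otherdomain.com/stolen.html"))

# Your domain appears, but not at the beginning, so '^' rejects it.
print(starts_at_your_site("http://other.example/?u=http://www.yourdomain.com/"))
```

The first call prints True and the other two print False. The last case is the one that shows what ‘^’ is doing.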
I skipped over the ‘!’ on purpose. It basically says, “look for conditions where the following is NOT matched.” If ‘!’ wasn’t there, we would be looking for conditions that DO match.
So, this statement says in full: “If http://www.yourdomain.com and anything after it until reaching the end of the variable DOES NOT match the referring URL, then execute the RewriteRule statement.”
The effects of this should be quite obvious: If the referring URL is not from yourdomain.com, then rewrite the URL to something else.
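The effect of the ‘!’ can be modeled in a few lines of Python. This is only a simplified sketch of how a single RewriteCond is evaluated, not the server’s actual code; the helper name cond_is_true is my own.

```python
import re

def cond_is_true(cond_pattern: str, value: str) -> bool:
    """Rough model of one RewriteCond: a leading '!' inverts the match."""
    negate = cond_pattern.startswith("!")
    regex = cond_pattern[1:] if negate else cond_pattern
    matched = re.search(regex, value) is not None
    return (not matched) if negate else matched

COND = "!^http://www.yourdomain.com.*$"

# Referred by your own site: pattern matches, so the negated condition is false.
print(cond_is_true(COND, "http://www.yourdomain.com/index.html"))

# Referred by another site: pattern does not match, so the condition is true.
print(cond_is_true(COND, "http://www.otherdomain.com/page.html"))

# No referrer at all (typed-in URL, bookmark): also true.
print(cond_is_true(COND, ""))
```

This prints False, True, True. Note the last case: an empty referrer cannot match the pattern, so the negated condition is true and the RewriteRule runs for those surfers as well.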
The [NC] flag means “no case” or “ignore case.” This tells the server not to care if the URL is YOURDOMAIN.COM or yourdomain.com. This is important to have in all Rewrite conditions, because pattern matching is otherwise case sensitive: the server doesn’t see a capital letter the same as a lower-case letter. Note: Apache itself does not care about the case of directive names, so “rewritecond” would be accepted, but the arguments are often case sensitive. Stick with the conventional spelling, “RewriteCond,” to keep your rules readable.
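To illustrate what [NC] changes, this sketch compares a case-sensitive match with a case-insensitive one. Python’s re.IGNORECASE option plays the role of the [NC] flag here; the referring URL is made up.

```python
import re

pattern = r"^http://www.yourdomain.com.*$"

# A hypothetical referrer that arrives with mixed capitalization.
referer = "http://WWW.YourDomain.COM/index.html"

# Without [NC]: the capital letters do not match the lower-case pattern.
case_sensitive = re.search(pattern, referer) is not None

# With [NC]: case is ignored, so the same referrer matches.
no_case = re.search(pattern, referer, re.IGNORECASE) is not None

print(case_sensitive, no_case)
```

The output is False True. Without the flag, a surfer coming from your own site with odd capitalization in the URL would be treated as a hot-linker.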
RewriteCond %{HTTP_REFERER} !^http://yourdomain.com.*$ [NC]
This statement is the same as the statement above, except that it omits the “www” in case someone is trying to access your site without it.
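When several RewriteCond statements are stacked above one RewriteRule, they must all be true for the rule to run (they are combined with AND unless you say otherwise). The sketch below models that for the two conditions in the example; the function name rule_fires is hypothetical.

```python
import re

# The two patterns from the example. In the config each one is negated with '!'.
PATTERNS = [
    r"^http://www.yourdomain.com.*$",
    r"^http://yourdomain.com.*$",
]

def rule_fires(referer: str) -> bool:
    """The rule runs only if EVERY negated condition is true,
    that is, only if the referrer matches NONE of the patterns."""
    return all(
        re.search(p, referer, re.IGNORECASE) is None  # '!' and [NC]
        for p in PATTERNS
    )

print(rule_fires("http://www.yourdomain.com/page.html"))  # with www
print(rule_fires("http://yourdomain.com/page.html"))      # without www
print(rule_fires("http://www.otherdomain.com/page.html")) # someone else
```

This prints False, False, True. A referrer without the “www” fails the second condition, so the rule is skipped and the surfer gets the file.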
These two RewriteCond statements are not really necessary to accomplish this task. As I said earlier, I just listed them because that code seems to be floating around adult Webmaster sites. We can do away with the ‘^’, the ‘$’, and the ‘http://www’ and combine the RewriteCond statements into one:
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]
Can you read this? It says, “Look for yourdomain.com in the referring URL no matter what is before or after it, and if it DOES NOT match, execute the RewriteRule statement.” Now the full example above can be reduced to the following, which does essentially the same job and is easier to understand:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]
RewriteRule /* http://www.yourdomain.com/console.html [L]
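One thing worth knowing about the simplified pattern is that it is looser than the original pair. This Python sketch shows which referrers it accepts. The stricter variant at the end, with the dot escaped, is my own illustration and not part of the code above; all the URLs are invented.

```python
import re

# The simplified pattern from the article (without '!'), with [NC] behaviour.
simplified = re.compile(r".*yourdomain.com.*", re.IGNORECASE)

# A hypothetical stricter variant: the backslash makes the dot literal.
escaped = re.compile(r"yourdomain\.com", re.IGNORECASE)

def allowed(regex, referer: str) -> bool:
    """True when the referrer matches, meaning the RewriteRule is skipped."""
    return regex.search(referer) is not None

referers = [
    "http://yourdomain.com/a.html",               # no www
    "https://members.yourdomain.com/a.html",      # subdomain, https
    "http://other.example/?from=yourdomain.com",  # domain only in the query
    "http://yourdomainXcom.example/",             # '.' matched the 'X'
    "http://other.example/",                      # unrelated site
]

for r in referers:
    print(r, allowed(simplified, r), allowed(escaped, r))
```

The simplified pattern accepts the first four referrers and rejects only the last. The escaped variant also rejects the fourth, because it insists on a real period. Neither one checks where in the URL your domain appears, so a determined hot-linker can still get around it.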
Most Webmasters use this Rewrite code for their image or movie galleries. It is placed in the directory with your images and will keep hot-linking at bay. If you are posting your galleries to gallery post sites, you can add a RewriteCond statement with their domain to your Rewrite code. For example:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [NC]
RewriteCond %{HTTP_REFERER} !.*al4a.com.* [NC]
RewriteCond %{HTTP_REFERER} !.*thehun.net.* [NC]
RewriteRule /* http://www.yourdomain.com/console.html [L]
NOTE: Unfortunately, you can’t usually do this with gallery post sites as then their spider will be unable to access your site. If their spider can’t get your gallery, you probably won’t get listed. A more likely use for adding multiple domains is if you have multiple sites yourself.
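If you run several sites, typing out one RewriteCond per domain gets tedious. As a convenience, here is a small Python sketch that builds the rules in the style shown above from a list of domains. The function name build_rules is hypothetical, and you would paste its output into your .htaccess file yourself.

```python
def build_rules(allowed_domains, redirect_url):
    """Return Rewrite statements that let only the listed domains through."""
    lines = ["RewriteEngine on"]
    for domain in allowed_domains:
        # One negated, case-insensitive condition per allowed domain.
        lines.append(f"RewriteCond %{{HTTP_REFERER}} !.*{domain}.* [NC]")
    # The catch-all rule that sends everyone else to the redirect page.
    lines.append(f"RewriteRule /* {redirect_url} [L]")
    return "\n".join(lines)

rules = build_rules(
    ["yourdomain.com", "al4a.com", "thehun.net"],
    "http://www.yourdomain.com/console.html",
)
print(rules)
```

With these three domains the output is the same five lines as the example above.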
WARNINGS:
This rewrite code is NOT file dependent. It will work on all files: images, HTML, scripts, etc.
Never put this code in your top-level directory, as it will block every hit except those referred by a domain in the list. That includes surfers who type in your URL or use a bookmark, since they arrive with no referrer at all.
Never use the RewriteRule statement to send the surfer to a URL that is covered by the same RewriteRule. The surfer will be redirected around in an endless loop, which loads down the server, slows down your site, and upsets your host.
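The loop warning is easier to picture with a toy simulation. The Python sketch below is a heavily simplified model, not how the server really works: it assumes the rules cover every URL under one directory prefix, and that the browser keeps sending the same referrer while it follows redirects. All names and URLs are invented.

```python
import re

REDIRECT_TARGET = "http://www.yourdomain.com/console.html"

def respond(url, referer, protected_prefix):
    """Return the redirect target, or None if the file is served normally."""
    covered = url.startswith(protected_prefix)
    from_own_site = re.search(r".*yourdomain.com.*", referer, re.IGNORECASE) is not None
    if covered and not from_own_site:
        return REDIRECT_TARGET
    return None

def follow(url, referer, protected_prefix, max_hops=5):
    """Follow redirects like a browser would; return how many were needed."""
    hops = 0
    while hops < max_hops:
        target = respond(url, referer, protected_prefix)
        if target is None:
            return hops          # the file was finally served
        url = target
        hops += 1
    return hops                  # gave up: this is the redirect loop

hotlinker = "http://other.example/page.html"

# Rules only in /images/: one redirect, then console.html is served.
print(follow("http://www.yourdomain.com/images/pic.jpg", hotlinker,
             "http://www.yourdomain.com/images/"))

# Rules in the top-level directory: console.html is covered too, so it loops.
print(follow("http://www.yourdomain.com/images/pic.jpg", hotlinker,
             "http://www.yourdomain.com/"))
```

The first call prints 1 and the second prints 5, the point where the simulated browser gives up. The fix is the one given above: keep the page you redirect to outside the directory the rules cover.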