Posted by Chris Roos Mon, 14 Jan 2008 10:14:00
I wasted a few hours of my life over the weekend playing with mod_rewrite. I had what I thought was a fairly simple problem to solve: I wanted requests for index.html within any subdirectory of my site to be redirected to the root of that subdirectory. So a request for example.com/subdir/index.html would be redirected (301) to example.com/subdir/.
I thought I could do this without using RewriteCond, so I tried something like this:
RewriteRule ^(.*)index\.html$ $1 [R=301]
I expected this to match requests for URLs ending in index.html and redirect them to the same URL minus index.html. Although it correctly redirected /subdir/index.html to /subdir/, it also tried to redirect /subdir/ to /subdir/, resulting in an infinite loop. It took me a very long time to work out what I believe is going on. If I understand correctly, Apache sends every request (i.e. not just the request you issue from the browser) through mod_rewrite. When we request /subdir/, Apache tries, and fails, to match that URL against our regex (/subdir/ doesn’t end with index.html). Internally, Apache then ‘requests’ index.html (since index.html is the DirectoryIndex). This internal request does match our regex, so Apache sends the 301 redirect back to /subdir/ and we end up in an infinite loop. If I’ve misunderstood what’s going on, please feel free to correct me.
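To make that explanation concrete, here is a toy model in Python of the loop as I understand it. It is a sketch under assumptions, not Apache’s real request handling: the function names are made up, paths are written with their leading slash, and the ‘internal request’ is modelled simply as the directory path plus the DirectoryIndex file name.

```python
import re

# Pattern from the first attempt: RewriteRule ^(.*)index\.html$ $1 [R=301]
RULE = re.compile(r"^(.*)index\.html$")

# Assumption: the DirectoryIndex is index.html, as in the post.
DIRECTORY_INDEX = "index.html"


def rewrite(path):
    """Return the redirect target if the rule matches, else None."""
    m = RULE.match(path)
    return m.group(1) if m else None


def handle(path, max_hops=5):
    """Toy model of the behaviour described above (hypothetical, not Apache code).

    Each iteration is one browser request. The rule is tried on the path the
    browser asked for and, if that path is a directory, on the internal
    'request' for its DirectoryIndex. Returns the redirects the browser gets.
    """
    redirects = []
    for _ in range(max_hops):
        candidates = [path]
        if path.endswith("/"):
            # Apache internally 'requests' the DirectoryIndex file.
            candidates.append(path + DIRECTORY_INDEX)
        target = None
        for candidate in candidates:
            target = rewrite(candidate)
            if target is not None:
                break
        if target is None:
            return redirects  # served normally, no redirect
        redirects.append(target)
        path = target  # the browser follows the 301
    return redirects  # gave up after max_hops: a redirect loop


print(handle("/subdir/index.html"))
```

Starting from /subdir/index.html, every hop redirects back to /subdir/ until the model gives up after max_hops, which is the loop described above.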
Based on this assumption, I figured I’d need to use RewriteCond to somehow ignore the internal request for index.html. I first added a condition that checked for REQUEST_URI ending in index.html (RewriteCond %{REQUEST_URI} index\.html$), but this suffered from the same problem as my initial attempt. A little more reading led me to the IS_SUBREQ special variable. I thought this might let me ignore the ‘internal request’ for index.html (RewriteCond %{IS_SUBREQ} false), but I couldn’t get it to have any effect.
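A small Python sketch shows why the REQUEST_URI condition didn’t help. It assumes, as in my explanation of the loop, that the internal request for the DirectoryIndex has /subdir/index.html as its REQUEST_URI; the values below are illustrative, not captured from Apache.

```python
import re

# Condition from the second attempt: RewriteCond %{REQUEST_URI} index\.html$
# A RewriteCond pattern is unanchored unless it uses ^ or $, hence re.search.
COND = re.compile(r"index\.html$")

# Assumed REQUEST_URI values when the browser asks for /subdir/.
requests = {
    "browser": "/subdir/",
    "internal": "/subdir/index.html",  # the internal 'request' for the DirectoryIndex
}

for label, uri in requests.items():
    print(label, uri, "matches" if COND.search(uri) else "no match")
```

The condition rejects the browser’s own request, which the rule would not have matched anyway, and accepts the internal one, so the loop is unchanged.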
While searching for answers to this problem, I kept seeing the THE_REQUEST special variable used in solutions, but never an explanation of why it’s used (specifically in preference to REQUEST_URI). I ended up using THE_REQUEST in my solution (not technically my solution, as I stole it from a site that I failed to bookmark) and it seems to work OK:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ $1 [R=301,L]
I’m guessing this works because THE_REQUEST contains the original request line from the client (browser), the stuff that appears in the log file, e.g. GET /subdir/ HTTP/1.1. For the ‘internal request’ it therefore still holds what the browser asked for rather than index.html, so the condition only matches when the browser itself asked for index.html. Anyone know if my assumptions are correct?
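Here is a rough Python model of the RewriteCond/RewriteRule pair above, assuming THE_REQUEST keeps the browser’s original request line even during the internal request for index.html. The function name and the sample request lines are made up for illustration; Python’s re behaves like Apache’s PCRE for these particular patterns (an escaped space matches a literal space in both).

```python
import re

# Condition and rule from the working solution above.
THE_REQUEST_COND = re.compile(r"^[A-Z]{3,9}\ /.*index\.html\ HTTP/")
RULE = re.compile(r"^(.*)index\.html$")


def redirect_target(the_request, path):
    """Return the 301 target, or None if the request is served as-is.

    the_request: the raw request line sent by the browser (assumed unchanged
                 for internal requests), e.g. "GET /subdir/ HTTP/1.1".
    path: the path the RewriteRule is tested against for this request.
    """
    if not THE_REQUEST_COND.search(the_request):
        return None  # the browser did not ask for index.html
    m = RULE.match(path)
    return m.group(1) if m else None


# Browser explicitly asks for index.html: redirect.
print(redirect_target("GET /subdir/index.html HTTP/1.1", "/subdir/index.html"))
# Browser asks for /subdir/; the internal request for index.html is ignored.
print(redirect_target("GET /subdir/ HTTP/1.1", "/subdir/index.html"))
```

A direct request for /subdir/index.html gets the 301, while the internal request made on behalf of GET /subdir/ is left alone because its request line never mentions index.html.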
Matt B said on Wed, 16 Jan 2008 17:55:30:
The thing about mod_rewrite is that it is voodoo: damn cool voodoo, but voodoo nevertheless (someone more impressive than me said that ages ago, but I’m too lazy to look up the actual quote).
I had a similar situation when I moved a wiki from a subdirectory to a subdomain. The subdomain required its folder to sit in the document root (common cPanel hackishness), so requests for lordmatt.co.uk/wiki and wiki.lordmatt.co.uk were exactly the same.
I came up with:
RewriteCond %{HTTP_HOST} !^wiki\.lordmatt\.co\.uk(.*)$ [NC]
RewriteCond %{REQUEST_URI} ^/wiki/ [NC]
RewriteRule ^wiki(.*)$ http://wiki.lordmatt.co.uk$1 [R=301,NC,L]
The problem I had was similar: if I redirected the “wrong” URLs, the internal requests looped forever. The result was an HTTP 500 (Internal Server Error).
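A Python sketch of Matt’s three lines, assuming they live in an .htaccess file in the document root, where the RewriteRule pattern is tested against the path without its leading slash (so ^wiki(.*)$ captures /Main_Page from wiki/Main_Page). The function name and example paths are hypothetical, and query strings are ignored.

```python
import re

# Patterns from Matt's rules; [NC] is modelled with re.IGNORECASE.
HOST_COND = re.compile(r"^wiki\.lordmatt\.co\.uk(.*)$", re.IGNORECASE)  # used negated (!)
URI_COND = re.compile(r"^/wiki/", re.IGNORECASE)
RULE = re.compile(r"^wiki(.*)$", re.IGNORECASE)


def redirect_target(host, request_uri):
    """Return the 301 target, or None if no redirect applies."""
    if HOST_COND.search(host):
        return None  # RewriteCond %{HTTP_HOST} !^wiki\.lordmatt... fails
    if not URI_COND.search(request_uri):
        return None  # RewriteCond %{REQUEST_URI} ^/wiki/ fails
    # Assumption: per-directory context strips the leading slash.
    m = RULE.match(request_uri.lstrip("/"))
    return "http://wiki.lordmatt.co.uk" + m.group(1) if m else None


print(redirect_target("lordmatt.co.uk", "/wiki/Main_Page"))
print(redirect_target("wiki.lordmatt.co.uk", "/wiki/Main_Page"))
```

In this model, a request that already arrives on wiki.lordmatt.co.uk fails the first condition, so no further redirect is issued.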
As you can gather, there is more than one way to get something to work. As with all voodoo, what matters is that it works, and congratulations are due to anyone able to brew up a solution to a problem.