Perishable Press 3G Blacklist and WP Super Cache

I’ve been following the Building the 3G Blacklist series on Perishable Press for the last week or two and have been implementing each of the rules as they were released. For the most part, there have been no problems. I’ve seen a huge increase in 403 (Forbidden) errors in my logs, which is a good sign. Judging from my access.log, all of the blocked requests have been bogus.

After the final list came out, I implemented any changes to the rules, tested everything in my default browser (Safari) and called it good. Several days later, however, I tried to pull up this site on my home PC using Firefox and was greeted with a big fat 403. Uh oh. I switched over to IE and got the same result. After some cursory checking, I switched over to my laptop and Safari and noticed that there was no problem there. Weird. Even weirder, because I’m on Verizon DSL behind a router, so as far as my server is concerned, both computers have the same IP.

Weirdest of all: when I actually checked my access.log, I could see my own requests that had been served 403 errors. Instead of the usual 403, though, the log showed a single request with a 200 status for each time I tried to load a page.

IP - - [30/May/2008:10:20:02 -0700] "GET /about/ HTTP/1.1" 200 363 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7" 
IP - - [30/May/2008:10:20:07 -0700] "GET / HTTP/1.1" 200 364 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7" 

That’s a request for my about page, and then my main page. Both received a “Forbidden” in my browser, but both show a status of OK (200) in my log. Additionally, if the requests had actually been successful, a bunch of other files would have been requested as well.
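When checking a log for entries like these, it helps to pull out just the request, status and byte count. The sketch below assumes the standard Apache "combined" layout that the two lines above follow (host, ident, user, time, request, status, bytes, referer, user agent); `parse` and `Hit` are hypothetical names, not part of any tool. Run on the two lines above, it makes the tiny response sizes (363 and 364 bytes) easy to spot next to the 200 status.

```python
import re
from typing import NamedTuple, Optional

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    request: str
    status: int
    size: Optional[int]  # None when the log records "-" for the byte count


def parse(line: str) -> Optional[Hit]:
    """Return request line, status and byte count, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    size = None if m["size"] == "-" else int(m["size"])
    return Hit(m["request"], int(m["status"]), size)


# The two entries from the post (client address already replaced with "IP").
LOG = [
    'IP - - [30/May/2008:10:20:02 -0700] "GET /about/ HTTP/1.1" 200 363 "-" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"',
    'IP - - [30/May/2008:10:20:07 -0700] "GET / HTTP/1.1" 200 364 "-" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"',
]

if __name__ == "__main__":
    for entry in LOG:
        print(parse(entry))
```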

I quickly decided that it was more important to get the site up and running again than to spend a bunch of time trying to figure out what the problem was and how to fix it. After some selective commenting in my .htaccess file, I discovered that the culprit was the following rule from the 3G Blacklist:

RedirectMatch 403 \/\/

I commented out the rule for the time being so that I could test further later.

This particular rule answers with a 403 any request whose URL path contains a double slash (RedirectMatch only looks at the path, so the // in http:// doesn’t count). I thought it was very odd that this rule should break my site, because I can’t see any reason why a legitimate request would need a double slash. I was also concerned because, judging from my access.log, this is the rule that does the bulk of the work when it comes to 403 errors.
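To see what the rule catches, here is a small Python stand-in for it. This is a sketch, not Apache: Python’s `re` substitutes for Apache’s regex engine, which behaves the same way for a pattern this simple. `blocked_by_3g` is a hypothetical helper, and the sample paths are made up for illustration. Because the pattern is unanchored, a // anywhere in the path is enough to trigger the 403.

```python
import re

# The 3G Blacklist rule "RedirectMatch 403 \/\/" written as a Python regex.
# RedirectMatch is tested against the URL path (no scheme or host), and the
# pattern is unanchored, so "//" anywhere in the path is a match.
DOUBLE_SLASH = re.compile(r"\/\/")


def blocked_by_3g(url_path: str) -> bool:
    """True if the double-slash rule would answer this path with a 403."""
    return DOUBLE_SLASH.search(url_path) is not None


if __name__ == "__main__":
    # Hypothetical sample paths.
    for path in ["/", "/about/", "//about/", "/about//index.html"]:
        print(f"{path!r}: {'403' if blocked_by_3g(path) else 'passes'}")
```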

I did some more scanning of my .htaccess file and arrived at the conclusion that the culprit must be within the rules for the WP Super Cache plugin I had recently installed. This plugin creates static HTML pages to serve instead of the normal WordPress PHP pages. Here’s an explanation from their site:

When a visitor who is not logged in, or who has not left a comment, visits they will be served a static HTML page out of the supercache subdirectory within the WordPress cache directory. If you navigate to that directory you can view an exact replica of your permalink structure as well as the HTML files within the directories. To determine if a page has been served out of the Super Cache, view the source and the last line on the page should read or .

Hmm, I’d say we’re getting closer now. The section that WP Super Cache adds to my .htaccess file looks like this:

# WP SUPER CACHE
<IfModule mod_rewrite.c>
AddDefaultCharset UTF-8
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]

RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
</IfModule>
# END WPSuperCache
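The RewriteCond lines in that block decide whether a request gets a cached page at all. As a hedged sketch (plain Python, not mod_rewrite), they can be modelled as one predicate. `Request` and `first_cache_candidate` are hypothetical names, and the `-f` file-existence checks are left out, so the function only reports which cached file the rules would try first: the gzipped copy when the browser accepts gzip, otherwise the plain one.

```python
import re
from dataclasses import dataclass
from typing import Optional

# RewriteCond patterns from the WP Super Cache block. Like mod_rewrite,
# re.search looks for the pattern anywhere in the tested string.
SEARCH_QUERY = re.compile(r".*s=.*")
ATTACHMENT_QUERY = re.compile(r".*attachment_id=.*")
NO_CACHE_COOKIES = re.compile(r"^.*(comment_author_|wordpress|wp-postpass_).*$")


@dataclass
class Request:
    # Hypothetical container for the request fields the rules inspect.
    method: str
    query_string: str = ""
    cookie: str = ""
    accept_encoding: str = ""


def first_cache_candidate(req: Request) -> Optional[str]:
    """Cached file the rules would try first, or None to fall through to WordPress."""
    if req.method == "POST":                       # %{REQUEST_METHOD} !=POST
        return None
    if SEARCH_QUERY.search(req.query_string):      # %{QUERY_STRING} !.*s=.*
        return None
    if ATTACHMENT_QUERY.search(req.query_string):  # %{QUERY_STRING} !.*attachment_id=.*
        return None
    if NO_CACHE_COOKIES.search(req.cookie):        # %{HTTP_COOKIE} !^.*(...).*$
        return None
    if "gzip" in req.accept_encoding:              # %{HTTP:Accept-Encoding} gzip
        return "index.html.gz"
    return "index.html"
```

In Apache, if the gzipped file turns out not to exist, the second block is tried next with the same conditions minus the gzip check.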

Now, perhaps a mod_rewrite ninja can see immediately what the problem is, but I was having trouble figuring out what was going on. Since I’m using a shared host, I don’t have access to httpd.conf and therefore cannot use the RewriteLog directive to see what’s actually happening in the rewrites.

After some research I discovered that adding an R flag to each of WP Super Cache’s RewriteRule directives would force a temporary (302) redirect and therefore let me see in the browser what was actually being requested. I changed each RewriteRule to the following:

RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [R,L]
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [R,L]

After running the same tests again, I could see in my browser how things were getting screwed up. I typed the address http://blog.nerdstargamer.com into Firefox and, sure enough, got the 403 error. This time, though, the address bar showed the following redirect:

http://blog.nerdstargamer.com/wp-content/cache/supercache/blog.nerdstargamer.com//index.html

There’s the culprit right there: the double slash right before index.html. So, basically, every time WP Super Cache serves a cached page, it rewrites the request to a path with a double slash before the file name, which the 3G Blacklist rule then blocks. I did a quick check by deleting WP Super Cache’s cache folders and confirmed that uncached pages loaded fine while cached pages always ended in a 403 error. Bingo.

So, now to fix the problem. Why on earth the .htaccess code for WP Super Cache does this in the first place, I’m not sure. It seems wrong to me, but I’ll defer to the experts on this one. Basically, what’s happening is that the variable $1 is replaced with the requested path, which already includes a trailing slash (or is empty, for the home page). The next part of the rewrite starts with a slash, hence the double slash.
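Tracing the substitution by hand shows the mismatch. In the original block, each RewriteCond tests `%{HTTP_HOST}/$1index.html` (no slash before the file name), while the matching RewriteRule rewrites to `%{HTTP_HOST}/$1/index.html`. The sketch below rebuilds both strings in Python. It assumes that in a per-directory (.htaccess) context `^(.*)` captures the path without its leading slash ("about/" for /about/, "" for the home page), leaves off the `%{DOCUMENT_ROOT}` prefix, and uses hypothetical helper names `cond_path` and `rule_target`. For the home page, `rule_target` produces exactly the URL that showed up in the address bar.

```python
CACHE_ROOT = "/wp-content/cache/supercache"


def cond_path(host: str, captured: str) -> str:
    # RewriteCond ... %{HTTP_HOST}/$1index.html -f  (no slash before index.html)
    return f"{CACHE_ROOT}/{host}/{captured}index.html"


def rule_target(host: str, captured: str) -> str:
    # RewriteRule ^(.*) ... %{HTTP_HOST}/$1/index.html  (extra slash before index.html)
    return f"{CACHE_ROOT}/{host}/{captured}/index.html"


if __name__ == "__main__":
    host = "blog.nerdstargamer.com"
    for captured in ["", "about/"]:  # home page, then /about/
        print("tested:   ", cond_path(host, captured))
        print("rewritten:", rule_target(host, captured))
```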

I was able to fix the conflict in the WP Super Cache code by removing one of the slashes, like so:

# WP SUPER CACHE
<IfModule mod_rewrite.c>
AddDefaultCharset UTF-8
# not post
RewriteCond %{REQUEST_METHOD} !=POST
# not a search
RewriteCond %{QUERY_STRING} !.*s=.*
# not an attachment page
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
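# no commenter, login or post-password cookie
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
# browser accepts gzip
RewriteCond %{HTTP:Accept-Encoding} gzip
# a gzipped cached copy exists for this path
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
# serve it ($1 already ends in a slash, or is empty for the home page)
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz [L]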

# uncompressed fallback: same checks, minus gzip
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html [L]
</IfModule>
# END WPSuperCache
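A quick check of the patched substitution, again as a Python sketch rather than Apache itself: with the slash removed, the rewritten path is the same one the RewriteCond tests, and it no longer contains the // that the 3G pattern blocks. `fixed_target` is a hypothetical helper. Note that the fix relies on the captured path ending in a slash (or being empty); a capture without one, such as `about`, would become `aboutindex.html`, which is part of why it looks funny.

```python
import re

CACHE_ROOT = "/wp-content/cache/supercache"
DOUBLE_SLASH = re.compile(r"\/\/")  # the 3G Blacklist pattern


def fixed_target(host: str, captured: str, gz: bool = False) -> str:
    """Mirror the patched RewriteRule: %{HTTP_HOST}/$1index.html[.gz]."""
    name = "index.html.gz" if gz else "index.html"
    return f"{CACHE_ROOT}/{host}/{captured}{name}"


if __name__ == "__main__":
    host = "blog.nerdstargamer.com"
    for captured in ["", "about/"]:  # home page, then /about/
        for gz in (False, True):
            target = fixed_target(host, captured, gz)
            print(target, "-> 403" if DOUBLE_SLASH.search(target) else "-> ok")
```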

This certainly looks funny, but at least it works. I’m sure there is a more elegant way to do this, say, rewriting the original request to remove the trailing slash and then applying the cache rules. Perhaps this is really a problem with the way WordPress handles its permalinks (I’m on 2.5.1, by the way). Who knows? Ninjas, chime in.


Comments

1. Jeff Starr

Hi Alissa :) This is certainly one way to resolve the conflict (and an excellent diagnosis, btw). Given that you are working with three different components (WordPress, Super Cache, and the 3G Blacklist), I would say that your solution is practical and effective. Getting all three components to play nicely with the removal of only two characters is about as efficient as it gets, imo. As you say, a more formal solution would involve removing the trailing slash from the permalinks before they are processed by Super Cache; however, the result would be essentially the same: omission of the duplicate trailing slash. In either case, you are correct that duplicate slashes have no business in legitimate URLs. Great work! :)

2. Alissa Miller

Thanks Jeff, I appreciate the insight. Great work on the Perishable Press 3G Blacklist also.

3. Louis

Thank you for this fine information, Alissa; very clear explanation and, as Jeff said (it sounds a little “Dr. House”-ish), excellent diagnosis.
