Perishable Press 3G Blacklist and WP Super Cache

I’ve been following the Building the 3G Blacklist series on Perishable Press for the last week or two and have been implementing each of the rules as they were released. For the most part, there have been no problems. I’ve seen a huge increase in 403 errors (Forbidden Access) in my logs, which has been good. Judging from my access.log, all of the requests have been bogus.

After the final list came out, I implemented the changes to the rules, tested the site in my default browser (Safari) and called it good. Several days later, however, I tried to pull up this site on my home PC using Firefox and was greeted with a big fat 403. Uh oh. I switched over to IE and got the same result. After some cursory checking, I switched to my laptop and Safari and found that there was no problem there. Weird. Even weirder because I’m using Verizon DSL with a router, so as far as my server is concerned, both computers have the same IP.

Weirdest of all: when I checked my access.log, I could see my own requests, the ones that had produced 403 errors in the browser. But instead of a 403, the log showed a single request with a 200 status for each time I tried to load a page.

IP - - [30/May/2008:10:20:02 -0700] "GET /about/ HTTP/1.1" 200 363 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7" 
IP - - [30/May/2008:10:20:07 -0700] "GET / HTTP/1.1" 200 364 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7" 

That’s a request for my about page, and then my main page. Both received a “Forbidden” in my browser, but both show a status of OK in my log. Additionally, if the requests had actually been successful, a bunch of other files (stylesheets, images and so on) would have been requested as well.
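For anyone who wants to hunt for the same symptom in their own logs, here is a rough Python sketch that parses lines like the two above and flags 200 responses with a very small body. The field layout assumes Apache's combined log format; the 1000-byte threshold and the names `parse` and `looks_suspicious` are my own assumptions for illustration, not anything from Apache or the 3G Blacklist.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Assumption: a real page on this blog is much larger than this many bytes.
SMALL_BODY = 1000


def parse(line):
    """Return path, status and size for one log line, or None if it doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    size = m.group("size")
    return {
        "path": m.group("path"),
        "status": int(m.group("status")),
        "size": 0 if size == "-" else int(size),
    }


def looks_suspicious(entry):
    """A 200 with a tiny body may really be an error page in disguise."""
    return entry["status"] == 200 and entry["size"] < SMALL_BODY


SAMPLE_LINES = [
    'IP - - [30/May/2008:10:20:02 -0700] "GET /about/ HTTP/1.1" 200 363 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"',
    'IP - - [30/May/2008:10:20:07 -0700] "GET / HTTP/1.1" 200 364 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"',
]

for line in SAMPLE_LINES:
    entry = parse(line)
    if entry is not None and looks_suspicious(entry):
        print("suspicious:", entry["path"], entry["status"], entry["size"])
```

Both sample lines get flagged, since a few hundred bytes is far too small for a full WordPress page. The threshold would need tuning for any other site.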

I quickly decided that it was more important to get the site up and running again rather than spend a bunch of time trying to figure out what the problem was and how to fix it. After some selective commenting in my .htaccess file, I discovered that the culprit rule was the following one from the 3G Blacklist:

RedirectMatch 403 \/\/

I commented out the rule for the time being so that I could test further later.

This particular rule answers with a 403 any request whose URL contains a double slash after the http:// section. I thought it very odd that this rule should break my site, because I can’t see any reason why a legitimate request would need a double slash. I was also concerned because, judging from my access.log, this is the rule that does the bulk of the work when it comes to 403 errors.
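To make the rule's reach concrete, here is a minimal Python sketch that applies the same pattern to a few URL paths. It assumes the pattern is tested against the URL path only, so the `//` in `http://` never comes into play. The helper name `is_blocked` and the sample paths are made up for illustration; this imitates the regex test, not Apache itself.

```python
import re

# Same pattern as in the 3G Blacklist rule: two consecutive slashes.
DOUBLE_SLASH = re.compile(r"\/\/")


def is_blocked(url_path):
    """Return True if the pattern matches anywhere in the URL path.

    Hypothetical helper: it only mimics the regex part of RedirectMatch.
    """
    return DOUBLE_SLASH.search(url_path) is not None


# Illustrative paths, not taken from a real log.
for path in ["/", "/about/", "/about//index.html", "//wp-login.php"]:
    print(path, "->", "403" if is_blocked(path) else "passes")
```

The ordinary permalinks pass, while anything with two slashes in a row, wherever they appear in the path, would be refused.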

I did some more scanning of my .htaccess file and arrived at the conclusion that the culprit must be within the rules for the WP Super Cache plugin I had recently installed. This plugin creates a static HTML page to serve instead of the normal WordPress PHP pages. Here’s an explanation from their site:

When a visitor who is not logged in, or who has not left a comment, visits they will be served a static HTML page out of the supercache subdirectory within the WordPress cache directory. If you navigate to that directory you can view an exact replica of your permalink structure as well as the HTML files within the directories. To determine if a page has been served out of the Super Cache, view the source and the last line on the page should read <!-- super cache --> or <!-- super cache gz -->.

Hmm, I’d say we’re getting closer now. The section that WP Super Cache adds to my .htaccess file looks like this:

# WP SUPER CACHE
<IfModule mod_rewrite.c>
AddDefaultCharset UTF-8
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]

RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]
</IfModule>
# END WPSuperCache

Now, perhaps a mod_rewrite ninja can see immediately what the problem is, but I was having trouble figuring out what was going on. Since I’m on a shared host, I don’t have access to httpd.conf and therefore cannot use the RewriteLog directive to see what’s happening in the rewrites.

After some research I discovered that adding an R flag to each of WP Super Cache’s RewriteRule directives would force a temporary redirect and therefore let me see in the browser what was actually being requested. I changed the two RewriteRule lines to the following:

RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [R,L]
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [R,L]

Now after running the same tests again, I could see in my browser how things were getting screwed up. I typed the address http://blog.nerdstargamer.com into Firefox and sure enough, got the 403 error. This time though, when I looked at the URL it showed the following redirect:

http://blog.nerdstargamer.com/wp-content/cache/supercache/blog.nerdstargamer.com//index.html

There’s the culprit right there. The double-slash right before index.html. So, basically, every time WP Super Cache serves a cached page, it’s serving a URL with a double slash before the file name. I did a quick check by deleting the cache folders of WP Super Cache and confirmed that pages not cached loaded fine while cached pages always got redirected to a 403 error. Bingo.

So, now to fix the problem. Why on earth the .htaccess code for WP Super Cache does this in the first place, I’m not sure. It seems wrong to me, but I’ll defer to the experts on this one. Basically what’s happening is that the variable $1 is replaced with the path that was requested, which includes a trailing slash. The next part of the rewrite starts with a slash, thus the double slash problem.
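The substitution can be imitated in a few lines of Python to see where the extra slash comes from. This is only a sketch under a stated assumption: in .htaccess context Apache matches the rule against the path with its leading slash removed, so a request for /about/ is seen as `about/` and a request for / as an empty string. The `rewrite` function is a made-up stand-in, not how mod_rewrite is actually implemented.

```python
import re


def rewrite(request_path, substitution, host="blog.nerdstargamer.com"):
    """Imitate one WP Super Cache RewriteRule in .htaccess context.

    Assumption: the leading slash is stripped before the pattern is applied.
    """
    relative = request_path[1:] if request_path.startswith("/") else request_path
    captured = re.match(r"^(.*)", relative).group(1)  # this plays the role of $1
    return substitution.replace("%{HTTP_HOST}", host).replace("$1", captured)


# Substitution strings as they appear in the .htaccess rules.
ORIGINAL = "/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html"
WITHOUT_EXTRA_SLASH = "/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html"

for path in ["/", "/about/"]:
    print(path)
    print("  with $1/index.html:", rewrite(path, ORIGINAL))
    print("  with $1index.html: ", rewrite(path, WITHOUT_EXTRA_SLASH))
```

With `$1/index.html`, both the front page and /about/ come out with `//index.html` at the end, which is exactly what the 3G rule refuses. With `$1index.html` the paths are clean, and that form is also what the RewriteCond lines already test with -f.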

I was able to fix the conflict in the WP Super Cache code by removing one of the slashes like so:

# WP SUPER CACHE
<IfModule mod_rewrite.c>
AddDefaultCharset UTF-8
# not post
RewriteCond %{REQUEST_METHOD} !=POST
# not a search
RewriteCond %{QUERY_STRING} !.*s=.*
# not an attachment page
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html.gz [L]

RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*s=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1index.html [L]
</IfModule>
# END WPSuperCache

This certainly looks funny but at least it works. I’m sure there is a more elegant way to do this, like say, rewriting the original request to remove the trailing slash and then applying the cache rules. Perhaps this is really a problem with the way WordPress is doing its permalinks (I’m on 2.5.1 by the way). Who knows? Ninjas chime in.


Comments

1. Jeff Starr

Hi Alissa :) This is certainly one way to resolve the conflict (and an excellent diagnosis, btw). Given that you are working with three different components (WordPress, Super Cache, and the 3G Blacklist), I would say that your solution is practical and effective. Getting all three components to play nicely with the removal of only two characters is about as efficient as it gets, imo. As you say, a more formal solution would involve removing the trailing slash from the permalinks before they are processed by Super Cache, however the result would be essentially the same: omission of the duplicate trailing slash. In either case, you are correct that duplicate slashes have no business in legitimate URLs. Great work! :)

2. Alissa Miller

Thanks Jeff, I appreciate the insight. Great work on the Perishable Press 3G Blacklist also.

3. Louis

Thank you for this fine information Alissa; very clear explanation, and as Jeff said — and it sounds a little “Dr. House”-ish — excellent diagnosis.

4. Alex

Hi,

I am having the very same problem! I’ll be trying your fix! Thanks for sorting this pesky issue out.

Alex

BTW Where’s your about page??

5. Alissa Miller

@Alex,

Great to hear. I hope this fixes your problem too!

Ah, my about page. I set up some Directly Cached Files in WP Super Cache and it looks like it didn’t work quite right. Thanks for the catch, I’ve fixed it now.
