Now serving dynamic and static pages

I was initially against using Blosxom to generate static pages, as it would have meant duplicating information on my hard drive. Why should I have countless HTML and RSS files in addition to the TXT versions of my weblog entries? I then thought that it would be nice to generate static pages which played nicely with HTTP HEAD and the If-Modified-Since header. I still didn’t want the space overhead when, chances are, people were only interested in conditional downloads of the HTML and RSS pages in the category directories. Why not serve both static and dynamic pages?

I have wiped out my older rewrite rules in my Apache configuration for this weblog in favour of the following monster rule:

RewriteRule ^(([a-zA-Z0-9]+/)*[0-9]+(/[0-9]+)*/?(index\..*)?$) blosxom.cgi/$1 [L]

The first part matches archive entries by year, month and day in every category, possibly followed by an index page. (True, someone could post a URL which repeats a string of slash-separated numbers and it would still be processed, but it would have done so under the old rules as well. Maybe I will change that to force a “YYYY/MM/DD” format to the archives at a later date.) The “blosxom.cgi/$1” dictates that anyone submitting a request matching the first expression is served a dynamic page, while the “[L]” portion tells the rewrite engine not to process any more rewrite rules.
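A quick way to see what that pattern accepts is to run it through Python's re module. This is only a sketch: Python's regex engine is not the one Apache uses (although this pattern uses nothing exotic), the helper name rewrite is made up for illustration, and the paths are assumed to arrive without a leading slash, as they do when the rule is applied per directory.

```python
import re

# Pattern copied from the rule above.
ARCHIVE = re.compile(r"^(([a-zA-Z0-9]+/)*[0-9]+(/[0-9]+)*/?(index\..*)?$)")

def rewrite(path):
    """Return the dynamic target for a matching path, or None if the rule does not apply.

    Hypothetical helper: it only mimics the "blosxom.cgi/$1" substitution.
    """
    match = ARCHIVE.match(path)
    if match is None:
        return None
    return "blosxom.cgi/" + match.group(1)

if __name__ == "__main__":
    samples = [
        "2003/05/12/",                    # date archive at the top level
        "tech/apache/2003/05/index.rss",  # date archive inside a category
        "tech/apache/index.html",         # category index, left alone
        "1/2/3/4/5/6",                    # the over-long number string mentioned above
    ]
    for sample in samples:
        print(sample, "->", rewrite(sample))
```

The last sample shows the weakness mentioned above: any run of slash-separated numbers is treated as a date archive.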

The above rule means that anyone requesting category pages not archived by date will receive a statically generated page. (I now have a script which will statically generate any modified pages and then remove any pages archived by date in order to save space.) Anyone requesting archives by date will receive a dynamic page.
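The clean-up half of such a script could look something like the following Python sketch. It is not the script mentioned above, just an illustration under stated assumptions: the function name prune_date_archives is invented, the static pages are assumed to live under a single directory, and any directory whose name is all digits is assumed to be a date archive.

```python
import os
import shutil

def prune_date_archives(static_root):
    """Delete every all-numeric directory under static_root.

    Hypothetical helper: returns the list of removed paths so that
    the caller can log what was deleted.
    """
    removed = []
    for dirpath, dirnames, _filenames in os.walk(static_root):
        for name in list(dirnames):
            if name.isdigit():
                full_path = os.path.join(dirpath, name)
                shutil.rmtree(full_path)
                # Drop the entry so os.walk does not descend into it.
                dirnames.remove(name)
                removed.append(full_path)
    return removed
```

Category index pages are untouched because their directories have alphabetic names; only the year directories (and everything beneath them) are removed.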

I’m certain the above rule can be further optimised, but it’s good enough for now. If you use it and optimise it, please let me know.

Update: Some Apache mod_rewrite hacking has resulted in the following new set of rules:

RewriteRule ^(([a-zA-Z]+(/[a-zA-Z]+)*/)?(index\.(html|rss))?$) - [L]

RewriteRule ^(([a-zA-Z]+/)*200[2-9](/[0-1]?[0-9])?(/[0-3]?[0-9])?(/(index\.[a-zA-Z0-9]*)?)?$) blosxom.cgi/$1 [L]

RewriteRule ^blosxom\.cgi/(.*)$ - [L]

RewriteRule ^(([a-zA-Z]+(/[a-zA-Z]+)*/)?[a-zA-Z0-9\-_\+]+\.html$) blosxom.cgi/$1 [L]

The first rule allows HTML and RSS index pages for category URLs to escape rewriting and display the static page. The second rule is the same as my original rule above in that it serves dynamic pages for posts by date, but it also limits the valid values for the year, month and day. You can still enter an invalid date; it’s just a little more difficult. The third rule exists because Apache sometimes failed to stop the rewrite process after the previous rule. (If the rules live in a per-directory context such as .htaccess, this is likely because “[L]” only ends the current pass and the rewritten URL is then run through the rules again.) Finally, the last rule serves up dynamic single-entry permalinks.
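To check how the four rules interact, the chain can be imitated in Python. Again this is a rough sketch rather than a model of mod_rewrite itself: the patterns are transcribed from the rules above with literal dots escaped, the name route is invented, each rule is assumed to end processing when it matches, and paths are assumed to have no leading slash.

```python
import re

# (pattern, prefix) pairs in rule order; a prefix of None stands for the
# "-" substitution, meaning the request passes through unchanged.
RULES = [
    (re.compile(r"^(([a-zA-Z]+(/[a-zA-Z]+)*/)?(index\.(html|rss))?$)"), None),
    (re.compile(r"^(([a-zA-Z]+/)*200[2-9](/[0-1]?[0-9])?(/[0-3]?[0-9])?"
                r"(/(index\.[a-zA-Z0-9]*)?)?$)"), "blosxom.cgi/"),
    (re.compile(r"^blosxom\.cgi/(.*)$"), None),
    (re.compile(r"^(([a-zA-Z]+(/[a-zA-Z]+)*/)?[a-zA-Z0-9\-_\+]+\.html$)"), "blosxom.cgi/"),
]

def route(path):
    """Return the path the first matching rule would produce.

    Hypothetical helper: an unchanged path means the static file is
    served; a "blosxom.cgi/" prefix means a dynamic page.
    """
    for pattern, prefix in RULES:
        match = pattern.match(path)
        if match:
            return path if prefix is None else prefix + match.group(1)
    return path  # no rule matched, so nothing is rewritten

if __name__ == "__main__":
    for sample in ["tech/apache/index.rss", "tech/2003/05/12/",
                   "tech/apache/my-first_post.html", "2003/13"]:
        print(sample, "->", route(sample))
```

The last sample, “2003/13”, is still rewritten, which illustrates that an invalid month can get through the second rule.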

You might notice that I have placed limitations on valid category names, post names and flavour names. It’s doubtful that I will have a category which contains numeric characters, a post name which contains anything other than alphanumeric characters, hyphens, underscores and plus signs, or a flavour which contains characters other than alphanumeric.

Once again, if you use any of these rules and change them, I’d like to hear about the changes you make.