Following Tobias’ comment on my recent Zeitgeist results, which showed that my RSS file is by far the most requested and most bandwidth-consuming file on our server, I tried to figure out how I could send those feeds in compressed form as well.
I have changed things now in a way that the feed should be sent in compressed form if the requesting application supports that. While I was at it I also replaced my completely unstyled RSS feed by the mildly styled version I made recently. None of these changes are supposed to cause trouble to the users and I didn’t see any in the tests I could do. But be sure to drop me a line if things go wrong on your end (i.e. if the feed is garbled, if you’re seeing server errors or if the server doesn’t give a 304 code when appropriate).
There are several ways of sending compressed files. One would be to gzip the file and use Apache’s content negotiation to send it instead of the original file if the browser indicates it can handle gzipped content. But that’s a bit cumbersome as you’ll end up having another copy of every file and you’ll need to make sure these stay in sync. To make that work you’ll have to do more clever scripting than I’m capable of and possibly need a more powerful account on your web server than our hosting company gives us anyway.
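The "keep the copies in sync" part on its own can be sketched in a few lines of PHP. This is only a rough sketch under assumptions: the function name `refreshGzipCopy` is made up for illustration, it needs PHP 7 or later with the zlib extension, and it does not cover the content negotiation setup in Apache at all.

```php
<?php
// Hypothetical helper: (re)create "$path.gz" when it is missing or older
// than the original file. Returns true if a new compressed copy was written.
function refreshGzipCopy(string $path): bool
{
    clearstatcache(); // make sure the modification times are fresh
    if (!is_file($path)) {
        return false; // nothing to compress
    }
    $gzPath = $path . '.gz';
    if (is_file($gzPath) && filemtime($gzPath) >= filemtime($path)) {
        return false; // the compressed copy is still current
    }
    $data = file_get_contents($path);
    if ($data === false) {
        return false; // original could not be read
    }
    // gzencode() produces gzip-format data; 9 is the strongest compression level.
    return file_put_contents($gzPath, gzencode($data, 9)) !== false;
}
```

Something like this would have to run whenever a file changes (or regularly, for every file you want to serve compressed), which is exactly the cumbersome bit.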
Then there seems to be the mod_gzip Apache module which can do the magic for you. The only problem with that seems to be that most hosting companies don’t have it installed. The saviour is PHP – included in most hosting contracts and with a neat option to simply compress all PHP pages you have. If you want to do that for a PHP file, all you need to do is add
ob_start("ob_gzhandler");
at its start. That’s all you need for compression. Cool.
Once you start doing that a number of additional questions start coming up, regarding the topics of caching and file types. It seems like PHP pages always have the current date as their modification date by default. That probably means that you’ll miss loads of opportunities to avoid reloads and save bandwidth. I’m not an expert on these things, but it looks like having the following at the top of your php file does the trick for that problem:
<?php
ob_start("ob_gzhandler");
$mod_gmt = gmdate("D, d M Y H:i:s", getlastmod()) . " GMT";
header("Last-Modified: " . $mod_gmt);
?>
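Sending a Last-Modified header is only half of the caching story. As far as I understand, a PHP page also has to answer the client's If-Modified-Since header itself if it wants to reply with a 304. The following is a minimal sketch of that idea, not something taken from the setup described here: the function names `isNotModified` and `sendLastModifiedOr304` are made up for illustration, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: true when the client's cached copy (as announced in
// its If-Modified-Since header) is at least as new as the file on the server.
function isNotModified(int $lastModified, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // the client did not announce a cached copy
    }
    $clientTime = strtotime($ifModifiedSince);
    return $clientTime !== false && $clientTime >= $lastModified;
}

// Hypothetical helper: send Last-Modified and stop with a 304 if possible.
function sendLastModifiedOr304(): void
{
    $lastModified = getlastmod();
    if ($lastModified === false) {
        return; // modification time unknown, just send the page normally
    }
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    if (isNotModified($lastModified, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) {
        http_response_code(304); // the client can reuse its cached copy
        exit;
    }
}
```

Calling `sendLastModifiedOr304();` at the very top of the page, before `ob_start("ob_gzhandler");`, would be the intended use.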
And that was the easy part. It only involved looking up PHP-related things. So let me use the opportunity to insert a little rant: While software like PHP or Apache (well, I guess you could add many other open source products to this) may be all right and get the job done, their documentation seems to be abysmal. Many people are using these tools. And I’m sure they’d like to look up how things work from time to time. But whenever I tried to look anything up, the top hits on Google were never helpful. And after navigating my way through the ‘proper’ help, that help wasn’t all that helpful after all. It is long on technical details, short on practical examples, and comes with possibly helpful but completely random user comments attached. Add to that that the programs’ versions can differ in both big and small details and you’re easily confused. – Just like the wonderful world of man pages…
Anyway, my aim still was to get compression working for all files on our server. A situation that involves the following components:
| Component | State / Quality |
| Apache | most likely good |
| PHP documentation | Huh? + User comments |
| ssp | confused to annoyed |
So I wasn’t off to a perfect start. In theory, two little steps should be needed to make the magic work. First, tell the web server to pipe all of your files through PHP. Second, tell PHP to compress everything that comes along. Oh, and third, realise that those steps aren’t all that simple.
And that third step is the frustrating one. In theory, there seems to be a very elegant way to just prepend some PHP commands to ‘any’ file the server sends, letting you enable compression for many files without having to change them. All you should need to do is create a file with the necessary PHP commands and add
php_value auto_prepend_file compressor.php
to your .htaccess file. Unfortunately that technique doesn’t seem to work on all servers, and as it can be hard to get hold of error messages on shared hosting (there’s nothing in error.log anyway), it’s hard to tell what goes wrong. From other people’s comments it seems that this problem isn’t uncommon and may have something to do with the server configuration.
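For illustration, the prepended file could look roughly like the sketch below. It is an assumption about what compressor.php might contain, not a tested recipe for any particular host: the function name `startCompression` is made up, and the checks are there because `ob_gzhandler` needs the zlib extension and cannot be combined with the `zlib.output_compression` setting.

```php
<?php
// compressor.php (hypothetical contents): turn on gzip output buffering
// for whatever page this file gets prepended to.
function startCompression(): bool
{
    // ob_gzhandler comes with the zlib extension and conflicts with
    // zlib.output_compression, so do nothing in either of those cases.
    if (!extension_loaded('zlib') || ini_get('zlib.output_compression')) {
        return false;
    }
    // ob_gzhandler looks at the client's Accept-Encoding header itself and
    // passes the output through unchanged if compression is not accepted.
    return ob_start('ob_gzhandler');
}

startCompression();
```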
And even if that technique worked, you’d still be in trouble over the file types you’re using. As far as I can tell, Apache determines the MIME type of the files it sends from your explicit settings, based on the ‘extension’ of the file’s name. For people like myself who don’t like having ‘.html’ at the end of their file names, that means having to set a default. And that default needs to be something like
DefaultType x-httpd-php
in your .htaccess file, so the files are processed by PHP. However, by doing this, you lose all the information on the file’s type and PHP will just send its default, ‘text/html’, as the MIME type. Usually that works well, but as my specific aim was to compress RSS feeds, it’s no good: they are XML and should be sent as ‘text/xml’, as any validator will tell you. This is an extra complication to keep in mind.
To enable compression for the RSS files, I had to add PHP to them. Unfortunately, PHP doesn’t like XML. Apparently that’s because the XML declaration at the top of the file also starts with ‘<?’, which PHP (at least with short open tags enabled) takes for the start of PHP code, and it just pukes when it sees that. Fortunately, using PHP in XML isn’t that rare in the days of XHTML, and a bit of looking around revealed a discussion of the problem which included a solution: make PHP write the ‘<?xml …’ declaration into the file. While that seems a bit ‘dirty’ to me, it does work.
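One way to make the workaround a little less ‘dirty’ is to build the declaration from pieces, so that the characters ‘<?’ never appear literally outside the PHP tags. This is just a sketch: the helper name `xmlDeclaration` is made up for illustration, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: return the XML declaration as a string, assembled
// from pieces so the file never contains a literal XML opening tag that
// PHP could mistake for its own.
function xmlDeclaration(string $encoding = 'iso-8859-1'): string
{
    return '<' . '?xml version="1.0" encoding="' . $encoding . '"?' . '>' . "\n";
}
```

The feed would then start with `echo xmlDeclaration();` inside its PHP block, followed by the usual RSS markup after the closing PHP tag.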
So after all those steps, I have enabled compression, corrected the modification date in the HTTP header, worked around the ‘<?’ problem and manually corrected the MIME type, and ended up with a header for my RSS files looking like this:
<?php
ob_start("ob_gzhandler");
echo "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n";
$mod_gmt = gmdate("D, d M Y H:i:s", getlastmod()) . " GMT";
header("Last-Modified: " . $mod_gmt);
header("Content-type: text/xml; charset=iso-8859-1");
?>
Quite a lot of effort for what is achieved. But at least it looks like it works.