1375 words on Software
It’s just a sad fact that computers and I don’t go together too well. Whenever I touch the machines, things start breaking. Whenever I see the possibility of using them for something vaguely interesting, I immediately hit a wall of bad bugs, poor documentation and the time-killing idiocy that results from them. As a consequence, I hate software.
Today I prepared a little tweak for our web site, which gave me at least two more reasons for hating software. Until now I quite loved PHP, because it’s conceptually a cool thing to be able to build your web pages from building blocks you can share throughout the site. It’s relatively hassle-free and can be used for all sorts of handy things – like keeping the last-changed dates of your pages correct, compressing your web pages for download or making sure the current version numbers of your applications are listed on the download pages without manual editing. That’s cool. And it’s so neat to just drop those few extra lines right into the HTML without needing any separate scripts.
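As a rough illustration of the "last-changed date" trick: in PHP this would typically be a couple of lines built around filemtime() and date(). The sketch below does the same job in Python, purely as a stand-in. The helper name last_changed_line and the year-month-day format are hypothetical choices made for illustration.

```python
import os
from datetime import datetime, timezone


def last_changed_line(path: str) -> str:
    """Build a 'Last changed' footer line from a file's modification time.

    Hypothetical helper: a Python stand-in for a PHP include that would
    use filemtime() and date() for the same job.
    """
    mtime = os.path.getmtime(path)  # seconds since the Unix epoch
    # Use UTC so the result does not depend on the server's time zone.
    changed = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return f"Last changed: {changed:%Y-%m-%d}"
```

The date comes from the file itself, so it stays correct without anyone having to edit it by hand.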
This enthusiasm – of course – reveals that I never wrote more than three lines of PHP. And even those I copied from somewhere on the web. Trying to do my own thing, I – eek! – had to look at the documentation and figure out how to accomplish my goals. And that wasn’t pretty. It seems that PHP consists of three gazillion randomly named functions. In the php.net documentation, subjects are indexed alphabetically rather than sensibly, and some of those functions actually require plugins to be installed. Just by browsing the documentation, I also got the impression that they make a point of changing the way each function works at least once. Arrgh. I also wonder whether somebody has tried to make a web page that lists all the possible ways to do a search and replace in PHP. But I digress.
The next evildoer is the mighty Apache. I had been scared and scarred by its documentation before (you have to picture little movies of geeks running up and down some office, reiterating lines like ‘the name says it all: it’s for documenting, not for being helpful’), so I wasn’t really shocked. And my little project to investigate Apache’s content negotiation and MultiViews feature for an updated version of our web site started quite well. (The basic idea here is that instead of using a single page for both the German and the English text, we may in future have a page for each language, with URLs remaining mostly intact and everybody ending up at the desired page.)
Things looked quite good at first. But only at first. Then things started getting a bit absurd. While I don’t fully understand what’s going on (having something like a usable debugger for Apache would be sweet, I guess), I’ll try to document the hassles here. To be fair, the Apache page on content negotiation lists most of the things you have to do and the problems you will run into, so once you’ve dug through that you at least know what you are facing.
Essentially you start off by adding Options +MultiViews to the appropriate .htaccess file, and you are ready to go. Then, rather than giving your HTML files names ending in .html, you add another extension specifying the language. So we might have test.html.de and test.html.en, for example. Interestingly, you can also reverse the order of the extensions and use names like test.en.html and test.de.html. That may be preferable if your computer’s dumb-ass indexing system decides whether or not to index a file based on the end of its file name. But files named that way behave subtly worse than those with their extensions the other way round, which forces you to go for the first option anyway. But that’s well documented at least.
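To make the naming convention concrete, here is a small Python model of how a request for test.html can be mapped onto files such as test.html.de and test.html.en. This is a sketch, not Apache’s implementation. The extension-to-language table is hard-coded here, whereas Apache takes it from AddLanguage directives. The names LANGUAGE_EXTENSIONS and find_language_variants are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Hypothetical extension-to-language table (Apache gets this from AddLanguage).
LANGUAGE_EXTENSIONS = {"de": "de", "en": "en", "fr": "fr"}


def find_language_variants(folder: str, requested: str) -> Dict[str, str]:
    """Return {language: filename} for files named '<requested>.<ext>' in folder."""
    variants: Dict[str, str] = {}
    prefix = requested + "."
    for entry in Path(folder).iterdir():
        if not (entry.is_file() and entry.name.startswith(prefix)):
            continue  # not a variant of the requested name, e.g. test.en.html
        ext = entry.name[len(prefix):]
        language = LANGUAGE_EXTENSIONS.get(ext)
        if language is not None:
            variants[language] = entry.name
    return variants
```

In this model, a folder holding test.html.de, test.html.en and test.en.html yields only the first two for a request of test.html.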
Next come the finer points of content negotiation. You start getting problems once you actually try things out. The simplest test for me is running Safari set to German, where ‘set to German’ means that German is the topmost language in the list of languages in the International pane of System Preferences. Simple as that. And it works rather well. I can enter the URL ending in test.html and I get the file stored as test.html.de, just as it should be. The same goes for English. Next up was French. Now, there is no French file. Big question: what happens?
‘It depends’ is the big answer. Using Safari, for example, you will get a page with the rather unfriendly 406 Not Acceptable error. But it tells you that it couldn’t match your choice of language and then gives you links to the two languages it can offer. Of course that’s not exactly convenient. But it also highlights a possibly non-optimal fact about Safari: it only sends your current system language to the server. And if the server can’t match it, you are lost.
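The Safari case boils down to a lookup with a single key. If the one language sent is not among the variants, there is nothing to fall back on. Here is a minimal Python sketch of that situation. It assumes the variants are already known as a language-to-filename mapping. negotiate_single is a hypothetical name, and Apache builds its real 406 page differently.

```python
from typing import Dict, List, Optional, Tuple


def negotiate_single(language: str, variants: Dict[str, str]) -> Tuple[int, Optional[str], List[str]]:
    """Serve the variant for one requested language, or report 406 with alternatives.

    Hypothetical model of a client that sends only a single language.
    Returns (HTTP status, chosen filename or None, filenames to offer as links).
    """
    chosen = variants.get(language.lower())
    if chosen is not None:
        return 200, chosen, []
    # No match: like the 406 page, list every variant the server could offer instead.
    return 406, None, sorted(variants.values())


variants = {"de": "test.html.de", "en": "test.html.en"}
print(negotiate_single("fr", variants))  # (406, None, ['test.html.de', 'test.html.en'])
```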
Historically, browsers offer a list of languages to the server, and most of them still do so today. So even when you are using a French browser, you may specify that you also understand English and a bit of German. When negotiating the language with the server, this list will then be used, more or less reasonably, to give you a page in the language you know best – English in this example. But Safari doesn’t do that. It only sends a single language to the server, rather than making use of the whole ordered list of languages in the International preference pane. So the actual result you get at this stage depends on your browser and your settings.
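What the other browsers send is an Accept-Language header carrying the ordered list, optionally weighted with q values (for example fr, en;q=0.8, de;q=0.5). The Python sketch below parses such a header and picks the first language the server actually has. It simplifies things in several ways. It only does exact tag matches and ignores the * wildcard. parse_accept_language and choose_variant are hypothetical helpers, not Apache’s algorithm.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'fr, en;q=0.8, de;q=0.5' into [(tag, q), ...], best first."""
    ranges: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # a missing q value means full preference
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as 'not acceptable'
        ranges.append((tag, q))
    # sort() is stable, so equally weighted languages keep the header's order.
    ranges.sort(key=lambda item: -item[1])
    return ranges


def choose_variant(header: str, variants: Dict[str, str]) -> Optional[str]:
    """Return the file for the best acceptable language on offer, or None (i.e. 406)."""
    for tag, q in parse_accept_language(header):
        if q > 0 and tag in variants:
            return variants[tag]
    return None


variants = {"de": "test.html.de", "en": "test.html.en"}
print(choose_variant("fr, en;q=0.8, de;q=0.5", variants))  # test.html.en
print(choose_variant("fr", variants))                      # None
```

With the whole list available, the French reader who also understands English gets the English page instead of an error.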
But we’re not done yet. Indeed, the Apache people had thought of this as well, and they let you provide an extra file test.html.html to cover the cases where no good language match is found. Yup, it’s a silly filename. In fact, I wanted that file to be just the same as the English version of the web page. As I can’t create aliases or symlinks on our server, I decided to cram a bit of PHP into that file to include the English file. Problem solved?
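The idea behind the extra test.html.html file can be modelled as a last resort after the language lookup fails. This Python sketch assumes the client’s languages are already in preference order. It treats the language-less file purely as a fallback. That is a simplification of the intent described here, not a statement of how Apache ranks such a variant internally. choose_with_fallback is a hypothetical name.

```python
from typing import Dict, List, Optional


def choose_with_fallback(preferred: List[str], variants: Dict[str, str],
                         fallback: Optional[str]) -> Optional[str]:
    """Try the client's languages in order; otherwise serve the language-less file.

    Hypothetical model: 'fallback' plays the role of test.html.html.
    A None result corresponds to 406 Not Acceptable.
    """
    for language in preferred:
        chosen = variants.get(language.lower())
        if chosen is not None:
            return chosen
    return fallback
```

With French preferred and variants for German and English only, this returns test.html.html rather than a 406 page, which is what the extra file was meant to achieve.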
Not at all. In fact, this was the point where things started getting really mysterious. So I am in this situation: a folder containing three files.
With this setup, point a browser at index.html. Take a guess at what you will see with your preferred language set to, say, German, English and French. Hm? Exactly – you will get the English page in each case†. Needless to say, this was extremely puzzling, and we had to have a rather extended chat before figuring out what was going on. I’ll leave this as a little quiz for you before giving the answer tomorrow.
† In the default setup this applies at least to Safari, but not to Firefox, which is another indication that Safari’s language handling is a bit lame. Safari seems to send de-de as its language code, which stands for German German (as opposed to Swiss German, say). But if the client requests de-de when the server only has general German, de, the server cannot satisfy the request perfectly. This leads to behaviour that differs from Firefox, which starts off by listing both de-de and de as acceptable languages (although this isn’t 100% clever either).
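The de-de versus de mismatch follows from the matching rule in HTTP/1.1 (RFC 2616, section 14.4). There, a language range matches a tag if it equals the tag, or if it is a prefix of the tag followed by a ‘-’. So de matches de-de, but de-de does not match de. The Python sketch below applies that rule to the two browser behaviours described in this footnote. range_matches and first_match are hypothetical helpers, and the * wildcard is left out. Servers such as Apache may also apply additional rules of their own on top of the plain rule.

```python
from typing import List, Optional


def range_matches(language_range: str, language_tag: str) -> bool:
    """RFC 2616 §14.4 rule (without '*'): exact match, or a prefix ending at a '-'."""
    r, t = language_range.lower(), language_tag.lower()
    return t == r or t.startswith(r + "-")


def first_match(accepted: List[str], available: List[str]) -> Optional[str]:
    """Return the first available tag matched by the client's ranges, in the client's order."""
    for language_range in accepted:
        for tag in available:
            if range_matches(language_range, tag):
                return tag
    return None


available = ["de", "en"]  # the server only has general German and English
print(first_match(["de-de"], available))        # None: Safari-style request, no match
print(first_match(["de-de", "de"], available))  # de: Firefox-style request
```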
In fact, while I think it’s all sweet to list those regional languages, and I can see why it is important to distinguish Simplified and Traditional Chinese, it appears to me that these distinctions don’t make a big difference around here. How many sites have you actually seen that care whether your language is de-de rather than de-at, or en-us rather than en-gb? In practice, I doubt it makes a significant difference. Still, Safari sending just de-de rather than both de-de and de looks like a bad idea.