486 words on Software
Tag clouds are quite popular, although it's not really clear to me why. Anyhow, I might need one soon and thought I'd look around to see whether there's a ready-made script to make one. It turned out that there are dozens of scripts which give you HTML output for a tag cloud. Seeing that that's essentially a list with a few CSS classes thrown in, this was the part which I considered a trivial afterthought. It was collecting the data which seemed non-trivial.
Then I figured that I should probably just write my own little script – giving that PyObjC monster another go. As a technical detail (callback functions for sorting) was holding back progress there, I dropped the Objective-C part and went with Python only. That lacks the rather convenient
NSDictionary provides, but at least it worked and it seemed quite a bit faster as well. After all I wanted a quick result more than a degree in computer science (incidentally the spirit most of today's web is built on I guess…). And after just a bit of fighting I got results more quickly than expected.
I have to say that I quite like Python's indentation-rich and bracket-free syntax. For a scripting language it seems surprisingly sane.
Anyway, a quick test run on some classic literature showed that the results are mixed at best, because there are just too many 'common' words which you need to filter out. It may be tricky to decide which words are just common and which contribute to the style or character of the text. Doing this should be much simpler in the classical 'tag cloud' environment, where the list of possible tags is given to you at the beginning.
A further little test run on a long post gave this:
At least it does the job – after I removed the embarrassingly frequent words anyway.
Try the script for your amusement. Currently it reads stuff from files you pass to it and dumps the HTML to standard output.
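For readers who just want the general idea, here is a minimal sketch of such a script. It is not the script linked above: it assumes Python 3, and the stop-word list, the function names and the CSS class names (`tagcloud`, `tag1` to `tag5`) are all made up for illustration. It counts words in the files passed on the command line, drops the common ones, and prints an HTML list to standard output.

```python
import re
import sys
from collections import Counter
from html import escape

# Hypothetical stop-word list; a useful one would be much longer.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "it", "that"}


def count_words(text, stop_words=STOP_WORDS):
    """Count lowercased words in text, skipping the stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stop_words)


def tag_cloud_html(counts, max_tags=30, levels=5):
    """Render the most frequent words as a list with size classes."""
    top = counts.most_common(max_tags)
    if not top:
        return '<ul class="tagcloud"></ul>'
    highest = top[0][1]
    items = []
    # Alphabetical order, as tag clouds usually show it.
    for word, n in sorted(top, key=lambda pair: pair[0]):
        # Map the count onto classes tag1 (rare) .. tag<levels> (frequent).
        level = 1 + (n * (levels - 1)) // highest
        items.append('<li class="tag%d">%s</li>' % (level, escape(word)))
    return '<ul class="tagcloud">\n' + "\n".join(items) + "\n</ul>"


def main(paths):
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            counts.update(count_words(f.read()))
    print(tag_cloud_html(counts))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choosing, it would be run with the text files as arguments and the output redirected into an HTML file. The hard part remains the stop-word list, not the code.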
Looks like you haven’t allowed read permission for the script.
Unfortunately it seems to be something weirder.
Perhaps the server wants to run the script but then fails. Changing the file name to have some other extension fixes the problem. But it’s kind of missing the point…
OK, it seems to work now. A ‘RemoveHandler cgi-script .py’ was needed in .htaccess. Even though the system is supposed to not even try using stuff outside cgi-bin. Odd.
Plus an ‘AddType text/x-python-script .py’ to make sure the file is displayed nicely in Safari as well.
It’s working now. BTW, if you have Python 2.4 or later you can use “key” instead of “cmp” when sorting, e.g. key=lambda x: x.lower()
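A minimal sketch of the difference, assuming Python 3 (where the `cmp` argument is gone and `functools.cmp_to_key` stands in for it). The word list and the `compare_ignoring_case` helper are invented for illustration.

```python
from functools import cmp_to_key

words = ["banana", "Apple", "cherry", "apple"]


def compare_ignoring_case(a, b):
    """Old cmp-style comparison: negative, zero or positive."""
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)


# Comparison-function style, wrapped so that Python 3 accepts it.
cmp_sorted = sorted(words, key=cmp_to_key(compare_ignoring_case))

# Key-function style: compute one sort key per item.
key_sorted = sorted(words, key=lambda x: x.lower())

# The unbound method works as a key too; note there are no parentheses.
method_sorted = sorted(words, key=str.lower)

print(key_sorted)
```

All three calls give the same order here. The key version is usually shorter, and it calls the function once per item instead of once per comparison.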
I am too afraid to run a tag cloud on my blog, lest I find out that I’m just as ridiculous and irrelevant as I secretly suspect I am.
Just don’t filter out the occurrences of ‘is’ and ‘the’ (and in your case probably ‘bitch’) &c and you should be spared unpredictable outcomes…
Thanks for the key thing, did that now.
I wasn’t quite sure how to do that from the documentation and had assumed I just needed to write .lower(), which didn’t work. So I just copied and pasted the sample code from the manual…
Is the lambda x: thing essentially defining a function?
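For what it's worth, a lambda expression does create a small anonymous function. A minimal sketch of the equivalence, assuming Python 3 and using made-up names:

```python
# A lambda expression builds a function object without a def statement.
lower_lambda = lambda x: x.lower()


# The same function written out with def.
def lower_def(x):
    return x.lower()


# Both are callables and behave identically.
print(lower_lambda("Tag"), lower_def("Tag"))
print(callable(lower_lambda), callable(lower_def))
```

That is also why a bare `.lower()` does not work as a key: `sorted` needs something it can call on each item, not the result of a call.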