Quarter Life Crisis

The world according to Sven-S. Porst


Tag Clouds


Tag clouds are quite popular, although it's not really clear to me why. Anyhow, I might need one soon and thought I'd look around whether there's a ready-made script to make one. It turned out that there are dozens of scripts which give you HTML output for a tag cloud. Seeing that that's essentially a list with a few CSS classes thrown in, this was the part which I considered a trivial afterthought. It was collecting the data which seemed non-trivial.
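To illustrate why the HTML part looks trivial, here is a minimal sketch in present-day Python 3 (not the script from this post). The function name `render_tag_cloud`, the class names `tagcloud` and `size1` to `size5`, and the linear scaling are all assumptions made for the example.

```python
from html import escape


def render_tag_cloud(counts, levels=5):
    """Return an HTML list in which each word gets a size class from 1 to `levels`."""
    if not counts:
        return '<ul class="tagcloud"></ul>'
    lowest, highest = min(counts.values()), max(counts.values())
    # Avoid a division by zero when every word has the same count.
    spread = max(highest - lowest, 1)
    items = []
    for word in sorted(counts, key=str.lower):
        # Scale the count linearly into the range 1..levels.
        level = 1 + (counts[word] - lowest) * (levels - 1) // spread
        items.append('<li class="size%d">%s</li>' % (level, escape(word)))
    return '<ul class="tagcloud">\n' + "\n".join(items) + "\n</ul>"


print(render_tag_cloud({"python": 9, "css": 1, "html": 5}))
```

The stylesheet would then only need one font size per `sizeN` class. A logarithmic scale might look better for very uneven counts, but that is a matter of taste.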

Then I figured that I should probably just write my own little script – giving that PyObjC monster another go. As a technical detail (callback functions for sorting) was holding back progress there, I dropped the Objective-C part and went with Python only. That lacks the rather convenient keysSortedByValue method NSDictionary provides, but at least it worked and it seemed quite a bit faster as well. After all I wanted a quick result more than a degree in computer science (incidentally the spirit most of today's web is built on I guess…). And after just a bit of fighting I got results more quickly than expected.
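For what it's worth, sorting a dictionary's keys by their values is short in Python once you know about the `key` argument of `sorted`. The following is a sketch in Python 3, not the code from the script; the helper name `keys_sorted_by_value` is made up to mirror the Cocoa method, and the counts are invented sample data.

```python
def keys_sorted_by_value(counts, reverse=True):
    """Return the keys of `counts`, ordered by their values (largest first by default)."""
    return sorted(counts, key=counts.get, reverse=reverse)


counts = {"whale": 12, "ship": 7, "sea": 9}  # invented sample data

print(keys_sorted_by_value(counts))
# → ['whale', 'sea', 'ship']

# Sorting (word, count) pairs alphabetically, ignoring case:
print(sorted(counts.items(), key=lambda pair: pair[0].lower()))
# → [('sea', 9), ('ship', 7), ('whale', 12)]
```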

I have to say that I quite like Python's indentation-rich and bracket-free syntax. For a scripting language it seems surprisingly sane.

Anyway, a quick test run on some classic literature showed that the results are mixed at best because there are just too many 'common' words which you need to filter out. It may be tricky to decide which words are just common and which contribute to the style or character of the text. Doing this should be much simpler in the classical 'tag cloud' environment, where the list of possible tags is given to you at the beginning.
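The filtering step could look roughly like this in Python 3. It is only a sketch: the tiny `STOPWORDS` set, the `min_length` parameter and the ASCII-only regular expression are assumptions for the example, and a real text would need a much longer list of common words.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical list of 'common' words.
STOPWORDS = {"a", "and", "in", "is", "it", "of", "the", "to"}


def count_words(text, stopwords=STOPWORDS, min_length=2):
    """Count the words in `text`, skipping stopwords and very short words."""
    # Lower-case everything and keep runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords and len(w) >= min_length)


sample = "The whale is in the sea and the ship is on the sea"
print(count_words(sample).most_common(1))
# → [('sea', 2)]
```

Deciding what belongs in the stopword list is exactly the tricky part mentioned above; the code only applies whatever list it is given.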

A further little test run on a long post gave this:

At least it does the job, after I removed the embarrassingly frequent words anyway.

Try the script for your amusement. Currently it reads stuff from files you pass to it and dumps the HTML to standard output.
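The overall shape of such a command-line script, reading the files named as arguments and writing HTML to standard output, might look like the following Python 3 sketch. It is not the linked script; the names `count_files` and `main`, the UTF-8 assumption, the limit of 20 words and the bare `<ul>` markup are all made up for illustration.

```python
import sys
from collections import Counter
from html import escape


def count_files(paths):
    """Count whitespace-separated, lower-cased words across all given files."""
    counts = Counter()
    for path in paths:
        # Assumes the files are UTF-8 encoded text.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                counts.update(line.lower().split())
    return counts


def main(argv):
    counts = count_files(argv[1:])
    sys.stdout.write("<ul>\n")
    # Only the 20 most frequent words are printed, with the count as a tooltip.
    for word, n in counts.most_common(20):
        sys.stdout.write('<li title="%d">%s</li>\n' % (n, escape(word)))
    sys.stdout.write("</ul>\n")


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as `cloud.py`, it would be run as `python3 cloud.py post1.txt post2.txt > cloud.html`.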

April 18, 2008, 16:10

Tagged as software.

Comments

Comment by Michael Tsai:

Looks like you haven’t allowed read permission for the script.

April 18, 2008, 19:49

Comment by ssp:

Unfortunately it seems to be something weirder.

Perhaps the server wants to run the script but then fails. Changing the file name to have some other extension fixes the problem. But it’s kind of missing the point…

April 18, 2008, 20:14

Comment by ssp:

OK, it seems to work now. A ‘RemoveHandler cgi-script .py’ was needed in .htaccess. Even though the system is supposed to not even try using stuff outside cgi-bin. Odd.

Plus an ‘AddType text/x-python-script .py’ to make sure the file is displayed nicely in Safari as well.

April 18, 2008, 20:22

Comment by Michael Tsai:

It’s working now. BTW, if you have Python 2.4 or later you can use “key” instead of “cmp” when sorting, e.g. key=lambda x: x[0].lower()

April 18, 2008, 20:35

Comment by Dave2:

I am too afraid to run a tag cloud on my blog, lest I find out that I’m just as ridiculous and irrelevant as I secretly suspect I am.

April 18, 2008, 21:55

Comment by ssp:

@Dave2:
Just don’t filter out the occurrences of ‘is’ and ‘the’ (and in your case probably ‘bitch’) &c and you should be spared unpredictable outcomes…

April 19, 2008, 1:52

Comment by ssp:

@Michael Tsai:
Thanks for the key thing, did that now.

I wasn’t quite sure how to do that from the documentation and had assumed I just needed to write [0].lower(), which didn’t work. So I just copied and pasted the sample code from the manual…

Is the lambda x: thing essentially defining a function?

April 19, 2008, 12:06
