Quarter Life Crisis

The world according to Sven-S. Porst


Inspired by Mike’s 2005 Zeitgeist, I got to play a bit with the various web analysis tools and want to list some results here. Let me note that I don’t particularly trust those tools as each of them gives different results, say for November 2005:

            requests    pages   bandwidth
analog        532652    42817   ?
awstats       373957   177690   5.03 GB
webalizer     766927   282976   7.05 GB

So I don’t trust those numbers too much. And I’m not sure how much they mean or what the term ‘pages’ is supposed to mean. Obviously there are different interpretations at work, as well as different skills at counting. Most of my images are hosted off-site, so I don’t know whether the ‘302’ requests are included in those numbers. In addition, I set up analog to ignore all traffic going to cgi-bin to avoid having all the comment spammers clutter the log analysis. But that was ages ago, and setting up analog was too painful for me to go through it again and fix this ‘just for correctness’.
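To illustrate how much room for interpretation there is, here is a minimal sketch of such a counter. It is not how analog, awstats or webalizer actually work. The log layout (Common/Combined Log Format), the `PAGE_EXTENSIONS` definition of a ‘page’, the `skip_prefixes` parameter and the sample lines are all assumptions made up for the example.

```python
import re
from collections import Counter

# Assumed layout: Common/Combined Log Format, e.g.
# host - - [date] "METHOD /path HTTP/1.1" status size ...
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical definition of a 'page': no extension, or one of these.
# Each tool draws this line differently, hence the different counts.
PAGE_EXTENSIONS = {"", ".html", ".htm", ".php"}


def extension(path):
    """Return the lowercase file extension of a URL path, or '' if none."""
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    return "." + name.rsplit(".", 1)[1].lower() if "." in name else ""


def summarise(lines, skip_prefixes=()):
    """Count requests, 'pages' and bytes, ignoring paths under skip_prefixes."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not a parseable log line
        path = m["path"]
        if path.startswith(tuple(skip_prefixes)):
            continue  # e.g. leave out everything going to cgi-bin
        totals["requests"] += 1
        if extension(path) in PAGE_EXTENSIONS:
            totals["pages"] += 1
        if m["size"] != "-":
            totals["bytes"] += int(m["size"])
    return totals


# Made-up sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [01/Nov/2005:00:00:01 +0100] "GET /index.rdf HTTP/1.1" 200 12000',
    '192.0.2.2 - - [01/Nov/2005:00:00:02 +0100] "GET /ssp/blog HTTP/1.1" 200 70000',
    '192.0.2.3 - - [01/Nov/2005:00:00:03 +0100] "POST /cgi-bin/mt-comments.cgi HTTP/1.1" 200 3500',
    '192.0.2.4 - - [01/Nov/2005:00:00:04 +0100] "GET /qlc.css HTTP/1.1" 304 -',
]

print(summarise(sample))
print(summarise(sample, skip_prefixes=("/cgi-bin/",)))
```

Changing `PAGE_EXTENSIONS` or `skip_prefixes` changes the totals for the very same log, which is presumably part of why the three tools disagree.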

Unlike Mike, I can’t present consistent numbers for the whole year as we lived through a few generous cock-ups by our previous provider at the beginning of the year and Alf thankfully helped out hosting things for a while on his server. So the log-files we have aren’t exactly comprehensive at that time of the year. In what follows, I’ll try to pinpoint the mildly amusing things I could find in the statistics, rather than the big, meaningful ones.

To begin with a rather unsurprising point… the number of file requests rose throughout the year, giving a peak in November where a few popular posts came together:

Graphs of the numbers given by different tools

But let’s move to more ‘relevant’ data, like the distribution of accesses throughout the week:


Great, we learned a great deal from that!

Next up, HTTP status codes. Those reflect a number of things. First, that I really like off-loading traffic to other servers using 302 redirects. Second, that I like to block people who directly link to my images with a 403 – more on those in a second. Third, that the 304 status code exists, possibly keeping people with aggregators or search engines from wasting my bandwidth. And fourth, that there are numerous typos in URLs or, more likely, attempts to access files which might make certain servers vulnerable, giving quite a few 404s:

Popular http status codes

Other status codes we saw were 206, 500, 401, 405 and 416. If you know what the latter two mean without looking them up – erm – Congratulations!
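For those who do want to look them up, a small sketch using Python’s standard `http.HTTPStatus` table does the job. The helper names `describe` and `tally` and the list of codes are just made up for the example; the exact wording of a phrase can differ slightly between Python versions.

```python
from collections import Counter
from http import HTTPStatus


def describe(code):
    """Return the standard reason phrase for a status code, or 'unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "unknown"


def tally(codes):
    """Return (code, count) pairs, most frequent first."""
    return Counter(codes).most_common()


# Made-up codes for illustration, not the real distribution.
for code, count in tally([200, 302, 200, 304, 404, 405, 416, 200]):
    print(code, count, describe(code))
```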

The most popular files on the server are:

                    requests   bandwidth
index.rdf               445K   5.2 GB
mt-comments.cgi         336K   1.2 GB
qlc.css                 234K   2.3 GB
qlc.gif                 213K   60 MB
comments.js             213K   386 MB
qlc_grey.gif            203K   31 MB
qlc’s favicon.ico        77K   25 MB
robots.txt               52K   20 MB
2003/06/referrers        52K   915 MB
favicon.ico              39K   78 MB
ssp/blog                 33K   2.3 GB

So, let me summarise… the most popular files are auxiliary ones or those abused by comment spammers. Isn’t that great? But after a few more of those, there are finally the various earthlingsoft applications and blog posts I care about.

The most popular (or most accessed, at least) posts are

I hope you’re bored by now… because I am, so I decided to list only the posts which got more than 5000 hits. What can we learn from this? Not much. Not much that’s new, anyway. Google really directs people. Whether your site is relevant or not doesn’t really matter. People like music. Perhaps I should consider getting an iTMS affiliate account. (Any experiences with that, particularly with how it works in a global frame?)

There’s nothing useful to report on referrer information. The bulk of people are referred by Google. Everybody else is dwarfed by their numbers. The most popular search terms were – unsurprisingly, for those who paid attention when I listed the popular pages above:

  1. how to make a virus
  2. quarter life crisis
  3. ballerburg
  4. itunes problems
  5. ipod generations
  6. patrick wolf
  7. wird sind helden
  8. gekommen um zu bleiben
  9. i wanna die
  10. redirectmatch
  11. religion sucks
  12. sigur ros
  13. itunes
  14. porn

I’m particularly proud of the last one in that list. But I think you have to use Google Germany to enjoy it.

The next topic to cover would be failed referrers. As I mentioned before, I set up the server to deny any request for images which have a referrer outside our site (and a selection of others). Basically I don’t want to host images for others and pay for their fun. The main culprits for wanting to use my bandwidth are at myspace.com (a site that’s apparently quite hip with the kids but which I didn’t even know about… getting old or so) and xanga.com.
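The rule itself is simple enough to sketch in a few lines. This is only an illustration of the logic, not the actual server configuration: the host names in `ALLOWED_HOSTS` are placeholders, the list of image extensions is a guess, and letting requests without any referrer through is an assumption the text above doesn’t settle.

```python
from urllib.parse import urlsplit

# Assumed set of file types that count as images.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")

# Placeholder hosts standing in for 'our site and a selection of others'.
ALLOWED_HOSTS = {"blog.example", "friend.example"}


def should_block(path, referrer):
    """Return True if an image request should be answered with a 403."""
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return False  # only images are protected
    if not referrer:
        return False  # assumption: requests without a referrer are allowed
    host = urlsplit(referrer).hostname or ""
    return host not in ALLOWED_HOSTS


print(should_block("/images/cover.jpg", "http://www.myspace.com/someone"))
print(should_block("/images/cover.jpg", "http://blog.example/2005/12/post"))
```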

The browser distribution I’m seeing is pretty similar to Mike’s results: Just over 40% IE, 16% Firefox, 15% Safari, 9% NetNewsWire and a number of smaller ones. Access by search engine spiders is excessive.

The most active commenters on these pages are the various vendors of erectile medicines and other drugs. Even with filtering, hundreds, if not thousands, of their comments came through. Next up are probably me, d.w. and G. Other ‘known’ commenters who comment less regularly would be Dan, Sören or Scott.

January 4, 2006, 0:16


Comment by Tobias:

Looks like you might save money by having the feed not contain complete posts. Just in case you thought about that: please don’t.

January 4, 2006, 10:28

Comment by ssp:

Rest assured that I won’t. I don’t like those snippet feeds myself.

I should try to figure out how to get gzip compression running for the feeds instead. That helped a lot for the html pages. But I just can’t seem to manage to get PHP to work in XML files.

January 4, 2006, 10:47
