Bug Report Friday: Spotlight

There are two Spotlight bugs that I’d like to write about today. Sure, Spotlight has failed to live up to expectations. But now that I have a computer with double the speed and memory, its performance has improved from ‘glacially slow’ to ‘slow’ and I am occasionally tempted to use it.

I see both problems when dealing with the digital version of the German weekly Die Zeit. If you subscribe to their paper version, that subscription includes access to a full text and image PDF of the paper. Those are enormous files: slightly fewer than a hundred 40cm×60cm pages each week, usually giving you a PDF file of about a hundred megabytes. It’s quite cool to have this, as you can easily archive or copy articles you like this way without having to store huge amounts of paper. Spotlight’s indexing capabilities could be tremendously useful in this context to let you quickly find an article you’re looking for. But Spotlight fails in two ways here.

Its first failure is that it’s extremely wasteful with your resources. Whenever any of the file’s metadata changes, the file’s content, which remains the same, is re-indexed. Indexing such a 100MB file takes about half a minute on my machine, with mdimport using about 100MB of physical memory in the process. While this is acceptable performance for indexing such a large file, it’s nothing you want the computer to do over and over again whenever you happen to rename or move the file. Spotlight’s indexing needs to be more selective here, even more so as Spotlight also runs on portable machines, where using the hard drive and processor consumes valuable energy and potentially burns the user’s lap. (#4308769)
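To illustrate what ‘more selective’ could mean, here is a minimal Python sketch. It is not how Spotlight or mdimport work internally; the class `SelectiveIndexer`, the function `content_fingerprint` and the `extract_text` callback are all hypothetical names made up for this example. The idea is simply to key the expensive text extraction on the file’s bytes rather than on its name or location, so that a rename or move does not trigger a new extraction.

```python
import hashlib
from pathlib import Path


def content_fingerprint(path, chunk_size=1 << 20):
    """Hash the file's bytes only; name, location and timestamps are ignored."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a 100MB file never has to sit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class SelectiveIndexer:
    """Hypothetical indexer that extracts text only for content it has not seen."""

    def __init__(self, extract_text):
        self.extract_text = extract_text  # stand-in for the expensive importer step
        self.text_by_fingerprint = {}     # content hash -> extracted text
        self.fingerprint_by_path = {}     # current path -> content hash
        self.extractions = 0              # how often the expensive step ran

    def update(self, path):
        """Record the file under its current path, extracting text only if needed."""
        path = Path(path)
        fingerprint = content_fingerprint(path)
        if fingerprint not in self.text_by_fingerprint:
            self.text_by_fingerprint[fingerprint] = self.extract_text(path)
            self.extractions += 1
        self.fingerprint_by_path[str(path)] = fingerprint
        return self.text_by_fingerprint[fingerprint]
```

Calling `update` on a file, renaming it and calling `update` on the new path leaves `extractions` at 1. Hashing still means reading the whole file once, so this is not free either; the assumption is only that reading 100MB is much cheaper than parsing it as a PDF.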

The second failure I see is in the PDF importer itself. It doesn’t provide all of the file’s text for the index. At least with those long and complex files, it’s a problem I see regularly, and one that renders Spotlight’s indexing quasi-useless. And while those PDF files are a bit strange in places, I don’t think they are to blame here, as Preview finds the desired strings in the file without problems. What’s really odd is that I can’t predict which words will be found and which won’t be. (#4309806)
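One way to make the gaps visible would be to compare the words actually in a document with what the index claims to know. The following Python sketch assumes you already have the document’s text as a string and some way of asking the index about a single word; both `missing_terms` and the `is_indexed` callback are hypothetical and not part of any Spotlight API.

```python
import re


def missing_terms(document_text, is_indexed, min_length=4):
    """Return the document's words that the index lookup does not find.

    is_indexed is a hypothetical callable taking a lowercase word and
    returning True if a search for that word finds the document.
    """
    # [^\W\d_]+ matches runs of letters, including umlauts and ß.
    words = {
        word.lower()
        for word in re.findall(r"[^\W\d_]+", document_text)
        if len(word) >= min_length
    }
    return sorted(word for word in words if not is_indexed(word))
```

With a real index behind `is_indexed`, the returned list would show exactly which words were dropped, which might hint at whether the omissions follow a pattern such as position in the file.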

I think both these bugs are a shame. For such specific searches, Spotlight’s lack of speed isn’t a big problem. It’s really an area where Spotlight could shine, even on today’s hardware. But due to these strange software quirks, dealing with big files is troublesome and doesn’t give the desired results.

October 21, 2005, 1:11

bug


Sören Kuklau:

“It doesn’t provide all of the file’s text for the index. At least with those long and complex files it’s a problem I see regularly.”

There is, unfortunately, a limitation as to how deep the importer goes into files. I believe the limitation is size-based, but don’t ask me for figures.

October 21, 2005, 1:54

ssp:

I thought I read something about such a limitation in the past but couldn’t find it again. Still, does that make sense? It always leaves you wondering whether or not a file is properly indexed, meaning that you always have to assume it isn’t.

October 21, 2005, 11:24

