Quarter Life Crisis
The world according to Sven-S. Porst
« Bread Rolls • Main • Rice »
1517 words on Mac OS X 10.5 Leopard
Time Machine certainly is one of the outstanding new features of Mac OS X.5. Of course it is mostly untested right now but the essential idea of making backups an automatic and simple feature of the operating system is excellent. It’s probably the only way to popularise backups enough that a non-trivial percentage of users will actually have some.
My view towards Time Machine is quite optimistic. While, historically, Apple isn’t the hero of data preservation it is still a fact that having some sort of backup is better than not having a backup at all. And the key question is how well Time Machine will do when it’s actually used for backing up and restoring data. My plan was to write about this in a few months when it has actually seen some usage and had a chance to prove itself. And I will give a more detailed look at some point in the future.
Most significantly, I hoped to be able to answer the question by then whether or not it can replace Retrospect, which, while still doing an admirable job in creating backups, still feels like an alien in OS X, and a somewhat clumsy alien at that. It’s not that I hate Retrospect. Far from it. I’ve used it at most times in some way or another for more than a decade and it never lost me a file. Rather, it brought some ‘lost’ files back in that time, so it’s a trusted friend if you wish. Still, I’d be happy to go for a more contemporary and, most importantly, more hassle-free and smooth alternative. Which, for my non-professional setup at least, Time Machine could well be.
Unfortunately my first experiences with Time Machine in the X.5 release version have been horrible. I set my computer up to create a backup of its internal drive minus a few folders. And I activated Time Machine before going to sleep, in the hope to find my fresh backup ready when waking up again. Unfortunately that was not to be. When I woke up I found that the machine had kernel panicked at some stage that night.
Thu Nov 8 07:34:19 2007 panic(cpu 1 caller 0x001A7BED): Kernel trap at 0x0041e1b5, type 14=page fault, registers: CR0: 0x8001003b, CR2: 0x0000000c, CR3: 0x00d51000, CR4: 0x00000660 EAX: 0x00000000, EBX: 0x03d2a920, ECX: 0x347bd216, EDX: 0x3609fbac CR2: 0x0000000c, EBP: 0x3609fb28, ESI: 0x03d2a928, EDI: 0x03d2a930 EFL: 0x00010206, EIP: 0x0041e1b5, CS: 0x00000008, DS: 0x36090010 Error code: 0x00000002 Backtrace, Format - Frame : Return Address (4 potential args on stack) 0x3609f918 : 0x12b0e1 (0x4555b4 0x3609f94c 0x133238 0x0) 0x3609f968 : 0x1a7bed (0x45e568 0x41e1b5 0xe 0x45dd24) 0x3609fa48 : 0x19e517 (0x3609fa60 0x0 0x3609fb28 0x41e1b5) 0x3609fa58 : 0x41e1b5 (0xe 0xbe80048 0x10 0x3d50010) 0x3609fb28 : 0x41da6d (0x3d2a920 0x3609fbac 0x0 0x0) 0x3609fb78 : 0x41e3ee (0x3c27f00 0x41e158 0x3609fbac 0x0) 0x3609fbc8 : 0x347b1cfb (0x3d2a920 0x0 0x0 0x3c51800) 0x3609fc38 : 0x347b63dc (0x3c51800 0xb67da80 0xb 0x4d0cd8c) 0x3609fc78 : 0x347b60f8 (0x4d0cd80 0xb67da80 0x7530 0x7530) 0x3609fcb8 : 0x355fc28c (0x4d0cd80 0xb67da80 0x7530 0x7530) 0x3609fd08 : 0x355fc5f9 (0x3cd5200 0x3cd52e4 0x9 0x0) 0x3609fd48 : 0x347b1350 (0x3cd5200 0x3cd52e4 0x0 0x0) 0x3609fda8 : 0x3491dd85 (0x3c51800 0x3d37f00 0x0 0x0) 0x3609fe48 : 0x3491e724 (0x3c51800 0x3d27788 0x0 0x0) 0x3609fec8 : 0x3491e779 (0x3c51800 0x0 0x0 0x408eb2) 0x3609fee8 : 0x34917998 (0x3c51800 0x0 0x3609ff18 0x34917a7c) Backtrace continues... Kernel loadable modules in backtrace (with dependencies): com.apple.iokit.IOUSBMassStorageClass(2.0.0)@0x355f9000->0x35602fff dependency: com.apple.iokit.IOUSBFamily(3.0.3)@0x347ac000 dependency: com.apple.iokit.IOStorageFamily(1.5.2)@0x346e1000 dependency: com.apple.iokit.IOSCSIArchitectureModelFamily(2.0.0)@0x34dce000 com.apple.driver.AppleUSBEHCI(3.0.3)@0x34914000->0x34925fff dependency: com.apple.iokit.IOUSBFamily(3.0.3)@0x347ac000 dependency: com.apple.iokit.IOPCIFamily(2.4)@0x2e31c000 com.apple.iokit.IOUSBFamily(3.0.3)@0x347ac000->0x347d3fff BSD process name corresponding to current thread: kernel_task
Well, looks like some USB problem to the untrained eye as that’s the only recognisable term that legible there. Never had those before.
The next day, I deleted the started backup which only included 10GB of files despite running for about three hours and started a new backup. Just to find the machine paniced once more the following morning. Bad, very bad. To make things more interesting, EyeTV was recording a show around the time the panic happened. The first day the recording finished successfully before the panic, the second day, the panic happened in mid-recording and ruined it that way. This may or may not be related to the kernel panic. Who knows. And who cares. Given Apple’s openness about these issues, we’ll probably never find out, so I’ll just leave the blame with them. It’s their kernel after all.
The next morning I thought I should just let the computer do whatever it sees fit to complete the backup. Surely some kind of handling for such situations with incomplete execution of backups will be in place. If not for kernel panics, then for power failures which will invariably happen to some people once you take into account a large user base.
And sure enough it looked like Time Machine just tried to pick up its work again. There wasn’t even a complaint about things having gone wrong over night. Things kicked off with some initial preparation. So I figured that everything was under control and on its way to be finished. Unfortunately Time Machine didn’t even make it beyond the initial preparation phase in that run and it just presented me with an unhelpful error message instead:
The message just informed me that an error occurred during copying. And it offered ‘OK’ as its only option. No button to fix the problem, no ‘Fuck off’ button either. Not being the type to give up, I just started the backup again – and ended up with the same error message once more. Still no ‘Fuck off and fix the damn thing’ button then. So, the OS screwed its own backup and apparently broke its own ability to create further backups while doing so. Well done.
Let’s say that this somewhat reduced my enthusiasm for Time Machine. If I manage to run into this kind of trouble within a day, what problems will I face when I actually need to retrieve some data from my backup? Admittedly, a lovely aspect of Time Machine is its simple-minded approach of just copying stuff over file by file (and doing the somewhat more black magic stuff of hard linking), so I’d hope that even if things break down on the front-end there should be some way to retrieve files from a backup if needed. But it’s a backup. And if you need your backup you’ll be nervous enough to not need the extra kick of needing to hope for things to work and fiddling around to get them to work.
After all that stress I decided to give things yet another try – removing the partial backup once more and running Time Machine only without sleeping myself while it ran. On that occasion the backup completed successfully (according to Time Machine) and it has been copying changed files over all right since.
I paste some log messages I found below in case those are informative to someone. The logs present on my machine seem somewhat spotty. In /var/log I see files
Lines in the logs which looked strange to my untrained eye follow. There was other stuff in between which I leave out. The next two appeared the second time I ran the backup. There were plenty (hundreds/thousands) of the second message with varying numbers:
Nov 9 01:08:03 Kalle fseventsd[47]: client (fci 0x81a600) ran out of room! (rd 281 wr 280 max 979) Nov 9 01:08:03 Kalle fseventsd[47]: client watching path / is out of space (659 658)When I tried to ‘complete’ the backup which was stopped by the over-night kernel panic, there were messages in the log which are more descriptive than the error message shown but which still fail to make much sense to me:
Nov 9 08:55:52 Kalle /System/Library/CoreServices/backupd[528]: Backup requested by user Nov 9 08:55:52 Kalle /System/Library/CoreServices/backupd[528]: Starting standard backup Nov 9 08:55:52 Kalle /System/Library/CoreServices/backupd[528]: Backing up to: /Volumes/Snowman/Backups.backupdb Nov 9 09:04:28 Kalle /System/Library/CoreServices/backupd[528]: Backup requested by automatic scheduler Nov 9 09:11:03 Kalle /System/Library/CoreServices/backupd[528]: No pre-backup thinning needed: 57.04 GB requested (including padding), 124.49 GB available Nov 9 09:34:22 Kalle mds[35]: (Error) Backup: doBackupIndexFile could not stat backup location /Applications/iPhoto.app/Contents/Resources/fi.lproj/iPhoto Help/gfx/hlp40.jpg /Volumes/Snowman/Backups.backupdb/Kalle/2007-11-09-012627.inProgress/DC04EE5A-C217-41C6-9516-B67931D9107C/Kalle/Applications/iPhoto.app/Contents/Resources/fi.lproj/iPhoto Help/gfx/hlp40.jpg. Error 20 with uid 0 Nov 9 09:34:22 Kalle /System/Library/CoreServices/backupd[528]: MDBackupIndexFile returned -1101 for: /Applications/iPhoto.app/Contents/Resources/fi.lproj/iPhoto Help/gfx/hlp40.jpg, /Volumes/Snowman/Backups.backupdb/Kalle/2007-11-09-012627.inProgress/DC04EE5A-C217-41C6-9516-B67931D9107C/Kalle/Applications/iPhoto.app/Contents/Resources/fi.lproj/iPhoto Help/gfx/hlp40.jpg Nov 9 09:34:22 Kalle /System/Library/CoreServices/backupd[528]: Aborting backup because MDBackupIndexFile failed Nov 9 09:34:22 Kalle /System/Library/CoreServices/backupd[528]: Canceling backup. Nov 9 09:34:24 Kalle /System/Library/CoreServices/backupd[528]: Copied 10852 files (58 KB) from volume Kalle. Nov 9 09:34:24 Kalle /System/Library/CoreServices/backupd[528]: Copy stage failed with error:11 Nov 9 09:34:24 Kalle /System/Library/CoreServices/backupd[528]: Backup failed with error: 11
Hi! This may be a known problem in 10.5.0. See http://docs.info.apple.com/article.html?artnum=306932.
Thanks for the hint. People already pointed me to that before. (And I have to say that it’s not a particularly helpful hint as it requires you to get another huge external drive and spend hours copying files back and forth, making sure everything ends up just the same as it was before as not to break existing aliases or links to the external drive.)
But I don’t think it’s applicable to my situation anyway. ‘Unfortunately’ my drive is not of the ‘MBR’ type. I repartitioned it with Disk Utility when I got it. And System Profiler reports it to be of ‘APM (Apple Partition Map)’ type.
The kernel panics and write failures are very indicative of the problem — I would bet very firmly that your external drive is dying.
Having lost many drives over the years myself, please do get another one before you are screwed.
Thanks for the warning Fred. Luckily I don’t need to be too concerned about the disk at the moment as, most importantly, I still have my Restrospect backups on another disk for the time being and, I do suspect that this has more to do with the software in that case. The disk has never caused problems so far and the machines paniced twice in the same way in similar situations.
That said, I’ll keep this in mind as well as an eye on the disk (which is only a few months old, so I’d rather see it broken while I can still get an exchange under warranty). Are there any tests I could run that check the disk rather than the directory structure?
It seems like your kernel panics are happening around the same section of the backup (and thus the same sectors of disk, since you’ve been deleting the failed backups).
http://www.gnu.org/software/ddrescue/ddrescue.html http://www.garloff.de/kurt/linux/ddrescue/
I’ve used the dd-rescue variants in the past on linux boxes for diagnosis and recovery of bad blocks. Vanilla dd will just freeze when it encounters badness, the rescue variants will tell you, and can be configured to do different things (like bisect the unknown area isolating the bad blocks).
http://www.macosxhints.com/article.php?story=20050720092514388 Apparently they build just fine on OS X, POSIX block devices being pretty universal.
I reproduce this as well. See also http://discussions.apple.com/thread.jspa?messageID=5941345. I disagree that a bad disk is a reasonable explanation for a kernel panic - the kernel should remain consistent and stable in this condition. I have not seen write failures and several posters to the linked thread report that their drives pass consistency checks.
I don’t buy the bad disk theory either. But just in case I ran a block level check on the backup disk which took an eternity and didn’t find any problems.
I have two Macs; one is a MacBook Pro. Time Machine backs this machine up to a firewire disk with no problems. My G4 Mini, which runs Leopard Server, can’t run time machine at all. I bought two brand new USB disks for the machine, and if time machine is switched on, the machine is guaranteed to kernel panic before the first backup is created. With Time Machine switched off, I get no kernel panics, and can quite happily back up from one disk to the other using Carbon Copy Cloner, with no errors.
This is definitely not a bad disk problem. It also doesn’t seem to be a Firewire-vs-USB problem; I’ve seen people report TM-related panics with both. A common thread does seem to be having more than one external disk attached to the machine. Did you only have a single external disk attached to the machine? Or did you have more?
Hmm - I spoke too soon. Even with TM switched off, the panic has occurred, but took more to tickle it. I think TM may not be at fault, but there’s an underlying external disk driver problem which TM may tend to expose.
Back in my series of kernel panics I managed to reproduce the problem when just the Time Machine USB drive was attached to the MacBook’s USB port without any hubs in-between.
I definitely think that Time Machine wasn’t the direct culprit here it just triggers a lower problem with the USB (or FireWire) storages by making good use of them.
« Bread Rolls • Main • Rice »