478 words on Software
I really do like AppleScript and have done so for years. But the pain that goes with using it seems to have no end. If only there were decent and rigorous documentation, rather than the old, not too enlightening and rather hidden Language Guide and a bunch of more or less working script dictionaries. (Does the dictionary part for the new Property List stuff in Standard Additions actually work for other people? It’s just empty on my computer.)
Today I had another fun encounter with AppleScript. While I somehow like the delicious service, I still feel uncomfortable about it not storing the data on my computer but leaving it at the mercy of others (who tend to turn nasty, bankrupt or both after a while). So I want a backup of my links there. And with Buzz’ nice Cocoalicious tool and its AppleScript support this wasn’t too hard to achieve.
I made a little script that grabs all the data from Cocoalicious and writes it to an HTML file. It’s not a particularly good script, because AppleScript writes localised dates by default, for example, which may make reimporting the stuff somewhere a pain. But after another little tweak, it worked: I made sure that ampersands in URLs are properly encoded for HTML and converted everything non-ASCII to HTML character entities, so we don’t have to worry about the strange AppleScript world of encodings.
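The two escaping steps described above (ampersands first, then everything non-ASCII as character references) are easy to sketch outside AppleScript. The following is a minimal Python illustration of the idea, not the script from this post; the function name `to_ascii_html` and the sample strings are made up for the example.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape HTML metacharacters, then turn every non-ASCII
    character into a numeric character reference."""
    # html.escape replaces &, < and > (and quotes, because quote=True).
    escaped = html.escape(text, quote=True)
    # 'xmlcharrefreplace' rewrites characters ASCII cannot hold as &#NNN;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical sample data, standing in for a bookmark URL and title.
sample_url = "http://example.com/?a=1&b=2"
sample_title = "Grüße"

print(to_ascii_html(sample_url))
print(to_ascii_html(sample_title))
```

Because the result is pure ASCII, it no longer matters much which 8-bit encoding the file ends up in, which is the point of the tweak.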
But the HTML didn’t validate. Odd, I thought. And the problem turned out to be a bit hard to see. Somehow the file was written as UTF-16 encoded text. Quite an unusual format. And one I hadn’t seen from AppleScript before. To add injury to insult, the file didn’t even have a BOM – fun idea from a company who want to sell their system on computers of varying Endianness.
A simplified example of what I saw is this script:
set i to "abc"
tell application "UnicodeChecker" to set i to XHTML representation of i
set f to choose file name default name "test"
set myFile to open for access f with write permission
set eof of myFile to 0
write i to myFile
close access myFile

This will give you a UTF-16 file. Remove the second line and you’ll get Mac-Roman encoding (you can’t tell by this example, but trust me…). Now try to find good documentation on what is happening here, why it happens and how to change it. In Apple’s documentation.
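To see why "abc" gives UTF-16 away immediately but can’t tell Mac-Roman apart from other encodings, it helps to look at the raw bytes. Here is a small Python sketch; Python serves only as a byte inspector. The post doesn’t say which byte order AppleScript chose, so both are shown, and the helper name `show` is made up.

```python
def show(label: str, data: bytes) -> None:
    """Print a label followed by the bytes in hex."""
    print(f"{label:<14} {data.hex(' ')}")


ascii_text = "abc"
umlaut_text = "ü"

# UTF-16 without a BOM: two bytes per character, order not recorded.
utf16_be = ascii_text.encode("utf-16-be")
utf16_le = ascii_text.encode("utf-16-le")
show("UTF-16 BE", utf16_be)
show("UTF-16 LE", utf16_le)

# Reading BOM-less big-endian data as little-endian raises no error,
# it just silently yields different characters.
misread = utf16_be.decode("utf-16-le")

# Pure ASCII text is byte-identical in Mac-Roman and UTF-8 ...
macroman_ascii = ascii_text.encode("mac_roman")
utf8_ascii = ascii_text.encode("utf-8")
show("Mac-Roman abc", macroman_ascii)
show("UTF-8 abc", utf8_ascii)

# ... and the difference only shows up with non-ASCII characters.
macroman_umlaut = umlaut_text.encode("mac_roman")
utf8_umlaut = umlaut_text.encode("utf-8")
show("Mac-Roman ü", macroman_umlaut)
show("UTF-8 ü", utf8_umlaut)
```

A reader of a BOM-less UTF-16 file has to guess the byte order, and a wrong guess produces valid but meaningless text rather than an error.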
Luckily Steffen remembered that the Satimage site has a page on Unicode and AppleScript which solved the problem. The executive summary of which is that using a slightly modified command for saving will do the trick:
write i to myFile as «class utf8»

Oh, and once you know what you’re looking for, you’ll also find the relevant documentation on Apple’s web site. Once. In three lines of the AppleScript 1.9 release notes.
Documentation isn’t really one of AppleScript’s strong points, is it?
Anyway, compared to figuring out what encoding AppleScript’s deciding to use today, getting it to handle malformed UTF8 (as seen in the odd ID3 tag) really makes your life difficult. If this was only for local usage I’d use the Satimage OSAX, but sadly I have to worry about distribution. I’m sure I’ll figure it out eventually, but it’s no fun in the meantime.
Nothing to do with AppleScript, actually, but a handy Tiger-specific way of keeping a local copy of your del.icio.us bookmarks. Once they’re local, you can do all sorts of terribly clever things with smart folders…
Paul: Handling malformed UTF-8 can be a bit tricky, I suppose. If you’re using AppleScript Studio, you should be able to use all of Cocoa… but I’m not really sure whether that would actually help as I’ve never dealt with such problems.
Dave: Hm, looking at Cocoalicious this afternoon, I thought about exactly the same feature and contemplated adding it to Cocoalicious which shouldn’t be too hard. Seeing that such a tool already exists discourages me a little…
And to add the ugly to the bad:
The trick -> write i to myFile as «class utf8» doesn’t work with AppleScript Studio, since the chevrons «» are not understood by ASS
I also found the Satimage page and followed the instructions. But I always end up with garbage. All ASCII characters are fine. But when it comes to German umlauts, I get (as an example), instead of the bytes c3 bc for an umlaut u, a u followed by something else: 75 cc
Skeeve, it’s a bit difficult to tell what exactly you are seeing, as you don’t give an exact explanation or example of what you did.
From what you write, my guess is that you started with a UTF-8 encoded NFD string and dropped the 88 that follows the CC in it. That’s perfectly good Unicode; you just need to know its format and handle it accordingly.
I recommend that you look at our UnicodeChecker application, and take a close look at the UTF-8 encodings and the Normalisation Utility to get a better feel for this.
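The byte sequences discussed in this exchange can be reproduced with a few lines of Python. This is only an illustration of the two normalisation forms, with made-up variable names; it says nothing about what iTunes or AppleScript do internally.

```python
import unicodedata

# U+00FC is the precomposed "ü" (normalisation form NFC).
precomposed = unicodedata.normalize("NFC", "\u00fc")
# NFD splits it into "u" followed by U+0308 COMBINING DIAERESIS.
decomposed = unicodedata.normalize("NFD", "\u00fc")

nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

print(nfc_bytes.hex(" "))  # c3 bc
print(nfd_bytes.hex(" "))  # 75 cc 88

# Both forms are valid Unicode; normalising converts between them.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)  # True
```

The "75 cc" from the comment above matches the start of the NFD sequence, with the trailing 88 missing.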
Thanks for the reply… I found a solution.
What I had was a loop over some iTunes tracks
repeat with loc in every location of thesetrackref
It seems as if this already converted the text into a form that couldn’t be converted back to utf-8
What I have now works and, in case someone needs an m3u export script for iTunes, the code can be seen here:
The forum is German and so are the messages in the script, but I don’t think it’s a problem.
@Patrick «class utf8» DOES work in AppleScript Studio. If it doesn’t, it is just that your .applescript file uses the wrong encoding (UTF-8 instead of MacRoman). To fix it, select the script in Xcode, then choose the menu Format -> File Encoding -> Western (MacOS Roman). Now, with MacOS Roman encoding, it will compile just fine. Took me quite a while to figure out.
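Why the file encoding matters for the chevrons can be shown at the byte level. The Python sketch below assumes, as the comment above suggests, that the compiler reads the source file as MacRoman; that assumption is not verified here, and the variable names are made up.

```python
chevron_source = "write i to myFile as «class utf8»"

# The same line of source, saved in the two encodings in question.
as_utf8 = chevron_source.encode("utf-8")
as_macroman = chevron_source.encode("mac_roman")

# In MacRoman each chevron is one byte; in UTF-8 it takes two.
print(len(as_macroman), len(as_utf8))

# What a MacRoman reader would see in a file saved as UTF-8:
# the chevrons come out as other characters.
misread = as_utf8.decode("mac_roman")
print(misread)

# Saved and read as MacRoman, the line survives intact.
roundtrip = as_macroman.decode("mac_roman")
print(roundtrip == chevron_source)
```

If the chevrons arrive as different characters, the compiler no longer sees «class utf8» at all, which would explain the reported failure.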