1153 words on Mac OS X
Rather than bitching about stupid journalists quoting stupid Austrian academics (well, at least one of the parties must have been stupid, it’s hard to tell which) on the topic of labyrinths, which in turn led my social sciences flatmate to get a completely wrong impression of what a labyrinth or maze is because he’ll simply believe anything written on paper (aka social science thinking) rather than a perfectly logical argument I present to him (aka thinking), let me bitch about something else…
Like the fact that some complete frigtard managed to produce a TeX file which contained Roman text along with Cyrillic characters. Well, that shouldn’t be a big problem. However, it is a problem if - and that’s how we reconstructed things - the document was created in the following way:
This is hard to notice when looking at the text file in the right encoding and it’s rather tricky to find out what exactly happened there as well. (Just try to find out the exact encoding of a file in a Linux GUI if you don’t really know what you’re doing… and who’d guess Mac-friggin-Cyrillic for it?) But it certainly leaves us with a document containing Cyrillic characters where they shouldn’t be.
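One way to narrow down a mystery encoding is to decode the raw bytes under a handful of candidate encodings and see which results contain Cyrillic letters. The following is only a sketch in Python 3, not what was done for the file in question; the helper names `has_cyrillic` and `try_decodings` and the candidate list are assumptions picked for illustration.

```python
import unicodedata

# Candidate encodings to try; an arbitrary selection for illustration.
CANDIDATES = ["utf-8", "latin-1", "mac_roman", "mac_cyrillic", "cp1251", "koi8_r"]

def has_cyrillic(text):
    """True if any character's Unicode name starts with CYRILLIC."""
    return any(unicodedata.name(ch, "").startswith("CYRILLIC") for ch in text)

def try_decodings(data, encodings=CANDIDATES):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

# Made-up sample: three Cyrillic letters stored as Mac Cyrillic bytes.
sample = "\u0422\u0435\u0425".encode("mac_cyrillic")
for enc, text in try_decodings(sample).items():
    print(enc, repr(text), text is not None and has_cyrillic(text))
```

Several Cyrillic encodings will typically decode the same bytes without complaint, so a check like this only narrows the field; a human still has to look at the output and decide which reading makes sense.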
The next problems were to find all the Cyrillic characters in the document and to replace them. That should be fairly trivial but with the tools I have at hand I found it quite complicated. [read: if you know a one line regular expression to achieve the same thing, please leave a comment!] A straightforward change of file encoding didn’t do the trick as this replacement of characters is no such thing. So after a bit of back and forth I ended up doing the following:

1. Open the file in a text editor as Mac Cyrillic. This makes sure the wrong characters are correctly interpreted.
2. Copy the file’s text to the clipboard.
3. Hack together the following PyObjC script to replace characters in the clipboard and write them to a file, printing out the conversions done and the potentially remaining problems to the console:
```python
#!/usr/bin/env python
#coding=utf-8
from AppKit import *

pb = NSPasteboard.generalPasteboard()
input = pb.stringForType_(NSStringPboardType).mutableCopy()

originalstring = u'ЗЃгНМЕКаСТ•А†бä'
newstring = u'3oyHMEKaCTeAacä'

l = len(originalstring)
for i in range(0, l-1):
    a = originalstring[i]
    b = newstring[i]
    count = input.replaceOccurrencesOfString_withString_options_range_(a, b, NSLiteralSearch, NSMakeRange(0, input.length()))
    print a + "->" + b
    print count

a = input.rangeOfCharacterFromSet_(
    NSCharacterSet.characterSetWithRange_(NSMakeRange(0,127)).invertedSet())
print input[a.location-100:a.location+10]

print input.writeToFile_atomically_encoding_error_("/Users/ssp/Desktop/test.tex", 0, NSASCIIStringEncoding, None)
```
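For comparison, the same character-by-character replacement can be sketched in plain Python 3 with `str.translate`, without going through the pasteboard. This is an alternative under stated assumptions rather than the script that was actually run: the two table strings are copied from the script above, while the function names `fix_lookalikes` and `find_non_ascii` and the sample input are made up.

```python
# Look-alike table copied from the script above: ORIGINAL[i] becomes REPLACEMENT[i].
ORIGINAL = "ЗЃгНМЕКаСТ•А†бä"
REPLACEMENT = "3oyHMEKaCTeAacä"
TABLE = str.maketrans(ORIGINAL, REPLACEMENT)

def fix_lookalikes(text):
    """Replace every character listed in ORIGINAL by its counterpart."""
    return text.translate(TABLE)

def find_non_ascii(text):
    """Return (index, character) pairs for everything outside ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]

# Made-up sample: the three letters inside the braces are Cyrillic.
fixed = fix_lookalikes("\\textbf{\u0421\u0410\u0422}")
print(fixed)
print(find_non_ascii(fixed))
```

Reading and writing the file would be a matter of `open(path, encoding="mac_cyrillic")` on the way in and a suitable encoding on the way out; anything `find_non_ascii` still reports after the translation is a candidate for a closer look.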
This still feels like it is a bit too complicated for the simple problem at hand. But it did the job. I think I’m starting to like PyObjC. To begin with, Python as a language seems quite nice and intuitive. I haven’t read a manual but just from seeing a few examples I could deduce enough to get a working script. I wouldn’t say that this is necessarily a technically good thing for a language, but for a scripting language used for off-the-cuff hackish stuff like this it’s brilliant. [This certainly beats the atrocity known as Perl or the hideousness of shell scripts.]
Then there’s Objective-C and the Cocoa frameworks. As I am familiar with them, having direct access is rather good and gives me the ability to do many things right away without needing to re-learn everything. Again, perfect for hacks like this. And it’s better than the ‘real thing’ because I don’t have the overhead of creating an Xcode project, compiling and so on when doing this. A nice thing about Cocoa is its fault tolerance. You can just get away with passing zero pointers in many situations, leaving you with a working script even though things didn’t go perfectly. [That certainly beats AppleScript; as AppleScript is probably the worst language ever for string manipulation I wanted to avoid it here.]
Extra kudos go to the PyObjC implementors for making a bit of an effort in their error messages. Compared to other languages, these were actually helpful. There’s nothing as bad as starting a hack like this and running into a cryptic error message which halts progress. At that point you start losing loads of time. For some things - such as the inclusion of non-ASCII characters in the script, an issue which I was really scared about considering the general Unicode incompetence in scripting languages - PyObjC simply spat out a message pointing right to a web page discussing the issue and telling me I want to write `#coding=utf-8` at the top of my script. Another Google search made clear that I want to add a `u` in front of my string constants and things worked from there. That’s excellent.
Of course this script is still imperfect. It’s off-the-cuff by someone who doesn’t know what he’s doing. And I failed to do everything I wanted to achieve. A few questions that stuck follow. Any insight on the issues they present will be appreciated.
[…] `-length` method works, for others I had to use Python’s `len()` function.

[…] `self`. I tried entering `self` but PyObjC just gave an error. I tried passing other things like 0 or a string instead and that led to a rather unfortunate situation where OS X.5’s whole clipboard infrastructure was broken. `pboard` needed to be killed and any running application which tried to use the clipboard froze and needed to be killed when doing so. Hence I just wrote the results to a file…
Your code got Markdowned into brokenness
I’ve never liked the underscore-underscore thing in Python, but overall there are very few annoyances compared to Ruby (which has way too much sugar, especially in the way that it screws up and slows down the implementations).
I’ve written several PyObjC applications, often to write simple menubar helper apps without using Interface Builder or anything, just a REPL. I like it quite a bit!
I’m battling Markdown right now… it always takes me by surprise.
Update: To me this looks more like a bug in Markdown than incompetence on my part. Colour me surprised.
Non-ASCII characters: set([c for c in input if ord(c) > 255])
Pasteboard: If you aren’t using lazy writing, you can use None/nil for the owner. self should work, though, if it’s an Objective-C object.
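A note on the threshold in that one-liner, as a small Python 3 sketch with a made-up sample string: strictly speaking, non-ASCII means code points above 127. Testing for `> 255` instead lets Latin-1 characters such as ä pass and flags only what lies beyond, for example Cyrillic letters.

```python
# Made-up sample: "Käse" followed by two Cyrillic capital letters.
sample = "K\u00e4se \u0417\u0410"

non_ascii = {c for c in sample if ord(c) > 127}   # everything outside ASCII
non_latin1 = {c for c in sample if ord(c) > 255}  # everything outside Latin-1

print(sorted(non_ascii))
print(sorted(non_latin1))
```

Which of the two is wanted depends on whether accented Latin letters are legitimate in the text.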
@Steffen:
Thanks for the hints. None does the trick and knowing how range() works certainly helps.
I’d still say that passing the wrong parameter to Cocoa shouldn’t be able to stall all other applications. Filed a bug on that, let’s see how it goes.
@Michael:
Yes, None does work. But zero just screws things up. I thought self should work but it gave an error message for me. Perhaps because I didn’t define my own class and ‘self’ essentially was the whole script?
In Objective-C, nil and 0 are interchangeable, but in Python None and 0 are not. PyObjC will convert None to a nil object, but it will convert 0 to an NSNumber, which is not a valid pasteboard owner.
In Python, self is not implicitly defined. If you don’t create a variable called self or have a method with self as an argument, self will be undefined.
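The point about `self` can be seen in plain Python 3, no PyObjC involved. A minimal sketch; `probe` and `Owner` are made-up names:

```python
def probe():
    """Return the error message produced by referring to an unbound `self`."""
    try:
        return self  # no such name here: neither a parameter nor a global
    except NameError as exc:
        return str(exc)

class Owner:
    def me(self):
        return self  # here `self` is simply the method's first parameter

print(probe())
owner = Owner()
print(owner.me() is owner)
```

So `self` only means something inside a method that names its first parameter that way, or wherever a variable of that name has been created explicitly.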
@Michael:
Aha, I’ll have to get used to that self thing, I suppose.
And I’d still say the whole clipboard infrastructure shouldn’t collapse even if I pass a blatantly inadequate object to it…
Here are my copy and paste functions for Python.
```python
import AppKit
import Foundation

def pbcopy(s):
    "Copy string argument to clipboard"
    board = AppKit.NSPasteboard.generalPasteboard()
    board.declareTypes_owner_([AppKit.NSStringPboardType], None)
    newStr = Foundation.NSString.stringWithString_(s)
    newData = \
        newStr.nsstring().dataUsingEncoding_(Foundation.NSUTF8StringEncoding)
    board.setData_forType_(newData, AppKit.NSStringPboardType)

def pbpaste():
    "Returns contents of clipboard"
    board = AppKit.NSPasteboard.generalPasteboard()
    content = board.stringForType_(AppKit.NSStringPboardType)
    return content
```
I then put an object around these to make using them from the Terminal more convenient. (For example, as written, pbcopy will crash if passed a non-string.)
```python
class PasteBoard(object):
    def copy(self, s):
        if not isinstance(s, basestring):
            s = repr(s)
        pbcopy(s)

    # Reading either property pastes; assigning to either one copies.
    paste = property(lambda self: pbpaste(), fset=copy)
    copy = property(lambda self: pbpaste(), fset=copy)

    def lines():
        def fget(self):
            return pbpaste().replace("\r","\n").split("\n")
        def fset(self, l):
            pbcopy('\n'.join(unicode(i) for i in l))
        return {'fget':fget, 'fset':fset}
    lines = property(**lines())

    def split():
        def fget(self):
            def _(sep):
                return pbpaste().replace("\r"," ").replace("\n"," ").split(sep)
            return _
        def fset(self, t):
            pbcopy(unicode(t[0]).join(unicode(i) for i in t[1]))
        return {'fget':fget, 'fset':fset}
    split = property(**split())
    join = split

    def words():
        def fget(self):
            return pbpaste().replace("\r"," ").replace("\n"," ").split(" ")
        def fset(self, l):
            pbcopy(' '.join(unicode(i) for i in l))
        return {'fget':fget, 'fset':fset}
    words = property(**words())

    def to_plain(self):
        pbcopy(pbpaste())

    def to_ascii(self):
        pbcopy(pbpaste().encode("ASCII", "ignore"))

    def to_nonascii(self):
        # ASCII ends at code point 127
        pbcopy(''.join(char for char in pbpaste() if ord(char)>127))

    def to_indent(self):
        pbcopy('\n'.join('\t'+line for line in pbpaste().split("\n")))

    def to_dedent(self):
        # assuming a tab counts as four spaces, to match line[4:] below
        lines = pbpaste().replace("\t", "    ").split("\n")
        lines = '\n'.join(line[4:] for line in lines)
        pbcopy(lines)

    def to_title(self):
        pbcopy(pbpaste().title())

pb = PasteBoard()
```
This can be used in the terminal like so:
```python
>>> pb.copy = 1234
>>> pb.paste
u'1234'
>>> pb.to_indent()
>>> pb.paste
u'\t1234'
>>> pb.lines = pb.paste
>>> print pb.paste
1
2
3
4
>>> pb.copy = "the quick brown fox jumps over the lazy dog"
>>> sum(1 for word in pb.words)
9
>>> pb.words = (word for word in pb.words if "e" in word)
>>> pb.paste
u'the over the'
>>> pb.paste = u"I ♥ 日本語!!"
>>> pb.paste
u'I \u2665 \u65e5\u672c\u8a9e!!'
>>> print ''.join(char for char in pb.paste if ord(char)<128)
I  !!
```
I find it convenient enough.
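The `pb.copy = 1234` trick relies on properties whose setters write to the clipboard. Below is a self-contained Python 3 sketch of just that idea, with a plain attribute standing in for the real pasteboard so that it runs without AppKit. `FakePasteBoard` and `_content` are hypothetical names, and only the `copy`, `paste` and `words` properties are mirrored.

```python
class FakePasteBoard:
    """Mimics the property-based interface with an in-memory 'clipboard'."""

    def __init__(self):
        self._content = ""

    def _get(self):
        return self._content

    def _set(self, value):
        # Like copy() in the class above: non-strings are stored as their repr().
        self._content = value if isinstance(value, str) else repr(value)

    # Reading either name pastes; assigning to either name copies.
    paste = property(_get, _set)
    copy = property(_get, _set)

    def _get_words(self):
        return self._content.replace("\r", " ").replace("\n", " ").split(" ")

    def _set_words(self, words):
        self._content = " ".join(str(w) for w in words)

    words = property(_get_words, _set_words)


pb = FakePasteBoard()
pb.copy = 1234
print(pb.paste)
pb.copy = "the quick brown fox jumps over the lazy dog"
pb.words = (w for w in pb.words if "e" in w)
print(pb.paste)
```

Assignment syntax saves a pair of parentheses at the interactive prompt, at the price of an interface where `pb.copy` reads the clipboard rather than writing to it.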