The iPhone announcement yesterday prompted a lot of criticism from technology pundits. Some of it was childish, some of it intelligent. Perhaps the new features are not enough to compete with Android phones; perhaps the hardware upgrades were not as big as they could have been; maybe it needs a bigger screen. But the pundits who accuse Apple of lacking ambition don’t understand what Apple is claiming with Siri.
I’ve been a big fan of Star Trek since I was about six years old. Say what you will about the writing, acting and special effects (all of which were frequently awful), Star Trek did not lack for technological vision. Warp drive and transporters haven’t arrived, but I think the LCARS touch interface on the ship’s computers was a risky and stunningly accurate prediction. I remember thinking how unrealistic it looked and that an interface like that could never work, mainly because there was never anything like a keyboard. Well, Apple created something along those lines with the iOS touch interface. They built an intuitive, easy-to-use touch device with always-on connectivity to a massive store of easily searchable information. They actually built something out of Star Trek. In fact, in many ways the iPad is much better than the PADDs depicted in Star Trek 20 years ago.
There was another (and older) way of interacting with computers in Star Trek: the voice interface, which was very intuitive (and sometimes unrealistically psychic). It made sense the instant you saw it used: you ask the computer for what you want, and it does it for you. Simple, easy, and impossibly hard to actually build. With Siri, Apple is claiming that it is building something else from Star Trek: you talk to the computer, and it does what you want. That is not unambitious. That is not small. If they succeed (which I’m somewhat skeptical of), it will be an enormous step forward, and a fitting memorial to Steve Jobs, whose company would then have revolutionized the way we interact with computers three times instead of the already unbelievable two (the GUI and touch).*
Personally, I doubt that Siri will work as well in practice as it does in the demo and the advertisement. AI is easy to demo beautifully, but it can fail in an enormous number of small and highly visible ways. Once you step outside a narrow set of expected queries, it often fails completely.
I’ve seen two ways of getting around this. The first is to work diligently for years or decades until the technology becomes good enough to use in restricted (though still impressive) domains. That’s the approach voice recognition and OCR have taken so far. A lot of call-center menus are handled with voice recognition. It works OK, and it’s certainly easier than sitting through a list of six slightly different choices, trying to keep track of which number was closest to why you’re calling. Thanks to OCR and handwriting recognition, I no longer have to fill out deposit slips at ATMs. The ATM scans the check, enters the amounts, and all I have to do is press OK. At some banks you can even do it with your phone’s camera. I’ve never seen it fail, but it’s a pretty restricted domain.
The other way to get around fundamental limitations in current AI and machine learning technology is clever UI design and lowered expectations. For example, Google’s search engine now achieves comparatively high precision, but it’s still extremely far from perfect. Google has succeeded by (1) benefiting from low expectations** (AltaVista was awful) and (2) showing you a long list of results, some of which are hopefully relevant to your query. If you see an irrelevant result, it’s quick to skip it and move on. Now, with Google Instant, they’re showing you even more results per query, and I generally stop typing once it shows me a relevant one. It looks as though their precision is high because what I want is usually the top result, but they may have flashed 5-10 top results before I stopped looking. Then they have 10 more tries after I’m done typing the query before I hit the “boy, this isn’t working” stage and either refine the query or click through to page 2. Similarly, StumbleUpon lets you “stumble” to a new link. The recommendation is likely to be good even if it isn’t very topical, because otherwise nobody would have recommended it, and if you don’t like it, the next page is a click away. The whole process takes a few seconds at most, and you only remember the hits, not the misses.
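To put rough numbers on why “more tries” hides mediocre accuracy, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each result shown is relevant with the same probability p, independently of the others; real search results are neither independent nor equally good, and the p = 0.3 below is a made-up figure, not anything Google has published. Under that assumption, the chance that at least one of k results is useful is 1 - (1 - p)^k. This is not precision in the strict sense; it is the chance that the user walks away with a hit.

```python
def chance_of_a_hit(p: float, k: int) -> float:
    """Probability that at least one of k results is relevant.

    Toy model: each result is assumed to be relevant with the same
    probability p, independently of the others.
    """
    return 1 - (1 - p) ** k


# Hypothetical per-result relevance of 30%, for illustration only.
for k in (1, 5, 10, 20):
    print(f"{k:2d} results shown: {chance_of_a_hit(0.3, k):.2f} chance of a hit")
```

With those made-up numbers, a single answer is useful only 30% of the time, but ten results contain something useful about 97% of the time. A voice assistant that gives one spoken answer gets only the first of those odds.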
I don’t think speech recognition is good enough yet for the “just be good” solution, at least not without spending a lot of time training the recognizer for each specific user. If Apple’s recognizer runs on the phone rather than on beefier servers like Google’s and Microsoft’s, it will be further limited, and Siri’s domain is probably too broad for it to succeed often enough.
Apple has struggled with the second approach before, in its autocorrect interface for text. Indeed, it’s so bad that it’s a meme. I get frustrated by it at least once a day (almost every time I type something), even though it objectively makes typing on the device easier. It just feels like it makes it harder. Similarly, unless Siri is extremely good at handing control back to the user when it makes a mistake, users will feel like it does a bad job, even if it is actually quite helpful.
Still, even if it works poorly, so did the first iPhone, and that was revolutionary.
* Yes, these were developed by others first. Apple either built them for consumers (the GUI) or built them right (touch).
** They also introduced “I’m Feeling Lucky”, which implies that by going with their top result, you are rolling the dice. It might work out; it might not.
Update: Hijinks Ensue, as usual, nails the idiocy of the technopunditocracy.