Fog Creek Software
Discussion Board




Voice Recognition - In the Corporate World?

Where do you see the advantage of voice recognition? Our cubicles are already noisy; with voice recognition they will be much louder.

In a home office, where you play music, your girlfriend watches TV, etc., you will add still more noise.

On public computers, e.g. in a railway station, the background noise is probably too loud.

na
Wednesday, December 17, 2003

In the army, I saw throat mikes that didn't require you to actually speak, only to make the movements. They worked quite well, didn't have background noise problems (think about the noise in a tank or beside an artillery battery, for example), and didn't make you lose your voice. And it ain't rocket science. Maybe it is a way forward?

i like i
Wednesday, December 17, 2003

Voice recognition generally sucks.  You'll get your flying car before you'll get decent voice rec.

As an experiment, try the sci-fi fantasy of voice-controlled house lights.

As you turn the light on, say "House, lights on" and pretend the computer did it.  The flaws start to show up fairly quickly.

Examples: when putting a small child to bed, yelling "House, lights off" is not a good move.

Leaving the house at night, you turn the porch light on and the hall light off on your way out: "House, light on, light off, no dammit, light on, not that one..."

Of course, when you say "Light on", the computer should say "Which one?", given the number of lamps and such about.

All that even before you try watching TV!

AJS
Wednesday, December 17, 2003

i like i: do you have a link for that army system?

Sounds like mouse gestures (mouth gestures?)

AJS
Wednesday, December 17, 2003

One will not be coding with voice recognition any time soon.  We use voice recognition software for our executives who are often writing sales bits or marketing literature.  It allows them to write as fast as their brain can generate the appropriate phrasing, and it seems to really work with the way that they approach the computer.  Previously they would think up an idea, say it, then try to write it down.  Holding the message in the brain when the wording was very carefully chosen by how it hits the ear is a difficult thing to do, especially on long documents.  This seems to be a natural fit.

As for the noise level, if you've ever been to a claims processing area or a help desk you know that there's a certain hum generated by many human voices.  It isn't distracting, but it can be annoying.  Of course none of us spend all day talking, we read, process, attend meetings, etc.  I suspect we'll see more voice recognition software for creating documents in the future, but that the amount of processing will not be more cumulatively noisy than the standard clacking of keys in a cube farm.  If anything the human voice can be a bit more pleasant.

Lou
Wednesday, December 17, 2003

Sorry, no link for that specific army system; I have no idea who made it. Throat microphones are everywhere in that kind of environment; this isn't top secret stuff!

http://www.google.co.uk/search?q=throat+microphone

i like i
Wednesday, December 17, 2003

I see the advantage in reducing the number of transcriptionists we have to keep on the payroll for our doctors.

Of course, that is a very specialized use, but it does demonstrate a point.  7 years ago, voice recognition didn't even work very well for medical uses.

Your point about home automation is kinda false though.  In your example the problem isn't with the voice recognition, it is with the intelligence of the system that is using the voice recognition.

Steve Barbour
Wednesday, December 17, 2003

"I like I"
I believe you are mistaken about throat microphones. What they do is capture the throat vibrations on the skin as they propagate through your neck, before the sounds are actually emitted. You still have to speak, albeit in a low voice. The advantage is that the noise level is very small compared with the surrounding environment.

coresi
Wednesday, December 17, 2003

I used voice recognition for a few years due to a repetitive strain injury. Background noise, even fairly loud noise, was never a problem; the mic doesn't pick it up or the software filters it out. Loud intermittent noises like a dog barking in the next room do get picked up but if it's a noise that will occur regularly you can easily train the software to ignore it. I trained it to ignore my coughs and sneezes when I was sick, for example.

David Pogue had a very funny article on voice-recognition software (which he uses to write all his computer books and his New York Times columns); I remember he described how he experimented with different sounds to see how the software transcribed them: He gargled some Calistoga water in his mouth and the words "equipment equipment equipment" appeared on the screen.

Having worked as a journalist in a busy newsroom, and having jobs before that where I worked with lots of people talking, I think it's fairly easy to tune out the sound of voices around you.

The one instance where voice recognition is likely to be most problematic is when you're writing a confidential document in a public place. Obviously you wouldn't want your neighbors to hear what you're dictating.

Brad
Wednesday, December 17, 2003

AJS - you're thinking too narrowly. Not "house, lights on" but rather "I'm home" runs a macro to set the lights, turn on the TV, etc. Then "I'm leaving" does the reverse. "I'm up" turns on lights, starts the coffee, and gets the shower running.

It's just like smart remotes - you're wasting your money if you duplicate the buttons on your old remotes. You should set things like "Watch the game" which turns on the right network, sets your home theater sound levels, turns off the radio, etc.
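
A minimal sketch of the phrase-to-macro idea, in Python. The phrases, device names and the `handle_utterance` helper are all hypothetical; a real setup would drive actual hardware (X-10 modules, say) rather than append strings to a log, and it assumes the recognizer has already turned speech into text.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for real device control: each command is just recorded.
log: List[str] = []

def action(description: str) -> Callable[[], None]:
    """Return a function that records one device command when called."""
    return lambda: log.append(description)

# Each spoken phrase names a goal and maps to a sequence of device commands.
MACROS: Dict[str, List[Callable[[], None]]] = {
    "i'm home": [action("hall light on"), action("tv on")],
    "i'm leaving": [action("tv off"), action("hall light off"), action("porch light on")],
    "i'm up": [action("bedroom light on"), action("coffee maker on"), action("shower on")],
}

def handle_utterance(text: str) -> bool:
    """Run the macro whose phrase matches the whole utterance; ignore anything else."""
    steps = MACROS.get(text.strip().lower())
    if steps is None:
        return False
    for step in steps:
        step()
    return True
```

Calling `handle_utterance("I'm up")` runs all three steps in order. Because matching is on the whole utterance, a longer sentence that merely contains a trigger phrase does nothing, which avoids one class of accidental triggers at the cost of making people say the exact phrase.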

Another use - UPS drivers can simply say "delivering package, 123 Main Street." I'm sure it can go on...

As for office use, I think voice recognition is square peg/round hole.

Finally - the throat mike thing is called "subvocalization" - you speak without projecting. :-)

Philo
Wednesday, December 17, 2003

Speaking of running macros -- when I used voice recognition on my computer I could just say "Get my e-mail," which launched a macro that dialed up my ISP, logged me in, got all my e-mail, logged me out, hung up the modem, and opened my first message. Sweet. Then I could go through my e-mails without touching my keyboard by saying "delete message" or "next message" or "reply" etc.

Voice recognition also came in handy for editing: I could say "select sentence" or "select paragraph" or "delete word," "delete paragraph," "go to top of page," etc. instead of using keyboard commands or mousing through menus.

Brad
Wednesday, December 17, 2003

"coresi"

Well yes, talking so quietly you can't hear it yourself and not actually talking is quite a tricky distinction ;-)

Not only does a throat mike give you a low profile for SWAT in your officespace, it might also simplify the voice recog itself?

On a related note, you can play audio into the ear by bone vibration rather than using real sound waves as intermediaries. So here we have a pair of technologies that might mean audio interaction in your booth neither disturbs others nor picks up much interference from them..?

i like i
Thursday, December 18, 2003

I tend to agree with most of what people have said. Voice rec, like the early laser, is a solution without a problem. Unlike the laser, voice rec doesn't work all that well, and there aren't many problems for it to solve.

There are a few niches. TI had voice rec on an office PC in the early 1980s.

Steve Barbour has a point when he says the computers aren't smart enough. Natural language recognition still has a long way to go.

I still think voice-controlled houses are dumb. Make the house smart. I can't find the link to the guy with the AI-controlled house that eventually figured out his patterns, running the heat, lights, hot water, etc.

As for the macro/smart remote idea, only programmers like macros. Woz (Apple) once built a smart remote system called CORE. All evidence of its existence has been destroyed.

Macros attempt to join a bunch of crap together, rather than standardising the crap. Remember how TVs had twin tuners, UHF/VHF? Notice how they have vanished? They haven't; TVs just got a better user interface.

Alan Cooper suggested a continuation of this, where all inputs become a channel. Your remote no longer has TV/VHS/SAT/DVD modes. You just keep flicking thru channels, which include TV, cable, AM/FM radio, your turntable, tape deck, CD, TiVo, iPod, etc. Cable TV does this anyway by having radio stations...

The UPS delivery is solved by GPS.  Stand on doorstep, house details come up automatically, press ok.
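
The doorstep lookup could be sketched in Python as a nearest-stop search over the day's route using great-circle (haversine) distance. The route data, the 30 m radius and the function names are hypothetical; a real system would also have to cope with GPS error and apartment buildings.

```python
import math
from typing import List, Optional, Tuple

Stop = Tuple[str, float, float]  # (address, latitude, longitude) in degrees

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def stop_at_doorstep(lat: float, lon: float, stops: List[Stop],
                     max_distance_m: float = 30.0) -> Optional[str]:
    """Return the closest scheduled stop's address if it is within max_distance_m."""
    if not stops:
        return None
    address, slat, slon = min(stops, key=lambda s: haversine_m(lat, lon, s[1], s[2]))
    return address if haversine_m(lat, lon, slat, slon) <= max_distance_m else None
```

The driver then only confirms the suggested address ("press ok") instead of reading it out or typing it.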

AJS
Thursday, December 18, 2003

Voice recognition is great for those who either haven't learned to type (plenty of bosses and most occasional letter writers), for those who might be able to type but are not in a physical situation to do so (doctors and field engineers for example) and for those who have spent so long typing in bad postures or on laptops that they have developed RSI.

The main problem is that it was released as an immature technology. The result was that even a poor touch typist such as myself could type a lot faster than he could use VR. Even now, when VR reaches speeds of 50 wpm, the speed of a non-professional typist, the interruptions to flow it causes are such that an experienced touch typist still finds it easier to type.

Stephen Jones
Thursday, December 18, 2003

I build speech applications for a living, and it's a major industry. They're most useful in telephony installations, since they're much more user-friendly than touch tones and cheaper than hiring call center agents.

Recognition of arbitrary speech still doesn't work all that well. However, when you have a restricted grammar of utterances that the caller might say, such as numeric values or discrete choices, speech recognition has high accuracy.
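
To illustrate why a restricted grammar helps, here is a rough Python sketch that snaps a noisy recognizer hypothesis onto the closest phrase a prompt allows, using the standard library's `difflib`. The menu phrases and the 0.6 cutoff are assumptions, and real speech engines apply grammars during acoustic decoding rather than as a text post-pass, so treat this only as an analogy.

```python
import difflib
from typing import Optional

# Hypothetical grammar for one prompt: the only phrases a caller may say.
MENU_GRAMMAR = ["check balance", "pay bill", "report lost card", "speak to an agent"]

def match_in_grammar(hypothesis: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a raw hypothesis to the most similar allowed phrase, or None if nothing is close."""
    matches = difflib.get_close_matches(hypothesis.lower(), MENU_GRAMMAR, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

`match_in_grammar("pay bull")` comes back as `"pay bill"`, while input that resembles nothing in the grammar returns `None`, which is where an app would reprompt the caller.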

As an aside, my wife had hand problems for a while. She ended up using a speech recognition package to write Perl in emacs. It was frustrating for her, but at least she stayed employed.

Julian
Thursday, December 18, 2003

Apart from insufficient software quality (which will be solved soon, I think), the big problem with voice-controlled computers is that their UIs just aren't designed for it. Running a system designed for a mouse with voice commands is horribly unwieldy.

I predict the next big OS shift will come not from Linux, but from whoever comes up with a really functional voice activated system.

Mr Jack
Thursday, December 18, 2003

"Sounds like mouse gestures"

My computer doesn't understand my mouse gestures. When I throw my mouse down on the desk it should damn well realise I am frustrated and start working properly again.


Thursday, December 18, 2003

""I'm up" turns on lights, starts the coffee, and gets the shower running. "

"Hey, I'm up for some Chinese take out, anyone else want some?"

Oops.

Jim Rankin
Thursday, December 18, 2003

Another thought experiment is to imagine a world where everything works on voice recognition.  Then someone invents the "switch".

"You mean I can turn the lights on without saying a word, I just need to push a button on the wall?  Wow!"

As a geek, it's easy to think people want to write macros to organize their day around. In reality, most people want the shower to start when they turn the knob on the faucet. Direct manipulation has a lot going for it as an interface.

Another way to look at it:  what do voice recognition macros have going for them over the clapper?  Clapper's cheap, been around for a while.  Still know a lot of people who like their light switches, though.

Jim Rankin
Thursday, December 18, 2003

Julian: I build speech applications for a living, and it's a major industry. They're most useful in telephony installations, since they're much more user-friendly than touch tones and cheaper than hiring call center agents.

Yeah, everyone is happy bar the users.  Good story in the paper the other day.  Woman rings a hospital Sexual Assault line complaining she can't get the top off.  Confusion follows.  What, this isn't the Saxa Salt company?

To be honest, I prefer touch tone. I know my credit card number off by heart (from testing credit card IVR apps), so muscle memory kicks in. Look at SMS vs voicemail.

Anyway, as previously mentioned, the computer ain't smart enough.  Even if it was, light switches are cheap and rarely crash.

AJS
Thursday, December 18, 2003

Well, in the case of a light switch, it doesn't have to be an either or situation.  I already have something similar in my home using X-10 light switches.

I can control them via remote (but seldom do), from the switch (almost always) or from the computer (useful for appearing to be home when you are not).

I would presume that by the time we get around to wiring a central computer, with voice recognition throughout the house, we would still know how to make it work with manual switches.

BTW, I wouldn't recommend doing the x-10 switches thing unless you're just interested in playing with it or you are away from your home on trips often.  We hardly use ours anymore, but I did it more for the gee whiz factor.

Steve Barbour
Thursday, December 18, 2003

Voice recognition in the house is a gimmick.

But as a replacement for typing it definitely has its place.

The problem I see is that typing is slower than thought, so we get time to organize our thoughts. With speech that isn't so, and the result is likely to be prose that, missing the intonations of speech, is naggingly unsatisfactory.

Stephen Jones
Thursday, December 18, 2003

Jim, first of all, nobody is going to make you go with a voice-controlled house, so don't worry about it. Secondly, in the case of "I'm up for some Chinese", let me suggest how easy it is to simply check for inflection and for silence immediately before and after the command. As for light switches, as someone pointed out, nobody's taking them away.
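
The silence half of that suggestion could look roughly like this Python sketch. It assumes audio has already been chopped into short frames with one loudness value each, and that the recognizer reports which frames the command occupied; the threshold and gap length are made-up values, and checking inflection is not attempted.

```python
from typing import Sequence

def is_isolated_command(energies: Sequence[float], start: int, end: int,
                        threshold: float = 0.1, min_gap: int = 5) -> bool:
    """Return True if frames [start, end) are preceded and followed by silence.

    energies: one loudness value per frame (hypothetical units).
    A frame counts as silent when its energy is below `threshold`.
    """
    if start < min_gap or end + min_gap > len(energies):
        return False  # not enough audio on one side to judge
    before = energies[start - min_gap:start]
    after = energies[end:end + min_gap]
    return all(e < threshold for e in before) and all(e < threshold for e in after)
```

With five quiet frames on both sides of a three-frame command the check passes; if the trigger phrase is immediately followed by more speech, as in "I'm up for some Chinese", it fails and the macro is not run.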

"Alan Cooper suggested a continuation of this, where all inputs become a channel.  Your remote no longer has TV/VHS/SAT/DVD modes.  You just keep flicking thru channels, which includes TV, Cable, AM/FM radio, your turntable, tape deck, CD, Tivo, IPod, etc.  Cable TV does this anyway by having radio stations..."

Heh. Cooper screwed up. I have something like 200 channels on my TV. Do I have to flip all the way to 201 to get to my DVD player? No? Just type '201'? How about just hitting the 'DVD' button?
...because that's the way it works now... [grin]

Philo
Thursday, December 18, 2003

The real problem with voice recognition is that you need to consider a lot more than simple signal to noise.  We are really talking about interacting in general with a computer using voice.  Even if the human is the only one talking.

And of course, voice interaction is a multi-layered topic, spanning audio and voice quality, recognition, prosody, semantics, and all the psychological stuff that pops up when you are working with another person (or a computer as social actor), such as attraction, presence, etc.

Since I think it's still just a matter of time for good voice recognition, and I'm generally uninterested in the mechanics of getting a computer to recognize a human, I'm much more interested in the user experience.

Humans are speech and language machines.  Our brain is optimized for speech and language. Like many things our brain is optimized for, other systems are well connected to the speech system, like emotion.  Anything you talk at or to or that talks to you will engage a bunch of unconscious psychological responses.

For example:

You use the same part of your brain as working memory to deal with speech.  Typing uses some other mysterious part, so you have more neurons to think and type with.  Working with a voice interface, especially if you are the one talking, automatically cuts down on your total processing power.  Also, some applications, like writing code, don't really lend themselves to a conversational workstyle.  When was the last time you talked about your code, line by line, verbally, in the office?  Chances are you dragged the other half of the discussion to a computer monitor and let them read it, pointing out the issues.

Voices, computer or human, automatically engage your emotions.  If you are extraverted, you will like faster, louder voices better (again, computer or human), and dislike soft, slower voices.  You will find male voices more authoritative, trust the content more, be more persuaded -- especially if YOU are male. 

There is a bunch of very interesting research going on in the Stanford Communication department. See a good overview slideshow at http://www.stanford.edu/group/SRCT/comm169/lectures/536,28,Voice Interfaces.  Start with slide 28. 

An interesting discussion of voice design:
http://www.acm.org/ubiquity/interviews/b_kotelly_2.html

For an example of voice personalities:
Northwest agent - efficient and concerned: 1-800-225-2525, press 1 for flight status  babe

Compass bank - stoned: 1-800-239-4357

Laurie Kleiner
Thursday, December 18, 2003

Cooper's idea is the same as Gates's: that the user shouldn't care whether he is accessing the local machine or the internet.

After all, with Unix you never need to know where your home directory physically is.

The problem, of course, is that users do want to make the distinction between the internet and their local computer. We don't want the porn sites we accessed late at night to be in the favorites menu of our word processor in the morning; we know that our computer is safe from malicious code, but need strict security when accessing the internet; and the difference in bandwidth is important. I want to know whether the PowerPoint presentation I'm looking at is going to snap from slide to slide because it's on my hard disk, or take ages to transition because I have to download it over the telephone line.

With the TV it's even more obvious, though I do wish I didn't have to have five remotes on the coffee table (two satellite receivers, the TV, the video and the DVD player). Or at least that when I change channels on one satellite receiver it wouldn't change the channels on the other one at the same time!

Stephen Jones
Thursday, December 18, 2003

You often encounter lousy voice user interfaces (VUIs), since VUI expertise is much more scarce than GUI expertise. In general, a speech app with a first-rate VUI is a lot easier to use than a touch-tone app.

Julian
Thursday, December 18, 2003

Julian: You often encounter lousy voice user interfaces (VUIs), since VUI expertise is much more scarce than GUI expertise... GUI? On IVR? Have you got a videophone? (Sorry! If you ever want to be bored, test IVR apps.)

Philo: How about just hitting the 'DVD' button? ...because that's the way it works now... [grin]

What Cooper was really suggesting was a way to replace the handful of remotes that Stephen Jones mentioned.  Another box (PC?) you plug all your gear into, including whatever replaces DVD (that your remote doesn't have a button for :-) )

Rather than a DVD button, you have a Movie button, that selects whatever device plays movies.  PnP for entertainment.  Use firewire or USB.
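
A rough Python sketch of the "Movie button" idea: devices advertise what they can do, and the button picks a capable device. The capability names, the `has_media` flag and the preference rule are all assumptions; nothing here models FireWire or USB discovery.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Device:
    name: str
    capabilities: Set[str] = field(default_factory=set)
    has_media: bool = False  # hypothetical: a disc is loaded or a stream is ready

def pick_device(devices: List[Device], capability: str) -> Optional[Device]:
    """Return a device that can do `capability`, preferring one with media ready."""
    capable = [d for d in devices if capability in d.capabilities]
    for d in capable:
        if d.has_media:
            return d
    return capable[0] if capable else None
```

A new kind of player only has to declare `"movie"` among its capabilities for the existing button to find it, which is the point of not having a DVD button.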

Bruce Springsteen was right.  57 channels and nothing on.  Voice rec still sucks.

AJS
Friday, December 19, 2003

I've always blamed Star Trek for promoting voice rec.

I've a friend who fixes lifts.  The most common thing building managers want to do is make the lift doors open and close faster, 'you know, like on Star Trek'

AJS
Friday, December 19, 2003

When mentioning GUIs, I was simply pointing out that a lot more people can design a decent web site than can design a decent speech app.

Julian
Saturday, December 20, 2003

That's all right, I just thought I'd make a crap joke.

Dunno about good web designers though...  With IVR, the quality of the audio tends to be the major point.  Gotta stop them sounding like robots, especially reading out numbers.

AJS
Sunday, December 21, 2003
