Fog Creek Software
Discussion Board




A big usability leap for browsers?

Probably for middle aged people/elderly who are not computer savvy, this is going to be good. My father would love a program where he can say "Show My E Mail" and it does it for him. Opera has not got there yet. But one day it will.

http://timesofindia.indiatimes.com/articleshow/578308.cms

Karthik
Sunday, March 28, 2004

There is a lot of voice command software which does that.

The problem is: it doesn't work very well.

MX
Sunday, March 28, 2004

And really awkward to use, unless it can also respond to "zara us *email* ko *open* karke padna" or even worse, "andhey *email*-ai  *open* panni padi"

Regards

KayJay

kayjay
Sunday, March 28, 2004

And what happens when someone walks past and shouts out the URL of an adult site while your boss is in your office talking to you about the future of your job?

Paul
Sunday, March 28, 2004

I remember playing with voice control software on Windows 3.11 back in 1993; it was included with my sound card. (Can't believe it cost $300 back then for a 16-bit Sound Blaster.)

It was the kind where you "trained" it on a phrase, and then you could use that phrase to open programs. It was quite good, but I never found it quicker than actually double-clicking on an icon. And you kind of felt stupid sitting at the computer saying "word" or "excel".

Matthew Lock
Sunday, March 28, 2004

In about 1985 Apricot produced the Portable, a landscape-format LCD screen and computer in one box, with an integrated microphone about the size of a pencil.

One of the marketing guys I know and worked with had the job of doing the promotional video. He built a batch file that launched VisiCalc with a spreadsheet and ran a chart, and associated that with the phrase 'Give me the sales figures'.

For years afterwards people would taunt him with 'give me the sales figures'.  But it did work.

So voice commands for specific actions have a long history. The complication turned out to be not so much background noise (which can be compensated for) as moving from single discrete commands to what amounts to a conversation.

Simon Lucy
Sunday, March 28, 2004

>>"for middle aged people/elderly who are not computer savvy"

Translation:  people who are too lazy to spend a little time learning how to use a mouse.

Attempts at voice-recognition have been around for a really long time and still don't work well.  The truth is, voice-recognition only appeals to a few people -- particularly corporate executives who view typing on a keyboard as menial clerical work that is beneath them.

My Cousin Vinniwashtharam
Sunday, March 28, 2004

We've spent thirty years moving from command line programs to GUIs. Now they want to go back to command line (okay, you say it instead of typing it, but you still need to remember all of the commands and enter them correctly).  That's better than a menu/icon/hyperlink design?

This might have a limited market for data collection, for example a lab technician speaking the results of measurements when their hands are full or they are wearing gloves. Other than that it seems like a solution in search of a problem.

Anony Coward
Sunday, March 28, 2004

"Canned phrase" voice recognition has a very real niche among users who don't have hands free or can't look at a screen (drivers come immediately to mind).

For general use, voice recognition isn't going to sell until it can just plain work and we can dictate letters or emails with 99.999% accuracy. And of course then there will be even MORE pressure to give workers private offices. [grin]

Philo

Philo
Sunday, March 28, 2004

The voice tags on my mobile work about 60% of the time. And I sure look stupid the other 40%.

Stephen Jones
Sunday, March 28, 2004

"give workers private offices"

That's exactly why companies won't adopt voice software.  Large companies are pretty committed to cube farming, and unless you are in the hopefully-soon-to-be-dead industry of telemarketing and therefore talking all the time (and therefore used to filtering out your neighbors who are also talking all the time), it's really tough to get useful work done surrounded by people talking.

This makes me realize that for it to be useful it not only has to recognize voice a very high percentage of the time, it has to be able to do so in a situation with a lot of other people talking nearby.  Humans are pretty good at it, but until dictation software can follow the nearest speaker and ignore the three nearest neighbors (and beyond) it is pretty much screwed.

Aaron F Stanton
Sunday, March 28, 2004

One important market for voice-driven software consists of people who can't type due to physical disabilities.

For example, my wife used voice-recognition software to keep her programming job when she had wrist problems. Though it was rather frustrating for her to write Perl code in emacs by voice.

Julian
Sunday, March 28, 2004

Some research has shown that speaking uses auditory memory, which is in the same space as your short-term and working memory. It's hard to speak and think at the same time.

See:
http://www.washingtonpost.com/ac2/wp-dyn?pagename=article&node=&contentId=A56499-2002May8&notFound=true

While speech may help blind and disabled people interact with computers, it's unlikely to become the dominant way people connect with them.

From the article, University of Maryland's Ben Shneiderman says of speech recognition, "It's the bicycle of user interfaces. It gets you there but it's not going to carry the heavy load that visual interfaces will."

Michael Bean
Sunday, March 28, 2004

Repeat after me:

Speech recognition is not Speech comprehension.

If TELLING your computer what to do were a great idea, why haven't command line interfaces (where you TYPE what you want) been popular?

Because it's damn difficult to remember the names of all of this stuff.

What people THINK they're getting is speech *comprehension* where the computer will UNDERSTAND them. But, of course, we're nowhere near that. In fact, it's not clear that we even know how language comprehension works.

Mr. Analogy
Monday, March 29, 2004

Mr. Analogy, that's precisely the point. Hence my emphasis on "email" & "open". I find it a bit strange that while Search Technology is advancing at so rapid a pace and with increasing sophistication and effectiveness, all depending on the notion of "Keywords", Speech Interface research is moving in the opposite direction, IMO.

I believe it would have been well worth the effort to focus on recognition accuracy on known Keywords (Pauses, Nouns, Verbs & Conjunctions) and disregard the rest of the spoken sentence as noise, rather than increasing the database with prepositions, conjugations, suffixes and other such nuances. In the email example, "blah blah bleh bloh Email blih bluh blah Open O bla di O bla da...." should make considerable sense in any application under any environment, for I cannot conceive of a situation where one Emails an Open.
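
To make the keyword idea concrete, here is a minimal sketch in Python. It assumes the recognizer has already turned the audio into a rough word transcript, which is the hard part and is not shown. The vocabularies and the function name `spot_command` are made up for illustration and do not come from any real speech product.

```python
import re

# Hypothetical vocabularies: the only words the system tries to recognize.
NOUNS = {"email", "browser", "spreadsheet"}
VERBS = {"open", "close", "print"}

def spot_command(transcript):
    """Return (verb, noun) if the transcript contains one of each, else None.

    Every word outside the two vocabularies is treated as noise,
    and word order is ignored.
    """
    verb = noun = None
    # Split into lowercase alphabetic tokens, dropping punctuation and markup.
    for word in re.findall(r"[a-z]+", transcript.lower()):
        if verb is None and word in VERBS:
            verb = word
        elif noun is None and word in NOUNS:
            noun = word
    if verb and noun:
        return (verb, noun)
    return None

print(spot_command("blah blah bleh bloh Email blih bluh blah Open O bla di O bla da...."))
print(spot_command("zara us *email* ko *open* karke padna"))
```

Both calls pick out the same verb and noun regardless of the surrounding words or their order. A sentence with no known verb and noun pair returns `None`, so the application can simply do nothing.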

This would still have the limitation of using English words interspersed in a sentence constructed with other languages. But that has not stopped the growth of Internet and of Computing, in general.

Regards

Kaushik Janardhanan

kayjay
Monday, March 29, 2004

"Translation:  people who are too lazy to spend a little time learning how to use a mouse."

Not everyone has the manual dexterity necessary to operate a mouse, either because of some sort of disability or because they are elderly, or for any number of reasons.

However, as Mr Analogy said, the problem is not that speech interface is a bad idea, it is that what people want is speech comprehension, and we are nowhere near that.  People want speech interface to be like what it is on Star Trek, when in fact right now the best we can do is design a command interface that operates on speech instead of typing.

MikeMcNertney
Monday, March 29, 2004

And after you finish developing in New York you've got the New York accent down, but it can't understand Bostonians, can't understand Southerners, can't understand Canadians, and don't even get started on the British, Scottish, Irish, Australian and South African accents, people who speak English as a Second Language, people who have lisps or other speech defects/abnormalities, hybrid accents.

I think it's fair to say speech recognition is at least NP-complete. :)

If you want speech comprehension then you have to account for all the local idioms on top of it....  Not to mention that the language is fluid and everything will be different in a while.

For speech processing to work, people will have to learn a way of talking to machines, just like I've learned to type a certain way even though it's an unnatural way of writing.


Monday, March 29, 2004

Speech recognition will happen, but only when it stops being a botched command line over a windowing system. A good speech recognition system needs an OS designed for it (and a whole load better recognition and comprehension than we can manage right now).

The noisy office thing is a red herring, easily solved by using head-worn close-pickup microphones.

Mr Jack
Tuesday, March 30, 2004
