Fog Creek Software
Discussion Board




Mars Rover Software Problems

I've heard on the news lately about software problems with the Mars rover. According to NASA, they're slowly resolving the problems.

I'm curious, though, if anyone here knows any more detail about these "software problems". All of the reports that I've heard have been very vague.

Benji Smith
Monday, January 26, 2004

They had a problem reading from and writing to the on-board flash memory. While they were trying to determine whether it was a software or a hardware problem, they put the rover in a mode where it would use only its on-board RAM instead of the flash memory. The problem has since been attributed to the software, which can be patched from millions of miles away.

SG
Monday, January 26, 2004

I don't know specifically about the problems, but while watching the Nova episode dedicated to this project, they said that the software is more complex than the hardware, and that they expected they'd have to patch the software while the rover was on Mars.

Since Earth & Mars are only near each other once every... is it 26 years, you don't get this opportunity often.

www.MarkTAW.com
Monday, January 26, 2004

It's a problem with the flash memory that led to the onboard computer continually rebooting itself.  I believe I read this on the NASA website.


Monday, January 26, 2004

I believe it's every 26-27 months that Earth laps Mars in its path around the sun.
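
For anyone who wants to check the interval, here is a minimal Python sketch. It assumes circular, coplanar orbits and rounded sidereal periods (365.256 days for Earth, 686.980 days for Mars); the function name is only for illustration.

```python
# Time between successive Earth-Mars close approaches (the synodic period).
# Assumption: circular, coplanar orbits and rounded sidereal periods.

EARTH_PERIOD_DAYS = 365.256
MARS_PERIOD_DAYS = 686.980
DAYS_PER_MONTH = 365.25 / 12  # average calendar month


def synodic_period_days(inner_days: float, outer_days: float) -> float:
    """Days for the faster inner planet to gain one full lap on the outer one."""
    return 1.0 / (1.0 / inner_days - 1.0 / outer_days)


if __name__ == "__main__":
    days = synodic_period_days(EARTH_PERIOD_DAYS, MARS_PERIOD_DAYS)
    print(f"{days:.1f} days, about {days / DAYS_PER_MONTH:.1f} months")
```

That works out to roughly 780 days, or about 26 months.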


Monday, January 26, 2004

It's the system admin's fault. He should have applied SP4 before launch.

B.Y.
Monday, January 26, 2004

/var got filled up.  Or the moral equivalent, since I doubt they're using Unix.  As an aside, I am deathly curious about the software architecture of the rovers ...

Debugging a system in which you have a 20-minute lag and can only communicate with the rover for a few hours a day has got to be pretty extreme, too.  I wonder how it's done ...
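
The lag itself is just light travel time, so it can be estimated from the distance. A rough Python sketch, assuming the Earth-Mars distance ranges from about 0.37 AU to about 2.67 AU (approximate figures, not the actual distance on any particular date); the function name is made up for illustration.

```python
# One-way signal delay between Earth and Mars at a given distance.
# Assumption: the signal travels in a straight line at the speed of light.

SPEED_OF_LIGHT_KM_S = 299_792.458
KM_PER_AU = 149_597_870.7


def one_way_delay_minutes(distance_au: float) -> float:
    """Minutes for a radio signal to cover distance_au astronomical units."""
    return distance_au * KM_PER_AU / SPEED_OF_LIGHT_KM_S / 60.0


if __name__ == "__main__":
    # Approximate closest and farthest Earth-Mars distances (assumed values).
    for label, au in (("closest", 0.37), ("farthest", 2.67)):
        print(f"{label}: {one_way_delay_minutes(au):.1f} min one way")
```

A command-and-response round trip takes twice the one-way figure, before counting any time the rover spends doing the work.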

Alyosha`
Monday, January 26, 2004

You're right, 26 months not years, though last year Mars was closer than it's been in over 60,000 years due to the elliptical orbits of both planets.

www.MarkTAW.com
Monday, January 26, 2004

I doubt that Mars has to be close to us though to relay data.  Like they are going to spend 3 weeks getting it back to normal.  They spend 3 days making it move one meter, so I don't feel like they are in a rush to do things.

I don't understand why they didn't just build two and land them together.  Isn't all the money spent on R&D and software?  The hardware should be a small portion of that.  I realize they have the other one on the other side of Mars now, but it would have been more useful if there was a second rover taking pictures of the first.

Anyway, I doubt they'll get it up and running again.  The Martians probably smashed it up pretty good and have already retreated to their underground caverns.

Michael H. Pryor (fogcreek)
Monday, January 26, 2004

I would imagine you spend 20 hours coming up with ideas and ways to test them, and 4 hours desperately trying to communicate with the rover. Then another 20 figuring out what the results you're getting mean, and so on.

www.MarkTAW.com
Monday, January 26, 2004

"it would have been more useful if there was a second rover taking pictures of the first"

Yes, because they would've gotten so much more scientific data by having both rovers exploring the exact same area.

www.MarkTAW.com
Monday, January 26, 2004

Not to mention the insight that we could gain by studying photos of Mars Rovers, taken by Mars Rovers.

Benji Smith
Monday, January 26, 2004

Alyosha,

They have a simulator here on which they try everything first. Also, they must be putting together complex checklists of what commands to send and what to do for each possible response. They probably had some sort of fault tree worked out prior to the probe landing.

pdq
Monday, January 26, 2004

Quit being obtuse.

How else would we have gotten pictures of the Martians stomping on the first rover if the second rover didn't have a camera?

And you are overlooking the fact that you could send them in different directions and then when the one falls in a hole it can't get out of, you still got the other one.  When one gets its wheel stuck, the other one can still go.  When the Martian spectromomonomanator machine goes kaput on the one, you got the other one.  When the intern at JPL uploads the code to turn the wheel one rotation and accidentally moves the decimal point and pushes the rover off the cliff, the other rover can send us pictures of the rubble.

Anyway, the 2-rover approach is exactly what NASA is doing now.  Cheaper missions and more of them with a higher chance of failure.  It costs too much to get down to .0001 failure rates.

Michael H. Pryor (fogcreek)
Monday, January 26, 2004

The latest news feeds explain that the filesystem is probably full on the flash memory.

[Trosper said the "too many files" problem was the current leading theory for why Spirit failed last week. She said mission planners had not expected the rover's flash memory to accumulate so many data files during its trip to Mars and investigations on the surface.

"This is a new problem that we've encountered, based on having many files," she said.

During the weeks-long process of reviving Spirit, hundreds of unneeded files from Spirit's cruise phase would be deleted, she said, and controllers would keep a closer eye on memory management for Spirit as well as Opportunity. "We will be more conscious of this limit that we have," she said.]
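
To illustrate how a "too many files" limit can bite even when the flash itself has room, here is a toy Python sketch. It assumes each file costs a fixed amount of RAM for bookkeeping; the class, the numbers, and the file names are all made up, and this is not the rover's actual software.

```python
# Toy model of a file index whose bookkeeping lives in a fixed RAM budget.
# Everything here (names, sizes) is hypothetical and only for illustration.


class ToyFlashIndex:
    def __init__(self, ram_budget_bytes: int, bytes_per_entry: int) -> None:
        self.ram_budget_bytes = ram_budget_bytes
        self.bytes_per_entry = bytes_per_entry
        self.files: list[str] = []

    def max_files(self) -> int:
        """Number of file entries that fit in the RAM budget."""
        return self.ram_budget_bytes // self.bytes_per_entry

    def create(self, name: str) -> None:
        """Add a file, failing once the index no longer fits in RAM."""
        if len(self.files) >= self.max_files():
            raise MemoryError("file index no longer fits in RAM")
        self.files.append(name)

    def delete_matching(self, prefix: str) -> int:
        """Delete files whose names start with prefix; return how many."""
        before = len(self.files)
        self.files = [f for f in self.files if not f.startswith(prefix)]
        return before - len(self.files)


if __name__ == "__main__":
    index = ToyFlashIndex(ram_budget_bytes=1000, bytes_per_entry=100)
    for i in range(index.max_files()):
        index.create(f"cruise_{i}")
    try:
        index.create("surface_0")
    except MemoryError as err:
        print("create failed:", err)
    print("deleted", index.delete_matching("cruise_"), "old files")
    index.create("surface_0")  # succeeds once old entries are gone
```

In this toy model, deleting the old files is what frees the budget, which is the same kind of remedy the article describes.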

The full article is at http://www.msnbc.msn.com/id/4042603/

I read somewhere that the rover software was written in Java. I haven't been able to locate the actual article I read that on.

Slartibartfast
Monday, January 26, 2004

Written in Java?  I think I see the problem ...

Actually I'd sincerely doubt it's written in Java.  A Java VM has too many moving parts that would have to be extensively tested -- moreover, it would also take up a lot of memory.  If I were to guess, I would bet it's a C solution on a homegrown operating system.

"How would you design (or debug) a Mars Rover" should be a kickass interview question, by the way.

Alyosha`
Monday, January 26, 2004

The Mars Rover seems to do a good enough job of taking pictures of itself. 99% of the pictures it sent back are of its own wheels. Though I guess it might get lonely.

I think it also sucks that it's already filled its Flash ROM drive with Martian Porn. I'm sure they'll learn their lessons for the Venus Rover.

www.MarkTAW.com
Monday, January 26, 2004

In case somebody wants to know what is really going on ...

The Mars Rover software is not written in Java as some dumbass mentioned - that is a practice nav-aid used on the earth end to test and plan Rover travels - they use VxWorks from Wind River for the actual Rover firmware.  In case you missed it, Wind River was all over the place with press releases mentioning the fact when the Rover woke up on schedule and started moving around as directed using their stuff.  Not quite so much PR action from WR since the beast started having troubles ...

From what I read in the NYT, the last thing NASA tried to do was upload a complete set of new tasks to the Rover, and during the transmission the atmospherics at the ground station involved went haywire and the upload failed.  The Rover acknowledged OK and then promptly went "tits-up", as we say in the technical trades.  Anyone clued into the various stream error correction algorithms knows what (in a worst-case scenario) might have happened, probably did happen, and hence we are where we are.

Broken, but fixable.  Amen, Brother.

Mitch & Murray (from downtown)
Monday, January 26, 2004

The Rover's own OS is not in Java, but the analysis software for looking over the data that comes back is not only written in Java, you can also download it from NASA's web site and run it on Windows, Linux and OS X with the original data files.

Check out the interactive 3D Mars landscape! Whee!! This is fun flying around on Mars! Wheee! Wheee!!!

(Check out the rover's 3D model - you can even go 'inside' the rover and inspect its drive mechanisms.)

Dennis Atkins
Monday, January 26, 2004

Alrighty, this is a new term:  Rover.

As in, "Oh my god!  My system got ROVERed!"  (got full on memory).

T.J.
Tuesday, January 27, 2004

"The Mars Rover software is not written in Java as some dumbass mentioned"

Hey Mitch, lighten up will ya? The poor guy just said that is what he heard. Yeesh.

Been spending too much time on /.?

I Hate Whiners
Tuesday, January 27, 2004

http://www.cnn.com/2004/TECH/space/01/16/space.mars.java.reut/

SomeBody
Tuesday, January 27, 2004

" The latest news feeds explain that the filesystem is probably full on the flash memory. "

How can it happen?
I mean, I wouldn't be surprised if it was software programmed by a junior developer in a startup company. But from NASA? With all the error checking someone else here described?
Not to mention this is not the first time. IIRC a Martian probe already crashed because of a bug and the first flight of CEA's Ariane 5 failed due to a counter overflow bug.

Astrobe
Tuesday, January 27, 2004

oops it's ESA's Ariane 5.

In the era of the Apollo missions, CS was in the stone age; may I suggest to include a software guy in future mission-to-Mars teams?

Astrobe
Tuesday, January 27, 2004

Re: Multiple Rovers

There's an article on CNN by the lead of the Beagle 2 team who said that next time they're basically going to shotgun a whole bunch of rovers over in the hopes that a couple make it.

MR
Tuesday, January 27, 2004

"may I suggest to include a software guy in future mission-to-Mars teams"

That was tried in a few simulations. The first crew murdered the IT chump after 18 days. In the second simulation the other two astronauts hanged themselves on day 15. The third simulation was cancelled after two mission members slit their wrists on day 21. The mission log stated they would rather bleed to death by sucking their own veins dry than listen to one more UserFriendly joke.

Just me (Sir to you)
Tuesday, January 27, 2004

>Been spending too much time on /.?

IHW, what's /.?


Tuesday, January 27, 2004

"IHW, what's /.? "

/. = Slashdot.org, commonly known as slashdot, or just /.

I Hate Whiners
Tuesday, January 27, 2004

More detail on the rover internals:  http://space.com/businesstechnology/technology/mer_computer_040128.html

Been spending waaaay too much time on JPL's website lately, I have...

- Mike C

Mike C
Wednesday, January 28, 2004

No it is not any of those things that brought down Spirit.  It was SPAM.  Spirit's mailbox is full and not accepting any messages.  But it is sending out regular ads for mortgages and viagra.

Jim c
Saturday, January 31, 2004
