Fog Creek Software
Discussion Board




Genetic Algorithm to Optimize Binary?

I'm curious as to the possibility of using genetic algorithms to optimize binary code.  Essentially, to do the same thing for a GCC-compiled program that would take an Asm programming team months to do, by using a genetic algorithm at the binary level.  Set various parameters to control how many 0's or 1's to change per generation, how many copies to make per generation, and how many generations to let the "evolution" of the program run.  This seems like it could be an excellent way to optimize binary code very specifically for the CPU it's running on.  Has anything like this been done at the binary level?
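In rough Python pseudocode, the loop I have in mind would look something like this (mutate and evolve are just names for illustration, and the fitness function is left abstract because that's the hard part):

    import random

    def mutate(binary, n_bits):
        """Return a copy of the executable with n_bits randomly flipped."""
        data = bytearray(binary)
        for _ in range(n_bits):
            i = random.randrange(len(data) * 8)
            data[i // 8] ^= 1 << (i % 8)   # flip one bit
        return bytes(data)

    def evolve(binary, generations, copies, n_bits, fitness):
        for _ in range(generations):
            candidates = [binary] + [mutate(binary, n_bits) for _ in range(copies)]
            binary = max(candidates, key=fitness)   # keep the fittest copy
        return binary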

dh003i
Tuesday, March 18, 2003

I don't think it would work at that low a level. I also think it would be difficult to apply to a complete program, because of testing.

Evolutionary programming requires that you be able to judge the fitness of any given iteration. In this case, any genes that didn't pass all the unit tests would die off, and slow genes would die off too.
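As a sketch of that selection rule (passes_all_tests and run_time are assumed helper functions, not real APIs, and population is whatever pool of candidate executables you're evolving):

    def fitness(candidate):
        """Candidates failing any unit test are culled outright;
        survivors are ranked by running time."""
        if not passes_all_tests(candidate):   # hypothetical test harness
            return float("inf")               # failed a test: lethal
        return run_time(candidate)            # hypothetical timer; lower is better

    population.sort(key=fitness)
    population = population[:len(population) // 2]   # the slow half dies off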

The problem is this: Your unit tests need 100% coverage, which is extremely difficult to attain.

Second, munging 1's and 0's is slow: there's no guarantee your program will even run. Maybe it could work at the assembler level, *maybe*, but I doubt it.

I think you want to work at the algorithm level, rather than the program level. Recently a new, more efficient sequence for Shell's sort was discovered using this method.

Besides, the biggest bottleneck in today's applications, generally speaking, is either human interaction or I/O. It doesn't matter how fast you can analyse the data if you can't pull it off the disk quickly enough.

Mike Swieton
Tuesday, March 18, 2003

Wouldn't it be easier just to get an infinite number of monkeys?

X. J. Scott
Tuesday, March 18, 2003

It was once thought that an infinite number of monkeys typing on an infinite number of typewriters would eventually recreate the works of Shakespeare.

Now, due to the Internet, we know this is not true.

Mike Swieton
Wednesday, March 19, 2003

Your algorithm would randomly change the bits of an executable?  That's not a great idea.  The resulting "executable" wouldn't even run, and if by chance it did, it wouldn't do what it was supposed to.  If writing random bits to "foo.exe" could create a program, we'd all be out of jobs.

Brian
Wednesday, March 19, 2003

Look at Core War: http://www.koth.org and http://corewars.sourceforge.net

Several hobbyists are using GAs to try to create programs better than hand-written ones... and so far, in over 10 years, they have failed. But a new bunch of keen hobbyists are making headway, so maybe they'll have a winner by the end of the year?

Look at http://redcoder.sourceforge.net/evolving.html

Nice
Wednesday, March 19, 2003

I'm well aware that the vast majority of random modifications wouldn't even run.  Just like in evolution, the vast majority of mutations are lethal.  Those programs which wouldn't run -- or wouldn't do what they're supposed to -- would be selected against.

dh003i
Wednesday, March 19, 2003

The fundamental problem with this approach is that the search space is huge and only a tiny fraction of the candidate solutions actually work, let alone improve on the original. E.g. a 1kb function has 8192 bits, which gives a search space of 2^8192 (roughly 10^2466) possibilities -- far, far more than the number of subatomic particles in the universe. In this case you're much better off working at a higher level (e.g. the parse tree) with some good heuristics (e.g. an optimising compiler). You can always test a few candidate solutions to find the best of a good bunch.
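The arithmetic behind that claim is easy to check (a back-of-the-envelope sketch in Python):

    import math

    bits = 1024 * 8                      # a 1kb function
    digits = int(bits * math.log10(2))   # decimal digits in 2^8192
    print(digits)                        # 2466, i.e. 2^8192 is about 10^2466
    # versus an estimated 10^80 or so atoms in the observable universe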

Search for "genetic programming" on Google (look for Koza) if you want to see a better (but still rarely useful) approach.

<rant>
I did an MSc in artificial intelligence and used GAs for both my MSc thesis and my PhD thesis. It really annoys me when people say "why not use a GA/neural network/reinforcement learning to write programs/make money on the stock market/find a formula for world peace". GAs/NNs/RL are not magic black-box techniques -- such things simply don't exist -- and it is simply ignorance that causes people to believe otherwise. The performance of any decent optimisation technique is proportional to the amount of problem-specific knowledge and CPU horsepower put into it. There are no exceptions. If an off-the-shelf technique works on your problem, then your problem is trivial. I welcome counter-examples.
</rant>

Tom Payne
Wednesday, March 19, 2003

I also worked in neural nets/GAs for a while, and one of the things we found is that the most important thing is getting the problem space right. You need to transform your problem into a representation where the genetic modifications produce significant and interesting variations. Effectively this means applying intelligence and understanding to the problem. Manipulating ones and zeros probably won't do it, for all the reasons mentioned above.

David Clayworth
Wednesday, March 19, 2003

A compiler optimizes a program for speed essentially by taking a logical model of the program and performing operations on the code that leave it logically equivalent but better-performing, given some assumptions about what is fast and slow in an architecture, what architectural resources are available, and how programs generally run.

The most common and simplest optimizations ("logical transformations") are loop unrolling, function inlining, simplifying if conditions, and reordering statements to speed up memory access and increase parallelism.  Modern compilers spend a lot of time ordering operations so that memory loads are interleaved with independent work, keeping the processor from sitting around waiting for data to arrive from memory.  Compilers for Intel's new Itanium chips have to do a lot of the instruction scheduling themselves, and some fairly heavy graph manipulation as well, in order to get the best performance.
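To make "logical transformation" concrete, here's a toy constant-folding pass over a Python parse tree. Real compilers do this kind of rewriting on their own intermediate representations, so this is only an illustration of the tree-rewriting idea, not how GCC works:

    import ast

    class ConstantFolder(ast.NodeTransformer):
        """Fold binary ops whose operands are both literals: 2 * 3 -> 6."""
        def visit_BinOp(self, node):
            self.generic_visit(node)   # fold the children first
            if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
                expr = ast.Expression(body=node)
                ast.fix_missing_locations(expr)
                try:
                    value = eval(compile(expr, "<fold>", "eval"))
                except Exception:
                    return node        # e.g. division by zero: leave it alone
                return ast.copy_location(ast.Constant(value), node)
            return node

    tree = ConstantFolder().visit(ast.parse("x = 2 * 3 + y"))
    print(ast.unparse(tree))           # x = 6 + y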

You could probably use a GA or other stochastic optimization approach if you could figure out a way to represent the space of valid transformations on a piece of code; you could then evaluate fitness based on the speed of the resulting code.  The difficult part of this problem is coming up with a representation that doesn't render most of the produced programs invalid or inequivalent to the initial program.  It's unlikely that you could beat GCC for most standard processors, but for new architectures like Itanium, you might get some interesting results.
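One cheap representation that sidesteps the validity problem entirely: make each candidate a set of compiler flags rather than a transformed program, so every candidate is valid by construction. A rough sketch, where the flag list, bench.c, and the population parameters are all arbitrary placeholders:

    import random, subprocess, time

    FLAGS = ["-funroll-loops", "-finline-functions",
             "-fomit-frame-pointer", "-fschedule-insns"]

    def fitness(flagset):
        """Compile a benchmark with the given flags and time it;
        anything that fails to compile scores worst."""
        cmd = ["gcc", "-O2", *flagset, "bench.c", "-o", "bench"]
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            return float("inf")
        start = time.perf_counter()
        subprocess.run(["./bench"], capture_output=True)
        return time.perf_counter() - start

    def mutate(flagset):
        return flagset ^ {random.choice(FLAGS)}   # toggle one flag on or off

    population = [set(random.sample(FLAGS, 2)) for _ in range(8)]
    for generation in range(20):
        population.sort(key=fitness)              # one compile+run per candidate
        survivors = population[:4]                # truncation selection
        population = survivors + [mutate(s) for s in survivors]
    best = min(population, key=fitness)           # fittest flag set found

The search space here is tiny compared to the space of binaries, of course, so it can only find what the compiler already knows how to do; doing it over the transformations themselves is the harder and more interesting version.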
 

Colin Evans
Wednesday, March 19, 2003
