Fog Creek Software
Discussion Board

Self modifying code

Do you/have you ever written self-modifying code?

I just wrote my first in a few years (a mix of C and assembler).

S. Tanna
Monday, May 19, 2003

I was somewhat involved with debugging some once.  That's an experience I wouldn't care to repeat.  Talk about complex test cases and bizarre bugs. 

Interesting question though.  At the time, I did wonder how often it was used outside of insane dot-coms.  When would you use self modifying code?

Monday, May 19, 2003

I have not. I've considered it, but the languages I'm proficient in (C, C++, Java), don't really do that well at all.

Personally, I fear self-modifying code 8-} I've heard it can be done well. I've read (perhaps here) that some of Yahoo's store was written in Lisp or some variant and performed exceedingly well.

That same post I believe mentioned that Yahoo rewrote it because no one understood it.

In any case, even if it never happened, one can see how it could have, at least.

Does anyone know of self-modifying code being used extensively in a real-world project? Something that actually, successfully, uses it?

Mike Swieton
Monday, May 19, 2003

I didn't write it, but someone on our project did:

We were avoiding an audio encryption patent which would expire a year after product shipped.  The patent was hugely broad and covered any encryption of audio over a transmission channel.  No mention of encryption type, etc. So, for the first year, audio was sent in the clear - no encryption.  When the patent expired the MPEG encoders would switch on audio encryption and the code would enable decryption. 

However, to avoid the patent the code had to be incapable of decryption, not merely disable the feature.  Additionally, this was back before cheap and widespread use of FLASH devices, and the microcode was held in ROM.  So the ROM code was written with a hard-coded jump to RAM, execute code if available, and return.  When code was available in RAM, it would perform keying of the decryption device when the patent expired.

So, the day that the patent expired, the patch code was downloaded to the receiver, which had a small amount of EEPROM memory - enough to hold the patch.  Then every time the box booted, it copied the EEPROM code to the special RAM location, which, when keying was enabled, would push decryption keys into the decryption device.

Not exactly smooth, but it was self-modification of the text segment.  And, we avoided paying those leeches a cent.  That was the coolest part.

Nat Ersoz
Monday, May 19, 2003

for fun, check out Corewars; self-modifying code can be a plus!...

a corewars fanatic
Monday, May 19, 2003

> When would you use self modifying code?

Some games use something like a sprite- or object- compiler.  What they do is based on the graphic design data read from a file or something, generate the optimal assembly for performance.  In other words rather than know (say) how to do a general sprite display algorithm, they know how to generate the best code for all the specific cases that they will need..... I guess this kind of logic could apply to lots of problem domains.

In my app, I am using it as part of a copy protection scheme. 
- Part of the user's machine specific "password" is used to generate machine code, which is later used in the application (essential to correct functioning of the app). 
- A cracker might be able to generate a password which the application accepts, but the chance of generating one which generates correct machine code (as opposed to different but valid machine code [more likely], or totally invalid machine code [less likely]) is remote.

S. Tanna
Monday, May 19, 2003

Check out the Synthesis operating system. It was a university research project developed in the early 1990s. It would generate new code on the fly, allowing extremely optimized code. For example, when opening a file, you would not get a file descriptor or handle to use in the other file APIs. You would get some code that was generated and optimized specifically for the caller's needs, such as file name, buffer sizes, locking semantics.
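A rough C++ sketch of the interface shape described above, not of Synthesis itself: "opening" hands back a callable specialized for one caller instead of a handle. The closure only stands in for generated code, and `open_reader`, the in-memory "file", and the chunk size are made-up names and assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical: returns a reader bound to one "file" and one chunk size.
// The caller never passes a handle or a buffer size again.
std::function<std::string()> open_reader(std::string contents, std::size_t chunk) {
    std::size_t pos = 0;
    return [contents = std::move(contents), chunk, pos]() mutable {
        std::string out = contents.substr(pos, chunk);  // at most `chunk` bytes
        pos += out.size();
        return out;  // empty string once the end is reached
    };
}

// Usage sketch.
void example_reader() {
    auto read = open_reader("abcdef", 4);
    assert(read() == "abcd");
    assert(read() == "ef");
    assert(read() == "");
}
```

A real implementation would emit machine code rather than capture parameters, so this only shows what the caller sees.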

I heard that Windows 3.x did something similar with runtime generated code to improve performance.

Monday, May 19, 2003

That's laughable.

Win 3.1 self-modifying code was used to disable any compatibility with non-MSFT DOS vendors:

Nat Ersoz
Monday, May 19, 2003

It also got NOPed out of the release code because it was discovered by DRI and shown to MS.

Simon Lucy
Monday, May 19, 2003

I thought it was discovered by Andrew Schulman (who wrote Undocumented Windows)

I'm absolutely sure that it says that in both his own writings and in at least 1 book about the anti-trust case. I would go find the references if I was not so darn lazy.

I'm also sure it was discovered AFTER the code had been removed, and possibly after Windows 3.1 was released.

What happened
- Journalist hints to Schulman there might be an anti-DRDOS function in beta versions of Windows 3.1
- Schulman checks current (I think the full release) Windows 3.1 - no such code shows up
- Schulman starts working back thru previous beta releases, and eventually finds the code in one of the betas. It basically causes a warning type error, but if the user ignores it, everything is okay from then on. i.e. make the user nervous type thing.
- The code is obfuscated (self-encrypting etc.) and appears in 5 programs (i.e. a concerted effort). Apparently signed AARD, which allegedly stands for "Aaron Reynolds" (I think somebody fairly senior at MS)
- In later versions the code is still present, but NOPed out

I am a bit vaguer on these bits, but IIRC
- Schulman says the test is artificial - not a real compatibility issue, but a fake one
- It comes out in the anti-trust case, possibly with associated papers or emails from MS.

S. Tanna
Monday, May 19, 2003

The night before DR DOS 6 was to ship we heard that the beta 3.1 wouldn't run.  I can't at this remove remember what it was that was broken, but I'm pretty sure it was that.

I'm also pretty sure that we didn't publicise what was discovered about what was done and when.  We also didn't publicise the left over bits of CP/M that were (allegedly) still in the MS DOS code but which no one at MS knew the purpose of. 

Simon Lucy
Monday, May 19, 2003

And another discoverer:

Caldera vs Microsoft
See the quotes on page 7

S. Tanna
Monday, May 19, 2003

Yes, on an Atari 600XL, some twenty years ago. Nothing fancy, plain Atari BASIC. I used it a lot to evaluate math functions, instead of parsing the whole thing.

Monday, May 19, 2003

> Some games use something like a sprite- or object- compiler.  What they do is based on the graphic design data read from a file or something, generate the optimal assembly for performance.

I think what you're describing here is optimisation of data structures, particularly for scene traversal and 3D rendering. Data describing the world is read from file then processed into fast traversal and rendering structures, with extensive redundancy. That type of optimisation doesn't involve runtime modification of code and in fact I haven't heard of this being done in that context. FWIW.

Tuesday, May 20, 2003

Very interested in the registration potential, but.

Tuesday, May 20, 2003

I have no desire to trudge through a bunch of legal documents, but out of curiosity, what was the error message in the dr-dos issue here?

Interesting. If the error message was just a warning about compatibility, then I actually don't see much of a problem with this behavior. If the code BROKE dr-dos, then I certainly have an issue.

I mean, if huge numbers of copies of windows start being used, and a major bug is found due to dr-dos, then Microsoft is a complete hostage to dr-dos. In other words, why should ms risk huge customer problems due to people who are running dr-dos in place of ms dos to run windows? Why should they take this risk? What happens if an update to dr-dos breaks something, or some serious bug is found: does MS not have to refund all those customers who bought windows?

After all, dr-dos just has to claim that the windows failure or problem is due to windows, and not dr-dos. You can see that this is a huge risk, and that a problem in dr-dos can put the whole consumer approval of your product at risk. It seems silly to take such a risk, and trust a core part of the os (dos) to another vendor.

Of course there is also the issue of MS not wanting customers to use dr-dos. (we would have to be stupid not to admit, or realize that).

So, if this special windows 3.1 code broke dr-dos, then I cannot support MS’s action in any way, shape, or form.

However, if they are just warning the customer about compatibility, then I see that as not only a good thing but in fact their DUTY TO inform the customer. MS has NO control over dr-dos. They have no way of knowing if the dr-dos will not cause problems with windows. It is impossible for them to guarantee that windows will function correctly. They are DUTY BOUND to inform the customer.

Can you imagine if they did not do this, and a problem came up after millions of users had purchased windows and risked running it on dr-dos?

I'm not sure I see any problem here.

Does anyone have the text of the error message posted?

Albert D. Kallal
Edmonton, Alberta Canada

Albert D. Kallal
Tuesday, May 20, 2003

Found it...

Non-Fatal error detected: error #2726
Please contact Windows 3.1 beta support
Press ENTER to exit or C to continue
Program terminated.

That does not look good. They should have just put in a message about compatibility ...and they could have left it in the production version also! The above doesn't look good...but then it was beta....

Albert D. Kallal
Tuesday, May 20, 2003

Windows would halt during the load.

Nat Ersoz
Tuesday, May 20, 2003

Actually, it is not clear when/how the error message is to be generated.

The above message was generated when using the “debug” option. Hardly a consumer option…

Albert D. Kallal
Tuesday, May 20, 2003

Sorry, I was wrong - it did not halt.

Nat Ersoz
Tuesday, May 20, 2003

Heh, this reminds me of Genetic Programming....a bunch of programs (usually written in Lisp) of 'random' instructions.

A test of fitness is used to determine the most fit programs (with respect to the problem they are supposed to solve)...which are mated by splicing together their parse trees. Repeat...

Although the programs are usually not self-modifying, they are modified by a 'master' program.
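A minimal C++ sketch of the "mating by splicing parse trees" step, assuming tiny expression trees over one variable `x`. The fitness test and the random choice of crossover points are left out, and every name here (`Node`, `eval`, `crossover`, and so on) is made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Expression tree: op is '+', '*', 'x' (the variable) or 'c' (a constant).
struct Node {
    char op;
    int value;  // used only when op == 'c'
    std::unique_ptr<Node> left, right;
    Node(char o, int v, std::unique_ptr<Node> l = nullptr,
         std::unique_ptr<Node> r = nullptr)
        : op(o), value(v), left(std::move(l)), right(std::move(r)) {}
};
using Tree = std::unique_ptr<Node>;

Tree constant(int v) { return std::make_unique<Node>('c', v); }
Tree variable() { return std::make_unique<Node>('x', 0); }
Tree binary(char op, Tree l, Tree r) {
    return std::make_unique<Node>(op, 0, std::move(l), std::move(r));
}

// Interpret a program for a given input x.
int eval(const Node& n, int x) {
    switch (n.op) {
        case 'c': return n.value;
        case 'x': return x;
        case '+': return eval(*n.left, x) + eval(*n.right, x);
        default:  return eval(*n.left, x) * eval(*n.right, x);  // '*'
    }
}

// The "master" program mates two parents by swapping one subtree of each.
void crossover(Tree& a_subtree, Tree& b_subtree) { std::swap(a_subtree, b_subtree); }

// Usage sketch.
void example_crossover() {
    Tree a = binary('+', variable(), constant(1));  // x + 1
    Tree b = binary('*', variable(), constant(3));  // x * 3
    assert(eval(*a, 2) == 3 && eval(*b, 2) == 6);
    crossover(a->right, b->right);                  // a: x + 3, b: x * 1
    assert(eval(*a, 2) == 5 && eval(*b, 2) == 2);
}
```

Neither program edits itself here, which matches the point above: the trees are data that a separate program rewrites.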

Michael Chapman
Tuesday, May 20, 2003

> I think what you're describing here is optimisation of data structures, particularly for scene traversal and 3D rendering. Data describing the world is read from file then processed into fast traversal and rendering structures, with extensive redundancy. That type of optimisation doesn't involve runtime modification of code and in fact I haven't heard of this being done in that context. FWIW.

No I am describing games with real self modifying code.

On the Atari 2600 (remember that):

In Windows 9x (this is a good how to for beginners)

Referenced for 6x86 (i.e. fairly recent)

Google for it, and you'd be surprised how much comes up

S. Tanna
Tuesday, May 20, 2003

I remember an article from Creative Computing back around 85-86 maybe? It was about graphics programming for the Apple ][.

For those who don't know (and that's probably everybody at this point) the Apple video ram was set up rather strangely. Pixel row zero was followed in RAM by pixel row 32 (I think that's it). Pixel row 1 was at address (row 0 + 32 rows). It was all interleaved.

I'm sure that made the hardware easier somehow, but it was a real pain to do programming for. Plus the 6502 wasn't exactly the most powerful CPU - it lacked some of the more powerful addressing modes, and was limited to offsets of 256.

So, anyway, the article I read showed how to build self-modifying machine code that would adjust itself on the fly as a sprite moved around to do the correct memory calculations and render quickly. A neat approach that's totally useless today. ;-)

Chris Tavares
Tuesday, May 20, 2003

> However, if they are just warning the customer about compatibility, then I see that as not only a good thing but in fact their DUTY TO inform the customer. MS has NO control over dr-dos. They have no way of knowing if the dr-dos will not cause problems with windows. It is impossible for them to guarantee that windows will function correctly. They are DUTY BOUND to inform the customer.

Actually all the "problems" are allegedly deliberately manufactured ones.  They look for an exotic and arbitrary combination of factors indicating DR-DOS.  None of this exotic combination has any known effect on anything in Windows, except whether to generate the error message.

It is one thing to detect a real problem and report it.

It would be another thing entirely to manufacture a problem when one doesn't exist, merely to undermine your competitor.

The Andrew Schulman link given by somebody earlier describes the technical aspects of the problem. 

The CALDERA vs Microsoft link I gave (and no, I don't want to read it all either, but the quotes from MS employees on page 7 are quite revealing) has one such quote, which says (typos are mine, read the original if you don't want typos):

"what the guy is supposed to do is feel uncomfortable, and when he has bugs suspect the problem is dr dos and then go out and buy ms dos. or decide not to take the risk for other machines he has to buy for the office."

S. Tanna
Tuesday, May 20, 2003

Yes, I totally agree about the “manufactured” problem.

I think a nasty message complaining and warning about stability, and compatibility problems would have been sufficient here.

In fact, from the point of view of risk, and liability, MS is duty bound to give some type of warning. However, that is different from making code that breaks the system.

Of course MS wants to discourage the use of dr-dos. The question is did they break it, or did windows break on purpose when running it?

Albert D. Kallal
Tuesday, May 20, 2003

I used to use self-modifying code (or actually run-time compilation) for sprites, just like S. Tanna said. But that was a decade ago and the target platform was an 80286. The idea was to put the sprite data in the code as immediate values, avoiding fetching and comparing data (is this a transparent pixel? is this a transparent pixel?) all the time. It was very fast indeed, but nowadays we have code caches and things like that, so I think it would be just stupid, at least if there are lots of large sprites.
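A portable C++ approximation of the same idea, as a sketch only: the sprite is "compiled" once into runs of opaque pixels, so the draw loop never asks "is this a transparent pixel?". It builds a data structure rather than emitting 80286 instructions with immediates, and `Run`, `compile_sprite`, `draw`, and the choice of 0 as the transparent value are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kTransparent = 0;  // assumed transparent colour

// One "instruction" of the compiled sprite: copy `pixels` to offset (x, y).
struct Run {
    int x, y;
    std::vector<std::uint8_t> pixels;
};

// Compile step: scan the sprite once, keep only runs of opaque pixels.
std::vector<Run> compile_sprite(const std::vector<std::uint8_t>& data,
                                int width, int height) {
    std::vector<Run> runs;
    for (int y = 0; y < height; ++y) {
        int x = 0;
        while (x < width) {
            while (x < width && data[y * width + x] == kTransparent) ++x;
            const int start = x;
            while (x < width && data[y * width + x] != kTransparent) ++x;
            if (x > start) {
                runs.push_back(Run{start, y,
                    std::vector<std::uint8_t>(data.begin() + y * width + start,
                                              data.begin() + y * width + x)});
            }
        }
    }
    return runs;
}

// Draw step: straight copies, no per-pixel test. No clipping is done, so
// the caller must keep the sprite fully on screen.
void draw(const std::vector<Run>& sprite, std::vector<std::uint8_t>& screen,
          int screen_width, int px, int py) {
    for (const Run& r : sprite) {
        std::copy(r.pixels.begin(), r.pixels.end(),
                  screen.begin() + (py + r.y) * screen_width + px + r.x);
    }
}

// Usage sketch: a 3x2 sprite drawn at (1, 1) on a 4x3 screen filled with 9.
void example_sprite() {
    const std::vector<std::uint8_t> data = {0, 5, 6,
                                            7, 0, 0};
    const std::vector<Run> sprite = compile_sprite(data, 3, 2);
    assert(sprite.size() == 2);
    std::vector<std::uint8_t> screen(4 * 3, 9);
    draw(sprite, screen, 4, 1, 1);
    assert(screen[6] == 5 && screen[7] == 6 && screen[9] == 7);
    assert(screen[5] == 9);  // under a transparent pixel: untouched
}
```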

Tuesday, May 20, 2003

S Tanna, thanks for those links. I've learnt something today.

Tuesday, May 20, 2003

I doubt there would be much point in using self-modifying stuff for a sprite these days, the CPU is probably just too fast to make it worth the bother - but I guess it might save a few cycles on something like texture mapping.

S. Tanna
Tuesday, May 20, 2003

python is a playground for writing self-modifying code.

fool for python
Tuesday, May 20, 2003

Okay, I'm definitely a Python zealot, but IMHO using Python for modifying code via code actually *sucks*.

Code generation?  Yep.  Partial evaluation and higher order functions?  Most definitely.  Metaclasses?  You betcha.  *Modifying* code in code?  No way, Jose.

Really, the lack of a high-level way to manipulate Python code structures is about the only thing Python is missing to make it as capable as Lisp, but without all the damn parentheses.  :)  To do any kind of code modification in Python, you either have to:

1. manipulate pure source code and go through parsing hell

2. Use the built-in parser, and then deal with ridiculously complex parse trees with 10 levels of nested lists to represent an assignment statement like "a=1"!

3. Manipulate bytecode directly

4. Create your own parsing/compiling solution

Of course, in most other languages this doesn't even come up because you don't get this far before giving up.  Or, you only get to choose *one* of these ways to not solve the problem.  :)

Of course, none of this really relates to self-modifying code, anyway.  Python classes and modules aren't "code", they're objects.

Phillip J. Eby
Tuesday, May 20, 2003

I just realised I use sort of self modifying code, more recently than my most recent example. In PHP I have used eval for something where I couldn't find another work round.

I guess there are different levels of self modifying.

1. Code that really changes itself

2. Code that generates code to then be executed

And a fuzzy gray line betwixt the two

S. Tanna
Tuesday, May 20, 2003

I used to play a lot with self modifying code in x86 assembly. It was very fun before I learned the fun of doing useful things.

I've seen other people use self modifying code in copy protection systems, viruses, and processor detection algorithms. A popular trick is to modify an instruction that is already in the processor's prefetch (somewhere between 6 to a couple dozen bytes ahead of the currently executing instruction, depending on the model). The processor runs the unmodified instruction since it has already loaded it from memory. However, someone running your application in a debugger will run the modified code (and get sent off on a wild goosechase, or better, a hacker flag is saved to disk...).

There's all kinds of cute tricks like that... You can jump into the middle of an instruction to confuse a debugger's instruction alignment, so that MOV AX,0x1234 is displayed on the screen but 0x1234 is really the instruction that is being executed. And there are hours of more fun with PUSH and POP...


Tuesday, May 20, 2003

In regards to Yahoo store: It was originally written in LISP as "Viaweb" by Paul Graham:

Daniel Searson
Tuesday, May 20, 2003

yes Lisp is much more self aware than python. But from the other direction Java has a separate reflection api just to look at stuff. With python, it's part of the language.

fool for python
Tuesday, May 20, 2003

The runtime code for the language XPL0 used self-modifying code on the 6502 processor for performance and space reasons.

The 6502 processor only had an absolute jump instruction; you couldn't index into a jump table. The XPL0 compiler produced pcode, and the pcode interpreter would use the pcode instruction as an index into a table of the addresses of routines that implemented the pcodes. It would load the address and overwrite a jump instruction a few instructions later. When the jump instruction executed, it transferred control to the routine that handled that particular pcode instruction.
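For comparison, a sketch of the same dispatch in C++, where an indirect call through a table of function pointers does the job that the overwritten jump did in the interpreter described above. This is not XPL0's actual pcode: the three opcodes and the names `VM`, `kHandlers`, and `run` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct VM {
    std::vector<int> stack;
};

// Hypothetical pcode routines.
void op_push1(VM& vm) { vm.stack.push_back(1); }
void op_add(VM& vm) {
    const int b = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += b;
}
void op_dup(VM& vm) { vm.stack.push_back(vm.stack.back()); }

// Table of routine addresses, indexed by the pcode instruction.
using Handler = void (*)(VM&);
constexpr Handler kHandlers[] = {op_push1, op_add, op_dup};

// Interpreter loop: look up the routine and jump to it.
// Assumes every opcode is a valid index into kHandlers.
void run(VM& vm, const std::vector<std::uint8_t>& pcode) {
    for (std::uint8_t op : pcode) kHandlers[op](vm);
}

// Usage sketch: push 1, dup, add, dup, add leaves 4 on the stack.
void example_pcode() {
    VM vm;
    run(vm, {0, 2, 1, 2, 1});
    assert(vm.stack.size() == 1 && vm.stack.back() == 4);
}
```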

Also, the Forth language had a concept called "deferred words" that was effectively a way to allow self-modifying code at a high level. I understand that at least some LISP implementations allow something similar.

Steve Wheeler
Friday, May 23, 2003

Self-modifying code was needed in languages like BASIC before there was dynamic memory allocation (i.e. pointers), to accomplish certain tasks.

In a modern language, you would just use pointers or nested data structures in a situation where self-modifying code would have been previously needed.

Wednesday, June 4, 2003

There are still uses for SMC.

For example, in this C++ code:

int inner, k = 0;
cin >> inner;
for (int i = 0; i < 5000; i++) for (int j = 0; j < inner; j++) k++;

Using SMC to dynamically unroll the loop in the middle can give a big performance boost.

Another case is removing an if statement whose outcome only needs to be decided once, like this:

string k;
cin >> k; // from now on, k isn't modified

for (int i = 0; i < 100; i++) {
if (k == "ABC") doblah();
else if (k == "DEF") doblah2();
// ... 100 different things
}

Statically, you can't remove the if/else if statements. If k were an int, maybe you could replace it with a jump table, but here, there is nothing you can really do but test every if all the way down.

Using SMC to just make a single jump or call (unrolled 100 times) you can do the whole if statement process once.
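Without real SMC, one portable approximation in C++ is to run the string comparison once and keep a function pointer, so each loop iteration is a single indirect call. This is only a sketch: `doblah` and `doblah2` come from the post above, while the counters, `do_nothing`, `resolve`, and `run` are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

int blah_calls = 0;
int blah2_calls = 0;
void doblah() { ++blah_calls; }
void doblah2() { ++blah2_calls; }
void do_nothing() {}

using Action = void (*)();

// Walk the "if chain" once: map the string to the routine it selects.
Action resolve(const std::string& k) {
    static const std::unordered_map<std::string, Action> table = {
        {"ABC", &doblah},
        {"DEF", &doblah2},
    };
    const auto it = table.find(k);
    if (it == table.end()) return &do_nothing;
    return it->second;
}

void run(const std::string& k) {
    const Action action = resolve(k);  // string is examined here only
    for (int i = 0; i < 100; ++i) action();
}

// Usage sketch.
void example_dispatch() {
    run("ABC");
    assert(blah_calls == 100 && blah2_calls == 0);
    run("DEF");
    assert(blah_calls == 100 && blah2_calls == 100);
}
```

The indirect call is still there on every iteration, which is the cost SMC would remove by patching the call target in place.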

Wednesday, July 28, 2004

I wrote some self-modifying code about 8 years ago.  The code in question was written in x86 Assembly Language and was used to check what video card a user had one time (instead of every time the video needed to be accessed) and then changed code throughout itself to handle that specific card type.

Carson Ball
Monday, August 23, 2004
