Fog Creek Software
Discussion Board




Self-reproducing code

I was reading the Green Hills commentary on Linux insecurity that has been slashdotted to death.  But there was a reference to Ken Thompson's classic article (http://www.acm.org/classics/sep95/) "Reflections on Trusting Trust". In it, he discusses why compilers -- written in the language they compile -- can't be trusted.

As an exercise, he proposes the following programming challenge: "...the problem is to write a source program that, when compiled and executed, will produce as output an exact copy of its source. If you have never done this, I urge you to try it on your own. The discovery of how to do it is a revelation that far surpasses any benefit obtained by being told how to do it. The part about "shortest" was just an incentive to demonstrate skill and determine a winner...."

Strangely enough, I don't think I'd ever attempted such an exercise before. So after a longer programming session than I'd originally envisioned, I created the following self-reproducing PHP script which I release to the world with all the caveats and limitations of liability specified by a Microsoft End-User License Agreement.

Anyhow, here it is. I'd be interested to find out if this is woefully inefficient, etc.  My 'lessons learned' for PHP were:

- Automated variable expansion in double-quoted strings can be a hindrance in an exercise such as this
- Escapements, escapements, escapements!

Here's hoping this publishes right...

<?php
    function Esc($s) { return (str_replace(chr(0x27), chr(0x5c).chr(0x27), $s)); }
    $aSrc = array(
        'echo("<?php\r\n");',
        'echo(\'    function Esc($s) { return (str_replace(chr(0x27), chr(0x5c).chr(0x27), $s)); }\'); echo("\r\n");',
        'echo(\'    $aSrc = array(\'); echo("\r\n");',
        'for ($i = 0; $i < sizeof($aSrc); $i++) {',
        '    echo(chr(0x09).chr(0x09)."\'".Esc($aSrc[$i])."\',".chr(0x0d).chr(0x0a));',
        '}',
        'echo("\t);\r\n");',
        'for ($i = 0; $i < sizeof($aSrc); $i++) {',
        '    echo(chr(0x09).$aSrc[$i].chr(0x0d).chr(0x0a));',
        '}',
        'echo("?>\r\n");',
    );
    echo("<?php\r\n");
    echo('    function Esc($s) { return (str_replace(chr(0x27), chr(0x5c).chr(0x27), $s)); }'); echo("\r\n");
    echo('    $aSrc = array('); echo("\r\n");
    for ($i = 0; $i < sizeof($aSrc); $i++) {
        echo(chr(0x09).chr(0x09)."'".Esc($aSrc[$i])."',".chr(0x0d).chr(0x0a));
    }
    echo("\t);\r\n");
    for ($i = 0; $i < sizeof($aSrc); $i++) {
        echo(chr(0x09).$aSrc[$i].chr(0x0d).chr(0x0a));
    }
    echo("?>\r\n");
?>

dir at badblue com
Tuesday, May 11, 2004

Well, it came out sort of okay, but the wrapping makes it look kinda fugly.  Anyhow, one caveat for doing this is - you can't count on the source file being present (only the "compiled" version, the equivalent of an encoded PHP assembly)...

dir at badblue com
Tuesday, May 11, 2004

I once posted the same (quines) here and it got deleted. Hope you have better luck.

Of course I used the term "VB6", so that could be an outright disqualifier. <g>

Anyways more info here

http://www.nyx.net/~gthompso/quine.htm

and the VB6 version
<code>

Sub Main()
    Debug.Print a("Sub Main()|Debug.Print a(%)|End Sub|Function a(t)|z = Chr(34)|a = Replace(Replace(t, Chr(124), vbCrLf), Chr(37), z & t & z)|End Function")
End Sub


Function a(t)
    z = Chr(34)
    a = Replace(Replace(t, Chr(124), vbCrLf), Chr(37), z & t & z)
End Function
</code>

The above is thanks to a very Buggy Programmer.

KayJay
Tuesday, May 11, 2004

I am not entirely sure what the application of self-reproducing code is, but it would be cool if programs could modify themselves as they are running. You can make changes in memory, but you cannot save the changes since the file is already opened by the OS.

Is there a way to do this without creating temp files and such?

grunt
Tuesday, May 11, 2004

C++ version.

The trick (of course) is to write something that generates code that generates itself. Kinda of like writing a compiler to compile the code that created the compiler.

#include <fcntl.h>
#include <io.h>
#include <sys/stat.h>
#include <string>

using namespace std;

int ChangeStr(string &strDest, bool bMatchCase, const string &strNeedle, const string &strHaystack, const string &strNewNeedle) {int iReplacementCount=0;int iNeedleCount=0;for(int iHaystackCount=0; iHaystackCount<strHaystack.size(); iHaystackCount++){char ch=strHaystack[iHaystackCount];if(bMatchCase ? (toupper(ch)==toupper(strNeedle[iNeedleCount])) : (ch==strNeedle[iNeedleCount]))    {iNeedleCount++;if(iNeedleCount>=strNeedle.size())    {iNeedleCount=0;strDest.append(strNewNeedle);iReplacementCount++;}continue;    }iHaystackCount-=iNeedleCount;iNeedleCount=0;ch=strHaystack[iHaystackCount];strDest+=ch;}strDest.append(strNeedle, 0, iNeedleCount);return iReplacementCount;}

void main(int argc, char *argv[])
{
    if(argc!=2)
        return;
    string strSource("#include <fcntl.h>\r\n#include <io.h>\r\n#include <sys/stat.h>\r\n#include <string>\r\n\r\nusing namespace std;\r\n\r\nint ChangeStr(string &strDest, bool bMatchCase, const string &strNeedle, const string &strHaystack, const string &strNewNeedle) {int iReplacementCount=0;int iNeedleCount=0;for(int iHaystackCount=0; iHaystackCount<strHaystack.size(); iHaystackCount++){char ch=strHaystack[iHaystackCount];if(bMatchCase ? (toupper(ch)==toupper(strNeedle[iNeedleCount])) : (ch==strNeedle[iNeedleCount]))    {iNeedleCount++;if(iNeedleCount>=strNeedle.size())    {iNeedleCount=0;strDest.append(strNewNeedle);iReplacementCount++;}continue;    }iHaystackCount-=iNeedleCount;iNeedleCount=0;ch=strHaystack[iHaystackCount];strDest+=ch;}strDest.append(strNeedle, 0, iNeedleCount);return iReplacementCount;}\r\n\r\nvoid main(int argc, char *argv[])\r\n{\r\n    if(argc!=2)\r\n        return;\r\n    string strSource(\"~\");\r\n    \r\n    string strNewSource, strTemp;\r\n    ChangeStr(strTemp, false, \"\\\\\", strSource, \"\\\\\\\\\");\r\n    strNewSource=strTemp;\r\n    strTemp.erase();\r\n\r\n    ChangeStr(strTemp, false, \"\\r\", strNewSource, \"\\\\r\");\r\n    strNewSource=strTemp;\r\n    strTemp.erase();\r\n    \r\n    ChangeStr(strTemp, false, \"\\n\", strNewSource, \"\\\\n\");\r\n    strNewSource=strTemp;\r\n    strTemp.erase();\r\n\r\n    ChangeStr(strTemp, false, \"\\\"\", strNewSource, \"\\\\\\\"\");\r\n    strNewSource=strTemp;\r\n    strTemp.erase();\r\n    \r\n    \r\n    string strTilde;\r\n    strTilde+=(char)126;\r\n    ChangeStr(strTemp, false, strTilde, strSource, strNewSource);\r\n\r\n    int iFile=open(argv[1], O_WRONLY|O_CREAT|O_BINARY|O_TRUNC, S_IREAD|S_IWRITE);\r\n    write(iFile, strTemp.c_str(), strTemp.size());\r\n    close(iFile);\r\n}\r\n\r\n");
    
    string strNewSource, strTemp;
    ChangeStr(strTemp, false, "\\", strSource, "\\\\");
    strNewSource=strTemp;
    strTemp.erase();

    ChangeStr(strTemp, false, "\r", strNewSource, "\\r");
    strNewSource=strTemp;
    strTemp.erase();
    
    ChangeStr(strTemp, false, "\n", strNewSource, "\\n");
    strNewSource=strTemp;
    strTemp.erase();

    ChangeStr(strTemp, false, "\"", strNewSource, "\\\"");
    strNewSource=strTemp;
    strTemp.erase();
    
    
    string strTilde;
    strTilde+=(char)126;
    ChangeStr(strTemp, false, strTilde, strSource, strNewSource);

    int iFile=open(argv[1], O_WRONLY|O_CREAT|O_BINARY|O_TRUNC, S_IREAD|S_IWRITE);
    write(iFile, strTemp.c_str(), strTemp.size());
    close(iFile);
}

Mark L. Smith
Tuesday, May 11, 2004

Grunt,

With the DotNet languages, you can get very close to dynamic self-modifying code.  Using reflection, you can dynamically generate a new assembly, compile it to IL, and then execute it on the fly:

http://www.ondotnet.com/pub/a/dotnet/excerpt/prog_csharp_ch18/?page=9

The only restriction is that you can't modify the currently executing code, so you need to have two assemblies -- one that controls the code generation process (which can't be modified) and the dynamically generated assembly (which could be rewritten.)  There's a whole book that was just released on code generation techniques:

http://www.apress.com/book/bookDisplay.html?bID=212

It looks like lots of fun, but I haven't yet found a compelling need to do something like this.

Robert Jacobson
Tuesday, May 11, 2004

How about a service that receives XML files -
If a file is received which doesn't match an existing schema, the application builds a schema for it, creates the library to manage the schema, builds the relevant database tables and stored procedures, processes the file, and parses all future files matching that schema using the created code and tables.

Philo

Philo
Tuesday, May 11, 2004

Satyaish should be overjoyed with that, Philo! (c.f. the "Diggler" thread)

KayJay
Tuesday, May 11, 2004

Obligatory perl entry:

----------
#!/usr/bin/perl

seek( DATA, 0, 0 );
print while( <DATA> );

__DATA__
----------

See SelfGOL [1] for a truly amazing (ab)use of this technique

[1] http://libarynth.f0.am/cgi-bin/view/Libarynth/SelfGOL

Chris Winters
Tuesday, May 11, 2004

The Python version has a cute face in it:

f = open("lines.py")
l = f.readlines()
for r in l:
  print r[:-1]

Tom H
Tuesday, May 11, 2004

#!/usr/bin/env python
from inspect import getsourcelines
from sys import modules
for line in getsourcelines(modules[__name__])[0]:
    print line,

John Eikenberry
Tuesday, May 11, 2004

Microsoft .BAT file:

@type %0


:-)

I Was Never Here
Wednesday, May 12, 2004

Yeah well printing out the source file is out of the question.

The >>.exe<< must spit out its own source code.

Alex
Wednesday, May 12, 2004

"If a file is received which doesn't match an existing schema, the application builds a schema for it, creates the library to manage the schema, builds the relevant database tables and stored procedures, processes the file, and parses all future files matching that schema using the created code and tables. "

Philo,

in all honesty that sounds like a disaster waiting to happen. You just defered control of your whole system to whatever input one is willing (by mistake or intentional) to trow at it. Even if access is controlled, and no "mistakes" assumed, what about schemas that are not orthogonal etc.

Just me (Sir to you)
Wednesday, May 12, 2004

"If a file is received which doesn't match an existing schema, the application builds a schema for it, creates the library to manage the schema, builds the relevant database tables and stored procedures, processes the file, and parses all future files matching that schema using the created code and tables."

You forgot "and fix my kitchen sink and bring world peace."

NoName
Wednesday, May 12, 2004

*  Recent Topics

*  Fog Creek Home