Fog Creek Software
Discussion Board




Binary diff algorithm

I am looking for text and binary diff algorithms. It would be helpful if you point me to some sources - how to implement a one (algorithm), COTS components (with redistribution license) or source code.

If it's an implementation I would prefer C, C# or C++.

Tornado
Thursday, May 27, 2004

Got Google?


Thursday, May 27, 2004

I am looking for a recommendation - a human recommendation :)

Tornado
Thursday, May 27, 2004

Uh, I thought google gave human recommendations?

Sorry, had to rub it in *lol* j/k.

Li-fan Chen
Thursday, May 27, 2004

;)

Tornado
Thursday, May 27, 2004

can't go wrong with this:
license GPL!!!
http://ftp.gnu.org/pub/gnu/diffutils/

Mark
Thursday, May 27, 2004

Check windiff on Sourceforge.

Matthew Lock
Thursday, May 27, 2004

Text:

1. Read a line of text from both files.
2. Strip the white space from the lines
3. Compare the lines using the current locale.
4. If File1 line equals File2 line then mark as equal & goto 1
5. Lines are not equal so mark the File2 line as not being in File1.
6. Continue marking and reading/compressing lines of text from File2 until one of them equals the next line of File1.
7. When you find one that is equal goto 1.

Disclaimer: This may not work but the real question is why not? :-)

Anon
Thursday, May 27, 2004

The problem is with what you consider trivial differences. Do you consider tabs to be the same as 4 spaces or not? Do you consider unix end of line characters to be the same as windows or mac ones?

Matthew Lock
Thursday, May 27, 2004

Binary diff are text diff are usually considered to be entirely different problems.

For binary diff, we've had very good experiences with the VCDiff algorithm, documented in RFC 3284 (I think).

Text diff has been studied for an awfully long time, but I nonetheless cannot recommend a really good spec or piece of literature to describe algorithms.  There are lots of implementations as well, but few that are open source.  If you need one for commercial use, there is a diff engine buried inside Subversion (BSD-ish license).

Eric Sink
Thursday, May 27, 2004

For text diff, the standard is the Levenstein/Levenshtein  edit distance algorithm.  It's easy to implement, and there are many existing implementations.  Here's one discussion, with examples in Java, C++ and VB:

http://www.merriampark.com/ld.htm

AFAIK, the algorithm should be equally applicable for a binary diff application.  (Just substitute bits or bytes instead of characters.)

Robert Jacobson
Friday, May 28, 2004

For text, there's also the ancient (but simple) "longest common subsequence" algorithm.

It's pretty hard to find a good description of this online, but it's a dead-simple dynamic programming approach.

J. Random Hacker
Friday, May 28, 2004

Google for "edit distance".

Several flavors exist, but basically it asks the question What do I have to do to String1 to make it into String2?

5v3n
Friday, May 28, 2004

If you invent a new binary diff algorithm, will you please name it "Biff" ? Fans of "Back to the Future" will thank you...

Doc
Saturday, May 29, 2004

The version control system Subversion claims to have a diff algorithm that is working equally on text and binary data.
Subversion is written in C afaik.

Check http://subversion.tigris.org/ for more information.

Florian
Sunday, May 30, 2004

I am going through vcdiff.I am struggling to make it in windows.Since i am having lack of support in windows.And i want to know whether vcdiff is the exact one for binary differencing?Whether it will worked out or not ?

Sarav
Wednesday, June 02, 2004

*  Recent Topics

*  Fog Creek Home