Fog Creek Software
g
Discussion Board




How do "Diff" software work?

Hi:

I am using a shareware for comparing differences in two files - Examdiff.

I was wondering what are the sorting algorithms behind this.

Lets say it is comparing two files T and T+1

Changes that are removed from T, will appear with a 'strike through' in T1. And new content will be highlighted.

Maybe this is asking too much - how about CityDesk having a version comparison mechanism built in.

Ram Dass
Monday, September 1, 2003

"optimal" diff is computed using a dynamic programming algorithm which you can find endless amount of material on by searching for "edit distance" or "levenstein distance" on Google. It's takes O(n*m) time.

Most diff programs don't compute the optimal set of differences, though - they use approximations of some sort.

[ http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/ ]

[
http://www.google.com/search?q=myers%20difference%20algorithm%20variations
]

Ori Berger
Monday, September 1, 2003

On the contrary, if N is the sum of the lengths of the two inputs, and D is the length of the edit sequence, then the optimal diff can be computed in O(ND) time and O(D) space.  (An edit sequence is a sequence of delete-element and insert-element commands to transform the first input into the second input.)

You can find an explanation of the O(ND) algorithm in this paper: http://www.cs.arizona.edu/people/gene/PAPERS/diff.ps

The GNU diff program always starts by using the O(ND) algorithm.  If it discovers that D would be too big, it terminates the algorithm early and may report a suboptimal edit sequence.

rob mayoff
Tuesday, September 2, 2003

Was about to ask the same question - thanks for the links !

Evgeny Goldin
Tuesday, September 2, 2003

I stand corrected.

Thanks!

Ori Berger
Tuesday, September 2, 2003

You might also be interested in seeing an application -- the Algorithm::Diff module (Perl) may be useful for you:

http://search.cpan.org/author/NEDKONZ/Algorithm-Diff-1.15/

Chris Winters
Tuesday, September 2, 2003

Anyone know if there are good general purpose algorithms for computing differences between tree structures like those implied by XML?

anon
Wednesday, September 3, 2003

i just downloaded ms xmldiff today. it referenced some named algorithm, take a look and see what it is.

http://apps.gotdotnet.com/xmltools/xmldiff/

mb
Wednesday, September 3, 2003

thx!

anon
Thursday, September 4, 2003

*  Recent Topics

*  Fog Creek Home