Fog Creek Software
Discussion Board




How do "Diff" software work?

Hi:

I am using a shareware for comparing differences in two files - Examdiff.

I was wondering what are the sorting algorithms behind this.

Lets say it is comparing two files T and T+1

Changes that are removed from T, will appear with a 'strike through' in T1. And new content will be highlighted.

Maybe this is asking too much - how about CityDesk having a version comparison mechanism built in.

Ram Dass
Monday, September 01, 2003

"optimal" diff is computed using a dynamic programming algorithm which you can find endless amount of material on by searching for "edit distance" or "levenstein distance" on Google. It's takes O(n*m) time.

Most diff programs don't compute the optimal set of differences, though - they use approximations of some sort.

[ http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/ ]

[
http://www.google.com/search?q=myers%20difference%20algorithm%20variations
]

Ori Berger
Monday, September 01, 2003

On the contrary, if N is the sum of the lengths of the two inputs, and D is the length of the edit sequence, then the optimal diff can be computed in O(ND) time and O(D) space.  (An edit sequence is a sequence of delete-element and insert-element commands to transform the first input into the second input.)

You can find an explanation of the O(ND) algorithm in this paper: http://www.cs.arizona.edu/people/gene/PAPERS/diff.ps

The GNU diff program always starts by using the O(ND) algorithm.  If it discovers that D would be too big, it terminates the algorithm early and may report a suboptimal edit sequence.

rob mayoff
Tuesday, September 02, 2003

Was about to ask the same question - thanks for the links !

Evgeny Goldin
Tuesday, September 02, 2003

I stand corrected.

Thanks!

Ori Berger
Tuesday, September 02, 2003

You might also be interested in seeing an application -- the Algorithm::Diff module (Perl) may be useful for you:

http://search.cpan.org/author/NEDKONZ/Algorithm-Diff-1.15/

Chris Winters
Tuesday, September 02, 2003

Anyone know if there are good general purpose algorithms for computing differences between tree structures like those implied by XML?

anon
Wednesday, September 03, 2003

i just downloaded ms xmldiff today. it referenced some named algorithm, take a look and see what it is.

http://apps.gotdotnet.com/xmltools/xmldiff/

mb
Wednesday, September 03, 2003

thx!

anon
Thursday, September 04, 2003

*  Recent Topics

*  Fog Creek Home