jordanell edited this page Jul 3, 2012 · 11 revisions

The Differ project generates the differences between two versions of an input file. The input can be any UTF-8 text file, but not a binary file. The Differ returns two results: a list of Insert objects and a list of Delete objects.

The Differ project is used in Eggnet's CallGraphAnalyzer as a tool to analyze changes in a file between commits. It is also used in Eggnet's Ownership to identify the ownership of changes.

Details

Differ is a wrapper class around the Diff-match-patch library developed by Neil Fraser. The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text. The library implements Myers' diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.

Basically, the Differ can do the following:

  1. Compare two text files (character by character or line by line).

  2. Create a list of diff objects (EQUAL, INSERT or DELETE). Each object contains start and end character positions, which are relative to the OLD VERSION of the file.

  3. Clean up the diff objects by merging small junk fragments so that the content is human-readable.

Note that we compare character by character, not line by line, so the Differ result can look like junk. However, this approach makes later processing easier and gives more precise output.
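To make the shape of the diff objects concrete, here is a minimal sketch of a character-level diff. It is not the Differ's implementation: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for Diff-match-patch (so it does not run Myers' algorithm and does no cleanup pass), and the names `DiffObject` and `char_diff` are hypothetical. It also assumes that an INSERT is recorded as a zero-width position in the old text, which the page does not specify.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class DiffObject:
    """Hypothetical diff object; positions are offsets into the OLD text."""
    op: str     # "EQUAL", "INSERT" or "DELETE"
    start: int  # start offset in the old text
    end: int    # end offset (exclusive) in the old text
    text: str   # the characters covered by this object


def char_diff(old: str, new: str) -> List[DiffObject]:
    """Character-by-character diff using difflib as a stand-in engine."""
    result: List[DiffObject] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(DiffObject("EQUAL", i1, i2, old[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            result.append(DiffObject("DELETE", i1, i2, old[i1:i2]))
        if tag in ("insert", "replace"):
            # Assumption: inserted text has no extent in the old file,
            # so it is anchored at a zero-width position.
            result.append(DiffObject("INSERT", i2, i2, new[j1:j2]))
    return result


if __name__ == "__main__":
    for d in char_diff("int x = 1;", "int y = 12;"):
        print(d)
```

Because every position refers to the old text, EQUAL and DELETE objects can be checked directly against it with `old[d.start:d.end] == d.text`.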

Remark: you can reconstruct the new version and old version of the file based on the diff objects:

  • NEW VERSION == EQUAL + INSERT
  • OLD VERSION == EQUAL + DELETE
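The two identities above can be illustrated with a short sketch. The diff list is written by hand for illustration, the `(operation, text)` tuple layout is an assumption rather than the Differ's actual object type, and `rebuild` is a hypothetical helper.

```python
# Hand-written diff objects for "int x = 1;" -> "int y = 12;",
# in document order. The tuple layout is an illustrative assumption.
diffs = [
    ("EQUAL", "int "),
    ("DELETE", "x"),
    ("INSERT", "y"),
    ("EQUAL", " = 1"),
    ("INSERT", "2"),
    ("EQUAL", ";"),
]


def rebuild(diffs, keep):
    """Concatenate, in order, the text of every object whose op is in keep."""
    return "".join(text for op, text in diffs if op in keep)


old_text = rebuild(diffs, {"EQUAL", "DELETE"})  # "int x = 1;"
new_text = rebuild(diffs, {"EQUAL", "INSERT"})  # "int y = 12;"
```

The reconstruction only works if the objects are kept in their original order, since EQUAL spans are interleaved with the inserted and deleted ones.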

Limitations

The Differ results are sometimes inconsistent with Git diff results, probably because the two use different algorithms. As a result, we need to work and test carefully against the Diff-match-patch result, and use the Git diff result only as a reference.

The Differ can only parse UTF-8 text files, not binary files. If the input is binary, the Differ will throw an exception.

We assume that the character positions of functions and variables provided by the AST parser are consistent with those of Diff-match-patch.
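One way a caller might guard against binary input before invoking a differ is sketched below. This is an assumption about how such a check could look, not the Differ's own logic: `read_utf8_text` is a hypothetical helper, and treating a NUL byte as a sign of binary content is a heuristic.

```python
def read_utf8_text(data: bytes) -> str:
    """Return data decoded as UTF-8, or raise ValueError if it looks binary."""
    # Heuristic (assumption): text files do not normally contain NUL bytes.
    if b"\x00" in data:
        raise ValueError("input looks binary (contains a NUL byte)")
    # Strict decoding raises UnicodeDecodeError, a subclass of ValueError,
    # when the bytes are not valid UTF-8.
    return data.decode("utf-8")


if __name__ == "__main__":
    print(read_utf8_text("héllo".encode("utf-8")))
    try:
        read_utf8_text(b"\xff\xfe\x00\x01")
    except ValueError as exc:
        print("rejected:", exc)
```

Validating the bytes up front keeps the failure at the file boundary instead of deep inside the diff step.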
