Plaintext Files Analysis
Comparing source code and text is a well-established analysis technique. A program that lists file A in the left window and file B in the right window and highlights the lines that differ, preferably in a different color, is often an easy way to detect copied text. Some of the more advanced utilities can also compare, merge, and synchronize files and directories. These approaches are appropriate for single text or XML files.
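As a rough illustration of the line-by-line comparison described above, the following sketch uses Python's standard difflib module to report only the lines that differ between two texts. The function name changed_lines and the two sample snippets are hypothetical and chosen only for this example; real comparison utilities add side-by-side layout and color on top of the same idea.

```python
import difflib


def changed_lines(text_a: str, text_b: str) -> list[str]:
    """Return the lines that differ between two texts.

    Lines found only in text_a are prefixed with '- ',
    lines found only in text_b are prefixed with '+ '.
    """
    diff = difflib.ndiff(text_a.splitlines(), text_b.splitlines())
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical sample inputs: two near-identical code fragments.
file_a = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += v[i];\n"
file_b = "int total = 0;\nfor (i = 0; i < n; i++)\n    total += w[i];\n"

for line in changed_lines(file_a, file_b):
    print(line)
```

Only the third line of each fragment is reported, which is the kind of small residual difference that makes copied text easy to spot.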
When multiple files are involved, differential comparison tools use a similar but more advanced approach: they compute the longest common subsequence of two data sets and produce easily readable output that highlights the differences between them.
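The longest common subsequence idea can be sketched with a textbook dynamic-programming table. This is a minimal illustration that treats each line as one element, not the implementation used by any particular comparison tool; the function name lcs and the sample inputs are assumptions made for the example.

```python
def lcs(a: list[str], b: list[str]) -> list[str]:
    """Return one longest common subsequence of two sequences of lines."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the subsequence.
    common = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


# Hypothetical sample inputs: the second line differs between the two sets.
left = ["a", "b", "c", "d"]
right = ["a", "x", "c", "d"]
print(lcs(left, right))
```

Every line that is not part of the common subsequence is a difference, so a tool can derive its highlighted output directly from this result.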
Plaintext comparison techniques are easily defeated by software obfuscators. Software obfuscators are programs that scramble source code to create an encoded copy that preserves the structure of the programming language. This scrambled source code can be compiled into an executable file with the same functionality as the original, pre-obfuscated source. However, because the source text instructions are different, the compiler translates them into different machine code. The result is a different executable file that behaves like the original but is often much more difficult for a programmer to read and understand.
Because plaintext comparison techniques depend on exact, or almost exact, word matching to be effective, a digital investigation cannot rely solely on them if the defendant has used any obfuscation techniques.
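To make this weakness concrete, the sketch below applies a toy transformation, whole-word identifier renaming, and then measures how many lines of the original still appear verbatim in the result. The helper names rename_identifiers and shared_line_fraction, the sample function, and the renaming table are all hypothetical; real obfuscators apply far more aggressive transformations than this stand-in.

```python
import re


def rename_identifiers(source: str, mapping: dict[str, str]) -> str:
    """Toy stand-in for an obfuscator: whole-word identifier renaming only."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], source)


def shared_line_fraction(a: str, b: str) -> float:
    """Fraction of the non-blank lines of a that appear verbatim in b."""
    lines_a = [line.strip() for line in a.splitlines() if line.strip()]
    lines_b = {line.strip() for line in b.splitlines() if line.strip()}
    if not lines_a:
        return 0.0
    return sum(line in lines_b for line in lines_a) / len(lines_a)


# Hypothetical sample source and renaming table.
original = (
    "def average(values):\n"
    "    total = sum(values)\n"
    "    return total / len(values)\n"
)
mapping = {"average": "f1", "values": "a0", "total": "b7"}
obfuscated = rename_identifiers(original, mapping)

print(shared_line_fraction(original, original))    # 1.0
print(shared_line_fraction(original, obfuscated))  # 0.0
```

Both versions compute the same result, yet an exact line match finds nothing in common between them, which is why plaintext comparison alone is insufficient once obfuscation is involved.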
From: Detecting Source Code Re-Use through a Binary Analysis Hybrid Approach by Daniel Cabezas and Bram Mooij