Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

The ultimate goal is to compare two binaries built from the exact same source in the exact same environment and be able to tell that they are indeed functionally equivalent.

One application for this would be focusing QA time on things that were actually changed between releases, as well as change monitoring in general.

MSVC in tandem with the PE format naturally makes this very hard to do.

So far I have found and neutralized these things:

  • PE timestamp and checksum
  • Digital signature directory entry
  • Debugger section timestamp
  • PDB signature, age and file path
  • Resources timestamp
  • All file/product versions in VS_VERSION_INFO resource
  • Digital signature section

I parse the PE file, find the offsets and sizes of all those things, and ignore those byte ranges when comparing binaries. It works like a charm (well, for the few tests I've run). I can tell that a signed executable with version 1.0.2.0 built on Win Server 2008 is equal to an unsigned one of version 10.6.6.6 built on my Win XP dev box, as long as the compiler version and all sources and headers are the same. This seems to work for VC 7.1 through 9.0 (for release builds).
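To make the idea concrete, here is a minimal sketch of the "find offsets, then ignore byte ranges" approach, covering only the PE timestamp and checksum from the list above. The function names are made up for illustration and this is not the actual tool. The field offsets follow the published PE layout: `e_lfanew` at 0x3C, `TimeDateStamp` 4 bytes into the COFF header, `CheckSum` 64 bytes into the optional header for both PE32 and PE32+.

```python
import struct


def pe_header_ignore_ranges(data: bytes) -> list:
    """Return (offset, size) pairs for the COFF TimeDateStamp and the
    optional-header CheckSum of a PE image held in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_offset + 4        # COFF file header, 20 bytes long
    timestamp = coff + 4        # after Machine (2) and NumberOfSections (2)
    optional = coff + 20        # optional header follows the COFF header
    checksum = optional + 64    # same offset in PE32 and PE32+
    return [(timestamp, 4), (checksum, 4)]


def equal_ignoring(a: bytes, b: bytes, ignore) -> bool:
    """Compare two images byte for byte after zeroing the ignored ranges."""
    if len(a) != len(b):
        return False
    masked_a, masked_b = bytearray(a), bytearray(b)
    for offset, size in ignore:
        end = min(offset + size, len(masked_a))
        if offset < end:
            blank = bytes(end - offset)
            masked_a[offset:end] = blank
            masked_b[offset:end] = blank
    return masked_a == masked_b
```

The other items in the list (debug directory, resources, version info, signature) would each contribute more ranges to the same `ignore` list; finding them means walking the data directories and section table, which is left out here.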

With one caveat.

Absolute paths for both builds must have the same length.

cl.exe converts relative paths to absolute ones and puts them right into the object files, along with compiler flags and so on. This has a disproportionate effect on the whole binary. A one-character change in the path results in one byte changed here and there, several times over the whole .text section (as many times as there were objects linked, I suspect). Changing the length of the path results in significantly more differences, both in the obj files and in the linked binary.

It feels like the file path with the compile flags is used as some kind of hash, which makes it into the linked binary or even affects the placement order of unrelated pieces of compiled code.

So here is the 3-part question (summarized as "what now?"):

  • Should I abandon the whole project and go home because what I am trying to do breaks the laws of physics and the corporate policy of MS?

  • Assuming I handle the absolute path issue (at the policy level or by finding a magical compiler flag), are there any other things I should look out for? (Things like __TIME__ do mean changed code, so I don't mind those not being ignored.)

  • Is there a way to either force the compiler to use relative paths, or to fool it into thinking the path is not what it is?

The reason for the last one is the beautifully annoying Windows file system. You just never know when deleting several gigs' worth of sources, objects and svn metadata will fail because of a rogue file lock. At least creating a new root always succeeds while there is space left. Running multiple builds at once is an issue too. Running a bunch of VMs, while a solution, is a rather heavy one.

I wonder if there is a way to set up a virtual file system for a process and its children so that several process trees will see different "C:\build" dirs, private to them only, all at the same time... A lightweight virtualization of sorts...

UPDATE: we recently open-sourced the tool on GitHub. See the Compare section in the documentation.


1 Answer

I solved this to an extent.

Currently we have a build system that makes sure all new builds are on a path of constant length (builds/001, builds/002, etc.), thus avoiding shifts in the PE layout. After the build, a tool compares the old and new binaries, ignoring the relevant PE fields and other locations with known superficial changes. It also runs some simple heuristics to detect dynamic ignorable changes. Here is the full list of things to ignore:

  • PE timestamp and checksum
  • Digital signature directory entry
  • Export table timestamp
  • Debugger section timestamp
  • PDB signature, age and file path
  • Resources timestamp
  • All file/product versions in VS_VERSION_INFO resource
  • Digital signature section
  • MIDL vanity stub for embedded type libraries (contains timestamp string)
  • __FILE__, __DATE__ and __TIME__ macros when they are used as literal strings (can be wide or narrow char)
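As an illustration of the kind of simple heuristic meant here, the sketch below scans a binary for strings shaped like __DATE__ ("Mmm dd yyyy", with the day space-padded) and __TIME__ ("hh:mm:ss"), in both narrow and UTF-16LE wide form, and returns them as byte ranges to ignore. The names are hypothetical and this is not the code from the tool. It will also flag any unrelated string that happens to look like a date or time, so treat it as a starting point only.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DIGIT = rb"[0-9]"


def _seq(atoms, wide: bool) -> bytes:
    """Join one-character regex atoms; wide strings get a NUL high byte
    after every character (UTF-16LE for ASCII text)."""
    pad = rb"\x00" if wide else b""
    return b"".join(atom + pad for atom in atoms)


def date_time_patterns(wide: bool):
    """Compiled regexes for __DATE__-like and __TIME__-like literals."""
    months = b"|".join(_seq([c.encode() for c in m], wide) for m in MONTHS)
    date = rb"(?:" + months + rb")" + _seq(
        [b" ", rb"[ 0-3]", DIGIT, b" ", DIGIT, DIGIT, DIGIT, DIGIT], wide)
    time = _seq(
        [rb"[0-2]", DIGIT, b":", rb"[0-5]", DIGIT, b":", rb"[0-5]", DIGIT], wide)
    return re.compile(date), re.compile(time)


def find_date_time_ranges(data: bytes) -> list:
    """Return sorted (offset, size) pairs for every date/time-looking string."""
    ranges = []
    for wide in (False, True):
        for pattern in date_time_patterns(wide):
            ranges.extend((m.start(), m.end() - m.start())
                          for m in pattern.finditer(data))
    return sorted(ranges)
```

__FILE__ literals are easier in one respect: the build root is known in advance, so it can be searched for directly in both encodings instead of being pattern-matched.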

Once in a while the linker makes some PE sections bigger without throwing anything else out of alignment. It looks like it moves the section boundary inside the padding. It is zeros all around anyway, but because of it I get binaries with a 1-byte difference.
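For the constant-length build paths mentioned at the top of this answer (builds/001, builds/002, etc.), a minimal sketch of the allocation step could look like the following. The function name and the three-digit width are assumptions for illustration; the real build system may do this differently. The key point is that the numbering is zero-padded and refuses to grow past the fixed width, so every build root has the same length.

```python
from pathlib import Path


def next_build_dir(root: Path, width: int = 3) -> Path:
    """Create and return the next numbered directory under `root`
    (001, 002, ...), whose name is always `width` characters long."""
    root.mkdir(parents=True, exist_ok=True)
    used = [int(p.name) for p in root.iterdir()
            if p.is_dir() and len(p.name) == width
            and p.name.isascii() and p.name.isdigit()]
    number = max(used, default=0) + 1
    if number >= 10 ** width:
        # One more digit would change the path length and shift the PE layout.
        raise RuntimeError("build numbering exhausted for this width")
    path = root / f"{number:0{width}d}"
    path.mkdir()
    return path
```

Since a fresh directory is created each time, a rogue file lock on an old build tree does not block the next build; old trees can be cleaned up later.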

UPDATE: we recently open-sourced the tool on GitHub. See the Compare section in the documentation.

