Get the difference between binary files using vcdiff
I needed to find out where and how a JPEG file had been corrupted during transfer.
VCDIFF is a format and algorithm for delta encoding, described in RFC 3284.
Delta encoding is a way of representing data as the difference (delta) between successive versions of the data instead of the data itself.
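To make the idea concrete, here is a minimal sketch of delta encoding in its simplest form, on a list of numbers. It is not how VCDIFF works internally (VCDIFF describes one byte sequence in terms of another), and the function names and sample values are made up for illustration.

```python
def delta_encode(values):
    """Store each value as the difference from the previous one (the first is relative to 0)."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


readings = [1000, 1002, 1003, 1003, 1007]  # hypothetical sample data
encoded = delta_encode(readings)
print(encoded)  # [1000, 2, 1, 0, 4]
assert delta_decode(encoded) == readings
```

The encoded list carries the same information as the original, but most entries are small, which is what makes a delta compact.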
For clarity, the examples use text files encoded in Windows-1251, but binary files work just as well.
Source files:
"копия текст копия" (source.txt; Russian for "copy text copy")
"копия изменения копия" (target.txt; "copy changes copy")
The goal is to get the difference between the files:
" изменения " (source.txt -> target.txt)
" текст " (target.txt -> source.txt)
I use the xdelta3 program, but I think any tool that works with the VCDIFF format will do.
How to get the difference
We will need another file filled with spaces:
" " ( spaces.txt )
It must be at least as large as the source file (source.txt).
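One way to produce such a filler file is to read the size of the source and write that many space bytes. The sketch below assumes the file names from this article; the helper name make_filler is made up, and the demo runs on a temporary stand-in file rather than the real source.txt.

```python
import os
import tempfile


def make_filler(source_path, filler_path, fill=b" "):
    """Write a file the same size as source_path, filled with one repeated byte."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one byte")
    size = os.path.getsize(source_path)
    with open(filler_path, "wb") as out:
        out.write(fill * size)
    return size


# Demo on a temporary stand-in for source.txt.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.txt")
    filler = os.path.join(tmp, "spaces.txt")
    with open(source, "wb") as f:
        f.write(b"stand-in for the real source bytes")
    size = make_filler(source, filler)
    assert os.path.getsize(filler) == size
```

For the real files the call would be make_filler("source.txt", "spaces.txt"). For very large sources it may be better to write the filler in chunks instead of building it in memory.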
Command:
xdelta3 -e -A -n -s source.txt target.txt | xdelta3 -d -s spaces.txt
Result:
изменения
Flags used:
-e - creates the delta (encode)
-A - omits the application header
-n - disables the checksum (with it, the delta cannot be applied to a different source)
-s [file] - the source against which the target file is compared and from which it is restored
-d - reconstructs the target file from the delta and the source (decode)
How it works
If you run the command:
xdelta3 -e -A -n -s source.txt target.txt | xdelta3 printdelta
Then, after all the headers, you will see the VCDIFF instructions:
Offset Code Type1 Size1 @Addr1 + Type2 Size2 @Addr2
000000 025  CPY_0 9     S@0
000009 010  ADD   9
000018 025  CPY_0 9     S@14
VCDIFF is inherently very simple. It consists of three instructions:
COPY - copies data from the source or from the target
ADD - writes data stored in the delta (unique data that is not in the source) to the target file
RUN - repeats one byte from the delta a specified number of times
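The three instructions can be illustrated with a toy interpreter. This is only a sketch of the idea, not the VCDIFF byte format or xdelta3's implementation: instructions are plain tuples, the function name apply_delta is made up, and the sample data is an ASCII stand-in chosen so that the instruction list matches the listing above (COPY 9 bytes from source address 0, ADD 9 bytes, COPY 9 bytes from source address 14).

```python
def apply_delta(source, instructions):
    """Rebuild a target from a source and a list of COPY / ADD / RUN tuples."""
    target = bytearray()
    for inst in instructions:
        kind = inst[0]
        if kind == "COPY":
            _, size, addr = inst
            # Addresses cover the source first, then the target built so far.
            for pos in range(addr, addr + size):
                if pos < len(source):
                    target.append(source[pos])
                else:
                    target.append(target[pos - len(source)])
        elif kind == "ADD":
            # Data carried inside the delta itself.
            target += inst[1]
        elif kind == "RUN":
            _, size, byte = inst
            target += bytes([byte]) * size
        else:
            raise ValueError("unknown instruction: %r" % (kind,))
    return bytes(target)


source = b"AAAAAAAAA-old-BBBBBBBBB"  # hypothetical 23-byte source
delta = [("COPY", 9, 0), ("ADD", b"-changed-"), ("COPY", 9, 14)]

# Applied to the real source, the delta rebuilds the target.
assert apply_delta(source, delta) == b"AAAAAAAAA-changed-BBBBBBBBB"

# Applied to a same-sized run of spaces, only the ADD data stays visible.
spaces = b" " * len(source)
assert apply_delta(spaces, delta) == b" " * 9 + b"-changed-" + b" " * 9
```

The second assertion is the whole trick of this article in miniature: every COPY pulls in filler, so whatever is not filler came from the delta.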
The delta stores only the unique data; the rest is copied from the source. If you run the command:
xdelta3 -e -A -n -s source.txt target.txt > target.vcdiff
you will see that the delta contains only the word "изменения" ("changes"), which occurs only in the target file.
D0A6D093D094200102011720131B20090302изменения190D0A19200E
(JSON does not like special characters, so I converted them to hex.)
If the delta is applied to the source (source.txt), we get the target file (target.txt):
xdelta3 -d -s source.txt target.vcdiff
копия изменения копия
By replacing the source (source.txt) with a file filled with spaces (spaces.txt), we replace the data common to the source and the target file with spaces:
xdelta3 -d -s spaces.txt target.vcdiff
изменения
You can use any other character in spaces.txt. The only requirement is that spaces.txt is at least as large as the source file.
This is how I actually compared the JPEG files:
xdelta3 -e -A -n -s bad_image.jpg good_image.jpg | xdelta3 -d -s spaces.txt
The result of comparing these files:
F488A2 F2AB
The output is mostly spaces, plus the bytes that were "broken". The broken bytes are shown here in hex.
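Picking the interesting bytes out of a long run of spaces by eye is tedious, so the decoded output can be post-processed. The sketch below assumes the output of the decode step has been read into a bytes object; the function name find_changed_runs is made up, and the sample uses the hex values above at arbitrary positions. Since the decoded output has the same length as the target file, the offsets are offsets into the target. A differing byte that happens to equal the filler byte (0x20 here) will not be reported.

```python
def find_changed_runs(output, filler=0x20):
    """Return (offset, hex string) for each run of bytes that differ from the filler byte."""
    runs = []
    start = None
    for i, byte in enumerate(output):
        if byte != filler:
            if start is None:
                start = i  # a new run of non-filler bytes begins here
        elif start is not None:
            runs.append((start, output[start:i].hex().upper()))
            start = None
    if start is not None:
        # The output ended in the middle of a run.
        runs.append((start, output[start:].hex().upper()))
    return runs


# Hypothetical layout: the positions are invented, the byte values are from the result above.
sample = b"   " + bytes.fromhex("F488A2") + b"     " + bytes.fromhex("F2AB") + b"  "
print(find_changed_runs(sample))  # [(3, 'F488A2'), (11, 'F2AB')]
```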
Test JPEG files for trying out the comparison, tortoise.jpg and tortoise_bad.jpg:
xdelta3 -e -A -n -s tortoise_bad.jpg tortoise.jpg | xdelta3 -d -s spaces.txt
The result of comparing these files:
F1BF F0B786 F39BAF F3BD94
The broken bytes are shown in hex.