How to compare parts of files by hash?

sinned 12/06/2018. 7 answers, 2.751 views
bash hashing

I have one successfully downloaded file and another failed download (only the first 100 MB of a large file) which I suspect is the same file.

To verify this, I'd like to check their hashes, but since I only have a part of the unsuccessfully downloaded file, I only want to hash the first few megabytes or so.

How do I do this?

OS would be windows, but I have cygwin and MinGW installed.

7 Answers


Konrad Rudolph 12/06/2018.

Creating hashes to compare files makes sense if you compare one file against many, or when comparing many files against each other.

It does not make sense when comparing two files only once: The effort to compute the hashes is at least as high as walking over the files and comparing them directly.

An efficient file comparison tool is cmp:

cmp --bytes $((100 * 1024 * 1024)) file1 file2 && echo "File fragments are identical"

You can also combine it with dd to compare arbitrary parts (not necessarily from the beginning) of two files, e.g.:

cmp \
    <(dd if=file1 bs=100M count=1 skip=1 2>/dev/null) \
    <(dd if=file2 bs=100M count=1 skip=1 2>/dev/null) \
&& echo "File fragments are identical"

davidbaumann 12/06/2018.

I am sorry I can't exactly try that, but this way will work

dd if=yourfile.zip of=first100mb1.dat bs=100M count=1
dd if=yourotherfile.zip of=first100mb2.dat bs=100M count=1

This will get you the first 100 Megabyte of both files.

Now get the hashes:

sha256sum first100mb1.dat && sha256sum first100mb2.dat 

You can also run it directly:

dd if=yourfile.zip bs=100M count=1 | sha256sum 
dd if=yourotherfile.zip bs=100M count=1 | sha256sum 

Xen2050 12/06/2018.

You could just directly compare the files, with a binary / hex diff program like vbindiff. It quickly compares files up to 4GB on Linux & Windows.

Looks something like this, only with the difference highlighted in red (1B vs 1C):

one                                       
0000 0000: 30 5C 72 A7 1B 6D FB FC  08 00 00 00 00 00 00 00  0\r..m.. ........  
0000 0010: 00 00 00 00                                       ....
0000 0020:
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080: 
0000 0090: 
0000 00A0: 

two        
0000 0000: 30 5C 72 A7 1C 6D FB FC  08 00 00 00 00 00 00 00  0\r..m.. ........  
0000 0010: 00 00 00 00                                       ....               
0000 0020: 
0000 0030:
0000 0040:
0000 0050:
0000 0060:
0000 0070:
0000 0080:
0000 0090:                                
0000 00A0:             
┌──────────────────────────────────────────────────────────────────────────────┐
│Arrow keys move  F find      RET next difference  ESC quit  T move top        │
│C ASCII/EBCDIC   E edit file   G goto position      Q quit  B move bottom     │
└──────────────────────────────────────────────────────────────────────────────┘ 

Tonny 12/07/2018.

Everybody seems to go the Unix/Linux route with this, but just comparing 2 files can easily be done with Windows standard commands:
FC /B file file2

FC is present on every Windows NT version ever made. And (if I recall correctly) was also present in DOS.
It is a bit slow, but that doesn't matter for a one-time use.


Blerg 12/08/2018.

I know it says for Bash, but OP also states that they have Windows. For anyone that wants/requires a Windows solution, there's a program called HxD which is a Hex Editor that can compare two files. If the files are different sizes, it will tell if the available parts are the same. And if need be, it's capable of running checksums for whatever is currently selected. It's free and can be downloaded from: the HxD website. I don't have any connection to the author(s), I've just been using it for years.


Jim L. 12/12/2018.

cmp will tell you when two files are identical up to the length of the smaller file:

$ dd if=/dev/random bs=8192 count=8192 > a
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.514571 secs (130417197 bytes/sec)
$ cp a b
$ dd if=/dev/random bs=8192 count=8192 >> b 
8192+0 records in
8192+0 records out
67108864 bytes transferred in 0.512228 secs (131013601 bytes/sec)
$ cmp a b
cmp: EOF on a

cmp is telling you that the comparison encountered an EOF on file a before it detected any difference between the two files.


user48918 12/07/2018.

If you can access a shell session the remote system, then you can break the source file up into pieces using the split command. To split a big file into (binary) bits of one million bytes or less each:

split -b 1000000 bigfile.tgz will create pieces xaa xab etc. From there it is trivial to concatenate the pieces to reconstruct the file:

cat x?? > reconstructed_bigfile.tgz Of course you have control over the names of the file components. I am just illustrating using the defaults.


HighResolutionMusic.com - Download Hi-Res Songs

1 Ariana Grande

7 Rings flac

Ariana Grande. 2019. Writer: Ariana Grande;Richard Rodgers;TBHits;Njomza;Michael "Mikey" Foster;Kaydence;Tayla Parx;Scootie;Oscar Hammerstein II;Victoria Monét.
2 Alan Walker

Lily flac

Alan Walker. 2018. Writer: Alan Walker;Lars Kristian Rosness;Magnus Bertelsen;K-391;Didrik Handlykken;Marcus Arnbekk.
3 Alec Benjamin

Let Me Down Slowly flac

Alec Benjamin. 2019. Writer: Alec Benjamin;Sir Nolan;Michael Pollack.
4 Alan Walker

Lost Control flac

Alan Walker. 2018. Writer: Alan Walker;Thomas Troelsen;Mood Melodies;Sorana;Fredrik Borch Olsen;Magnus "Magnify" Martinsen.
5 Skylar Grey

Everything I Need flac

Skylar Grey. 2018. Writer: Elliott Taylor;Rupert Gregson-Williams;Skylar Grey.
6 Post Malone

Sunflower flac

Post Malone. 2018. Writer: Carl Rosen;Louis Bell;Billy Walsh;Carter Lang;Swae Lee;Post Malone.
7 Westlife

Hello My Love flac

Westlife. 2019. Writer: Steve Mac;Ed Sheeran.
8 Alan Walker

Different World flac

Alan Walker. 2018. Writer: Shy Nodi;Alan Walker;Fredrik Borch Olsen;James Njie;Marcus Arnbekk;Gunnar Greve Pettersen;K-391;Corsak;Shy Martin;Magnus Bertelsen.
9 Sam Smith

Fire On Fire flac

Sam Smith. 2018. Writer: Steve Mac;Sam Smith.
10 Conor Maynard

Way Back Home (Sam Feldt Edit) flac

Conor Maynard. 2018. Writer: Ji Hye Lee;Shaun.
11 Normani

Dancing With A Stranger flac

Normani. 2019. Writer: Mikkel S. Eriksen;Tor Hermansen;Jimmy Napes;Normani;Sam Smith.
12 Slushii

Never Let You Go flac

Slushii. 2019. Writer: Sofía Reyes;Slushii;Aviella Winder.
13 Skrillex

Face My Fears flac

Skrillex. 2019. Writer: Poo Bear;Skrillex;Utada Hikaru.
14 Alora & Senii

Love U So flac

Alora & Senii. 2019. Writer: Alora & Senii.
15 The Chainsmokers

Hope flac

The Chainsmokers. 2018. Writer: Kate Morgan;Chris Lyon;Alex Pall;Andrew Taggart.
16 Imagine Dragons

Believer flac

Imagine Dragons. 2019. Writer: Dan Reynolds;Lil Wayne;Wayne Sermon;Ben McKee;Daniel Platzman;Robin Fredriksson;Mattias Larsson;Justin Tranter.
17 Alan Walker

I Don't Wanna Go flac

Alan Walker. 2018.
18 Mike Perry

Runaway flac

Mike Perry. 2019. Writer: Andreas Wiman;Dimitri Vangelis;Richard Müller;Sasha Rangas;Stefan Van Leijsen;Mike Perry.
19 Gesaffelstein

Lost In The Fire flac

Gesaffelstein. 2019. Writer: Ahmad Balshe;Nate Donmoyer;Gesaffelstein;DaHeala;The Weeknd.
20 Hozier

Almost (Sweet Music) flac

Hozier. 2019. Writer: Hozier.

Related questions

Hot questions

Language

Popular Tags