2.7. Hashing#

A hash function is a mathematical function that takes an input (of any size) and produces a fixed length output called a hash value.

Note

Another term for “hash value” is “digest”!

As an example, a hash function might take an entire file and generate a short, random sequence of characters. The same file will always produce the same result but even a single bit change will result in a completely different hash a property known as the avalanche effect.

2.7.1. Data Integrity#

Hash functions are primarily used for data integrity checks. That means checking that the data is the same as when it was created, since there is a risk of corruption and tampering when storing and transmitting data.

Corruption or tampering can be checked by comparing the hash of the data before and after storage or transmission. If the hash is different then the data is different.

2.7.2. Bins and Collisions#

Hash functions produce a fixed length hash value and an interpretation of this is that the hash value is the number of a bin or container. The hash function is therefore assigning each input to a particular bin.

Since a hash function produces a fixed-size output but can take infinite possible inputs, there must be different inputs that produce the same hash. When different inputs produce the same hash value we have a collision.

Collisions are inevitable if the hash function allows any length input, however a well designed hash function makes them rare.