How do you catch a node that wants to cheat by secretly changing a transaction? Make sure that even a minor change is easily visible to all other network participants, so they can reject it. As a result, the blockchain consists of an ever-growing chain of approved transactions only. Hashing is a cryptographic technique that makes minor changes easily detectable. Every blockchain uses hashing, but different blockchains may use different hashing functions. We first explain how hashing contributes to a blockchain’s immutability, followed by an overview of hashing basics.
A hashing function delivers an output (hashed result) based on an input. A minor variation in the input results in a significant variation in the output, which is a vital characteristic of a hashing function. We describe this in more detail later. First, we want to show what this looks like and how we can use it to create an immutable chain of transactions. For this purpose, we use a basic description: a set of transactions is called a block, and a chain of linked blocks is called a blockchain.
Let’s use two input values:
The hashing function, called SHA-256, delivers two completely different output values (hashed results). You can test this yourself using the SHA-256 hash generator at the bottom of this page.
Hashed results for Value 1
Hashed results for Value 2
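The two input values from the original page are not reproduced in this text. The sketch below therefore uses two hypothetical inputs that differ in only one character. It hashes both with SHA-256 through Python’s standard `hashlib` module, so you can see the same effect locally.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical inputs that differ only in their last character.
value_1 = "Blockchain hashing example 1"
value_2 = "Blockchain hashing example 2"

hash_1 = sha256_hex(value_1)
hash_2 = sha256_hex(value_2)

print("Value 1:", hash_1)
print("Value 2:", hash_2)
print("Identical:", hash_1 == hash_2)
```

The two digests have nothing visibly in common, even though the inputs are almost the same.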
Let’s assume John entered into a transaction with Maria. They agreed that Maria would deliver a product to John in exchange for John sending 1 bitcoin to Maria. They both completed their part of the transaction. However, John is a malicious actor: after Maria has delivered her part of the deal, he tries to change the transaction amount to 0.1 bitcoin. The hashed result of the original, valid transaction looks as follows:
Hashed result valid transaction
The hashed result of the corrupt transaction is entirely different.
Hashed result invalid transaction
Using SHA-256, you can thus easily distinguish a valid transaction from a fraudulent one.
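The comparison above can be sketched in a few lines. The transaction format here (fields joined with `|`) and the function name `hash_transaction` are hypothetical; real blockchains use their own serialization formats. The idea is that the network keeps the hash recorded for the accepted transaction, and any altered version produces a different hash.

```python
import hashlib


def hash_transaction(sender: str, receiver: str, amount_btc: str) -> str:
    # Hypothetical encoding: fields joined with "|". Real blockchains
    # serialize transactions in their own (usually binary) formats.
    record = f"{sender}|{receiver}|{amount_btc}"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


# Hash recorded by the network when the original transaction was accepted.
recorded_hash = hash_transaction("John", "Maria", "1")

# John later presents an altered version of the same transaction.
presented_hash = hash_transaction("John", "Maria", "0.1")

if presented_hash != recorded_hash:
    print("Rejected: transaction does not match the recorded hash")
else:
    print("Accepted")
```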
A simplified version of the blockchain looks as follows:
The transactions are defined as follows:
The input and output values for block 81 are thus:
Hashed value of block 81
The input and output values for block 82 are:
Hashed value of block 82
If you apply a minor change to a single transaction, you change the hash value of the entire block. That change propagates to the hash value of the next block, because the next block includes the hash value of the previous one. As such, a minor change to a single transaction propagates through every subsequent block in the chain. Because of this characteristic, minor changes are easily detected and therefore rejected.
For users to feel safe about transactions, those transactions must be immutable once completed. Therefore, the network must have a way to detect fraudulent changes, and a hashing function contributes to that objective. A new block is accepted when the majority of network participants agree upon its hashed value, and once approved, that hash value must stay the same. Because of hashing, however, the value changes as soon as an earlier transaction is adjusted. Hashing thus makes fraudulent actions easily detectable, the network ignores them, and the list of agreed transactions stays the same. In summary, a hashing function flags reversals of earlier transactions, enabling the network to reject them.
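The propagation described above can be illustrated with a toy chain. In this minimal sketch, the block layout (the previous hash followed by the transactions), the all-zero hash before block 81, and the transactions are hypothetical simplifications. Real block formats contain more fields. Changing one transaction in block 81 changes the hash of every block after it.

```python
import hashlib
from typing import List


def block_hash(prev_hash: str, transactions: List[str]) -> str:
    # Hypothetical block layout: previous block hash, then the transactions.
    payload = prev_hash + "\n" + "\n".join(transactions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_hashes(start_hash: str, blocks: List[List[str]]) -> List[str]:
    """Hash each block in order, feeding every result into the next block."""
    hashes = []
    prev = start_hash
    for transactions in blocks:
        prev = block_hash(prev, transactions)
        hashes.append(prev)
    return hashes


START = "0" * 64  # hypothetical placeholder for the hash preceding block 81
blocks = [
    ["John->Maria 1", "Alice->Bob 0.5"],  # block 81 (hypothetical transactions)
    ["Bob->Carol 0.2"],                   # block 82
    ["Carol->Dave 0.1"],                  # block 83
]
original = chain_hashes(START, blocks)

# Tamper with a single transaction in block 81 only.
tampered_blocks = [["John->Maria 0.1", "Alice->Bob 0.5"]] + blocks[1:]
tampered = chain_hashes(START, tampered_blocks)

for number, (before, after) in enumerate(zip(original, tampered), start=81):
    print(number, "changed" if before != after else "unchanged")
```

Blocks 82 and 83 were never edited, but their hashes still change, because each one embeds the hash of the block before it.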
A cryptographic hashing function is an algorithm that converts an input of any size into an output of fixed length. As described above, the algorithm contributes to the immutability of the blockchain. But how does it work? What are the properties of a cryptographic hashing function? Is there just one type, or are there multiple types? And what are the applications of these hashing functions beyond contributing to immutability?
The five main properties of a cryptographic hash function are: it is deterministic, it is quick to compute, it is pre-image resistant, it is collision resistant, and it shows an avalanche effect.
We’ll explain them one by one.
Deterministic means that the algorithm always produces the same output for the same input, no matter who submits the data or how often it is submitted. As a result, everyone automatically agrees on the result if they agree on the submitted data.
A hashing function should return a result quickly, because that reduces the time required to validate a transaction. A network can process more transactions per second when validation takes less time. In Proof of Work consensus, it is hard to find an input whose hash meets the required target. However, once a miner finds that input, other nodes can quickly verify that it is correct, because computing a single hash is so efficient. They can then move on to creating the next block of transactions. In summary, efficient hashing means less time to validate transactions, which means a higher transaction throughput.
Pre-image resistance means that, given the output, it is practically infeasible to derive the input. It is not impossible, because you can always guess. But you would need so many guesses that you would not find the answer in your lifetime. Remember that the algorithm converts an input of any size into a fixed-length output? The number of possible inputs is unlimited, and SHA-256 has 2^256 possible outputs. A brute-force search therefore needs on the order of 2^256 attempts, which is why you would need far more than a lifetime.
Collision resistance means that it is infeasible to find two inputs that lead to precisely the same output. Collisions do exist, but they are extremely hard to find. Why do they exist? Simply because there is a finite number of possible outputs (2^256 for SHA-256) for an infinite number of possible inputs, so some inputs must share an output.
The avalanche effect means that a small change in the input creates a considerable difference in the output. The bigger the avalanche effect, the better. Why? Suppose the outputs of your guesses started to resemble the required output as your guesses got closer to the correct input. You could then make educated guesses instead of random ones. And the fewer guesses an attacker needs, the less secure the algorithm becomes.
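Determinism is easy to check with Python’s `hashlib`. The input below is hypothetical. Two independent computations agree. Feeding the same bytes in two chunks through `update()` also gives the same digest, because the result depends only on the byte sequence, not on how or by whom it is submitted.

```python
import hashlib

data = b"John sends 1 bitcoin to Maria"  # hypothetical input

# Two independent computations over the same bytes.
first = hashlib.sha256(data).hexdigest()
second = hashlib.sha256(data).hexdigest()

# Feeding the same bytes in chunks yields the same digest as hashing them at once.
incremental = hashlib.sha256()
incremental.update(data[:10])
incremental.update(data[10:])
third = incremental.hexdigest()

print(first == second == third)  # True
```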
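The asymmetry between finding and checking a Proof of Work solution can be shown with a toy example. This sketch makes several assumptions. The function names are hypothetical, the difficulty rule ("the hash starts with N zero hex digits") is a simplification, and Bitcoin itself compares a double SHA-256 hash of the block header against a numeric target. Mining loops over many nonces, but verification costs a single hash.

```python
import hashlib
from typing import Tuple


def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Search for a nonce whose hash starts with `difficulty` zero hex digits.

    Toy rule for illustration only; real networks use a numeric target.
    """
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Validation needs one hash, however long the search took."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


nonce, digest = mine("block 82 transactions", difficulty=4)
print(nonce, digest)
print(verify("block 82 transactions", nonce, difficulty=4))  # True
```

Each extra zero digit multiplies the expected mining work by 16, while the cost of `verify` stays the same.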
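To get a feel for brute-force guessing, the sketch below attacks a deliberately weakened hash: only the first 4 hex digits (16 bits) of SHA-256. The "secret" input `"key"` and the helper names are hypothetical. Even at this size the search tries thousands of candidates. Every additional hex digit multiplies the average work by 16, and full SHA-256 has 64 digits. The string the search returns may differ from `"key"`, because many inputs share the same truncated output.

```python
import hashlib
import itertools
import string
from typing import Optional


def truncated_hash(text: str, hex_digits: int) -> str:
    # Keep only the first `hex_digits` hex characters (4 bits each).
    return hashlib.sha256(text.encode()).hexdigest()[:hex_digits]


def brute_force_preimage(target: str, max_len: int = 4) -> Optional[str]:
    """Try every lowercase string up to max_len characters until one matches."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            if truncated_hash(candidate, len(target)) == target:
                return candidate
    return None


# Toy target: 16 bits of the digest of a hypothetical secret input.
target = truncated_hash("key", 4)
found = brute_force_preimage(target)
print(found, found is not None and truncated_hash(found, 4) == target)
```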
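Collisions exist because the output space is finite. This sketch makes that concrete by truncating SHA-256 to 6 hex digits (24 bits), a hypothetical weakening chosen so that a collision appears after a few thousand tries. The message format is arbitrary. With all 256 bits, the same search is far beyond practical reach.

```python
import hashlib
from typing import Dict, Tuple


def find_truncated_collision(hex_digits: int) -> Tuple[str, str, str]:
    """Hash counter-based messages until two share the same truncated digest."""
    seen: Dict[str, str] = {}
    counter = 0
    while True:
        message = f"message-{counter}"
        short = hashlib.sha256(message.encode()).hexdigest()[:hex_digits]
        if short in seen:
            return seen[short], message, short
        seen[short] = message
        counter += 1


first, second, shared = find_truncated_collision(hex_digits=6)  # 24-bit toy hash
print(first, second, shared)
```

The search must terminate: after 2^24 + 1 distinct messages, two of them must share a 24-bit output. In practice it stops much earlier, which is why collision resistance depends on a large output size.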
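One way to measure the avalanche effect is to count how many of the 256 output bits flip when the input changes slightly. In this sketch, the inputs (differing in one character) and the helper `differing_bits` are hypothetical. For a well-behaved hash, you would expect roughly half of the bits, about 128, to differ.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two SHA-256 digests."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")


# Hypothetical inputs that differ in a single character.
changed = differing_bits(b"Transaction 1", b"Transaction 2")
print(f"{changed} of 256 bits differ")
```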
The most common cryptographic hash functions belong to the SHA family. SHA stands for Secure Hash Algorithm. The families published so far are SHA-0, SHA-1, SHA-2, and SHA-3. SHA-1 is more secure against cryptographic attacks than SHA-0, and so on. SHA-0 and SHA-1 are no longer considered secure against collision attacks. The algorithm we use in this article, SHA-256, belongs to the SHA-2 family. The 256 refers to a hashed value of 256 bits, or 64 hexadecimal digits. If you count the characters of each hashed value in the SHA-256 generator at the end of this page, you will always find 64 characters.
Security is always a trade-off. For example, SHA-256 is more secure than SHA-1 but less efficient: calculating the outcome takes 20-30% more time. In general, the better a cryptographic hash function performs against the properties listed above, the more secure it is.
The National Security Agency (NSA) in the United States designed the SHA-1 and SHA-2 algorithms. The US National Institute of Standards and Technology (NIST) organized a competition to design a SHA-3 hash function that would complement the existing algorithms. Keccak won that competition, and Groestl was one of the finalists. Groestl’s algorithm is used in the cryptocurrency Groestlcoin.
Cryptographic hashing contributes to the immutability of data stored on the blockchain. But there are other applications, such as checksums and password storage.
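You can confirm the fixed output lengths with Python’s `hashlib`. This sketch compares SHA-1 with SHA-256 (SHA-2) and SHA3-256 (SHA-3). It also shows that the SHA-256 digest stays 64 hex characters whether the input is empty or one million bytes long. The sample inputs are arbitrary.

```python
import hashlib

message = b"hello"  # any input; the digest length does not depend on it

for name in ("sha1", "sha256", "sha3_256"):
    h = hashlib.new(name, message)
    # digest_size is in bytes: 8 bits and 2 hex characters per byte.
    print(f"{name}: {h.digest_size * 8} bits, {len(h.hexdigest())} hex characters")

for size in (0, 10, 1_000_000):
    print(size, len(hashlib.sha256(b"a" * size).hexdigest()))  # always 64
```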
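If you want to check the speed trade-off on your own machine, here is a rough benchmark sketch using `timeit`. The buffer size and repeat count are arbitrary choices. The measured ratio depends heavily on the CPU (some processors have hardware SHA instructions) and on the Python and OpenSSL build, so it may not match the 20-30% figure above.

```python
import hashlib
import timeit

data = b"x" * 1_000_000  # 1 MB sample buffer (arbitrary size)

timings = {}
for name in ("sha1", "sha256"):
    # Hash the same buffer 20 times; results vary by hardware and build.
    timings[name] = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=20)
    print(f"{name}: {timings[name]:.4f} s for 20 MB")

print(f"sha256 / sha1 time ratio: {timings['sha256'] / timings['sha1']:.2f}")
```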
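The sketch below shows both applications with Python’s standard library. For the checksum, you compare the digest of received data with a published digest (the values here are hypothetical). For passwords, it deliberately swaps in PBKDF2 (`hashlib.pbkdf2_hmac`) with a random salt instead of a single SHA-256 pass, because a plain fast hash makes guessing attacks cheap. The iteration count is an illustrative assumption, not a recommendation.

```python
import hashlib
import hmac
import os


# Checksum: compare received data against the digest the publisher lists.
def sha256_checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


published = sha256_checksum(b"release contents")  # hypothetical published value
downloaded = b"release contents"
print("Checksum OK:", sha256_checksum(downloaded) == published)


# Password storage: keep only a salted, deliberately slow hash of the password.
def hash_password(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


salt = os.urandom(16)  # random per-user salt, stored alongside the hash
stored = hash_password("correct horse", salt)


def check_password(attempt: str) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(hash_password(attempt, salt), stored)


print(check_password("correct horse"), check_password("wrong guess"))  # True False
```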
You can also apply this principle by calculating the hashed value of a timestamp and a document and storing the outcome on the blockchain. This way, you can prove that you submitted a specific data set at a particular point in time. Blockchain services based on this principle include Originstamp and Blocknotary.
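Here is a minimal sketch of the fingerprint such a service could store. The scheme (timestamp, a `|` separator, then the document bytes), the fixed date, and the document are hypothetical. The sketch does not reflect how Originstamp or Blocknotary work internally, and it does not show the step of writing the digest to a blockchain.

```python
import hashlib
from datetime import datetime, timezone


def document_fingerprint(document: bytes, timestamp: str) -> str:
    # Hypothetical scheme: hash the timestamp, a separator, then the document.
    return hashlib.sha256(timestamp.encode() + b"|" + document).hexdigest()


document = b"Contract between John and Maria"  # hypothetical document
timestamp = datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc).isoformat()

fingerprint = document_fingerprint(document, timestamp)
print(timestamp, fingerprint)  # this digest is what would be stored on-chain

# Later, anyone with the document and timestamp can recompute and compare.
print(document_fingerprint(document, timestamp) == fingerprint)            # True
print(document_fingerprint(b"Altered contract", timestamp) == fingerprint)  # False
```

Only the 64-character digest needs to go on-chain. The document itself can stay private until you need to prove it existed at that time.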