hashing

What is hashing?

Hashing is the process of transforming any given key or string of characters into another value. The result is usually a shorter, fixed-length value or key that represents the original string and makes it easier to find or use.

The most popular use of hashing is for setting up hash tables. A hash table stores key-value pairs in an array whose slots are accessed by index. The number of possible keys is effectively unlimited, while the table has a fixed number of slots. The hash function therefore maps each key to a position within the table size. That hash value then becomes the index for a specific element.
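
The following is a minimal sketch of that key-to-index mapping. The Python below hashes a string key with SHA-256 from the standard hashlib module and reduces the result modulo an assumed table size of 8. The function name `bucket_index` is hypothetical. Real hash tables usually rely on faster, non-cryptographic hash functions. SHA-256 is used here only because it gives the same result on every run, unlike Python's built-in `hash()` for strings, which is randomized per process.

```python
import hashlib

TABLE_SIZE = 8  # assumed number of slots in the hash table

def bucket_index(key: str, table_size: int = TABLE_SIZE) -> int:
    """Map a string key to a slot index in the range [0, table_size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer hash value...
    hash_value = int.from_bytes(digest[:8], "big")
    # ...then reduce it modulo the table size so it is a valid index.
    return hash_value % table_size

for key in ["alice", "bob", "carol"]:
    print(key, "->", bucket_index(key))
```

Different keys can map to the same index. That situation is a collision.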

A hash function generates new values, known as hash values or simply hashes, according to a mathematical hashing algorithm. Security uses require that a hash can't be converted back into the original key, so for those a good hash function uses a one-way hashing algorithm.

Hashing is relevant to, but not limited to, data indexing and retrieval, digital signatures, cybersecurity and cryptography.

Figure: Creating and verifying a digital signature. The signer hashes the data and signs the resulting hash with a private key. The recipient uses the signer's public key to verify the signature against a fresh hash of the data.

How does hashing work?

Hashing involves three components:

  • Input. The data entered into the algorithm is called input. This data can have any length and format. For instance, an input could be a music file or a paper. In hashing, every piece of input data is used to produce a single output.
  • Hash function. The central part of the hashing process is the hash function. This function takes the input data and applies a series of mathematical operations to it, resulting in a fixed-length string of characters. A well-designed hash function also ensures that even a small change in the input data produces a significantly different hash value.
  • Hash output. Unlike the input, the output of the hashing process, the hash value, has a set length. Because every output has the same length, the hash reveals nothing about the length of the original input, which contributes to an overall boost in security. A hash value is typically a string of letters and numbers that reveals no readable information about the input, helping keep a person's information private. A good hash function makes it extremely unlikely that two different inputs share a hash value, so hash values are also frequently referred to as fingerprints.
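
The following sketch uses SHA-256 from Python's standard hashlib module as an example hash function. It illustrates two properties from the list above:

- The output length is fixed no matter how large the input is.
- A one-character change in the input produces a very different hash value.

The helper name `sha256_hex` is hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of data as a 64-character hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Fixed-length output: a 2-byte input and a 1,000,000-byte input
# both produce 64 hex characters (256 bits).
print(len(sha256_hex(b"hi")), len(sha256_hex(b"x" * 1_000_000)))  # 64 64

# Avalanche effect: changing one character typically alters most of the output.
original = sha256_hex(b"hello world")
modified = sha256_hex(b"hello worle")
changed = sum(1 for x, y in zip(original, modified) if x != y)
print(original)
print(modified)
print(f"{changed} of 64 hex characters differ")
```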

Benefits of hashing

Hashing has applications in various fields such as cryptography, computer science and data management. Some common uses and benefits of hashing include the following:

  • Data integrity. Hashing is commonly used to ensure data integrity. A user can generate a hash value for a piece of data, such as a file or message. Later, the user compares it with the hash value of the received data to verify whether any changes or corruption occurred during transmission.
  • Efficient data retrieval. Hashing enables efficient data retrieval in hash tables, especially when dealing with large data sets. A hash function maps object data to a representative integer value, and that hash can then be used to narrow down searches when locating items. For example, in hash tables, developers store data, perhaps a customer record, in the form of key and value pairs. The key identifies the data and serves as the input to the hash function. The resulting hash code, an integer, is then mapped onto the fixed size of the table. Typical operations supported by hash tables include insert(key, value), get(key) and delete(key).
  • Digital signatures. In addition to enabling rapid data retrieval, hashing is a core part of the digital signatures used to authenticate message senders. The sender hashes the message to produce a short, fixed-length value known as a message digest, signs that digest with a private key and sends the signature along with the message. Upon receipt, the receiver runs the same hash function on the received message. The receiver then uses the sender's public key to check that the signature matches this digest. If the message was altered in transit, the digests won't match and verification fails.
  • Password storage. Hashing is widely used for secure password storage. Instead of storing passwords in plain text, systems hash them and store the hash values. This adds an extra layer of security: even if the hash values are compromised, the original passwords can't be read directly. Attackers must instead guess candidate passwords and hash each one, which slow, salted password-hashing algorithms make expensive.
  • Fast searching. Hash-based data structures organize data into easily searchable buckets. A lookup can go straight to the bucket a key hashes to, so searching for specific data typically takes constant time on average. By contrast, the time for a list scan or tree traversal grows with the size of the data set. Hashing is particularly useful in applications that require rapid search results, such as databases and search engines.
  • Efficient caching. Hash tables are commonly used to implement caching systems. Hashing the cache keys lets data be quickly retrieved from cache memory, reducing the need to access slower storage systems. This improves overall system performance and response times.
  • Cryptographic applications. Hashing plays a crucial role in various cryptographic algorithms. Cryptographic hash functions are used to generate digital signatures, authenticate messages and ensure data integrity and authenticity. Hashing algorithms such as Secure Hash Algorithm 2, or SHA-2, are widely used in cryptographic applications.
  • Space efficiency. Hashing enables efficient use of storage space. Hash values are typically shorter than the original data, making them more compact and easier to store. This is especially beneficial when dealing with large data sets or limited storage resources.
  • Blockchain technology. Hashing is widely used in blockchain, especially in cryptocurrencies such as Bitcoin. Blockchain is a digital ledger that stores transactional data, and each new record is called a block. Since all participants in a blockchain have access to identical data, ensuring the integrity of previous transactions is critical. Hashing provides this integrity: each block includes the hash of the previous block. Altering any earlier block changes its hash and breaks the links that follow, which makes tampering evident.
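  • Data compression. Hashing can support lossless compression. For example, dictionary-based compressors commonly use hash tables to locate repeated strings quickly. Huffman coding, another lossless compression algorithm, encodes data efficiently but isn't itself a hashing technique.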
  • Database management. When dealing with large data sets, combing through multiple entries to obtain the necessary data can be slow. Hashing offers an alternative by letting users search for data records using a search key and a hash function rather than an index structure. Hash files organize data into buckets, each of which can hold numerous records. The basic role of the hash function is to map each search key to the bucket that holds the matching record.
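
The following is a minimal Python sketch of a hash table that resolves collisions with separate chaining. It makes the insert(key, value), get(key) and delete(key) operations and the idea of buckets concrete. The class name `ChainedHashTable` and the default of 8 buckets are assumptions for illustration. Python lists stand in for linked lists, and the table never resizes, so this isn't a production implementation; in practice, Python's own `dict` is the usual choice.

```python
from typing import Any, Hashable, List, Optional, Tuple

Bucket = List[Tuple[Hashable, Any]]

class ChainedHashTable:
    """Minimal hash table that resolves collisions with separate chaining.

    Each bucket is a Python list of (key, value) pairs, standing in for a
    linked list. The number of buckets is fixed; there is no resizing.
    """

    def __init__(self, size: int = 8) -> None:
        self._buckets: List[Bucket] = [[] for _ in range(size)]

    def _bucket_for(self, key: Hashable) -> Bucket:
        # Python's built-in hash() supplies the hash code; the modulo maps it
        # onto the fixed number of buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # key already stored: overwrite value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: add it to the chain

    def get(self, key: Hashable) -> Optional[Any]:
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return None                      # key not present

    def delete(self, key: Hashable) -> bool:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False                     # nothing to delete


table = ChainedHashTable()
table.insert("cust-1001", {"name": "John Doe"})
print(table.get("cust-1001"))  # {'name': 'John Doe'}
table.delete("cust-1001")
print(table.get("cust-1001"))  # None
```

Returning `None` for a missing key keeps the sketch short. It can't distinguish a missing key from a stored `None` value, which a fuller implementation would handle, for example by raising `KeyError`.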

Figure: A hash table and how it works, illustrating the process of converting key values into indexes.

Disadvantages of hashing

While hashing offers several benefits, it also has certain drawbacks and limitations, including the following:

  • Risk of collisions. Hashing can sometimes suffer from collisions, which occur when two different inputs produce the same hash value. Collisions can lead to decreased performance and increased lookup time, especially if the number of collisions is high. Techniques such as chaining and open addressing can be used to handle collisions, but they can introduce additional complexity. For example, chaining can have poor cache performance: each bucket's entries are stored in a linked list whose nodes may be scattered across memory.
  • Non-reversible. Since hash functions are intended to be one-way functions, reversing the process and getting the original input data isn't computationally viable. This could be a drawback if reverse lookup is necessary.
  • Limited sorting. Hashing isn't ideal if data needs to be sorted in a specific order. While hash tables are designed for efficient lookup and retrieval, they don't provide inherent support for sorting operations. If sorting is a requirement, other data structures such as balanced search trees might be worth considering.
  • Space overhead. Hash tables typically need more storage than the data alone. They keep some slots empty to limit collisions and may also store hash values or chain pointers. This space overhead can be substantial when working with big data sets and can be a cause for concern when storage resources are limited.
  • Key dependency. Hashing performs well only when the hash function spreads keys evenly across the table. If many keys share patterns that the hash function maps to the same buckets, collisions occur more frequently, leading to performance degradation. It's important to choose a hash function suited to the keys, or to design keys carefully, to minimize the likelihood of collisions.
  • Difficulty in setting up. Configuring a hash table or a hashing algorithm can be more complex than setting up other data structures. Handling collisions, resizing the hash table and ensuring efficient performance require careful consideration and planning, which can make hashing challenging to set up.
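
For contrast with chaining, the following sketch shows open addressing with linear probing, one of the collision-handling techniques mentioned above. It is a hypothetical minimal Python class with a fixed capacity. Deleted slots are marked with a placeholder, often called a tombstone, so later lookups keep probing past them. The table raises an error when full rather than resizing. Both choices show why collision handling and resizing need careful planning.

```python
from typing import Any, Hashable, Iterator, List, Optional

_EMPTY = object()    # marker: slot has never been used
_DELETED = object()  # marker ("tombstone"): slot held a key that was deleted

class LinearProbingTable:
    """Minimal open-addressing hash table using linear probing.

    Fixed capacity: it raises an error when full instead of resizing.
    """

    def __init__(self, capacity: int = 8) -> None:
        self._keys: List[Any] = [_EMPTY] * capacity
        self._values: List[Any] = [None] * capacity

    def _probe(self, key: Hashable) -> Iterator[int]:
        # Start at the key's home slot and step forward one slot at a time,
        # wrapping around, visiting each slot at most once.
        capacity = len(self._keys)
        start = hash(key) % capacity
        for step in range(capacity):
            yield (start + step) % capacity

    def insert(self, key: Hashable, value: Any) -> None:
        free_slot: Optional[int] = None
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                if free_slot is None:
                    free_slot = i
                break                    # key can't appear past an empty slot
            if slot is _DELETED:
                if free_slot is None:
                    free_slot = i        # remember first reusable slot
            elif slot == key:
                self._values[i] = value  # key already stored: overwrite
                return
        if free_slot is None:
            raise RuntimeError("table is full; a real table would resize here")
        self._keys[free_slot] = key
        self._values[free_slot] = value

    def get(self, key: Hashable) -> Optional[Any]:
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                return None              # reached a gap: key is absent
            if slot is not _DELETED and slot == key:
                return self._values[i]
        return None

    def delete(self, key: Hashable) -> bool:
        for i in self._probe(key):
            slot = self._keys[i]
            if slot is _EMPTY:
                return False
            if slot is not _DELETED and slot == key:
                self._keys[i] = _DELETED  # tombstone keeps later probes going
                self._values[i] = None
                return True
        return False
```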

What is hashing in data structure?

Hashing is used in data structures to efficiently store and retrieve data. The Dewey Decimal System, which enables books to be organized and stored based on their subject matter, has worked well in libraries for many years, and the underlying concept works just as well in computer science. Software engineers can save both storage space and time by reducing the original data assets and input strings to short alphanumeric hash keys.

When someone is looking for an item on a data map, hashing narrows down the search. The hash code of each key serves as the index at which the associated value is stored. This is why hashing is used to index and retrieve information in databases: it accelerates the process. It's much easier to find an item using its shorter hashed key than its original value.

What is hashing in cybersecurity?

Many hashing algorithms are used to enhance cybersecurity, including MD5, SHA-256, SHA-512 and bcrypt. Each algorithm has unique qualities and levels of security, and the application's specific requirements determine which algorithm is used. MD5, for instance, is no longer considered secure for cryptographic purposes.

Hashed values are meaningless to hackers because, unlike encrypted data, they can't be decrypted with a key. For example, if hackers breach a database and find data such as "John Doe, Social Security number 273-76-1989," they can immediately use that information for their nefarious activities. However, a hashed value such as "a87b3" doesn't reveal the original data. To recover it, attackers would have to guess candidate inputs and hash each one until they find a match. As such, hashing helps secure passwords stored in a database.

What is hashing in cryptography?

The primary purpose of hashing in cryptography is to provide a compact, practically unique and irreversible representation of data. Cryptography uses a variety of hash functions to secure data.

Figure: The MD5 hashing algorithm and how it works in cryptography.

Some of the most popular cryptographic hashes include the following:

  • SHA-2.
  • SHA-3.
  • The series of message-digest hash functions: MD2, MD4, MD5 and MD6.

Message-digest hash functions such as MD2, MD4 and MD5 transform data, such as a message to be digitally signed, into a shorter, fixed-length value called a message digest. All three are now considered insecure for cryptographic use.

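SHA-1, the original widely used Secure Hash Algorithm, produces a 160-bit message digest. Its design is similar to MD4. SHA-1 is no longer considered suitable for cryptographic purposes because practical collision attacks against it have been demonstrated. The SHA-2 family produces larger digests of 224, 256, 384 or 512 bits. SHA-3, standardized in 2015, is the newest member of the Secure Hash Algorithm family and uses a different internal construction from SHA-2.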
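
The following snippet checks the digest lengths discussed above. It asks Python's standard hashlib module for the digest size of several algorithms. The list of names is chosen for illustration. Availability can vary; for example, MD5 may be disabled on systems running in a FIPS-restricted mode.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"]

# digest_size is reported in bytes; multiply by 8 for bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in ALGORITHMS}

for name, bits in digest_bits.items():
    print(f"{name:<9} {bits} bits")
# md5 128, sha1 160, sha224 224, sha256 256, sha384 384, sha512 512,
# sha3_256 256, sha3_512 512
```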

What is a collision?

Hashing in cybersecurity requires a one-way hashing algorithm. This one-way property is crucial for stopping threat actors from reverse engineering a hash back to its original input.

Defeating a cryptographic hash function typically takes an enormous number of brute-force attempts: an attacker has to keep guessing inputs until one produces the target output. However, a hash function maps an unlimited number of possible inputs to a fixed number of outputs, so separate inputs can produce the same outcome. This means two keys can end up generating an identical hash. This phenomenon is called a collision.
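
Deliberately weakening a hash shows why collisions must exist. The hypothetical sketch below keeps only the first 2 bytes (16 bits) of SHA-256, so there are just 65,536 possible outputs. It hashes distinct inputs until two of them collide. By the birthday paradox, a collision usually appears after a few hundred attempts. With the full 256-bit output, the same search would need on the order of 2^128 attempts, which is why it is considered infeasible.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Deliberately weak hash: only the first n_bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes, int]:
    """Hash distinct inputs until two share a truncated hash value.

    Returns (first_input, second_input, number_of_inputs_tried).
    """
    seen: Dict[bytes, bytes] = {}
    for i in count():
        data = f"input-{i}".encode("ascii")
        h = truncated_hash(data, n_bytes)
        if h in seen:
            return seen[h], data, i + 1
        seen[h] = data

first, second, tried = find_collision()
print(first, second, f"share a 16-bit hash after {tried} inputs")
```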

The following key points should be considered regarding a collision in hashing:

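  • No hash function with a fixed-length output can avoid collisions entirely, but a good hash function makes them rare. For cryptographic use, a hash function must be collision-resistant, meaning that finding two different inputs with the same hash value is computationally infeasible.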
  • Open addressing and separate chaining are two ways of dealing with collisions when they occur.
  • Open addressing handles collisions by storing all data in the hash table itself. When a slot is taken, the algorithm probes other slots in a defined sequence until it finds an available one. Open addressing methods include double hashing, linear probing and quadratic probing.
  • Separate chaining, by contrast, handles collisions by making each hash table cell point to a linked list of all records whose keys hash to that cell.
  • Cybersecurity professionals can also add random data, called a salt, to each input before it's hashed. This approach, known as salting, produces different outputs for identical inputs as long as each salt is unique.
  • Salting stops bad actors from spotting reused passwords, because users who share a password still end up with different hash values. Salting also thwarts rainbow table attacks, which rely on precomputed hashes of likely passwords.
  • Hashing can also be used when analyzing or preventing file tampering. The sender computes a hash of the original file and sends or publishes it alongside the file. The receiver hashes the received file and compares the result with the original hash. If someone manipulated the file in transit, the hashes won't match.
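
Salting is usually combined with a deliberately slow password-hashing function. The following sketch uses PBKDF2-HMAC-SHA256 from Python's standard hashlib module. It doesn't use bcrypt, which the article also mentions, because bcrypt would need a third-party package. The function names, the 16-byte salt and the iteration count are assumptions for illustration. Choose parameters according to current guidance for your environment.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor: higher values slow down brute-force guessing.
ITERATIONS = 600_000
SALT_BYTES = 16

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash) for storage, using a fresh random salt."""
    salt = os.urandom(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_hash)

# Two users who choose the same password still get different stored hashes.
salt_a, hash_a = hash_password("correct horse battery staple")
salt_b, hash_b = hash_password("correct horse battery staple")
print(hash_a != hash_b)                                                  # True
print(verify_password("correct horse battery staple", salt_a, hash_a))  # True
```

The salt is stored next to the hash in plain form. It doesn't need to be secret, only unique per password.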

Hashing vs. encryption

Hashing and encryption are both cryptographic techniques used to protect data, but they serve different purposes and have distinct characteristics.

Hashing

  • Hashing is a one-way process that turns data into a fixed-length hash value using a hash function.
  • The primary goal of hashing is to ensure data integrity and validate the original data.
  • Hash functions are generally designed to be fast and efficient, and a good one makes it extremely unlikely that two different inputs produce the same hash value.
  • Hashing is irreversible, which means it's computationally impractical to recover the original data from the hash value.
  • Hashing is often used to store passwords, create digital signatures and verify data integrity.
  • Hashing algorithms include MD5, SHA-3 and SHA-256.
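
The following sketch shows integrity verification. It assumes the expected hash reaches the receiver over a trusted channel. The receiver recomputes the SHA-256 hash of what arrived and compares it with the published value. The function names are hypothetical. A plain hash only detects changes if an attacker can't also replace the published hash. When that is a risk, a keyed hash (HMAC) or a digital signature is used instead.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(received: bytes, expected_hash: str) -> bool:
    """True if the received data hashes to the expected value."""
    return fingerprint(received) == expected_hash

original = b"Quarterly report, final version"
published_hash = fingerprint(original)  # shared with the receiver alongside the data

print(is_intact(original, published_hash))                            # True
print(is_intact(b"Quarterly report, final versioN", published_hash))  # False
```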

Encryption

  • Encryption is a two-way process. It converts data into an unreadable form, or ciphertext, using an encryption algorithm and a key, and the correct key can reverse it.
  • The fundamental goal of encryption is to ensure data secrecy and protect sensitive information from unauthorized access.
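  • Encryption requires keys to convert data between plaintext and ciphertext. Symmetric algorithms use the same key for encryption and decryption, while asymmetric algorithms use a public and private key pair.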
  • Encryption algorithms are intended to be secure and resistant to attacks, making it computationally infeasible for unauthorized parties to decrypt the ciphertext without the correct key.
  • Encryption is a popular method for secure communication, data storage and securing sensitive information.
  • Examples of encryption algorithms include RSA, or Rivest-Shamir-Adleman; Advanced Encryption Standard; and Blowfish.

This was last updated in May 2024
