cryptographic checksum
What is a cryptographic checksum?
Generated by a cryptographic algorithm, a cryptographic checksum is a value computed from the contents of a file, such as a file sent through a network, and used to verify that the data in the file is unchanged. The algorithm performs numerous mathematical operations to create a hash value: a fixed-length string of digits. This hash value is then used as a checksum to confirm that the file was not changed in transit, for example by an attacker.
As long as a file's contents stay the same, its hash value stays the same, which is why the hash value is considered an "electronic fingerprint" of the file. Comparing this fingerprint later reveals whether the data in the file has been tampered with or manipulated, possibly by a malicious entity.
Cryptographic checksums underpin many areas of modern cryptography, including digital signatures, email certificates and website certificates. Depending on context, they are also called message authentication codes, integrity check values, modification detection codes or message integrity codes. Strictly speaking, a message authentication code also depends on a secret key, whereas a modification detection code does not.
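The difference between an unkeyed checksum and a message authentication code can be sketched with Python's standard hmac and hashlib modules. This is a minimal illustration that assumes HMAC-SHA256 as the keyed construction; the key and messages are made-up examples, and a real system would also need secure key generation and distribution.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag: a checksum that also depends on a secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag for the message and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical shared key and message, for illustration only.
key = b"example-shared-key"
message = b"Transfer 100 units to account 42"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                              # True
print(verify_tag(key, b"Transfer 900 units to account 42", tag))  # False: message changed
print(verify_tag(b"guessed-key", message, tag))                   # False: wrong key
```

Unlike a plain SHA-256 checksum, which anyone can recompute for a modified file, a valid tag can only be produced by someone who holds the key.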
How a cryptographic checksum works
A cryptographic checksum is based on a hash function, which computes a hash value, also known as a hash code, for any file. The cryptographic hash function takes an input of any size and produces a fixed-length output, usually displayed as a sequence of hexadecimal digits (the numbers 0 to 9 and the letters a to f). The checksum therefore has the same length regardless of the original file's size.
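The fixed output length can be checked with Python's hashlib. This sketch assumes SHA-256 as the hash function; the inputs are arbitrary examples ranging from an empty string to one million bytes.

```python
import hashlib

# Arbitrary example inputs of very different sizes.
inputs = [b"", b"a", b"hello world", b"x" * 1_000_000]

lengths = []
for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    lengths.append(len(digest))
    # SHA-256 always yields 32 bytes, printed as 64 hexadecimal characters.
    print(f"{len(data):>9} bytes -> {len(digest)} hex chars: {digest[:16]}...")
```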
When a user creates a file and makes copies of it, every exact copy has the same hash code as the original. If as much as one bit of information in the file changes, for example because it was manipulated by an eavesdropper or data thief, the hash function generates a completely different hash code/checksum.
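The effect of a one-bit change can be demonstrated by flipping a single bit in a sample message and comparing the two SHA-256 values. The message text is a made-up example; the number of differing output bits varies from input to input but is typically close to half of the 256.

```python
import hashlib

original = bytearray(b"Quarterly report: revenue 1,000,000")
tampered = bytearray(original)
tampered[-1] ^= 0x01  # flip the lowest bit of the last byte: '0' (0x30) becomes '1' (0x31)

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(tampered).hexdigest()

print(h1)
print(h2)
print("identical:", h1 == h2)

# Count how many of the 256 output bits differ between the two checksums.
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print("differing bits:", diff_bits)
```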
Because of this, the sender and the receiver of a file can each compute its hash code with the same hash function. If the two hash codes differ, the file was damaged or manipulated along the way, and the mismatch is easy to detect.
If the checksum of the original file is known, an authorized user can run a checksum or hashing utility on the file and compare the result with the original checksum. If the two checksums match, the file is identical to the original. If they don't match, the file has been altered, and the user has identified a corrupted or fake version of the original file.
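The sender and receiver comparison can be sketched in Python, assuming SHA-256 and a checksum that the sender publishes as a hexadecimal string. The file contents here are stand-in bytes; in practice the receiver would hash the downloaded file.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of data as lowercase hexadecimal."""
    return hashlib.sha256(data).hexdigest()


def verify(received: bytes, expected: str) -> bool:
    """Recompute the checksum and compare it with the published value.

    The published value is stripped and lowercased so that a copy-pasted
    uppercase checksum with trailing spaces still compares correctly.
    """
    return checksum(received) == expected.strip().lower()


# Sender side: compute and publish the checksum (stand-in file contents).
original = b"example file contents, version 1.2"
published = checksum(original).upper() + "  "

# Receiver side: recompute and compare.
print(verify(original, published))                               # True: file unchanged
print(verify(b"example file contents, version 1.3", published))  # False: file altered
```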
Checksums are used to check files and other data for errors or manipulation that might have occurred during data transmission or storage.
Cryptographic checksum algorithms
Typical checksum algorithms are MD5, SHA-1, SHA-256 and SHA-512.
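MD5, or Message Digest Algorithm 5, is a cryptographic algorithm that produces a 128-bit checksum. Although MD5 is fast, it is far less secure than the Secure Hash Algorithm (SHA) functions: practical collision attacks against it have been known since 2004, so it should not be relied on to detect deliberate tampering.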
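The SHA family of algorithms is published by the National Institute of Standards and Technology (NIST). SHA-1 produces a 160-bit checksum, while SHA-256 and SHA-512, members of the SHA-2 family, produce 256-bit and 512-bit checksums. SHA-1 is no longer considered secure, since a practical collision against it was publicly demonstrated in 2017; SHA-256 and SHA-512 are the usual choices today.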
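The checksum sizes listed above can be confirmed with Python's hashlib. This is only a size check, not a recommendation: MD5 and SHA-1 are included to match the text, and on some builds restricted to approved algorithms (for example, FIPS mode) MD5 may be unavailable.

```python
import hashlib

sizes = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    h = hashlib.new(name, b"example input")
    sizes[name] = h.digest_size * 8  # digest_size is reported in bytes
    print(f"{name:>6}: {sizes[name]} bits = {len(h.hexdigest())} hex characters")
```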
Checksums play an important role in data protection and file security. Organization or user requirements dictate which checksum is used.
Applications of cryptographic checksums
Cryptographic checksums help authenticate files and verify their integrity. They also make it possible to record a file's state at a given point, such as after its last processing step, so that any later change can be detected.
Checksums are used for applications such as the following:
- File integrity preservation. Cryptographic checksums help detect unauthorized data manipulation. Version control systems serve a similar function, but checksums are considered a more direct and secure way to confirm that a file has not changed.
- Image licensing. Images undergo multiple post-processing phases before they are published. Recording a cryptographic checksum of the final image makes unauthorized changes, such as contrast adjustments or retouching, detectable, which helps prevent exploitation of the licensed file.
- Documents with hash values. When publishing documents on the internet or on an intranet, specifying their hash values and using Secure Sockets Layer/Transport Layer Security certificates increases document security. Comparing hash values confirms that a downloaded document matches the published version and has not been altered by malicious code or damaged in transfer.
- Password storage. Storing a password as its hash value, rather than in plaintext, is more secure. When users reenter their password, its hash value is computed again and compared with the stored value. If the system is breached, the plaintext passwords are not exposed.
- Safe email archiving. Electronic fingerprints of all incoming and outgoing emails are created and stored in encrypted form. When an email is read, the cryptographic checksum is formed again and compared with the original. A changed checksum could indicate email tampering.
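For password storage specifically, a fast hash such as plain SHA-256 lets an attacker who steals the stored values test guesses very quickly, so a common approach is a salted, deliberately slow password-hashing function. The sketch below assumes PBKDF2-HMAC-SHA256 from Python's hashlib with a random 16-byte salt per password; the iteration count of 600,000 is an assumed work factor, not a value taken from this article.

```python
import hashlib
import hmac
import os

# Assumed work factor; tune it to your hardware and security policy.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext is never stored."""
    salt = os.urandom(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Rehash the entered password with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))                   # False
```

Because each password gets its own salt, two users with the same password still end up with different stored values.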
How to get cryptographic checksums for your files
A user can compare the checksum of a particular file against the original checksum using utilities in Windows, macOS and Linux. Third-party utilities are not required.
Here is how users can access the checksums for their files in Windows using PowerShell's Get-FileHash command:
- Right-click the Start button and select Windows PowerShell, or search the Start menu for Windows PowerShell and open it.
- Type Get-FileHash followed by a space, and then type the path of the desired file. Alternatively, drag and drop the file onto the PowerShell window to fill in the path automatically.
- Run the command by pressing Enter.
- The output displays the algorithm name (SHA256 by default), the hash value and the file path. To use a different algorithm, add the -Algorithm parameter, for example -Algorithm SHA512.
If the hash value matches the original file's hash value, the files are identical. If not, the file has been modified or corrupted.
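For scripts, or on systems without PowerShell, a similar check can be sketched in Python. This is not Get-FileHash itself; it assumes SHA-256 as the default algorithm (as Get-FileHash uses) and prints uppercase hexadecimal (as Get-FileHash displays it), so the two outputs can be compared directly. The temporary file exists only for the demonstration.

```python
import hashlib
import os
import tempfile


def file_hash(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory.

    Returns uppercase hexadecimal to match how Get-FileHash displays hashes.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest().upper()


# Demo with a throwaway file; substitute the path of the file you want to check.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

demo_hash = file_hash(demo_path)
os.remove(demo_path)
print("SHA256", demo_hash)
```

Reading in chunks keeps memory use constant, which matters for multi-gigabyte downloads such as disk images.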
Breaking the hash function: Collision attack
Ideally, different files would always produce different checksums. However, the number of possible files is unlimited, while a checksum of fixed length can take only a finite number of values (2^256 for SHA-256). It is therefore impossible to assign a different checksum to every file: some different files must share the same checksum.
An attacker could also manipulate a file and then, through trial and error, try to produce a version of it whose cryptographic checksum is identical to the original file's, possibly by inserting invisible control characters. If the attacker succeeds in creating a second file with the same cryptographic checksum as the original file, the hash function is considered broken.
When a bad actor tries to find any two inputs that produce the same hash value, the attempt is known as a collision attack.
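Why output length matters for collision attacks can be sketched by attacking a deliberately weakened hash that keeps only the first 3 bytes (24 bits) of SHA-256. The truncation and the message format are assumptions chosen so the search finishes quickly. Storing every hash seen so far and stopping at the first repeat typically finds a collision after a few thousand attempts, on the order of 2^12, a consequence of the birthday paradox; the same approach against the full 256-bit output would need on the order of 2^128 attempts.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """Keep only the first n_bytes of SHA-256 (24 bits by default).

    Deliberately weakened so that a collision can be found quickly.
    """
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes, int]:
    """Hash distinct messages until two of them share a truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


a, b, tries = find_collision()
print(a, b, "collide after", tries, "attempts")
```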
Theoretically, a computer with unlimited computational power could break cryptographic checksums by trying out enormous numbers of inputs in a brute-force attack. However, for a secure hash function this approach is not feasible, because the number of attempts required is far too large to complete in a reasonable time frame.
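The infeasibility can be made concrete with the standard birthday-bound approximation for an ideal n-bit hash. The rate of 10^12 hashes per second is an assumed figure for illustration, not a measurement of any particular hardware.

```latex
\begin{align*}
N_{\text{second preimage}} &\approx 2^{n} \\
P(\text{collision among } k \text{ inputs}) &\approx 1 - e^{-k(k-1)/(2 \cdot 2^{n})} \\
k_{50\%} &\approx \sqrt{2 \ln 2 \cdot 2^{n}} \approx 1.18 \cdot 2^{n/2} \\
k_{50\%}(\text{SHA-256},\ n = 256) &\approx 1.18 \cdot 2^{128} \approx 4.0 \times 10^{38} \\
t &\approx \frac{4.0 \times 10^{38}}{10^{12}\ \text{s}^{-1}} \approx 4.0 \times 10^{26}\ \text{s} \approx 1.3 \times 10^{19}\ \text{years}
\end{align*}
```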