Definition

What is a checksum?

A checksum is a value calculated from the contents of a message or file. IT professionals use it to detect errors introduced while data is transmitted or stored. Before transmission, a checksum value is calculated for a piece of data or a file by running a checksum algorithm, such as a simple sum, a cyclic redundancy check or a cryptographic hash function, over its contents. The term checksum is sometimes used interchangeably with hash sum or hash value, although a hash value produced by a cryptographic hash function is only one kind of checksum.

Why apply checksums

The primary aim of calculating checksums is to detect changes in the data. These changes might be the result of error or manipulation. Regardless of the cause, a checksum algorithm produces a fixed-size string or number that enables users to verify the integrity of the sent data and gain confidence that it has not been altered during transmission.

Apart from malicious tampering, data integrity might also be affected due to accidental errors. These errors might be introduced during data transmission or data storage. Checksums can help users to detect these errors. They do this by calculating the checksum of the received data and comparing it to the checksum provided from the original data set. If there is a difference, they can take appropriate action. For example, they can re-download the file or ask the sender to resend the message.
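
The receive-and-compare step described above might look like the following minimal sketch. It uses CRC-32 from Python's standard zlib module as one possible checksum algorithm; the sample payloads and the function names crc32_of and is_intact are illustrative assumptions, not part of any particular protocol.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def is_intact(received: bytes, expected_checksum: int) -> bool:
    """Recompute the checksum of the received bytes and compare it
    with the value supplied by the sender."""
    return crc32_of(received) == expected_checksum


# Sender side: compute the checksum and publish it with the data.
original = b"quarterly-report-v3"          # hypothetical payload
published_checksum = crc32_of(original)

# Receiver side: one changed byte produces a different checksum.
corrupted = b"quarterly-report-v4"

if not is_intact(corrupted, published_checksum):
    print("Checksum mismatch: re-download the file or request a resend.")
```

A mismatch only says that something changed; it does not say what changed or why.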

How checksums work

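Checksums work by giving the party on the receiving end a compact value derived from the data, which it can use to check that the data arrived complete and unchanged. The checksum value is typically a fixed-length number or a string of letters and numbers. It is not guaranteed to be unique, because different data can occasionally produce the same value. It is calculated from the data object and either appended to the sent packet or published alongside the file.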

On the sender's end, a checksum generator uses a checksum algorithm to calculate the checksum value from the data object that will be sent to the recipient. In a typical scheme, the generator divides the data object into equal subunits of n bits (usually 16), adds the subunits using one's complement arithmetic to arrive at an n-bit sum, and then complements that sum. The complemented sum (the checksum) is added to the end of the original data object and transmitted with it. It acts as a sort of fingerprint for the data: if any bits change in transit, the checksum will most likely no longer match.

The user at the receiving end also calculates the checksum value, this time using a checksum checker. The checker adds the received subunits together with the received checksum and complements the result. If the outcome is zero, the data is accepted; if it is non-zero, the checker can alert the parties that the data was corrupted in transit or possibly tampered with by a third party, such as in the case of malware. From there, the receiver can investigate what went wrong or try downloading the file again.
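
The generator and checker described above can be sketched as follows, assuming 16-bit subunits read in big-endian order and a zero byte used to pad odd-length data (the convention of the Internet checksum in RFC 1071). The function names are illustrative.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add 16-bit big-endian words, folding any carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def make_checksum(data: bytes) -> int:
    """Sender side: the checksum is the complement of the 16-bit sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: the sum of data plus checksum should be all ones,
    so its complement is zero when no error is detected."""
    total = ones_complement_sum16(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return (~total & 0xFFFF) == 0


payload = b"checksum demo"
checksum = make_checksum(payload)
print(verify(payload, checksum))           # unchanged data passes
print(verify(b"checksum dem0", checksum))  # a changed byte is detected
```

Folding the carry back in after every addition keeps the running total within 16 bits, which is what makes the arithmetic one's complement rather than ordinary modular addition.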

How to calculate checksums

The simplest way to calculate a checksum is to do the following:

  • Add all the byte values in a message.
  • Use the least-significant byte of the sum as the checksum byte.
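
As a minimal sketch, the two steps above translate almost directly into code. The four-byte message is an arbitrary example.

```python
def simple_checksum(message: bytes) -> int:
    """Add every byte value and keep only the least-significant byte."""
    return sum(message) & 0xFF


message = bytes([0x25, 0x62, 0x3F, 0x52])
# 0x25 + 0x62 + 0x3F + 0x52 = 0x118, so the checksum byte is 0x18.
print(f"{simple_checksum(message):#04x}")
```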

Two main operations used in checksum calculations are sum and shift.

Consider a checksum made up of two hexadecimal digits, P1 and P2.

If i represents the increment, D(i) denotes digit i of the message and H denotes the shift operation applied to a digit, here's how the checksum will be calculated for the two hex digits P1 and P2 over a 32-digit message:

  1. Start with i = 0
  2. Start with P1 = 0 and P2 = 0.
  3. Let P1 = P1 + D(i + 1) #Sum.
  4. Let P2 = P2 + D(i + 2) #Sum.
  5. Let P1 = H(P1).
  6. Let P2 = H(P2).
  7. Let i = i + 2.
  8. Is i < 32? Then go back to step 3 and repeat; otherwise, go to step 9.
  9. P1 is the first checksum digit, P2 is the second checksum digit.
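
The procedure above leaves some details open, so the following is only a sketch under stated assumptions: the message is exactly 32 hexadecimal digits, D(k) is the k-th digit counting from 1, each sum is kept to one hex digit (modulo 16), and the shift H is a one-bit circular left shift of a 4-bit value. None of these choices is confirmed by the text; a real scheme might define them differently.

```python
def rotate_left_4(value: int) -> int:
    """Assumed definition of H: rotate a 4-bit value left by one bit."""
    value &= 0xF
    return ((value << 1) | (value >> 3)) & 0xF


def two_digit_checksum(hex_message: str) -> str:
    """Compute the two checksum digits P1 and P2 with sum-and-shift steps."""
    if len(hex_message) != 32:
        raise ValueError("this sketch assumes a 32-digit hexadecimal message")
    digits = [int(ch, 16) for ch in hex_message]

    p1 = p2 = 0                          # steps 1 and 2
    for i in range(0, 32, 2):            # steps 7 and 8: i = 0, 2, ..., 30
        p1 = (p1 + digits[i]) & 0xF      # step 3: D(i + 1), 1-based in the text
        p2 = (p2 + digits[i + 1]) & 0xF  # step 4: D(i + 2)
        p1 = rotate_left_4(p1)           # step 5
        p2 = rotate_left_4(p2)           # step 6
    return f"{p1:X}{p2:X}"               # step 9


print(two_digit_checksum("0123456789ABCDEF0123456789ABCDEF"))
```

Because the shift is applied after every addition, the position of a digit affects the result, which a plain sum would not capture.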

Common network protocols that rely on checksums include Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), both of which carry a 16-bit checksum field in their headers. TCP is typically more reliable because it also tracks transmitted packets of data and retransmits lost ones, but UDP, which omits those guarantees, might be beneficial to avoid slowing down transmission.

Advantages of checksum

A basic checksum algorithm is adequate to verify data integrity in numerous applications, and it can detect many accidental errors that occur as data is transmitted over public or private networks, moved to or from cloud services, or written to hard drives. When a cryptographic hash function is used, a checksum can also help reveal malicious tampering. Checksums provide an early warning of data loss caused by unexpected or unintended events like viruses, malware or deliberate corruption attempts. They also give users a way to detect incomplete data transfers and accidental file edits or truncations.

Checksums are also useful during data storage. For example, data stored in shared drives and accessed by multiple people via the internet might be inadvertently modified or maliciously tampered with. It might also be duplicated, with different files containing the same data stored in different locations. Applying a checksum can help detect such events and thus increase accountability in the system. Also, if the data is stored over long periods, periodically recalculating checksums helps confirm that it has not silently degraded or been changed.

Finally, checksums are helpful to create data inventories for archival purposes. The archived data might no longer be in active use or it might be ingested from obsolete storage devices (CDs, cassettes, etc.). By calculating a checksum string from a data file before and after it is moved, the integrity of that file can be verified, helping confirm that the data was not corrupted during the transfer.

Disadvantages of checksum

While checksums are useful for detecting transmission errors or data manipulation, they cannot detect all errors. With simple additive checksums in particular, errors like byte rearrangements (bytes in the wrong order), added or missing zero-value bytes, or multiple errors that cancel each other out can go unnoticed, resulting in loss of data integrity and incorrect communications even if checksums are used.
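
A short sketch illustrates these blind spots, assuming the simplest additive scheme (sum of all bytes, least-significant byte kept). The byte values are arbitrary examples.

```python
def simple_checksum(message: bytes) -> int:
    """Sum of all byte values, truncated to the least-significant byte."""
    return sum(message) & 0xFF


original = bytes([0x10, 0x20, 0x30])
reordered = bytes([0x30, 0x10, 0x20])         # same bytes, wrong order
zero_added = bytes([0x10, 0x20, 0x30, 0x00])  # extra zero-value byte
compensating = bytes([0x11, 0x1F, 0x30])      # +1 and -1 cancel out

for variant in (reordered, zero_added, compensating):
    # Each altered message still produces the same checksum as the original.
    print(variant.hex(), simple_checksum(variant) == simple_checksum(original))
```

Position-sensitive algorithms such as cyclic redundancy checks or cryptographic hash functions are much less likely to miss these cases.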

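Also, checksums are not the most reliable method to secure data transmissions. A checksum is computed without any secret, so a malicious party who can alter the data can often recompute and replace the checksum as well. These actors might also use sophisticated methods, such as deliberately crafted collisions, to compromise data, rendering checksums on their own inadequate for ensuring data integrity against an attacker.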
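
The forgery problem can be illustrated with a small sketch. It first shows that an unkeyed SHA-256 checksum can be recomputed by anyone, then contrasts it with a keyed tag (HMAC from Python's standard hmac module), which is a different technique brought in here only for comparison. The key, messages and function names are hypothetical.

```python
import hashlib
import hmac


def plain_digest(data: bytes) -> str:
    """Unkeyed checksum: anyone holding the data can compute it."""
    return hashlib.sha256(data).hexdigest()


def keyed_tag(key: bytes, data: bytes) -> str:
    """Keyed tag (HMAC-SHA-256): requires knowledge of the secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


# An attacker who alters the data can also replace the published checksum,
# so the receiver's comparison still succeeds.
tampered = b"pay 9000 to mallory"
forged_checksum = plain_digest(tampered)
receiver_accepts_forgery = plain_digest(tampered) == forged_checksum

# Without the shared key, the attacker's tag does not match the receiver's.
shared_key = b"hypothetical-shared-secret"
attacker_tag = keyed_tag(b"attacker-guess", tampered)
receiver_accepts_tag = hmac.compare_digest(
    attacker_tag, keyed_tag(shared_key, tampered)
)

print(receiver_accepts_forgery, receiver_accepts_tag)
```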

Another drawback: Checksums can only detect errors. There is no built-in mechanism to fix the errors and recover corrupted data. Finally, checksum calculations add complexity and overhead to the data transfer process.

Checksum applications and use cases

Data integrity verification is important for many applications. Because one of the main purposes of a checksum is to verify data integrity, checksums are typically used for the following:

  • Network communications.
  • Cybersecurity (to ensure data confidentiality, integrity and availability -- also known as the CIA Triad).
  • Data storage and archival.
  • Verification of log files.
  • Software distribution and updates.
  • Know Your Customer (KYC) verifications (e.g., in banking).
  • E-commerce product availability checks.
  • Bill payment verification.
  • Verification and communication of test results in healthcare.

What can cause an inconsistent checksum number?

Checksum values that don't match signal that something went wrong during transmission or storage. Several factors can cause this to happen, such as the following:

  • An interruption in the internet or network connection.
  • Storage or space issues, including problems with the hard drive.
  • A corrupted disk or corrupted file.
  • A third party interfering with the transfer of data.

All of the above events can result in the alteration of data during transmission, resulting in a different checksum than the original. However, not all the incidents indicate data tampering or result in data losses.

Common types of checksum algorithms

There are multiple cryptographic hash functions that can be or have been used to generate checksum values. A few common ones include the following:

  • Secure Hash Algorithm (SHA) 0. This hash function, published in 1993, was the first in the SHA family. It produced a 160-bit output that acted as a "fingerprint" of the input. However, it was withdrawn shortly after publication and superseded by the revised SHA-1. Its "collision security strength" is significantly lower than that of an ideal hash function, making it susceptible to many kinds of cyberattacks.
  • SHA-1. SHA-1 was first published in 1995. Like SHA-0, it generates an output hash value of 160 bits. By 2010, this hash function was no longer considered secure. NIST also recommended that the algorithm not be used for generating digital signatures or to protect sensitive information after Dec. 31, 2010.
  • SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512). This family of hash functions processes the input in fixed-size blocks to create a checksum value of 224, 256, 384 or 512 bits, as indicated by each name. An improvement compared to SHA-0 and SHA-1, each algorithm from this family has a different hash length, and a longer hash generally means a higher level of security. This explains why SHA-2 has been implemented in many security protocols, including Transport Layer Security, Secure Sockets Layer and IPsec. Even so, variants such as SHA-256 and SHA-512 are vulnerable to length extension attacks, in which an attacker who knows a hash digest and the length of the original input can compute a valid digest for that input with extra data appended, without knowing the input itself. Also, some older applications and operating systems do not support SHA-2, creating compatibility issues that cause disruptions and affect user experiences.
  • Message Digest 5 (MD5). The MD5 hash function creates a 128-bit checksum value, but different files can be deliberately crafted to share the same value. So, it's open to abuse if a hacker swaps out a file for another with the same checksum value (known as a collision attack). This is why MD5 is only suitable to check for accidental corruption in a file. It is not advisable to rely on this hash function to verify the file's authenticity.

[Figure: The computing process of an MD5 hash, showing input, process and output.]
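
To compare the hash lengths of the algorithms listed above, a short sketch with Python's standard hashlib module can be used. The sample data and the helper name digest_bits are illustrative, and some locked-down Python builds may disable MD5 or SHA-1.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def digest_bits(name: str) -> int:
    """Return the output length of the named hash function in bits."""
    return hashlib.new(name).digest_size * 8


data = b"example file contents"
for name in ALGORITHMS:
    digest = hashlib.new(name, data).hexdigest()
    # Print the algorithm, its output size and the start of the digest.
    print(f"{name:>6}: {digest_bits(name):3d} bits  {digest[:16]}...")
```

SHA-0 is not included because hashlib does not provide it.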

How to check an MD5 checksum

To check that a file hasn't been corrupted or altered, its MD5 hash should be compared with the value published by the source. Also, when installing drivers or patches, it's important to ensure that the downloaded files are complete.

The MD5 hash is a cryptographic checksum that can be checked with built-in tools on different operating systems, such as Microsoft Windows, Linux and Apple macOS.

[Figure: How to check the MD5 hash on Windows.]

Verifying an MD5 checksum on Windows

  1. Open Windows PowerShell or the command prompt. To do so, click the Windows button to open the Start menu.
  2. Type cmd in the search box, and press Enter. Alternatively, press the Windows key and R, type cmd, and press Enter.
  3. Go to the folder that contains the file whose MD5 checksum needs to be verified by typing cd followed by the path to the folder that the file resides in. Alternatively, the required folder can be dragged and dropped from Windows Explorer to insert the path.
  4. Type certutil -hashfile <file> MD5. Replace <file> with the file name.
  5. Press Enter.

The result of the checksum can be compared and verified with the expected results.

Verifying the MD5 checksum on a Mac

  1. Open Terminal.
  2. Navigate to the folder that contains the file whose MD5 checksum needs verification. Alternatively, Terminal can be opened directly at a folder from Finder.
  3. Type md5 <file>, and replace <file> with the file name. Alternatively, the file can also be dragged and dropped into the Terminal window after typing md5.
  4. Press Enter.

The result of the checksum can be compared and verified with the expected results.
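
The same comparison can be scripted on any of these operating systems. The following is a minimal sketch using Python's standard hashlib module; the function names, the 64 KB chunk size and the idea of passing in the expected value as a string are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hash of a file as a lowercase hex string,
    reading in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the computed hash with the published one,
    ignoring letter case and surrounding whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A call such as matches_expected("installer.bin", published_value), where both names are placeholders, returns True only when the computed and published hashes agree.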


This was last updated in February 2025
