How to check and verify file integrity

Organizations planning content migrations should verify file integrity to make sure files weren't corrupted during the move. File validation can keep critical data secure.

To secure sensitive data and systems, organizations must check file integrity.

As organizations migrate documents, records and data from one environment to another, content managers, records managers and security professionals must validate the files. Validation confirms that every file migrated and remained unaltered, free of corruption from security breaches, bugs in a migration script or problems during storage, transfer or processing.

Whether an organization aims to comply with security standards or just secure data from attacks, a plan to check file integrity for critical data can preserve the system and maintain data health and security. This plan protects against data corruption and unauthorized modifications of essential data.

What does it mean to verify file integrity?

Attacks or accidental corruption can introduce invalid files into a content repository. Verifying file integrity can identify issues, including those introduced by hackers. In the verification process, teams compare a file's digital signature or hashed content with known values to ensure it wasn't altered or corrupted.

Manual spot checks and simple automated checksum validation might not detect every change in a file, so corruption can lurk beneath the surface. To check and verify file integrity more reliably, IT teams can validate a digital signature or use a cryptographic checksum -- in which they run a hash algorithm against the file.

Validation enables teams to find any changes to the file itself -- such as deletions, edits or movements -- or unauthorized access. These changes can reveal a prior intrusion from start to finish or indicate a larger attack that is underway or that a team is investigating.

Steps to check and verify file integrity

Because content migration provides a before and an after, content teams can compare files from each phase to help validate them. Teams can ask simple questions to start validation, including the following:

  • Based on a comparison of file names, do all the expected files show up?
  • Do the files have the same checksum?
  • Is the metadata identical? Or does it differ only where required, like accommodating differences in repositories?
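The first question above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `relative_files` and `compare_file_lists` are hypothetical, and the sketch assumes both repositories are reachable as ordinary directory trees.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, written relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_file_lists(source_root, target_root):
    """Return (missing, unexpected): files absent from the target, and files only in the target."""
    source = relative_files(source_root)
    target = relative_files(target_root)
    return sorted(source - target), sorted(target - source)
```

An empty pair of lists only shows that the names line up. It says nothing about the contents of the files.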

However, these questions can't fully verify if files remained untouched. To thoroughly check file integrity, teams should take the following steps.

1. Evaluate what cybersecurity standards to follow

Depending on their industry and its best practices, organizations may need to comply with security standards, including the following:

  • Payment Card Industry Data Security Standard.
  • Sarbanes-Oxley Act, which imposes monitoring requirements for financial records.
  • NIST Cybersecurity Framework and related NIST documents, including NIST Special Publications 800-53 and 800-171, which provide specific controls for integrity verification.
  • Center for Internet Security Critical Security Controls.
  • North American Electric Reliability Corporation Critical Infrastructure Protection.
  • ISO/IEC 27001: Information security standards.
  • ISO/IEC 27040: Data storage security standards.
  • Mitre ATT&CK framework.

Compliance with broader privacy requirements, such as HIPAA and GDPR -- which may not specifically mention file integrity -- can help organizations learn the importance of file validation and gain essential context for the project.

[Figure: A list of several cybersecurity standards and regulations. Following specific cybersecurity standards can help content and security teams properly verify file integrity.]

2. Choose files to verify

As digital assets have grown immensely over the years, content teams have lost the ability to monitor each file's integrity. Teams should choose a collection of files that are critical to business operations or that contain sensitive personal, health or customer data and focus verification efforts on that content.
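One way to narrow the scope is to match file paths against a short list of patterns. The sketch below is illustrative only: the patterns in `CRITICAL_PATTERNS` and the function name `select_critical_files` are hypothetical, and a real list would come from the organization's own data classification.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical patterns for illustration; replace with the organization's own classification.
CRITICAL_PATTERNS = ["contracts/*.pdf", "hr/*", "*.xlsx"]


def select_critical_files(root, patterns=CRITICAL_PATTERNS):
    """Return the relative paths under root that match at least one pattern."""
    root = Path(root)
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # fnmatch's "*" also matches "/", so "hr/*" covers nested folders.
        if any(fnmatch(relative, pattern) for pattern in patterns):
            selected.append(relative)
    return sorted(selected)
```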

3. Choose a file verification process and tools

For the verification process, teams can use cryptographic hashes or automated metadata validation to ensure data like the file extension, size, version, creation and modification date, last user ID and any other metadata have not changed. Teams could also generate and compare digital signatures on files.
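A plain hash shows that a file changed, but it cannot show who produced the reference value. Digital signatures address that with public-key cryptography, which Python's standard library does not provide. As a stand-in, the sketch below uses HMAC-SHA256, a keyed hash, to illustrate the same generate-then-verify workflow. It is not a true digital signature, and the names `file_hmac` and `verify_file_hmac` are hypothetical.

```python
import hashlib
import hmac


def file_hmac(path, key, chunk_size=65536):
    """Return the HMAC-SHA256 tag of a file as hex. The key must be bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_file_hmac(path, key, expected_hex):
    """Return True if the file still produces the expected tag under this key."""
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(file_hmac(path, key), expected_hex)
```

Because the tag depends on a secret key, someone who alters a file cannot recompute a matching tag without that key. That is also the limitation: everyone who verifies must hold the same secret.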

The tools that teams choose depend on the verification process. Security firms, such as Kaspersky Lab, Qualys and McAfee, and analytics providers, such as Splunk, offer relevant tools for this process. In turn, the verification process varies depending on the use of different tools and methods, such as API calls to the OS or the content management system interacting with its file server.

4. Compare two copies of each file

Compare the contents of the two copies of each file to check whether they are identical. Teams can do this with file comparison tools available in the OS.

Windows: Use the fc command at the command prompt.

fc /b file1.txt file2.txt

Linux: Use the diff command.

diff -q file1.txt file2.txt
[Figure: Example output of the Linux diff command.]
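When the comparison has to run the same way on Windows and Linux, or across many file pairs, a script can stand in for fc and diff. The following is a minimal sketch that uses Python's standard `filecmp` module. The helper names `files_identical` and `first_difference` are hypothetical.

```python
import filecmp


def files_identical(first, second):
    """Return True if the two files have the same bytes."""
    # shallow=False forces a content comparison instead of trusting size and timestamps.
    return filecmp.cmp(first, second, shallow=False)


def first_difference(first, second, chunk_size=65536):
    """Return the byte offset of the first difference, or None if the files match."""
    offset = 0
    with open(first, "rb") as a, open(second, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                for index, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + index
                # One chunk is a prefix of the other, so the files differ in length.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None
            offset += len(chunk_a)
```

The second helper is optional. It helps when a team wants to know where two copies diverge, not just whether they do.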

5. Generate and compare hash values

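Create a hash value, or checksum, for the original file, and compare it to the hash of the received or stored file. Matching hash values can confirm file integrity. Common hash algorithms include Message Digest Algorithm 5 (MD5), Secure Hash Algorithm 1 (SHA-1) and SHA-256. MD5 and SHA-1 are no longer considered collision-resistant, so SHA-256 is the safer choice when deliberate tampering is a concern.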

Windows: Use the certutil utility.

certutil -hashfile file.txt SHA256

Linux: Use the sha256sum command.

sha256sum file.txt
[Figure: Example output of the Linux sha256sum command.]

Once hash values are available for two versions of a file, compare them to see if the files are identical. Even a small change can radically alter the hash value.
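The same comparison can be scripted. The sketch below uses Python's standard `hashlib` module and reads each file in chunks so that large files do not need to fit in memory. The function names `sha256_of_file` and `hashes_match` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(original, copy):
    """Return True if both files produce the same SHA-256 digest."""
    return sha256_of_file(original) == sha256_of_file(copy)
```

For a given file, the hex string returned here should equal the digest that sha256sum prints, which makes it possible to mix scripted and command-line checks.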

6. Verify file metadata

Check file metadata, such as size, creation date and modification date, for consistency. Unexpected changes in metadata can indicate tampering, and the metadata itself can have business value.

Windows: Use PowerShell.

Get-Item "C:\path\to\file.txt" | Select-Object *

Linux: Use the stat command.

stat /path/to/file.txt
[Figure: Example output of the Linux stat command.]
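To compare metadata across many files, the values that stat and Get-Item display can be read programmatically. The sketch below compares extension, size and modification time for two copies of a file. Creation time is left out because its availability and meaning differ between platforms. The names `metadata_snapshot` and `metadata_differences` are hypothetical.

```python
import os
from pathlib import Path


def metadata_snapshot(path):
    """Collect a few portable metadata fields for one file."""
    info = os.stat(path)
    return {
        "extension": Path(path).suffix.lower(),
        "size_bytes": info.st_size,
        "modified_ns": info.st_mtime_ns,
    }


def metadata_differences(source, target, fields=("extension", "size_bytes", "modified_ns")):
    """Return {field: (source_value, target_value)} for every field that differs."""
    before = metadata_snapshot(source)
    after = metadata_snapshot(target)
    return {name: (before[name], after[name]) for name in fields if before[name] != after[name]}
```

Some migration tools reset modification times by design. In that case, passing `fields=("extension", "size_bytes")` limits the check to the values that are expected to survive the move.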

7. Use file integrity monitoring tools

Going forward, consider file integrity monitoring tools, such as Tripwire File Integrity Monitoring or OSSEC, to continuously monitor critical files, detect unauthorized modifications and prevent security breaches.
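Conceptually, monitoring of this kind records a baseline of known-good hashes and rechecks files against it later. The sketch below illustrates that idea in a simplified form. It does not represent how Tripwire or OSSEC are implemented, and the names `build_baseline` and `check_against_baseline` are hypothetical.

```python
import hashlib
from pathlib import Path


def build_baseline(root):
    """Map each file's relative path to its SHA-256 digest."""
    root = Path(root)
    baseline = {}
    for path in root.rglob("*"):
        if path.is_file():
            # read_bytes loads the whole file; acceptable for a sketch, not for very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            baseline[path.relative_to(root).as_posix()] = digest
    return baseline


def check_against_baseline(root, baseline):
    """Report files added, removed or modified since the baseline was taken."""
    current = build_baseline(root)
    shared = current.keys() & baseline.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(name for name in shared if current[name] != baseline[name]),
    }
```

The baseline is a plain dictionary, so it can be stored as JSON. It should be kept somewhere the monitored system cannot overwrite, because a baseline that an intruder can edit proves nothing.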

8. Start small and repeat

When teams first learn how to check and verify file integrity, they should experiment with small subsets of files to confirm that the checks detect changes where expected and don't strain network or system performance. Teams should balance compliance and verification with performance and usability.
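A trial run on a random sample gives a rough sense of how long a full check might take. The sketch below hashes a sample of files and reports the elapsed time. The function name `timed_sample_check` and the default sample size are hypothetical, and the timing depends heavily on storage and network conditions.

```python
import hashlib
import random
import time
from pathlib import Path


def timed_sample_check(root, sample_size=100, seed=0):
    """Hash a random sample of files under root and report how long it took."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    # A fixed seed makes the sample repeatable between trial runs.
    sample = random.Random(seed).sample(files, min(sample_size, len(files)))
    total_bytes = 0
    started = time.perf_counter()
    for path in sample:
        data = path.read_bytes()
        hashlib.sha256(data).hexdigest()
        total_bytes += len(data)
    elapsed = time.perf_counter() - started
    return {"files": len(sample), "bytes": total_bytes, "seconds": elapsed}
```

Dividing the byte count by the elapsed seconds gives a throughput figure that teams can scale up to estimate the cost of checking a larger collection.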

Learning how to check file integrity can keep systems compliant with security and data standards and reduce the risk of catastrophic attacks from hackers.

Editor's note: This article was originally published in 2022 and updated to reflect changes in version control best practices.

Jordan Jones is a writer versed in enterprise content management, component content management, web content management and video-on-demand technologies.
