SHA-1 collision: How the attack completely breaks the hash function
Google and CWI researchers have successfully developed a SHA-1 attack where two pieces of data create the same hash value -- or collide. Expert Michael Cobb explains how this attack works.
Cryptographic hash functions are widely used in many aspects of security, such as digital signatures and data integrity checks. The short digital fingerprint generated from an electronic file, message or block of data is called a message digest or hash value. A key property of a secure cryptographic hash function is strong collision resistance -- that is, it should be computationally infeasible to find two different inputs that create the same hash value.
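As a minimal illustration of a message digest, the sketch below uses Python's standard hashlib module to hash two nearly identical messages. The sample strings and the helper name `digest` are invented for this example.

```python
import hashlib

def digest(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of data under the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

# Hypothetical sample inputs that differ by a single character.
original = b"Rent: 1,000 per month"
altered = b"Rent: 9,000 per month"

print(digest(original))            # 40 hex characters (160 bits)
print(digest(altered))             # a completely different value
print(digest(original, "sha256"))  # 64 hex characters (256 bits)
```

A one-character change produces an unrelated digest, which is why a matching hash value is normally treated as evidence that the data is unchanged.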
SHA-1 was designed more than 20 years ago, and it's been known for some time that the hash algorithm has weaknesses that make it potentially vulnerable to collision attacks. Now, those weaknesses are no longer theoretical, as researchers from Centrum Wiskunde & Informatica (CWI) in Amsterdam and Google have successfully developed a practical technique for generating a SHA-1 collision.
How a SHA-1 collision works
A SHA-1 collision occurs when two distinct pieces of data hash to the same message digest. If an attacker can craft a hash collision, they could use it to create two different files that share the same SHA-1 hash value. Systems that rely on hashes to validate the authenticity of data, such as code repositories and backups, could be deceived into accepting a malicious file in place of the genuine file. An example given by the researchers is of a malicious landlord crafting two colliding PDF files containing rental agreements that are identical except that one specifies a vastly higher rent. The landlord could obtain a valid signature for the high-rent contract by having the victim sign the contract stating the lower rent, because a signature over the shared hash value is valid for both files.
One method of breaking a cryptographic algorithm is through cryptanalysis: finding a weakness in the algorithm that can be exploited with a complexity less than brute force. The SHA-1 collision attack requires significant computational resources, but it is still 100,000 times faster than a brute-force effort. Named the "SHAttered" attack, it is based on an identical-prefix collision attack: two files have the same predetermined beginning, followed by different inputs and, optionally, any amount of identical data.
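The speedup figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes two numbers that are not stated in this article: the generic birthday bound of about 2^80 hash computations for a 160-bit digest, and the cost of roughly 2^63.1 SHA-1 computations reported in the researchers' paper.

```python
DIGEST_BITS = 160                    # SHA-1 output size in bits
BRUTE_FORCE_LOG2 = DIGEST_BITS / 2   # birthday bound: about 2^80 hash computations
SHATTERED_LOG2 = 63.1                # assumption: cost reported in the researchers' paper

# Ratio of brute-force work to the work needed by the attack.
speedup = 2 ** (BRUTE_FORCE_LOG2 - SHATTERED_LOG2)
print(f"Speedup over brute force: about {speedup:,.0f}x")
```

The result is on the order of 100,000, which is consistent with the figure quoted above.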
This attack technique doesn't allow an attacker to generate a collision with an existing file. For example, it's not possible to use this method to generate a malicious executable file that matches the signature of an existing legitimate executable. However, it would be possible for an attacker to generate two executable files that have the same SHA-1 hash but perform different actions when they run. Once Google releases the code behind the attack, anyone will be able to create pairs of PDF files that hash to the same SHA-1 sum, with two distinct images and certain preconditions.
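The identical-prefix structure can be illustrated with a toy hash. The sketch below is not SHA-1 and not the researchers' method: it assumes a made-up iterated hash with a deliberately tiny 16-bit internal state (built from truncated SHA-256), so that a plain birthday search finds two colliding blocks almost instantly. All names in it (`compress`, `toy_hash`, `find_colliding_blocks`) are hypothetical.

```python
import hashlib
from itertools import count

BLOCK = 8   # toy block size in bytes
STATE = 2   # toy chaining state: 2 bytes (16 bits), deliberately tiny

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:STATE]

def absorb(state: bytes, data: bytes) -> bytes:
    """Feed block-aligned data through the compression function."""
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message: bytes) -> str:
    """Toy iterated hash: pad with the message length, then chain blocks."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += len(message).to_bytes(8, "big")
    return absorb(b"\x00" * STATE, padded).hex()

def find_colliding_blocks(prefix: bytes) -> tuple[bytes, bytes]:
    """Birthday search for two different blocks that collide after prefix."""
    assert len(prefix) % BLOCK == 0, "prefix must be block-aligned"
    state = absorb(b"\x00" * STATE, prefix)
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block

prefix = b"CONTRACT"            # identical beginning (one 8-byte block)
suffix = b" signed by tenant"   # identical data appended to both files
a, b = find_colliding_blocks(prefix)
file_a = prefix + a + suffix
file_b = prefix + b + suffix
print(file_a != file_b, toy_hash(file_a) == toy_hash(file_b))  # prints: True True
```

Because the internal state is equal after the differing blocks, any identical data appended to both files keeps the digests equal. The search only produces a pair of new files; it does not find a match for a file that already exists, which mirrors the limitation described above.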
This research certainly proves SHA-1 is broken, though finding and generating collisions still requires a lot of computational effort. According to Google, computing the SHA-1 collision was one of the largest computations ever completed. Even so, security experts who have advised moving away from SHA-1 for more than 10 years said it is now even more urgent to migrate to safer alternatives, such as SHA-2 and SHA-3. The National Institute of Standards and Technology has said that SHA-2 is still "secure and suitable for general use," as it doesn't suffer from SHA-1's mathematical weaknesses. SHA-2 and the SHA-3 family of cryptographic hash algorithms are now the only ones approved by NIST for digital signature generation. Although the SHA-2 family includes SHA-224, only the stronger SHA-256, SHA-384 and SHA-512 algorithms are allowed by the CA/Browser Forum's baseline requirements for the issuance and management of publicly trusted certificates.
Preventing a hash collision
Some vendors and service providers have taken steps to address this attack. Google has already added protections to Gmail and Google Drive to detect files in which the collision technique has been used. Microsoft Research posted an open source library and command-line tool on GitHub that detects whether a given file was produced by a cryptanalytic collision attack against SHA-1, while shattered.io includes a drag-and-drop tool for detecting files that have been crafted to produce a SHA-1 collision.
SHA-1 was officially deprecated by NIST in 2011, but it remains widely used for document and Transport Layer Security certificate signatures and in software such as the Git versioning system for integrity and backup purposes. Where SHA-1 is used for integrity checks, it's important to always provide multiple types of hash values -- MD5, SHA-1 and SHA-256, for example -- as the other hash values would not match if a file had been maliciously replaced.
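A minimal sketch of such a multi-hash integrity check follows, using Python's standard hashlib module. The function names and the temporary demo file are invented for this example, and the check assumes the published hash values were obtained through a trusted channel.

```python
import hashlib
import os
import tempfile

def file_digests(path: str, algorithms=("md5", "sha1", "sha256")) -> dict[str, str]:
    """Compute several digests of one file in a single pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def verify(path: str, published: dict[str, str]) -> bool:
    """Accept the file only if every published digest matches."""
    actual = file_digests(path, tuple(published))
    return all(actual[name] == value.lower() for name, value in published.items())

# Demo with a throwaway file standing in for a downloaded release.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"genuine release contents")
    demo_path = tmp.name

published = file_digests(demo_path)        # what the publisher would list
ok_before = verify(demo_path, published)   # file is untouched

with open(demo_path, "ab") as f:           # simulate a malicious replacement
    f.write(b" plus malicious payload")
ok_after = verify(demo_path, published)

os.remove(demo_path)
print(ok_before, ok_after)  # prints: True False
```

Requiring every listed digest to match means a file crafted to collide under SHA-1 alone would still fail the SHA-256 comparison.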
Applications still using SHA-1 should be upgraded. The areas that will require the most work are legacy systems that make SSL connections, and software and hardware such as game consoles, phones and embedded devices that rely on hard-coded certificates. These certificates will all need to be replaced, and the software on any device that cannot support SHA-2 will need to be updated. Webmasters should have already requested new SHA-2 certificates to replace any that use SHA-1 and expire after Jan. 1, 2017; otherwise, they will not be trusted by major browsers.
Delaying system updates will leave those systems unable to communicate with the rest of the connected world, which risks not only major security ramifications, but also significant business disruption. It is also a best practice to implement a reliable key rotation procedure, as it allows for faster operational reaction time when new algorithm weaknesses are discovered. The cryptographic landscape is constantly changing, so to stay abreast of the latest developments, follow the news and recommendations from standards bodies such as NIST.