Use software forensics to uncover the identity of attackers
By analyzing the proverbial fingerprints of malicious software -- its program code -- infosec pros can gain meaningful insights into an attacker's intent and identity.
The following is an excerpt from the Official (ISC)² Guide to the CISSP CBK, fourth edition, edited by Adam Gordon, CISSP-ISSAP, ISSMP, SSCP. This section from Domain 8 explores how software forensics techniques can be useful in piecing together the identity of those responsible for malicious activity.
Software, particularly malicious software, has traditionally been seen as a tool for the attacker. The only value that has been seen in the study of such software is in regard to protection against malicious code. However, experience in the virus research field, and more recent studies in detecting plagiarism, indicate that evidence of intention, and of cultural and individual identity, can be gained from the examination of software itself. Although most would see software forensics strictly as a tool for assurance, in software development and acquisition it has a number of uses in protective procedures. Outside of virus research, forensic programming is a little-known field. However, the larger computer science world is starting to take note of software forensics. It involves the analysis of program code, generally object or machine language code, to determine, or provide evidence for, the intent or authorship of a program.
Uses of forensics in software analysis
Software forensics has a number of possible uses. In analyzing software suspected of being malicious, it can be used to determine whether a problem is a result of carelessness or was deliberately introduced as a payload. Information can be obtained about authorship and the culture behind a given programmer and the sequence in which related programs were written. This can be used to provide evidence about a suspected author of a program or to determine intellectual property issues. The techniques behind software forensics can sometimes also be used to recover source code that has been lost.
Software forensics generally deals with two different types of code. The first is source code, which is relatively legible to people. Analysis of source code is often referred to as code analysis and is closely related to literary analysis. The second is object or machine code; its analysis is generally referred to as forensic programming.
Literary analysis has contributed much to code analysis and is an older and more mature field. It is referred to, variously, as authorship analysis, stylistics, stylometry, forensic linguistics, or forensic stylistics. Stylistic or stylometric analysis of messages and text may provide information and evidence that can be used for identification or confirmation of identity.
Physical fingerprint evidence frequently does not help identify a perpetrator in terms of finding the person that a fingerprint is obtained from. However, a fingerprint can confirm an identity or place a person at the scene of a crime once a suspect is determined. In the same way, the evidence gathered from analyzing the text of a message, or a body of messages, may help to confirm that a given individual or suspect is the person who created the fraudulent postings. Both the content and the syntactical structure of text can provide evidence that relates to an individual.
Some of the evidence discovered through software forensics may not relate to individuals. Some information, particularly that relating to the content or phrasing of the text, may relate to a group of people who work together, influence each other, or are influenced by a single outside source. This data can still be useful, in that it provides clues about a group that the author may be associated with and may be helpful in building a profile of the writer. Groups may also use common tools. Various types of tools, such as word processors or databases, may be commonly used by groups and provide similar evidence. In software analysis, indications of languages, specific compilers, and other development tools can be found. Compilers leave definite traces in programs and can be specifically identified.
Languages leave indications in the types of functions and structures supported. Other types of software development tools may contribute to the structural architecture of the program or the regularity and reuse of modules.
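As a rough illustration of how toolchain traces might be looked for, the sketch below searches a binary for byte strings that some compilers are known to embed in their output. The marker table, the function names, and the sample bytes are assumptions made for this example; they are not taken from the guide, and a real examination would rely on far richer signatures.

```python
# Hypothetical marker table: byte strings that some toolchains embed in the
# files they produce. It is illustrative only and far from exhaustive.
TOOLCHAIN_MARKERS = {
    "GCC": b"GCC: (",
    "Clang": b"clang version ",
}


def find_toolchain_markers(binary: bytes) -> list[str]:
    """Return the names of toolchains whose marker appears in the binary."""
    return [name for name, marker in TOOLCHAIN_MARKERS.items() if marker in binary]


def scan_file(path: str) -> list[str]:
    """Read a file from disk and report any toolchain markers found in it."""
    with open(path, "rb") as handle:
        return find_toolchain_markers(handle.read())


# Made-up sample bytes that imitate a compiler identification string.
sample = b"\x7fELF\x02\x01\x01" + b"\x00" * 8 + b"GCC: (GNU) 12.2.0\x00"
print(find_toolchain_markers(sample))  # prints ['GCC']
```

A plain substring search is used here only because it is easy to follow. Markers of this kind can be stripped or forged, so a match is a clue rather than proof.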
Software forensics: Finding clues in text
In programming, it is possible to trace indications of cultures and styles. A very broad example is the difference between the design of programs in the Microsoft Windows environment and the UNIX environment. Windows programs tend to be large and monolithic, with the most complete set of functions possible built into the main program, large central program files, and calls to related application function libraries. UNIX programs tend to be individually small, with calls to a number of single-function utilities.
Evidence of cultural influences exists right down to the machine-code level. Those who work with assembler and machine code know that a given function can be coded in a variety of ways, and that there may be a number of algorithms to accomplish the same end. For example, it is possible to note, for a given function, whether the programming was intended to accomplish the task in a minimum amount of memory space (tight code), a minimum number of machine cycles (high-performance code), or a minimal effort on the part of the programmer (sloppy code).
In software forensics, the syntax of text tends to be characteristic. Does the author always use simple sentences? Always use compound sentences? Have a specific preference when a mix of forms is used? Syntactical patterns have been used in programs that detect plagiarism in written papers. The same kind of analysis can be applied to the source code of programs, finding similarities in the overall structure of code even when functional units are not considered. A number of such plagiarism detection programs are available, and the methods that they use can assist with this type of forensic study. Errors in the text or program can be extremely helpful in our analysis and should be identified for further study.
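One way to picture a structural comparison of source code is sketched below. It reduces Python source to the sequence of its syntax-tree node types, so that identifier names and literal values are ignored, and then scores how similar two sequences are. The function names and the three sample snippets are invented for this illustration, and this is not a description of how any particular plagiarism detection program works.

```python
import ast
from difflib import SequenceMatcher


def structure(source: str) -> list[str]:
    """Return the syntax-tree node type names, ignoring names and literals."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def structural_similarity(first: str, second: str) -> float:
    """Score two sources from 0.0 (unrelated shape) to 1.0 (identical shape)."""
    return SequenceMatcher(None, structure(first), structure(second)).ratio()


# Hypothetical samples: RENAMED is ORIGINAL with every identifier changed.
ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

RENAMED = """
def sum_up(items):
    acc = 0
    for x in items:
        acc += x
    return acc
"""

DIFFERENT = """
def total(values):
    return sum(values)
"""

print(structural_similarity(ORIGINAL, RENAMED))    # prints 1.0
print(structural_similarity(ORIGINAL, DIFFERENT))  # a value below 1.0
```

Renaming variables leaves the score at 1.0 because only the shape of the code is compared, which is the property that makes structural evidence hard to disguise with cosmetic edits.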
It may be important to distinguish between issues of style and stylometry when we are dealing with authorship analysis. Literary critics, and anyone with a writing background, may be prejudiced against technologies that ignore content and concentrate on other factors. Although techniques such as cusum analysis have been proven to work in practice, they still engender unreasoning opposition from many who fail to understand that material can contain features quite apart from the content and meaning.
It may seem strange to use meaningless features as evidence in software forensics. However, Richard Forsyth reported on studies and experiments that found that short substrings of letter sequences can be effective in identifying authors. Even a relative count of the use of single letters can be characteristic of authors.
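The kind of content-free features described above can be sketched in a few lines. The example below computes relative single-letter frequencies and counts of short character sequences, then compares two count profiles with cosine similarity. The function names, the sequence length of three, and the choice of cosine similarity are assumptions made for illustration; they are not the procedure used in the studies Forsyth reported.

```python
import math
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in the text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).items()} if total else {}


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count overlapping character sequences of length n (whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Compare two count profiles; 1.0 means identical proportions."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(letter_frequencies("abba"))  # prints {'a': 0.5, 'b': 0.5}
print(ngram_counts("abcab", 2))    # 'ab' appears twice, 'bc' and 'ca' once each
```

In use, a profile built from writing of known authorship would be compared against a profile of the disputed text. Short samples give unstable profiles, so scores from small amounts of text deserve caution.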
Certain message formats may provide additional information. A number of Microsoft email systems include a data block with every message that is sent. To most readers, this block contains meaningless garbage. However, it may include a variety of information useful in software forensics, such as part of the structure of the file system on the sender’s machine, the sender’s registered identity, programs in use, and so forth.
Other programs may add information that can be used. Microsoft’s word processing program, Word, for example, is frequently used to create documents sent by email. Word documents include information about file system structure, the author’s name (and possibly company), and a global user ID. Comments and ‘deleted’ sections of text may be retained in Word files and simply marked as hidden to prevent them from being displayed. Simple utility tools can recover this information from the file itself.
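A minimal sketch of such a utility follows. It assumes the newer DOCX format, which is a ZIP archive of XML parts, rather than the older binary Word format that the passage may have had in mind. It reads the author fields from the core properties part and collects text that tracked changes have marked as deleted. The function names and the sample document are invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the DOCX parts read below.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def inspect_docx(file) -> dict:
    """Return author metadata and tracked-deletion text from a DOCX file."""
    report = {"creator": None, "last_modified_by": None, "deleted_text": []}
    with zipfile.ZipFile(file) as archive:
        names = set(archive.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(archive.read("docProps/core.xml"))
            report["creator"] = core.findtext("dc:creator", namespaces=NS)
            report["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )
        if "word/document.xml" in names:
            body = ET.fromstring(archive.read("word/document.xml"))
            # w:delText holds run text that a tracked change marked as deleted.
            report["deleted_text"] = [
                el.text or "" for el in body.iter("{%s}delText" % NS["w"])
            ]
    return report


def build_sample_docx() -> io.BytesIO:
    """Build a tiny, hypothetical DOCX-like archive in memory for the demo."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>A. Author</dc:creator>"
        "<cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    document = (
        '<w:document xmlns:w="%s"><w:body><w:p>'
        "<w:r><w:t>Visible text.</w:t></w:r>"
        "<w:del><w:r><w:delText>Removed text.</w:delText></w:r></w:del>"
        "</w:p></w:body></w:document>" % NS["w"]
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


print(inspect_docx(build_sample_docx()))
```

Passing a file path instead of the in-memory sample works the same way, since `zipfile.ZipFile` accepts either. Only two parts of the archive are read here; comments and other properties live in further parts that a fuller tool would also examine.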
CISSP® is a registered mark of (ISC)².