
What is a compiler?

A compiler is a special program that translates a programming language's source code into machine code, bytecode or another programming language. The source code is typically written in a high-level, human-readable language, such as Java or C++.

A programmer writes the source code in a code editor or an integrated development environment that includes an editor, saving the source code to one or more text files. A compiler that supports the source programming language reads the files, analyzes the code and translates it into a format suitable for the target platform.

What is the main purpose of a compiler?

Many modern-day computer programs are written in high-level programming languages, like Java, C++ or Python. However, machines cannot understand these programs as written -- much less execute them. The programs must first be translated into a language that a computer can understand. The main purpose of a compiler is to translate a program from a complex, high-level language into a simpler format that the machine can read, understand and execute.

Regardless of the source language or the type of output, a compiler must ensure that the logic of the output code always matches that of the input code and that nothing is lost when converting the code.

What language is a compiler written in?

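Compilers are usually written in a high-level programming language, such as C, C++ or Java. A compiler that is written in the same language that it compiles is known as a self-hosting compiler. The process of building the first version of such a compiler, typically with a compiler written in another language, is known as bootstrapping.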

That said, most compilers can accept inputs written in a language that's different from the language in which they are written, and they can translate the code in that different language into machine code or bytecode. For example, a compiler written in Java may be able to compile source code written in C.

Some compilers, known as transcompilers or transpilers, translate between high-level languages, regardless of the language in which they themselves are written. Tools that rewrite expressions into other forms without changing the language are known as language rewriters.

Types of compilers

Some compilers translate source code into machine code. These programs can be native compilers or cross-compilers. The output of a native compiler runs on the same type of computer and operating system (OS) as the compiler itself. In contrast, a cross-compiler runs on one platform but generates code for a different one, such as a different OS or processor architecture. Cross-compilation is useful when the target machine cannot conveniently run the compiler, for example when building software for embedded devices on a desktop workstation, or when one build machine must produce programs for several OSes or hardware configurations.

Machine code refers to the lowest-level code -- in the form of binary instructions and data -- that a computer can understand and execute. Other compilers translate source code into bytecode. Bytecode, popularized by the Java programming language, is an intermediate, platform-neutral instruction format that can be executed on any system running a suitable virtual machine, such as the Java virtual machine (JVM), or a bytecode interpreter.

Compilers that translate source code to machine code usually target a specific OS and computer architecture. This type of output is sometimes referred to as object code, a term unrelated to object-oriented programming. The resulting machine code consists entirely of binary instructions and data -- ones and zeros -- that the target computer's processor can read and execute directly. For example, a compiler might output machine code for the Linux x64 platform or the Linux Arm 64-bit platform.

When a compiler translates source code into bytecode, it is known as a bytecode compiler. The JVM or interpreter then converts the bytecode into instructions that the hardware processor can execute. A JVM can also hand frequently executed bytecode to a just-in-time (JIT) compiler, which compiles it into native machine code at runtime.
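
CPython, the reference implementation of Python, offers a quick way to see a bytecode compiler's output. It compiles Python source to bytecode for its own virtual machine rather than for the JVM, and its standard dis module can list that bytecode. The sketch below is an illustration with Python bytecode, not Java bytecode. Exact instruction names vary between Python versions, so it prints them rather than assuming a particular sequence.

```python
import dis

def add(a, b):
    return a + b

# CPython compiled the body of add() to bytecode when this module was compiled,
# before add() is ever called; the bytecode is stored in add.__code__.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# dis.dis() prints a human-readable listing of the same bytecode.
dis.dis(add)
```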

Some compilers can translate source code written in one high-level programming language into another high-level programming language. This type of compiler might be referred to as a transpiler, transcompiler or source-to-source translator. For example, a developer might use a transpiler to convert COBOL code to Java.
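
A full COBOL-to-Java transpiler is far too large to show here, but the idea can be sketched in miniature. The following hypothetical Python example uses the standard ast module to parse a small subset of Python arithmetic expressions and emit equivalent JavaScript source text. The names to_js, transpile and BINOPS, and the supported subset, are assumptions for illustration only.

```python
import ast

# Hypothetical mapping from Python AST operator types to JavaScript operators.
BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**"}

def to_js(node: ast.AST) -> str:
    """Translate a small subset of Python expressions into JavaScript source text."""
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        # Parenthesize every binary operation so precedence survives translation.
        return f"({to_js(node.left)} {BINOPS[type(node.op)]} {to_js(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "abs" and len(node.args) == 1 and not node.keywords):
        return f"Math.abs({to_js(node.args[0])})"  # Python built-in -> JS library call
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")

def transpile(source: str) -> str:
    # mode="eval" parses a single expression; .body is its root node.
    return to_js(ast.parse(source, mode="eval").body)

print(transpile("abs(x - 3) ** 2 / y"))  # ((Math.abs((x - 3)) ** 2) / y)
```

Even this tiny subset glosses over real semantic differences, such as Python's arbitrary-precision integers versus JavaScript's floating-point numbers, which is one reason production transpilers are much larger.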

Tools are also available that translate code from a low-level language, such as machine code or bytecode, back into a high-level language. These are typically known as decompilers.

How does a compiler work?

Compilers vary in the methods they use for analyzing and converting source code to output code. Despite their differences, they typically carry out these steps:

  1. Lexical analysis. The compiler scans the characters of the source code and groups them into lexemes, the smallest meaningful character sequences in the code, such as keywords, identifiers, operators and literals. Each lexeme is then classified as a token, typically a pair of a token type and a value, in preparation for the next steps of syntax analysis and semantic analysis.
  2. Syntax analysis. The compiler checks the sequence of tokens created during lexical analysis against the grammar rules of the source language to verify that the code's syntax is correct. This process is also referred to as parsing. During this step, the compiler typically builds an abstract syntax tree (AST) that represents the hierarchical structure of the code, such as which operands belong to which operator. It also detects syntax errors, such as a missing closing parenthesis.
  3. Semantic analysis. The compiler verifies that the code is meaningful according to the rules of the source language, which goes beyond checking whether it is grammatically well formed. For example, it might check whether variables have been declared before they are used, whether values assigned to variables have compatible types and whether operators, function calls and control structures are applied to appropriate operands.
  4. Intermediate representation (IR) code generation. After the code passes through all three analysis phases, the compiler generates an IR of the source code. The IR is typically machine-independent, so the same IR can be translated into code for different target machines. Its simpler, more uniform structure also makes the program easier to analyze and translate into a machine-readable format. However, the IR must accurately represent the source code in every respect without omitting any functionality.
  5. Optimization. The compiler transforms the IR code to make the final output more efficient without changing what the program does. The types and extent of optimization depend on the compiler, although most compilers aim to improve the program's speed and reduce its use of resources, such as CPU time or memory. Some compilers let users configure the degree of optimization; for example, GCC and Clang accept flags ranging from -O0 (no optimization) to -O3.
  6. Output code generation. The compiler translates the optimized IR into the final output code, such as machine code for a specific processor or bytecode for a virtual machine, which the target platform can then execute.

[Figure: Typical compiler steps. For most compilers, these are the general steps they take when converting written programming instructions into executable code.]
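
The six steps above can be sketched end to end in Python. This is a minimal, illustrative sketch, not how any particular production compiler is built. The toy source language (integer arithmetic with +, - and * over declared parameters), the three-address IR format, the stack-machine instruction set (PUSH, LOAD, ADD, SUB, MUL, STORE, RET) and all function names are hypothetical, and the only optimization shown is constant folding.

```python
import itertools
import re

# 1. Lexical analysis: split the text into lexemes and classify each as a
#    (token type, lexeme) pair; a final EOF token marks the end of the input.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[-+*()])"
                      r"|(?P<WS>\s+)|(?P<BAD>.)")

def lex(src):
    tokens = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]
    for kind, text in tokens:
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {text!r}")
    return [t for t in tokens if t[0] != "WS"] + [("EOF", "")]

# 2. Syntax analysis: recursive descent over the grammar
#    expr := term (('+'|'-') term)*   term := atom ('*' atom)*
#    atom := NUM | ID | '(' expr ')'
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos][1]
    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]
    def atom():
        kind, text = take()
        if kind == "NUM":
            return ("num", int(text))
        if kind == "ID":
            return ("var", text)
        if text == "(":
            node = expr()
            if take()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")
    def binary(operand, ops):  # left-associative chain of (op, left, right) nodes
        node = operand()
        while peek() in ops:
            node = (take()[1], node, operand())
        return node
    def expr():
        return binary(lambda: binary(atom, ("*",)), ("+", "-"))
    tree = expr()
    if tokens[pos][0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

# 3. Semantic analysis: every variable must be one of the declared parameters.
def check(node, declared):
    if node[0] == "var" and node[1] not in declared:
        raise NameError(f"undeclared variable {node[1]!r}")
    if node[0] not in ("num", "var"):
        check(node[1], declared)
        check(node[2], declared)

# 4. IR generation: lower the AST to three-address code (dest, op, arg1, arg2).
def gen_ir(tree):
    ir, temps = [], itertools.count()
    def lower(node):
        if node[0] in ("num", "var"):
            return node[1]  # an int constant or a variable name
        a, b = lower(node[1]), lower(node[2])
        dest = f"%t{next(temps)}"  # "%" keeps temporaries apart from source names
        ir.append((dest, node[0], a, b))
        return dest
    return ir, lower(tree)

# 5. Optimization: constant folding evaluates instructions with two known operands.
FOLD = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
def optimize(ir, result):
    known, out = {}, []
    for dest, op, a, b in ir:
        a, b = known.get(a, a), known.get(b, b)  # substitute folded temporaries
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = FOLD[op](a, b)
        else:
            out.append((dest, op, a, b))
    return out, known.get(result, result)

# 6. Output code generation for a hypothetical stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
def codegen(ir, result):
    def operand(arg):  # constants are pushed directly; names are loaded
        return f"PUSH {arg}" if isinstance(arg, int) else f"LOAD {arg}"
    code = []
    for dest, op, a, b in ir:
        code += [operand(a), operand(b), OPCODES[op], f"STORE {dest}"]
    return code + [operand(result), "RET"]

def compile_expr(src, params):
    tree = parse(lex(src))
    check(tree, params)
    return codegen(*optimize(*gen_ir(tree)))

print("\n".join(compile_expr("2 * (3 + 4) + x * 5", {"x"})))
```

For the input 2 * (3 + 4) + x * 5, the subexpression 2 * (3 + 4) is folded to 14 at compile time, so the generated code only computes x * 5 and adds 14 to it. Passing an expression that uses an undeclared name, such as y + 1 with only x declared, fails in the semantic analysis step before any code is generated.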

Difference between compiler and interpreter

Compilers are sometimes confused with programs called interpreters. Compilers and interpreters are similar in the sense that they both process human-written source code so that a computer can run it. However, they differ in how they work.

Compilers analyze and convert source code written in languages such as Java, C++, C# or Swift. They're commonly used to generate machine code or bytecode that can be executed by the target host system.

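In their simplest form, interpreters do not generate IR code or save generated machine code. Instead, they process the code one statement at a time at runtime, without converting it in advance or preparing it for a particular platform. Interpreters are commonly used for code written in scripting languages, such as Perl, PHP, Ruby or Python, although the standard implementations of these languages often first compile source code into an internal representation, such as bytecode, before executing it.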

Thus, where a compiler translates the entire source code into machine code or bytecode before the program runs, an interpreter translates and executes the code one statement at a time. This difference in operation can be both an advantage and a disadvantage for programmers.

Because a compiler translates the entire program ahead of time, the compiled program typically runs faster, particularly if the code itself is well optimized: no translation work remains to be done while it executes, and the compiler can optimize the code as a whole. Another advantage of compilers is that they produce distributable executable files that do not include the source code. This makes the code harder to inspect and copy, although compiled code can still be reverse-engineered, for example with a decompiler. And, since the compiled output is saved, programmers can run or distribute it repeatedly without having to recompile it from scratch.

Most compilers report errors and warnings and can generate debugging information for use with a debugger, which helps programmers identify and fix syntax and semantic errors. That said, compilation can take a long time if the code is large or complex. Also, the programmer has to wait for the entire program to be compiled before they can run it and test their fixes.

Interpreters take little time to analyze the source code before they start executing it, and they report each error when they reach the line that contains it. The main drawback of interpreted programs is that they run more slowly, because each statement is translated as it executes, even when it runs many times, as in a loop. However, line-by-line execution can also be an advantage, because programmers can fix errors as soon as the interpreter flags them. On the other hand, fixing errors in such a piecemeal fashion can slow down code development and optimization.
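
The difference in when errors surface can be shown with a toy example. The following Python sketch is hypothetical: it defines a made-up three-statement language and two runners that share the same translate and execute functions. The compiler-style runner translates the whole program first, so a bad line 3 stops everything before anything runs; the interpreter-style runner has already executed lines 1 and 2, and produced their output, by the time it reaches the error. Neither runner reflects how a production compiler or interpreter is structured.

```python
# Hypothetical toy language, one statement per line:
#   set NAME INT  |  add NAME INT  |  print NAME
def translate(line, lineno):
    """Check one statement and turn it into an (op, name, value) instruction."""
    parts = line.split()
    if len(parts) == 3 and parts[0] in ("set", "add"):
        try:
            return parts[0], parts[1], int(parts[2])
        except ValueError:
            pass  # not an integer; fall through to the error below
    elif len(parts) == 2 and parts[0] == "print":
        return "print", parts[1], None
    raise SyntaxError(f"line {lineno}: cannot translate {line!r}")

def execute(instr, env, out):
    op, name, value = instr
    if op == "set":
        env[name] = value
    elif op == "add":
        env[name] += value
    else:
        out.append(env[name])

def compile_then_run(source):
    """Compiler-style: translate every line before executing any of them."""
    env, out = {}, []
    try:
        program = [translate(line, n) for n, line in enumerate(source.splitlines(), 1)]
    except SyntaxError as err:
        return out, str(err)  # nothing has run yet
    for instr in program:
        execute(instr, env, out)
    return out, None

def interpret(source):
    """Interpreter-style: translate and execute one line at a time."""
    env, out = {}, []
    for n, line in enumerate(source.splitlines(), 1):
        try:
            instr = translate(line, n)
        except SyntaxError as err:
            return out, str(err)  # earlier lines have already run
        execute(instr, env, out)
    return out, None

PROGRAM = "set x 5\nprint x\nadd x oops\nprint x"
print(compile_then_run(PROGRAM))  # ([], "line 3: cannot translate 'add x oops'")
print(interpret(PROGRAM))         # ([5], "line 3: cannot translate 'add x oops'")
```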

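Interpreters are generally smaller programs than compilers. As a result, they can require fewer CPU and memory resources to get a program running. However, because interpreted programs are typically distributed as source code, they can increase security risks, since that code is exposed to anyone who has access to the program.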

These differences notwithstanding, some compilers work alongside an interpreter and compile code only at runtime. These programs are known as JIT compilers. JIT compilation is used in many modern language implementations, such as Java virtual machines and the PyPy implementation of Python, and typically includes an intermediary step of translating source code into bytecode. A bytecode interpreter starts executing the bytecode, and the JIT compiler then translates frequently executed portions of it into machine code while the program runs.
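
The interpret-first, compile-when-hot approach can be sketched in miniature. In this hypothetical Python example, a ToyJIT object interprets a one-variable arithmetic expression by walking its syntax tree on every call. Once the expression has been called HOT_THRESHOLD times, it compiles the tree once into nested Python closures and uses those from then on. The class, function names and threshold are assumptions for illustration; a real JIT compiler generates native machine code and uses much more sophisticated profiling to decide what is worth compiling.

```python
import ast
import operator

# Hypothetical threshold: after this many interpreted calls, the code counts as "hot".
HOT_THRESHOLD = 3
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def interpret(node, x):
    """Slow path: walk the syntax tree again on every single call."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name) and node.id == "x":
        return x
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](interpret(node.left, x), interpret(node.right, x))
    raise NotImplementedError(ast.dump(node))

def compile_tree(node):
    """Translate the tree once into nested closures (standing in for machine code)."""
    if isinstance(node, ast.Constant):
        value = node.value
        return lambda x: value
    if isinstance(node, ast.Name) and node.id == "x":
        return lambda x: x
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        op, left, right = OPS[type(node.op)], compile_tree(node.left), compile_tree(node.right)
        return lambda x: op(left(x), right(x))
    raise NotImplementedError(ast.dump(node))

class ToyJIT:
    """Interpret a one-variable expression until it is hot, then switch to compiled code."""
    def __init__(self, source):
        self.tree = ast.parse(source, mode="eval").body
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # fast path: no tree walking
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = compile_tree(self.tree)  # compile once, reuse on later calls
        return interpret(self.tree, x)

f = ToyJIT("3 * x + 1")
print([f(i) for i in range(5)])  # [1, 4, 7, 10, 13]
print(f.compiled is not None)    # True
```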

Difference between compiler and assembler

Unlike a compiler, which takes in high-level code and translates it into machine code, an assembler translates human-readable code written in a low-level assembly language into machine code. Its main purpose is to convert every assembly instruction into its equivalent machine code instruction.

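Like interpreters, assemblers translate code essentially line by line, but unlike interpreters, they do not execute it. Instead, they write the resulting binary code, typically object code, to an output file. Because every assembly language is designed for a specific computer architecture, assembly languages are not universal: assembly code written for one processor family generally cannot be used on another without being rewritten.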
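
To illustrate this one-instruction-to-one-word translation, the following minimal Python sketch assembles a made-up instruction set in which each mnemonic maps to a hypothetical one-byte opcode and each instruction becomes exactly one 16-bit machine word. It uses the classic two-pass approach to handle labels: the first pass records the address of each label, and the second emits the machine words. The instruction set, opcodes and word format are assumptions, not any real architecture.

```python
# Hypothetical instruction set: 16-bit words, high byte = opcode, low byte = operand.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}

def assemble(source):
    # Strip ";" comments and blank lines.
    lines = [line.split(";")[0].strip() for line in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: record the address of every label ("name:").
    labels, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 1  # every instruction occupies exactly one word

    # Pass 2: translate each instruction into its machine-code word.
    words = []
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *args = line.split()
        operand = 0
        if args:
            operand = labels[args[0]] if args[0] in labels else int(args[0], 0)
        if not 0 <= operand <= 0xFF:
            raise ValueError(f"operand out of range in {line!r}")
        words.append((OPCODES[mnemonic] << 8) | operand)
    return words

PROGRAM = """
start:
    LOAD 0x10    ; load the value at hypothetical address 0x10
    ADD 0x11
    STORE 0x12
    JMP start
    HALT
"""
print([f"{word:04X}" for word in assemble(PROGRAM)])
# ['0110', '0211', '0312', '0400', 'FF00']
```

The returned words could then be packed into bytes, for example with b"".join(w.to_bytes(2, "big") for w in words), and written to the output file.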

This was last updated in April 2025