Compiler Design



A compiler is basically a program that converts instructions written in a high-level language into a lower-level, machine-understandable form so that the machine can read and execute the code.


Computers are made up of a combination of software and hardware components. The hardware components respond only to electrical signals, represented as 1s and 0s. In the early years, software was written in assembly language, and compilers took much longer to become established because the generated low-level code was less efficient than hand-written assembly. The first high-level programming language was Plankalkül, designed by Konrad Zuse in the 1940s. The term "compiler" was coined by Grace Hopper. The first commercially successful high-level programming language was FORTRAN, and IBM introduced the first compiler for it.

Computers understand only sequences of 0s and 1s, and it is difficult to write programs directly in that machine-understandable language; so, as an alternative, we write the program in a high-level language. To convert the high-level language into the machine-understandable language, we need a component that takes up that responsibility. That component is technically termed a COMPILER. Examples of different compilers:

ALGOL compilers

BASIC compilers

C compilers

C++ compilers

JAVA compilers

PYTHON compilers

GO compilers


The most primary thing in the programming is language processing. As a developer one should know how the input source code is being executed by the compiler. If we consider any types of application it will majorly consists of database, operating environment, programming language.

Why to study what is compiler?

Consider a black box testing, one who tests will not know the internal structure, but if we consider white box testing one will know the internal structure and will continue with the constraints which will not satisfy. So as a developer if know what actually happening inside we can design the source code with the knowledge of execution procedure.

Example for ALGOL compiler



Persistent S-algol

Example for BASIC compiler

QuickBASIC



Example for C compiler

Aztec C

Turbo C


Visual C++

Example for JAVA compiler

Edison Design Group




Example for PYTHON compiler




Shed Skin

Example for GO compilers

gc (the standard Go compiler)

gccgo





The above-mentioned compilers are not suitable for all operating systems: some offer extensive support for Windows, while others support Linux.



So far we have treated the compiler itself as the thing that does the whole process. But there are several components around the compiler that collaboratively convert the high-level language into the low-level language. These components are also called the cousins of the compiler.


The initial step of compilation is done by the preprocessor. Strictly speaking it is not a part of the compiler, but it processes the input, and the output it produces is consumed by the compiler. The roles performed by the preprocessor are:

Macro processing

File inclusion

Rational preprocessor

Language extensions

Macro processing:

User-defined shorthand notations (macros) for larger constructs are identified and expanded in this phase. For instance, the literal 5 in a condition such as the one below can be given a macro name, which the preprocessor substitutes back before compilation:

if(rating == 5)
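The idea can be sketched in C. The macro names MAX_RATING and SQUARE below are illustrative, not from any real code base; the point is that the preprocessor textually replaces each use with its definition before compilation proper:

```c
/* A minimal sketch of macro processing: before compilation proper,
   the preprocessor replaces every use of a macro with its definition. */
#define MAX_RATING 5
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the body below reads:
     if (rating == 5) return ((rating) * (rating));  */
int rate_bonus(int rating) {
    if (rating == MAX_RATING)
        return SQUARE(rating);
    return 0;
}
```

The compiler itself never sees the names MAX_RATING or SQUARE; it only sees the expanded text.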

Rational preprocessor:

This phase augments older languages with more modern flow-of-control constructs, replacing unoptimized, traditional control flow. It also adds data-structuring facilities.

Language extensions:

In this phase the preprocessor attempts to extend the capabilities of the language by adding built-in macros.


It is difficult to write instructions directly in the machine-understandable language, so they are written as mnemonic codes, i.e., in assembly language. For every instruction, the assembler converts the assembly-language mnemonic into machine code. The output is also called object code.

For example:

a + b; → ADD(a, b)

a - b; → SUB(a, b)

a = b; → MOVE(b, a)

a * b; → MUL(a, b)

a / b; → DIV(a, b)


The object code generated by the assembler is handed over to the linker. The linker converts the object code into executable code for the program and then hands it over to the loader.

The built-in libraries and the header files referenced in the source code are linked together with their respective functions. In case the built-in functions are not found, the linker signals the error to the compiler. There are two types of linkers:

Linkage editor

Dynamic linker

Linkage editor:

It is a linker that generates relocatable, executable machine code.


Dynamic linker:

It postpones the linking of some external modules until after the executable module is generated. Here linking is done at load time or run time.


The responsibility of the loader is to put the executable code into main memory and make it ready for execution. It allocates memory to the executable machine code.

There are three kinds of loading approaches:

Absolute loading

Relocatable loading

Dynamic run-time loading

Absolute loading:

In this process the executable code is loaded at the same memory location every time the program runs. The major drawback is that if any update such as an insertion or deletion occurs, all the addresses in the executable file must be changed.

Relocatable loading:

Unlike the above method, it produces relative addresses rather than actual main-memory addresses.

Dynamic run-time loading:

In this method the actual address of the executable machine code is generated when an instruction is executed. This gives the facility of swapping a program out before its execution completes; if we want to place the program back to finish, that is done by changing the address. It is very flexible compared with the other loading methods.


The compiler is the component that translates the high-level language into the machine-understandable language. Types of compilers are:

Hybrid compiler

Cross compiler

Incremental compiler

Source to source compiler

One pass compiler

Threaded compilers

Hybrid compiler:

A hybrid implementation combines a compiler and an interpreter: the compiler translates the program into an intermediate form, and an interpreter executes that form.

For example:

The Java compiler produces bytecode, and execution of the bytecode is taken up by an interpreter (the Java virtual machine). The benefit of this approach is that it facilitates running the same bytecode across a network of different machines.

Structure of Hybrid compiler:

Cross compilers:

Platform independence can be achieved with these compilers because the compiler runs on a host machine but produces output for another (target) machine.

Incremental compilers:

In this method only the modified code gets recompiled, avoiding recompilation of the entire program. After compiling the modified code, the compiler merges it with the previously compiled code.

Source to source compiler:

It is also a translator, but it does not convert the high-level language into a low-level language; rather, it converts it into another high-level language.

One pass compiler:

It is a compiler that processes the whole compilation in only one pass over the source.

Properties of compiler:

must be error free

must generate correct machine code

must support multithreading

must report errors, if any

must perform consistent optimization

must be fast enough to handle huge amounts of code


An interpreter is a translation program like the compiler, but with a change in the translation process: it produces the output directly when the input source code and its data are given to it, without producing object code.

Languages that are commonly implemented with interpreters are:

Python

Ruby

JavaScript






The first interpreted language was LISP. In later days it was realized that some functions in LISP should be compiled rather than interpreted.


A self-interpreter is an interpreter written in the language it interprets.

BASIC, for example, has been implemented as a self-interpreted programming language.

Interpretation is carried out in the following phases namely:

Lexical analysis

Syntax analysis

Semantic analysis

Direct execution


The structure of a compiler majorly consists of two phases.

Analysis phase

Synthesis phase


Nowadays all modern compilers are syntax-directed compilers: the majority of the compilation process is driven by the syntactic structure of the source program. The parser consumes the tokens generated from the input source program and builds a syntactic structure out of them. The process is then carried on to the semantic analysis phase. This phase performs two actions: first it detects semantic errors, and then it initiates the synthesis phase.

The synthesis phase generates an intermediate representation from the input source code to make it easier to generate the executable code for the machine. The produced output is sent to the optimizer, which performs some optimizing transformations if necessary; otherwise the code is sent directly to the code generator.

Lexical analyzer:

It is also called the scanner. The scanner takes the source code as input and reads the program character by character, grouping the individual characters into tokens. Tokens include identifiers, reserved words, delimiters, etc. The scanner puts the program into a uniform format as a stream of tokens; the character sequences that make up the tokens are called lexemes. It also eliminates unnecessary information, performing the task of removing white space and comments, and pushes information about the tokens into the symbol table. Tokens are specified by regular expressions. The scanner exposes a function, often called getNextToken(), that hands the next token to the parser.
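The character-by-character grouping can be sketched in C. This is a toy scanner assuming just three token classes; the names token_kind and get_next_token are illustrative, since real scanners are usually generated from regular-expression specifications:

```c
#include <ctype.h>

/* A toy scanner: classifies one token starting at *p and advances
   the input pointer past it. */
enum token_kind { TK_EOF, TK_IDENT, TK_NUMBER, TK_PUNCT };

struct token { enum token_kind kind; char text[32]; };

struct token get_next_token(const char **p) {
    struct token t = { TK_EOF, "" };
    unsigned n = 0;
    while (isspace((unsigned char)**p)) (*p)++;           /* skip white space */
    if (**p == '\0') return t;                            /* end of input */
    if (isalpha((unsigned char)**p)) {                    /* identifier / keyword */
        t.kind = TK_IDENT;
        while (isalnum((unsigned char)**p) && n < 31) t.text[n++] = *(*p)++;
    } else if (isdigit((unsigned char)**p)) {             /* number literal */
        t.kind = TK_NUMBER;
        while (isdigit((unsigned char)**p) && n < 31) t.text[n++] = *(*p)++;
    } else {                                              /* one-char delimiter */
        t.kind = TK_PUNCT;
        t.text[n++] = *(*p)++;
    }
    t.text[n] = '\0';
    return t;
}
```

Calling it repeatedly on `count = 42;` yields the token stream IDENT("count"), PUNCT("="), NUMBER("42"), PUNCT(";"), then EOF.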

Examples of scanner generators are:

Lex

Flex




Syntax analyzer:

It is also called the parser. It takes the stream of tokens generated by the scanner as input and produces an abstract syntax tree as output, grouping the tokens into language structures. Parsers are specified with the help of context-free grammars. The parser verifies the syntax of the token stream and triggers a syntax error if any; this phase also handles errors and maintains error-recovery strategies.
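How a context-free grammar drives the parser can be sketched with a tiny recursive-descent parser in C. The grammar below is an assumption for illustration (single-digit operands, `+` and `*` only), and parse_eval is an illustrative name; instead of building an abstract syntax tree, it evaluates while parsing:

```c
/* Grammar assumed for this sketch:
     expr   -> term ('+' term)*
     term   -> factor ('*' factor)*
     factor -> digit                                      */
static const char *cur;

static int factor(void) { return *cur++ - '0'; }          /* one digit */

static int term(void) {                                   /* handles '*' */
    int v = factor();
    while (*cur == '*') { cur++; v *= factor(); }
    return v;
}

static int expr(void) {                                   /* handles '+' */
    int v = term();
    while (*cur == '+') { cur++; v += term(); }
    return v;
}

int parse_eval(const char *s) { cur = s; return expr(); }
```

Because `term` sits below `expr` in the grammar, `*` automatically binds tighter than `+`: parse_eval("2+3*4") returns 14, not 20.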

Examples of parser generators are:

Yacc

Bison



Java CUP

Semantic analysis:

This phase is also called the type checker, because it takes the tree generated by the parser and verifies that each language construct is legal and meaningful. If a construct is semantically correct, the type checker decorates it, giving the node a type or attaching symbol-table information to it. It is driven by semantic rules, which are defined by syntax-directed definitions (SDDs) and syntax-directed translation schemes (SDTs). If there is any semantic error, it is triggered and shown on the console.

Intermediate code generator:

This phase is also called the translator, because it translates the syntax tree into an intermediate representation that is easy to convert into the machine-understandable language. This separation reduces the dependencies between the source and the target.

Representation of intermediate code is done in two different ways.

High level representation

Abstract syntax tree

Directed acyclic graphs

Flow graph

Low level representation

Postfix notation

Three address code
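Three-address code limits each instruction to at most one operator on its right-hand side. The sketch below shows one hedged way to emit it in C for the statement `a = b * c + d`; the helper names newtemp and emit are illustrative, not from any real compiler:

```c
#include <stdio.h>

/* Emitted instructions and the temporaries t1, t2, ... they use. */
static char code[8][32];
static char temps[8][8];
static int ninstr = 0, ntemp = 0;

static const char *newtemp(void) {                 /* next temporary name */
    snprintf(temps[ntemp], sizeof temps[ntemp], "t%d", ntemp + 1);
    return temps[ntemp++];
}

static void emit(const char *dst, const char *a, char op, const char *b) {
    snprintf(code[ninstr++], sizeof code[0], "%s = %s %c %s", dst, a, op, b);
}

/* Translate  a = b * c + d  into three-address code; returns the
   name of the temporary holding the final value. */
const char *gen_sample(void) {
    const char *t1 = newtemp();
    emit(t1, "b", '*', "c");                       /* t1 = b * c */
    const char *t2 = newtemp();
    emit(t2, t1, '+', "d");                        /* t2 = t1 + d */
    return t2;                                     /* then: a = t2 */
}
```

Running gen_sample() records the two instructions `t1 = b * c` and `t2 = t1 + d`, after which the assignment `a = t2` completes the statement.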

Code optimizer:

This phase is optional. If the intermediate representation is already good enough to develop the object code from, the compiler skips optimizing it.

Otherwise, this phase optimizes the intermediate representation. Optimization can be complex; it may involve numerous sub-phases, some of which may need to be applied more than once. Optimizations may be turned off to speed up the translation.
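One classic optimization is constant folding: when both operands of an operation are known at compile time, the optimizer replaces the operation with its result. The node layout and helper names below are an illustrative sketch; a real optimizer works over the full intermediate representation:

```c
#include <stdlib.h>

/* A toy expression node: '+' for an addition, op == 0 for a leaf. */
struct node {
    char op;
    int is_const, value;          /* constant flag and folded value */
    struct node *l, *r;
};

struct node *leaf(int v) {        /* constant leaf */
    struct node *n = calloc(1, sizeof *n);
    n->is_const = 1;
    n->value = v;
    return n;
}

struct node *add(struct node *l, struct node *r) {   /* l + r */
    struct node *n = calloc(1, sizeof *n);
    n->op = '+';
    n->l = l;
    n->r = r;
    return n;
}

/* Fold  const + const  into a single constant at compile time. */
void fold_add(struct node *n) {
    if (n->op == '+' && n->l->is_const && n->r->is_const) {
        n->is_const = 1;
        n->value = n->l->value + n->r->value;
        n->op = 0;                /* the node is now a constant leaf */
    }
}
```

After fold_add runs on the node for `2 + 3`, later phases see only the constant 5 and never emit the addition.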

Code generation:

This is the final phase; it generates the target code. The intermediate code developed by the translator is used by the generator to produce the final target code. This process is a bit difficult because it involves detailed information about the target machine and uses machine-specific optimizations. There are several issues in the code-generation phase, such as the choice of evaluation order, and register allocation has to be considered. A well-known compiler using automatic code-generation techniques is the GNU C compiler.

Symbol table:

It is the main data structure, used by every phase of the compiler. It stores information related to each and every identifier in the input source code, such as its type, scope, and other attributes.



Software productivity tools:

It is very common to make errors while designing an application; the most practical way to make it error free is to test the program. Compiler technology contributes several kinds of checks and tools:

Type checking

Bounds checking

Memory-management tools

Type checking:

It is one of the applications of compiler design, used to find errors in an application.

For example:

Consider a function that takes two arguments. Suppose we call the function without all of its arguments, or with arguments of different data types; the compiler will then throw an error message on the console.

int internship_expenditure(int BASIC, double DA);



When we call it omitting one of the arguments:


internship_expenditure(27000);

The compiler will trigger an error and show what the bug is and where it is located; this is one of the applications of compiler design.

Bounds checking:

Modern C compilers and IDEs such as Dev-C++ can identify some array out-of-bounds errors, but early C compilers did not have this ability. When an array has a static size and the program writes more data than the array can hold, older toolchains would simply accept it. Here comes the security breach: an attacker can penetrate the application through such an overflow and modify the data to meet his requirements.
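The kind of check a bounds-checking tool inserts can be sketched in software. The function name safe_get below is illustrative; it refuses an out-of-range index instead of reading past the end of the array, which is exactly the check early C toolchains never performed:

```c
#include <stddef.h>

/* Bounds-checked array read: returns 1 and stores a[i] in *out on
   success, or returns 0 without touching memory when i is out of
   the array's bounds. */
int safe_get(const int *a, size_t len, size_t i, int *out) {
    if (i >= len)                 /* index out of bounds: refuse */
        return 0;
    *out = a[i];
    return 1;
}
```

Without the `i >= len` test, an out-of-range index would silently read (or, for a write, overwrite) adjacent memory.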


Memory management tools:

The major issue in C and C++ programs is memory leaks. To resolve this issue many tools have been developed nowadays; Purify is one tool used for finding memory leaks.

Garbage collection is another application of the compiler design.

Program translations:

Translation is the process of changing one language into another.

Binary translation

Hardware Synthesis

Binary translation:

This is used to translate the binary code of one machine into that of another, which allows a machine to run executable files that were actually compiled for another instruction set.

Hardware synthesis:

Not only software is written in high-level languages; hardware designs are also written in high-level description languages. Hardware designs are typically described at the register-transfer level (RTL), where variables represent registers and expressions represent combinational logic. Hardware synthesis tools translate the RTL description into the corresponding gates. These tools are used for hardware optimization too.

Design of new computer architectures:

Compiler technology includes modules for optimizing or reducing instructions to a small number of simpler operations by eliminating duplication among complex instruction sets.




Early computers used the CISC architecture, which is very complex in its design. CISC machines also contain memory-addressing modes to support data-structuring facilities, and maintain a stack to save registers and pass parameters.

Examples: x86


To overcome the issues in CISC, RISC was developed; the instruction set is reduced in RISC.

Example: PowerPC, SPARC, MIPS, Alpha, and PA-RISC.

Optimization of computer architectures:

The main things to be considered in every machine are parallelism and the memory hierarchy.




Parallelism means executing different threads of the same program at the same time. When it comes to the code, the user generally assumes that all instructions are executed sequentially, but the compiler identifies the dependencies among instructions and then executes independent ones in parallel. Even the hardware reorders instructions.


There are different levels in the memory hierarchy, each differing in speed and size. The average memory-access time of an instruction can be reduced by serving the program from the faster memories. Both parallelism and the existence of a memory hierarchy improve the potential performance of a machine.

Some of the specialized architectures are

SIMD (single instruction multiple data)

SISD (single instruction single data)

MIMD (multiple instruction multiple data)

MISD (multiple instruction single data)

VLIW (very long instruction word)

Implementation of programming languages

Compilers help in implementing new programming languages. C was initially created in 1972 as a structure-oriented language. In later years, Java was developed and released in 1996 as a then-new object-oriented language; at that time it already had data abstraction and inheritance mechanisms. Compiler technology enables developers to add more features later on by developing new modules that are integrated into the programming language.
