##### Recursive Descent Parser

A recursive descent parser is a type of parsing tool that works on a recursive basis, in other words, on the basis of using one instance of a command or event to generate another. Recursive descent parsers can be used to parse different types of code, such as XML, or other inputs. They are a specific type of parsing technology that can involve nested or built-in subsequent operations.

=> Key Points:

• Recursive descent parser is a top-down parser.
•  The parsing program consists of a set of procedures, one for each non-terminal.
•  The process begins with the procedure for start symbol.
• The start symbol is placed at the root node and on encountering each non- terminal, the procedure concerned is called to expand the non-terminal with its corresponding production.
•  The procedure is called recursively until all non-terminals are expanded.
•  Successful completion occurs when the scan over the entire input string is done. ie., all terminals in the sentence are derived by parse tree.
• General recursive-descent may require backtracking; that is, it may require
repeated scans over the input. However, backtracking is rarely needed to parse programming language constructs, so backtracking parsers are not seen frequently.

Reference is Compiler tools technique by A.V.Aho.

##### LL(1) Parser and LL(1) Parsing Table Notes

LL(1) Parsers:

A top-down parser that uses a one-token lookahead is called an LL(1) parser.

• The first L indicates that the input is read from left to right.
• The second L says that it produces a left-to-right derivation.
• And the 1 says that it uses one lookahead token. (Some parsers look ahead at the next 2 tokens or even more than that.)

Notes on LL(1) Parsing Tables:

If any entry is multiply defined then G is not LL(1). Possible reasons:

•  If G is ambiguous
•  If G is left recursive
•  If G is not left-factored
•  Grammar is not LL(1)
•  Most programming language grammars are not LL(1)

=> Some can be made LL(1) though but others can’t There are tools that build LL(1) tables  E.g. LLGen

##### Derivation Tree(1)

In a derivation tree, the root is the start variable, all internal nodes are labeled with variables, while all leaves are labeled with terminals. The children of an internal node are labeled from left to right with the right-hand side of the production used.

Example: S → 0S1S | 1S0S | ε

The derivation for 011100  for leftmost derivation is : S =⇒ 0S1S =⇒ 01S =⇒ 011S0S =⇒ 0111S0S0S =⇒ 01110S0S =⇒ 011100S =⇒ 011100 ##### Ambiguous grammar(1)

Ambiguous grammar:

=> If for a given grammer, for a given string if we get more than one parse tree, more than one Left Most Derivate or more than one Right most derivate, then the grammar is called as Ambiguous grammar.

=> Most of the parser does not allow Ambiguity.

=> There is no algorithm to check ambiguity So you can say that Ambiguity problem is undecidable.

=> Ambiguity is a Property of Grammars, not Languages.

Example:-

A → A + A | A − A | A * A | a

is ambiguous since there are two leftmost derivations for the string a + a + a:

A→ A + A                         A→ A + A

→ a + A                         → A + A + A (First A is replaced by A+A. Replacement of the second A would yield a similar derivation)

→ a + A + A                  → a + A + A

→ a + a + A                   → a + a + A

→ a + a + a                   → a + a + a

##### Elimination of left recursion and left factoring the grammars(1)

1. Left-Recursive Grammar:-

Any production which is in the form of

A --> A α / B  is called Left-Recursive.

Many parsers fail when they see left-recursive grammars.

So convert  left Recursive grammar to right recursive do the following steps:

A --> A α / B

converted as

A --> B A'

A' --> α  A' / ε​

There is a formal technique for eliminating left-recursion from productions

Step One: Direct-Recursion

For each rule which contains a left-recursive option,

A --> A  α| B

introduce a new nonterminal A' and rewrite the rule as

A -->  B A'
A' -->  α  A' / ε​

Thus the production:

E --> E + T | T

is left-recursive with "E" playing the role of "A"," + T" playing the role of α   , and "T" playing the role of  A'. Introducing the new nonterminal E', the production can be replaced by:

E --> T E'
E' -->  ε​ | + T E'

Of course, there may be more than one left-recursive part on the right-hand side. The general rule is to replace:

A --> A α1 | A α2   |.....A αn  |B1   |B 2  ...... | B n

by

A -->  B1A' |  B2A'| .......|  BnA'
A' -->  ε​ |  α1A' | α2  A' | .......|  αnA'

##### Top Down Parsing

TOP-DOWN AND BOTTOM-UP PARSING TECHNIQUES

What is parsing?

The word parsing comes in the phase of syntax analyzer. This phase uses lot of parsing techniques depending upon the priority of obtaining the start symbols and the input symbols. A parser is a compiler or interpreter that breaks the data into smaller chunks depending upon the obtained tokens from the scanner. It builds a data structure in the form of parse tree or abstract syntax tree. Parser will follow the production rules of context free grammar. CFG rules divide the parsing methods into two types.

• Top-down parsing.
• Bottom-up parsing.

Top-down parsing:

In this methodology the parser starts the parsing from the start symbol in order to produce the input or to transform the start symbol into input is called as top down parsing.

Fields where parsers can be used:

• Web technologies like HTML and XML.
• Query languages like SQL.
• Scripting languages like JAVASCRIPT.
• Network protocols such as HTTP.

Key-words involved in pars

•  Left most derivation.
• Right most derivation.
• Ambiguity
• Left recursion
• Left factoring.
• FIRST sets.

Derivation:
Derivation is actually a process of designing the parse tree depending upon the methodology. There are two different types of derivation methodologies for designing of the parse trees.

• Right most derivatives.
• Left most derivative.

Right most derivatives:

The production rule is expanded always from the right end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

Lets check for the string ababa using right most derivation.

Consider 1st production as it is more likely to produce the given string.

AàABA

AB( a) (replace last letter( Aàa) because it is right most derivative).

Aba(replace B with Bàb)

ABAba (replace A with AàABA)

Ababa( replace Aàa)

Ababa(replace Bàb)

ababa(replace Aàa)

String is accepted.

Left-most derivation:

The production rule is expanded always from the left end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

Let’s check for the string ababa using leftmost derivation.

Consider 1st production as it is more likely to produce the given string.

AàABA

aBA(replace Aàa)

abA(replace Bàb)

abABA(replace AàABA)

abaBA(replace Aàa)

ababA(replace Bàb)

ababa(replace Aàa)

String is accepted.

So, if a grammar accepts both derivative forms, then we can say the grammar as ambiguous grammar. For a parser the grammar must be an unambiguous grammar. To make an ambiguous grammar into an unambiguous grammar. There are methods to be followed based upon the production rules.

Left recursion:

AàAα|β

If the given production is in the above-mentioned format, then we can rewrite as,

A→βAl

Al →α Al We have to generalize the rules into the standard format.

E→E+T|T

Check A as E, +T as α, T as β.

E→TEl

E→+TEl|ԑ

T→T*F|F

Check T as A, *F as α, F as β.

T→FTl

Tl→*FTl

So the final productions of the given grammar are:

E→TEl

El→ +TEl|ԑ

T→FTl

Tl→*FTl

F→(E)|id

Left factoring:

It is a method to remove the ambiguity from the CFG.

If the given production is in the below format,

A→αβ1| αβ2| αβ3…….|γ

We have to change it into

A→αAl

Al→ β1| β2| β3……. βn.

Example:

S→iEtSԑ|iEtSes|a

A= S

α= iEtS

β1= ԑ

β2= es

γ= a

So modified productions will be,

S→iEtSSl|a.

Sl→ ԑ|es

FIRST:

The rules for computing the FIRST set are:

• If X is terminal, then FIRST(X) is X.
• If X is a non-terminal and Xà ԑ, then FIRST(X) is ԑ.
• If X is a non-terminal and Xà Y1 Y2 Y3 …..Yk, and if ԑ is there in all of the FIRST(Y) then FIRST(X) is ԑ. Else add FIRST(Y) to FIRST(X).

Example:

E→TEl

El→ +TEl|ԑ

T→FTl

Tl→*FTl

F→(E)

F→ id

Figure out all the terminals and the non-terminals in the given production rules.

{E, El, T, Tl, F,+,*,(,),id}

FIRST(E)→FIRST(TEl)→FIRST(T)→FIRST(FTl)→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (E) is {(,id}.

Since production 1 only has E in its left hand side. We have to consider that only.

FIRST (El) →FIRST (+TEl)→FIRST(+)→ +.

FIRST (El)→FIRST (ԑ)→ԑ

So FIRST (El)→{+,ԑ}

FIRST(T)→FIRST(FTl)→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (T) is {(,id}.

FIRST (Tl)→FIRST(*FTl)→FIRST(*)à*.

FIRST (Tl)→FIRST(ԑ) àԑ

So FIRST (Tl)→{*,ԑ}.

FIRST (F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST(F) is {(,id}.

FIRST(+)→+

FIRST(*)→*

FIRST(()→(

FIRST())→)

FIRST(id)→id

As first of terminal is terminal.

FOLLOW:

The rules for finding the FOLLOW(X) are:

• FOLLOW of root is $. • If there is a production like AàαBβ, then FOLLOW(B)à FIRST(B), follow must only terminals so remove ԑ. • If the production is like AàαB then FOLLOW(B)àFOLLOW(A). • If the result contains ԑ, we have to remove that and instead of that add FOLLOW(A) in FOLLOW(B). Example: E→TEl El→ +TEl|ԑ T→FTl Tl→*FTl F→(E) F→id According to rule1, FOLLOW(root) is$.

So FOLLOW(E)→$. List out all the non-terminals. {E, El , T, Tl ,F} Check the productions for every non terminal it must be present in the right side of the productions. FOLLOW(E), E is present in 5th production in the right side of the production. So consider that, Fà(E), since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β). FOLLOW(E)→FIRST())→) As follow of the root is$, we have to make a union to both of these,

FOLLOW(E)→{),$}. FOLLOW(El), El is present in 1st and 2nd productions, initially consider 1st. E→TEl As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A). FOLLOW(E1)→FOLLOW(E). As we have previously found the value of E, FOLLOW (E1) is {),$}.

Now consider 2nd production.

El→  +TEl

As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A).

FOLLOW(E1)→FOLLOW(E1)

FOLLOW (E1) is {), $}. Now FOLLOW(T), T is present in 1st and 2nd productions, initially consider 1st E→TEl Since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β). FOLLOW(T)→FIRST(E1)→{+,ԑ} we have to remove ԑ from the result, so add FOLLOW of production root. {+,ԑ}-{ԑ}∪FOLLOW(E) {+}∪{$,)}

{+,$,)} And second production, El→+TEl Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β). FOLLOW(T)→FIRST(+)→+; So as doing union for both 1st and 2nd results, FOLLOW(T) is {+,$,)}

Tl is present in 3rd and 4th productions.

Initially consider 3rd production.

T→FTl

As it is in the form of A→αB, FOLLOW (B)→FOLLOW(A).

FOLLOW (T) is {+,$,)} Consider 4th production. Tl→*FTl As it is in the form of AàαB, FOLLOW (B)àFOLLOW(A). FOLLOW (Tl)→ FOLLOW (Tl), FOLLOW (Tl) is {+,$,)}.

Now FOLLOW(F)

F is present in 3rd and 4th productions.

Initially consider 3rd production.

TàFTl

Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β).

FOLLOW(F)→FIRST(Tl)→{*}

Consider 4th production.

Tl→*FTl

since it is in the form like AàαBβ, FOLLOW (B) is FIRST(β).

Predictive parsing under top-down parsing:

It is one of the forms of recursive descent parsing. It constructs the parse tree from the top and reads the input from left to right. It is one of the methods in the top down parsing which never needs the process of backtracking.

Example:

E→TEl

El→ +TEl|ԑ

T→FTl

T→*FTl

F→(E)

F→ id

Initially check whether the given productions contains ambiguity or not if found remove the thing by the process described above.

The above grammar doesn’t have any ambiguity.

So proceed to further steps.

We have to construct the parsing table.

There are some rules to be followed while constructing the parse tree.

Rule1:

For each terminal, a in the FIRST (α) add Aàα to M[A,α]

Rule2:

If ԑ is in first of α and $in the FOLLOW (A) add Aàα to M [A,$].

Initially make a table and list out all the terminals and the non-terminals as below. Consider the first production,

E→TEl

It is in the form of A→α

FIRST(α); FIRST(TEl)→FIRST(T)→{id,)} (as you found previously).

In place of M[E,id],M[E,(] place E→TEl

El→+TEl

FIRST (+TEl)→FIRST(+)à{+}.

El→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (El)→ {$,)}. So M [El,$],M [El,)] place the production.

T→FTl

FIRST (FTl)→FIRST(F)→{id, (}.

In place of M[T,id],M[T,(] place  T→FTl

Tl→*FTl

FIRST (*FTl)→{*}

In place of M [Tl,*] place  Tl→*FTl

Tl→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (Tl)→ {+,$,)}. So M[Tl,$],M[Tl,)], M[Tl,+] place the production.

F→(E)

FIRST((E))→ {(}.

In place of M[F,(] place F→(E).

F→id

FIRST (id)→ id

In place of M[F,id] place F→id.

It the table consists of 2 productions in the same cell. Then it is not an LL (1) grammar. Else it is called as LL (1) grammar.

We have to consider stack implementation for the string acceptance.

Say the string will be id+id*id

We have to dollor and root non terminal to the stack and dollor to the input

We have to see the last symbol in the stack column to the first symbol in the input column.

Example In the above picture, we have to see T to id from the table. And write in the output column.

And push Tl first and then F as it is a stack implementation. 