**TOP-DOWN AND BOTTOM-UP PARSING TECHNIQUES**

**What is parsing?**

The word parsing comes in the phase of syntax analyzer. This phase uses lot of parsing techniques depending upon the priority of obtaining the start symbols and the input symbols. A parser is a compiler or interpreter that breaks the data into smaller chunks depending upon the obtained tokens from the scanner. It builds a data structure in the form of parse tree or abstract syntax tree. Parser will follow the production rules of context free grammar. CFG rules divide the parsing methods into two types.

- Top-down parsing.
- Bottom-up parsing.

**Top-down parsing:**

In this methodology the parser starts the parsing from the start symbol in order to produce the input or to transform the start symbol into input is called as top down parsing.

**Fields where parsers can be used:**

- Web technologies like HTML and XML.
- Query languages like SQL.
- Scripting languages like JAVASCRIPT.
- Network protocols such as HTTP.

**Key-words involved in pars**

- Left most derivation.
- Right most derivation.
- Ambiguity
- Left recursion
- Left factoring.
- FIRST sets.
- FOLLOW sets.

**Derivation:**

Derivation is actually a process of designing the parse tree depending upon the methodology. There are two different types of derivation methodologies for designing of the parse trees.

- Right most derivatives.
- Left most derivative.

**Right most derivatives:**

The production rule is expanded always from the right end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

Lets check for the string ababa using right most derivation.

Consider 1^{st} production as it is more likely to produce the given string.

AàABA

AB( a) (replace last letter( Aàa) because it is right most derivative).

Aba(replace B with Bàb)

ABAba (replace A with AàABA)

Ababa( replace Aàa)

Ababa(replace Bàb)

ababa(replace Aàa)

String is accepted.

**Left-most derivation:**

The production rule is expanded always from the left end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

Let’s check for the string ababa using leftmost derivation.

Consider 1^{st} production as it is more likely to produce the given string.

AàABA

aBA(replace Aàa)

abA(replace Bàb)

abABA(replace AàABA)

abaBA(replace Aàa)

ababA(replace Bàb)

ababa(replace Aàa)

String is accepted.

So, if a grammar accepts both derivative forms, then we can say the grammar as ambiguous grammar. For a parser the grammar must be an unambiguous grammar. To make an ambiguous grammar into an unambiguous grammar. There are methods to be followed based upon the production rules.

**Left recursion:**

AàAα|β

If the given production is in the above-mentioned format, then we can rewrite as,

A→βA^{l}

A^{l }→α A^{l }|ԑ

We have to generalize the rules into the standard format.

E→E+T|T

Check A as E, +T as α, T as β.

E→TE^{l }

E→+TE^{l}|ԑ

T→T*F|F

Check T as A, *F as α, F as β.

T→FT^{l}

T^{l}→*FT^{l}|ԑ

So the final productions of the given grammar are:

E→TE^{l }

E^{l}→ +TE^{l}|ԑ

T→FT^{l}

T^{l}→*FT^{l}|ԑ

F→(E)|id

**Left factoring:**

It is a method to remove the ambiguity from the CFG.

If the given production is in the below format,

A→αβ_{1}| αβ_{2}| αβ_{3}…….|γ

We have to change it into

A→αA^{l}|γ

A^{l}→ β_{1}| β_{2}| β_{3…….} β_{n.}

Example:

S→iEtSԑ|iEtSes|a

A= S

α= iEtS

β_{1}= ԑ

β_{2}= es

γ= a

So modified productions will be,

S→iEtSS^{l}|a.

S^{l}→ ԑ|es

**FIRST:**

The rules for computing the FIRST set are:

- If X is terminal, then FIRST(X) is X.
- If X is a non-terminal and Xà ԑ, then FIRST(X) is ԑ.
- If X is a non-terminal and Xà Y
_{1} Y_{2} Y_{3} …..Y_{k, }and if ԑ is there in all of the FIRST(Y) then FIRST(X) is ԑ. Else add FIRST(Y) to FIRST(X).

Example:

E→TE^{l }

E^{l}→ +TE^{l}|ԑ

T→FT^{l}

T^{l}→*FT^{l}|ԑ

F→(E)

F→ id

Figure out all the terminals and the non-terminals in the given production rules.

{E, E^{l}, T, T^{l}, F,+,*,(,),id}

FIRST(E)→FIRST(TE^{l})→FIRST(T)→FIRST(FT^{l})→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (E) is {(,id}.

Since production 1 only has E in its left hand side. We have to consider that only.

FIRST (E^{l}) →FIRST (+TE^{l})→FIRST(+)→ +.

FIRST (E^{l})→FIRST (ԑ)→ԑ

So FIRST (E^{l})→{+,ԑ}

FIRST(T)→FIRST(FT^{l})→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (T) is {(,id}.

FIRST (T^{l})→FIRST(*FT^{l})→FIRST(*)à*.

FIRST (T^{l})→FIRST(ԑ) àԑ

So FIRST (T^{l})→{*,ԑ}.

FIRST (F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST(F) is {(,id}.

FIRST(+)→+

FIRST(*)→*

FIRST(()→(

FIRST())→)

FIRST(id)→id

As first of terminal is terminal.

**FOLLOW:**

The rules for finding the FOLLOW(X) are:

- FOLLOW of root is $.
- If there is a production like AàαBβ, then FOLLOW(B)à FIRST(B), follow must only terminals so remove ԑ.
- If the production is like AàαB then FOLLOW(B)àFOLLOW(A).
- If the result contains ԑ, we have to remove that and instead of that add FOLLOW(A) in FOLLOW(B).

Example:

E→TE^{l }

E^{l}→ +TE^{l}|ԑ

T→FT^{l}

T^{l}→*FT^{l}|ԑ

F→(E)

F→id

According to rule1, FOLLOW(root) is $.

So FOLLOW(E)→$.

List out all the non-terminals.

{E, E^{l} , T, T^{l} ,F}

Check the productions for every non terminal it must be present in the right side of the productions.

FOLLOW(E), E is present in 5^{th} production in the right side of the production.

So consider that,

Fà(E), since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β).

FOLLOW(E)→FIRST())→)

As follow of the root is $, we have to make a union to both of these,

FOLLOW(E)→{),$}.

FOLLOW(E^{l}), E^{l} is present in 1^{st} and 2^{nd} productions, initially consider 1^{st}.

E→TE^{l }

As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A).

FOLLOW(E^{1})→FOLLOW(E).

As we have previously found the value of E,

FOLLOW (E^{1}) is {),$}.

Now consider 2^{nd} production.

E^{l}→ +TE^{l}

As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A).

FOLLOW(E^{1})→FOLLOW(E^{1})

FOLLOW (E^{1}) is {), $}.

Now FOLLOW(T), T is present in 1^{st} and 2^{nd} productions, initially consider 1^{st}

E→TE^{l }

Since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β).

FOLLOW(T)→FIRST(E^{1})→{+,ԑ} we have to remove ԑ from the result, so add FOLLOW of production root.

{+,ԑ}-{ԑ}∪FOLLOW(E)

{+}∪{$,)}

{+,$,)}

And second production, E^{l}→+TE^{l}

Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β).

FOLLOW(T)→FIRST(+)→+;

So as doing union for both 1^{st} and 2^{nd} results,

FOLLOW(T) is {+,$,)}

Now FOLLOW (T^{l}),

T^{l} is present in 3^{rd} and 4^{th} productions.

Initially consider 3^{rd} production.

T→FT^{l}

As it is in the form of A→αB, FOLLOW (B)→FOLLOW(A).

FOLLOW (T^{l})→ FOLLOW (T),

FOLLOW (T) is {+,$,)}

Consider 4^{th} production.

T^{l}→*FT^{l}

As it is in the form of AàαB, FOLLOW (B)àFOLLOW(A).

FOLLOW (T^{l})→ FOLLOW (T^{l}),

FOLLOW (T^{l}) is {+,$,)}.

Now FOLLOW(F)

F is present in 3^{rd} and 4^{th} productions.

Initially consider 3^{rd} production.

TàFT^{l}

Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β).

FOLLOW(F)→FIRST(T^{l})→{*}

Consider 4^{th} production.

T^{l}→*FT^{l}

since it is in the form like AàαBβ, FOLLOW (B) is FIRST(β).

FOLLOW (F)→FIRST(*)→{*}

FOLLOW (F) will be {*}.

**Predictive parsing under top-down parsing:**

It is one of the forms of recursive descent parsing. It constructs the parse tree from the top and reads the input from left to right. It is one of the methods in the top down parsing which never needs the process of backtracking.

Example:

E→TE^{l }

E^{l}→ +TE^{l}|ԑ

T→FT^{l}

T→*FT^{l}|ԑ

F→(E)

F→ id

Initially check whether the given productions contains ambiguity or not if found remove the thing by the process described above.

The above grammar doesn’t have any ambiguity.

So proceed to further steps.

We have to construct the parsing table.

There are some rules to be followed while constructing the parse tree.

**Rule1:**

For each terminal, a in the FIRST (α) add Aàα to M[A,α]

**Rule2:**

If ԑ is in first of α and $ in the FOLLOW (A) add Aàα to M [A,$].

Initially make a table and list out all the terminals and the non-terminals as below.

Consider the first production,

E→TE^{l }

It is in the form of A→α

FIRST(α); FIRST(TE^{l})→FIRST(T)→{id,)} (as you found previously).

In place of M[E,id],M[E,(] place E→TE^{l }

E^{l}→+TE^{l}

FIRST (+TE^{l})→FIRST(+)à{+}.

E^{l}→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (E^{l})→ {$,)}.

So M [E^{l},$],M [E^{l},)] place the production.

T→FT^{l}

FIRST (FT^{l})→FIRST(F)→{id, (}.

In place of M[T,id],M[T,(] place T→FT^{l}

T^{l}→*FT^{l}

FIRST (*FT^{l})→{*}

In place of M [T^{l},*] place T^{l}→*FT^{l}

T^{l}→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (T^{l})→ {+,$,)}.

So M[T^{l},$],M[T^{l},)], M[T^{l},+] place the production.

F→(E)

FIRST((E))→ {(}.

In place of M[F,(] place F→(E).

F→id

FIRST (id)→ id

In place of M[F,id] place F→id.

It the table consists of 2 productions in the same cell. Then it is not an LL (1) grammar. Else it is called as LL (1) grammar.

We have to consider stack implementation for the string acceptance.

Say the string will be id+id*id

We have to dollor and root non terminal to the stack and dollor to the input

We have to see the last symbol in the stack column to the first symbol in the input column.

Example

In the above picture, we have to see T to id from the table. And write in the output column.

And push T^{l} first and then F as it is a stack implementation.