Top Down Parsing

TOP-DOWN AND BOTTOM-UP PARSING TECHNIQUES

What is parsing?

The word parsing comes in the phase of syntax analyzer. This phase uses lot of parsing techniques depending upon the priority of obtaining the start symbols and the input symbols. A parser is a compiler or interpreter that breaks the data into smaller chunks depending upon the obtained tokens from the scanner. It builds a data structure in the form of parse tree or abstract syntax tree. Parser will follow the production rules of context free grammar. CFG rules divide the parsing methods into two types.

  • Top-down parsing.
  • Bottom-up parsing.

Top-down parsing:

In this methodology the parser starts the parsing from the start symbol in order to produce the input or to transform the start symbol into input is called as top down parsing.

Fields where parsers can be used:

  • Web technologies like HTML and XML.
  • Query languages like SQL.
  • Scripting languages like JAVASCRIPT.
  • Network protocols such as HTTP.

Key-words involved in pars

  •  Left most derivation.
  • Right most derivation.
  • Ambiguity
  • Left recursion
  • Left factoring.
  • FIRST sets.
  • FOLLOW sets.

Derivation:
Derivation is actually a process of designing the parse tree depending upon the methodology. There are two different types of derivation methodologies for designing of the parse trees.

  • Right most derivatives.
  • Left most derivative.

Right most derivatives:

The production rule is expanded always from the right end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

 Lets check for the string ababa using right most derivation.

Consider 1st production as it is more likely to produce the given string.

AàABA

   AB( a) (replace last letter( Aàa) because it is right most derivative).

   Aba(replace B with Bàb)

   ABAba (replace A with AàABA)

   Ababa( replace Aàa)

   Ababa(replace Bàb)

   ababa(replace Aàa)

String is accepted.

Left-most derivation:

The production rule is expanded always from the left end and verifies whether the string is accepted or not.

Example:

A→ABA

A→a

B→b

 Let’s check for the string ababa using leftmost derivation.

Consider 1st production as it is more likely to produce the given string.

AàABA

   aBA(replace Aàa)

   abA(replace Bàb)

   abABA(replace AàABA)

   abaBA(replace Aàa)

   ababA(replace Bàb)

   ababa(replace Aàa)

String is accepted.

So, if a grammar accepts both derivative forms, then we can say the grammar as ambiguous grammar. For a parser the grammar must be an unambiguous grammar. To make an ambiguous grammar into an unambiguous grammar. There are methods to be followed based upon the production rules.

Left recursion:

AàAα|β

If the given production is in the above-mentioned format, then we can rewrite as,

A→βAl

Al →α Al

 

We have to generalize the rules into the standard format.

E→E+T|T

Check A as E, +T as α, T as β.

E→TEl

E→+TEl|ԑ    

 

T→T*F|F

Check T as A, *F as α, F as β.

T→FTl

Tl→*FTl

So the final productions of the given grammar are:

E→TEl

El→ +TEl|ԑ    

T→FTl

Tl→*FTl

F→(E)|id

Left factoring:

It is a method to remove the ambiguity from the CFG.

If the given production is in the below format,

A→αβ1| αβ2| αβ3…….|γ

We have to change it into

A→αAl

Al→ β1| β2| β3……. βn.

Example:

S→iEtSԑ|iEtSes|a

A= S

α= iEtS

β1= ԑ

β2= es

γ= a

So modified productions will be,

S→iEtSSl|a.

Sl→ ԑ|es

FIRST:

The rules for computing the FIRST set are:

  • If X is terminal, then FIRST(X) is X.
  • If X is a non-terminal and Xà ԑ, then FIRST(X) is ԑ.
  • If X is a non-terminal and Xà Y1 Y2 Y3 …..Yk, and if ԑ is there in all of the FIRST(Y) then FIRST(X) is ԑ. Else add FIRST(Y) to FIRST(X).

Example:

E→TEl

El→ +TEl|ԑ    

T→FTl

Tl→*FTl

F→(E)

F→ id

Figure out all the terminals and the non-terminals in the given production rules.

{E, El, T, Tl, F,+,*,(,),id}

FIRST(E)→FIRST(TEl)→FIRST(T)→FIRST(FTl)→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (E) is {(,id}.

Since production 1 only has E in its left hand side. We have to consider that only.

FIRST (El) →FIRST (+TEl)→FIRST(+)→ +.

FIRST (El)→FIRST (ԑ)→ԑ

So FIRST (El)→{+,ԑ}

FIRST(T)→FIRST(FTl)→FIRST(F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST (T) is {(,id}.

FIRST (Tl)→FIRST(*FTl)→FIRST(*)à*.

FIRST (Tl)→FIRST(ԑ) àԑ

So FIRST (Tl)→{*,ԑ}.

FIRST (F)→FIRST((E))→{(}or FIRST(id)→{id}.

So FIRST(F) is {(,id}.

FIRST(+)→+

FIRST(*)→*

FIRST(()→(

FIRST())→)

FIRST(id)→id

As first of terminal is terminal.

FOLLOW:

The rules for finding the FOLLOW(X) are:

  • FOLLOW of root is $.
  • If there is a production like AàαBβ, then FOLLOW(B)à FIRST(B), follow must only terminals so remove ԑ.
  • If the production is like AàαB then FOLLOW(B)àFOLLOW(A).
  • If the result contains ԑ, we have to remove that and instead of that add FOLLOW(A) in FOLLOW(B).

Example:

E→TEl

El→ +TEl|ԑ    

T→FTl

Tl→*FTl

F→(E)

F→id

According to rule1, FOLLOW(root) is $.

So FOLLOW(E)→$.

List out all the non-terminals.

{E, El , T, Tl ,F}

Check the productions for every non terminal it must be present in the right side of the productions.

FOLLOW(E), E is present in 5th production in the right side of the production.

So consider that,

Fà(E), since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β).

FOLLOW(E)→FIRST())→)

As follow of the root is $, we have to make a union to both of these,

FOLLOW(E)→{),$}.

FOLLOW(El), El is present in 1st and 2nd productions, initially consider 1st.

E→TEl

As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A).

FOLLOW(E1)→FOLLOW(E).

As we have previously found the value of E,

FOLLOW (E1) is {),$}.

Now consider 2nd production.

El→  +TEl

As it is in the form of AàαB, FOLLOW(B)àFOLLOW(A).

FOLLOW(E1)→FOLLOW(E1)

FOLLOW (E1) is {), $}.

Now FOLLOW(T), T is present in 1st and 2nd productions, initially consider 1st

E→TEl

Since it is in the form like A→αBβ, FOLLOW (B) is FIRST(β).

FOLLOW(T)→FIRST(E1)→{+,ԑ} we have to remove ԑ from the result, so add FOLLOW of production root.

{+,ԑ}-{ԑ}∪FOLLOW(E)

{+}∪{$,)}

{+,$,)}

And second production, El→+TEl

Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β).

FOLLOW(T)→FIRST(+)→+;

So as doing union for both 1st and 2nd results,

FOLLOW(T) is {+,$,)}

Now FOLLOW (Tl),

Tl is present in 3rd and 4th productions.

Initially consider 3rd production.

T→FTl

As it is in the form of A→αB, FOLLOW (B)→FOLLOW(A).

FOLLOW (Tl)→ FOLLOW (T),

FOLLOW (T) is {+,$,)}

Consider 4th  production.

Tl→*FTl

As it is in the form of AàαB, FOLLOW (B)àFOLLOW(A).

FOLLOW (Tl)→ FOLLOW (Tl),

FOLLOW (Tl) is {+,$,)}.

Now FOLLOW(F)

F is present in 3rd and 4th productions.

Initially consider 3rd production.

TàFTl

Since it is in the form like AàαBβ, FOLLOW (B) is FIRST (β).

FOLLOW(F)→FIRST(Tl)→{*}

Consider 4th production.

Tl→*FTl

since it is in the form like AàαBβ, FOLLOW (B) is FIRST(β).

FOLLOW (F)→FIRST(*)→{*}

FOLLOW (F) will be {*}.

 

Predictive parsing under top-down parsing:

It is one of the forms of recursive descent parsing. It constructs the parse tree from the top and reads the input from left to right. It is one of the methods in the top down parsing which never needs the process of backtracking.

Example:

E→TEl

El→ +TEl|ԑ    

T→FTl

T→*FTl

F→(E)

F→ id

Initially check whether the given productions contains ambiguity or not if found remove the thing by the process described above.

The above grammar doesn’t have any ambiguity.

So proceed to further steps.

We have to construct the parsing table.

There are some rules to be followed while constructing the parse tree.

Rule1:

For each terminal, a in the FIRST (α) add Aàα to M[A,α]

Rule2:

If ԑ is in first of α and $ in the FOLLOW (A) add Aàα to M [A,$].

Initially make a table and list out all the terminals and the non-terminals as below.

 

Consider the first production,

E→TEl

It is in the form of A→α

FIRST(α); FIRST(TEl)→FIRST(T)→{id,)} (as you found previously).

In place of M[E,id],M[E,(] place E→TEl

El→+TEl

FIRST (+TEl)→FIRST(+)à{+}.

El→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (El)→ {$,)}.

So M [El,$],M [El,)] place the production.

T→FTl

FIRST (FTl)→FIRST(F)→{id, (}.

In place of M[T,id],M[T,(] place  T→FTl

Tl→*FTl

FIRST (*FTl)→{*}

In place of M [Tl,*] place  Tl→*FTl

Tl→ԑ

FIRST (ԑ); if the result consists of ԑ go to rule 2.

FOLLOW (Tl)→ {+,$,)}.

So M[Tl,$],M[Tl,)], M[Tl,+] place the production.

F→(E)

FIRST((E))→ {(}.

In place of M[F,(] place F→(E).

F→id

FIRST (id)→ id

In place of M[F,id] place F→id.

It the table consists of 2 productions in the same cell. Then it is not an LL (1) grammar. Else it is called as LL (1) grammar.

We have to consider stack implementation for the string acceptance.

Say the string will be id+id*id

We have to dollor and root non terminal to the stack and dollor to the input

We have to see the last symbol in the stack column to the first symbol in the input column.

Example

 

 

 

 

In the above picture, we have to see T to id from the table. And write in the output column.

And push Tl first and then F as it is a stack implementation.

 

Contributor's Info

Created: Edited:
0Comment