CLR(1) and LALR(1) Parsers Notes

Introduction:

  • They are bottom-up parsers, and they use the reverse of the rightmost derivation.
  • They use LR(1) canonical item.

LR(1) canonical items=  LR(0) itenms + look ahead.

Purpose of look ahead: Reduce moves are going to place only in lookahead symbol.

Difference between CLR(1) and LALR(1):-

  • No. of states in CLR(1) is greater than or equal to LALR(1) parser.
  • If grammar is CLR(1), it may or may not be LALR(1), because conflict may arise when you will merge the states for constructing LALR(1) table.
  • If grammar is not CLR(1), it will not be LALR(1), because already conflict happened in CLR(1) so the conflict will always present in LALR(1).
  • If there is no SR conflict in CLR(1), there will be no SR conflict in LALR(1).
  • If there is no RR conflict in CLR(1), we can't say about RR conflict in LALR(1) because there is a chance that when we merge the states, RR conflict may arise.

Some Key Points:

1. The no of states in various parser varies like this:

 CLR(1) >= ( LALR(1)=  SLR(1)  =  LR(0) )

2. The power of Parsers varies like this:

 CLR(1) > LALR(1) > SLR(1) > LR(0)

3. YAcc is a compiler that is used for LALR(1).

4. While constructing the parsing table for various parsers GOTO entry and Shift entries are identical, the only difference arises in reducing entries and error entries.

 

 

Contributor's Info

Created:
0Comment
Handles in Bottom- Up Parsing

Handle:

A substring α of the tree’s upper frontier that matches some production A → α where reducing α to A is one step in the reverse of a rightmost derivation. We call such a string a handle.

Formally, 

a handle of a right-sentential form γ is a production A → β and a position in γ where β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ i.e.,

if S ⇒∗ rm αAw ⇒rm αβw  then A → β  in the position following α is a handle of αβw
Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols.

This all is the formal definition.

 

Informally you can understand handle like this:

for a given production, if a string is given to reduce, you start replacing the string with the right-hand side of production, the item that is removed or replaced is called handle. Reduce the given string until you get the start symbol.

example:

1. S -> aABe

    A->  Abc /b

    B-> d

The string is abbcde find the handles.

Solution:-

abbcde      

aAbcde   // after replacing the second 'b'  with 'A' from left So b is one of handle

aAde        // after replacing 'Abc' with A So Abc is the second handle.

aABe        //  after replacing 'd' with 'B' So, 'd' is third handle.

S              // after replacing 'aABe' with 'S', So 'aABe' is fourth handle.

 

 

2. E -> E+n / E * n / n

find the handles for n+n*n

Solution:   

n+n*n 

E+n*n    // replace first n with E so n is first handle.

E*n        // replace E+n with E so E+n  is the second handle.

E            // replace E*n with E so E*n is third handle.

So handles are n, E+n , E*n 

Contributor's Info

Created:
0Comment
LR(0) and SLR(1) Points
  • It is LR(0) because we don't care what is the next symbol.
  • Blank spaces always contain an error.
  • The only difference between LR(0) and SLR(1) is in the placement of reducing moves.
  • The action and goto entries are the same for both.
  • SLR(1) is more powerful than LR(0).

Is every grammar is LR(0)??

1. S-R conflict:-

let us say you at some random state Iwe have something like this :

 

So at state I we have two moves A-->d. and there is a shift move with terminal c, and since it is LR(0), then reduce move has to put throughout the row, so it will end up with two entries in a particular row at the same time. This is called an S-R conflict.

2. R-R conflict:-

let us say at state I11, we have two reduce move, again this is a violation of the rule that says that at a particular instance, we can have at most one entry for one field of a row.

=>

  • If grammar is LR(0), it will definitely SLR(1).
  • If grammar is not LR(0), it may or may not be SLR(1).
  • You need not construct a full parsing table to detect conflicts, just look at the state that contains the final item.
  • Any left recursive grammar is not LL(1).

 

Contributor's Info

Created:
0Comment
LR Parsers

They can be used to parse a large class of CFG and sometimes they are referred  as LR(k) parsing.

  • L: left-to-right scanning of the input.
  • R: Construction of the rightmost derivation in reverse.
  • k: Number of input symbols of lookahead.

 LR parsing is attractive for a variety of reasons:

  • LR parsers can be constructed to recognize virtually all programming language constructs for which CFG can be written.
  • The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
  • The LR parsing method can be implemented efficiently.

  Drawback:-

  • Too much complicated to construct an LR parser by hand for a typical programming language grammar.
  • One needs a specialized tool - an LR parser generator (as Yacc)

An LR parser consists of:

  • An input
  •  An output
  •  A stack
  •  A driver program
  •  A parsing table that has two parts: action and goto

 

 

Contributor's Info

Created:
0Comment
Bottom Up Parsing Introduction

Bottom-up: Start from the leaves and work up to the root We have to decide when rule A → β should be applied and we should find neighbour nodes labelled by symbols in β Reducing rule A → β results in adding a node A to the tree. A’s children are the nodes labelled by the symbols in β.

 

  • Constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
  • Reduction of a string w to the start symbol of a grammar.
  • At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production. 
  • If the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse

Handles:

A handle of a string is a substring that matches the right side of a production and whose reduction to the nonterminal on the left side of the production represents one step along with the reverse of a rightmost derivation.

Grammar:

S → aABe     A → Abc | b      B → d

The sentence abbcde can be reduced as follows:

  • abbcde
  • aAbcde
  • aAde
  • aABe
  • S

 

Contributor's Info

Created:
0Comment
Bottom up Parsing

Bottom-up parsing:

In this methodology, it follows the process which is exactly controversy to the top-down parsing. It starts with the input symbol first and determines the start symbol.

Bottom-up parsing:

  • SLR
  • CLR
  • LALR

SLR:

It is one type of parsing technique in bottom-up parsing. It follows two functions to find the parse tree.

  • Closure
  • Goto

Goto:

The goto functions act as the one-step movement function. We use (.) operator in SLR. Goto function pushes the dot after the specified symbol.

Example:

S1→.S is the production.

If we specify goto (S1,S) the output will be S1àS.

Closure:

For suppose if there are productions like

Closure (I0,S) gives the production of non-terminal after the dot operator.

 S1→.S

                                     S→.CC

                                     C→.ac

                                     C→.d

For example:

S→CC

C→aC

C→d

To solve a problem, SLR follows a methodology like finding the kernel items. Kernel items are nothing but adding the dot operator before every symbol in the starting of the right hand side and adding a extra production i.e. complementary of the root gives rise to root. Converting the productions into the kernel items is as follows.

In next step we have to find the closure of newly added production.

Closure (S1→.S)

We have to take all the dot operators to the end of the production in every production.

To do this we have to do closure and goto operator.

If a set or certain productions are repeated denote with same I value.

We have to terminate the process once all the productions have the dot operator at the ending.

Flow graph will be

 

Parse table: parse table consists of action and go parts. All the terminals will be in action phase and all the non-terminals will be in go phase. All the I values obtained in production rules while finding the closure productions.

States

a

d

$

S

C

0

S3

S4

 

1

2

1

 

 

 

accept

 

2

S3

S4

 

 

5

3

S3

S4

 

 

6

4

R3

R3

R3

 

 

5

 

 

R1

 

 

6

R2

R2

R2

 

 

I0 on S gives I1

S is a non-terminal so write only state number in (0 ,S) field.

I0 on C gives I2

C is a non-terminal so write only state number in (0 ,C) field.

I0 on a gives I3

a is a terminal so write only state number along with shift(s) in (0 ,a) field.

I0 on d gives I4

d is a terminal so write  state number along with shift in (0 ,d) field.

I2 on C gives I5

C is a non-terminal so write only state number in (2 ,C) field.

I2 on a gives I3

a is a terminal so write  state number along with the shift in (2 ,a) field.

I2 on d gives I4

d is a terminal so write  state number along with the shift in (2 ,d) field.

I3 on C gives I6

C is a non-terminal so write  only state number  in (3 ,C) field.

I3 on a gives I3

a is a terminal so write  state number along with the shift in (3 ,a) field.

I3 on d gives I4

d is a terminal so write  state number along with the shift in (3 ,d) field.

up to now there are productions with which we can fill shift and we have to fill remaining fields by some process given below.

We have to find the first and follow of the productions to fill the remaining fields.

S→CC

C→aC

C→d

FIRST(S)→FIRST(C)→{a,d}

FIRST(C)→{a,d}

FIRST (a)→{a}

FIRST (d)→{d}

FOLLOW(S)→$

FOLLOW(C)→FIRST(C)→{a,d}

FOLLOW(C)→FOLLOW(S)→ {a,d,$}

FOLLOW(C)→{a,d}

We have filled the table up to 3 so consider I4 as it is final production coz dot is in the last position.

C→d.

FOLLOW (root)àFOLLOW(C)→ {a,d,$}

So in place of (4,a), (4,d), (4,$) write R3( as it is the 3rd production in the problem statement).

I5→ S→CC

FOLLOW(S)→{$}

In place of (5,$)  mark R1 as it is 1st production.

I6→C→aC

FOLLOW(C)→{a,d,$}

In place of (6,a), (6,d), (6,$) mark R2 as it is in 2nd production.

CLR:

Canonical left to right scanning and rightmost derivative in reverse order

In clr the general form of an item is A→α.Bβ,a

A→αβ is a production

a is a terminal or write marker

The component is called as the look ahead item.

Example:

S→CC

C→aC

C→d

Construct an augmented grammar from the given problem solution.

S1→.S

S→.CC

C→.aC

C→.d

Look ahead symbol of initial production will be $.

S1→.S,$

 

The following picture describes about the flow graph.

The parse table can be filled in an easy manner unlike the SLR parsing. There is no need to find the FIRST and FOLLOW sets.

Item set no

a

d

$

S

C

0

S3

S4

 

1

2

1

 

 

Accept

 

 

2

S6

S7

 

 

5

3

S3

S4

 

 

8

4

R3

R3

 

 

 

5

8

 

R1

 

 

6

S6

S7

 

 

9

7

 

 

R3

 

 

8

R2

R2

 

 

 

9

 

 

R2

 

 

I0 on S gives I1

S is a non-terminal so write only state number in (0 ,S) field.

I0 on C gives I2

C is a non-terminal so write only state number in (0 ,C) field.

I0 on a gives I3

a is a terminal so write  state number along with shift in (0 ,a) field.

I0 on d gives I4

d is a terminal so write  state number along with shift in (0 ,d) field.

I2 on C gives I5

C is a non-terminal so write only state number in (2 ,C) field.

I2 on a gives I6

a is a terminal so write state number along with shift in (2 ,a) field.

I2 on d gives I7

d is a terminal so write  state number along with the shift in (2 ,d) field.

I0 on S gives I1

S is a non-terminal so write only state number in (0 ,S) field.

I3 on C gives I8

C is a non-terminal so write only state number in (3 ,C) field.

I3 on a gives I3

a is a terminal so write  state number along with shift in (3 ,a) field.

I3 on d gives I4

d is a terminal so write  state number along with shift in (3 ,d) field.

I6 on C gives I9

C is a non-terminal so write only state number in (6 ,C) field.

I6 on a gives I6

a is a terminal so write  state number along with shift in (6 ,a) field.

I6 on d gives I7

d is a terminal so write state number along with shift in (6 ,d) field.

 S→CC is present in I5; look ahead is $ so in place (5,$) mark R1. As it is the 1st production.

C→aC is present in I8,I9; look ahead symbols are (a/d) and $ respectively place (8,a);(8,d) mark R2. As it is 2nd production.

C→d is present in I4, I7 are (a/d) and $ respectively place (4,a) (4,d) , (7,$). R3. As it is 3rd production.

 LALR:

It is the same parsing technique as like CLR but with a change of combining the item sets which are having the first term common without concentrating the lookahead symbol.

Example:

S→CC

C→aC

C→d

The process is as like

We found I3 and I6 found same but with different look-ahead symbols, so we have to club those into I36.

We found I4 and I7 found same but with different look-ahead symbols, so we have to club those into I47

 

 

We found I8 and I9 found same but with different look-ahead symbols, so we have to club those into I89

 

 

Item set no

a

d

$

S

C

0

S3

S4

 

1

2

1

 

 

Accept

 

 

2

S6

S7

 

 

5

36

S36

S47

 

 

89

47

R3

R3

 

 

 

5

8

 

R1

 

 

89

R2

R2

R2

 

 

The process is same as CLR to fill the parse table.

Conclusion:

An SLR parser generator creates an LR(0) state machine and computes the look-ahead from the grammar (This is a simplified approach and may report conflicts that do not really exist in the LR(0) state machine. But Canonical LR parser generator computes an LR (1) state machine and the look-heads are already part of the LR (1) state machine. These parser tables can be very large. An LALR parser generator creates an LR (0) state machine and computes the look-ahead from the LR (0) state machine (via the terminal transitions). This is a correct approach, but occasionally reports conflicts that would not exist in an LR(1) state machine.

 

 

 

 

 

 

 

 

 

 

Contributor's Info

Created: Edited:
0Comment