Basics of TOC

Symbol: These are the basic building blocks of a language. A symbol can be any  letter, symbol, etc. Example: a, b, c,  A, C, 0, 1, #, %.


Alphabet: A finite and non empty set of symbols is called an alphabet. We denote an alphabet by the greek letter sigma (Σ). Example: {0}, {a,b}, {0,1}, {0,1,2}, {a,bc,ef}, {ab,bc}. A point to be noted, {0, 01} is not a valid alphabet as 0  is part of 01.


String: A finite sequence of symbols from an alphabet is called String. For example : Σ={0,1}, Then 01001 is a string over the alphabet Σ. We usually use u, v, w.... to represent strings.


String concatenation: Concatenating string w to string v is appending symbols of v to symbols of w in a continuation. Example : w= abc, v= efg, then wv=abcefg.


String Reversal: If we have a string represented by w, then we usually represent its reverse by wR. It is simply writing the symbols of a string in the reverse order. Example : w=abcd, then wR=dcba.


Length of string: Length of string is denoted by |w|, where w representation here for any string. It tells us number of symbols the string contains. To be noted, ε is an empty string and  |ε|=0. Similarly, |abc|=3.


Substring: A string of consective symbols of any string is called a substring of that string.


Example : w=abc, then possible substring/substrings of w of


Length 0 is: ε

Length 1 are: a, b, c

Length 2 are: ab, bc

Length 3 is: abc


Subsequence: A string of consective or non consecutive symbols of any string following the order of the original string is called a subsequence of that string.


Example : w=abc, then possible subsequence/subsequences of w of


Length 0 is: ε

Length 1 are: a, b, c

Length 2 are: ab, bc, ac

Length 3 is: abc


Prefix and Suffix: Let us have w= abbab.

All the possible Prefixes(w)={ε, a, ab, abb, abba, abbab}

All the possible Suffixes(w)={ε, b, ab, bab, bbab, abbab}


Finding a pattern, you can see that Prefixes and suffixes, all are strings.


Points to be noted about empty string:


1. Some books use λ(lamda), whereas some use ε(episilon) to represent empty string.

2. {} or {ε} is not a valid alphabet.

3. εw=wε=w (Concatenate any string with ε, and you will result in the string itself).

4.  In simple words, ε means 'nothing'.

5. ε is prefix and suffix of every string.

6. ε is not a valid symbol.


An observation for alphabet with no overlapping symbols: Let us say we have an alphabet where no symbolsare overlapping, say {a,b}.

1. No. of String of length 2 is 22 ​=4( aa, ab, ba, bb)

2. No. of the string of length 3 is 23=8( aaa, aab, aba, abb, baa, bab, bba, bbb)

So the formula used here is |w|n 

                   |w|= no of the alphabet, n= length of string you need.


Language: It is a set of strings. We have two types of languages : Finite(having finite number of strings) and Infinite(having infinite number of strings).

let us say  Σ={a,b}. Let us define any two languages, say L1 and L2 as below:

L1=set of a strings of length 2 ={aa,ab,ba,bb}

L2=set of strings of which starts with a ={a,ab,abba, aaba,......}


Here L1 is a finite language and L2 is an infinite language.


Lexicographic Ordering: Writing of strings in a dictionary order such that the shorter strings precede the longer strings. Example: Σ={0,1}. The lexicographic ordering of language containing all the possible strings generated by the alphabet Σ={ε, 0, 1, 00, 01, 10, 11, 000, 001, ......}


** References : 

An Introduction to Formal Languages and Automata by Peter Linz: section 1.2, Three basic concepts

Introduction To The Theory Of Computation by Michael Sipser: page 13, Strings and Languages