##### Basics of TOC

**Symbol:** These are the basic building blocks of a language. A symbol can be any letter, symbol, etc. Example: a, b, c, A, C, 0, 1, #, %.

**Alphabet:** A finite and non empty set of symbols is called an alphabet. We denote an alphabet by the greek letter sigma (Σ). Example: {0}, {a,b}, {0,1}, {0,1,2}, {a,bc,ef}, {ab,bc}. A point to be noted, {0, 01} is not a valid alphabet as 0 is part of 01.

**String:** A finite sequence of symbols from an alphabet is called String. For example : Σ={0,1}, Then 01001 is a string over the alphabet Σ. We usually use u, v, w.... to represent strings.

**String concatenation:** Concatenating string w to string v is appending symbols of v to symbols of w in a continuation. Example : w= abc, v= efg, then wv=abcefg.

**String Reversal:** If we have a string represented by w, then we usually represent its reverse by w^{R}. It is simply writing the symbols of a string in the reverse order. Example : w=abcd, then w^{R}=dcba.

**Length of string: **Length of string** **is denoted by |w|, where w representation here for any string. It tells us number of symbols the string contains. To be noted, ε is an empty string and |ε|=0. Similarly, |abc|=3.

**Substring: **A string of consective symbols of any string is called a substring of that string.

Example : w=abc, then possible substring/substrings of w of

Length 0 is: ε

Length 1 are: a, b, c

Length 2 are: ab, bc

Length 3 is: abc

**Subsequence: **A string of consective or non consecutive symbols of any string following the order of the original string is called a subsequence of that string.

Example : w=abc, then possible subsequence/subsequences of w of

Length 0 is: ε

Length 1 are: a, b, c

Length 2 are: ab, bc, ac

Length 3 is: abc

**Prefix and Suffix: **Let us have w= abbab.

All the possible Prefixes(w)={ε, a, ab, abb, abba, abbab}

All the possible Suffixes(w)={ε, b, ab, bab, bbab, abbab}

Finding a pattern, you can see that Prefixes and suffixes, all are strings.

**Points to be noted about empty string:**

1. Some books use** λ**(lamda), whereas some use **ε**(episilon) to represent empty string.

2. {} or {ε} is not a valid alphabet.

3. εw=wε=w (Concatenate any string with ε, and you will result in the string itself).

4. In simple words, ε means 'nothing'.

5. ε is prefix and suffix of every string.

6. ε is not a valid symbol.

**An observation for alphabet with no overlapping symbols:** Let us say we have an alphabet where no symbolsare overlapping, say {a,b}.

1. No. of String of length 2 is 2^{2 }=4( aa, ab, ba, bb)

2. No. of the string of length 3 is 2^{3}=8( aaa, aab, aba, abb, baa, bab, bba, bbb)

So the formula used here is **|w|**^{n }

|w|= no of the alphabet, n= length of string you need.

**Language:** It is a set of strings. We have two types of languages : Finite(having finite number of strings) and Infinite(having infinite number of strings).

let us say Σ={a,b}. Let us define any two languages, say L1 and L2 as below:

**L1=set of a strings of length 2 ={aa,ab,ba,bb}**

**L2=set of strings of which starts with a ={a,ab,abba, aaba,......}**

Here __L1 is a finite language and L2 is an infinite language.__

**Lexicographic Ordering: **Writing of strings in a dictionary order such that the shorter strings precede the longer strings. Example: Σ={0,1}. The lexicographic ordering of language containing all the possible strings generated by the alphabet Σ={ε, 0, 1, 00, 01, 10, 11, 000, 001, ......}

** References :

An Introduction to Formal Languages and Automata by Peter Linz: section 1.2, Three basic concepts

Introduction To The Theory Of Computation by Michael Sipser: page 13, Strings and Languages

**