Basics of TOC
Symbol: These are the basic building blocks of a language. A symbol can be any letter, symbol, etc. Example: a, b, c, A, C, 0, 1, #, %.
Alphabet: A finite and non empty set of symbols is called an alphabet. We denote an alphabet by the greek letter sigma (Σ). Example: {0}, {a,b}, {0,1}, {0,1,2}, {a,bc,ef}, {ab,bc}. A point to be noted, {0, 01} is not a valid alphabet as 0 is part of 01.
String: A finite sequence of symbols from an alphabet is called String. For example : Σ={0,1}, Then 01001 is a string over the alphabet Σ. We usually use u, v, w.... to represent strings.
String concatenation: Concatenating string w to string v is appending symbols of v to symbols of w in a continuation. Example : w= abc, v= efg, then wv=abcefg.
String Reversal: If we have a string represented by w, then we usually represent its reverse by wR. It is simply writing the symbols of a string in the reverse order. Example : w=abcd, then wR=dcba.
Length of string: Length of string is denoted by |w|, where w representation here for any string. It tells us number of symbols the string contains. To be noted, ε is an empty string and |ε|=0. Similarly, |abc|=3.
Substring: A string of consective symbols of any string is called a substring of that string.
Example : w=abc, then possible substring/substrings of w of
Length 0 is: ε
Length 1 are: a, b, c
Length 2 are: ab, bc
Length 3 is: abc
Subsequence: A string of consective or non consecutive symbols of any string following the order of the original string is called a subsequence of that string.
Example : w=abc, then possible subsequence/subsequences of w of
Length 0 is: ε
Length 1 are: a, b, c
Length 2 are: ab, bc, ac
Length 3 is: abc
Prefix and Suffix: Let us have w= abbab.
All the possible Prefixes(w)={ε, a, ab, abb, abba, abbab}
All the possible Suffixes(w)={ε, b, ab, bab, bbab, abbab}
Finding a pattern, you can see that Prefixes and suffixes, all are strings.
Points to be noted about empty string:
1. Some books use λ(lamda), whereas some use ε(episilon) to represent empty string.
2. {} or {ε} is not a valid alphabet.
3. εw=wε=w (Concatenate any string with ε, and you will result in the string itself).
4. In simple words, ε means 'nothing'.
5. ε is prefix and suffix of every string.
6. ε is not a valid symbol.
An observation for alphabet with no overlapping symbols: Let us say we have an alphabet where no symbolsare overlapping, say {a,b}.
1. No. of String of length 2 is 22 =4( aa, ab, ba, bb)
2. No. of the string of length 3 is 23=8( aaa, aab, aba, abb, baa, bab, bba, bbb)
So the formula used here is |w|n
|w|= no of the alphabet, n= length of string you need.
Language: It is a set of strings. We have two types of languages : Finite(having finite number of strings) and Infinite(having infinite number of strings).
let us say Σ={a,b}. Let us define any two languages, say L1 and L2 as below:
L1=set of a strings of length 2 ={aa,ab,ba,bb}
L2=set of strings of which starts with a ={a,ab,abba, aaba,......}
Here L1 is a finite language and L2 is an infinite language.
Lexicographic Ordering: Writing of strings in a dictionary order such that the shorter strings precede the longer strings. Example: Σ={0,1}. The lexicographic ordering of language containing all the possible strings generated by the alphabet Σ={ε, 0, 1, 00, 01, 10, 11, 000, 001, ......}
** References :
An Introduction to Formal Languages and Automata by Peter Linz: section 1.2, Three basic concepts
Introduction To The Theory Of Computation by Michael Sipser: page 13, Strings and Languages
**