Sets & Languages
In order for a computer to organise a textual input, the computer has to be able to recognise certain strings, such as, in parsing a programming language, it has to be able to find 'if' or 'while'.
The first part of Computation deals with how we describe to a computer what string to look for, how to come up with the right description to solve the problem, and how to do more advanced things such as check whether a piece of code is correctly formed, making sure that opening and closing brackets match, or checking that a webpage is written in valid HTML.
Symbols, Words and Strings - Definitions
Input to a computer is given in symbols, such as letters and numbers. A collection of symbols is an alphabet. An alphabet is usually denoted by the symbol ∑.
Symbols can be combined into strings, which can be called 'words'. We can have words consisting of any number of symbols, including zero. A word with zero symbols is called the empty string, and is referred to by ϵ.
A space can be a symbol, and we can use the symbol ı_ı to show it. (Though in these blog posts, I'll just use _ since there's no single character for ı_ı).
The length of a string can be shown by a modulus symbol. For example, |s| is the length of the string s. |ϵ| = 0, obviously, because it is the empty string.
Languages
A language is a set of words that can be created with an alphabet. Some examples of languages are the empty language, {}, the language consisting of the empty word {ϵ}, or languages consisting of some number of strings, {ϵ,1,aba,abba,abbba,abbbba}. A language is referred to with the symbol L.
We can use set notation to define a language. For example, say we wanted the language of all words that could be made with the alphabet of the symbols 'a', 'b', and 'c'. We can define this language as {s | s is a word over the alphabet {a,b,c}}. The | should be read as 'where', so this is 'All languages consisting of the word 's', where s is a word over the alphabet {a,b,c}'
Concatenation
Two strings can be concatenated, that is, put together. So the strings Hel and lo could be concatenated into the single string Hello. To show this, we use a concatenation symbol as such: Hel · lo = Hello.
For multiple concatentation, we can use a power symbol. For example, h³ = hhh. h° = ϵ (ie. the string consisting of 0 numbers of h). We can also use n to denote an arbitrary number of concatenations. For example, an is the language {ϵ, a, aa, aaa, aaaa, aaaaa, ...}.
Using set notation, we can write it as L = {an | n ∈ N}. This can be read as 'The language L consists of all strings an where n is a whole number'. We could also exclude the empty string by saying 'where n is a positive whole number' with the symbol N+.
We can also concatenate two languages as follows: L1 · L2 = {st | s ∈ L1 and t ∈ L2}. ie. 'the set of all concatenated strings where s is a string from L1 and t is a string from L2'.
No comments:
Post a Comment