Wednesday, 9 September 2009

Computation - Sets & Languages

Sets & Languages


In order for a computer to organise a textual input, the computer has to be able to recognise certain strings, such as, in parsing a programming language, it has to be able to find 'if' or 'while'.

The first part of Computation deals with how we describe to a computer what string to look for, how to come up with the right description to solve the problem, and how to do more advanced things such as check whether a piece of code is correctly formed, making sure that opening and closing brackets match, or checking that a webpage is written in valid HTML.


Symbols, Words and Strings - Definitions


Input to a computer is given in symbols, such as letters and numbers. A collection of symbols is an alphabet.  An alphabet is usually denoted by the symbol ∑.

Symbols can be combined into strings, which can be called 'words'.  We can have words consisting of any number of symbols, including zero.  A word with zero symbols is called the empty string, and is referred to by ϵ.

A space can be a symbol, and we can use the symbol ı_ı to show it. (Though in these blog posts, I'll just use _ since there's no single character for ı_ı).

The length of a string can be shown by a modulus symbol.  For example, |s| is the length of the string s.  |ϵ| = 0, obviously, because it is the empty string.

Languages


A language is a set of words that can be created with an alphabet. Some examples of languages are the empty language, {}, the language consisting of the empty word {ϵ}, or languages consisting of some number of strings, {ϵ,1,aba,abba,abbba,abbbba}.  A language is referred to with the symbol L.

We can use set notation to define a language.  For example, say we wanted the language of all words that could be made with the alphabet of the symbols 'a', 'b', and 'c'.  We can define this language as {s | s is a word over the alphabet {a,b,c}}.  The | should be read as 'where', so this is 'All languages consisting of the word 's', where s is a word over the alphabet {a,b,c}'

Concatenation


Two strings can be concatenated, that is, put together.  So the strings Hel and lo could be concatenated into the single string Hello.  To show this, we use a concatenation symbol as such: Hel · lo = Hello.

For multiple concatentation, we can use a power symbol.  For example, h³ = hhh. h° = ϵ (ie. the string consisting of 0 numbers of h).  We can also use n to denote an arbitrary number of concatenations.  For example, an is the language {ϵ, a, aa, aaa, aaaa, aaaaa, ...}.

Using set notation, we can write it as L = {an | n ∈ N}.  This can be read as 'The language L consists of all strings an where n is a whole number'.  We could also exclude the empty string by saying 'where n is a positive whole number' with the symbol N+.

We can also concatenate two languages as follows: L1 · L2 = {st | s ∈ L1 and t ∈ L2}.  ie. 'the set of all concatenated strings where s is a string from L1 and t is a string from L2'.

No comments:

Post a Comment