Wednesday, 9 September 2009

Computation - Sets and Languages

OK, so first things first, we need to settle on some terminology here. This is because things such as symbols can also be cause characters, so I need to mention which terminology will be used here.


Symbols


Now then, take a look at your keyboard for a few seconds, that right. Now then, what do you see? Well, I’ll tell you, every time you type on your keyboard into say Microsoft Word, you are typing, what is known in the computer world, as symbols. These symbols are things such as ‘a’, ‘A’, ‘0’, ‘1’ through to ‘9’, as well as ‘%’, ‘&’, ‘?’ and ‘@’.

A Symbol is also considered the smallest unit possible, and therefore they cannot be split.



Alphabet


Now, take another look at your keyboard, and think, what else do you see? Look closer… OK, so maybe you may have guessed from the title of this part of the section. That’s right; it’s the alphabet as well as the Roman alphabet!

So, let’s say at a given time you wish to fix together a collection of symbols. We call this collection theunderlying alphabet. Such alphabets that occur frequently are:

{a, b,………, z, A, B,….,Z}



So, in this case, the English alphabet from a-z. This can be for both upper and lower case letters of the alphabet, as well as:

{0, 1,…….., 9, +, -, ·, /, (, )}




These symbols are mainly used for arithmetic expressions. Please note that, the elements of such alphabets are often called letters.

Now that we have this alphabet, how do we refer to it? We can’t always write out the whole alphabet every time, which would be stupid, imagine if the alphabet consisted of 1000 letters!

In such a case, we often refer to a fixed, but arbitrary alphabet with the Greek letter ∑ (or sigma). Also note, that when we need to refer to letters occurring in some alphabet ∑, we  typically use letters from the end of the Roman alphabet, such as x, y and z.

So, now we have all this discussed, we can safely mention this definition:

-----------------------------------------------------------------------------------------------

Definition 1: An alphabet ∑ is a finite set of characters called the letters of                                     ∑.

-----------------------------------------------------------------------------------------------

Strings and Words


So, now let’s say you wanted to write the word “hello”, how would you do it? Well, you can say that “hello” is formed by combining 5 symbols together, yes?

So we can say that symbols can be combined to form strings, or more commonly known as words over the given alphabet ∑.

Now then, we know that we can have words of say 1, 2, 3 or n letters. BUT, what about the word consisting of no letters at all? This word is usually referred to as the empty string or the empty word and is typically denoted with ε.

OK, so let’s have some examples. Here are some exercises to help you: (answers revealed later).

  1. Form all the three letter words over the alphabet {a,b,1}.

  2. Form all the words over the alphabet {}.


-----------------------------------------------------------------------------------------------

Definition 2: A word over the alphabet ∑ is a finite string

x1, x2 …. xn



-----------------------------------------------------------------------------------------------

Where the xi are letters from ∑.

This definition also counts for the empty word ε, this is made up of 0 letters, so we can say that n=0 for x1….xn.

The Space


But what about the words containing a space? While we can all recognize ‘a b’, we have to consider the words thatend with a space. In order to prevent problems like this, we use the symbol ‘_’ to denote a space. So ‘a_’ is the word consisting of the letter a followed by a space.

Length


The length of a word over some alphabet is given by the number of occurrences of symbols it consists of. So for this we use the length function |·| which takes as its inputs words, and then outputs a natural number which will be the length of that word. Here is an example:

  • |ε| = 0

  • |sx| = |s|+1 = 2


Words to form more words


Just as we saw above, we can combine symbols to form words. However, we can also combine words to form more words. This is often referred to as the concatenation of two words. Here is an example:

  • From ab and ba we can form abba


This concatenation does have a symbol to make things easier. This symbol is ‘·’. So from the above example, we can rewrite this as:

  • ab·ba = abba


It is also wise to note that concatenating a space onto the end of a word, does lengthen the word by one symbol.

However, if we append the empty word ε onto the end of a word, say ab, then we get back the word ab.

Languages


The word languages in programming languages, is often used to describe a collection of words over a given alphabet. Here are a few examples of languages of some alphabets:

  • {}

  • {ε}

  • {1, a, 1a, b, ab, abab, ababab}

  • {ε, 1, 11, 111, 1111, …..}


When we need a variable to range over some languages, we typically use ζ.

-----------------------------------------------------------------------------------------------

Definition 3: A language over an alphabet ∑ is any subset of the set of finite words over ∑.

-----------------------------------------------------------------------------------------------

By using set notation on languages, we can automatically obtain set-theoretic operations, such as union,intersections and complements of languages in precisely the same way as we do this for other sets. Hence, we obtain expressions such as:

  • ζ1 U ζ2

  • ζ1 \ ζ2


Describing Languages


In order to describe words of arbitrary length concisely, it is useful to use notations such as a3 for aaa. Lets explain this, by a3, we mean that the concatenation operation has to be applied to three copies of the symbol a to formaaa:

  • a3 = a·a·a = aaa


So, if we want to describe a word with an arbitrary number of a’s, we can write:

  • {an | n Є N+}  where n is any natural number that is greater than 0


Now, what about a1? This simply describes the concatenation of one copy of the symbol a, so the result is just a.

The real question is what about a0? This denotes the word consisting of 0 copies of the letter a. Well, quite obviously, this is the empty string, ε. So, a0 = ε.

From these rules, we can obtain useful rules such as:

  • am · an = am+n

  • (am)n = amn


Note that, (ab)n consists of n copies of ab concatenated with each other, rather than anbn.

So now for a proper example. Take the forth language that we described above:

  • {ε, 1, 11, 111,….}  =  {1n | n Є N}


Next: Regular Expressions.

No comments:

Post a Comment