Wednesday 9 September 2009

Computation - Patterns

Patterns


In the last blog post we looked at what the various components in the module, mostly languages, symbols, and alphabets.  Now we're going to look in more detail at how to define a language from symbols.

To define languages more easily we're going to look at patterns.  Patterns can also be called regular expressions.  Computers, after all, can't recognise set notation symbols such as {, }, or ∈.


The Kleene Star


So lets say we want the computer to recognise all instances of 01 repeated, so it would match 01, 0101, 010101, 01010101, 0101010101, etc.  We notice that we need to concatenate 01 an arbitrary number of times.  We can do this with (01)*.  This is a pattern.  The * is called the Kleene Star and will apply an arbitrary number of concatentations to whatever precedes it, in this case 01.  Now the computer just has to compare: Is the first string equal to 0?  Is the next equal to 1?  And so on until the end of the string.

Other examples of patterns could be 1*0, which is the language of all strings consisting of any number of 1s, followed by a 0.  Or there may be two Kleene Stars, such as the pattern 1*0*.  This is any number of 1s followed by any number of 0s.

Alternative


So we've looked at using the Kleene Star to create languages.  But that's not enough for some patterns.  For example, what if we wanted the language of all strings consisting of just 0s and just 1s, ie. the language {0, 00, 000, 1, 11, 111, 1111, 0000, ...}

We can also use the alternative, |, to represent a choice.  For example, 0|1 is the language consisting of the string 0 and the string 1.  The language 0*|1* represents the language described above - All strings that are either an arbitrary number of 0s or an arbitrary number of 1s.

Building up Patterns


We can define patterns recursively in the following way:

Base cases:

The character ø is the empty pattern.
The character ϵ is a pattern consisting of the empty word.
Every letter from a specified alphabet ∑ is a pattern.

Recursive cases:

If p and q are both patterns, then so is (pq).  This is concatenation.
If p and q are both patterns, then so is (p|q).  This is the alternative.
If p is a pattern, then so is (p*).  This is the kleene star.

Matching words with Patterns


In order to match a word with a pattern, we can do it recursively, as follows:

Base cases:

The empty word ϵ matches the pattern ϵ.
The pattern p is a single character and s is that character.

Recursive cases:

The pattern p is a concatenation p=(p1p2) and there are words s1 and s2 such that s1 matches p1, s2 matches p2, and s = s1s2.
The pattern p is an alternative p=(p1|p2) and s matches p1 or p2, or both.
The pattern p is of the form p = (q*) and s can be written as s = s1s2s3...sn such that s1, s2, s3, ..., sn all match q.

(Note that no word matches the empty pattern ø).

Languages of Patterns


Given a pattern, p, we can refer to the language of all words that match that pattern as L(p).  Or, again, we can define the language recursively, as follows:

Base cases:

L(ø) = ø.
L(ϵ) = {ϵ}.
L(x) = {x} for all x in ∑.

Recursive cases:

L(p1p2) = L(p1)L(p2)
L(p1|p2) = L(p1) ∪ L(p2)
L(p*) = (L(p))

A regular language is any language that can be described by a pattern. Note that there are some languages that can't be represented by a pattern, such as the language of all strings that consist of an arbitrary number of 0s followed by the SAME number of 1s, ie. the language {01, 0011, 000111, 00001111, ...}.  This is because there is no way of remembering how many 0s there were.  We will find other ways to express these languages later, in Context Free Grammars.

So from simple patterns we've shown how to build them up into more complex patterns, then we saw how to recursively tell if a word matches a pattern, and finally how to show and build up the languages of a pattern.  This basically concludes what patterns are, and you should by now have some idea of how to build up a pattern given a rule.

No comments:

Post a Comment