Wednesday 9 September 2009

Artificial Intelligence - Robot Localisation

Artificial Intelligence


Artificial Intelligence is the field of computer science concerned with getting better and more positive answers to the question 'Can machines think?'.  Over the years there have been a number of approaches to artificial intelligence.  The approach currently growing in popularity is the use of probability theory, and this probabilistic reasoning is spreading to almost all areas of the field.


Basics of Probability Theory


A probability distribution is a function that takes a proposition and gives a probability in the range 0 to 1.  For example, take the proposition "If I toss a coin, it will come up tails".  A fair coin's probability distribution should give this proposition a value of 0.5.  A two-headed coin would have a different probability distribution, one that gives it a value of 0.

There are two basic axioms that a function must satisfy to be a probability distribution, given propositions 'a' and 'b'.

K1 - If a is always true, then p(a) = 1.
K2 - If a AND b are never true at the same time, then the probability of a OR b being true is p(a) + p(b).

p(a) can represent an agent's degree of belief that a is true.  If p(a) = 1, then the agent believes a is certainly true.  If p(a) = 0, then the agent believes a is certainly false.

K1 and K2 are intuitively reasonable and allow us to prove many more things about probabilities, such as:

The probability of a not happening is 1 - p(a).
If b is always true when a is true, then p(a) is less than or equal to p(b).
If a and b are always either true or false at the same time, then p(a) = p(b).
The probability of a OR b being true equals p(a) + p(b) - p(a AND b).
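
To make these concrete, here's a minimal Python sketch (my own toy representation, not part of the theory above) in which a proposition is the set of die-roll outcomes that make it true; it checks K1, K2 and the facts just listed:

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # every outcome of rolling one fair die

def p(proposition):
    # A proposition is modelled as the set of outcomes that make it true.
    return Fraction(len(proposition), len(sample_space))

a = {2, 4, 6}      # "the roll is even"
b = {1, 3, 5}      # "the roll is odd"
c = {1, 2, 3, 4}   # "the roll is at most four"

assert p(sample_space) == 1                    # K1: always true, so p = 1
assert p(a | b) == p(a) + p(b)                 # K2: a and b are never true together
assert p(sample_space - a) == 1 - p(a)         # p(not a) = 1 - p(a)
assert p(a & c) <= p(c)                        # c is always true when (a AND c) is
assert p(a | c) == p(a) + p(c) - p(a & c)      # p(a OR c) = p(a) + p(c) - p(a AND c)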

Partitions


Say you have a number of propositions, a1, a2, a3, a4, etc.

These propositions are 'mutually exclusive' if no two of them are ever true at the same time.
These propositions are 'jointly exhaustive' if at least one of them must be true; when they are also mutually exclusive, this means p(a1) + p(a2) + p(a3) + ... = 1.
These propositions form a 'partition' if they are both mutually exclusive and jointly exhaustive.

An example of a partition could be the propositions 'The coin will land on heads' and 'The coin will land on tails'.  p(heads) = 0.5 and p(tails) = 0.5.  They can never be true at the same time, so they are mutually exclusive, and one of them must be true, so they are jointly exhaustive (their probabilities add to 1).  Therefore, these two propositions form a partition.
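
The same toy representation from the earlier sketch can check both partition conditions for the coin example:

from fractions import Fraction
from itertools import combinations

coin_space = {"heads", "tails"}
propositions = [{"heads"}, {"tails"}]

def p(proposition):
    return Fraction(len(proposition), len(coin_space))

# No two propositions share an outcome, so they are mutually exclusive...
mutually_exclusive = all(x & y == set() for x, y in combinations(propositions, 2))
# ...and one of them must be true, so their probabilities sum to 1.
jointly_exhaustive = sum(p(x) for x in propositions) == 1

print(mutually_exclusive and jointly_exhaustive)   # True: they form a partition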

Conditional Probability


Conditional probability is when you have two propositions and want to know the probability of one GIVEN that the other is known to be true.

p(a|b) can be read as 'the probability of a given that b is known to be true'.  It can be worked out with the formula p(a|b) = p(a AND b) / p(b).
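
In the die example from the earlier sketch, the formula works out like this (again just a toy illustration with sets of outcomes):

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def p(proposition):
    return Fraction(len(proposition), len(sample_space))

def p_given(a, b):
    # p(a|b) = p(a AND b) / p(b)
    return p(a & b) / p(b)

a = {2, 4, 6}      # "the roll is even"
b = {4, 5, 6}      # "the roll is at least four"

print(p(a))          # 1/2: the degree of belief before learning anything
print(p_given(a, b)) # 2/3: learning b raises the degree of belief in a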

Another way of thinking about it is that p(a|b) is the degree of belief that an agent would have in a if and when it learned that b was certainly true.  Some basic facts that follow from this:

p(a|b) is always between 0 and 1, because it is a probability.
If p(a) is 0, then p(a|b) = 0 as well, because it doesn't matter whether b is true, a will never be true.
If a is true whenever b is true, then p(a|b) = 1, because if b is known to be true then a must be true.
If b and c are always true or false at the same time, then p(a|b) = p(a|c).

Conditionalising


So we have the probability distribution p, and we can use it to find the probability of something being true.  But we're going to define another probability distribution, pb, to be the probability distribution assuming that b is true (given that p(b) is greater than 0).  So basically, pb(a) = p(a|b).  This is the probability distribution obtained by 'conditionalising' on b.  Whenever an agent receives a piece of information, it conditionalises on it and works out a new probability distribution in this way.

Note that it's also true that (pc)b(a) = pb AND c(a) = (pb)c(a), so if an agent receives more than one piece of information at a time, it doesn't matter in what order it conditionalises.  The fundamental idea of probabilistic reasoning in AI is that an agent keeps receiving data and keeps working out a new probability distribution based on the previous probability distribution and the information it has received.
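
Here's a small sketch of conditionalising on the same die example, showing that the order makes no difference:

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def conditionalise(p, b):
    # Returns the new distribution p_b, where p_b(a) = p(a|b).
    return lambda a: p(a & b) / p(b)

p = lambda a: Fraction(len(a), len(sample_space))

a = {2, 4, 6}        # "the roll is even"
b = {2, 3, 4, 5, 6}  # "the roll is at least two"
c = {1, 2, 3, 4}     # "the roll is at most four"

p_bc = conditionalise(conditionalise(p, b), c)   # learn b, then c
p_cb = conditionalise(conditionalise(p, c), b)   # learn c, then b
p_b_and_c = conditionalise(p, b & c)             # learn both at once

print(p_bc(a), p_cb(a), p_b_and_c(a))            # all 2/3: the order doesn't matter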

Bayes Theorem


The next theorem is probably the most important, yet simplest, equation in probability theory.

p(b|a) = p(a|b)p(b) / p(a).

Or in its alternative, often more useful, form:

p(bi|a) = p(a|bi)p(bi) / ( p(a|b1)p(b1) + p(a|b2)p(b2) + ... + p(a|bn)p(bn) )

This is typically applied when a is an observation statement and b1, b2, ..., bn are all different hypotheses that can explain that observation.  We'll see what this means in practical terms in the next section on robot localisation.  For now, just know that this is a very useful theorem.
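
Here's a tiny numerical sketch with made-up numbers: three hypotheses b1, b2, b3 that form a partition, and one observation a.

priors = [0.5, 0.3, 0.2]          # p(b1), p(b2), p(b3): they sum to 1
likelihoods = [0.1, 0.7, 0.4]     # p(a|b1), p(a|b2), p(a|b3)

# The denominator is the same for every hypothesis.
denominator = sum(l * pr for l, pr in zip(likelihoods, priors))

# p(bi|a) = p(a|bi) p(bi) / denominator
posteriors = [l * pr / denominator for l, pr in zip(likelihoods, priors)]

print(posteriors)   # roughly [0.147, 0.618, 0.235], and they sum to 1

Notice how b2, which explained the observation best, goes from a prior belief of 0.3 to a posterior of about 0.62.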

Robot Localisation


Say we have a robot in a room.  The room has obstacles in it.  The robot has a map in its memory that tells it where the obstacles are, but it doesn't know where it is in the room.  The robot has three sensors, on the front, left, and right sides.  It can move forward, or it can turn left and right.

First let's talk about the surroundings.  Let's use a coordinate system, so the robot is in a square room, from (0,0) in one corner to (99,99) in the opposite corner.  In addition to this, we have to know which direction the robot is facing, so we'll also split the direction into the range (0,99), where 0 is the robot facing one way, increasing as it turns clockwise.

So let's let li,j,t be the proposition that 'the robot is currently in location (i,j) facing in direction t'.

The propositions l0,0,0, l0,0,1, l0,0,2, ..., l99,99,98, l99,99,99 form a partition.  The robot can be in any one of 100x100x100 = 1,000,000 poses, but it must be in exactly one of them.  So the probabilities of these propositions all add up to 1.

We can also derive less precise information.  For example, p(li,j), the probability of the robot being at position (i,j) regardless of the direction it is facing, is the sum of p(li,j,0), p(li,j,1), ..., p(li,j,99).
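
As a sketch (using numpy, which the original setup doesn't require), the belief is just a 100x100x100 array of probabilities:

import numpy as np

# One entry per pose: belief[i, j, t] holds p(li,j,t).  Uniform for now.
belief = np.full((100, 100, 100), 1.0 / (100 * 100 * 100))

print(abs(belief.sum() - 1.0) < 1e-9)   # True: the 1,000,000 poses form a partition

# Summing out the direction axis gives the coarser p(li,j).
p_position = belief.sum(axis=2)
print(p_position[50, 50])               # about 0.0001 = 1/10000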

Initial State of Belief


Now that we know how the robot will represent its belief about its pose, we consider what the initial data should be.  For simplicity, we assume that the robot has no idea where it is in the room: it could be in any pose with equal probability.  However, the robot knows where the objects in the room are, so it knows it cannot be on top of an object.  Its initial beliefs must therefore be set up so that every pose sitting on an object has a probability of 0, while all the poses that are NOT on an object have the same probability and together form a partition.

This might seem a bit complicated, but consider it on a smaller scale, say a room that's only 2x2.  If there are no objects, the probability of the robot being at any coordinate is 1/4, because there are 4 coordinates.  If there is something blocking the coordinate (1,1), then clearly it can't be on (1,1), and there are 3 coordinates it could be on, so each will have a probability of 1/3.  It's basically the same as this but on a 100x100 grid, and taking direction into account as well.  Some coordinates have a probability of 0, but all the rest have the same probability and form a partition.
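
Scaled back up, setting the initial belief might look like this sketch, assuming a hypothetical 100x100 boolean map 'occupied' that is True wherever an object sits:

import numpy as np

occupied = np.zeros((100, 100), dtype=bool)
occupied[40:60, 40:60] = True            # a made-up 20x20 obstacle in the middle

free = ~occupied                         # squares the robot could actually be on
belief = np.zeros((100, 100, 100))
belief[free, :] = 1.0 / (free.sum() * 100)   # equal probability for every free pose

print(belief[50, 50, 0])                 # 0.0: the robot cannot be inside the obstacle
print(abs(belief.sum() - 1.0) < 1e-9)    # True: the remaining poses still sum to 1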

Sensor Readings


So we've set the initial state; now we need to consider how to update the robot's belief state so that when it gets information, it updates the probability of where it is accordingly.  We've already said that the robot has 3 sensors, one each on the front, left, and right.  So let's say that 'o' represents the proposition that a certain sensor has reported a distance, for example 'Left sensor reports a distance of 32 to the nearest object or wall'.

But sensors are not accurate, they are noisy.  So we need to model the values that the sensors return.  We'll model a reading as roughly normally distributed: the sensor returns an integer between 0 and 2x (where x is the true distance), with mean x and standard deviation of about x/3.
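
One way to implement this sensor model (a discretised, truncated normal, which is just my reading of the description above) is:

import numpy as np

def p_sensor(o, x):
    # Probability that a sensor reports reading o when the true distance is x.
    if x == 0:
        return 1.0 if o == 0 else 0.0    # flush against the wall: the reading must be 0
    if o < 0 or o > 2 * x:
        return 0.0                       # the sensor only returns integers in 0..2x
    readings = np.arange(0, 2 * x + 1)
    weights = np.exp(-((readings - x) ** 2) / (2 * (x / 3.0) ** 2))
    return weights[o] / weights.sum()    # normalise so the possible readings sum to 1

print(p_sensor(50, 50))   # the most likely reading when the true distance is 50
print(p_sensor(2, 50))    # far smaller: a reading of 2 is nowhere near 50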

Now we've been given an observation, o, we need to update our probability distribution, p, so that it takes o into account, i.e. we need to work out the probability distribution po.  Remember that po(li,j,t) = p(li,j,t|o).  So we need to work this value out for every single combination of i, j, and t, and update the 100x100x100 matrix storing our degrees of belief about where we are.

Using the alternative form of Bayes Theorem, we can see that:

p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i',j',t' p(o|li',j',t')p(li',j',t')

We have to work this value out for every single one of our 1,000,000 poses.  It's a pretty ugly formula, but it makes sense.  Let's go through it term by term, starting with the top.  The top consists of p(o|li,j,t)p(li,j,t).

p(li,j,t) is the robot's current belief that it is in pose (i,j,t), so we can simply read that value out of the 100x100x100 matrix.

p(o|li,j,t) is the probability of observation o, if we assume that the robot is currently in pose (i,j,t).  This is tricky and depends on the accuracy of the sensors, but we now have a model for it.  Say, for example, that pose (i,j,t) is in the middle of the room facing a wall 50 squares away, and the forward sensor returns a value of 2.  Then this term will be very low, because under a normal distribution centred on 50 the probability of the sensor returning 2 is tiny.

The bottom is the sum, over all values of i, j, t, of the same two terms.  So first you loop through the matrix and work out p(o|li,j,t)p(li,j,t) for every possible pose (i,j,t).  Then you sum up these numbers, and that sum is the bottom of the equation: ∑i',j',t' p(o|li',j',t')p(li',j',t') is the same constant for every pose, for a given observation o.
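
Here's a sketch of that update, reusing the p_sensor model from the previous sketch and assuming a hypothetical function true_distance(sensor, i, j, t) that uses the robot's map to work out the real distance the named sensor would see from pose (i,j,t):

import numpy as np

def observation_update(belief, sensor, reading, true_distance):
    # Returns the conditionalised belief p_o, where o is "sensor reports reading".
    numerator = np.zeros_like(belief)
    for i in range(100):
        for j in range(100):
            for t in range(100):
                prior = belief[i, j, t]              # p(li,j,t), read from the matrix
                if prior == 0.0:
                    continue                         # impossible poses stay impossible
                x = true_distance(sensor, i, j, t)   # what the sensor should see from here
                numerator[i, j, t] = p_sensor(reading, x) * prior   # p(o|l) p(l)
    # The denominator is just the sum of all the numerators - a constant for this o.
    return numerator / numerator.sum()

# e.g. belief = observation_update(belief, "left", 32, true_distance)

A real implementation would vectorise this rather than loop a million times in Python, but the structure follows the formula directly.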

Robot Actions


But the robot is not static: it can move forward and turn left and right, as we've seen.  Let's use 'a' to represent the proposition that the robot performs an action, such as 'Move forward 10 squares'.  It may not be as simple as that.  For example, what if the floor is wet, or the motors are running out of power, or the robot runs into a wall?  It may move less than or more than the 10 squares it's supposed to.  We can model this with another normal distribution.

So if li,j,t represents the proposition that the robot's pose is (i,j,t) before the action, we can use l'i,j,t to represent the proposition "After the action, the robot's new pose will be (i,j,t)".

The engineer programming the robot should be able to estimate the probabilities p(l'i,j,t|a AND li',j',t').

To update the matrix in response to an action, we can use the following formula:

p(l'i,j,t|a) = ∑i',j',t' p(l'i,j,t|li',j',t' AND a)p(li',j',t').

Here p(l'i,j,t|a) means the probability of the robot's pose being (i,j,t) given that the action a has happened, p(li',j',t') is the probability of the robot having been in pose (i',j',t'), and p(l'i,j,t|li',j',t' AND a) is the probability of the robot ending up in pose (i,j,t) given that action a has happened AND the robot used to be in pose (i',j',t').
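
Here's a sketch of that update, assuming a hypothetical function motion_model(old_pose, a) that packages up those engineer-estimated probabilities as a dictionary mapping each reachable new pose to p(l'i,j,t|li',j',t' AND a):

import numpy as np

def action_update(belief, action, motion_model):
    # Returns the new belief after the robot performs 'action'.
    new_belief = np.zeros_like(belief)
    for old_pose in zip(*np.nonzero(belief)):        # only poses with p > 0 contribute
        prior = belief[old_pose]                     # p(li',j',t')
        for new_pose, prob in motion_model(old_pose, action).items():
            new_belief[new_pose] += prob * prior     # p(l'|l AND a) p(l)
    return new_belief

Adding prob * prior into each new pose, over every old pose, computes exactly the sum in the formula; in practice only a handful of old poses can lead to any given new pose, which keeps this manageable.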

Summary


So to sum up: the robot should start with an equal belief that it could be in any pose not blocked by an object.

The robot has a 3-dimensional matrix, 100x100x100, indexed by x-coordinate, y-coordinate, and direction, storing its degree of belief in each pose.

Sensor readings and actions are not exact: the reported distance or the distance actually moved may be less or more than the true value.

The robot's belief matrix is updated every time there is an observation or the robot performs an action.

To update on an observation this formula should be followed:

p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i',j',t' p(o|li',j',t')p(li',j',t')

And to update on an action this formula should be followed:

p(l'i,j,t|a) = ∑i',j',t' p(l'i,j,t|li',j',t' AND a)p(li',j',t').
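
Putting it together, the robot's main loop might look something like this sketch (initial_belief, read_sensor, choose_action, perform_action, true_distance and motion_model are all hypothetical helpers; observation_update and action_update are the sketches from the earlier sections):

belief = initial_belief()     # uniform over every pose not blocked by an object

while True:
    # Observe: conditionalise the belief on each sensor's reading in turn.
    for sensor in ("front", "left", "right"):
        reading = read_sensor(sensor)
        belief = observation_update(belief, sensor, reading, true_distance)

    # Act: choose an action, perform it, then predict the resulting belief.
    action = choose_action(belief)
    perform_action(action)
    belief = action_update(belief, action, motion_model)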
