Wednesday, 6 January 2010

Machine Learning – K-Nearest Neighbour (Part 2)

Last Time


Last time we learnt about the algorithm that K-Nearest Neighbour uses, with Cats and Dogs as the example. If you want a recap, go here: Part 1


Decision Boundaries


But now, we are going to learn all about Decision Boundaries, yaaay! (sarcastic… ;) )

Okay, I’ll be honest, they aren’t actually hard; in fact they’re very simple. Let’s take a look at the graph from Part 1, without the star:

graphcdbasic

So on this graph, at what position does a new point have to be so that it is on the border between being a Cat and being a Dog? Well, the border would look something like this:

graphcddb

Now here we have the boundary – the red line. With this we can determine what a new point is. For example, look at the square: it falls on the Cat side of the line, so we can quite happily say that the square is a type of cat now :D . But what about the star? Well, this is where the algorithm gets confused. The star is close to the boundary, so the answer could come out right or wrong, depending on how many examples of each class happen to be nearby.
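If you want to poke at the boundary idea yourself, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the (size, furriness) numbers are made up and are not the points from the Part 1 graph, and the names knn_predict and boundary_crossings are just hypothetical helpers. It classifies a point by majority vote of its 3 nearest neighbours, then walks along a horizontal line and notes where the predicted class flips, which is where the boundary crosses that line.

```python
from collections import Counter
import math

# Made-up training data: (size, furriness) -> label.
# These are NOT the points from the Part 1 graph, just an illustration.
TRAINING = [
    ((1.0, 4.0), "Cat"),
    ((1.5, 5.0), "Cat"),
    ((2.0, 4.5), "Cat"),
    ((4.0, 1.5), "Dog"),
    ((4.5, 2.0), "Dog"),
    ((5.0, 1.0), "Dog"),
]


def knn_predict(point, training=TRAINING, k=3):
    """Return the majority label among the k training examples nearest to point."""
    nearest = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def boundary_crossings(y, x_min=0.0, x_max=6.0, steps=60, k=3):
    """Scan the horizontal line at height y and list the x values where the label flips."""
    crossings = []
    previous = None
    for i in range(steps + 1):
        x = x_min + (x_max - x_min) * i / steps
        label = knn_predict((x, y), k=k)
        if previous is not None and label != previous:
            crossings.append(round(x, 2))
        previous = label
    return crossings


print(knn_predict((1.2, 4.2)))   # a point deep in the Cat cluster
print(knn_predict((4.8, 1.2)))   # a point deep in the Dog cluster
print(boundary_crossings(3.0))   # where the boundary cuts the line furriness = 3.0
```

Points far from a crossing get a confident answer; points right next to one are the "star" cases, where moving a single training example could change the vote.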

Over-Fitting


This is pretty quick and simple to explain. Over-fitting is caused when you assume that you have all the data. For example, remember that star from the first graph in Part 1? Well, we said that was a Cat, correct? Wrong, it was in fact supposed to be a Dog. But what dog could be that small and that furry? The answer is a Yorkshire Terrier :)

Portrait of a Yorkshire Terrier sitting in front of a white background

Or a Poodle, or a puppy… meh :P

On the other hand, we could have a big cat that is not very furry. For example, those Siamese cats from Lady and the Tramp that I can’t remember the names of... erm… if you know, please tell me! :)
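The Yorkshire Terrier problem can be sketched in a few lines of Python. As before, this is only an illustration under assumptions: the (size, furriness) numbers are invented, and knn_predict is a hypothetical helper doing a plain 3-nearest-neighbour vote. The point is that the classifier is only as good as the examples it has seen. If it has never met a small furry dog, it will call one a Cat, and it only changes its mind once such dogs turn up in the data.

```python
from collections import Counter
import math


def knn_predict(point, training, k=3):
    """Return the majority label among the k training examples nearest to point."""
    nearest = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up (size, furriness) examples: small furry cats, big less-furry dogs.
SEEN_SO_FAR = [
    ((1.0, 4.0), "Cat"), ((1.5, 5.0), "Cat"), ((2.0, 4.5), "Cat"),
    ((4.0, 1.5), "Dog"), ((4.5, 2.0), "Dog"), ((5.0, 1.0), "Dog"),
]

# Invented examples of small, furry dogs that were missing from the data above.
SMALL_FURRY_DOGS = [
    ((1.2, 4.8), "Dog"), ((1.4, 4.4), "Dog"), ((1.6, 4.6), "Dog"),
]

yorkie = (1.5, 4.5)  # small and furry, but really a Dog

before = knn_predict(yorkie, SEEN_SO_FAR)
after = knn_predict(yorkie, SEEN_SO_FAR + SMALL_FURRY_DOGS)

print(before)  # Cat: every nearby example is a cat
print(after)   # Dog: the three nearest examples are now small furry dogs
```

Nothing about the algorithm changed between the two calls, only the data did, which is why assuming you already have all the data gets you into trouble.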
