Wednesday, 6 January 2010

Machine Learning – K-Nearest Neighbour (Part 1)

Cats and Dogs

Lets take 2 example classes. Say we wanted to be able to recognise the difference between a Cat and a Dog, how would you do this?

Well, firstly we have to identify some features that can easily distinguish them both. Lets have a quick example of a cat and a dog – yeah, seriously :P

DogObedienceTrainingcat_0_preview  

So then, the one on the left is a dog, and on the right is a cat (how cuuuute :D ) – i hope you realised this? Note: yes i was tempted to use a lolcat ¬.¬

So then, the features. Well, generally speaking, a dog is bigger than cat, yet a cat can have more fur than a dog. So here we can have the identified features of:

  • Size
  • Furriness (I think that is a word….)

With this, we can draw up a very quick graph, with the axis of Size and Furriness.

graphcdw

So, as we can see the Cats will normally cluster around the points in the top-left, and the dogs at the bottom-right.
However, what is the star – is it a Cat or a Dog?

 

The Algorithm

This is how the K-Nearest Neighbour algorithm works. Firstly we have a variable “K”. With this, you assign “K” a value from 1 to the total number of examples. In the case above we have 7 examples (you don’t include the new example, the star)

From my experience the best value to chose for K is a low odd number – so for here I shall chose the value of 3. So K=3.

With this value of “K”, we look at the 3 closest points on the graph from the location of the new example point:

graphcdwkn Before I continue, YES THE STAR HAS MOVED … i accidentally made a stupid mistake, don’t worry though, its nothing important – but just go with this new graph :)

Ok, so from the graph above, I have drawn lines from the new point, to the 3 most closest points on the graph (because K=3, if it were 5, then there would be 5 lines, if it were 7, then there would be lines to all other points.)

The next part is to assign the new point a class, the class type will be determined from the majority of the closest points. For this it will be a Cat, since Cats out number Dogs 2-1.

 

Algorithm in Short

  1. Assign K a value – preferably a small odd number
  2. Find the closest number of K points
  3. Assign the new point from the majority of classes.

Next in Part 2 is Decision Boundaries, and what happens if everything goes horribly wrong! :D

1 comment:

  1. [...] Filed under: Computer Science, Machine Learning, Year 2 | Tagged: COMP20411, Machine Learning, Year 2 « Year 2 – Machine Learning – K-Nearest Neighbour (Part 1) [...]

    ReplyDelete