Sunday, 10 January 2010

Machine Learning – Decision Trees

I bet you thought that was going to say Decision Boundaries again :D – well… that is… if you’ve read the first 4 Machine Learning posts ;)

Nope, this time it's Decision Trees, which are very similar to the trees you meet in programming, such as Binary Trees.

Beer!

No, do NOT go and get any beer… although I'm not exactly going to be able to stop you, am I? :)

The best way to understand a Decision Tree is with an example, and the best example is Students and Beer! – no, I'm not saying every student drinks, but seriously… the majority do… even if you're like me and would rather have a nice cup of hot chocolate; just go with it ;)

Let's say we have 2 kinds of people: Students and Teachers. We have 1 kind of drink: Beer. Now, a Student (go with it) only drinks Beer when there isn't an exam on, and a Teacher only drinks Beer when it's the weekend. With this we can say:

  • IF student AND exam THEN no beer
  • IF student AND not exam THEN beer
  • IF teacher AND weekday THEN no beer
  • IF teacher AND weekend THEN beer
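To make the rules a bit more concrete, here is a minimal sketch of them in Python. It follows the description above (a student drinks only when there is no exam on, a teacher only at the weekend). The names `BEER_TREE` and `drinks_beer`, and the dictionary layout, are my own assumptions for illustration and not any standard format.

```python
# Hypothetical encoding of the beer rules as a nested dictionary.
# Inner nodes ask about one attribute; leaves are the answer (True = beer).
BEER_TREE = {
    "attribute": "person",
    "branches": {
        "student": {
            "attribute": "exam",
            "branches": {True: False, False: True},
        },
        "teacher": {
            "attribute": "weekend",
            "branches": {True: True, False: False},
        },
    },
}


def drinks_beer(tree, facts):
    """Walk from the root to a leaf, following the branch that matches each fact."""
    node = tree
    while isinstance(node, dict):
        value = facts[node["attribute"]]
        node = node["branches"][value]
    return node


print(drinks_beer(BEER_TREE, {"person": "student", "exam": True}))     # False
print(drinks_beer(BEER_TREE, {"person": "teacher", "weekend": True}))  # True
```

Notice that each person only ever gets asked one follow-up question: nobody cares whether a teacher has an exam on.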

With this, we can easily construct a Decision Tree:

[Figure: Decision Tree for the Students and Teachers beer example]

Here we can see, say, the first rule above:
If Person=student AND Exam=yes THEN no Beer

Before anybody says anything, yes, I do realise the mistake I've made. "If exam then beer" should be "If exam then no beer". I apologise :(

Over-fitting

Yep, that's right. Time to revisit an old friend :D

So, as described before, if we assume that the data we have is all the data there is, we can end up over-fitting: our techniques work well on the Training Data but poorly on anything new. With a Decision Tree, over-fitting also shows up in the tree itself, which can become very large and obscure.
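Here is a rough sketch of that effect, under some loud assumptions: the data is made up (students only, with an irrelevant `hat` attribute and one noisy row), and `grow` is a deliberately naive tree builder that splits on attributes in a fixed order. A real learner would pick its splits more cleverly, so treat this as an illustration and nothing more.

```python
from collections import Counter

# Made-up toy data: (facts, drinks_beer). The real rule is "beer iff no exam".
# "hat" is irrelevant, and the last training row is noise.
TRAIN = [
    ({"exam": True, "hat": "red"}, False),
    ({"exam": True, "hat": "blue"}, False),
    ({"exam": False, "hat": "blue"}, True),
    ({"exam": False, "hat": "blue"}, True),
    ({"exam": False, "hat": "red"}, False),  # noisy one-off
]
TEST = [
    ({"exam": False, "hat": "red"}, True),
    ({"exam": False, "hat": "blue"}, True),
    ({"exam": True, "hat": "red"}, False),
]


def grow(rows, attributes, max_depth):
    """Naive builder: split on attributes in the given order until the labels
    are pure, the attributes run out, or max_depth is used up."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes or max_depth == 0:
        return majority
    attribute, rest = attributes[0], attributes[1:]
    branches = {}
    for value in sorted({facts[attribute] for facts, _ in rows}):
        subset = [(facts, label) for facts, label in rows if facts[attribute] == value]
        branches[value] = grow(subset, rest, max_depth - 1)
    return {"attribute": attribute, "branches": branches, "default": majority}


def predict(node, facts):
    """Follow branches to a leaf; fall back to the majority label for unseen values."""
    while isinstance(node, dict):
        node = node["branches"].get(facts[node["attribute"]], node["default"])
    return node


def count_leaves(node):
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node["branches"].values())


def correct(tree, rows):
    """Number of rows the tree labels correctly."""
    return sum(predict(tree, facts) == label for facts, label in rows)


full = grow(TRAIN, ["exam", "hat"], max_depth=2)   # fits every training row
small = grow(TRAIN, ["exam", "hat"], max_depth=1)  # only allowed to ask about the exam

print(count_leaves(full), correct(full, TRAIN), correct(full, TEST))     # 3 5 2
print(count_leaves(small), correct(small, TRAIN), correct(small, TEST))  # 2 4 3
```

The bigger tree gets all 5 training rows right by learning that red hats mean no beer, and then gets a test row wrong because of it. The smaller tree misses the noisy training row but gets every test row right.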
