The decision tree algorithm is a basic method for predicting the result for a given input from previously known data (knowledge). Let us see how it can be used in machine learning to make predictions.
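To start with, you need the Python scikit-learn module. On a Debian-based system it can be installed as follows (the piwheels index serves prebuilt packages for the Raspberry Pi):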
aptitude install gfortran libatlas-base-dev libopenblas-dev liblapack-dev
pip3 install scikit-learn --index-url https://piwheels.org/simple
Think of a problem statement, say predicting which domestic animal you saw from its height, length and weight.
Let's consider cats, dogs and cows.
Prepare a data set of a few samples, say 3 cats, 4 dogs and 4 cows (the more samples, the better the prediction).
Label each data sample with the animal you recorded, and assign each animal a number, say “0=cat, 1=dog, 2=cow”. For example:
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]
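Before training, it can help to check that the samples and labels line up. The following is only a small sketch: the `LabelNames` dictionary and the checks are illustrative additions, not part of the program below.

```python
# Each sample is [height, length, weight]; labels use 0=cat, 1=dog, 2=cow
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5],
              [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30],
              [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]

# Hypothetical mapping, added here only for readability
LabelNames = {0: "cat", 1: "dog", 2: "cow"}

# Every sample needs exactly one label and exactly three measurements
assert len(AnimalSize) == len(animals)
assert all(len(sample) == 3 for sample in AnimalSize)

# Count how many samples were recorded for each animal
counts = {name: animals.count(label) for label, name in LabelNames.items()}
print(counts)  # {'cat': 3, 'dog': 4, 'cow': 4}
```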
Code that guesses the species:
#!/usr/bin/python3
from sklearn import tree

# Each sample is a list of height, length, weight
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5], [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30], [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
# 0=cat, 1=dog, 2=cow
animals = [0,0,0,1,1,1,1,2,2,2,2]

# Train the decision tree on the known samples
clafr = tree.DecisionTreeClassifier()
clafr = clafr.fit(AnimalSize, animals)

AniInp = input("What size of animal you saw ? (height, length, weight):")
# Convert the comma-separated text to numbers before predicting
sample = [float(value) for value in AniInp.split(",")]
# predict() returns one label per sample; we passed a single sample
ans = clafr.predict([sample])[0]
if ans == 0:
    print("cat")
elif ans == 1:
    print("dog")
elif ans == 2:
    print("cow")
Let's ask the above program to make some predictions:
./animal.py
What size of animal you saw ? (height, length, weight):20, 35, 5
cat
./animal.py
What size of animal you saw ? (height, length, weight):40, 80, 35
dog
./animal.py
What size of animal you saw ? (height, length, weight):100, 200, 420
cow
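To see how the tree arrives at such answers, the learned rules can be printed. This is a minimal sketch that assumes a scikit-learn version providing `tree.export_text`; the feature names and `random_state=0` are my own additions so the output is labelled and repeatable.

```python
from sklearn import tree

# Same training data as in the program above: [height, length, weight]
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5],
              [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30],
              [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]  # 0=cat, 1=dog, 2=cow

# random_state is an assumption added here so repeated runs build the same tree
clafr = tree.DecisionTreeClassifier(random_state=0)
clafr.fit(AnimalSize, animals)

# export_text renders the fitted tree as indented threshold rules
rules = tree.export_text(clafr, feature_names=["height", "length", "weight"])
print(rules)
```

In this data set the three animals are cleanly separated on every measurement, so the tree may choose height, length or weight for its thresholds; the printed rules show which one it picked.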
You can further improve the predictions of the above program by confirming each prediction and feeding your inputs back into the pre-known data, i.e. the knowledge.
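One way such a feedback loop could look is sketched below. The helper `add_feedback` and the "small dog" measurements are hypothetical, made up for illustration; the sketch simply appends the confirmed sample to the known data and trains a fresh tree.

```python
from sklearn import tree

# Known data: [height, length, weight] with 0=cat, 1=dog, 2=cow
AnimalSize = [[20,40,3.5],[24,48,4.3],[23,38,4.5],
              [40,80,21.4],[45.5,90.4,25],[55,100,28],[60,110,30],
              [110,150,350],[130,180,400],[140.5,220,500],[145,210,510]]
animals = [0,0,0,1,1,1,1,2,2,2,2]

def add_feedback(samples, labels, new_sample, confirmed_label):
    """Hypothetical helper: store one confirmed sample and retrain the tree."""
    samples.append(new_sample)
    labels.append(confirmed_label)
    return tree.DecisionTreeClassifier().fit(samples, labels)

clafr = tree.DecisionTreeClassifier().fit(AnimalSize, animals)

# Made-up measurements of a small dog that falls in the cat size range
small_dog = [25, 45, 6]
before = clafr.predict([small_dog])[0]  # guess from the original knowledge

# The user confirms it was a dog (label 1), so the sample joins the knowledge
clafr = add_feedback(AnimalSize, animals, small_dog, 1)
after = clafr.predict([small_dog])[0]

print(before, after)
```

With the data above, the first guess is 0 (cat) and, after the confirmed sample is added, the retrained tree answers 1 (dog) for the same measurements.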
Now guess why social media sites ask their users to tag photos and locations and to use hashtags 🙂? They need data, more and more data, to improve their knowledge and build intelligence on top of it.