Using Decision Trees and Random Forests for Machine Learning Classification in Python

This content originally appeared on Level Up Coding - Medium and was authored by Omardonia

Decision Trees and Random Forests are powerful machine learning algorithms used for classification and regression tasks. Decision Trees create a model that predicts the value of a target variable based on several input variables, while Random Forests use multiple decision trees to make predictions. In this article, we will explore how to use Decision Trees and Random Forests in Python using the Scikit-Learn library.

Decision Trees

A Decision Tree is a tree-like model that predicts the value of a target variable based on several input variables. It splits the data based on the values of the input variables, creating a tree-like structure. The leaves of the tree contain the predicted values.
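To make the splitting idea concrete, here is a minimal sketch of how one candidate split could be scored. It assumes Gini impurity, which is the default criterion of Scikit-Learn's DecisionTreeClassifier. The helper names gini and split_impurity and the toy data are made up for illustration and are not part of any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data (hypothetical): one input variable and a two-class target.
feature = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
target = ["a", "a", "a", "b", "b", "b"]

before = gini(target)                                   # impurity with no split
after = split_impurity(feature, target, threshold=2.0)  # impurity after the split
print("Impurity before:", before, "after:", after)
```

On this toy data the split separates the two classes perfectly, so the impurity drops from 0.5 to 0.0. A tree learner would try many thresholds on every input variable and keep the split with the lowest weighted impurity.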

Example

Let’s look at an example of how to use a Decision Tree to predict the species of an iris flower. We will use the Iris dataset bundled with Scikit-Learn, which contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 flowers from three species.

First, let’s load the dataset and split it into training and testing sets:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the input variables (X) and the target labels (y)
data = load_iris()
X = data.data
y = data.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Next, we will create a Decision Tree classifier and fit it to the training data:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

We can now use the trained classifier to predict the class of the test data:

y_pred = clf.predict(X_test)

Finally, we can evaluate the performance of the classifier using accuracy:

from sklearn.metrics import accuracy_score

print("Accuracy:", accuracy_score(y_test, y_pred))
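Accuracy is simply the fraction of test samples whose predicted class matches the true class. The following sketch computes it by hand on made-up labels (the two lists are hypothetical, not output from the classifier above), which is what accuracy_score returns with its default arguments.

```python
# Hypothetical true and predicted labels, made up for illustration only.
true_labels = [0, 1, 2, 2, 1, 0, 1, 2]
predicted_labels = [0, 1, 2, 1, 1, 0, 2, 2]

# Count the positions where the prediction matches the true label.
correct = sum(t == p for t, p in zip(true_labels, predicted_labels))

# Accuracy = correct predictions / total predictions.
manual_accuracy = correct / len(true_labels)
print("Accuracy:", manual_accuracy)
```

Six of the eight predictions match here, so the accuracy is 0.75.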

Tuning Parameters

Decision Trees have several parameters that can be tuned to improve their performance. Some of the important parameters include:

max_depth: the maximum depth of the tree
min_samples_split: the minimum number of samples required to split an internal node
min_samples_leaf: the minimum number of samples required to be at a leaf node

These parameters can be set when creating the classifier, for example:

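clf = DecisionTreeClassifier(max_depth=5, min_samples_split=10, min_samples_leaf=5)
clf.fit(X_train, y_train)  # a newly created classifier must be fitted before it can predict or be visualized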
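Rather than picking values by hand, one common approach is to search over a small grid of candidates with cross-validation. The sketch below uses Scikit-Learn's GridSearchCV for this. The candidate values in param_grid, the fixed random_state, and the choice of 5 folds are arbitrary assumptions for illustration, not recommendations from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)

# Candidate values are arbitrary choices for illustration.
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

# Try every combination with 5-fold cross-validation on the training set.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))
```

After fitting, search behaves like a classifier trained with the best combination found, so it can be used for prediction directly.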

Visualization

We can also visualize the Decision Tree using the Graphviz library:

from sklearn.tree import export_graphviz
import graphviz

dot_data = export_graphviz(clf, out_file=None, feature_names=data.feature_names, class_names=data.target_names)
graph = graphviz.Source(dot_data)
graph.render("iris")

This will create a visualization of the Decision Tree in the file “iris.pdf”.
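If Graphviz is not installed, a plain-text view of the tree may be enough. The sketch below uses Scikit-Learn's export_text for that. The variable names and the shallow max_depth=2 (chosen only to keep the printout short) are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# A shallow tree keeps the printed rules short.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
small_tree.fit(data.data, data.target)

# One line per split or leaf, indented by depth.
rules = export_text(small_tree, feature_names=list(data.feature_names))
print(rules)
```

Each indented line shows a split condition on an input variable, and each line ending in a class shows the prediction stored in a leaf.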

Random Forests

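A Random Forest is an ensemble that combines many Decision Trees to make predictions. Each Decision Tree is trained on a random sample of the data (a bootstrap sample, drawn with replacement), and a random subset of the input variables is considered at each split. For classification, the final prediction is made by combining the trees' outputs: Scikit-Learn averages the class probabilities predicted by the individual trees and picks the most probable class. For regression, the trees' predictions are averaged.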
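The way a forest combines its trees can be checked directly. The following sketch, with an arbitrary forest size and random_state chosen for illustration, averages the class probabilities of the individual trees by hand and compares the result with the forest's own prediction in Scikit-Learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(data.data, data.target)

# Class probabilities from every tree: shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(data.data) for tree in forest.estimators_])

# Average over the trees, then pick the most probable class for each sample.
averaged = per_tree.mean(axis=0)
manual_pred = forest.classes_[averaged.argmax(axis=1)]

print("Matches forest.predict:", np.array_equal(manual_pred, forest.predict(data.data)))
```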

Example

Let’s look at an example of how to use a Random Forest to predict the species of an iris flower. We will use the Iris dataset bundled with Scikit-Learn, the same data loaded in the Decision Tree example.

First, let’s load the dataset and split it into training and testing sets:

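from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the input variables (X) and the target labels (y)
data = load_iris()
X = data.data
y = data.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)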

Next, we will create a Random Forest classifier and fit it to the training data:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(X_train, y_train)

We can now use the trained classifier to predict the class of the test data:

y_pred = clf.predict(X_test)

Finally, we can evaluate the performance of the classifier using accuracy:

from sklearn.metrics import accuracy_score

print("Accuracy:", accuracy_score(y_test, y_pred))

Tuning Parameters

Random Forests have several parameters that can be tuned to improve their performance. Some of the important parameters include:

n_estimators: the number of Decision Trees in the forest
max_depth: the maximum depth of each Decision Tree
min_samples_split: the minimum number of samples required to split an internal node
min_samples_leaf: the minimum number of samples required to be at a leaf node

These parameters can be set when creating the classifier, for example:

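clf = RandomForestClassifier(n_estimators=100, max_depth=5, min_samples_split=10, min_samples_leaf=5)
clf.fit(X_train, y_train)  # a newly created classifier must be fitted before its trees or importances can be used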
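To get a feel for how one of these parameters affects performance, a simple option is to compare a few values with cross-validation. The sketch below varies n_estimators. The three forest sizes, the fixed random_state, and the 5 folds are arbitrary assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = load_iris()

# The candidate forest sizes are arbitrary values chosen for illustration.
mean_scores = {}
for n_trees in (1, 10, 100):
    model = RandomForestClassifier(n_estimators=n_trees, max_depth=5, random_state=0)
    # Mean accuracy over 5 cross-validation folds.
    scores = cross_val_score(model, data.data, data.target, cv=5)
    mean_scores[n_trees] = scores.mean()
    print(n_trees, "trees:", round(mean_scores[n_trees], 3))
```

The same loop can be reused for max_depth, min_samples_split, or min_samples_leaf by changing which argument is varied.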

Feature Importance

Random Forests can also be used to determine the importance of the input features. This can be useful for feature selection or understanding the underlying relationships in the data.

importances = clf.feature_importances_

The importances variable will contain an array with one value per input feature; higher values indicate more important features.
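The raw array is easier to read when each value is paired with its feature name. Here is a minimal, self-contained sketch that ranks the Iris features. The variable names, forest size, and random_state are assumptions for illustration. In Scikit-Learn these impurity-based importances are normalized so that they sum to 1.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Pair each feature name with its importance and sort from most to least important.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```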

Visualization

We can also visualize the Decision Trees in the Random Forest using the Graphviz library:

from sklearn.tree import export_graphviz
import graphviz

dot_data = export_graphviz(clf.estimators_[0], out_file=None, feature_names=data.feature_names, class_names=data.target_names)
graph = graphviz.Source(dot_data)
graph.render("tree")

This will create a visualization of the first Decision Tree in the Random Forest in the file “tree.pdf”.

Conclusion

In this article, we explored how to use Decision Trees and Random Forests in Python using the Scikit-Learn library. We looked at examples of how to create and tune classifiers, as well as how to visualize the models and determine feature importance. These algorithms are powerful tools for classification and regression tasks and can be used to make predictions in a wide range of applications.

