Machine Learning with Scikit‑Learn

Build and Evaluate Simple ML Models in Python

Learn how to build and evaluate simple machine learning models using Scikit‑Learn in Python. This tutorial provides practical examples and techniques for model training, prediction, and evaluation, all within a data science workflow.

Published

February 7, 2024

Modified

February 8, 2025

Keywords

Scikit-learn tutorial, machine learning in Python, build ML model, model evaluation scikit-learn, Python ML

Introduction

Scikit‑Learn is one of Python’s most popular libraries for machine learning, offering a wide range of tools for data mining, data analysis, and model building. In this tutorial, we’ll show you how to build and evaluate simple machine learning models using Scikit‑Learn. Whether you’re new to machine learning or looking to refresh your skills, this guide walks through model training, prediction, and evaluation as steps in a data science workflow.



Building a Simple Machine Learning Model

One of the fundamental tasks in machine learning is to create a model that can learn from data and make predictions. In this section, we’ll walk through the steps involved in building a simple linear regression model.

Example: Linear Regression

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create a synthetic dataset
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X.flatten() + np.random.randn(100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Output:
Mean Squared Error: 0.6536995137170021

This example demonstrates how to create a synthetic dataset, train a linear regression model, and evaluate its performance using the mean squared error metric.
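To relate the MSE figure to the fitted model itself, the sketch below refits the same synthetic data and inspects the learned intercept and slope, which should land near the values 4 and 3 used to generate `y`. It also recomputes the MSE by hand and adds RMSE and R² as complementary views. The variable names `intercept`, `slope`, `mse_manual`, `rmse`, and `r2` are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data as the example above: y = 4 + 3x + Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X.flatten() + np.random.randn(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Learned parameters; the data were generated with intercept 4 and slope 3
intercept = model.intercept_
slope = model.coef_[0]
print("Intercept:", intercept)
print("Slope:", slope)

# MSE is the mean of the squared residuals, so it can be checked by hand
mse_manual = np.mean((y_test - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_test, y_pred)

# RMSE is in the same units as y; R² is the share of variance explained
rmse = np.sqrt(mse_sklearn)
r2 = r2_score(y_test, y_pred)

print("MSE (manual):", mse_manual)
print("MSE (sklearn):", mse_sklearn)
print("RMSE:", rmse)
print("R²:", r2)
```

Because the noise term has a standard deviation of 1, an MSE well below 1 on such a small test set is plausible but depends on the particular split, so treat any single number as an estimate.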

Evaluating Your Model

Once your model is built, it’s crucial to evaluate its performance to ensure it generalizes well to new data. Key evaluation steps include:

  • Data Splitting:
    Use techniques like train/test split or cross-validation to partition your dataset.

  • Performance Metrics:
    Choose metrics that match the task: accuracy, precision, and recall for classification; mean squared error and R² for regression.

  • Validation:
    Compare your model’s predictions against held-out data it never saw during training to estimate how well it generalizes.
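The data-splitting and metric steps above can be combined with k-fold cross-validation, sketched here on the same kind of synthetic regression data used earlier. This is one possible setup, not the only one: the choice of 5 folds, the shuffling, and the names `cv`, `r2_scores`, and `mse_scores` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data: y = 4 + 3x + Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X.flatten() + np.random.randn(100)

# 5 shuffled folds: each sample is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()

# R² per fold (higher is better)
r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# scikit-learn scorers follow a "higher is better" convention, so MSE is
# exposed as its negative; flip the sign to recover the usual MSE
neg_mse_scores = cross_val_score(
    model, X, y, cv=cv, scoring="neg_mean_squared_error"
)
mse_scores = -neg_mse_scores

print("R² per fold:", r2_scores)
print("Mean R²:", r2_scores.mean())
print("MSE per fold:", mse_scores)
print("Mean MSE:", mse_scores.mean())
```

Averaging over folds gives a steadier estimate than a single train/test split, and the spread across folds hints at how sensitive the score is to the particular partition.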

Conclusion

Scikit‑Learn gives every model the same small interface, so the workflow in this tutorial (data preparation, model training, prediction, and evaluation) carries over from one algorithm to the next with few changes. Experiment with different algorithms and evaluation metrics to further refine your models.
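As one way to experiment with a different algorithm and with the classification metrics mentioned earlier, the sketch below trains a logistic regression classifier on a synthetic dataset. The dataset parameters, the use of `LogisticRegression`, and the variable names are assumptions chosen for illustration; they do not come from the original tutorial.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (hypothetical settings)
X, y = make_classification(
    n_samples=200,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42,
)

# stratify=y keeps the class balance similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same fit/predict pattern as the linear regression example
clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy: share of correct predictions
# Precision: of the predicted positives, how many are truly positive
# Recall: of the true positives, how many were found
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

Only the estimator and the metric functions changed relative to the regression example; the split, fit, and predict steps are the same.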

Happy coding, and enjoy building your machine learning models with Scikit‑Learn!

Citation

BibTeX citation:
@online{kassambara2024,
  author = {Kassambara, Alboukadel},
  title = {Machine {Learning} with {Scikit‑Learn}},
  date = {2024-02-07},
  url = {https://www.datanovia.com/learn/programming/python/data-science/machine-learning-with-scikit-learn.html},
  langid = {en}
}
For attribution, please cite this work as:
Kassambara, Alboukadel. 2024. “Machine Learning with Scikit‑Learn.” February 7, 2024. https://www.datanovia.com/learn/programming/python/data-science/machine-learning-with-scikit-learn.html.