Whoever wants to learn machine learning or become a data scientist, the most obvious thing to learn first time is linear regression. Linear regression is the simplest machine learning algorithm and it is generally used for forecasting. The goal of linear regression is to find a relationship between one or more independent variables and a dependent variable by fitting the best line. This best fit line is known as regression line and defined by a linear equation Y= a *X + b.

For instance, in the case of the height of children vs their age. After collecting the data of children height and their age in months, we can plot the data in a scatter plot such as in Figure below.

Linear regression will find the relationship between age as the independent variable and height as the dependent variable. Linear regression will find the best fit line from all points on the scatter plot. Finally, it can be used as a prediction, for instance, to predict what height the children when his age enter 35 months?

How to implement this linear regression in Python?

First, to make easier, I will generate a random dataset for our experiment.

import pandas as pd
import numpy as np
np.random.seed(0)
x = np.random.rand(100, 1) #generate random number for x variable
y = 2 + 3 * x + np.random.rand(100, 1) # generate random number of y variable
x[:10], y[:10] #Show the first 10 rows of each x and y

There are many ways to build a regression model, we can build it from scratch or just use the library from Python. In this example, I use scikit-learn

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from matplotlib import pyplot as plt
# Initialize the model
model = LinearRegression()
# Train the model - fit the data to the model
model.fit(x, y)
# Predict
y_predicted = model.predict(x)
# model evaluation
rmse = mean_squared_error(y, y_predicted)
r2 = r2_score(y, y_predicted)
# printing values
print('Slope:' ,model.coef_)
print('Intercept:', model.intercept_)
print('Root mean squared error: ', rmse)
print('R2 score: ', r2)
# plotting values
plt.scatter(x, y, s=5)
plt.xlabel('x')
plt.ylabel('y')
# predicted values
plt.plot(x, y_predicted, color='r')
plt.show()