# 5.1. Linear Regression

Todo

Make this section use the _template.rst template.

## 5.1.1. What is Linear Regression?

Linear regression fits a straight line that minimizes the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation.

The coefficients, the residual sum of squares and the variance score are also calculated.

Figure 162. A straight line fitted by linear regression, minimizing the residual sum of squares between the observed responses and the responses predicted by the linear approximation.
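The objective described above can be written compactly. For data points $(x_i, y_i)$, ordinary least squares chooses the intercept $w_0$ and slope $w_1$ that minimize the residual sum of squares:

$$\min_{w_0, w_1} \sum_{i=1}^{n} \left( y_i - (w_0 + w_1 x_i) \right)^2$$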

## 5.1.2. Before applying

• Remove outliers first.

• Check whether the data form separate clusters, i.e. whether the relationship is piecewise linear; if so, splitting the data into segments and fitting a separate regression to each segment gives a better fit.
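A minimal sketch of the first step, removing outliers before fitting. This uses a simple z-score filter on hypothetical example data; the threshold of 2 standard deviations is an assumption, not a rule from the text:

```python
import numpy as np

# Hypothetical 1-D sample with one obvious outlier
data = np.array([10.2, 9.8, 10.5, 9.9, 10.1, 55.0])

# Keep only points within 2 standard deviations of the mean (z-score filter)
z_scores = np.abs((data - data.mean()) / data.std())
cleaned = data[z_scores < 2]

print(cleaned)  # the 55.0 reading is dropped
```

In practice the threshold depends on the dataset; robust alternatives (e.g. filtering on the interquartile range) behave better when outliers distort the standard deviation itself.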

## 5.1.3. Determining the equation of the line

Figure 163. Manipulating the parameters of the line (the classifier) to determine the function.

Figure 164. Determining the equation of the line.

Todo

Terms to define in the glossary:

• Loss Function

• Parameters

• Overshoot

• Undershoot

• Goldilocks

• Chain Rule

• Weight

• Computation Graph

• Forward Propagation

• Backpropagation
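Most of the terms above can be illustrated on the line-fitting problem itself. This is a hedged sketch on made-up toy data: the forward pass computes predictions and the MSE loss, the gradients follow from the chain rule, and the learning rate controls overshoot vs. undershoot (the "Goldilocks" value lies in between):

```python
import numpy as np

# Toy data lying roughly on y = 2x + 1 (hypothetical example values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w, b = 0.0, 0.0       # parameters (weights) of the line y = w*x + b
learning_rate = 0.05  # too large -> overshoot, too small -> undershoot

for epoch in range(500):
    # Forward propagation: predictions and the loss function (MSE)
    y_pred = w * x + b
    loss = np.mean((y_pred - y) ** 2)
    # Backpropagation: gradients of the loss w.r.t. w and b (chain rule)
    grad_w = np.mean(2 * (y_pred - y) * x)
    grad_b = np.mean(2 * (y_pred - y))
    # Gradient descent step
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.2f}, b = {b:.2f}, loss = {loss:.4f}")
```

After training, `w` and `b` approach the least-squares solution for this data (roughly 2 and 1). Raising the learning rate above a critical value makes the updates diverge instead of converge.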


## 5.1.4. Piecewise linear function

Figure 165. Piecewise linear function.

Using the sklearn library:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model

# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Use only one feature
diabetes_features = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
features_train = diabetes_features[:-20]
features_test = diabetes_features[-20:]

# Split the targets into training/testing sets
labels_train = diabetes.target[:-20]
labels_test = diabetes.target[-20:]

# Create linear regression object
model = linear_model.LinearRegression()

# Train the model using the training sets
model.fit(features_train, labels_train)

# The coefficients
print(f'Coefficients: \n{model.coef_}')

# The mean squared error
print("Mean squared error: %.2f"
      % np.mean((model.predict(features_test) - labels_test) ** 2))

# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % model.score(features_test, labels_test))

# Plot outputs
plt.scatter(features_test, labels_test, color='black')
plt.plot(features_test, model.predict(features_test), color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()
```

Output:

```
Coefficients: [ 938.23786125]
Mean squared error: 2548.07
Variance score: 0.4
```


Figure 166. A straight line fitted by linear regression, minimizing the residual sum of squares between the observed responses and the responses predicted by the linear approximation.

### 5.1.5.1. Custom implementation

The listing below reconstructs the stripped function bodies from the surviving docstrings and comments; the helper names (`cal_mean`, `cal_variance`, `cal_covariance`) and the CSV column layout (square feet first, price second) are inferred from the `simple_linear_regression` fragments:

```python
import pandas as pd
from math import sqrt


def cal_mean(readings):
    """
    Function to calculate the mean value of the input readings
    """
    mean = sum(readings) / float(len(readings))
    return mean


def cal_variance(readings):
    """
    Calculating the variance of the readings
    """
    # To calculate the variance we need the mean value
    # Calculating the mean value from the cal_mean function
    readings_mean = cal_mean(readings)
    variance = sum(pow(reading - readings_mean, 2) for reading in readings)
    return variance / float(len(readings) - 1)


def cal_covariance(readings_1, readings_2):
    """
    Calculate the covariance between two different list of readings
    """
    readings_1_mean = cal_mean(readings_1)
    readings_2_mean = cal_mean(readings_2)
    readings_size = len(readings_1)
    covariance = 0.0
    for i in range(readings_size):
        covariance += (readings_1[i] - readings_1_mean) * (readings_2[i] - readings_2_mean)
    return covariance / float(readings_size - 1)


def cal_simple_linear_regression_coefficients(x_readings, y_readings):
    """
    Calculating the simple linear regression coefficients (B0, B1)
    """
    # Directly calling the implemented covariance and the variance functions
    # To calculate the coefficient B1
    b1 = cal_covariance(x_readings, y_readings) / float(cal_variance(x_readings))
    # Coefficient B0 = mean of y_readings - ( B1 * the mean of the x_readings )
    b0 = cal_mean(y_readings) - (b1 * cal_mean(x_readings))
    return b0, b1


def predict_target_value(x, b0, b1):
    """
    Calculating the target (y) value using the input x and the coefficients b0, b1
    """
    return b0 + b1 * x


def cal_rmse(actual_readings, predicted_readings):
    """
    Calculating the root mean square error
    """
    square_error_total = 0.0
    for actual, predicted in zip(actual_readings, predicted_readings):
        error = predicted - actual
        square_error_total += pow(error, 2)
    rmse = sqrt(square_error_total / len(actual_readings))
    return rmse


def simple_linear_regression(dataset):
    """
    Implementing simple linear regression without using any python library
    """
    # Get the dataset header names
    dataset_headers = list(dataset.columns.values)

    # Calculating the mean of the square feet and the price readings
    square_feet_mean = cal_mean(dataset[dataset_headers[0]].values)
    price_mean = cal_mean(dataset[dataset_headers[1]].values)

    square_feet_variance = cal_variance(dataset[dataset_headers[0]].values)
    covariance_of_price_and_square_feet = cal_covariance(
        dataset[dataset_headers[0]].values, dataset[dataset_headers[1]].values)

    # Calculating the regression
    w1 = covariance_of_price_and_square_feet / float(square_feet_variance)
    w0 = price_mean - (w1 * square_feet_mean)

    # Predictions
    dataset['Predicted_Price'] = w0 + w1 * dataset[dataset_headers[0]]


if __name__ == "__main__":
    input_path = '../_data/input-data.csv'
    house_price_dataset = pd.read_csv(input_path)
    simple_linear_regression(house_price_dataset)
```
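The two coefficient formulas can also be checked by hand without the CSV file. This sketch applies B1 = covariance(x, y) / variance(x) and B0 = mean(y) - B1 * mean(x) to made-up square-feet/price pairs (the numbers are illustrative, not from the course data):

```python
# Assumed toy data: square feet vs. price in thousands (hypothetical values)
x = [1000, 1500, 2000, 2500, 3000]
y = [150, 210, 290, 350, 420]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Sample covariance and variance (dividing by n - 1, as in the listing above)
covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
variance = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)

# B1 = covariance(x, y) / variance(x); B0 = mean(y) - B1 * mean(x)
b1 = covariance / variance
b0 = y_mean - b1 * x_mean

print(f"price = {b0:.2f} + {b1:.4f} * square_feet")
```

For this data the slope comes out to 0.136 and the intercept to 12, so each additional square foot adds about 0.136 (thousand) to the predicted price.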


## 5.1.6. Assignments

### 5.1.6.1. Least squares regression, 3 points

• Complexity level: easy

• Lines of code to write: 10 lines

• Estimated time of completion: 15 min

1. Consider the following set of points: $$\{(-2, -1), (1, 1), (3, 2)\}$$

2. Find the least square regression line for the given data points.

3. Plot the given points and the regression line in the same rectangular system of axes.

4. Write your own code implementing the solution.

### 5.1.6.2. Least squares regression, 4 points

• Complexity level: easy

• Lines of code to write: 10 lines

• Estimated time of completion: 15 min

1. Find the least squares regression line for the following set of data: $$\{(-1, 0), (0, 2), (1, 4), (2, 5)\}$$

2. Plot the given points and the regression line in the same rectangular system of axes.

3. Use the code from the custom implementation example to solve it.

### 5.1.6.3. Company sales

• Complexity level: easy

• Lines of code to write: 10 lines

• Estimated time of completion: 15 min

The sales of a company (in million dollars) for each year are shown in the table below.

| x (year)  | 2005 | 2006 | 2007 | 2008 | 2009 |
|-----------|------|------|------|------|------|
| y (sales) | 12   | 19   | 29   | 37   | 45   |

1. Find the least squares regression line $$y = ax + b$$.

2. Use the least squares regression line as a model to estimate the sales of the company in 2012.

3. Use the sklearn library.