Learning PyTorch

Using this notebook to learn PyTorch and its applications
Machine Learning
PyTorch
Deep Learning
Author

Daniel Fat

Published

February 3, 2020

Sources:

  • PyTorch Starter
  • PyTorch Tutorial for Deep Learning Lovers
  • Handwritten Digit Recognition
  • PyTorch basics - Linear Regression from scratch
  • Implement Linear Regression on Boston Housing Dataset by PyTorch

What is PyTorch?

It’s a Python-based scientific computing package targeted at two audiences:

  • a replacement for NumPy that uses the power of GPUs
  • a deep learning research platform that provides maximum flexibility and speed

From NumPy to PyTorch

Tensors

Tensors are similar to NumPy’s ndarrays, with the addition being that Tensors can also be used on a GPU to accelerate computing.
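
For example, a tensor can be created on the CPU and moved to the GPU when one is available; a minimal sketch (it falls back to the CPU otherwise):

Code
import torch

# Pick the GPU if available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3, 3)   # created on the CPU by default
x = x.to(device)       # moved to the selected device
print(x.device)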

NumPy to Torch

  • np.array() == torch.Tensor()
  • np.ones() == torch.ones()
  • np.random.rand() == torch.rand()
  • type(array) == tensor.type()
  • np.shape(array) == tensor.shape
  • np.reshape(array,size) == tensor.view(size)
  • np.add(x,y) == torch.add(x,y)
  • np.subtract(x,y) == torch.sub(x,y)
  • np.multiply(x,y) == torch.mul(x,y)
  • np.divide(x,y) == torch.div(x,y)
  • np.mean(array) == tensor.mean()
  • np.std(array) == tensor.std()

We can also convert arrays between NumPy and PyTorch.
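
A minimal sketch of the round trip (note that torch.from_numpy shares memory with the source array):

Code
import numpy as np
import torch

array = np.ones((2, 3))

# NumPy -> PyTorch
tensor = torch.from_numpy(array)

# PyTorch -> NumPy
back_to_numpy = tensor.numpy()

print(type(tensor), type(back_to_numpy))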

Variables

A Variable wraps a Tensor. It supports nearly all the APIs defined by a Tensor, and also provides a backward method to perform backpropagation. (Note that since PyTorch 0.4, Variable has been merged into Tensor: a Tensor created with requires_grad=True behaves the same way.)

For example, to backpropagate a loss function to train model parameter x, we use a variable loss to store the value computed by the loss function.

Then, we call loss.backward(), which computes the gradients for all trainable parameters.

PyTorch stores the gradient results back in the corresponding variable x.

Autograd is the PyTorch package for automatic differentiation of all operations on Tensors. It performs backpropagation starting from a variable.

In deep learning, this variable often holds the value of the cost function, and backward() executes the backward pass and computes all the gradients automatically.
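
A minimal sketch of this flow, using a small illustrative tensor as the trainable parameter:

Code
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # trainable parameter
y = torch.tensor([2.0, 4.0, 6.0])                       # target values

loss = ((x * 2 - y) ** 2).mean()  # scalar cost value
loss.backward()                   # backward pass: compute gradients

print(x.grad)  # gradients are stored back on x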

Layers

Imports

Code
import torch
import pandas as pd
import numpy as np
import os,re,sys,io
from matplotlib import pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from time import time
import seaborn as sns

scaler = MinMaxScaler()
plt.style.use('seaborn')

Linear Regression

Boston Dataset

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. ‘Hedonic prices and the demand for clean air’, J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, ‘Regression diagnostics …’, Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

  • CRIM per capita crime rate by town

  • ZN proportion of residential land zoned for lots over 25,000 sq.ft.

  • INDUS proportion of non-retail business acres per town

  • CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

  • NOX nitric oxides concentration (parts per 10 million)

  • RM average number of rooms per dwelling

  • AGE proportion of owner-occupied units built prior to 1940

  • DIS weighted distances to five Boston employment centres

  • RAD index of accessibility to radial highways

  • TAX full-value property-tax rate per $10,000

  • PTRATIO pupil-teacher ratio by town

  • B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

  • LSTAT % lower status of the population

  • MEDV Median value of owner-occupied homes in $1000's

Code
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT PRICE
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88 11.9

506 rows × 14 columns

It can be seen that the value ranges of the features differ widely, so we need to standardize the data.

  • Suppose each feature has a mean value μ and a standard deviation σ on the whole dataset. We can then subtract μ from each value of the feature and divide by σ to get the normalized value of each feature. (Tutorial approach; see the sketch after this list.)

  • Another option is to use the MinMaxScaler from sklearn
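
A minimal sketch of the mean/std approach on the feature columns (applied to a copy here, as an alternative to the MinMaxScaler used below):

Code
# Standardize each feature column to zero mean and unit variance (sketch)
features = boston.columns[:-1]   # all columns except PRICE
boston_std = boston.copy()
boston_std[features] = (boston[features] - boston[features].mean()) / boston[features].std()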

Code
# apply the min max scaling for each column but not PRICE
for col in boston.columns[:-1]:
  boston[[col]] = scaler.fit_transform(boston[[col]])

PyTorch Linear Regression

Then we split the data into train/test while casting it to NumPy arrays

Code
X_train, X_test, y_train, y_test = train_test_split(boston[boston_dataset.get('feature_names')].to_numpy(), boston['PRICE'].to_numpy(), test_size=0.3, random_state=42)

Next we cast the split data to tensors

Code
X_train = torch.tensor(X_train, dtype=torch.float)
X_test = torch.tensor(X_test, dtype=torch.float)

y_train = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test = torch.tensor(y_test, dtype=torch.float).view(-1, 1)

Linear Regression Model

Code
w_num = X_train.shape[1]
net = torch.nn.Sequential( # sequential layer
    torch.nn.Linear(w_num, 1) # linear layer
)
torch.nn.init.normal_(net[0].weight, mean=0, std=0.1)
torch.nn.init.constant_(net[0].bias, val=0)
Parameter containing:
tensor([0.], requires_grad=True)

Dataset Processor

Code
dataset = torch.utils.data.TensorDataset(X_train, y_train)

Data Loader

At the heart of PyTorch data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset, with support for

  • map-style and iterable-style datasets
  • customizing data loading order
  • automatic batching
  • single- and multi-process data loading
  • automatic memory pinning
Code
train_iter = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)
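
Each iteration over train_iter then yields a mini-batch of 10 rows; a quick illustrative check:

Code
features_batch, labels_batch = next(iter(train_iter))
print(features_batch.shape, labels_batch.shape)  # torch.Size([10, 13]) torch.Size([10, 1])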

Loss Function

We can compare the predictions with the actual targets, using the following method:

Calculate the difference between the two matrices (preds and targets). Square all elements of the difference matrix to remove negative values. Calculate the average of the elements in the resulting matrix. The result is a single number, known as the mean squared error (MSE).

Code
loss = torch.nn.MSELoss()
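
As a quick sanity check, the manual computation described above matches torch.nn.MSELoss; a minimal sketch with illustrative values:

Code
preds = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])

manual_mse = ((preds - targets) ** 2).mean()            # difference, square, average
print(manual_mse.item(), loss(preds, targets).item())   # both give the same value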

Optimizers

Rather than manually updating the weights of the model, we use the optim package to define an Optimizer that will update the weights for us.

All available optimizers are listed in the torch.optim package.

For this model the optimizer is SGD (Stochastic Gradient Descent).
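
For reference, the update that the optimizer performs under the hood looks roughly like this (a minimal sketch; lr is a placeholder learning rate, and gradients are assumed to have been computed already):

Code
# Sketch of a manual SGD step (what optimizer.step() does for us)
lr = 0.05
with torch.no_grad():
    for param in net.parameters():
        if param.grad is not None:
            param -= lr * param.grad   # gradient descent update
            param.grad.zero_()         # reset gradients for the next batch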

Code
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)

Training

  • For a number of epochs
    • Get each batch of features and labels
      • Use the model to predict the output from the features
      • Calculate the loss between the output and the real labels
      • Set the optimizer’s gradients to 0
      • l.backward() backpropagates the loss to every parameter that requires gradients
      • optimizer.step() updates the weights
Code
num_epochs = 375

for epoch in range(num_epochs):

    for x, y in train_iter:

        output = net(x)

        l = loss(output, y)

        optimizer.zero_grad()

        l.backward()

        optimizer.step()

    if epoch % 25 == 0:
      
      print("epoch {} loss: {:.4f}".format(epoch, l.item()))
epoch 0 loss: 43.4257
epoch 25 loss: 40.3690
epoch 50 loss: 21.1435
epoch 75 loss: 21.3177
epoch 100 loss: 7.8272
epoch 125 loss: 176.3772
epoch 150 loss: 51.4810
epoch 175 loss: 15.8238
epoch 200 loss: 19.9211
epoch 225 loss: 5.0811
epoch 250 loss: 17.8753
epoch 275 loss: 17.3649
epoch 300 loss: 8.3193
epoch 325 loss: 11.2076
epoch 350 loss: 13.8668

Metrics

Code
pred = pd.DataFrame({
    'true': [x[0].tolist() for x in y_test],
    'predicted': [x[0].tolist() for x in net(X_test)],
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f'MSE: {loss(net(X_test), y_test).item():.4f}')

GLM Linear Regression

Code
import statsmodels.api as sm
import patsy

Load the data

Code
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT PRICE
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88 11.9

506 rows × 14 columns

We split the data into train/test again, but this time a bit differently

Code
train,test = train_test_split(boston, test_size=0.3, random_state=42)

Then we create the R-style formula for our design matrices

Code
formula = f"PRICE ~ {' + '.join(boston.columns[:-1])}"
formula
'PRICE ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT'

Pass the formula to the dmatrices function from patsy for both the train and test DataFrames

Code
y_train, X_train = patsy.dmatrices(formula, train, return_type='matrix')
y_test, X_test = patsy.dmatrices(formula, test, return_type='matrix')

Generalized Linear Model (GLM)

Using Gaussian Distribution

Code
glm_gaussian = sm.GLM(y_train,X_train,data=train, family=sm.families.Gaussian(sm.families.links.log())).fit()
print(glm_gaussian.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  PRICE   No. Observations:                  354
Model:                            GLM   Df Residuals:                      340
Model Family:                Gaussian   Df Model:                           13
Link Function:                    log   Scale:                          17.746
Method:                          IRLS   Log-Likelihood:                -1004.4
Date:                Mon, 08 Jun 2020   Deviance:                       6033.6
Time:                        07:54:17   Pearson chi2:                 6.03e+03
No. Iterations:                     8                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.5788      0.230     15.530      0.000       3.127       4.030
CRIM          -0.0087      0.003     -2.878      0.004      -0.015      -0.003
ZN             0.0008      0.001      1.464      0.143      -0.000       0.002
INDUS          0.0034      0.003      1.305      0.192      -0.002       0.008
CHAS           0.0648      0.030      2.142      0.032       0.006       0.124
NOX           -0.5497      0.196     -2.802      0.005      -0.934      -0.165
RM             0.1375      0.017      8.169      0.000       0.105       0.170
AGE           -0.0001      0.001     -0.218      0.827      -0.001       0.001
DIS           -0.0476      0.009     -5.450      0.000      -0.065      -0.031
RAD            0.0138      0.003      4.025      0.000       0.007       0.021
TAX           -0.0004      0.000     -2.333      0.020      -0.001   -6.53e-05
PTRATIO       -0.0343      0.006     -6.234      0.000      -0.045      -0.024
B              0.0006      0.000      2.828      0.005       0.000       0.001
LSTAT         -0.0368      0.003    -12.897      0.000      -0.042      -0.031
==============================================================================

Metrics

Code
pred = pd.DataFrame({
    'true': [x[0] for x in y_test.tolist()],
    'predicted': glm_gaussian.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")

Using Gamma Distribution

Code
glm_gamma = sm.GLM(y_train,X_train,data=train, family=sm.families.Gamma(sm.families.links.log())).fit()
print(glm_gamma.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  PRICE   No. Observations:                  354
Model:                            GLM   Df Residuals:                      340
Model Family:                   Gamma   Df Model:                           13
Link Function:                    log   Scale:                        0.040116
Method:                          IRLS   Log-Likelihood:                -997.01
Date:                Mon, 08 Jun 2020   Deviance:                       12.717
Time:                        07:54:34   Pearson chi2:                     13.6
No. Iterations:                    19                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.9604      0.250     15.820      0.000       3.470       4.451
CRIM          -0.0105      0.002     -6.236      0.000      -0.014      -0.007
ZN             0.0010      0.001      1.387      0.165      -0.000       0.002
INDUS          0.0027      0.003      0.889      0.374      -0.003       0.009
CHAS           0.1144      0.043      2.670      0.008       0.030       0.198
NOX           -0.6999      0.196     -3.564      0.000      -1.085      -0.315
RM             0.0902      0.021      4.399      0.000       0.050       0.130
AGE           -0.0002      0.001     -0.311      0.756      -0.002       0.001
DIS           -0.0495      0.010     -4.951      0.000      -0.069      -0.030
RAD            0.0120      0.003      3.546      0.000       0.005       0.019
TAX           -0.0005      0.000     -2.361      0.018      -0.001   -7.64e-05
PTRATIO       -0.0363      0.006     -5.693      0.000      -0.049      -0.024
B              0.0006      0.000      4.484      0.000       0.000       0.001
LSTAT         -0.0298      0.002    -12.150      0.000      -0.035      -0.025
==============================================================================

Metrics

Code
pred = pd.DataFrame({
    'true': [x[0] for x in y_test.tolist()],
    'predicted': glm_gamma.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")

XGBoost Linear Regression

Code
import xgboost as xgb

Load the data

Code
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT PRICE
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 2.1675 1.0 273.0 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 21.0 396.90 7.88 11.9

506 rows × 14 columns

Train/Test Split

Code
X_train, X_test, y_train, y_test = train_test_split(boston[boston_dataset.get('feature_names')], boston['PRICE'], test_size=0.3, random_state=42)

Train a regression model using reg:gamma as the objective, i.e. a Gamma distribution

Code
reg = xgb.XGBRegressor(n_estimators=1000,objective='reg:gamma')
reg.fit(X_train, y_train,eval_set=[(X_test, y_test)],eval_metric='rmse',verbose=100)
[0] validation_0-rmse:22.5723
[100]   validation_0-rmse:3.35307
[200]   validation_0-rmse:3.17188
[300]   validation_0-rmse:3.14021
[400]   validation_0-rmse:3.12192
[500]   validation_0-rmse:3.1099
[600]   validation_0-rmse:3.11599
[700]   validation_0-rmse:3.12263
[800]   validation_0-rmse:3.12669
[900]   validation_0-rmse:3.13014
[999]   validation_0-rmse:3.13269
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=1000,
             n_jobs=1, nthread=None, objective='reg:gamma', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)

Metrics

Code
pred = pd.DataFrame({
    'true': y_test.values,
    'predicted': reg.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")

Classification

MNIST Dataset

Training: 60,000 samples

Test: 10,000 samples

Image Dimension: 28x28

Labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Dataset Original Link

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems

Code
from torchvision import datasets, transforms

Compose a transform to normalize the data

  • transforms.ToTensor() converts the image into numbers that the system can work with. It splits the image into its color channels (red, green & blue for RGB images; MNIST images have a single grayscale channel), reads each pixel’s value as a brightness between 0 and 255, and then scales these values down to a range between 0 and 1.

  • transforms.Normalize() normalizes the tensor with a mean and standard deviation which goes as the two parameters respectively.

Code
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])

Download the dataset and normalize it

Code
# Download and load the training data
trainset = datasets.MNIST('drive/My Drive/mnist/MNIST_data/', download=True, train=True, transform=transform)
valset = datasets.MNIST('drive/My Drive/mnist/MNIST_data/', download=True, train=False, transform=transform)

Load the train/test into a DataLoader

Code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, shuffle=True)
Code
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)
<class 'torch.Tensor'>
torch.Size([64, 1, 28, 28])
torch.Size([64])
Code
def plot_digit(digit,**fig_params):
  plt.figure(**fig_params)
  plt.axis('off')
  plt.imshow(digit.numpy().squeeze(), cmap='gray_r')
Code
plot_digit(images[13],figsize=(1,1))

PyTorch Classification

Then we define a Sequential model with three Linear layers, which apply linear transformations, interleaved with ReLU activations, which apply the rectified linear unit; the output of this chain of transformations is passed into a LogSoftmax activation function

Code
# Layer details for the neural network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = torch.nn.Sequential(torch.nn.Linear(input_size, hidden_sizes[0]),
                      torch.nn.ReLU(),
                      torch.nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      torch.nn.ReLU(),
                      torch.nn.Linear(hidden_sizes[1], output_size),
                      torch.nn.LogSoftmax(dim=1))
print(model)
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=64, bias=True)
  (3): ReLU()
  (4): Linear(in_features=64, out_features=10, bias=True)
  (5): LogSoftmax()
)

Our loss function is the negative log-likelihood loss (NLLLoss). Combined with the LogSoftmax layer at the end of the model, this is equivalent to applying CrossEntropyLoss to the raw logits.

Code
loss = torch.nn.NLLLoss()
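
A quick sanity check of that equivalence, using random illustrative logits (a minimal sketch):

Code
# LogSoftmax + NLLLoss gives the same value as CrossEntropyLoss on raw logits
logits = torch.randn(4, 10)              # 4 illustrative samples, 10 classes
targets = torch.randint(0, 10, (4,))
print(loss(torch.nn.LogSoftmax(dim=1)(logits), targets).item(),
      torch.nn.CrossEntropyLoss()(logits, targets).item())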

For this model the optimizer is SGD (Stochastic Gradient Descent), this time with momentum

Code
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

This time, during training, we reshape each 28x28 image matrix into a flat 784-element vector

Code
time0 = time()
epochs = 15
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1)
    
        optimizer.zero_grad()
        
        output = model(images)
        
        l = loss(output, labels)
        
        l.backward()

        optimizer.step()
        
        running_loss += l.item()
    else:
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))
print("\nTraining Time (in minutes) =",(time()-time0)/60)
Epoch 0 - Training loss: 0.6175047809492423
Epoch 1 - Training loss: 0.27926216799535475
Epoch 2 - Training loss: 0.21707710018877918
Epoch 3 - Training loss: 0.17828493828434488
Epoch 4 - Training loss: 0.14850337554349194
Epoch 5 - Training loss: 0.12661053494476815
Epoch 6 - Training loss: 0.11251892998659693
Epoch 7 - Training loss: 0.10000329123718589
Epoch 8 - Training loss: 0.08876785766164552
Epoch 9 - Training loss: 0.08140811096054754
Epoch 10 - Training loss: 0.07434628869015683
Epoch 11 - Training loss: 0.06872579670681962
Epoch 12 - Training loss: 0.06227882651151466
Epoch 13 - Training loss: 0.05694495400846767
Epoch 14 - Training loss: 0.05275964385930147

Training Time (in minutes) = 3.717697087923686

Validation Process

Code
y_true,y_pred = list(),list()
correct_count, all_count = 0, 0
for images,labels in valloader:
  for i in range(len(labels)):
    img = images[i].view(1, 784)
    # Turn off gradients to speed up this part
    with torch.no_grad():
        logps = model(img)

    # The output of the network is log-probabilities; take the exponential to get probabilities
    ps = torch.exp(logps)
    probab = list(ps.numpy()[0])
    pred_label = probab.index(max(probab))
    true_label = labels.numpy()[i]
    y_true.append(true_label)
    y_pred.append(pred_label)
    if(true_label == pred_label):
      correct_count += 1
    all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
Number Of Images Tested = 10000

Model Accuracy = 0.9745

Classification Report

Code
print(metrics.classification_report(y_true=y_true,y_pred=y_pred))
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.98      0.99      0.99      1135
           2       0.98      0.97      0.97      1032
           3       0.97      0.98      0.97      1010
           4       0.97      0.97      0.97       982
           5       0.97      0.96      0.97       892
           6       0.98      0.97      0.98       958
           7       0.96      0.98      0.97      1028
           8       0.98      0.97      0.97       974
           9       0.98      0.95      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000

Confusion Matrix

Code
pd.DataFrame(data=metrics.confusion_matrix(y_true=y_true,y_pred=y_pred))
0 1 2 3 4 5 6 7 8 9
0 968 0 0 1 0 4 2 2 2 1
1 0 1124 3 2 0 1 1 1 3 0
2 4 1 1005 5 3 0 1 9 4 0
3 0 0 3 991 0 5 0 6 4 1
4 0 0 7 1 953 0 1 6 1 13
5 4 2 0 11 2 860 6 1 2 4
6 5 4 1 0 4 7 932 0 5 0
7 1 6 7 1 0 0 0 1010 0 3
8 3 2 4 4 4 3 3 3 947 1
9 2 5 0 7 14 3 1 19 3 955
Code
plt.figure(figsize=(15,10))
sns.heatmap(pd.DataFrame(data=metrics.confusion_matrix(y_true=y_true,y_pred=y_pred),columns=list(range(10)),index=list(range(10))),cmap='Blues',annot=True, fmt="d")

XGBoost Classification

Sources:

  • Ensemble Learning case study: Running XGBoost on Google Colab free GPU
  • Multiclass & Multilabel Classification with XGBoost

Code
import xgboost as xgb

Extract the train/test data from the MNIST datasets as flattened NumPy arrays

Code
X_train = np.array([x.flatten() for x in trainset.data.numpy()])
y_train = np.array([x.flatten() for x in trainset.targets.numpy()])

X_test = np.array([x.flatten() for x in valset.data.numpy()])
y_test = np.array([x.flatten() for x in valset.targets.numpy()])

For the XGBClassifier, the multi:softmax objective is used to enable training on a multi-class classification problem

Code
%%time
xgc = xgb.XGBClassifier(n_jobs=-1,objective='multi:softmax',num_class=10 ,max_depth=4)
xgc.fit(X_train,y_train.ravel())
CPU times: user 2min 48s, sys: 1min 55s, total: 4min 44s
Wall time: 4min 44s

Classification Report

Code
preds = pd.DataFrame({
    'true': y_test.ravel(),
    'preds': xgc.predict(X_test)
})
print(metrics.classification_report(y_true=preds['true'],y_pred=preds['preds']))
              precision    recall  f1-score   support

           0       0.96      0.99      0.97       980
           1       0.98      0.99      0.99      1135
           2       0.95      0.95      0.95      1032
           3       0.96      0.95      0.95      1010
           4       0.96      0.94      0.95       982
           5       0.96      0.95      0.95       892
           6       0.97      0.97      0.97       958
           7       0.96      0.94      0.95      1028
           8       0.94      0.94      0.94       974
           9       0.91      0.94      0.93      1009

    accuracy                           0.95     10000
   macro avg       0.95      0.95      0.95     10000
weighted avg       0.95      0.95      0.95     10000

Confusion Matrix

Code
pd.DataFrame(data=metrics.confusion_matrix(y_true=preds['true'],y_pred=preds['preds']))
0 1 2 3 4 5 6 7 8 9
0 969 0 1 0 0 1 3 1 4 1
1 0 1123 2 1 0 1 4 1 3 0
2 9 0 978 12 9 0 1 11 10 2
3 2 0 14 956 0 11 2 9 8 8
4 2 0 4 0 922 1 7 1 5 40
5 5 3 2 13 1 843 6 4 10 5
6 7 3 0 0 5 11 925 0 7 0
7 2 6 24 3 3 1 0 963 6 20
8 5 2 5 5 4 5 6 7 918 17
9 8 7 2 8 15 5 0 6 7 951
Code
plt.figure(figsize=(15,10))
sns.heatmap(pd.DataFrame(data=metrics.confusion_matrix(y_true=preds['true'],y_pred=preds['preds']),columns=list(range(10)),index=list(range(10))),cmap='Blues',annot=True, fmt="d")