🐍 PyTorch 🔥
Tensors
Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
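For instance, a minimal sketch of moving a tensor to the GPU when one is available:
import torch

t = torch.ones(2, 3)
if torch.cuda.is_available():   # use the GPU only if one is present
    t = t.to('cuda')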
NumPy to Torch
np.array() == torch.Tensor()
np.ones() == torch.ones()
np.random.rand() == torch.rand()
type(array) == tensor.type()
np.shape(array) == tensor.shape
np.resize(array, size) == tensor.view(size)
np.add(x, y) == torch.add(x, y)
np.subtract(x, y) == torch.sub(x, y)
np.multiply(x, y) == torch.mul(x, y)
np.divide(x, y) == torch.div(x, y)
np.mean(array) == tensor.mean()
np.std(array) == tensor.std()
We can also convert arrays between NumPy and PyTorch, in both directions.
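A minimal sketch of the conversion in both directions:
import numpy as np
import torch

arr = np.ones((2, 3))
t = torch.from_numpy(arr)   # NumPy -> PyTorch (shares the underlying memory)
back = t.numpy()            # PyTorch -> NumPy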
Variables
A Variable wraps a Tensor. It supports nearly all the APIs defined by a Tensor. A Variable also provides a backward method to perform backpropagation.
For example, to backpropagate a loss function in order to train model parameter x, we use a variable loss to store the value computed by the loss function.
Then we call loss.backward(), which computes the gradients for all trainable parameters.
PyTorch stores the gradient results back in the corresponding variable x.
Autograd is the PyTorch package for automatic differentiation of all operations on Tensors. It performs backpropagation starting from a variable.
In deep learning, this variable often holds the value of the cost function. backward() executes the backward pass and computes all the backpropagation gradients automatically.
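A minimal sketch of this behaviour (in current PyTorch a plain Tensor created with requires_grad=True plays the role of a Variable; the names below are illustrative):
import torch

x = torch.tensor([3.0], requires_grad=True)  # trainable parameter
loss = (x ** 2).sum()                        # value produced by some loss function
loss.backward()                              # backward pass: compute the gradients
print(x.grad)                                # tensor([6.]), since d(x^2)/dx = 2x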
import torch
import pandas as pd
import numpy as np
import os,re,sys,io
from matplotlib import pyplot as plt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from time import time
import seaborn as sns
scaler = MinMaxScaler()
plt.style.use('seaborn')
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. ‘Hedonic prices and the demand for clean air’, J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, ‘Regression diagnostics …’, Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT: % lower status of the population
- MEDV: median value of owner-occupied homes in $1000s
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
It can be seen that the value ranges of the features differ widely, so we need to standardize the data.
Suppose each feature has a mean value μ and a standard deviation σ over the whole dataset. We can then subtract μ from each value of the feature and divide by σ to get the normalized value of each feature (the tutorial approach). Another option is to use the MinMaxScaler from sklearn.
for col in boston.columns[:-1]:
    boston[[col]] = scaler.fit_transform(boston[[col]])
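For reference, a minimal sketch of the tutorial (z-score) approach described above; it is an alternative to the MinMaxScaler loop, not something to run in addition to it:
# z-score standardization: subtract the mean, then divide by the standard deviation
for col in boston.columns[:-1]:
    boston[col] = (boston[col] - boston[col].mean()) / boston[col].std()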
Then we split the data into train/test while casting the data to NumPy arrays.
X_train, X_test, y_train, y_test = train_test_split(boston[boston_dataset.get('feature_names')].to_numpy(), boston['PRICE'].to_numpy(), test_size=0.3, random_state=42)
Next we cast the split data to tensors.
X_train = torch.tensor(X_train, dtype=torch.float)
X_test = torch.tensor(X_test, dtype=torch.float)
y_train = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test = torch.tensor(y_test, dtype=torch.float).view(-1, 1)
Linear Regression Model
w_num = X_train.shape[1]
net = torch.nn.Sequential(        # sequential layer
    torch.nn.Linear(w_num, 1)     # linear layer
)
torch.nn.init.normal_(net[0].weight, mean=0, std=0.1)
torch.nn.init.constant_(net[0].bias, val=0)
Dataset Processor
dataset = torch.utils.data.TensorDataset(X_train, y_train)
Data Loader
At the heart of the PyTorch data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset, with support for:
- map-style and iterable-style datasets
- customizing data loading order
- automatic batching
- single- and multi-process data loading
- automatic memory pinning
train_iter = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)
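As a quick sketch (not part of the original pipeline), we can pull a single batch from the loader to inspect its shape:
xb, yb = next(iter(train_iter))
print(xb.shape, yb.shape)   # torch.Size([10, 13]) torch.Size([10, 1]) for this dataset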
Loss Function
We can compare the predictions with the actual targets, using the following method:
Calculate the difference between the two matrices (preds and targets). Square all elements of the difference matrix to remove negative values. Calculate the average of the elements in the resulting matrix. The result is a single number, known as the mean squared error (MSE).
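Written out by hand, that computation looks roughly like the sketch below (preds and targets being tensors of the same shape); PyTorch provides the same thing as torch.nn.MSELoss:
def mse(preds, targets):
    diff = preds - targets       # difference between the two matrices
    return (diff ** 2).mean()    # square to remove negatives, then average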
loss = torch.nn.MSELoss()
Optimizers
Rather than manually updating the weights of the model, we use the optim package to define an Optimizer that will update the weights for us.
For this model the optimizer is SGD (Stochastic Gradient Descent).
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)
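Conceptually, each optimizer step for plain SGD is roughly equivalent to the manual update sketched below (ignoring momentum, weight decay and other options):
with torch.no_grad():                # the update itself must not be tracked by autograd
    for param in net.parameters():
        param -= 0.05 * param.grad   # w = w - lr * grad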
Training
- For a number of epochs:
  - Get each features and label batch from train_iter
  - Use the model to predict the features into an output
  - Calculate the loss between the output and the real label
  - Set the optimizer's gradient to 0
  - l.backward() back-propagates the loss calculated for each x that requires a gradient
  - optimizer.step() updates the weights
num_epochs = 375
for epoch in range(num_epochs):
    for x, y in train_iter:
        output = net(x)
        l = loss(output, y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    if epoch % 25 == 0:
        print("epoch {} loss: {:.4f}".format(epoch, l.item()))
Metrics
pred = pd.DataFrame({
    'true': [x[0].tolist() for x in y_test],
    'predicted': [x[0].tolist() for x in net(X_test)],
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f'MSE: {loss(net(X_test), y_test).item():.4f}')
import statsmodels.api as sm
import patsy
Load the data
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
We split the data into train/test again, but this time a bit differently.
train,test = train_test_split(boston, test_size=0.3, random_state=42)
Then we create the R-style formula for our matrices.
formula = f"PRICE ~ {' + '.join(boston.columns[:-1])}"
formula
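For reference, the resulting string should read: PRICE ~ CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT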
Pass the formula to the dmatrices method from patsy for both the train and test dataframes.
y_train, X_train = patsy.dmatrices(formula, train, return_type='matrix')
y_test, X_test = patsy.dmatrices(formula, test, return_type='matrix')
Generalized Linear Model (GLM)
Using the Gaussian distribution
glm_gaussian = sm.GLM(y_train,X_train,data=train, family=sm.families.Gaussian(sm.families.links.log())).fit()
print(glm_gaussian.summary())
Metrics
pred = pd.DataFrame({
    'true': [x[0] for x in y_test.tolist()],
    'predicted': glm_gaussian.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")
Using the Gamma distribution
glm_gamma = sm.GLM(y_train,X_train,data=train, family=sm.families.Gamma(sm.families.links.log())).fit()
print(glm_gamma.summary())
Metrics
pred = pd.DataFrame({
    'true': [x[0] for x in y_test.tolist()],
    'predicted': glm_gamma.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")
import xgboost as xgb
Load the data
boston_dataset = load_boston()
boston = pd.DataFrame(data=boston_dataset.get('data'),columns=boston_dataset.get('feature_names'))
boston['PRICE'] = boston_dataset.get('target')
boston
Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(boston[boston_dataset.get('feature_names')], boston['PRICE'], test_size=0.3, random_state=42)
Train a regression model using reg:gamma as the objective, i.e. a Gamma distribution.
reg = xgb.XGBRegressor(n_estimators=1000,objective='reg:gamma')
reg.fit(X_train, y_train,eval_set=[(X_test, y_test)],eval_metric='rmse',verbose=100)
Metrics
pred = pd.DataFrame({
    'true': y_test.values,
    'predicted': reg.predict(X_test)
})
pred.plot.hist(alpha=0.5,figsize=(16,7),title=f"MSE: {metrics.mean_squared_error(pred['true'],pred.predicted):.4f}")
Training: 60,000 samples
Test: 10,000 samples
Image dimension: 28x28
Labels: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems
from torchvision import datasets, transforms
Compose a transformer to normalize the data.
transforms.ToTensor() converts the image into numbers that the system can work with. It separates the image into its color channels (red, green & blue for color images), reads each pixel's brightness as a value between 0 and 255, and scales those values down to a range between 0 and 1.
transforms.Normalize() normalizes the tensor with a mean and a standard deviation, passed as its two parameters respectively. With a mean of 0.5 and a standard deviation of 0.5, values in [0, 1] are mapped to [-1, 1].
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                               ])
Download the dataset and normalize it
trainset = datasets.MNIST('drive/My Drive/mnist/MNIST_data/', download=True, train=True, transform=transform)
valset = datasets.MNIST('drive/My Drive/mnist/MNIST_data/', download=True, train=False, transform=transform)
Load the train/test sets into a DataLoader.
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=64, shuffle=True)
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)
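As an added sanity check, the normalized pixel values should now lie in roughly [-1, 1]:
print(images.min().item(), images.max().item())   # approximately -1.0 and 1.0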
def plot_digit(digit, **fig_params):
    plt.figure(**fig_params)
    plt.axis('off')
    plt.imshow(digit.numpy().squeeze(), cmap='gray_r')
plot_digit(images[13],figsize=(1,1))
Then we define a Sequential model with 3 Linear layers, each applying a linear transformation, interleaved with ReLU, which applies the rectified linear activation; the output of this chain of transformations is passed into a LogSoftmax activation function.
input_size = 784
hidden_sizes = [128, 64]
output_size = 10
# Build a feed-forward network
model = torch.nn.Sequential(torch.nn.Linear(input_size, hidden_sizes[0]),
                            torch.nn.ReLU(),
                            torch.nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                            torch.nn.ReLU(),
                            torch.nn.Linear(hidden_sizes[1], output_size),
                            torch.nn.LogSoftmax(dim=1))
print(model)
Our loss function is the negative log-likelihood loss, NLLLoss.
loss = torch.nn.NLLLoss()
For this model the optimizer is SGD (Stochastic Gradient Descent).
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
This time, during training, we flatten each 28x28 image matrix into a single 784-long vector.
time0 = time()
epochs = 15
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784 long vector
        images = images.view(images.shape[0], -1)
        optimizer.zero_grad()
        output = model(images)
        l = loss(output, labels)
        l.backward()
        optimizer.step()
        running_loss += l.item()
    else:
        print("Epoch {} - Training loss: {}".format(e, running_loss/len(trainloader)))
print("\nTraining Time (in minutes) =", (time()-time0)/60)
Validation Process
y_true, y_pred = list(), list()
correct_count, all_count = 0, 0
for images, labels in valloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)
        # Turn off gradients to speed up this part
        with torch.no_grad():
            logps = model(img)
        # Output of the network are log-probabilities, need to take exponential for probabilities
        ps = torch.exp(logps)
        probab = list(ps.numpy()[0])
        pred_label = probab.index(max(probab))
        true_label = labels.numpy()[i]
        y_true.append(true_label)
        y_pred.append(pred_label)
        if true_label == pred_label:
            correct_count += 1
        all_count += 1
print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))
Classification Report
print(metrics.classification_report(y_true=y_true,y_pred=y_pred))
Confusion Matrix
pd.DataFrame(data=metrics.confusion_matrix(y_true=y_true,y_pred=y_pred))
plt.figure(figsize=(15,10))
sns.heatmap(pd.DataFrame(data=metrics.confusion_matrix(y_true=y_true,y_pred=y_pred),columns=list(range(10)),index=list(range(10))),cmap='Blues',annot=True, fmt="d")
import xgboost as xgb
Extract the train/test data from the underlying MNIST datasets.
X_train = np.array([x.flatten() for x in trainset.data.numpy()])
y_train = np.array([x.flatten() for x in trainset.targets.numpy()])
X_test = np.array([x.flatten() for x in valset.data.numpy()])
y_test = np.array([x.flatten() for x in valset.targets.numpy()])
For the XGBClassifier, the multi:softmax objective is used to permit training on a multi-class classification problem.
%%time
xgc = xgb.XGBClassifier(n_jobs=-1,objective='multi:softmax',num_class=10 ,max_depth=4)
xgc.fit(X_train,y_train.ravel())
Classification Report
preds = pd.DataFrame({
    'true': y_test.ravel(),
    'preds': xgc.predict(X_test)
})
print(metrics.classification_report(y_true=preds['true'],y_pred=preds['preds']))
Confusion Matrix
pd.DataFrame(data=metrics.confusion_matrix(y_true=preds['true'],y_pred=preds['preds']))
plt.figure(figsize=(15,10))
sns.heatmap(pd.DataFrame(data=metrics.confusion_matrix(y_true=preds['true'],y_pred=preds['preds']),columns=list(range(10)),index=list(range(10))),cmap='Blues',annot=True, fmt="d")