Practical: PyTorch
In this practical, we will learn how to use PyTorch for some machine learning tasks.
PyTorch is a popular open-source machine learning library, known for its flexibility and Pythonic feel, and widely used in academia and industry for building and training neural networks in applications such as computer vision, natural language processing, and reinforcement learning. Let’s cover the fundamental building blocks you’ll encounter.
Features¶
- Tensors: PyTorch provides a powerful tensor library that allows for efficient computation on multi-dimensional arrays. Tensors are similar to NumPy arrays but can be used on GPUs for faster computation.
- Autograd: PyTorch has a built-in automatic differentiation engine that allows for easy computation of gradients. This is particularly useful for training neural networks using backpropagation.
- Dynamic computation graph: PyTorch uses a dynamic computation graph, which means that the graph is built on-the-fly as operations are performed. This allows for more flexibility in building complex models and makes debugging easier (see the short sketch after this list).
- Neural networks: PyTorch provides a high-level API for building and training neural networks. It includes pre-defined layers, loss functions, and optimizers that make it easy to build complex models.
- GPU support: PyTorch can easily switch between CPU and GPU computation, making it easy to take advantage of the speedup provided by GPUs.
- Community: PyTorch has a large and active community, which means that there are many resources available for learning and troubleshooting. There are also many pre-trained models and libraries available for use.
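To make the dynamic-computation-graph point concrete, here is a minimal sketch (a preview using concepts covered in detail below; the module name DynamicNet and its sizes are made up for illustration): ordinary Python control flow inside forward changes which operations are recorded, so the graph can differ from one call to the next.
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Toy module whose computation graph depends on the input at run time."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        h = self.linear(x)
        # Plain Python control flow: the layer may be applied once or twice,
        # so the recorded graph can differ between forward passes.
        if h.sum() > 0:
            h = self.linear(h)
        return h.sum()

net = DynamicNet()
out = net(torch.randn(1, 4))
out.backward()  # gradients follow whichever graph was actually built on this call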
# Installation
!pip3 install torch torchvision torchaudio torch_geometric
Tensors: The Core Data Structure¶
Tensors are the central data structure in PyTorch, similar to NumPy arrays but with added capabilities, notably the ability to run on GPUs for accelerated computation and automatic differentiation.
import torch
import numpy as np
# --- Creating Tensors ---
# From a list
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
print(f"Tensor from list:\n{x_data}\n")
# From a NumPy array (shares memory by default!)
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print(f"Tensor from NumPy array:\n{x_np}\n")
# Creating tensors with specific shapes and values
shape = (2, 3,)
rand_tensor = torch.rand(shape) # Random values between 0 and 1
ones_tensor = torch.ones(shape) # All ones
zeros_tensor = torch.zeros(shape) # All zeros
print(f"Random Tensor:\n {rand_tensor} \n")
print(f"Ones Tensor:\n {ones_tensor} \n")
print(f"Zeros Tensor:\n {zeros_tensor}\n")
# --- Tensor Attributes ---
tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}") # Default is CPU
# --- Moving Tensors to GPU (if available) ---
if torch.cuda.is_available():
print("CUDA (GPU) is available!")
device = torch.device("cuda")
# Move tensor to GPU
tensor_gpu = tensor.to(device)
print(f"Device tensor_gpu is stored on: {tensor_gpu.device}")
# Operations on tensor_gpu will run on the GPU
# Move back to CPU (e.g., for use with NumPy)
tensor_cpu = tensor_gpu.to("cpu")
# NOTE: NumPy cannot handle GPU tensors directly.
elif torch.backends.mps.is_available():
print("Apple Silicon GPU (MPS) is available!")
device = torch.device("mps")
# Move tensor to MPS
tensor_mps = tensor.to(device)
print(f"Device tensor_mps is stored on: {tensor_mps.device}")
# Operations on tensor_mps will run on the Apple Silicon GPU
# Move back to CPU (e.g., for use with NumPy)
tensor_cpu = tensor_mps.to("cpu")
# NOTE: NumPy cannot handle MPS tensors directly.
else:
print("CUDA or Apple Silicon (GPU) not available, using CPU.")
device = torch.device("cpu") # Use CPU if GPU not available
# --- Basic Operations ---
# Similar syntax to NumPy
tensor_a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
tensor_b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)
# Element-wise addition
sum_tensor = tensor_a + tensor_b
# or torch.add(tensor_a, tensor_b)
print(f"Addition:\n{sum_tensor}\n")
# Element-wise multiplication
mul_tensor = tensor_a * tensor_b
# or torch.mul(tensor_a, tensor_b)
print(f"Element-wise Multiplication:\n{mul_tensor}\n")
# Matrix multiplication
matmul_tensor = tensor_a @ tensor_b
# or torch.matmul(tensor_a, tensor_b)
print(f"Matrix Multiplication:\n{matmul_tensor}\n")
# --- NumPy Bridge ---
# Tensor to NumPy array
numpy_array_again = sum_tensor.numpy() # Only works for CPU tensors
print(f"Tensor converted back to NumPy:\n{numpy_array_again}\n")Tensor from list:
tensor([[1, 2],
[3, 4]])
Tensor from NumPy array:
tensor([[1, 2],
[3, 4]])
Random Tensor:
tensor([[0.2063, 0.8167, 0.6958],
[0.4978, 0.8836, 0.5970]])
Ones Tensor:
tensor([[1., 1., 1.],
[1., 1., 1.]])
Zeros Tensor:
tensor([[0., 0., 0.],
[0., 0., 0.]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
Apple Silicon GPU (MPS) is available!
Device tensor_mps is stored on: mps:0
Addition:
tensor([[ 6., 8.],
[10., 12.]])
Element-wise Multiplication:
tensor([[ 5., 12.],
[21., 32.]])
Matrix Multiplication:
tensor([[19., 22.],
[43., 50.]])
Tensor converted back to NumPy:
[[ 6. 8.]
[10. 12.]]
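As the comment above notes, torch.from_numpy shares memory with the source array, and .numpy() likewise shares memory with a CPU tensor. A quick check, a small sketch assuming the imports above:
shared_np = np.ones(3)
shared_t = torch.from_numpy(shared_np)
shared_np[0] = 42.0          # in-place change to the NumPy array...
print(shared_t)              # ...shows up in the tensor: tensor([42., 1., 1.], dtype=torch.float64)

cpu_t = torch.zeros(3)
cpu_np = cpu_t.numpy()
cpu_t.add_(1)                # in-place change to the CPU tensor...
print(cpu_np)                # ...shows up in the array: [1. 1. 1.]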
Autograd: Automatic Differentiation¶
This is one of PyTorch’s most powerful features. If a tensor is created with requires_grad=True, PyTorch tracks all operations performed on it. When you finish your computation (e.g., calculating a loss), you can automatically compute gradients using .backward().
# Create tensors that require gradient tracking
x = torch.ones(2, 2, requires_grad=True)
print(f"x:\n{x}\n")
# Perform an operation
y = x + 2
print(f"y = x + 2:\n{y}\n")
# y was created as a result of an operation involving x, so it has a grad_fn
print(f"y.grad_fn: {y.grad_fn}\n")
z = y * y * 3
out = z.mean() # Calculate a scalar value
print(f"z = y*y*3:\n{z}\n")
print(f"out = z.mean(): {out}\n")
# --- Compute gradients ---
# out is a scalar, so we can call backward() directly
out.backward()
# Gradients are accumulated in the .grad attribute of the tensors
# d(out)/dx is computed
print(f"Gradient of out w.r.t. x (d(out)/dx):\n{x.grad}\n")
# Let's verify manually for one element:
# out = (1/4) * sum(z) = (1/4) * sum(3 * (x+2)^2)
# d(out)/dx_ij = (1/4) * d/dx_ij [ 3 * (x_ij+2)^2 ]
# = (1/4) * 3 * 2 * (x_ij+2) * 1
# = (3/2) * (x_ij+2)
# Since x starts as ones(2,2), x_ij = 1.
# d(out)/dx_ij = (3/2) * (1+2) = (3/2) * 3 = 4.5
# The output tensor x.grad should be [[4.5, 4.5], [4.5, 4.5]]
# Turn off gradient tracking when not needed (e.g., during evaluation)
with torch.no_grad():
y_no_grad = x + 2
print(f"y_no_grad.requires_grad: {y_no_grad.requires_grad}")x:
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
y = x + 2:
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
y.grad_fn: <AddBackward0 object at 0x11fd3be20>
z = y*y*3:
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>)
out = z.mean(): 27.0
Gradient of out w.r.t. x (d(out)/dx):
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
y_no_grad.requires_grad: False
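One detail worth keeping in mind from the note above: .backward() accumulates gradients into .grad rather than overwriting them, so they must be cleared between updates (optimizers do this via zero_grad(), covered below). A minimal illustration:
w = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (w * w).sum()   # d(loss)/dw = 2*w = [2., 4.]
loss.backward()
print(w.grad)          # tensor([2., 4.])

loss = (w * w).sum()
loss.backward()
print(w.grad)          # tensor([4., 8.]) -- accumulated, not replaced

w.grad.zero_()         # clear the gradient before the next backward pass
print(w.grad)          # tensor([0., 0.])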
nn.Module: Building Neural Networks¶
PyTorch provides the torch.nn package to build neural networks. You typically define your network as a class inheriting from nn.Module.
- __init__(self): Define the layers of your network here (e.g., linear layers, activation functions).
- forward(self, x): Define how input x flows through the layers defined in __init__ to produce the output. Autograd automatically builds the computation graph based on this forward pass.
import torch.nn as nn
# Example: A simple Linear Regression Model (y = Wx + b)
class LinearRegression(nn.Module):
def __init__(self, input_dim, output_dim):
super(LinearRegression, self).__init__()
# Define the layer(s)
self.linear = nn.Linear(input_dim, output_dim) # One linear layer
def forward(self, x):
# Define the forward pass
out = self.linear(x)
return out
# Example: A Multi-Layer Perceptron (like used in the demos)
class SimpleMLP(nn.Module):
def __init__(self, input_size, output_size=1, hidden_layers=[64, 32]):
super(SimpleMLP, self).__init__()
layers = []
current_size = input_size
# Dynamically create hidden layers based on the list
for hidden_size in hidden_layers:
layers.append(nn.Linear(current_size, hidden_size))
layers.append(nn.ReLU()) # Add ReLU activation after each hidden layer
current_size = hidden_size
# Add the final output layer
layers.append(nn.Linear(current_size, output_size))
# Use nn.Sequential to easily chain the layers
self.network = nn.Sequential(*layers)
def forward(self, x):
# Pass input through the sequential network
return self.network(x)
# Instantiate the MLP model
input_features = 10 # Example input size
output_value = 1 # Example output size (for regression)
model_mlp = SimpleMLP(input_features, output_value)
print("MLP Model Structure:")
print(model_mlp)
# You can easily inspect model parameters
print("\nModel Parameters:")
for name, param in model_mlp.named_parameters():
if param.requires_grad:
print(name, param.shape)
MLP Model Structure:
SimpleMLP(
(network): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=32, bias=True)
(3): ReLU()
(4): Linear(in_features=32, out_features=1, bias=True)
)
)
Model Parameters:
network.0.weight torch.Size([64, 10])
network.0.bias torch.Size([64])
network.2.weight torch.Size([32, 64])
network.2.bias torch.Size([32])
network.4.weight torch.Size([1, 32])
network.4.bias torch.Size([1])
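Calling the module like a function runs its forward pass; a quick sanity check with a random batch (the batch size of 4 is arbitrary):
dummy_batch = torch.randn(4, input_features)  # 4 samples, 10 features each
predictions = model_mlp(dummy_batch)          # invokes SimpleMLP.forward under the hood
print(predictions.shape)                      # torch.Size([4, 1])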
Loss Functions¶
A loss function measures how far the model’s output is from the target value. PyTorch provides many common loss functions in torch.nn.
from torch import nn
import torch
# Example Loss Functions
criterion_mse = nn.MSELoss() # Mean Squared Error: For regression
criterion_bce = nn.BCELoss() # Binary Cross Entropy: For binary classification (with Sigmoid output)
criterion_ce = nn.CrossEntropyLoss() # Cross Entropy Loss: For multi-class classification (expects raw logits)
# Example usage (assuming model_output and target are tensors)
model_output_reg = torch.randn(10, 1, requires_grad=True) # Example regression output
target_reg = torch.randn(10, 1)
loss_mse = criterion_mse(model_output_reg, target_reg)
print(f"Example MSE Loss: {loss_mse.item()}") # .item() gets scalar value
model_output_cls = torch.sigmoid(torch.randn(10, 1, requires_grad=True)) # Example classification output (post-sigmoid)
target_cls = torch.randint(0, 2, (10, 1)).float() # Example binary targets (0 or 1)
loss_bce = criterion_bce(model_output_cls, target_cls)
print(f"Example BCE Loss: {loss_bce.item()}")Example MSE Loss: 2.7126641273498535
Example BCE Loss: 0.7222577929496765
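criterion_ce is defined above but not used; as noted, nn.CrossEntropyLoss expects raw logits of shape (batch, num_classes) and integer class targets of shape (batch,). A small sketch, with 3 classes chosen arbitrarily:
num_classes = 3
logits = torch.randn(10, num_classes, requires_grad=True)  # raw, unnormalised scores
target_classes = torch.randint(0, num_classes, (10,))      # integer class labels in [0, 2]
loss_ce = criterion_ce(logits, target_classes)
print(f"Example Cross Entropy Loss: {loss_ce.item()}")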
Optimizers¶
An optimizer implements an algorithm to update the model’s parameters (weights and biases) based on the computed gradients, aiming to minimize the loss function. Common optimizers are found in torch.optim.
import torch.optim as optim
# Use the MLP model we defined earlier
model_to_train = SimpleMLP(10, 1)
# --- Choose an Optimizer ---
# Adam is a popular and often effective choice
optimizer_adam = optim.Adam(model_to_train.parameters(), lr=0.001) # lr is the learning rate
# Stochastic Gradient Descent (SGD) is another common one
# optimizer_sgd = optim.SGD(model_to_train.parameters(), lr=0.01, momentum=0.9)
# --- Optimizer Steps (within a training loop) ---
# 1. Zero the gradients before calculating loss for a new batch
# optimizer_adam.zero_grad()
#
# 2. Calculate the loss
# loss = criterion(outputs, targets)
#
# 3. Compute gradients w.r.t. parameters
# loss.backward()
#
# 4. Update the model parameters based on gradients
# optimizer_adam.step()
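To see the four steps above run once end-to-end, here is a sketch on a made-up batch (the random data is purely illustrative, reusing model_to_train, optimizer_adam, and criterion_mse from the cells above):
dummy_inputs = torch.randn(8, 10)    # batch of 8 samples with 10 features
dummy_targets = torch.randn(8, 1)

optimizer_adam.zero_grad()                     # 1. clear old gradients
outputs = model_to_train(dummy_inputs)         # forward pass
loss = criterion_mse(outputs, dummy_targets)   # 2. compute the loss
loss.backward()                                # 3. backpropagate
optimizer_adam.step()                          # 4. update the parameters
print(f"Loss on this batch: {loss.item():.4f}")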
Datasets and DataLoaders¶
torch.utils.data.Dataset and torch.utils.data.DataLoader provide convenient ways to handle data, batching, shuffling, and parallel loading.
- Dataset: Stores the samples and their corresponding labels. You can create custom datasets or use built-in ones. TensorDataset is useful when your data already exists as tensors.
- DataLoader: Wraps an iterable around the Dataset to enable easy access to batches of data.
from torch.utils.data import TensorDataset, DataLoader
# Example data (already tensors)
features = torch.randn(100, 10) # 100 samples, 10 features each
labels = torch.randn(100, 1) # 100 corresponding labels
# Create a TensorDataset
dataset = TensorDataset(features, labels)
# Access a single sample
first_sample_features, first_sample_label = dataset[0]
print(f"First sample features shape: {first_sample_features.shape}")
print(f"First sample label shape: {first_sample_label.shape}")
# Create a DataLoader
batch_size = 16
# shuffle=True is important for training to mix data between epochs
# num_workers > 0 can speed up loading using parallel processes (use cautiously in notebooks)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)
# Iterate over the DataLoader to get batches
print("\nIterating through DataLoader:")
for batch_idx, (batch_features, batch_labels) in enumerate(dataloader):
print(f"Batch {batch_idx+1}:")
print(f" Features shape: {batch_features.shape}") # Should be [batch_size, 10]
print(f" Labels shape: {batch_labels.shape}") # Should be [batch_size, 1]
# --- Inside the training loop, you'd process this batch ---
# model(batch_features) -> calculate loss -> loss.backward() -> optimizer.step()
if batch_idx >= 2: # Stop after showing a few batches
break
First sample features shape: torch.Size([10])
First sample label shape: torch.Size([1])
Iterating through DataLoader:
Batch 1:
Features shape: torch.Size([16, 10])
Labels shape: torch.Size([16, 1])
Batch 2:
Features shape: torch.Size([16, 10])
Labels shape: torch.Size([16, 1])
Batch 3:
Features shape: torch.Size([16, 10])
Labels shape: torch.Size([16, 1])
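As mentioned above, you can also write a custom dataset by subclassing torch.utils.data.Dataset and implementing __len__ and __getitem__. A minimal sketch wrapping the same tensors (the class name RandomDataset is just for illustration):
from torch.utils.data import Dataset

class RandomDataset(Dataset):
    """Minimal custom Dataset wrapping pre-generated feature/label tensors."""
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)                    # number of samples

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]  # one (features, label) pair

custom_dataset = RandomDataset(features, labels)     # reuses the tensors above
custom_loader = DataLoader(custom_dataset, batch_size=16, shuffle=True)
print(f"Custom dataset size: {len(custom_dataset)}")  # 100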
The Basic Training Loop Structure¶
Putting it all together, a typical PyTorch training loop looks like this:
# --- Setup (Assume model, criterion, optimizer, train_loader are defined) ---
# model = YourModel(...)
# criterion = YourLossFunction()
# optimizer = YourOptimizer(model.parameters(), lr=...)
# train_loader = DataLoader(your_train_dataset, ...)
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device) # Move model to the appropriate device
# Let's create dummy versions for illustration:
model = LinearRegression(5, 1).to(device) # Simple model: 5 features -> 1 output
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
dummy_features = torch.randn(100, 5)
dummy_labels = torch.randn(100, 1) * 3 + 2 # random targets (scaled and shifted), not actually a function of the features
dummy_dataset = TensorDataset(dummy_features, dummy_labels)
train_loader = DataLoader(dummy_dataset, batch_size=10, shuffle=True)
# --- Training Loop ---
num_epochs = 10
print("\nStarting Dummy Training Loop:")
model.train() # Set the model to training mode (important for layers like dropout, batchnorm)
for epoch in range(num_epochs):
running_loss = 0.0
for i, (inputs, targets) in enumerate(train_loader):
# Move data to the same device as the model
inputs = inputs.to(device)
targets = targets.to(device)
# 1. Zero the parameter gradients
optimizer.zero_grad()
# 2. Forward pass: compute predicted outputs
outputs = model(inputs)
# 3. Calculate the loss
loss = criterion(outputs, targets)
# 4. Backward pass: compute gradient of the loss w.r.t. parameters
loss.backward()
# 5. Perform a single optimization step (parameter update)
optimizer.step()
# Accumulate loss for reporting
running_loss += loss.item() * inputs.size(0) # loss.item() gets scalar loss
epoch_loss = running_loss / len(train_loader.dataset)
print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}')
print('Finished Training')
# After training, you typically evaluate on a separate test set
# model.eval() # Set model to evaluation mode
# with torch.no_grad(): # Disable gradient calculation for evaluation
# # ... loop through test_loader and calculate metrics ...
Starting Dummy Training Loop:
Epoch [1/10], Loss: 14.0250
Epoch [2/10], Loss: 13.7025
Epoch [3/10], Loss: 13.4990
Epoch [4/10], Loss: 13.3793
Epoch [5/10], Loss: 13.1886
Epoch [6/10], Loss: 13.0974
Epoch [7/10], Loss: 12.9531
Epoch [8/10], Loss: 12.8884
Epoch [9/10], Loss: 12.7939
Epoch [10/10], Loss: 12.7382
Finished Training
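To flesh out the commented evaluation hint above, here is a minimal evaluation pass, reusing the dummy training data as a stand-in test set (in practice you would iterate over a separate test_loader):
model.eval()                  # switch layers like dropout/batchnorm to evaluation behaviour
total_loss = 0.0
with torch.no_grad():         # no gradients needed during evaluation
    for inputs, targets in train_loader:   # stand-in for a real test_loader
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        total_loss += criterion(outputs, targets).item() * inputs.size(0)
print(f"Evaluation MSE: {total_loss / len(train_loader.dataset):.4f}")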