PyTorch

  • What is PyTorch?

    • Framework for doing tensor computations
  • Components

    • Tensor processing
    • Automatic differentiation engine
    • Deep learning utility functions
  • What is a tensor?

    • A multidimensional array of numbers; generalizes scalars, vectors, and matrices to any number of dimensions
      • 0 dimensions is a scalar, 1 dimension is a vector, 2 dimensions is a matrix (see the sketch below)
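
A minimal sketch of that progression (the variable names are just illustrative):

```python
import torch

scalar = torch.tensor(3.14)            # 0 dimensions: a single number
vector = torch.tensor([1.0, 2.0])      # 1 dimension
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])    # 2 dimensions
cube = torch.zeros(2, 3, 4)            # 3 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3
print(cube.shape)                                        # torch.Size([2, 3, 4])
```
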
  • How are tensor calculations made efficient?

    • Operations dispatch to vectorized, precompiled C++/CUDA kernels, so element-wise loops run outside the Python interpreter
    • Large batched tensor operations amortize overhead and can run on a GPU
  • Automatic differentiation engine

    • autograd
    • A computation graph is built from every operation applied to tensors
      • Recording happens when any leaf tensor in the graph has requires_grad set to True
    • torch.autograd.grad(L, w) returns the gradient of L with respect to w
    • loss_tensor.backward() propagates the gradients of the loss backward through the graph
      • w.grad then holds the gradient for that tensor (see the sketch below)
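
A minimal autograd sketch, using a toy loss L = (w*x - 1)**2 chosen purely for illustration:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor tracked by autograd
x = torch.tensor(3.0)

L = (w * x - 1.0) ** 2                     # builds the computation graph

# Query the gradient directly (retain_graph so we can also call backward)
(g,) = torch.autograd.grad(L, w, retain_graph=True)
print(g)                                   # dL/dw = 2*(w*x - 1)*x = 30

# Or backpropagate and read the accumulated gradient off the tensor
L.backward()
print(w.grad)                              # tensor(30.)
```
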
  • Interfaces

    • Tensor
      • torch.tensor()
        • constructor
      • tensor.dtype
      • tensor.shape
      • tensor.reshape(newShape)
      • tensor.view(shape)
        • a common way to reshape; shares memory with the original tensor, so it requires contiguous data
      • tensor.T
        • transpose
      • tensor1.matmul(tensor2) or tensor1 @ tensor2 (see the sketch below)
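
A short sketch of these tensor methods (shapes and values are arbitrary):

```python
import torch

t = torch.arange(6, dtype=torch.float32)  # tensor([0., 1., 2., 3., 4., 5.])
print(t.dtype, t.shape)                   # torch.float32 torch.Size([6])

m = t.view(2, 3)          # reshape without copying; needs contiguous memory
r = t.reshape(3, 2)       # like view, but copies if it has to
print(m.T.shape)          # transpose: torch.Size([3, 2])

a = torch.randn(2, 3)
b = torch.randn(3, 4)
print(a.matmul(b).shape)  # torch.Size([2, 4])
print((a @ b).shape)      # same operation
```
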
    • Module
      • torch.nn.Module
      • A piece of the neural network
      • You subclass it and fill in the __init__() and forward() methods (see the sketch below)
      • We don't typically implement the backward() method; autograd derives it from the operations in forward()
      • __init__()
        • initialize sub-modules and any parameter tensors
      • forward()
        • take in tensor(s), do the operations and return tensor(s)
      • print(model) prints a layer-by-layer summary
      • calling .parameters() returns the model's parameters
        • call p.numel() and p.requires_grad on each to inspect the pieces inside
      • Input to output is done by calling the model: y = model(X)
      • model.train()
        • puts in training mode
      • model.eval()
        • switches dropout and batch normalization to inference behavior; gradient tracking is turned off separately with torch.no_grad()
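
A minimal sketch of a custom module; TinyNet and its layer sizes are hypothetical, not anything built into PyTorch:

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()          # always call the parent constructor
        self.fc1 = nn.Linear(4, 8)  # sub-modules registered in __init__
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):           # autograd derives backward() from this
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet()
print(model)                        # layer-by-layer summary

for p in model.parameters():        # inspect the parameters
    print(p.shape, p.numel(), p.requires_grad)

y = model(torch.randn(2, 4))        # input to output: call the model
model.train()                       # training mode
model.eval()                        # inference mode for dropout/batch norm
```
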
    • Sequential
      • Modules arranged in order, so data pipes through each in turn (see the sketch below)
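
The same kind of pipeline written with nn.Sequential instead of a custom class (layer sizes are arbitrary):

```python
import torch
from torch import nn

model = nn.Sequential(   # data flows through each module in order
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)
y = model(torch.randn(2, 4))
```
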
    • Dataset (see the sketch below)
      • __init__()
      • __getitem__()
      • __len__()
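
A minimal Dataset sketch; ToyDataset and its random data are made up for illustration:

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, n=100):
        self.x = torch.randn(n, 4)       # features
        self.y = torch.randn(n, 1)       # targets

    def __len__(self):
        return len(self.x)               # number of samples

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]  # one (input, target) pair
```
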
    • DataLoader (see the sketch below)
      • get a batch iterator by passing a dataset and configuration options to DataLoader
      • enumerate over it to iterate batch by batch
      • shuffle
      • drop_last
        • the last batch may be smaller than batch_size; it can be dropped so every batch is the same size
      • num_workers
        • 0 means all loading happens in the main process, which can become a bottleneck
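
A DataLoader sketch, assuming the ToyDataset from the sketch above; batch size and options are illustrative:

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    ToyDataset(),      # the dataset sketched above
    batch_size=16,
    shuffle=True,      # reshuffle the data every epoch
    drop_last=True,    # drop the final, smaller batch
    num_workers=0,     # 0 = load in the main process
)

for i, (xb, yb) in enumerate(loader):
    print(i, xb.shape, yb.shape)  # 16 samples per batch
```
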
  • Typical training loop

    • Set up the dataset and data loader (the full loop is sketched below)
    • for each epoch
      • enumerate over data loader
        • for each batch
          • pass input to the model
          • get the output
          • get the loss tensor
          • reset the gradients
            • optimizer.zero_grad()
          • do backpropagation
            • loss.backward()
          • use optimizer to change weights
            • optimizer.step()
        • at certain intervals
          • get training and validation metrics
            • record them in an array for graphing later
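
Putting the loop together, assuming the TinyNet, ToyDataset, and DataLoader sketched above; the loss, optimizer, and hyperparameters are illustrative choices:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

model = TinyNet()
loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model.train()
for epoch in range(10):
    for i, (xb, yb) in enumerate(loader):
        pred = model(xb)           # pass input to the model
        loss = loss_fn(pred, yb)   # get the loss tensor
        optimizer.zero_grad()      # reset the gradients
        loss.backward()            # backpropagation
        optimizer.step()           # use optimizer to change weights
    print(epoch, loss.item())      # record metrics at intervals
```
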
  • Saving and loading models

    • torch.save(model.state_dict(), path)
    • a model's weights can be represented as a dictionary (the state dict)
      • when inflating it again with model.load_state_dict(), you get the model back with all the inner modules and their weights (see the sketch below)
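
A save/load sketch, assuming the TinyNet from earlier; the path "model.pt" is just an example:

```python
import torch

torch.save(model.state_dict(), "model.pt")      # weights as a dict of tensors

model2 = TinyNet()                              # rebuild the architecture
model2.load_state_dict(torch.load("model.pt"))  # inflate the saved weights
model2.eval()
```
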
  • GPU

    • torch.cuda.is_available()
      • checks for an NVIDIA GPU
    • tensor.to('cuda:0')
      • moves a tensor (or module) to that device
    • torch.backends.mps.is_available()
      • checks for the Apple-silicon Metal backend (see the sketch below)
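
A device-selection sketch combining these checks:

```python
import torch

if torch.cuda.is_available():
    device = "cuda:0"      # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps"         # Apple-silicon GPU
else:
    device = "cpu"

t = torch.randn(2, 3).to(device)  # move a tensor to the device
# modules move the same way: model.to(device)
```
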