「PyTorch」:4-Neural Network Design
Notes on learning the PyTorch framework.
This article mainly covers how to design and implement a neural network (NN) with PyTorch.
Colab notebooks:
- Neural Network Design 1: The Layers
- Neural Network Design 2: Callable Neural Networks
- Neural Network Design 3: CNN Forward Method
- Neural Network Design 4: Pass A Batch of Images
Using a CNN as the running example, we walk through PyTorch's layers, weights, and forward method.
Overview:
- Build PyTorch CNN - Object Oriented Neural Networks
- CNN Layers - Deep Neural Network Architecture
- CNN Weights - Learnable Parameters in Neural Networks
- Callable Neural Networks - Linear Layers in Depth
- CNN Forward Method - Deep Learning Implementation
- Forward Propagation Explained - Pass Image to PyTorch Neural Network
- Neural Network Batch Processing - Pass Image Batch to PyTorch CNN
- CNN Output Size Formula - Bonus Neural Network Debugging Session
Building Neural Networks With PyTorch
From a high-level perspective or bird’s eye view of our deep learning project, we prepared our data, and now, we are ready to build our model.
【From a high-level view, this part is about how to design the model with PyTorch.】
- Prepare the data
- Build the model
- Train the model
- Analyze the model’s results
We’ll do a quick OOP review in this post to cover the details needed for working with PyTorch neural networks, but if you find that you need more, the Python docs have an overview tutorial here.
【A quick review of the OOP details we need.】
PyTorch’s torch.nn Package
To build neural networks in PyTorch, we use the torch.nn package, which is PyTorch’s neural network (nn) library. We typically import the package like so:
```python
import torch.nn as nn
```
PyTorch’s neural network library contains all of the typical components needed to build neural networks.
【The nn library contains all of the typical components needed to build a NN.】
PyTorch’s nn.Module Class
As we know, deep neural networks are built using multiple layers. This is what makes the network deep. Each layer in a neural network has two primary components:
【Each layer in a NN consists of code (the transformation from input tensor to output tensor) and weights, so layers map naturally onto an OOP abstraction.】
- A transformation (code)
- A collection of weights (data)
In fact, this is the case with PyTorch. Within the nn package, there is a class called Module, and it is the base class for all neural network modules, which includes layers.
【The nn package's Module class is the parent class of all layers in a NN; every network must extend nn.Module.】
PyTorch nn.Modules Have A forward() Method
When we pass a tensor to our network as input, the tensor flows forward through each layer transformation until the tensor reaches the output layer. This process of a tensor flowing forward through the network is known as a forward pass.
【Forward pass: the tensor flows forward through the network until it reaches the output layer.】
Every PyTorch nn.Module has a forward() method, and so when we are building layers and networks, we must provide an implementation of the forward() method. The forward method is the actual transformation.
【Every layer and network that extends nn.Module must implement the forward() interface; this forward method is the actual input-to-output transformation.】
PyTorch’s nn.functional Package
When we implement the forward() method of our nn.Module subclass, we will typically use functions from the nn.functional package. This package provides us with many neural network operations that we can use for building layers. In fact, many of the nn.Module layer classes use nn.functional functions to perform their operations.
【The nn.functional package provides many useful operations for building layers. In fact, many nn.Module layer classes use nn.functional functions to carry out their operations.】
Building A Neural Network In PyTorch
We now have enough information to provide an outline for building neural networks in PyTorch. The steps are as follows:
Short version:
- Extend the nn.Module base class.
- Define layers as class attributes.
- Implement the forward() method.
More detailed version:
- Create a neural network class that extends the nn.Module base class.
- In the class constructor, define the network’s layers as class attributes using pre-built layers from torch.nn.
- Use the network’s layer attributes as well as operations from the nn.functional API to define the network’s forward pass.
Define The Network’s Layers As Class Attributes
We’re building a CNN, so the two types of layers we’ll use are linear layers and convolutional layers.
```python
class Network(nn.Module):
    ...  # layers and forward() shown below
```
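Based on the layer names and the parameter values summarized later in this post, the constructor looks roughly like the following sketch (the forward() method is filled in later, in the CNN Forward Method section):
```python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # two convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        # three linear (fully connected) layers
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implemented in the CNN Forward Method section below
        return t
```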
Inside of our Network class, we have five layers that are defined as attributes. We have two convolutional layers, self.conv1 and self.conv2, and three linear layers, self.fc1, self.fc2, and self.out.
【Inside Network, there are 5 layers defined as attributes of the class.】
We used the abbreviation fc in fc1 and fc2 because linear layers are also called fully connected layers. They also have a third name that we may sometimes hear: dense. So linear, dense, and fully connected are all ways to refer to the same type of layer. PyTorch uses the word linear, hence the nn.Linear class name.
We used the name out for the last linear layer because the last layer in the network is the output layer.
【fc is short for fully connected layer, i.e., nn.Linear.】
【out is the output layer.】
Our CNN Layers
Each of our layers extends PyTorch’s neural network Module class. For each layer, there are two primary items encapsulated inside: a forward function definition and a weight tensor.
【Each layer extends PyTorch's Module class. Every layer encapsulates two components: a forward function and a weight tensor.】
The weight tensor inside each layer contains the weight values that are updated as the network learns during the training process.
【Each layer's weight tensor contains the weight values that get updated as the network trains.】
PyTorch’s neural network Module class keeps track of the weight tensors inside each layer. The code that does this tracking lives inside the nn.Module class, and since we are extending the neural network module class, we inherit this functionality automatically.
【The weight tensors are the parameters that are updated during training; the Module class automatically tracks the weight tensors inside every layer.】
CNN Layer Parameters
Parameter Vs Argument
Well, parameters are used in function definitions as place-holders, while arguments are the actual values that are passed to the function. The parameters can be thought of as local variables that live inside a function.
【Parameters are placeholders used in the function definition, while arguments are the actual values passed to the function.】
Two Types Of Parameters
To better understand the argument values for these parameters, let’s consider two categories or types of parameters that we used when constructing our layers.
【There are two types of parameters to consider when constructing layers:】
- Hyperparameters
- Data dependent hyperparameters
When we construct a layer, we pass values for each parameter to the layer’s constructor. Our convolutional layers have three parameters and the linear layers have two parameters.
【When constructing a layer, we pass the parameter values to the layer's constructor.】
- Convolutional layers: in_channels, out_channels, kernel_size
- Linear layers: in_features, out_features
Hyperparameters
In general, hyperparameters are parameters whose values are chosen manually and arbitrarily.
【Hyperparameters are parameters whose values are chosen manually and somewhat arbitrarily.】
As neural network programmers, we choose hyperparameter values mainly based on trial and error and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we choose manually.
【Hyperparameter values are mostly chosen by trial and error, and increasingly by reusing values that have worked well in the past.】
【For our CNN, these are the parameters we choose manually:】
Parameter | Description |
---|---|
kernel_size | Sets the filter size. The words kernel and filter are interchangeable. |
out_channels | Sets the number of filters. One filter produces one output channel. |
out_features | Sets the size of the output tensor. |
Data Dependent Hyperparameters
Data dependent hyperparameters are parameters whose values are dependent on data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_features of the output layer.
【Hyperparameters that depend on the data: for example, the in_channels of the first conv layer and the out_features of the output layer are both determined by the data.】
In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the conv layers and in_features in the linear layers depend on the data coming from the previous layer.
【Each layer's input depends on the previous layer's output.】
When we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have 12*4*4 as the in_features of the first linear layer.
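As a worked check (assuming 28x28 grayscale input images, as in Fashion-MNIST, and 2x2 max pooling with stride 2 after each conv layer): conv1 with a 5x5 kernel gives (28 - 5) + 1 = 24, and pooling halves this to 12; conv2 gives (12 - 5) + 1 = 8, and pooling halves this to 4. With conv2’s 12 output channels, the flattened length is 12 * 4 * 4 = 192. The general output size formula is covered at the end of this post.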
Summary Of Layer Parameters
```python
self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
# the remaining layers follow the same pattern, using the values in the table below
```
Layer | Param name | Param value | The param value is |
---|---|---|---|
conv1 | in_channels | 1 | the number of color channels in the input image. |
conv1 | kernel_size | 5 | a hyperparameter. |
conv1 | out_channels | 6 | a hyperparameter. |
conv2 | in_channels | 6 | the number of out_channels in previous layer. |
conv2 | kernel_size | 5 | a hyperparameter. |
conv2 | out_channels | 12 | a hyperparameter (higher than previous conv layer). |
fc1 | in_features | 12 * 4 * 4 | the length of the flattened output from previous layer. |
fc1 | out_features | 120 | a hyperparameter. |
fc2 | in_features | 120 | the number of out_features of previous layer. |
fc2 | out_features | 60 | a hyperparameter (lower than previous linear layer). |
out | in_features | 60 | the number of out_features of the previous layer. |
out | out_features | 10 | the number of prediction classes. |
CNN Weights - Learnable Parameters In Neural Networks
Colab: Neural Network Design: The Layers
Learnable Parameters
Learnable parameters are parameters whose values are learned during the training process.
【Learnable parameters are parameters whose values are learned during training.】
With learnable parameters, we typically start out with a set of arbitrary values, and these values then get updated in an iterative fashion as the network learns.
【They start from arbitrary values and are updated iteratively as the network learns.】
Where are the learnable parameters?
Well, the learnable parameters are the weights inside our network, and they live inside each layer.
【The learnable parameters are the weights of the network, and they live inside each layer.】
Getting An Instance Of The Network
Let’s grab an instance of our network class and see this.
```python
network = Network()
```
After the object is initialized, we can then access our object using the network variable.
【Creating an instance of the network automatically runs __init__ to initialize it.】
How Overriding Works
All Python classes automatically extend the object class. If we want to provide a custom string representation for our object, we can do it, but we need to introduce another object oriented concept called overriding.
【All Python classes automatically extend the object class; we can override the object's string representation.】
We can override Python’s default string representation using the __repr__ function. This name is short for representation.
【Override the __repr__ function.】
```python
network = Network()
print(network)
```
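For the layers defined above, the printed string representation will look roughly like this (a sketch of PyTorch's default output; the exact formatting can vary between versions):
```
Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
```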
What’s In The String Representation?
Convolutional Layers
For the convolutional layers, the kernel_size argument is a Python tuple (5,5) even though we only passed the number 5 in the constructor.
【The kernel_size value sets the filter size; when a single number is passed, the filter defaults to a square.】
The stride is an additional parameter that we could have set, but we left it out. When the stride is not specified in the layer constructor, the layer automatically sets it.
【If the kernel's stride is not set, the layer sets it automatically.】
Linear Layers
For the linear layers, we have an additional parameter called bias, which has a default value of True. It is possible to turn this off by setting it to False.
【Linear layers also have a bias parameter, which defaults to True.】
Accessing The Network’s Layers
Well, now that we’ve got an instance of our network and we’ve reviewed our layers, let’s see how we can access them in code.
【How to access the layers of the NN: access them as ordinary attributes; each layer returns its string representation.】
```python
> network.conv1
Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
```
Accessing The Layer Weights
Now that we have access to each of our layers, we can access the weights inside each layer.
【Access the weight parameters of each layer.】
Colab: Neural Network Design: The Layers
```python
> network.conv1.weight
```
PyTorch Parameter Class
PyTorch has a special class called Parameter. The Parameter class extends the tensor class, and so the weight tensor inside every layer is an instance of this Parameter class.
【PyTorch also has a special Parameter class. It extends the tensor class, so the weight tensor inside every layer is actually an instance of Parameter.】
Weight Tensor Shape
For the convolutional layers, the weight values live inside the filters, and in code, the filters are actually the weight tensors themselves.
【For conv layers, the weights live inside the filters, and in code the filters are the weight tensors themselves.】
The convolution operation inside a layer is an operation between the input channels to the layer and the filter inside the layer. This means that what we really have is an operation between two tensors.
【The convolution operation is really an operation between the input tensor and the filter's weight tensor.】
For the first conv layer, we have 1 color channel that should be convolved by 6 filters of size 5x5 to produce 6 output channels. This is how we interpret the values inside our layer constructor.
【For the first conv layer, 1 input channel is convolved by 6 filters of size 5x5, producing 6 output channels.】
```python
> network.conv1
Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
```
Inside our layer though, we don’t explicitly have 6 weight tensors for each of the 6 filters. We actually represent all 6 filters using a single weight tensor whose shape reflects or accounts for the 6 filters.
【We don't use 6 separate weight tensors for the 6 filters; instead, a single weight tensor represents all of the layer's filters.】
```python
> network.conv1.weight.shape
torch.Size([6, 1, 5, 5])
```
The first axis has a length of 6, which accounts for the 6 filters. The second axis has a length of 1, which accounts for the single input channel, and the last two axes account for the height and width of the filter.
【The first dimension of the weight tensor is the number of filters (all filters are packed into a single tensor); the second dimension can be thought of as the filter depth, which matches the number of input channels; the last two dimensions are the height and width.】
The two main takeaways about these convolutional layers are that our filters are represented using a single tensor and that each filter inside the tensor also has a depth that accounts for the input channels that are being convolved.
【Two main takeaways about conv layers:】
- All filters are represented using a single tensor.
- Filters have a depth that accounts for the input channels.
Shape of a conv layer's weight tensor: (Number of filters, Depth, Height, Width)
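A quick sketch of how this plays out for the layers above (the conv shapes follow directly from the constructor arguments; the linear weight shapes use PyTorch's (out_features, in_features) convention, which we will see in the Linear source code below):
```python
network.conv1.weight.shape  # torch.Size([6, 1, 5, 5])   -> 6 filters, depth 1, 5x5
network.conv2.weight.shape  # torch.Size([12, 6, 5, 5])  -> 12 filters, depth 6, 5x5
network.fc1.weight.shape    # torch.Size([120, 192])     -> 192 = 12 * 4 * 4
network.fc2.weight.shape    # torch.Size([60, 120])
network.out.weight.shape    # torch.Size([10, 60])
```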
Weight Matrix
With linear layers or fully connected layers, we have flattened rank-1 tensors as input and as output. The way we transform the in_features to the out_features in a linear layer is by using a rank-2 tensor that is commonly called a weight matrix.
【For fully connected layers, the input and output are flattened rank-1 tensors.】
【The transformation from in_features to out_features in a linear layer is done with a weight matrix, so the layer's parameter is a rank-2 tensor.】
Linear Function Represented Using A Matrix
The important thing about matrix multiplications like this is that they represent linear functions that we can use to build up our neural network.
Specifically, the weight matrix is a linear function, also called a linear map, that maps a vector space of 4 dimensions to a vector space of 3 dimensions.
【Matrix multiplication is a linear function; specifically, the weight matrix is a linear map from a 4-dimensional vector space to a 3-dimensional vector space.】
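In symbols, for a weight matrix $W \in \mathbb{R}^{3 \times 4}$ and an input vector $x \in \mathbb{R}^{4}$, the layer computes $y = Wx$ (ignoring the bias), giving an output $y \in \mathbb{R}^{3}$.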
Accessing The Network's Parameters
【Accessing all of the network's parameters.】
The first example is the most common way, and we’ll use this to iterate over our weights when we update them during the training process.
【Iterate over network.parameters().】
```python
for param in network.parameters():
    print(param.shape)
```
The second way is just to show how we can see the name as well. This reveals something that we won’t cover in detail, the bias is also a learnable parameter. Each layer has a bias by default, so for each layer we have a weight tensor and a bias tensor.
【The bias in each layer is also a learnable parameter.】
【Iterate over network.named_parameters().】
```python
for name, param in network.named_parameters():
    print(name, '\t\t', param.shape)
```
Callable Neural Networks - Linear Layers In Depth
Colab: Neural Network Design 2: Callable Neural Networks
In this one, we’ll learn about how PyTorch neural network modules are callable, what this means, and how it informs us about how our network and layer forward methods are called.
【In this section, we learn how the forward methods of our networks and layers actually get called.】
How Linear Layers Work
Transform Using A Matrix
【Transform using matrix multiplication.】
```python
in_features = torch.tensor([1,2,3,4], dtype=torch.float32)
```
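A fuller sketch of this matrix transform (the weight_matrix values here are illustrative; what matters is the 3x4 shape):
```python
import torch

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

# a 3x4 weight matrix maps a length-4 input to a length-3 output
weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

weight_matrix.matmul(in_features)  # tensor([30., 40., 50.])
```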
Transform Using A PyTorch Linear Layer
【Transform using a PyTorch linear layer.】
```python
fc = nn.Linear(in_features=4, out_features=3, bias=False)
```
【According to the source code, the linear layer creates a 3 x 4 weight matrix.】
```python
# torch/nn/modules/linear.py (version 1.0.1)
self.weight = Parameter(torch.Tensor(out_features, in_features))
```
Let’s see how we can call our layer now by passing the in_features tensor.
【Call the layer directly by passing it the tensor.】
```python
> fc(in_features)
```
We can call the object instance like this because PyTorch neural network modules are callable Python objects.
【This works because PyTorch modules are callable objects, i.e., the class implements the __call__ method.】
Let’s explicitly set the weight matrix of the linear layer to be the same as the one we used in our other example.
【We can explicitly set the value of the linear layer's weight matrix.】
```python
fc.weight = nn.Parameter(weight_matrix)
fc(in_features)  # matches weight_matrix.matmul(in_features) since bias=False
```
Callable Layers And Neural Networks
【Callable layers and neural networks.】
We pointed out before how it was kind of strange that we called the layer object instance as if it were a function.
【Why can we call the instance as if it were a function?】
```python
> fc(in_features)
```
What makes this possible is that PyTorch module classes implement another special Python function called __call__(). If a class implements the __call__() method, the special call method will be invoked any time the object instance is called.
【If a class implements the __call__() method, its instances can be called like functions.】
This fact is an important PyTorch concept because of the way the __call__() method interacts with the forward() method for our layers and networks.
【In PyTorch, the __call__() method is what interacts with the forward() method.】
```python
def __call__(self, *input, **kwargs):
    # ... PyTorch runs extra code (e.g. hooks) here, then calls self.forward(*input, **kwargs)
```
The extra code that PyTorch runs inside the __call__() method is why we never invoke the forward() method directly. If we did, the additional PyTorch code would not be executed. As a result, any time we want to invoke our forward() method, we call the object instance. This applies to both layers and networks, because they are both PyTorch neural network modules.
【Because of __call__(), we should not call forward() directly. Whenever we want to invoke forward(), we call the object instance instead.】
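Concretely, this means we always write the first form below rather than the second:
```python
output = fc(in_features)          # correct: __call__() runs PyTorch's extra code, then forward()
output = fc.forward(in_features)  # works, but skips the extra code inside __call__()
```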
CNN Forward Method - PyTorch Deep Learning Implementation
Colab: CNN Forward Method
We created our network by extending the nn.Module PyTorch base class, and then, in the class constructor, we defined the network’s layers as class attributes. Now, we need to implement our network’s forward() method, and then, finally, we’ll be ready to train our model.
【Previously we built the model by extending nn.Module and defining the network's layers as attributes in the constructor. The last step in building the model is to implement its forward() method.】
【Steps:】
- Prepare the data
- Build the model
  - Create a neural network class that extends the nn.Module base class.
  - In the class constructor, define the network’s layers as class attributes.
  - Use the network’s layer attributes as well as nn.functional API operations to define the network’s forward pass. 【Use the network's layer attributes and operations from nn.functional (activation functions, etc.) to define the network's forward pass.】
- Train the model
- Analyze the model’s results
Implementing The forward() Method
Colab: Neural Network Design 3: CNN Forward Method
【Implementing the forward() method.】
Input Layer #1
The input layer of any neural network is determined by the input data.
【The input layer is determined by the input data.】
For this reason, we can think of the input layer as the identity transformation. Mathematically, this is the function, $f(x)=x$ .
【The input layer can be thought of as the identity transformation.】
Hidden Convolutional Layers: Layers #2 And #3
```python
# (2) hidden conv layer
t = F.max_pool2d(F.relu(self.conv1(t)), kernel_size=2, stride=2)
```
Each of these layers is comprised of a collection of weights (data) and a collection of operations (code). The weights are encapsulated inside the nn.Conv2d() class instance. The relu() and the max_pool2d() calls are just pure operations.
【Each layer is a combination of weights and operations; the weights are encapsulated in the nn.Conv2d instance, while relu() and max_pool2d() are pure operations.】
For example, we’ll say that the second layer in our network is a convolutional layer that contains a collection of weights, and performs three operations: a convolution operation, the relu activation operation, and the max pooling operation.
【This is just one way to describe it: the convolutional layer has one collection of weights (the weights contained in the layer) and performs three operations: convolution, relu, and max pooling.】
Mathematically, the entire network is just a composition of functions, and a composition of functions is a function itself. So a network is just a function. All the terms like layers, activation functions, and weights, are just used to help describe the different parts.
【The whole network is just a composition of functions, so the network itself is a function; layers, activation functions, and weights are simply terms used to describe its parts.】
Hidden Linear Layers: Layers #4 And #5
Before we pass our input to the first hidden linear layer, we must reshape() or flatten our tensor. This will be the case any time we are passing output from a convolutional layer as input to a linear layer.
【The output of a conv layer must be flattened before it is passed to a fully connected layer.】
```python
# (4) hidden linear layer
t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
```
Output Layer #6
The sixth and last layer of our network is a linear layer we call the output layer. When we pass our tensor to the output layer, the result will be the prediction tensor.
【The sixth layer is the output layer; passing the tensor through it yields the prediction tensor, with 10 elements (one per class).】
Inside the network we usually use relu() as our non-linear activation function, but for the output layer, whenever we have a single category that we are trying to predict, we use softmax(). The softmax function returns a positive probability for each of the prediction classes, and the probabilities sum to 1.
【The earlier layers use relu() as the non-linear activation, but the output layer, where we want a prediction for each class, uses softmax().】
【Softmax returns a positive probability for each class, and the probabilities sum to 1.】
```python
def forward(self, t):
    # full implementation sketched below
```
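Putting the pieces from this section together, the complete forward() method looks roughly like this sketch (it assumes import torch.nn.functional as F at the top of the file; 2x2 max pooling with stride 2 is the usual choice here, and softmax is left commented out so the network returns raw prediction values, which matches the forward-pass section below):
```python
def forward(self, t):
    # (1) input layer - identity transformation
    t = t

    # (2) hidden conv layer
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (3) hidden conv layer
    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (4) hidden linear layer - flatten the conv output first
    t = t.reshape(-1, 12 * 4 * 4)
    t = F.relu(self.fc1(t))

    # (5) hidden linear layer
    t = F.relu(self.fc2(t))

    # (6) output layer
    t = self.out(t)
    # t = F.softmax(t, dim=1)  # applied outside the network when probabilities are needed

    return t
```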
Forward Propagation Explained
Forward propagation is the process of transforming an input tensor to an output tensor.
【Forward propagation is the process of transforming an input tensor into an output tensor.】
Predicting With The Network: Forward Pass
Before we begin, we are going to turn off PyTorch’s gradient calculation feature. This will stop PyTorch from automatically building a computation graph as our tensor flows through the network.
【Before starting, we turn off PyTorch's gradient calculation, which stops PyTorch from automatically building a computation graph as the tensor flows through the network.】
The computation graph keeps track of the network’s mapping by tracking each computation that happens. The graph is used during the training process to calculate the derivative (gradient) of the loss function with respect to the network’s weights.
【The computation graph tracks every computation in the network; during training it is used to compute the gradient of the loss function with respect to the weights.】
Since we are not training the network yet, we aren’t planning on updating the weights, and so we don’t require gradient calculations. We will turn this back on when training begins.
【Since we are not training yet and won't update the weights, we don't need gradient calculations.】
Passing A Single Image To The Network
Let’s continue by creating an instance of our Network class:
【Create an instance of the network.】
```python
> torch.set_grad_enabled(False)  # turn off gradient tracking for now
> network = Network()
```
Next, we’ll procure a single sample from our training set, unpack the image and the label, and verify the image’s shape:
【Get a single sample from the training set.】
```python
> sample = next(iter(train_set))
> image, label = sample  # image.shape is e.g. torch.Size([1, 28, 28]) for a 28x28 grayscale image
```
Now, there’s a second step we must perform before simply passing this tensor to our network. When we pass a tensor to our network, the network is expecting a batch, so even if we want to pass a single image, we still need a batch.
【Second, the network expects a batch, so the single image must be wrapped into a batch of size 1.】
```python
> pred = network(image.unsqueeze(0))  # image shape needs to be (batch_size x in_channels x H x W)
> pred.shape  # torch.Size([1, 10]) - one prediction value per class
```
For each input in the batch, and for each prediction class, we have a prediction value. If we wanted these values to be probabilities, we could just use the softmax() function from the nn.functional package.
【Use F.softmax() to convert the prediction values into probabilities.】
```python
> F.softmax(pred, dim=1)  # probabilities over the 10 classes; they sum to 1
```
Neural Network Batch Processing With PyTorch
- Prepare the data
- Build the model
- Understand how batches are passed to the network
- Train the model
- Analyze the model’s results
Colab: Pass A Batch of Images
Using Argmax: Prediction Vs Label
Colab: Pass A Batch of Images
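The details live in the linked Colab notebook; a minimal sketch of the idea, assuming a DataLoader with an illustrative batch_size of 10:
```python
from torch.utils.data import DataLoader

# wrap the training set in a data loader so samples come out in batches
data_loader = DataLoader(train_set, batch_size=10)
images, labels = next(iter(data_loader))

preds = network(images)                # shape [10, 10]: one row of class scores per image
preds.argmax(dim=1)                    # predicted class index for each image in the batch
preds.argmax(dim=1).eq(labels)         # element-wise comparison: prediction vs label
preds.argmax(dim=1).eq(labels).sum()   # number of correct predictions in the batch
```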
CNN Output Size Formula
CNN Output Size Formula (Square)
- Suppose we have an $n\times n$ input.【input size】
- Suppose we have an $f\times f$ filter.【filter size】
- Suppose we have a padding of $p$ and a stride of $s$ .【padding and stride】
The output size $O$ is given by this formula: $O = \frac{n-f+2p}{s}+1$ 【output size】
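For example, a 28x28 input, a 5x5 filter, $p=0$, and $s=1$ give $O = \frac{28-5+0}{1}+1 = 24$.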
CNN Output Size Formula (Non-Square)
- Suppose we have an $n_h×n_w$ input.
- Suppose we have an $f_h×f_w$ filter.
- Suppose we have a padding of $p$ and a stride of $s$.
The height of the output size $O_h$ is given by this formula:$O_h = \frac{n_h-f_h+2p}{s}+1$
The width of the output size $O_w$ is given by this formula: $O_w = \frac{n_w-f_w+2p}{s}+1$