「PyTorch」:2-Tensors Explained And Operations


This article introduces the PyTorch Tensor and its basic operations, organized into four categories: Reshape, Element-wise, Reduction, and Access.


PyTorch Tensors Explained

Tensor Operations: Reshape

Tensor Operations: Element-wise

Tensor Operation: Reduction and Access


Introducing Tensors

Tensor Explained - Data Structures of Deep Learning

What Is A Tensor?

A tensor is the primary data structure used by neural networks.


Indexes Required To Access An Element

The relationship within each of these pairs is that both elements require the same number of indexes to refer to a specific element within the data structure.



Indexes required   Computer science   Mathematics
0                  number             scalar
1                  array              vector
2                  2d-array           matrix
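The pattern in the table can be illustrated with plain Python containers (the values here are made up for the example): the same value needs zero, one, or two indexes depending on the structure.

```python
# A scalar, a vector (array), and a matrix (2d-array), using plain Python types.
number = 7                  # scalar: no index needed
array = [7, 8, 9]           # vector: one index
matrix = [[7, 8], [9, 10]]  # matrix: two indexes

print(number)        # 7
print(array[0])      # 7
print(matrix[0][0])  # 7
```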

Tensors Are Generalizations

When more than two indexes are required to access a specific element, we stop giving specific names to the structures, and we begin using more general language.


In mathematics, we stop using words like scalar, vector, and matrix, and we start using the word tensor or nd-tensor. The n tells us the number of indexes required to access a specific element within the structure.


Computer Science

In computer science, we stop using words like number, array, and 2d-array, and start using the word multidimensional array or nd-array. The n tells us the number of indexes required to access a specific element within the structure.


Indexes required   Computer science   Mathematics
n                  nd-array           nd-tensor

Tensors and nd-arrays are the same thing!

One thing to note about the dimension of a tensor is that it differs from what we mean when we refer to the dimension of a vector in a vector space. The dimension of a tensor does not tell us how many components exist within the tensor.


Rank, Axes, And Shape Explained

【The sections below explain several important properties of deep learning tensors in detail: rank, axes, and shape.】

The concepts of rank, axes, and shape are the tensor attributes that will concern us most in deep learning.

  • Rank
  • Axes
  • Shape

Rank And Indexes

We are introducing the word rank here because it is commonly used in deep learning when referring to the number of dimensions present within a given tensor.

The rank of a tensor tells us how many indexes are required to access (refer to) a specific data element contained within the tensor data structure.

A tensor’s rank tells us how many indexes are needed to refer to a specific element within the tensor.



Axes Of A Tensor

If we have a tensor, and we want to refer to a specific dimension, we use the word axis in deep learning.

An axis of a tensor is a specific dimension of a tensor.

Elements are said to exist or run along an axis. This running is constrained by the length of each axis. Let’s look at the length of an axis now.

Length Of An Axis

The length of each axis tells us how many indexes are available along each axis.




Shape Of A Tensor

The shape of a tensor is determined by the length of each axis, so if we know the shape of a given tensor, then we know the length of each axis, and this tells us how many indexes are available along each axis.

The shape of a tensor gives us the length of each axis of the tensor.
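As a minimal sketch in PyTorch (the values are illustrative): the shape gives the length of each axis, and the number of axes is the rank.

```python
import torch

# A rank-2 tensor with axes of length 3 and 4.
t = torch.tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

print(t.shape)       # torch.Size([3, 4]) -> the length of each axis
print(len(t.shape))  # 2 -> the rank: two indexes needed per element
```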


Additionally, one of the types of operations we must perform frequently when we are programming our neural networks is called reshaping.

Reshaping changes the shape but not the underlying data elements.


CNN Tensors Shape Explained

For an introduction to CNNs, see this article.

What I want to do now is put the concepts of rank, axes, and shape to use with a practical example. To do this, we’ll consider an image input as a tensor to a CNN.

Remember that the shape of a tensor encodes all the relevant information about a tensor’s axes, rank, and indexes, so we’ll consider the shape in our example, and this will enable us to work out the other values.


【Using a CNN input as an example to illustrate rank, axes, and shape.】

Shape Of A CNN Input

The shape of a CNN input typically has a length of four. This means that we have a rank-4 tensor with four axes. Each index in the tensor’s shape represents a specific axis, and the value at each index gives us the length of the corresponding axis.

【The input to a CNN is a rank-4 tensor.】

Each axis of a tensor usually represents some type of real world or logical feature of the input data. If we understand each of these features and their axis location within the tensor, then we can have a pretty good understanding of the tensor data structure overall.


Image Height And Width

To represent two dimensions, we need two axes.

The image height and width are represented on the last two axes.


Image Color Channels

The next axis represents the color channels. Typical values here are 3 for RGB images or 1 if we are working with grayscale images. This color channel interpretation only applies to the input tensor.

【The next axis (reading right to left) represents the image's color channels (e.g., a grayscale image has 1 color channel, an RGB image has 3).】

【Note: the "color channel" interpretation only applies to the input tensor.】

Image Batches

This brings us to the first of the four axes, which represents the batch size. In neural networks, we usually work with batches of samples as opposed to single samples, so the length of this axis tells us how many samples are in our batch.

Suppose we have the following shape [3, 1, 28, 28] for a given tensor. Using the shape, we can determine that we have a batch of three images.


tensor:[Batch, Channels, Height, Width]

Each image has a single color channel, and the image height and width are 28 x 28 respectively.

  1. Batch size
  2. Color channels
  3. Height
  4. Width
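The [3, 1, 28, 28] example can be sketched directly (random data stands in for real images here):

```python
import torch

# A batch shaped like the [Batch, Channels, Height, Width] example above.
batch = torch.rand(3, 1, 28, 28)  # 3 grayscale 28 x 28 images

b, c, h, w = batch.shape
print(b, c, h, w)         # 3 1 28 28
print(batch[0].shape)     # torch.Size([1, 28, 28]) -> one image
print(batch[0][0].shape)  # torch.Size([28, 28])    -> its single color channel
```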


It’s common when reading API documentation and academic papers to see the B replaced by an N, where N stands for the number of samples in a batch.

【In API documentation and academic papers, N often replaces B to denote the number of samples in a batch.】

Furthermore, another difference we often encounter in the wild is a reordering of the dimensions. Common orderings are as follows:

  • NCHW
  • NHWC
  • CHWN


As we have seen, PyTorch uses NCHW, and it is the case that TensorFlow and Keras use NHWC by default (it can be configured). Ultimately, the choice of which one to use depends mainly on performance. Some libraries and algorithms are more suited to one or the other of these orderings.

【PyTorch uses NCHW by default, while TensorFlow and Keras use NHWC.】

Output Channels And Feature Maps

Let’s look at how the interpretation of the color channel axis changes after the tensor is transformed by a convolutional layer.

Suppose we have three convolutional filters, and let's just see what happens to the channel axis.

Since we have three convolutional filters, we will have three channel outputs from the convolutional layer. These channels are outputs from the convolutional layer, hence the name output channels as opposed to color channels.

【After the tensor passes through a convolutional layer, the length of the color channel axis changes.】

【The output tensor of a convolutional layer has as many channels as there are convolutional filters (and "channel" replaces the term "color channel").】

Feature Maps

With the output channels, we no longer have color channels, but modified channels that we call feature maps. These so-called feature maps are the outputs of the convolutions that take place using the input color channels and the convolutional filters.

Feature maps are the output channels created from the convolutions.

【The channel axis of the convolutional layer's output tensor replaces the notion of color channels.】

【The outputs of a convolutional layer are also called feature maps.】

PyTorch Tensors

When programming neural networks, data preprocessing is often one of the first steps in the overall process, and one goal of data preprocessing is to transform the raw input data into tensor form.

【Data preprocessing is often the first step in building a neural network: transforming the raw input data into tensor form.】

For basic tensor operations, see the runnable Colab notebook: PyTorch Tensors Explained

(If Colab won't open for you, the same notebook is available on GitHub.)

PyTorch Tensors Attributes

  • torch.dtype: the data type of the data the tensor contains.


    Data type                  dtype          CPU tensor          GPU tensor
    32-bit floating point      torch.float32  torch.FloatTensor   torch.cuda.FloatTensor
    64-bit floating point      torch.float64  torch.DoubleTensor  torch.cuda.DoubleTensor
    16-bit floating point      torch.float16  torch.HalfTensor    torch.cuda.HalfTensor
    8-bit integer (unsigned)   torch.uint8    torch.ByteTensor    torch.cuda.ByteTensor
    8-bit integer (signed)     torch.int8     torch.CharTensor    torch.cuda.CharTensor
    16-bit integer (signed)    torch.int16    torch.ShortTensor   torch.cuda.ShortTensor
    32-bit integer (signed)    torch.int32    torch.IntTensor     torch.cuda.IntTensor
    64-bit integer (signed)    torch.int64    torch.LongTensor    torch.cuda.LongTensor
  • torch.device: the device the tensor's data is allocated on, such as CPU or cuda:0.

  • torch.layout: how the tensor is stored in memory.

As neural network programmers, we need to be aware of the following:

  1. Tensors contain data of a uniform type (dtype).
  2. Tensor computations between tensors depend on the dtype and the device.



Creating Tensors

These are the primary ways of creating tensor objects (instances of the torch.Tensor class), with data (array-like) in PyTorch:

Creating Tensors with data.


  1. torch.Tensor(data)
  2. torch.tensor(data)
  3. torch.as_tensor(data)
  4. torch.from_numpy(data)
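All four calls accept array-like data, as a small sketch shows (the sample data is illustrative):

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

t1 = torch.Tensor(data)      # class constructor
t2 = torch.tensor(data)      # factory function
t3 = torch.as_tensor(data)   # factory function
t4 = torch.from_numpy(data)  # factory function (numpy arrays only)

print(t1)  # tensor([1., 2., 3.]) -> note the floats
print(t2)  # tensor([1, 2, 3])
```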

torch.Tensor() Vs torch.tensor()

The first option with the uppercase T is the constructor of the torch.Tensor class, and the second option is what we call a factory function that constructs torch.Tensor objects and returns them to the caller.

However, the factory function torch.tensor() has better documentation and more configuration options, so it gets the winning spot at the moment.

【torch.Tensor(data) is the constructor of the torch.Tensor class, while torch.tensor(data) is a factory function that builds and returns torch.Tensor objects.】

【Because torch.tensor() offers more configuration options, such as setting the data type, it is generally the preferred way to create tensors.】

Default dtype Vs Inferred dtype

The difference here arises in the fact that the torch.Tensor() constructor uses the default dtype when building the tensor. The other calls choose a dtype based on the incoming data. This is called type inference. The dtype is inferred based on the incoming data.

【torch.Tensor() uses the default dtype=torch.float32 when building a tensor, while the other three calls infer the dtype, so the created tensor's data type matches that of the input data.】
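This difference is easy to check directly (the data values are illustrative):

```python
import torch

print(torch.Tensor([1, 2, 3]).dtype)  # torch.float32 -> the default dtype
print(torch.tensor([1, 2, 3]).dtype)  # torch.int64   -> inferred from the ints

# The factory function also lets us set the dtype explicitly.
print(torch.tensor([1, 2, 3], dtype=torch.float64).dtype)  # torch.float64
```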

Sharing Memory For Performance: Copy Vs Share

torch.Tensor() and torch.tensor() copy their input data while torch.as_tensor() and torch.from_numpy() share their input data in memory with the original input object.

This sharing just means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the torch.Tensor and the numpy.ndarray.

Sharing data is more efficient and uses less memory than copying data because the data is not written to two locations in memory.

【torch.Tensor() and torch.tensor() make an extra copy of the data in memory when creating a tensor from data.】

【torch.as_tensor() and torch.from_numpy() share memory with the original input data when creating a tensor, so if the original numpy.ndarray's data changes, the corresponding tensor changes as well.】

Share Data           Copy Data
torch.as_tensor()    torch.tensor()
torch.from_numpy()   torch.Tensor()
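The copy-vs-share behavior can be demonstrated by mutating the source array after the tensors are created (values are illustrative):

```python
import numpy as np
import torch

data = np.array([0, 0, 0])

copied = torch.tensor(data)     # copies the data
shared = torch.as_tensor(data)  # shares memory with data

data[0] = 99  # change the original ndarray

print(copied)  # tensor([0, 0, 0])  -> unaffected by the change
print(shared)  # tensor([99, 0, 0]) -> reflects the change
```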

Some things to keep in mind about memory sharing (it works where it can):

  1. Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.

    【When using a GPU, as_tensor() still has to copy the ndarray data from the CPU to the GPU.】

  2. The memory sharing of as_tensor() doesn’t work with built-in Python data structures like lists.

    【as_tensor() does not share memory with built-in Python data structures such as lists.】

  3. The as_tensor() performance improvement will be greater if there are a lot of back and forth operations between numpy.ndarray objects and tensor objects.

    【as_tensor() can noticeably improve performance when there are many back-and-forth operations between ndarrays and tensors.】

torch.as_tensor() Vs torch.from_numpy()

This establishes that torch.as_tensor() and torch.from_numpy() both share memory with their input data. However, which one should we use, and how are they different?

The torch.from_numpy() function only accepts numpy.ndarrays, while the torch.as_tensor() function accepts a wide variety of array-like objects, including other PyTorch tensors.

【Both share memory with their input data, but torch.from_numpy() only accepts numpy.ndarrays, while torch.as_tensor() also accepts array-like objects such as lists and tuples, so torch.as_tensor() is generally the more common choice.】

If we have a torch.Tensor and we want to convert it to a numpy.ndarray, we use the t.numpy() method.

【Use t.numpy() to convert a tensor to an ndarray.】

Creating Tensors without data.


  1. torch.eye(n) : creates a 2-D tensor, the n*n identity matrix.
  2. torch.zeros(shape) : creates an all-zeros tensor of the given shape.
  3. torch.ones(shape) : creates an all-ones tensor of the given shape.
  4. torch.rand(shape) : creates a tensor of random values of the given shape.
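A quick sketch of each (the shapes are chosen arbitrarily):

```python
import torch

print(torch.eye(2))       # 2 x 2 identity matrix
print(torch.zeros(2, 2))  # all zeros
print(torch.ones(2, 2))   # all ones
print(torch.rand(2, 2))   # uniform random values in [0, 1)
```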

Tensor Operations

Runnable Colab notebooks on tensor operations; they are best used alongside this article. If they won't open, the GitHub copies work too.

Tensor Operations: Reshape

Tensor Operations: Element-wise

Tensor Operation: Reduction and Access

We have the following high-level categories of operations:

  1. Reshaping operations
  2. Element-wise operations
  3. Reduction operations
  4. Access operations

【Tensor operations fall into four main categories: reshape, element-wise, reduction, and access.】


As neural network programmers, we have to do the same with our tensors, and usually shaping and reshaping our tensors is a frequent task.


(For the concrete operations, see the Colab notebook: Tensor Operations: Reshape)

import torch
t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
], dtype=torch.float32)  # an example tensor with 12 elements

Reshaping changes the tensor’s shape but not the underlying data. Our tensor has 12 elements, so any reshaping must account for exactly 12 elements.


In PyTorch, the -1 tells the reshape() function to figure out what the value should be based on the number of elements contained within the tensor.
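For example, with a 12-element tensor (values are illustrative), any reshape must multiply out to 12, and -1 asks PyTorch to infer one axis length:

```python
import torch

t = torch.arange(12)  # 12 elements: 0, 1, ..., 11

print(t.reshape(3, 4).shape)   # torch.Size([3, 4])  -> 3 * 4 = 12
print(t.reshape(2, -1).shape)  # torch.Size([2, 6])  -> -1 inferred as 6
print(t.reshape(-1).shape)     # torch.Size([12])    -> flattened to one axis
```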


Squeezing And Unsqueezing

  • Squeezing a tensor removes the dimensions or axes that have a length of one.


  • Unsqueezing a tensor adds a dimension with a length of one.
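A minimal sketch of both operations (the tensor is illustrative):

```python
import torch

t = torch.ones(1, 3)                    # shape [1, 3]

squeezed = t.squeeze()                  # removes the length-1 axis
print(squeezed.shape)                   # torch.Size([3])

unsqueezed = squeezed.unsqueeze(dim=0)  # adds a length-1 axis back
print(unsqueezed.shape)                 # torch.Size([1, 3])
```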


(For the concrete operations, see the Colab notebook: Tensor Operations: Reshape)


Concatenating Tensors

We combine tensors using the cat() function, and the resulting tensor will have a shape that depends on the shape of the two input tensors.

(For the concrete operations, see the Colab notebook: Tensor Operations: Reshape)

torch.cat((t1,t2,t3), dim=0)
torch.cat((t1,t2,t3), dim=1)
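A runnable sketch of both calls (the tensor contents are illustrative; two tensors are enough to show how the resulting shape depends on dim):

```python
import torch

t1 = torch.ones(2, 2)
t2 = torch.zeros(2, 2)

print(torch.cat((t1, t2), dim=0).shape)  # torch.Size([4, 2]) -> stacked as rows
print(torch.cat((t1, t2), dim=1).shape)  # torch.Size([2, 4]) -> joined as columns
```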



A tensor flatten operation is a common operation inside convolutional neural networks. This is because convolutional layer outputs that are passed to fully connected layers must be flattened out before the fully connected layer will accept the input.


For the 28 x 28 handwritten digits in the MNIST dataset, the CNN input described earlier has shape [Batch Size, Channels, Height, Width]. How can we flatten only some of the tensor's axes rather than all of them?


Flatten starting from dim 1 (for the concrete operations, see the Colab notebook: Tensor Operations: Reshape):

t.flatten(start_dim=1, end_dim=-1)
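Applied to the batch shape from earlier (random data stands in for real images), this keeps the batch axis and flattens each image:

```python
import torch

batch = torch.rand(3, 1, 28, 28)   # [Batch, Channels, Height, Width]

flat = batch.flatten(start_dim=1)  # flatten everything but the batch axis
print(flat.shape)                  # torch.Size([3, 784]) -> 1 * 28 * 28 = 784
```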

Broadcasting and Element-Wise

An element-wise operation operates on corresponding elements between tensors.



Broadcasting describes how tensors with different shapes are treated during element-wise operations.

Broadcasting is the concept whose implementation allows us to add scalars to higher dimensional tensors.



Let’s think about the t1 + 2 operation. Here, the scalar-valued tensor is being broadcast to the shape of t1, and then the element-wise operation is carried out.

【In t1 + 2, the scalar 2 is first broadcast to the same shape as t1, and then the element-wise operation is carried out.】

We have two tensors with different shapes. The goal of broadcasting is to make the tensors have the same shape so we can perform element-wise operations on them.

(For the concrete operations, see the Colab notebook: Tensor Operations: Element-wise)

Broadcasting Details

(For the concrete operations, see the Colab notebook: Tensor Operations: Element-wise)

  • Same Shapes: operate directly.

  • Same Rank, Different Shape:

    1. Determine if the tensors are compatible.

      【Only when two tensors are compatible can they be broadcast and an element-wise operation performed.】

      We compare the shapes of the two tensors, starting at their last dimensions and working backwards. Our goal is to determine whether each dimension between the two tensors’ shapes is compatible.



      The dimensions are compatible when either:

      • They’re equal to each other.
      • One of them is 1.
    2. Determine the shape of the resulting tensor.


  • Different Ranks:

    1. Determine if the tensors are compatible (as above).

      When we’re in a situation where the ranks of the two tensors aren’t the same, like what we have here, then we simply substitute a one in for the missing dimensions of the lower-ranked tensor.

      【For the missing dimensions of the lower-rank tensor, substitute a 1. For example, with shapes (1,3) and (), the lower-rank shape becomes (1,1).】

    2. Determine the shape of the resulting tensor.
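The rules above can be checked directly (the tensor values are illustrative):

```python
import torch

t1 = torch.ones(2, 3)            # shape (2, 3)
t2 = torch.tensor([1., 2., 3.])  # shape (3,) -> treated as (1, 3)

# Comparing right to left: 3 vs 3 (equal), 2 vs 1 (one is 1) -> compatible.
result = t1 + t2                 # t2 is broadcast to shape (2, 3)
print(result.shape)              # torch.Size([2, 3])
print(result)                    # each row is [2., 3., 4.]
```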

ArgMax and Reduction

A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor.

【A reduction operation is one that reduces the number of elements in a tensor.】

Reshaping operations gave us the ability to position our elements along particular axes. Element-wise operations allow us to perform operations on elements between two tensors, and reduction operations allow us to perform operations on elements within a single tensor.

【Reshaping operations let us position elements along particular axes; element-wise operations act on corresponding elements between two tensors; reduction operations act on the elements within a single tensor.】

(For the concrete operations, see the Colab notebook: Tensor Operation: Reduction and Access)


Reducing Tensors By Axes


(For the concrete operations, see the Colab notebook: Tensor Operation: Reduction and Access)
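A small sketch of axis-wise reduction using sum() (the values are illustrative):

```python
import torch

t = torch.tensor([
    [1., 2.],
    [3., 4.]
])

print(t.sum())       # tensor(10.)      -> all elements reduced to one
print(t.sum(dim=0))  # tensor([4., 6.]) -> reduced along the first axis
print(t.sum(dim=1))  # tensor([3., 7.]) -> reduced along the second axis
```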



Argmax returns the index location of the maximum value inside a tensor.
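A sketch (the values are illustrative): without a dim argument, argmax indexes into the flattened tensor.

```python
import torch

t = torch.tensor([
    [1., 5., 2.],
    [9., 3., 4.]
])

print(t.argmax())       # tensor(3) -> index in the flattened tensor (value 9.)
print(t.flatten()[3])   # tensor(9.)
print(t.argmax(dim=1))  # tensor([1, 0]) -> index of the max within each row
```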


(For the concrete operations, see the Colab notebook: Tensor Operation: Reduction and Access)


Accessing Elements Inside Tensors

The last type of common operation that we need for tensors is the ability to access data from within the tensor.


(For the concrete operations, see the Colab notebook: Tensor Operation: Reduction and Access)


Advanced Indexing And Slicing

PyTorch tensors support most of NumPy's indexing and slicing operations.
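A few representative examples (the tensor is illustrative):

```python
import torch

t = torch.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

print(t[0])       # tensor([0, 1, 2, 3])     -> first row
print(t[0, 0])    # tensor(0)                -> single element
print(t[:, 1])    # tensor([1, 5, 9])        -> second column
print(t[1:, :2])  # tensor([[4, 5], [8, 9]]) -> sub-block
```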



  1. To be covered later: advanced indexing and slicing: https://numpy.org/doc/stable/reference/arrays.indexing.html

