Broadcasting

May 11, 2025

14:25

★Broadcasting in PyTorch:

In short, if a PyTorch operation supports broadcast, then its Tensor arguments can be automatically expanded to be of equal sizes (without making copies of the data).

Two tensors are “broadcastable” if the following rules hold:

Each tensor has at least one dimension.
When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

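In other words, when two tensors have different shapes, an operation can first bring them to a common shape and then apply the operation element-wise. Note that PyTorch right-aligns the two shapes when broadcasting: it starts at the last dimension and compares the sizes from right to left.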
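A minimal sketch of the right-alignment rule; the shapes (3, 1, 4) and (5, 1) are arbitrary choices for illustration, not taken from the notes above.

```python
import torch

a = torch.ones(3, 1, 4)   # shape (3, 1, 4)
b = torch.ones(5, 1)      # shape    (5, 1), aligned with a from the right

# Sizes are compared right to left: 4 vs 1, then 1 vs 5, then 3 vs (missing).
# Every pair is equal, contains a 1, or has a missing side, so the shapes broadcast.
c = a + b
print(c.shape)            # torch.Size([3, 5, 4])

# torch.broadcast_shapes reports the result shape without doing any arithmetic.
result_shape = torch.broadcast_shapes(a.shape, b.shape)
print(result_shape)       # torch.Size([3, 5, 4])
```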

So it pays to keep a close eye on tensor shapes.

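In the bigram model, C is a 27x27 matrix in which C[i,j] counts how often letter j follows letter i. To normalize each row of C, compute the sum of the counts in that row and divide C[i,j] by that sum, which gives the probability that the second letter is j given that the first letter is i, i.e.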

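P = C / C.sum(axis=1, keepdim=True). Note that the argument is spelled keepdim, not keep_dim. Without keepdim=True, C.sum(axis=1) has shape (27,). In C / C.sum(axis=1) the two shapes differ, so broadcasting applies, and because broadcasting right-aligns the shapes, (27,) is first treated as (1, 27) and then expanded along the first dimension to (27, 27) (conceptually repeating that single row 27 times, without actually copying the data) before the element-wise division. Each entry C[i,j] is therefore divided by the sum of row j rather than the sum of row i. The code runs without error, but the rows of the resulting P do not sum to 1.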
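The difference can be checked directly. This is a sketch only: the count matrix here is random stand-in data rather than real bigram counts, and the names P_good and P_bad are made up for the example. It uses dim=1, PyTorch's native name for the axis argument.

```python
import torch

torch.manual_seed(0)
# Stand-in for the bigram count matrix: random positive counts (assumed data).
C = torch.randint(1, 10, (27, 27)).float()

row_sums_keep = C.sum(dim=1, keepdim=True)   # shape (27, 1)
row_sums_flat = C.sum(dim=1)                 # shape (27,)

# (27, 27) / (27, 1): row i is divided by the sum of row i.
P_good = C / row_sums_keep
# (27, 27) / (27,): the divisor is treated as (1, 27), so column j is
# divided by the sum of row j. It runs, but it is not a row normalization.
P_bad = C / row_sums_flat

print(P_good.sum(dim=1))   # every entry is 1 (up to rounding)
print(P_bad.sum(dim=1))    # generally not 1
```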

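Now suppose C is a 26x27 matrix. Without keepdim=True, P = C / C.sum(axis=1) does not run at all: C.sum(axis=1) has shape (26,), the trailing dimensions 27 and 26 are neither equal nor 1, and PyTorch raises an error.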
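A short sketch of the non-square case, using a matrix of ones as placeholder data. The flag name failed is only there for the illustration.

```python
import torch

C = torch.ones(26, 27)   # placeholder data with a non-square shape

failed = False
try:
    # (26, 27) / (26,): trailing sizes 27 and 26 differ and neither is 1.
    P = C / C.sum(dim=1)
except RuntimeError as err:
    failed = True
    print("broadcast failed:", err)

# With keepdim=True the divisor has shape (26, 1), which broadcasts against (26, 27).
P = C / C.sum(dim=1, keepdim=True)
print(P.shape)           # torch.Size([26, 27])
```

A square matrix hides this kind of shape bug, which is why a non-square test input can be a useful sanity check.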

★In-place operations on a tensor can be more efficient because they avoid allocating a new tensor, e.g. W /= 5 rather than W = W / 5
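One way to see the difference is to compare the storage address before and after each form. This is a sketch on a plain tensor that does not require gradients; for a leaf tensor with requires_grad=True, an in-place update outside torch.no_grad() raises an error, so the in-place form is not always available.

```python
import torch

W = torch.ones(3, 3)
ptr_before = W.data_ptr()          # address of the underlying storage

W /= 5                             # in-place: writes into the existing storage
same_storage = W.data_ptr() == ptr_before

W = W / 5                          # out-of-place: allocates a new tensor and rebinds the name W
new_storage = W.data_ptr() != ptr_before

print(same_storage, new_storage)   # True True
```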

Matrix multiplication amounts to computing many vector dot products in parallel
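The statement can be checked by building the product one dot product at a time. The shapes and random inputs below are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 4)
B = torch.randn(4, 2)

# Entry (i, j) of A @ B is the dot product of row i of A with column j of B.
manual = torch.empty(3, 2)
for i in range(3):
    for j in range(2):
        manual[i, j] = torch.dot(A[i], B[:, j])

matches = torch.allclose(A @ B, manual, atol=1e-5)
print(matches)   # True
```

The loop computes the six dot products one after another; A @ B produces the same six numbers in a single call.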

The difference between torch.tensor and torch.Tensor:

In PyTorch, both torch.Tensor and torch.tensor create new tensors, but they differ in some important ways.

torch.Tensor is an alias for the default tensor type, torch.FloatTensor. Tensors created with torch.Tensor are single-precision floats (torch.float32) by default.

torch.tensor is a Python function with the signature:

torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)

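Here data can be a list, tuple, NumPy array, scalar, and so on. torch.tensor copies the values from data (rather than referencing them) and infers the dtype from the original data, producing the equivalent of a torch.LongTensor, torch.FloatTensor, or torch.DoubleTensor.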
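A short sketch of the dtype inference and the copy behavior. It assumes NumPy is installed and that the default dtype has not been changed with torch.set_default_dtype. The last two lines show a separate pitfall: given a bare integer, torch.Tensor treats it as a size, not a value.

```python
import numpy as np
import torch

# torch.tensor infers the dtype from the data.
print(torch.tensor([1, 2, 3]).dtype)               # torch.int64   (LongTensor)
print(torch.tensor([1.0, 2.0]).dtype)              # torch.float32 (FloatTensor)
print(torch.tensor(np.array([1.0, 2.0])).dtype)    # torch.float64 (DoubleTensor)

# torch.Tensor produces float32 regardless of the input values.
print(torch.Tensor([1, 2, 3]).dtype)               # torch.float32

# torch.tensor copies: later changes to the source array do not show up.
arr = np.array([1.0, 2.0])
copied = torch.tensor(arr)
arr[0] = 100.0
print(copied[0].item())                            # 1.0

# A bare integer means "size" for torch.Tensor but "value" for torch.tensor.
print(torch.Tensor(3).shape)                       # torch.Size([3]), uninitialized values
print(torch.tensor(3).shape)                       # torch.Size([]), a scalar holding 3
```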

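Setting W.grad = None resets the gradient just as zeroing it does (the next backward() call writes a fresh gradient instead of accumulating into the old one), and it is more efficient because no zero-filled tensor has to be written or kept around. The two are not strictly identical: until the next backward(), W.grad is None rather than a tensor of zeros.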
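A small sketch comparing the two ways of resetting, on a toy parameter W with a made-up loss (W * 2).sum(). The variable names g_first, g_accumulated, g_after_none, and g_after_zero are only for the illustration.

```python
import torch

W = torch.ones(3, requires_grad=True)

(W * 2).sum().backward()
g_first = W.grad.clone()         # tensor([2., 2., 2.])

(W * 2).sum().backward()         # no reset: gradients accumulate
g_accumulated = W.grad.clone()   # tensor([4., 4., 4.])

W.grad = None                    # reset by dropping the gradient tensor
is_none = W.grad is None         # True: there is no tensor to read until the next backward()
(W * 2).sum().backward()
g_after_none = W.grad.clone()    # tensor([2., 2., 2.])

W.grad.zero_()                   # reset by overwriting the existing tensor with zeros
(W * 2).sum().backward()
g_after_zero = W.grad.clone()    # tensor([2., 2., 2.])

print(g_first, g_accumulated, g_after_none, g_after_zero)
```

Both resets give the same gradient after the next backward pass. Code that reads W.grad between the reset and that pass has to handle None in the first case.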