5. Build the Neural Network

신경망은 데이터에 대한 작업을 수행하는 레이어/모듈로 구성됩니다. torch.nn 네임스페이스는 자체 신경망을 구축하는 데 필요한 모든 빌딩 블록을 제공합니다. PyTorch의 모든 모듈은 nn.Module의 하위 클래스입니다. 신경망은 다른 모듈(레이어)로 구성된 모듈 자체입니다. 이 중첩 구조를 통해 복잡한 아키텍처를 쉽게 구축하고 관리할 수 있습니다.

Neural networks comprise of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses the nn.Module. A neural network is a module itself that consists of other modules (layers). This nested structure allows for building and managing complex architectures easily.

다음 섹션에서는 FashionMNIST 데이터 세트에서 이미지를 분류하기 위해 신경망을 구축합니다.

In the following sections, we’ll build a neural network to classify images in the FashionMNIST dataset.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get Device for Training

가능한 경우 GPU와 같은 하드웨어 가속기에서 모델을 학습할 수 있기를 원합니다. torch.cuda를 사용할 수 있는지 확인하고 그렇지 않으면 CPU를 계속 사용합니다.

We want to be able to train our model on a hardware accelerator like the GPU, if it is available. Let’s check to see if torch.cuda is available, else we continue to use the CPU.

CodeOut

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cuda device

Define the Class

nn.Module을 하위 클래스로 분류하여 신경망을 정의하고 __init__에서 신경망 레이어를 초기화합니다. 모든 nn.Module 하위 클래스는 전달 메서드의 입력 데이터에 대한 작업을 구현합니다.

We define our neural network by subclassing nn.Module, and initialize the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

우리는 NeuralNetwork의 인스턴스를 생성하고 device로 이동하고 그 구조를 출력합니다.

We create an instance of NeuralNetwork, and move it to the device, and print its structure.

CodeOut

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

모델을 사용하기 위해 입력 데이터를 전달합니다. 그러면 일부 백그라운드 작업과 함께 모델의 forward가 실행됩니다. model.forward()를 직접 호출하지 마세요!

To use the model, we pass it the input data. This executes the model’s forward, along with some background operations. Do not call model.forward() directly!

입력에서 모델을 호출하면 각 클래스에 대한 10개의 원시 예측 값의 각 출력에 해당하는 dim=0과 각 출력의 개별 값에 해당하는 dim=1인 2차원 텐서를 반환합니다. nn.Softmax 모듈의 인스턴스를 통과하여 예측 확률을 얻습니다.

Calling the model on the input returns a 2-dimensional tensor with dim=0 corresponding to each output of 10 raw predicted values for each class, and dim=1 corresponding to the individual values of each output. We get the prediction probabilities by passing it through an instance of the nn.Softmax module.

CodeOut

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([2], device='cuda:0')

Model Layers

FashionMNIST 모델의 레이어를 분석해 봅시다. 이를 설명하기 위해 28x28 크기의 3개 이미지로 구성된 샘플 미니배치를 가져와 네트워크를 통과할 때 어떤 일이 발생하는지 확인합니다.

Let’s break down the layers in the FashionMNIST model. To illustrate it, we will take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.

CodeOut

input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])

`nn.Flatten`

nn.Flatten 레이어를 초기화하여 각 2D 28x28 이미지를 784 픽셀 값의 연속 배열로 변환합니다(미니배치 차원(dim=0에서)이 유지됨).

We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values ( the minibatch dimension (at dim=0) is maintained).

CodeOut

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])

`nn.Linear`

선형 레이어는 저장된 가중치와 편향을 사용하여 입력에 선형 변환을 적용하는 모듈입니다.

The linear layer is a module that applies a linear transformation on the input using its stored weights and biases.

CodeOut

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])

`nn.ReLU`

비선형 활성화는 모델의 입력과 출력 사이에 복잡한 매핑을 만드는 것입니다. 비선형성을 도입하기 위해 선형 변환 후에 적용되어 신경망이 다양한 현상을 학습하도록 돕습니다.

Non-linear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena.

이 모델에서는 선형 레이어 간에 nn.ReLU를 사용하지만 모델에 비선형성을 도입하기 위한 다른 활성화가 있습니다.

In this model, we use nn.ReLU between our linear layers, but there’s other activations to introduce non-linearity in your model.

CodeOut

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[ 0.4988,  0.1104,  0.5095, -0.0039,  0.0648,  0.4597,  0.3565,  0.1058,
         -0.4450, -0.5919, -0.2224, -0.4621, -0.5852,  0.1055,  0.0763, -0.0396,
          0.0664, -0.0582, -0.0969, -0.3257],
        [ 0.3097, -0.2333,  0.2847, -0.0299,  0.0698,  0.2570,  0.6639,  0.2283,
         -0.4441, -0.6207,  0.0160, -0.4387, -0.6074, -0.0954,  0.5952,  0.3003,
         -0.1924,  0.0746,  0.0703, -0.1286],
        [ 0.6549,  0.2369,  0.6299,  0.0896,  0.1700,  0.1656,  0.3755,  0.1096,
         -0.2217, -0.5969, -0.5345, -0.3031, -0.6031,  0.1542,  0.2401,  0.0036,
          0.0358,  0.1442,  0.2810, -0.0572]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.4988, 0.1104, 0.5095, 0.0000, 0.0648, 0.4597, 0.3565, 0.1058, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.1055, 0.0763, 0.0000, 0.0664, 0.0000,
         0.0000, 0.0000],
        [0.3097, 0.0000, 0.2847, 0.0000, 0.0698, 0.2570, 0.6639, 0.2283, 0.0000,
         0.0000, 0.0160, 0.0000, 0.0000, 0.0000, 0.5952, 0.3003, 0.0000, 0.0746,
         0.0703, 0.0000],
        [0.6549, 0.2369, 0.6299, 0.0896, 0.1700, 0.1656, 0.3755, 0.1096, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.1542, 0.2401, 0.0036, 0.0358, 0.1442,
         0.2810, 0.0000]], grad_fn=<ReluBackward0>)

`nn.Sequential`

nn.Sequential은 순서가 지정된 모듈 컨테이너입니다. 데이터는 정의된 것과 동일한 순서로 모든 모듈을 통해 전달됩니다. 순차 컨테이너를 사용하여 seq_modules와 같은 빠른 네트워크를 구성할 수 있습니다.

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use sequential containers to put together a quick network like seq_modules.

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

`nn.Softmax`

신경망의 마지막 선형 레이어는 nn.Softmax 모듈로 전달되는 [-infty, infty]의 원시 값인 로짓(logits)을 반환합니다. 로짓은 각 클래스에 대한 모델의 예측 확률을 나타내는 값 [0, 1]로 조정됩니다. dim 매개변수는 값의 합이 1이 되어야 하는 차원을 나타냅니다.

The last linear layer of the neural network returns logits - raw values in [-infty, infty] - which are passed to the nn.Softmax module. The logits are scaled to values [0, 1] representing the model’s predicted probabilities for each class. dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)

Model Parameters

신경망 내부의 많은 레이어는 매개변수화되어 있습니다. 즉, 학습 중에 최적화되는 관련 가중치 및 편향이 있습니다. nn.Module을 서브클래싱하면 모델 객체 내부에 정의된 모든 필드를 자동으로 추적하고 모델의 parameters() 또는 named_parameters() 메서드를 사용하여 모든 매개변수에 액세스할 수 있습니다.

Many layers inside a neural network are parameterized, i.e. have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and makes all parameters accessible using your model’s parameters() or named_parameters() methods.

이 예에서는 각 매개변수를 반복하고 크기와 값의 미리보기를 출력합니다.

In this example, we iterate over each parameter, and print its size and a preview of its values.

CodeOut

print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure: NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0229, -0.0028, -0.0241,  ...,  0.0319, -0.0086, -0.0207],
        [ 0.0155, -0.0026,  0.0321,  ..., -0.0103,  0.0049, -0.0344]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0283, -0.0130], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0234,  0.0337, -0.0222,  ..., -0.0040,  0.0032,  0.0279],
        [ 0.0397, -0.0134, -0.0293,  ...,  0.0131, -0.0424,  0.0343]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([0.0333, 0.0191], device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0021, -0.0263,  0.0034,  ...,  0.0250, -0.0218,  0.0266],
        [-0.0177,  0.0311, -0.0379,  ..., -0.0167,  0.0137, -0.0286]],
       device='cuda:0', grad_fn=<SliceBackward0>)

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0157, -0.0078], device='cuda:0', grad_fn=<SliceBackward0>)

References

https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html