
torch.optim.Optimizer

Enna 2023. 2. 20. 15:50

torch.optim.Optimizer(params, defaults)

torch.optim - PyTorch 1.13 documentation

  • torch.optim is a package implementing various optimization algorithms

Constructing it

  • to construct an Optimizer, give it an iterable containing the parameters to optimize, then specify optimizer-specific options such as learning rate, momentum, etc.

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

Per-parameter options

  • instead of passing an iterable of Tensors (Variables), pass in an iterable of dicts

  • each dict defines a separate parameter group and should contain a params key holding the list of parameters that belong to it

  • other keys override the default options for that group, e.g. to specify per-layer learning rates: in the example below, model.base uses the default lr of 1e-2, model.classifier uses 1e-3, and momentum=0.9 applies to all parameters

      optim.SGD([
          {'params': model.base.parameters()},
          {'params': model.classifier.parameters(), 'lr': 1e-3}
      ], lr=1e-2, momentum=0.9)

Taking an optimization step

  • optimizer.step()

    • all optimizers implement a step() method that updates the parameters

    • it can be called once the gradients have been computed, e.g. by loss.backward()

        for input, target in dataset:
            optimizer.zero_grad()
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
  • optimizer.step(closure)

    • some algorithms (e.g. Conjugate Gradient, LBFGS) need to reevaluate the function multiple times
    • for these, pass in a closure that allows the optimizer to recompute the model
    • the closure should clear the gradients, compute the loss, and return it (see the sketch below)
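
    • a minimal sketch of this pattern with LBFGS (the toy model, loss_fn, and dataset here are illustrative placeholders, not from the original post):

        import torch
        from torch import nn, optim

        # toy setup just for illustration
        model = nn.Linear(2, 1)
        loss_fn = nn.MSELoss()
        dataset = [(torch.randn(4, 2), torch.randn(4, 1)) for _ in range(10)]

        optimizer = optim.LBFGS(model.parameters(), lr=0.1)

        for input, target in dataset:
            def closure():
                optimizer.zero_grad()           # clear accumulated gradients
                output = model(input)
                loss = loss_fn(output, target)
                loss.backward()                 # recompute gradients
                return loss                     # step() uses the returned loss
            optimizer.step(closure)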

Base class

  • Optimizer.add_param_group: add a param group to the Optimizer's param_groups
  • Optimizer.load_state_dict: loads the optimizer state
  • Optimizer.state_dict: returns the state of the optimizer as a dict
  • Optimizer.step: performs a single optimization step (parameter update)
  • Optimizer.zero_grad: sets the gradients of all optimized torch.Tensors to zero
    • in PyTorch, gradients are accumulated (added to the existing .grad) on every subsequent backward() call
    • therefore the gradients must be reset to zero before each backpropagation (see the sketch below)
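
    • a small sketch of this accumulation behaviour (the tensor x is just an illustration, not from the original post):

      import torch

      x = torch.tensor([1.0, 2.0], requires_grad=True)

      y = (x ** 2).sum()   # dy/dx = 2x
      y.backward()
      print(x.grad)        # tensor([2., 4.])

      y = (x ** 2).sum()
      y.backward()
      print(x.grad)        # tensor([4., 8.])  <- accumulated, not replaced

      x.grad.zero_()       # roughly what optimizer.zero_grad() does for each parameter
      print(x.grad)        # tensor([0., 0.])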

How to adjust learning rate

  • torch.optim.lr_scheduler

    • provides several methods to adjust the learning rate based on the number of epochs

    • torch.optim.lr_scheduler.ReduceLROnPlateau allows dynamic learning rate reduction based on a validation measurement (pass the metric to scheduler.step(val_metric))

      from torch import nn
      from torch.optim import SGD
      from torch.optim.lr_scheduler import ExponentialLR

      model = nn.Linear(2, 2)                       # small stand-in model
      optimizer = SGD(model.parameters(), lr=0.1)
      scheduler = ExponentialLR(optimizer, gamma=0.9)

      for epoch in range(20):
          for input, target in dataset:
              optimizer.zero_grad()
              output = model(input)
              loss = loss_fn(output, target)
              loss.backward()
              optimizer.step()
          scheduler.step()    # adjust the lr once per epoch, after the optimizer updates
    • most learning rate schedulers can be called back-to-back (chaining schedulers)

    • each scheduler is applied one after the other on the learning rate obtained by the one preceding it
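
    • a minimal sketch of chaining two schedulers on one optimizer (the toy model and milestone choices are illustrative, not from the original post):

      from torch import nn
      from torch.optim import SGD
      from torch.optim.lr_scheduler import ExponentialLR, MultiStepLR

      model = nn.Linear(2, 2)                       # toy model for illustration
      optimizer = SGD(model.parameters(), lr=0.1)
      scheduler1 = ExponentialLR(optimizer, gamma=0.9)
      scheduler2 = MultiStepLR(optimizer, milestones=[5, 15], gamma=0.1)

      for epoch in range(20):
          # forward/backward pass elided; in real code run the inner loop shown above
          optimizer.step()
          scheduler1.step()                         # applied first
          scheduler2.step()                         # then applied to the lr produced by scheduler1
          print(epoch, optimizer.param_groups[0]['lr'])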
