torch.optim.Optimizer(params, defaults)
torch.optim - PyTorch 1.13 documentation
- torch.optim is a package implementing various optimization algorithms
Constructing it
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
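For context, a minimal sketch of the two constructions above, assuming a small nn.Sequential model and two standalone tensors (all names here are illustrative):

import torch
import torch.nn as nn
import torch.optim as optim

# a small model whose parameters the optimizer will update
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# an optimizer can also be given an explicit list of tensors,
# as long as they have requires_grad=True
var1 = torch.randn(3, requires_grad=True)
var2 = torch.randn(3, requires_grad=True)
optimizer = optim.Adam([var1, var2], lr=0.0001)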
Per-parameter options
- instead of passing an iterable of Variables (Tensors), pass in an iterable of dicts
- each of them will define a separate parameter group, and should contain a params key, containing a list of parameters belonging to it
- e.g. to specify per-layer learning rates:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
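A quick way to check what the dicts above produced is to look at optimizer.param_groups; this sketch assumes a hypothetical model with base and classifier submodules matching the example:

import torch.nn as nn
import torch.optim as optim

# hypothetical model with the `base` and `classifier` submodules used above
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(16, 8)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(self.base(x))

model = Net()
optimizer = optim.SGD([
    {'params': model.base.parameters()},                   # falls back to the default lr=1e-2
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

# each dict became one entry in optimizer.param_groups
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])   # prints 0 0.01, then 1 0.001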
Taking an optimization step
optimizer.step()
all optimizers implement a step() method that updates the parameters
the function can be called once the gradients are computed, e.g. using backward():
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
optimizer.step(closure)
- some optimization algorithms (e.g. Conjugate Gradient, LBFGS) need to reevaluate the function multiple times
- so you have to pass in a closure that allows them to recompute the model
- the closure should clear the gradients, compute the loss, and return it (see the sketch below)
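A minimal sketch of the closure pattern with LBFGS, reusing the placeholder names model, dataset, and loss_fn from the loop above:

import torch.optim as optim

optimizer = optim.LBFGS(model.parameters(), lr=0.1)

for input, target in dataset:
    def closure():
        optimizer.zero_grad()           # clear the gradients
        output = model(input)
        loss = loss_fn(output, target)  # compute the loss
        loss.backward()
        return loss                     # and return it
    optimizer.step(closure)             # LBFGS may call closure() several times per step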
Base class
Optimizer.add_param_group: adds a param group to the Optimizer's param_groups
Optimizer.load_state_dict: loads the optimizer state
Optimizer.state_dict: returns the state of the optimizer as a dict
Optimizer.step: performs a single optimization step (parameter update)
Optimizer.zero_grad: sets the gradients of all optimized torch.Tensors to zero
- in PyTorch, gradient values are accumulated (added up) on every subsequent backward call
- therefore, gradients should always be zeroed before starting backpropagation (illustrated below)
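A tiny sketch of this accumulation behavior; the tensor w and the printed values are just for illustration:

import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

w.sum().backward()
print(w.grad)      # tensor([1., 1.])

# without zeroing, the next backward pass adds onto the existing gradients
w.sum().backward()
print(w.grad)      # tensor([2., 2.])

# zeroing (what optimizer.zero_grad() does for all optimized tensors) resets the accumulation
w.grad.zero_()
w.sum().backward()
print(w.grad)      # tensor([1., 1.])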
How to adjust learning rate
torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs
torch.optim.lr_scheduler.ReduceLROnPlateau allows dynamic learning rate reducing based on some validation measurements

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()
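For the ReduceLROnPlateau scheduler mentioned above, a minimal self-contained sketch (the model, the constant validation loss, and the hyperparameters are just placeholders):

import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(4, 1)                      # placeholder model
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)
# cut the lr by a factor of 10 once the monitored metric stops improving for `patience` epochs
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

for epoch in range(10):
    # ... training over the dataset would go here ...
    val_loss = 1.0                           # stand-in for a real validation loss
    scheduler.step(val_loss)                 # unlike other schedulers, the metric is passed in
    print(epoch, optimizer.param_groups[0]['lr'])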
most learning rate schedulers can be called back-to-back (chaining schedulers)
each scheduler is applied one after the other on the learning rate obtained by the one preceding it
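A small sketch of chaining two schedulers on one optimizer (the model, epoch count, and milestones are placeholders; the bare optimizer.step() stands in for a real training loop):

import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, MultiStepLR

model = nn.Linear(4, 1)                      # placeholder model
optimizer = SGD(model.parameters(), lr=0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)

for epoch in range(15):
    optimizer.step()         # stand-in for the inner training loop
    scheduler1.step()        # multiply the current lr by 0.9
    scheduler2.step()        # then multiply it by 0.1 at the milestone epochs
    print(epoch, optimizer.param_groups[0]['lr'])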