I've been working my way through the Jupyter Notebook for Chapter 8.
When I run the cell that trains using L2 regularization:

model = Net().to(device=device)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["l2 reg"] = validate(model, train_loader, val_loader)
The network will not train, since the loss is `nan`. I am curious whether there is an error in the definition of `training_loop_l2reg` in the previous cell:
def training_loop_l2reg(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)

            l2_lambda = 0.001
            # Replace pow(2.0) with abs() for L1 regularization
            l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())
            loss = loss + l2_lambda * l2_norm

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            loss_train += loss.item()

        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))
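As a quick sanity check (a minimal sketch of my own, assuming the notebook's `model`, `loss_fn`, `train_loader`, and `device` are already in scope), one could print the data loss and the L2 penalty separately for the first few batches to see which of the two terms is producing the `nan`:

# Diagnostic sketch: report the data loss and the L2 penalty separately
# for a handful of batches before running the full training loop.
model.train()
for i, (imgs, labels) in enumerate(train_loader):
    imgs = imgs.to(device=device)
    labels = labels.to(device=device)

    outputs = model(imgs)
    data_loss = loss_fn(outputs, labels)

    l2_lambda = 0.001
    l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())

    print(f"batch {i}: data loss {data_loss.item():.4f}, "
          f"l2 penalty {(l2_lambda * l2_norm).item():.4f}")

    if i >= 4:  # only inspect the first few batches
        break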
Whereas if I instead train using the `weight_decay` parameter of SGD:

model = NetWidth(n_chans1=32).to(device=device)
optimizer = optim.SGD(model.parameters(), weight_decay=0.001, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["width"] = validate(model, train_loader, val_loader)
I have no problem with the loss converging.
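One thing I noticed while comparing the two (my own reasoning, not something from the book): SGD's `weight_decay=d` adds `d * p` directly to each parameter's gradient, whereas the manual penalty `l2_lambda * sum(p**2)` contributes `2 * l2_lambda * p`, so `weight_decay=0.001` is only half the regularization strength of the loop above. A toy single-tensor check (all names here are hypothetical):

import torch
import torch.optim as optim

lr, l2_lambda = 1e-2, 0.001

# Run 1: manual L2 penalty added to a zero data loss.
p1 = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
opt1 = optim.SGD([p1], lr=lr)
loss = l2_lambda * p1.pow(2.0).sum()   # gradient of this term is 2 * l2_lambda * p1
opt1.zero_grad()
loss.backward()
opt1.step()

# Run 2: the same update expressed through weight_decay = 2 * l2_lambda.
p2 = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
opt2 = optim.SGD([p2], lr=lr, weight_decay=2 * l2_lambda)
opt2.zero_grad()
(0.0 * p2.sum()).backward()            # zero data loss; gradient exists but is all zeros
opt2.step()

print(p1.data)  # both prints should show the same updated values
print(p2.data)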