I keep forgetting the exact formulation of `binary_cross_entropy_with_logits` in PyTorch, so I am writing it down here for future reference.
The function `binary_cross_entropy_with_logits` takes two kinds of inputs: (1) the logits, i.e., the values right before the probability transformation (sigmoid) layer, whose range is (-infinity, +infinity); and (2) the target, whose values are binary.
`binary_cross_entropy_with_logits` calculates the following loss (i.e., the negative log likelihood), ignoring sample weights:
>>> import torch
>>> import torch.nn.functional as F
>>> input = torch.tensor([3.0])
>>> target = torch.tensor([1.0])
>>> F.binary_cross_entropy_with_logits(input, target)
tensor(0.0486)
>>> - (target * torch.log(torch.sigmoid(input)) + (1-target)*torch.log(1-torch.sigmoid(input)))
tensor([0.0486])
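The fused function exists mainly for numerical stability. As an extra sanity check (my addition, continuing the session above), the same value should come out of applying `torch.sigmoid` explicitly and then calling the plain `F.binary_cross_entropy`:
>>> F.binary_cross_entropy(torch.sigmoid(input), target)
tensor(0.0486)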
2019-12-03 Update
Now let’s look at the difference between the N-class cross entropy and the KL-divergence loss. They refer to the same thing (https://adventuresinmachinelearning.com/cross-entropy-kl-divergence/) and differ only in I/O format: `nn.CrossEntropyLoss` expects raw logits and integer class indices, while `nn.KLDivLoss` expects log-probabilities and a full target distribution. With one-hot targets they compute the same value:
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> loss(input, target)
tensor(1.6677, grad_fn=<NllLossBackward>)
>>> loss1 = nn.KLDivLoss(reduction="batchmean")
>>> loss1(nn.LogSoftmax(dim=1)(input), F.one_hot(target, 5).float())
tensor(1.6677, grad_fn=<NllLossBackward>)
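More generally, the cross entropy H(p, q) decomposes as H(p) + KL(p || q), so the two losses coincide above only because a one-hot target has zero entropy. A small sketch (my addition, continuing the session above, with an arbitrary soft target) to illustrate the decomposition:
>>> logits = torch.randn(3, 5)
>>> p = torch.softmax(torch.randn(3, 5), dim=1)      # a soft, non-one-hot target distribution
>>> log_q = F.log_softmax(logits, dim=1)
>>> ce = -(p * log_q).sum(dim=1).mean()              # cross entropy H(p, q)
>>> kl = F.kl_div(log_q, p, reduction="batchmean")   # KL(p || q)
>>> entropy = -(p * p.log()).sum(dim=1).mean()       # H(p)
>>> torch.allclose(ce, kl + entropy)
True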
Reference:
[1] https://pytorch.org/docs/stable/nn.html#binary-cross-entropy-with-logits
[2] https://pytorch.org/docs/stable/nn.html#bcewithlogitsloss