Focal loss for classification and regression

I haven’t learnt any new loss function for a long time. Today I am going to learn one new loss function, focal loss, which was introduced in 2018 [1]. Let’s start from a typical classification task. For a data , where is the feature vector and is a binary label, a model predicts . Then …

Analyze DistributedDataParallel (DPP)’s behavior

DistributedDataParallel implements data parallelism at the module level which can run across different machines. There is one process running on each device where one copy of the module is held. Each process loads its own data which is non-overlapping with other processes’. At the initialization phase, all copies are synchronized to ensure they start from …