Deep Learning-based Sorting

In this post, I discuss a few techniques for using deep neural networks to accomplish sorting/ranking tasks.

Reinforcement Learning – policy gradient paradigm

Using policy gradient to solve combinatorial optimization problems such as the Traveling Salesman Problem is not new. Ranking K out of N candidates is also a combinatorial optimization problem and can therefore be solved by policy gradient. The only remaining question is how to parameterize the ranking policy. You can parameterize it as a sequence model (which considers item interactions) [2] or as a Plackett-Luce distribution (if an optimal pointwise relevance score is assumed to exist). Additionally, as discussed in [1], you can treat the ranking reward as non-differentiable (such as logged impressions, clicks, etc.) or as differentiable (some loss similar to NDCG but differentiable).
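To make the Plackett-Luce option concrete, here is a minimal sketch in plain Python, not a definitive implementation. It assumes the policy is just a vector of pointwise scores (in practice these would come from a neural network) and that the reward is a black-box, non-differentiable function of the sampled ranking. The function names (`sample_ranking`, `log_prob_and_grad`, `reinforce_step`) and the `lr` and `baseline` parameters are my own illustrative choices.

```python
import math
import random


def sample_ranking(scores, k, rng):
    """Sample a top-k ranking from a Plackett-Luce distribution over item scores."""
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(k):
        # At each position, pick one remaining item with probability
        # proportional to exp(score), i.e. a softmax over what is left.
        weights = [math.exp(scores[i]) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


def log_prob_and_grad(scores, ranking):
    """Return log P(ranking) and its gradient with respect to each score."""
    remaining = set(range(len(scores)))
    log_prob = 0.0
    grad = [0.0] * len(scores)
    for item in ranking:
        z = sum(math.exp(scores[j]) for j in remaining)
        log_prob += scores[item] - math.log(z)
        # d/ds of (s_item - log z): +1 for the chosen item,
        # minus the softmax probability for every item still available.
        grad[item] += 1.0
        for j in remaining:
            grad[j] -= math.exp(scores[j]) / z
        remaining.remove(item)
    return log_prob, grad


def reinforce_step(scores, reward_fn, k, rng, lr=0.1, baseline=0.0):
    """One REINFORCE update: sample a ranking, observe its reward,
    and move the scores along (reward - baseline) * grad log P(ranking)."""
    ranking = sample_ranking(scores, k, rng)
    reward = reward_fn(ranking)
    _, grad = log_prob_and_grad(scores, ranking)
    new_scores = [s + lr * (reward - baseline) * g for s, g in zip(scores, grad)]
    return new_scores, reward
```

To use it, you would call `reinforce_step` repeatedly with a `random.Random` instance and a `reward_fn` that maps a ranking (a list of item indices) to a scalar, such as logged clicks. The reward is only ever evaluated, never differentiated, which is why this route works for non-differentiable rewards.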

soft rank 

References

[1] https://czxttkl.com/2020/02/18/practical-considerations-of-off-policy-policy-gradient/

[2] https://arxiv.org/abs/1810.02019
