In this post, I discuss a few techniques for using deep neural networks to accomplish sorting/ranking tasks.
Reinforcement Learning – policy gradient paradigm
Using policy gradient to solve combinatorial optimization problems such as the Traveling Salesman Problem is not new. Ranking K out of N candidates is also a combinatorial optimization problem and thus can be solved by policy gradient. The only remaining question is how to parameterize the ranking policy. You can parameterize it as a sequence model (which accounts for item interactions) [2] or as a Plackett-Luce distribution (if an optimal pointwise relevance score is assumed to exist). Additionally, as discussed in [1], you can treat the ranking reward as non-differentiable (such as logged impressions, clicks, etc.) or as differentiable (a differentiable loss similar to NDCG).
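To make the Plackett-Luce option concrete, here is a minimal sketch of how such a policy could be trained with a REINFORCE-style update on a non-differentiable reward. It assumes the policy is just a vector of per-item scores (in practice these would come from a neural network), and it subtracts a simple mean-reward baseline. The function names (sample_ranking, log_prob_and_grad, reinforce_step) and the reward function are hypothetical, chosen only for illustration; this is not the implementation from [1] or [2].

```python
import math
import random


def sample_ranking(scores, k, rng):
    """Sample a ranking of k items from a Plackett-Luce distribution over scores."""
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(k):
        # Softmax over the items not yet placed (shifted by the max for stability).
        m = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - m) for i in remaining]
        chosen = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


def log_prob_and_grad(scores, ranking):
    """Return log P(ranking | scores) and its gradient w.r.t. each score."""
    remaining = list(range(len(scores)))
    log_prob = 0.0
    grad = [0.0] * len(scores)
    for chosen in ranking:
        m = max(scores[i] for i in remaining)
        exps = {i: math.exp(scores[i] - m) for i in remaining}
        z = sum(exps.values())
        log_prob += scores[chosen] - m - math.log(z)
        # d/ds_i of (s_chosen - log sum_j exp(s_j)) over the remaining items.
        grad[chosen] += 1.0
        for i in remaining:
            grad[i] -= exps[i] / z
        remaining.remove(chosen)
    return log_prob, grad


def reinforce_step(scores, k, reward_fn, rng, lr=0.1, batch=64):
    """One REINFORCE update of the scores, using the batch mean reward as baseline."""
    samples = [sample_ranking(scores, k, rng) for _ in range(batch)]
    rewards = [reward_fn(r) for r in samples]
    baseline = sum(rewards) / batch
    total = [0.0] * len(scores)
    for ranking, reward in zip(samples, rewards):
        _, grad = log_prob_and_grad(scores, ranking)
        for i, g in enumerate(grad):
            total[i] += (reward - baseline) * g
    # Gradient ascent on the expected reward.
    return [s + lr * t / batch for s, t in zip(scores, total)]
```

As a usage example, you could start from scores = [0.0] * 4, define a made-up reward such as reward_fn = lambda r: 1.0 if r[0] == 2 else 0.0 (standing in for a logged click on the top slot), and call reinforce_step repeatedly with rng = random.Random(0) and k = 2. The score of item 2 should then drift above the others. The reward is only ever evaluated on sampled rankings, which is why it does not need to be differentiable.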
soft rank
References
[1] https://czxttkl.com/2020/02/18/practical-considerations-of-off-policy-policy-gradient/