New blog post on why both the chosen and the rejected log-probs decrease during DPO, and why this is to some extent beneficial for alignment.