Announcement: paper on LLM evaluation | Ander Artola Velasco

Our group at MPI-SWS has released a new joint paper on the evaluation of large language models (LLMs). Using a causal framework, we argue that controlling for the randomization in the generative process can be beneficial when evaluating models. The paper can be found here.
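To give a rough intuition for why controlling the randomization might help, here is a minimal toy sketch. It is not taken from the paper. It assumes two hypothetical "models" that are just fixed next-token distributions (`PROBS_A`, `PROBS_B`), a made-up score per token (`SCORES`), and Gumbel-max sampling as one possible way to share randomness between them. It compares how much the estimated score gap fluctuates when both models are sampled with the same noise versus independent noise.

```python
import math
import random
import statistics

# Hypothetical toy setup (not from the paper): two "models" given as
# next-token distributions over 3 tokens, and a made-up score per token.
PROBS_A = [0.50, 0.30, 0.20]
PROBS_B = [0.45, 0.30, 0.25]
SCORES = [1.0, 0.0, 0.0]  # token 0 counts as the correct answer


def gumbel_noise(rng, n):
    """Draw n independent standard Gumbel variates."""
    # Clamp the uniform draw away from 0 so the inner log is always defined.
    return [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in range(n)]


def sample_token(probs, noise):
    """Gumbel-max trick: argmax of log-probability plus Gumbel noise."""
    return max(range(len(probs)), key=lambda i: math.log(probs[i]) + noise[i])


def estimate_gap(rng, n_samples, coupled):
    """Estimate (mean score of A) minus (mean score of B) from n_samples draws."""
    diffs = []
    for _ in range(n_samples):
        noise_a = gumbel_noise(rng, len(SCORES))
        # Coupled: both models reuse the same noise. Otherwise: fresh noise for B.
        noise_b = noise_a if coupled else gumbel_noise(rng, len(SCORES))
        score_a = SCORES[sample_token(PROBS_A, noise_a)]
        score_b = SCORES[sample_token(PROBS_B, noise_b)]
        diffs.append(score_a - score_b)
    return statistics.fmean(diffs)


def variance_of_gap(coupled, trials=200, n_samples=50, seed=0):
    """Variance of the gap estimate across repeated evaluation runs."""
    rng = random.Random(seed)
    estimates = [estimate_gap(rng, n_samples, coupled) for _ in range(trials)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    print("variance with independent noise:", variance_of_gap(coupled=False))
    print("variance with shared noise:     ", variance_of_gap(coupled=True))
```

In this toy setting the two models mostly pick the same token when they share noise, so the estimated gap fluctuates less from run to run for the same number of samples. Whether and how this carries over to real LLM evaluation is what the paper itself addresses.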