Announcement: paper on LLM evaluation

Our group at MPI-SWS has released a new joint paper on the evaluation of large language models (LLMs). Using a causal framework, we argue that controlling for the randomization in the generative process can be beneficial when evaluating models. The paper can be found here and has also been submitted to ICML 2025.