Does RLHF wreck a language model's calibration? Not quite. The calibrated signal relocates rather than vanishes: a confidence the model states in words tracks accuracy better than its own token probabilities, often halving expected calibration error (ECE). For instruction-tuned models, calibration becomes an elicitation problem more than a recalibration one. Just ask for the number.

https://benjaminhan.net/posts/20260610-just-ask-calibration/?utm_source=mastodon&utm_medium=social

#LLMs #Calibration #Metacognition #EMNLP #AI
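For readers who want to see what ECE measures, here is a minimal sketch assuming the common equal-width binning definition; the post does not say which variant it uses. The function name `expected_calibration_error` and the `n_bins` parameter are hypothetical, chosen for illustration only. The same function works for either signal, stated confidence or token probability, as long as each is a number in [0, 1] paired with whether the answer was right.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (an assumed variant, not necessarily the post's).

    confidences: floats in [0, 1], one per answer.
    correct: truthy/falsy values, one per answer.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions into n_bins equal-width confidence buckets.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # A confidence of exactly 1.0 goes into the last bucket.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if hit else 0.0))

    # Sum the |accuracy - mean confidence| gaps, weighted by bucket size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

To compare the two signals, call the function once with the confidences the model states in words and once with its token probabilities, using the same correctness labels. A lower value means confidence tracks accuracy more closely.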
