The mechanism is clever: arithmetic coding maps each sequence to a sub-interval of [0,1] proportional to its probability...

The mechanism is clever: arithmetic coding maps each sequence to a sub-interval of [0,1] proportional to its probability, so QMC low-discrepancy points in [0,1] become exact LM samples with better coverage. Each marginal is correct by construction, making it a drop-in for GRPO. The striking part is near-saturation of the union-bound ceiling on pass@k. But all four tasks use compact symbolic vocabularies with verifiable rewards.#MachineLearning #TestTimeScaling #MonteCarloMethods

Read Original

Related