Spaces:
Running
Running
Tighten renderers comparison, commit to thesis
#2
by kashif HF Staff - opened
Follow-up to #1, addressing review feedback that the comparison section was too long,
over-built, and too even-handed for a post whose thesis is "you don't need this complexity
if you keep the trajectory immutable."
Changes:
- Compress the section to ~1/4 (790 deletions, 9 insertions). Renamed to
"Do you need a renderer for this?" and reframed thesis-forward: a renderer is, in effect,
a programmable fork of the chat template, and for RL most of its machinery guards against
problems the TITO loop never has. - Add the controlled-disagreement argument (the sharpest case for TITO): a renderer that
claims faithful parity with a template would reproduce its thinking-strip behaviour, which
is exactly wrong for RL. So it must intentionally diverge from the template and flag it — a
policy decision it can't derive automatically. TITO never re-renders prior turns, so the
trajectory is immutable for free. - Keep it honest, not zero-cost: still notes the property-test path needs a 12-line check
plus the occasional one-line patch (stock Qwen3-0.6B fails 100/100; Qwen2.5 passes 100/100). - Drop two figures. The per-model surface chart (
tito_fig5) and the throughput plot
(tito_fig7) are removed. The throughput plot in particular validated the renderer's own
">3x" headline — the ~3x is the cost of re-rendering history, which both a renderer and a
correct TITO loop avoid, so it argued against MITO rather than for the section's point. - Keeps the runtime data-flow figure (
tito_fig6), which makes the "where does the template
logic live" point that carries the section.
No people or organisations are named in the prose.
thanks!
qgallouedec changed pull request status to merged