Tighten renderers comparison, commit to thesis

#2
by kashif HF Staff - opened

Follow-up to #1, addressing review feedback that the comparison section was too long,
over-built, and too even-handed for a post whose thesis is "you don't need this complexity
if you keep the trajectory immutable."

Changes:

  • Compress the section to ~1/4 (790 deletions, 9 insertions). Renamed to
    "Do you need a renderer for this?" and reframed thesis-forward: a renderer is, in effect,
    a programmable fork of the chat template, and for RL most of its machinery guards against
    problems the TITO loop never has.
  • Add the controlled-disagreement argument (the sharpest case for TITO): a renderer that
    claims faithful parity with a template would reproduce its thinking-strip behaviour, which
    is exactly wrong for RL. So it must intentionally diverge from the template and flag it — a
    policy decision it can't derive automatically. TITO never re-renders prior turns, so the
    trajectory is immutable for free.
  • Keep it honest, not zero-cost: still notes the property-test path needs a 12-line check
    plus the occasional one-line patch (stock Qwen3-0.6B fails 100/100; Qwen2.5 passes 100/100).
  • Drop two figures. The per-model surface chart (tito_fig5) and the throughput plot
    (tito_fig7) are removed. The throughput plot in particular validated the renderer's own
    ">3x" headline — the ~3x is the cost of re-rendering history, which both a renderer and a
    correct TITO loop avoid, so it argued against MITO rather than for the section's point.
  • Keeps the runtime data-flow figure (tito_fig6), which makes the "where does the template
    logic live" point that carries the section.

No people or organisations are named in the prose.

qgallouedec changed pull request status to merged

Sign up or log in to comment