Bilingual (Chinese + English) ML / LLM / diffusion / agent interview cheat sheets for AI autumn campus recruiting (秋招): generated by ARIS /interview-cheatsheet, rendered by /render-html into single-file HTML that reads anywhere, plus a CV ...
Single-turn preferences do not directly transfer to multi-turn task success. The RM learns "which style humans prefer", not "which call order solves the problem". Reward is over the entire response, ...
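A toy sketch of the gap between style preference and task success may help here. Everything in it is hypothetical and for illustration only: `toy_style_reward` is a hand-written stand-in for an RM that scores surface style of the final response, and `task_succeeds` is a made-up task in which `search` must be called before `book`. It is not how any real reward model is implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    tool_calls: List[str]  # ordered tool names invoked across turns
    final_response: str    # text the user sees at the end


def toy_style_reward(response: str) -> float:
    """Hypothetical RM stand-in: looks only at the style of the final text."""
    score = 0.0
    if "step" in response.lower():
        score += 1.0  # rewards structured-sounding answers
    if response.strip().endswith("."):
        score += 0.5  # rewards tidy punctuation
    return score


def task_succeeds(tool_calls: List[str]) -> bool:
    """Hypothetical task check: 'search' must come before 'book'."""
    if "search" not in tool_calls or "book" not in tool_calls:
        return False
    return tool_calls.index("search") < tool_calls.index("book")


# Same final text, different call order.
good = Trajectory(["search", "book"], "Done. I followed each step carefully.")
bad = Trajectory(["book", "search"], "Done. I followed each step carefully.")

reward_good = toy_style_reward(good.final_response)
reward_bad = toy_style_reward(bad.final_response)

print(reward_good, reward_bad)           # identical style scores
print(task_succeeds(good.tool_calls))    # correct call order
print(task_succeeds(bad.tool_calls))     # wrong call order
```

Because the score depends only on the final response, the two trajectories are indistinguishable to the toy RM even though only one of them solves the task.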