💪 FP8 compatibility! 🚀 Speeds up all processes 🚀 Lower VRAM consumption (still high: batch_size=1 is the maximum on an RTX 4090, I'm trying to fix that) 🛠️ Better benchmarks coming soon ...
This is a deployable baseline, not the final speed ceiling. The strict benchmark/quality lane remains p512/n1536 at context 2048 for comparability; the served OpenAI-compatible endpoint now defaults ...