This study, carried out by Jian Yang, Xin Guo, Jing Lin, and colleagues at Beihang University in collaboration with 优矿 and the School of Artificial Intelligence at Renmin University of China, was released as an arXiv preprint in December 2025 (arXiv:2512.13472v1). The authors present it as the first systematic exploration of the scaling behavior of multilingual code training.
In the pretraining of code large language models (Code LLMs), the industry has long followed an inertial assumption: code in every programming language is treated as homogeneous text, and attention goes mainly to stacking up total data volume. Modern software development, however, is inherently polyglot, and languages differ sharply in syntax, corpus scale, and application domains. Ignoring these differences and applying generic scaling laws across the board often yields biased performance predictions and wasted compute.
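For concreteness, the "generic scaling law" in question is usually the Chinchilla-style loss curve (standard background, not a formula taken from this paper), which predicts pretraining loss from parameter count $N$ and token count $D$ alone, with no term that distinguishes one programming language from another:

$$ L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} $$

Here $E$, $A$, $B$, $\alpha$, and $\beta$ are constants fitted to empirical training runs. If the fit pools all languages while their corpus sizes and difficulty differ by orders of magnitude, the same constants need not hold for any single language, which is exactly the prediction bias described above.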