openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...
CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...
Abstract: Infrastructure development in urban areas closely reflects people’s lifestyles and therefore calls for serious consideration of public opinion. This study develops an NLP ...
1 Department of Software Science, Tallinn University of Technology (TalTech), Tallinn, Estonia 2 LEARN! Research Institute, Vrije Universiteit Amsterdam, Amsterdam, Netherlands Project-based learning ...
Abstract: Recently, large language models (LLMs) pretrained on code have demonstrated strong capabilities in generating programs from informal natural language intent. However, LLM-generated ...