
HLCE: Humanity's Last Code Exam
Latest News
- [17/06/2025]: Our paper is released on arXiv. Check out the Paper!
- [13/06/2025]: We are excited to release HLCE. Check out GitHub and HuggingFace!
Introduction
HLCE is a challenging benchmark of 235 competitive programming problems sourced from the IOI and ICPC World Finals. It features both standard and interactive problems, along with a novel self‑assessment task designed to evaluate deeper reasoning capabilities. By incorporating data from human competitions, HLCE provides metrics that directly compare large language models with elite human programmers—revealing a clear gap in performance. The benchmark is designed to push the boundaries of code generation, encouraging progress toward models that can truly compete at the highest levels of programming expertise.

Leaderboard
We evaluate solutions using pass@1 and pass@5 metrics for both the ICPC and IOI competitions, reporting average performance across both datasets. For IOI, we additionally provide the average points earned. Detailed scoring scripts are available in our code repository.
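For reference, pass@k is commonly computed with the unbiased estimator introduced with HumanEval (Chen et al., 2021). The sketch below is illustrative only, assuming n generated samples per problem of which c pass all tests; the scoring scripts in our repository remain the authoritative implementation.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k), where
    # n = samples generated per problem, c = samples passing all tests.
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 5 samples for one problem, 2 of them correct.
print(pass_at_k(n=5, c=2, k=1))  # 0.4
print(pass_at_k(n=5, c=2, k=5))  # 1.0

The benchmark-level score is then the mean of pass@k over all problems in each dataset.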
Submitting Custom Models
To submit a model:
- Copy your model's output folder to the Results directory.
- Rename the folder to your model's name.
- Create a pull request for these changes.
We'll review your submission and integrate the model into the leaderboard upon approval.
Citation
If you use HLCE in your research, please cite our paper:
@misc{li2025humanityscodeexamadvanced,
      title={Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?},
      author={Xiangyang Li and Xiaopeng Li and Kuicai Dong and Quanhu Zhang and Rongju Ruan and Xinyi Dai and Xiaoshuang Liu and Shengchun Xu and Yasheng Wang and Ruiming Tang},
      year={2025},
      eprint={2506.12713},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2506.12713}
}