Polyglot: An Extensible Framework to Benchmark Code Translation with LLMs
This dashboard presents an interactive exploration of Polyglot, a multi-language framework for evaluating LLM performance in code translation. Using IBM's Project CodeNet dataset, we assess translation quality through syntactic correctness, execution reliability, and semantic preservation. Use the filters below to explore how different models, prompting strategies, and problem complexities affect translation success across multiple target languages.
This is a living dataset: new results and analyses will be added over time. Check back regularly for updates!
Performance Analysis
Analysis of translation quality across compilation failures, runtime errors, test failures, and overall test pass rates.
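As a rough illustration of how the four outcome categories above could be turned into rates, here is a minimal sketch. The outcome labels (`compile_error`, `runtime_error`, `test_failure`, `pass`) and the function name `outcome_rates` are assumptions made for this example; they are not taken from the Polyglot codebase, and the dashboard's actual aggregation may differ.

```python
from collections import Counter

# Hypothetical outcome labels; the dashboard's real schema may use other names.
OUTCOMES = ("compile_error", "runtime_error", "test_failure", "pass")


def outcome_rates(outcomes):
    """Return the fraction of translations that fall into each outcome category."""
    outcomes = list(outcomes)
    counts = Counter(outcomes)

    # Reject labels outside the assumed category set instead of silently dropping them.
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")

    total = len(outcomes)
    if total == 0:
        return {name: 0.0 for name in OUTCOMES}
    return {name: counts[name] / total for name in OUTCOMES}
```

Under these assumptions, each translation attempt is assigned exactly one category, so the four rates sum to 1 for any non-empty input.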
Code Variation Analysis
Analyze how translated code differs from source code in terms of cyclomatic complexity (ΔCC) and source lines of code (ΔSLoC).
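To make the two deltas concrete, the sketch below computes ΔCC and ΔSLoC for one source/translation pair. It assumes the sign convention "translated minus source" and a simplified SLoC count that skips blank lines and single-line comments only; both are assumptions for illustration, as are the names `CodeMetrics`, `count_sloc`, and `variation`. Cyclomatic complexity is taken as an input here, since it would normally come from a language-specific analysis tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeMetrics:
    cc: int    # cyclomatic complexity, computed elsewhere by an analysis tool
    sloc: int  # source lines of code


def count_sloc(code, comment_prefix="#"):
    """Count lines that are neither blank nor single-line comments (simplified)."""
    return sum(
        1
        for line in code.splitlines()
        if line.strip() and not line.strip().startswith(comment_prefix)
    )


def variation(source, translated):
    """Return (delta_cc, delta_sloc), assuming delta = translated - source."""
    return translated.cc - source.cc, translated.sloc - source.sloc
```

With this convention, a positive ΔSLoC means the translation is longer than the source, and a positive ΔCC means it has more independent control-flow paths.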
Metrics and charts in this section use data pre-aggregated across all versions. The version filter applies only to the Performance Analysis section above; this limitation will be addressed in a future update.