Polyglot: An Extensible Framework to Benchmark Code Translation with LLMs

Marco Vieira • Priyam Shah • Bhavain Shah • Rrezarta Krasniqi

This dashboard presents an interactive exploration of Polyglot, a multi-language framework for evaluating LLM performance in code translation. Leveraging the IBM CodeNet Project dataset, we assess translation quality through syntactic correctness, execution reliability, and semantic preservation. Use the filters below to explore how different models, prompting strategies, and problem complexities impact translation success across multiple target languages.

This is a living dataset: new results and analyses will be added soon. Check back regularly for updates!

Performance Analysis

Analysis of translation quality across compilation failures, runtime errors, test failures, and overall test pass rates.
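
As a rough illustration of how these four outcome categories could be turned into rates, the sketch below tallies one label per translation. The label names and the assumption that each translation falls into exactly one category are hypothetical; the dashboard's actual pipeline may group results differently.

```python
from collections import Counter

# Hypothetical outcome labels; the dashboard's real category names may differ.
OUTCOMES = ("compile_failure", "runtime_error", "test_failure", "pass")


def outcome_rates(outcomes):
    """Return the share of translations that fall into each outcome category.

    `outcomes` is a list with one label per translated program. We assume
    (for illustration only) that the categories are mutually exclusive.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = len(outcomes)
    if total == 0:
        # No results yet: report zero for every category instead of dividing by zero.
        return {name: 0.0 for name in OUTCOMES}
    return {name: counts[name] / total for name in OUTCOMES}
```

Calling `outcome_rates(["pass", "pass", "compile_failure", "runtime_error"])` gives a pass rate of 0.5 and a rate of 0.25 each for compilation failures and runtime errors.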

Code Variation Analysis

Analyze how translated code differs from its source code, measured as the change in cyclomatic complexity (ΔCC) and the change in source lines of code (ΔSLoC).
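
A minimal sketch of how such deltas might be computed is shown below. It assumes the sign convention "translated minus source" and a simple SLoC definition (non-blank lines that are not whole-line comments); neither is confirmed by the dashboard, and the helper names `sloc` and `delta` are hypothetical. Cyclomatic complexity is treated as a value supplied by an external analysis tool rather than computed here.

```python
def sloc(code, comment_prefix="#"):
    """Count non-blank lines that are not whole-line comments.

    This is a simplified definition chosen for illustration; block comments
    and trailing comments are not handled.
    """
    count = 0
    for line in code.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def delta(source_value, translated_value):
    """Signed change from source to translation (assumed convention)."""
    return translated_value - source_value


source = "def add(a, b):\n    # sum\n    return a + b\n"
translated = "int add(int a, int b) {\n    // sum\n    return a + b;\n}\n"

# Under these assumptions the translation is one line longer than the source.
delta_sloc = delta(sloc(source, "#"), sloc(translated, "//"))
```

The same `delta` helper applies to cyclomatic complexity once both values are available, for example `delta(cc_source, cc_translated)` with numbers produced by a complexity analyzer for each language.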

Metrics and charts in this section use data pre-aggregated across all versions. The version filter applies only to the Performance Analysis section above (this will be improved in the future).