Best Results per LLM

This page filters all results by LLM, Workload, and Quant (quantization), then shows the fastest tested setup for each GPU, CPU, or combined configuration.
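The selection described above can be sketched as a filter followed by a per-configuration minimum. This is only an illustration, not the page's actual implementation: the record fields (`llm`, `workload`, `quant`, `config`, `total_seconds`) and the function name `best_per_config` are hypothetical, and the sample rows are made-up placeholders rather than real measurements.

```python
def best_per_config(results, llm, workload, quant):
    """Return the fastest matching result for each configuration.

    `results` is an iterable of dicts with the hypothetical keys
    llm, workload, quant, config, and total_seconds.
    """
    best = {}
    for r in results:
        # Step 1: keep only rows that match the selected filters.
        if (r["llm"], r["workload"], r["quant"]) != (llm, workload, quant):
            continue
        # Step 2: remember the row with the lowest total time per configuration.
        current = best.get(r["config"])
        if current is None or r["total_seconds"] < current["total_seconds"]:
            best[r["config"]] = r
    return best


# Placeholder rows for illustration only (not real benchmark data).
sample = [
    {"llm": "model-a", "workload": "short", "quant": "q4", "config": "gpu-x", "total_seconds": 30.0},
    {"llm": "model-a", "workload": "short", "quant": "q4", "config": "gpu-x", "total_seconds": 24.0},
    {"llm": "model-a", "workload": "short", "quant": "q4", "config": "cpu-y", "total_seconds": 90.0},
    {"llm": "model-a", "workload": "short", "quant": "q8", "config": "gpu-x", "total_seconds": 10.0},
]

fastest = best_per_config(sample, "model-a", "short", "q4")
```

Keeping one entry per configuration means a system tested several times appears once, with its best run.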

For more context, see Systems and All Results.

Each bar shows the total time taken to process a prompt of the selected workload length and then generate 500 tokens. Shorter bars are faster. Hover over or tap a bar to see a tooltip, or click the bar label to open the details page.
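As a rough illustration of how a bar length could be composed, the sketch below adds prompt-processing time to generation time. It assumes the total is derived from two throughput figures in tokens per second; the page may instead measure the total directly. The function name, parameter names, and example numbers are hypothetical; only the 500-token generation length comes from the text.

```python
GENERATED_TOKENS = 500  # generation length stated in the description


def total_seconds(prompt_tokens, prompt_tok_per_s, gen_tok_per_s,
                  generated_tokens=GENERATED_TOKENS):
    """Estimate total time as prompt processing time plus generation time."""
    prompt_time = prompt_tokens / prompt_tok_per_s        # seconds to read the prompt
    generation_time = generated_tokens / gen_tok_per_s    # seconds to emit the output
    return prompt_time + generation_time


# Made-up example: a 1000-token prompt processed at 500 tokens/s,
# then 500 tokens generated at 25 tokens/s.
print(total_seconds(1000, 500.0, 25.0))  # 22.0
```

In this made-up case generation accounts for 20 of the 22 seconds, which shows why a setup with fast prompt processing can still have a long bar.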