
All Benchmark Results

These benchmarks measure single-user LLM inference speed across different combinations of hardware, models, and inference applications.

Each workload cell shows three numbers:

  12.5   Time (s)
  150    Prompt processing (t/s)
  45     Token generation (t/s)
  • Time: Total time in seconds to run the workload, from sending the API request to receiving the last generated token. Lower is better.
  • Prompt processing: Speed in tokens per second to process the prompt part of the workload. Higher is better.
  • Token generation: Speed in tokens per second to generate the output, which is always 500 tokens in length. Higher is better.
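
To make these three numbers concrete, here is a minimal sketch of how they can be derived from a single streaming request against an OpenAI-compatible endpoint. The URL, model name, and token counts are placeholders; this illustrates the idea only, not the actual benchmark harness:

    import time
    import requests  # third-party HTTP library

    PROMPT_TOKENS = 1000  # known prompt length of the workload (placeholder)
    GEN_TOKENS = 500      # the generation length is always 500 tokens

    start = time.monotonic()
    first_token_at = None
    with requests.post(
        "http://localhost:8080/v1/chat/completions",  # placeholder endpoint
        json={
            "model": "placeholder-model",
            "stream": True,
            "max_tokens": GEN_TOKENS,
            "messages": [{"role": "user", "content": "..."}],
        },
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if line and first_token_at is None:
                first_token_at = time.monotonic()  # first streamed chunk arrives
    end = time.monotonic()

    total_time = end - start                             # Time (s)
    pp_speed = PROMPT_TOKENS / (first_token_at - start)  # prompt processing (t/s)
    tg_speed = GEN_TOKENS / (end - first_token_at)       # token generation (t/s)
    print(f"{total_time:.1f} s | {pp_speed:.0f} t/s PP | {tg_speed:.0f} t/s TG")

Note that prompt-processing speed measured this way includes the request overhead up to the first token, which matters for the explanation further below.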

You may also want to read the Method page.

You can click on a row to open the detail page that shows the launch command and all measurements that were used to calculate these numbers.

Similar total times, different speeds: On very fast setups running short workloads, you might see the same total time (e.g., 4.3 seconds) for clearly different PP/TG speeds. This might look like an error, but the measurements typically show very fast PP times for both runs, with averages differing by only hundredths of a second. Because the total times are shown with limited precision, such small differences are lost.

PP speed increasing with prompt length: The measured prompt processing time includes a certain static overhead: sending the request to the endpoint, internal processing, and generating and returning the first token. The longer the actual prompt processing takes, the smaller that overhead is proportionally, so the apparent PP speed rises with prompt length.
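
As a rough illustration with made-up numbers (a fixed overhead of 0.2 s and a true prompt-processing speed of 500 t/s, both hypothetical), the apparent PP speed climbs toward the true speed as the prompt grows:

    # apparent_speed = prompt_tokens / (fixed_overhead + prompt_tokens / true_speed)
    FIXED_OVERHEAD_S = 0.2  # hypothetical: request transfer, internal processing, first token
    TRUE_PP_SPEED = 500.0   # hypothetical: tokens/s actually spent on the prompt

    for prompt_tokens in (1000, 4000, 16000, 64000):
        measured = FIXED_OVERHEAD_S + prompt_tokens / TRUE_PP_SPEED
        print(f"{prompt_tokens:>6} tokens: apparent PP speed {prompt_tokens / measured:.0f} t/s")

    # prints 455, 488, 497 and 499 t/s: the same fixed overhead
    # weighs less and less on longer prompts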

If you still think you found inconsistent numbers, please tell me.

  • 3 GPUs tested
  • 2 CPUs tested
  • 6 models tested
  • 16 quant files tested
  • 85 setups benchmarked
  • 1,260 workloads measured
  • 25.6 h total runtime

I put a lot of effort into the benchmark automation to make sure the launch configs are reasonably optimized and the results are reported exactly as they were measured. But don't decide for or against hardware/models/apps based solely on these benchmarks; research other sources too.

Filter Results

Click into a field to see and select the available values. Use * as a wildcard and | for OR.

Each result row below lists GPU | CPU (where present) | LLM | Quant | App | Option, followed by one cell per workload: 1K, 4K, 8K, 16K, 32K, 64K, the prompt length in tokens (the generation length is always 500 tokens). Each cell shows time (s) / prompt processing (t/s) / token generation (t/s).
AMD Radeon Mi50 | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 20.8 / 577 / 26.3 · 4K: 31.8 / 329 / 25.5 · 8K: 53.8 / 246 / 23.6 · 16K: 83.4 / 259 / 23.3 · 32K: 164 / 229 / 20.9 · 64K: 385 / 181 / 17.4
AMD Radeon Mi50 (2x) | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 22.0 / 572 / 24.6 · 4K: 29.0 / 499 / 23.9 · 8K: 43.1 / 393 / 22.1 · 16K: 63.8 / 392 / 21.9 · 32K: 108 / 387 / 19.9 · 64K: 230 / 321 / 16.5
AMD Radeon Mi50 | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU only (ROCm)
  1K: 23.8 / 358 / 23.7 · 4K: 42.1 / 196 / 23.2 · 8K: 74.9 / 155 / 21.6 · 16K: 124 / 160 / 21.3 · 32K: 243 / 148 / 19.3 · 64K: OOM
AMD Radeon Mi50 (2x) | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU only (ROCm)
  1K: 25.2 / 343 / 22.4 · 4K: 36.0 / 307 / 21.8 · 8K: 56.2 / 255 / 20.2 · 16K: 86.9 / 258 / 20.1 · 32K: 152 / 257 / 18.3 · 64K: 316 / 227 / 15.4
Nvidia RTX 4080 | Intel Core i7-13700K | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 33.3 / 2312 / 15.2 · 4K: 39.1 / 2276 / 13.4 · 8K: 51.6 / 1935 / 10.5 · 16K: 76.2 / 1992 / 7.3 · 32K: OOM
AMD Radeon Mi50 | AMD EPYC 7F52 | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 35.0 / 501 / 15.2 · 4K: 52.4 / 317 / 12.6 · 8K: 83.3 / 238 / 10.1 · 16K: 132 / 246 / 7.4 · 32K: 248 / 221 / 4.9 · 64K: 554 / 177 / 2.6
AMD Radeon 8060S (AI Max+ 395) | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 37.2 / 845 / 13.9 · 4K: 46.1 / 445 / 13.5 · 8K: 68.3 / 293 / 12.3 · 16K: 99.8 / 276 / 12.0 · 32K: 169 / 264 / 10.6 · 64K: 392 / 192 / 8.7
AMD Radeon Mi50 | AMD EPYC 7F52 | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 47.6 / 311 / 11.3 · 4K: 72.6 / 187 / 9.8 · 8K: 115 / 149 / 8.1 · 16K: 186 / 150 / 6.3 · 32K: 340 / 142 / 4.4 · 64K: 729 / 122 / 2.5
AMD Radeon Mi50 | AMD EPYC 7F52 | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (75% Offload)
  1K: 57.9 / 413 / 9.0 · 4K: 86.3 / 309 / 6.8 · 8K: 133 / 233 / 5.1 · 16K: 215 / 238 / 3.4 · 32K: 393 / 215 / 2.0 · 64K: 858 / 171 / 1.0
AMD Radeon 8060S (AI Max+ 395) | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU only (ROCm)
  1K: 58.6 / 920 / 8.7 · 4K: 67.5 / 454 / 8.5 · 8K: 89.1 / 298 / 8.1 · 16K: 118 / 291 / 7.9 · 32K: 188 / 270 / 7.3 · 64K: 409 / 195 / 6.3
Nvidia RTX 4080 | Intel Core i7-13700K | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 69.1 / 1387 / 7.3 · 4K: 82.1 / 1577 / 6.3 · 8K: 112 / 1381 / 4.7 · 16K: 174 / 1401 / 3.1 · 32K: 280 / 1291 / 2.0 · 64K: 501 / 1039 / 1.1
AMD Radeon Mi50 | AMD EPYC 7F52 | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU (ROCm) & CPU (75% Offload)
  1K: 88.7 / 236 / 5.9 · 4K: 126 / 172 / 4.8 · 8K: 185 / 139 / 3.9 · 16K: 289 / 140 / 2.9 · 32K: 515 / 132 / 1.8 · 64K: 1073 / 115 / 1.0
AMD EPYC 7F52 | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 91.1 / 51 / 7.0 · 4K: 243 / 25.1 / 6.1 · 8K: 562 / 17.9 / 4.4 · 16K: ~1301
Nvidia RTX 4080 | Intel Core i7-13700K | devstral-small-2-24b-instruct-2512 | Q8_0 | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 109 / 945 / 4.7 · 4K: 122 / 1592 / 4.2 · 8K: 153 / 1413 / 3.4 · 16K: 215 / 1446 / 2.5 · 32K: 324 / 1348 / 1.7 · 64K: 537 / 1126 / 1.0
Intel Core i7-13700K | devstral-small-2-24b-instruct-2512 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 115 / 52 / 5.2 · 4K: 274 / 24.5 / 4.5 · 8K: 552 / 19.8 / 3.5 · 16K: 1182 / 16.8 / 2.3 · 32K: ~2688
AMD Radeon Mi50 (3x) | gpt-oss-120b | MXFP4 | llama.cpp | GPU only (ROCm)
  1K: 11.3 / 352 / 59 · 4K: 17.6 / 441 / 59 · 8K: 27.1 / 438 / 56 · 16K: 43.9 / 458 / 56 · 32K: 86.8 / 416 / 53 · 64K: 195 / 353 / 47.9
AMD Radeon 8060S (AI Max+ 395) | gpt-oss-120b | MXFP4 | llama.cpp | GPU only (ROCm)
  1K: 12.3 / 549 / 47.8 · 4K: 17.9 / 591 / 45.0 · 8K: 25.6 / 575 / 43.0 · 16K: 38.5 / 593 / 43.4 · 32K: 80.3 / 474 / 39.0 · 64K: 208 / 333 / 32.0
AMD Radeon Mi50 | AMD EPYC 7F52 | gpt-oss-120b | MXFP4 | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 24.6 / 129 / 29.8 · 4K: 30.3 / 329 / 27.6 · 8K: 38.9 / 374 / 28.7 · 16K: 57.3 / 395 / 29.7 · 32K: 104 / 369 / 28.2 · 64K: 223 / 314 / 25.9
Nvidia RTX 4080 | gpt-oss-20b | MXFP4 | llama.cpp | GPU only (CUDA)
  1K: 2.9 / 5152 / 186 · 4K: 3.4 / 6609 / 177 · 8K: 4.1 / 7047 / 170 · 16K: 5.0 / 7488 / 172 · 32K: 7.9 / 6854 / 155 · 64K: 15.2 / 5642 / 130
Nvidia RTX 4080 | gpt-oss-20b | MXFP4 | vLLM | GPU only (CUDA)
  1K: 3.5 / 7942 / 147 · 4K: 3.9 / 9777 / 143 · 8K: 4.4 / 9435 / 139 · 16K: OOM
Nvidia RTX 4080 | Intel Core i7-13700K | gpt-oss-20b | MXFP4 | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 5.1 / 2510 / 107 · 4K: 5.7 / 4501 / 104 · 8K: 6.5 / 5131 / 100 · 16K: 7.8 / 5557 / 101 · 32K: 11.3 / 5277 / 96 · 64K: 20.4 / 4414 / 85
AMD Radeon Mi50 | gpt-oss-20b | MXFP4 | llama.cpp | GPU only (ROCm)
  1K: 5.8 / 918 / 107 · 4K: 8.6 / 1054 / 103 · 8K: 12.3 / 1086 / 101 · 16K: 19.6 / 1094 / 102 · 32K: 39.5 / 936 / 94 · 64K: 95.6 / 713 / 85
AMD Radeon Mi50 (2x) | gpt-oss-20b | MXFP4 | llama.cpp | GPU only (ROCm)
  1K: 6.8 / 976 / 86 · 4K: 9.0 / 1240 / 86 · 8K: 12.8 / 1204 / 82 · 16K: 18.6 / 1265 / 84 · 32K: 35.3 / 1121 / 78 · 64K: 83.6 / 858 / 70
AMD Radeon Mi50 | AMD EPYC 7F52 | gpt-oss-20b | MXFP4 | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 8.0 / 679 / 77 · 4K: 10.9 / 956 / 75 · 8K: 14.7 / 1012 / 74 · 16K: 22.4 / 1025 / 74 · 32K: 43.2 / 886 / 71 · 64K: 101 / 683 / 65
AMD Radeon 8060S (AI Max+ 395) | gpt-oss-20b | MXFP4 | llama.cpp | GPU only (ROCm)
  1K: 8.3 / 1228 / 67 · 4K: 11.0 / 1322 / 63 · 8K: 14.3 / 1334 / 61 · 16K: 20.6 / 1296 / 61 · 32K: 42.7 / 952 / 55 · 64K: 118 / 596 / 45.8
Nvidia RTX 4080 | Intel Core i7-13700K | gpt-oss-20b | MXFP4 | llama.cpp | GPU (CUDA) & CPU (100% Offload)
  1K: 11.3 / 1062 / 48.3 · 4K: 11.9 / 2744 / 47.7 · 8K: 13.0 / 3283 / 47.4 · 16K: 14.8 / 3723 / 47.7 · 32K: 19.4 / 3671 / 46.6 · 64K: 30.8 / 3312 / 43.6
AMD Radeon Mi50 | AMD EPYC 7F52 | gpt-oss-20b | MXFP4 | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 14.3 / 420 / 41.9 · 4K: 17.6 / 784 / 40.0 · 8K: 21.7 / 854 / 40.6 · 16K: 29.9 / 887 / 42.2 · 32K: 53.5 / 788 / 39.0 · 64K: 115 / 626 / 37.9
Intel Core i7-13700K | gpt-oss-20b | MXFP4 | llama.cpp | CPU only (OneAPI MKL)
  1K: 25.2 / 111 / 30.9 · 4K: 59.5 / 98 / 26.4 · 8K: 118 / 84 / 21.4 · 16K: 275 / 68 / 12.7 · 32K: 733 / 48.3 / 7.0 · 64K: 2224 / 30.5 / 4.0
Intel Core i7-13700K | gpt-oss-20b | MXFP4 | llama.cpp | CPU only (Generic)
  1K: 28.4 / 99 / 27.3 · 4K: 67.7 / 87 / 23.0 · 8K: 132 / 76 / 18.7 · 16K: 304 / 62 / 10.4 · 32K: 799 / 45.7 / 5.1 · 64K: 2387 / 29.1 / 2.7
AMD EPYC 7F52 | gpt-oss-20b | MXFP4 | llama.cpp | CPU only (Generic)
  1K: 31.2 / 97 / 24.0 · 4K: 79.8 / 84 / 15.5 · 8K: 159 / 72 / 10.5 · 16K: 372 / 55 / 6.3 · 32K: 1019 / 36.5 / 3.5 · 64K: ~3263
AMD Radeon Mi50 | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 15.0 / 477 / 38.9 · 4K: 20.9 / 509 / 38.4 · 8K: 28.5 / 522 / 37.9 · 16K: 43.3 / 533 / 37.9 · 32K: 74.9 / 522 / 37.2 · 64K: 143 / 496 / 35.8
AMD Radeon Mi50 (2x) | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 16.7 / 466 / 34.5 · 4K: 22.3 / 611 / 32.2 · 8K: 26.8 / 661 / 34.2 · 16K: 38.2 / 687 / 33.7 · 32K: 62.5 / 675 / 33.4 · 64K: 118 / 624 / 32.1
AMD Radeon Mi50 | AMD EPYC 7F52 | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 17.4 / 396 / 33.5 · 4K: 24.1 / 448 / 33.1 · 8K: 32.5 / 463 / 32.8 · 16K: 49.1 / 473 / 32.7 · 32K: 84.5 / 464 / 32.3 · 64K: 167 / 426 / 31.2
AMD Radeon Mi50 (2x) | granite-4.0-h-small | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 21.1 / 258 / 29.0 · 4K: OOM
Nvidia RTX 4080 | Intel Core i7-13700K | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (100% Offload)
  1K: 21.4 / 591 / 25.3 · 4K: 24.8 / 848 / 25.0 · 8K: 28.4 / 921 / 25.4 · 16K: 36.4 / 968 / 25.2 · 32K: 52.4 / 981 / 25.4 · 64K: 86.5 / 970 / 24.6
AMD Radeon 8060S (AI Max+ 395) | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 21.5 / 529 / 25.5 · 4K: 29.5 / 524 / 22.9 · 8K: 45.0 / 485 / 17.6 · 16K: 53.0 / 543 / 21.3 · 32K: 102 / 498 / 13.3 · 64K: 212 / 414 / 8.8
AMD Radeon Mi50 (3x) | granite-4.0-h-small | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 22.2 / 238 / 27.9 · 4K: 34.4 / 268 / 25.7 · 8K: 50.6 / 267 / 24.3 · 16K: 76.3 / 286 / 24.6 · 32K: OOM
AMD Radeon Mi50 | AMD EPYC 7F52 | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 28.8 / 230 / 20.5 · 4K: 39.9 / 279 / 19.6 · 8K: 55.9 / 276 / 18.6 · 16K: 80.0 / 303 / 18.4 · 32K: 141 / 288 / 16.6 · 64K: OOM
AMD Radeon 8060S (AI Max+ 395) | granite-4.0-h-small | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 32.1 / 490 / 16.6 · 4K: 40.8 / 472 / 15.5 · 8K: 56.3 / 467 / 12.8 · 16K: 65.2 / 512 / 14.8 · 32K: 116 / 472 / 10.4 · 64K: 231 / 393 / 7.4
AMD Radeon Mi50 | AMD EPYC 7F52 | granite-4.0-h-small | UD-Q8-K-XL | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 41.0 / 128 / 15.1 · 4K: 59.2 / 162 / 14.5 · 8K: 84.4 / 165 / 13.9 · 16K: 127 / 176 / 13.9 · 32K: OOM
AMD Radeon Mi50 (3x) | granite-4.0-h-small | BF16 | llama.cpp | GPU only (ROCm)
  1K: 41.1 / 68 / 18.9 · 4K: OOM
Intel Core i7-13700K | granite-4.0-h-small | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 68.4 / 43.3 / 11.0 · 4K: 141 / 42.4 / 10.7 · 8K: 243 / 41.0 / 10.5 · 16K: 441 / 40.9 / 10.4 · 32K: 833 / 41.3 / 9.5 · 64K: ~1640
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 5.0 / 2966 / 107 · 4K: OOM
AMD Radeon Mi50 | qwen3-30b-a3b-instruct-2507 | Q4-0 | llama.cpp | GPU only (ROCm)
  1K: 6.7 / 1305 / 84 · 4K: 11.1 / 918 / 75 · 8K: 17.4 / 795 / 71 · 16K: 26.6 / 832 / 71 · 32K: 60.7 / 612 / 60 · 64K: 184 / 371 / 47.4
AMD Radeon Mi50 | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 7.7 / 1212 / 73 · 4K: 12.3 / 868 / 66 · 8K: 18.8 / 756 / 63 · 16K: 28.5 / 789 / 63 · 32K: 63.5 / 589 / 55 · 64K: 189 / 362 / 43.5
AMD Radeon 8060S (AI Max+ 395) | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 8.7 / 1239 / 63 · 4K: 15.4 / 769 / 49.4 · 8K: 24.0 / 649 / 44.2 · 16K: 35.5 / 675 / 44.6 · 32K: 84.2 / 462 / 33.7 · 64K: 266 / 263 / 22.9
AMD Radeon Mi50 (2x) | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 9.1 / 1130 / 61 · 4K: 13.0 / 1031 / 55 · 8K: 18.9 / 864 / 52 · 16K: 27.1 / 915 / 52 · 32K: 60.4 / 673 / 46.8 · 64K: 165 / 456 / 38.5
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 9.5 / 900 / 60 · 4K: 14.2 / 828 / 54 · 8K: 20.6 / 742 / 52 · 16K: 31.0 / 756 / 52 · 32K: 66.9 / 571 / 46.4 · 64K: 193 / 356 / 38.1
AMD Radeon Mi50 (2x) | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 9.8 / 705 / 59 · 4K: 16.4 / 577 / 53 · 8K: 25.3 / 525 / 50 · 16K: 39.6 / 543 / 51 · 32K: 83.4 / 443 / 45.5 · 64K: 226 / 302 / 37.5
AMD Radeon Mi50 (3x) | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 10.6 / 698 / 55 · 4K: 16.8 / 575 / 51 · 8K: 25.7 / 522 / 49.1 · 16K: 40.3 / 540 / 48.4 · 32K: 97.0 / 390 / 42.5 · 64K: 228 / 300 / 34.9
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (100% Offload)
  1K: 10.8 / 1257 / 50 · 4K: 13.8 / 1256 / 47.1 · 8K: 17.4 / 1239 / 45.6 · 16K: 23.6 / 1275 / 45.5 · 32K: 40.0 / 1152 / 41.5 · 64K: 75.3 / 1059 / 34.0
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 12.3 / 497 / 48.8 · 4K: 18.2 / 550 / 45.7 · 8K: 27.1 / 510 / 44.4 · 16K: 42.5 / 516 / 44.6 · 32K: 87.9 / 424 / 40.4 · 64K: 234 / 293 / 32.9
AMD Radeon 8060S (AI Max+ 395) | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 13.0 / 1088 / 41.5 · 4K: 19.8 / 723 / 35.2 · 8K: 28.6 / 618 / 32.6 · 16K: 40.8 / 640 / 32.8 · 32K: 90.6 / 446 / 26.7 · 64K: 276 / 257 / 19.4
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 15.0 / 866 / 36.2 · 4K: 16.7 / 1449 / 36.0 · 8K: 19.9 / 1453 / 34.8 · 16K: 25.0 / 1497 / 35.0 · 32K: 38.5 / 1401 / 32.1 · 64K: OOM
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 16.1 / 521 / 35.3 · 4K: 23.4 / 458 / 34.1 · 8K: 34.6 / 425 / 32.0 · 16K: 52.5 / 436 / 32.2 · 32K: 104 / 369 / 29.0 · 64K: 262 / 266 / 24.9
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-30b-a3b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (ROCm) & CPU (100% Offload)
  1K: 20.7 / 285 / 29.1 · 4K: 27.1 / 419 / 28.5 · 8K: 41.4 / 352 / 27.1 · 16K: 62.2 / 376 / 25.8 · 32K: 120 / 323 / 23.8 · 64K: 290 / 241 / 21.2
Intel Core i7-13700K | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 38.0 / 77 / 20.9 · 4K: 147 / 39.9 / 11.1 · 8K: 356 / 27.4 / 7.9 · 16K: 654 / 27.3 / 7.8 · 32K: ~1202
AMD EPYC 7F52 | qwen3-30b-a3b-instruct-2507 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 39.3 / 83 / 19.7 · 4K: 154 / 43.2 / 8.6 · 8K: 370 / 28.1 / 6.0
Nvidia RTX 4080 | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (CUDA)
  1K: 3.0 / 9490 / 171 · 4K: 4.5 / 7231 / 128 · 8K: 5.7 / 6726 / 111 · 16K: 6.6 / 7667 / 113 · 32K: 11.8 / 5694 / 80 · 64K: 29.8 / 3200 / 51
Nvidia RTX 4080 | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (CUDA)
  1K: 4.9 / 8779 / 105 · 4K: 6.3 / 7444 / 87 · 8K: 7.5 / 6989 / 79 · 16K: 8.3 / 7769 / 80 · 32K: 13.6 / 5743 / 62 · 64K: 31.5 / 3215 / 43.2
Nvidia RTX 4080 | qwen3-4b-instruct-2507 | FP8 | vLLM | GPU only (CUDA)
  1K: 6.7 / 11897 / 75 · 4K: 7.4 / 12703 / 71 · 8K: 8.2 / 11333 / 67 · 16K: 10.2 / 8685 / 60 · 32K: 15.6 / 5874 / 49.2 · 64K: 31.9 / 3536 / 36.3
AMD Radeon Mi50 | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 6.8 / 1297 / 83 · 4K: 11.4 / 974 / 68 · 8K: 17.2 / 885 / 62 · 16K: 25.8 / 908 / 63 · 32K: 57.5 / 676 / 49.4 · 64K: 171 / 410 / 34.2
Nvidia RTX 4080 | qwen3-4b-instruct-2507 | BF16 | vLLM | GPU only (CUDA)
  1K: 6.9 / 8451 / 74 · 4K: 7.5 / 9469 / 70 · 8K: 8.4 / 8782 / 66 · 16K: 10.7 / 7105 / 60 · 32K: 16.5 / 5106 / 48.9 · 64K: OOM
Nvidia RTX 4080 | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU only (CUDA)
  1K: 7.0 / 8617 / 73 · 4K: 8.5 / 7194 / 63 · 8K: 9.7 / 6742 / 59 · 16K: 10.6 / 7462 / 59 · 32K: 15.9 / 5559 / 49.1 · 64K: OOM
AMD Radeon Mi50 | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 7.3 / 850 / 81 · 4K: 13.0 / 721 / 68 · 8K: 19.7 / 693 / 61 · 16K: 31.6 / 686 / 62 · 32K: 69.3 / 544 / 48.5 · 64K: 196 / 355 / 33.8
AMD Radeon Mi50 (2x) | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 8.1 / 1456 / 68 · 4K: 12.0 / 1229 / 57 · 8K: 17.4 / 1045 / 52 · 16K: 24.1 / 1106 / 52 · 32K: 53.1 / 810 / 42.1 · 64K: 142 / 549 / 30.8
AMD Radeon Mi50 (2x) | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 8.2 / 988 / 69 · 4K: 13.2 / 952 / 56 · 8K: 19.2 / 842 / 52 · 16K: 27.8 / 889 / 51 · 32K: 60.1 / 685 / 41.8 · 64K: 156 / 487 / 30.5
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 8.4 / 6860 / 61 · 4K: 13.4 / 5485 / 39.5 · 8K: 22.8 / 5331 / 23.5 · 16K: 41.6 / 5895 / 12.9 · 32K: 77.4 / 4581 / 7.1 · 64K: 156 / 2807 / 3.8
AMD Radeon 8060S (AI Max+ 395) | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU only (ROCm)
  1K: 9.1 / 1584 / 59 · 4K: 15.8 / 924 / 44.5 · 8K: 25.3 / 653 / 38.6 · 16K: 36.8 / 675 / 38.8 · 32K: 90.3 / 482 / 28.2 · 64K: 298 / 264 / 18.2
AMD Radeon Mi50 | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU only (ROCm)
  1K: 9.9 / 997 / 57 · 4K: 13.9 / 1076 / 49.1 · 8K: 19.1 / 995 / 45.6 · 16K: 27.4 / 986 / 46.1 · 32K: 59.1 / 698 / 38.0 · 64K: 183 / 388 / 28.6
AMD Radeon Mi50 (2x) | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU only (ROCm)
  1K: 11.2 / 1042 / 49.1 · 4K: 15.4 / 985 / 44.1 · 8K: 21.4 / 873 / 41.2 · 16K: 29.6 / 924 / 41.1 · 32K: 61.3 / 704 / 34.8 · 64K: 157 / 495 / 26.4
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 12.5 / 1177 / 43.1 · 4K: 23.3 / 928 / 26.4 · 8K: 37.6 / 836 / 17.9 · 16K: 61.9 / 864 / 11.6 · 32K: 126 / 642 / 6.6 · 64K: 318 / 392 / 3.2
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 13.0 / 5389 / 39.0 · 4K: 18.1 / 5557 / 28.8 · 8K: 27.9 / 5222 / 19.0 · 16K: 46.6 / 5890 / 11.4 · 32K: 81.0 / 4704 / 6.7 · 64K: 161 / 2978 / 3.6
AMD Radeon 8060S (AI Max+ 395) | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU only (ROCm)
  1K: 14.0 / 1775 / 37.3 · 4K: 19.9 / 1102 / 30.8 · 8K: 26.2 / 976 / 27.9 · 16K: 35.0 / 953 / 28.2 · 32K: 72.6 / 641 / 22.1 · 64K: 235 / 317 / 15.4
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (ROCm) & CPU (25% Offload)
  1K: 15.3 / 774 / 35.7 · 4K: 27.3 / 708 / 23.2 · 8K: 42.2 / 666 / 16.6 · 16K: 69.8 / 661 / 11.0 · 32K: 140 / 519 / 6.4 · 64K: 353 / 326 / 3.2
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 16.9 / 4515 / 29.9 · 4K: 28.6 / 4194 / 18.1 · 8K: 53.4 / 3865 / 9.7 · 16K: 105 / 4292 / 5.0 · 32K: 194 / 3499 / 2.7 · 64K: 395 / 2194 / 1.4
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU (CUDA) & CPU (25% Offload)
  1K: 18.5 / 4444 / 27.4 · 4K: 23.7 / 5128 / 21.8 · 8K: 32.7 / 4868 / 16.1 · 16K: 52.4 / 5445 / 10.1 · 32K: 86.8 / 4426 / 6.3 · 64K: 163 / 2873 / 3.5
AMD Radeon 8060S (AI Max+ 395) | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU only (ROCm)
  1K: 20.7 / 1879 / 24.9 · 4K: 26.5 / 1151 / 21.8 · 8K: 32.6 / 1011 / 20.4 · 16K: 41.0 / 994 / 20.5 · 32K: 78.2 / 655 / 17.1 · 64K: 239 / 321 / 12.8
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | GPU (ROCm) & CPU (75% Offload)
  1K: 24.3 / 553 / 23.4 · 4K: 64.5 / 274 / 10.4 · 8K: 110 / 174 / 7.8 · 16K: 123 / 250 / 9.0 · 32K: 278 / 172 / 5.5 · 64K: ~1260
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 25.8 / 3392 / 19.6 · 4K: 37.5 / 4429 / 13.7 · 8K: 63.0 / 4154 / 8.2 · 16K: 119 / 4507 / 4.3 · 32K: 200 / 3803 / 2.6 · 64K: 401 / 2497 / 1.3
AMD Radeon Mi50 | AMD EPYC 7F52 | qwen3-4b-instruct-2507 | UD-Q8-K-XL | llama.cpp | GPU (ROCm) & CPU (75% Offload)
  1K: 30.0 / 436 / 18.5 · 4K: 70.9 / 248 / 9.4 · 8K: 120 / 162 / 7.1 · 16K: 136 / 221 / 8.3 · 32K: 301 / 159 / 5.1 · 64K: ~1308
Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 30.9 / 122 / 22.1 · 4K: 83.2 / 86 / 13.7 · 8K: 190 / 65 / 7.4 · 16K: 493 / 44.5 / 3.8 · 32K: ~1447
Intel Core i7-13700K | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | CPU only (OneAPI MKL)
  1K: 31.0 / 117 / 22.7 · 4K: 97.7 / 68 / 13.1 · 8K: 218 / 47.3 / 10.4 · 16K: 400 / 46.3 / 9.6 · 32K: 858 / 41.9 / 5.6 · 64K: ~2160
Nvidia RTX 4080 | Intel Core i7-13700K | qwen3-4b-instruct-2507 | F16 | llama.cpp | GPU (CUDA) & CPU (75% Offload)
  1K: 38.7 / 2493 / 13.1 · 4K: 49.8 / 3903 / 10.3 · 8K: 75.6 / 3559 / 6.8 · 16K: 131 / 3965 / 4.0 · 32K: 217 / 3411 / 2.4 · 64K: 405 / 2332 / 1.3
AMD EPYC 7F52 | qwen3-4b-instruct-2507 | UD-Q4-K-XL | llama.cpp | CPU only (Generic)
  1K: 41.2 / 91 / 17.1 · 4K: 137 / 49.8 / 9.2 · 8K: 311 / 33.6 / 6.9 · 16K: 542 / 33.7 / 7.8 · 32K: 1099 / 32.4 / 4.9 · 64K: ~2595
OOM: out of memory
~: predicted time (benchmarking was stopped there)