# LLMs
This page lists all LLMs and their quantizations used in the benchmarks.
Sampling settings follow the official recommendations (T = temperature, K = top-k, P = top-p, M = min-p).

| Name | Params (B) | Active (B) | Sampling | Type | Format | Size (GB) | Download |
|---|---|---|---|---|---|---|---|
| qwen3-4b-instruct-2507 | 4.0 | 4.0 | T=0.7 K=20 P=0.8 M=0 | UD-Q4_K_XL | GGUF | 2.55 | unsloth/Qwen3-4B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 5.06 | unsloth/Qwen3-4B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf` |
| | | | | F16 | GGUF | 8.05 | unsloth/Qwen3-4B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-F16.gguf` |
| | | | | FP8 | Safetensors | 5.20 | Qwen/Qwen3-4B-Instruct-2507-FP8<br>`hf download Qwen/Qwen3-4B-Instruct-2507-FP8` |
| | | | | BF16 | Safetensors | 8.05 | Qwen/Qwen3-4B-Instruct-2507<br>`hf download Qwen/Qwen3-4B-Instruct-2507` |
| gpt-oss-20b | 20 | 3.6 | T=1 K=0 P=1 M=0 | MXFP4 | GGUF | 12.1 | ggml-org/gpt-oss-20b-GGUF<br>`hf download ggml-org/gpt-oss-20b-GGUF` |
| | | | | MXFP4 | Safetensors | 13.8 | openai/gpt-oss-20b<br>`hf download openai/gpt-oss-20b --exclude "metal/*" "original/*"` |
| devstral-small-2-24b-instruct-2512 | 24 | 24 | T=0.15 K=0 P=1 M=0.01 | UD-Q4_K_XL | GGUF | 14.5 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF<br>`hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf` |
| | | | | Q8_0 | GGUF | 25.1 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF<br>`hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf` |
| qwen3-30b-a3b-instruct-2507 | 31 | 3.3 | T=0.7 K=20 P=0.8 M=0 | Q4_0 | GGUF | 17.4 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf` |
| | | | | UD-Q4_K_XL | GGUF | 17.7 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 36.0 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF<br>`hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q8_K_XL.gguf` |
| granite-4.0-h-small | 32 | 9.0 | T=0 K=0 P=1 M=0 | UD-Q4_K_XL | GGUF | 18.8 | unsloth/granite-4.0-h-small-GGUF<br>`hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 38.1 | unsloth/granite-4.0-h-small-GGUF<br>`hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q8_K_XL.gguf` |
| | | | | BF16 | GGUF | 64.4 | unsloth/granite-4.0-h-small-GGUF<br>`hf download unsloth/granite-4.0-h-small-GGUF --include BF16/*` |
| gpt-oss-120b | 117 | 5.1 | T=1 K=0 P=1 M=0 | MXFP4 | GGUF | 63.4 | ggml-org/gpt-oss-120b-GGUF<br>`hf download ggml-org/gpt-oss-120b-GGUF` |
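As a minimal sketch of how a download command and the recommended sampling settings combine into a local serving setup, the example below uses the qwen3-4b-instruct-2507 UD-Q4_K_XL row. It assumes llama.cpp's `llama-server` is installed; the `--local-dir models` path is an illustrative choice and is not part of the commands in the table.

```sh
# Fetch the UD-Q4_K_XL build (same command as the table, plus an illustrative
# local target directory).
hf download unsloth/Qwen3-4B-Instruct-2507-GGUF \
  --include Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf \
  --local-dir models

# Serve it with the recommended sampling settings from the table
# (T=0.7, K=20, P=0.8, M=0). Flag names are llama.cpp's llama-server flags;
# other runtimes expose the same settings under different names.
llama-server \
  -m models/Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf \
  --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0
```

The same mapping applies to the other GGUF rows; only the model file and the T/K/P/M values change.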