LLMs
This page lists all LLMs and their quantizations used in the benchmarks.
Sampling settings follow each model's official recommendation (T = temperature, K = top-k, P = top-p, M = min-p). The first four columns describe the model and the last four describe the quantization; rows with empty model columns are further quantizations of the model listed above them.

| Name | Params (B) | Active (B) | Sampling | Type | Format | Size (GB) | Download |
|---|---|---|---|---|---|---|---|
| qwen3-4b-instruct-2507 | 4.0 | 4.0 | T=0.7 K=20 P=0.8 M=0 | UD-Q4-K-XL | GGUF | 2.55 | unsloth/Qwen3-4B-Instruct-2507-GGUF `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8-K-XL | GGUF | 5.06 | unsloth/Qwen3-4B-Instruct-2507-GGUF `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf` |
| | | | | F16 | GGUF | 8.05 | unsloth/Qwen3-4B-Instruct-2507-GGUF `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-F16.gguf` |
| | | | | FP8 | Safetensors | 5.20 | Qwen/Qwen3-4B-Instruct-2507-FP8 `hf download Qwen/Qwen3-4B-Instruct-2507-FP8` |
| | | | | BF16 | Safetensors | 8.05 | Qwen/Qwen3-4B-Instruct-2507 `hf download Qwen/Qwen3-4B-Instruct-2507` |
| gpt-oss-20b | 20 | 3.6 | T=1 K=0 P=1 M=0 | MXFP4 | GGUF | 12.1 | ggml-org/gpt-oss-20b-GGUF `hf download ggml-org/gpt-oss-20b-GGUF` |
| | | | | MXFP4 | Safetensors | 13.8 | openai/gpt-oss-20b `hf download openai/gpt-oss-20b --exclude "metal/*" "original/*"` |
| devstral-small-2-24b-instruct-2512 | 24 | 24 | T=0.15 K=0 P=1 M=0.01 | UD-Q4-K-XL | GGUF | 14.5 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF `hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf` |
| | | | | Q8_0 | GGUF | 25.1 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF `hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf` |
| qwen3-30b-a3b-instruct-2507 | 31 | 3.3 | T=0.7 K=20 P=0.8 M=0 | Q4-0 | GGUF | 17.4 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf` |
| | | | | UD-Q4-K-XL | GGUF | 17.7 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8-K-XL | GGUF | 36.0 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q8_K_XL.gguf` |
| | | | | BF16 | GGUF | 61.1 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include BF16/*` |
| granite-4.0-h-small | 32 | 9.0 | T=0 K=0 P=1 M=0 | UD-Q4-K-XL | GGUF | 18.8 | unsloth/granite-4.0-h-small-GGUF `hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8-K-XL | GGUF | 38.1 | unsloth/granite-4.0-h-small-GGUF `hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q8_K_XL.gguf` |
| | | | | BF16 | GGUF | 64.4 | unsloth/granite-4.0-h-small-GGUF `hf download unsloth/granite-4.0-h-small-GGUF --include BF16/*` |
| glm-4.5-air | 106 | 12 | T=0.5 K=0 P=0.95 M=0 | UD-Q4-K-XL | GGUF | 67.7 | unsloth/GLM-4.5-Air-GGUF `hf download unsloth/GLM-4.5-Air-GGUF --include UD-Q4_K_XL/*` |
| | | | | UD-Q6-K-XL | GGUF | 102 | unsloth/GLM-4.5-Air-GGUF `hf download unsloth/GLM-4.5-Air-GGUF --include UD-Q6_K_XL/*` |
| gpt-oss-120b | 117 | 5.1 | T=1 K=0 P=1 M=0 | MXFP4 | GGUF | 63.4 | ggml-org/gpt-oss-120b-GGUF `hf download ggml-org/gpt-oss-120b-GGUF` |
| minimax-m2.1 | 230 | 10 | T=1 K=40 P=0.95 M=0 | UD-Q3-K-XL | GGUF | 101 | unsloth/MiniMax-M2.1-GGUF `hf download unsloth/MiniMax-M2.1-GGUF --include UD-Q3_K_XL/*` |
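
The Sampling column uses a compact shorthand (T, K, P, M). As a minimal sketch of how such a string could be expanded into named parameters before passing it to an inference runtime, the hypothetical helper below parses one table entry. The function name `parse_sampling` and the output key names (`temperature`, `top_k`, `top_p`, `min_p`) are assumptions chosen for illustration; they are not taken from the benchmark code, and a given runtime may spell these options differently.

```python
# Hypothetical helper: expands the table's sampling shorthand into named parameters.
# The key names on the right are illustrative assumptions, not a specific runtime's API.
SHORTHAND_KEYS = {
    "T": "temperature",
    "K": "top_k",
    "P": "top_p",
    "M": "min_p",
}


def parse_sampling(spec: str) -> dict:
    """Parse a string such as 'T=0.7 K=20 P=0.8 M=0' into a dict of parameters."""
    params = {}
    for token in spec.split():
        key, _, value = token.partition("=")
        if key not in SHORTHAND_KEYS or not value:
            raise ValueError(f"unrecognized sampling token: {token!r}")
        number = float(value)
        # top_k is a count, so store it as an integer; the rest stay floats.
        params[SHORTHAND_KEYS[key]] = int(number) if key == "K" else number
    return params


if __name__ == "__main__":
    # Entry for qwen3-4b-instruct-2507 from the table above.
    print(parse_sampling("T=0.7 K=20 P=0.8 M=0"))
```

Raising on unknown tokens rather than skipping them is a deliberate choice here: a mistyped key in the table would otherwise silently fall back to a runtime default and skew the benchmark.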