LLMs

This page lists all LLMs and their quantizations used in the benchmarks.

Sampling parameters are based on each model's official recommendation. T = temperature, K = top-k, P = top-p, M = min-p.

| Model | Params (B) | Active (B) | Sampling | Quantization | Format | Size (GB) | Download |
|---|---|---|---|---|---|---|---|
| qwen3-4b-instruct-2507 | 4.0 | 4.0 | T=0.7, K=20, P=0.8, M=0 | UD-Q4_K_XL | GGUF | 2.55 | unsloth/Qwen3-4B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 5.06 | unsloth/Qwen3-4B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-UD-Q8_K_XL.gguf` |
| | | | | F16 | GGUF | 8.05 | unsloth/Qwen3-4B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-4B-Instruct-2507-GGUF --include Qwen3-4B-Instruct-2507-F16.gguf` |
| | | | | FP8 | Safetensors | 5.20 | Qwen/Qwen3-4B-Instruct-2507-FP8: `hf download Qwen/Qwen3-4B-Instruct-2507-FP8` |
| | | | | BF16 | Safetensors | 8.05 | Qwen/Qwen3-4B-Instruct-2507: `hf download Qwen/Qwen3-4B-Instruct-2507` |
| gpt-oss-20b | 20 | 3.6 | T=1, K=0, P=1, M=0 | MXFP4 | GGUF | 12.1 | ggml-org/gpt-oss-20b-GGUF: `hf download ggml-org/gpt-oss-20b-GGUF` |
| | | | | MXFP4 | Safetensors | 13.8 | openai/gpt-oss-20b: `hf download openai/gpt-oss-20b --exclude "metal/*" "original/*"` |
| devstral-small-2-24b-instruct-2512 | 24 | 24 | T=0.15, K=0, P=1, M=0.01 | UD-Q4_K_XL | GGUF | 14.5 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF: `hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf` |
| | | | | Q8_0 | GGUF | 25.1 | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF: `hf download unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF --include Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf` |
| qwen3-30b-a3b-instruct-2507 | 31 | 3.3 | T=0.7, K=20, P=0.8, M=0 | Q4_0 | GGUF | 17.4 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-Q4_0.gguf` |
| | | | | UD-Q4_K_XL | GGUF | 17.7 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 36.0 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include Qwen3-30B-A3B-Instruct-2507-UD-Q8_K_XL.gguf` |
| | | | | BF16 | GGUF | 61.1 | unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF: `hf download unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF --include "BF16/*"` |
| granite-4.0-h-small | 32 | 9.0 | T=0, K=0, P=1, M=0 | UD-Q4_K_XL | GGUF | 18.8 | unsloth/granite-4.0-h-small-GGUF: `hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q4_K_XL.gguf` |
| | | | | UD-Q8_K_XL | GGUF | 38.1 | unsloth/granite-4.0-h-small-GGUF: `hf download unsloth/granite-4.0-h-small-GGUF --include granite-4.0-h-small-UD-Q8_K_XL.gguf` |
| | | | | BF16 | GGUF | 64.4 | unsloth/granite-4.0-h-small-GGUF: `hf download unsloth/granite-4.0-h-small-GGUF --include "BF16/*"` |
| glm-4.5-air | 106 | 12 | T=0.5, K=0, P=0.95, M=0 | UD-Q4_K_XL | GGUF | 67.7 | unsloth/GLM-4.5-Air-GGUF: `hf download unsloth/GLM-4.5-Air-GGUF --include "UD-Q4_K_XL/*"` |
| | | | | UD-Q6_K_XL | GGUF | 102 | unsloth/GLM-4.5-Air-GGUF: `hf download unsloth/GLM-4.5-Air-GGUF --include "UD-Q6_K_XL/*"` |
| gpt-oss-120b | 117 | 5.1 | T=1, K=0, P=1, M=0 | MXFP4 | GGUF | 63.4 | ggml-org/gpt-oss-120b-GGUF: `hf download ggml-org/gpt-oss-120b-GGUF` |
| minimax-m2.1 | 230 | 10 | T=1, K=40, P=0.95, M=0 | UD-Q3_K_XL | GGUF | 101 | unsloth/MiniMax-M2.1-GGUF: `hf download unsloth/MiniMax-M2.1-GGUF --include "UD-Q3_K_XL/*"` |
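
The sampling shorthand (T, K, P, M) has to be translated into whatever flags the inference runtime expects. The page does not say which runtime the benchmarks use, so the sketch below assumes llama.cpp-style flags (`--temp`, `--top-k`, `--top-p`, `--min-p`) purely for illustration. The function name `sampling_flags` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: convert the table's sampling shorthand into
# llama.cpp-style sampler flags. The target runtime is an assumption;
# other servers use different option names.
sampling_flags() {
  flags=""
  for kv in "$@"; do
    key=${kv%%=*}   # text before the first "="
    val=${kv#*=}    # text after the first "="
    case "$key" in
      T) flag="--temp" ;;
      K) flag="--top-k" ;;
      P) flag="--top-p" ;;
      M) flag="--min-p" ;;
      *) echo "unknown sampling key: $key" >&2; return 1 ;;
    esac
    # Append "flag value", separated by a single space.
    flags="${flags:+$flags }$flag $val"
  done
  printf '%s\n' "$flags"
}

# Example: the qwen3-4b-instruct-2507 row.
sampling_flags T=0.7 K=20 P=0.8 M=0
# prints: --temp 0.7 --top-k 20 --top-p 0.8 --min-p 0
```

The output can be appended to a server or CLI invocation. Values are passed through unchanged, so a `0` in the table is forwarded as `0` and its meaning is left to the runtime.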