
Systems

This page lists all hardware systems used in the benchmarks.

| Name | GPU | CPU | RAM | OS |
|------|-----|-----|-----|----|
| Bosgame M5 - AMD Ryzen AI Max+ 395 "Strix Halo" 128 GB (2025) | AMD Radeon 8060S (AI Max+ 395) | AMD Zen 5 | 128 GB LPDDR5x 8000 MHz | Ubuntu Server |
| Gaming-PC (2023) | Nvidia RTX 4080 16 GB | Intel Core i7-13700K | 16 GB DDR5 6400 MHz (2x) | Linux Mint |
| Refurb-Rig (2025) | AMD Radeon MI50 32 GB (3x) | AMD EPYC 7F52 | 64 GB DDR4 2666 MHz (8x) | Ubuntu Server |

System Notes

Components, setup, resources, issues, and so on.

Bosgame M5 - Ryzen AI Max+ 395 "Strix Halo"

Memory setup

  • BIOS: UMA_SPECIFIED = 512M
  • /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=30408704 ttm.page_pool_size=30408704 amdttm.pages_limit=30408704 amdttm.page_pool_size=30408704"
  • Sets the GTT limit to 30408704 pages (116 GiB at 4 KiB per page, i.e. 116 * 1024 * 1024 / 4), giving the GPU up to 116 GiB and leaving ~12 GB for the OS; see the sketch after this list
  • Check: sudo dmesg | grep -i "GTT"
  • Setting up unified memory for Strix Halo correctly on Ubuntu
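For reference, a minimal sketch of applying and verifying the settings above on Ubuntu, assuming the stock GRUB workflow (edit /etc/default/grub first, as listed in the bullets):

```bash
# 116 GiB expressed in 4 KiB kernel pages -> value for ttm.pages_limit / amdttm.pages_limit
echo $((116 * 1024 * 1024 / 4))      # prints 30408704

# After editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub as shown above:
sudo update-grub
sudo reboot

# After the reboot, confirm the kernel picked up the enlarged GTT:
sudo dmesg | grep -i "GTT"
```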

Other resources

Software support

  • ROCm 7.2 has worse performance with llama.cpp than 7.1.1
  • The latest vLLM Navi docker image (rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0) has good prompt-processing (PP) speed but slow token generation (TG); see the launch sketch after this list
  • The older vLLM Navi image (rocm7.1.1_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1) has issues with some models
  • → Not running vLLM for now
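For context, this is roughly how such an image would be launched for a quick PP/TG check. The Docker Hub repository name (rocm/vllm-dev) is an assumption and the model is a placeholder; only the tag comes from the notes above.

```bash
# Hypothetical launch of the newer Navi image (repository name is an assumption):
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  --security-opt seccomp=unconfined \
  rocm/vllm-dev:rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0 \
  bash
# Inside the container, serve a model and benchmark prompt processing / token generation:
#   vllm serve <model>
```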

Refurb-Rig

  • Mainboard: Supermicro H11SSL-i
  • Had to update the BIOS in order to make the third PCIe x16 slot work
  • Each MI50 is cooled by its own 5 W 40 mm server fan, which, despite the terrible noise, is not enough for sustained high prompt-processing (PP) loads
  • Thermal throttling sometimes makes a setup with 2x MI50 faster than the same setup with a single MI50, since splitting the load helps a lot with keeping temperatures down; see the monitoring sketch after this list
  • The RAM also runs hot and needs airflow
  • Idle power draw is over 100 W and the machine is loud; it cannot be recommended as an on-demand inference device and is better suited to concentrated workloads
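A simple way to watch for thermal throttling during sustained PP runs, assuming ROCm's rocm-smi is installed (its default summary includes temperature, power, and clocks per GPU):

```bash
# Refresh the per-GPU summary every 2 seconds while a benchmark is running
watch -n 2 rocm-smi
```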