
Systems

This page lists all hardware systems used in the benchmarks.

| Name | GPU | CPU | RAM | OS |
|------|-----|-----|-----|----|
| Bosgame M5 - AMD Ryzen AI Max+ 395 "Strix Halo" 128 GB (2025) | AMD Radeon 8060S (AI Max+ 395) | AMD Zen 5 | 128 GB LPDDR5x 8000 MHz | Ubuntu Server |
| Gaming-PC (2023) | Nvidia RTX 4080 16 GB | Intel Core i7-13700K | 16 GB DDR5 6400 MHz (2x) | Linux Mint |
| Refurb-Rig (2025) | AMD Radeon MI50 32 GB (3x) | AMD EPYC 7F52 | 64 GB DDR4 2666 MHz (8x) | Ubuntu Server |

System Notes

Components, setup, resources, issues, and so on.

Bosgame M5 - Ryzen AI Max+ 395 "Strix Halo"

Memory setup

  • BIOS: UMA_SPECIFIED = 512M
  • /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=30408704 ttm.page_pool_size=30408704 amdttm.pages_limit=30408704 amdttm.page_pool_size=30408704"
  • Sets the GTT limit to 30408704 pages (116 GiB at 4 KiB per page, i.e. 116 * 1024 * 1024 / 4), giving the GPU up to 116 GiB and leaving ~12 GB for the OS; see the sketch after this list
  • Check: sudo dmesg | grep -i "GTT"
  • Setting up unified memory for Strix Halo correctly on Ubuntu
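For reference, a minimal sketch of applying and verifying the settings above on Ubuntu, assuming the stock GRUB workflow (edit /etc/default/grub first, as listed in the bullets):

```bash
# 116 GiB expressed in 4 KiB kernel pages -> value for ttm.pages_limit / amdttm.pages_limit
echo $((116 * 1024 * 1024 / 4))      # prints 30408704

# After editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub as shown above:
sudo update-grub
sudo reboot

# After the reboot, confirm the kernel picked up the enlarged GTT:
sudo dmesg | grep -i "GTT"
```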

Other resources

Software support

  • ROCm 7.2 has worse performance with llama.cpp than 7.1.1
  • The latest vLLM Navi docker image (rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0) has good prompt-processing (PP) speed but slow token generation (TG); see the launch sketch after this list
  • The older vLLM Navi image (rocm7.1.1_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1) has issues with some models
  • → Not running vLLM for now
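For context, this is roughly how such an image would be launched for a quick PP/TG check. The Docker Hub repository name (rocm/vllm-dev) is an assumption and the model is a placeholder; only the tag comes from the notes above.

```bash
# Hypothetical launch of the newer Navi image (repository name is an assumption):
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  --security-opt seccomp=unconfined \
  rocm/vllm-dev:rocm7.2_navi_ubuntu24.04_py3.12_pytorch_2.9_vllm_0.14.0rc0 \
  bash
# Inside the container, serve a model and benchmark prompt processing / token generation:
#   vllm serve <model>
```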

Refurb-Rig

  • Mainboard: Supermicro H11SSL-i
  • Had to update the BIOS in order to make the third PCIe x16 slot work
  • Each MI50 is cooled by its own 5 W 40 mm server fan, which, despite the terrible noise, is not enough for sustained high prompt-processing (PP) loads
  • Thermal throttling sometimes makes a setup with 2x MI50 faster than the same setup with a single MI50, since splitting the load helps a lot with keeping temperatures down; see the monitoring sketch after this list
  • The RAM also runs hot and needs airflow
  • Idle power draw is over 100 W and the machine is loud; it cannot be recommended as an on-demand inference device and is better suited to concentrated workloads
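A simple way to watch for thermal throttling during sustained PP runs, assuming ROCm's rocm-smi is installed (its default summary includes temperature, power, and clocks per GPU):

```bash
# Refresh the per-GPU summary every 2 seconds while a benchmark is running
watch -n 2 rocm-smi
```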