Mark Stratmann
Running LLMs Background

Running Local LLMs on the Framework Desktop (Max+ 395 - 128GB)

Run local LLMs using both llama.cpp and Ollama with Docker containers

This article documents how I've configured my Framework Desktop PC (Max+ 395 with 128GB unified memory) to run local LLMs using both llama.cpp and Ollama with Docker containers.

Hardware Overview

The Framework Desktop Max+ 395 features the AMD Ryzen AI Max 395 processor with integrated Radeon graphics (Strix Halo), featuring:

  • 128GB of unified memory (shared CPU/GPU)
  • AMD RDNA 3.5 architecture
  • gfx1151 GPU target

https://frame.work/gb/en/desktop

GPU Memory Configuration

To enable the full 128GB of unified memory for GPU workloads, the following configuration is required. These instructions synthesize the Ubuntu-based guide from technigmaai-wiki with the LLM benchmark setup from lhl/strix-halo-testing for Fedora 43.

BIOS Setup

  1. Reboot and enter BIOS/UEFI
  2. Set Integrated Graphics/UMA Frame Buffer Size to 512MB
  3. Disable IOMMU

Fedora 43 GRUB Configuration

Edit /etc/default/grub and modify the GRUB_CMDLINE_LINUX_DEFAULT line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"

Then update GRUB and reboot:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot

Verify Memory Settings

After reboot, verify the parameters were applied:

cat /proc/cmdline
sudo dmesg | grep -i gtt
sudo dmesg | grep -i ttm

This configuration enables approximately 128GB of GTT memory for GPU workloads, which is essential for running large models like Qwen3-Coder-Next with its 256K context window.

Alternative Kernel Module Configuration

For more granular control, create /etc/modprobe.d/amdgpu_llm_optimized.conf:

options amdgpu gttsize=120000
options ttm pages_limit=31457280
options ttm page_pool_size=15728640

Then regenerate initramfs:

sudo dracut --force

Why Fedora 43 for LLMs?

While many guides target Ubuntu, Fedora 43 offers several advantages for local LLM workloads:

More Recent Kernel: Fedora 43 ships with kernel 6.18.4+, which provides better support for ROCm 7.x and the Ryzen AI Max 395's GPU

Better ROCm Support: The Linux kernel's AMDGPU driver improvements in newer kernels translate to better unified memory management and VRAM allocation for LLM workloads

Package Freshness: Fedora's rolling release model provides more recent versions of key dependencies, like:

  • LLVM/Clang for HIP compilation
  • Vulkan drivers (Mesa RADV/AMDVLK)
  • CMake and build toolchains
  1. SELinux Considerations: While SELinux requires additional configuration (like container_use_devices=1), it provides better security isolation for containerized LLM workloads

The key sources that informed this approach are:

Docker-Based LLM Containers

I've created Docker containers based on kyuz0/amd-strix-halo-toolboxes to run both llama.cpp and Ollama.

Container Prerequisites

Before running any containers, enable SELinux to allow container access to GPU devices:

sudo setsebool container_use_devices=1

This is a one-time configuration that needs to persist across reboots:

sudo setsebool -P container_use_devices=1

Ollama with Vulkan (Working)

The Vulkan backend works perfectly with Ollama on Fedora 43.

Dockerfile: ollama-vulkan/Dockerfile Docker Compose: ollama-vulkan/docker-compose.yml

# Ollama + Vulkan on Strix Halo (Fedora 43)
FROM registry.fedoraproject.org/fedora-minimal:43

# Base runtime deps + Vulkan userspace
RUN microdnf -y --nodocs --setopt=install_weak_deps=0 install \
      bash ca-certificates curl tar \
      libatomic libstdc++ libgcc \
      vulkan-loader vulkan-loader-devel vulkaninfo \
      mesa-vulkan-drivers radeontop \
      pciutils procps-ng wget gzip zstd \
  && microdnf clean all && rm -rf /var/cache/dnf/*

# Install AMDVLK (optional, can use Mesa RADV)
RUN curl -L -o /tmp/amdvlk-2025.Q2.1.x86_64.rpm \
    https://github.com/GPUOpen-Drivers/AMDVLK/releases/download/v-2025.Q2.1/amdvlk-2025.Q2.1.x86_64.rpm \
 && microdnf -y install /tmp/amdvlk-*.rpm \
 && rm -f /tmp/amdvlk-*.rpm

# Install Ollama (generic Linux build with Vulkan support)
RUN wget -P /tmp https://github.com/ollama/ollama/releases/download/v0.15.5-rc2/ollama-linux-amd64.tar.zst \
 && tar --zstd -C /usr -xf /tmp/ollama-linux-amd64.tar.zst \
 && rm -f /tmp/ollama-linux-amd64.tar.zst

RUN mkdir -p /root/.ollama

ENV OLLAMA_VULKAN=1 \
    GGML_VK_VISIBLE_DEVICES=0 \
    OLLAMA_HOST=0.0.0.0 \
    OLLAMA_ORIGINS="*"

EXPOSE 11434

CMD ["/usr/bin/ollama", "serve"]

Launch Ollama with Vulkan:

# Build the image
docker build -t ollama-strix-vulkan -f ollama-vulkan/Dockerfile .

# Or using Docker Compose
cd ollama-vulkan
docker compose up -d

Access Ollama:

ollama list
ollama pull qwen3-coder-next

Ollama with ROCm (Not Working - Pending Fix)

The Ollama ROCm implementation currently does not work on Fedora 43 with ROCm 7.x. The issue is tracked at ROCm issue #5902.

Dockerfile: ollama-rocm/Dockerfile Docker Compose: ollama-rocm/docker-compose.yml

# Ollama + Vulkan on Strix Halo (Fedora 43)
FROM registry.fedoraproject.org/fedora:43

RUN dnf -y --nodocs --setopt=install_weak_deps=False install \
  make gcc cmake lld clang clang-devel compiler-rt libcurl-devel \
  radeontop git vim patch curl ninja-build tar xz aria2c wget zstd \
  && dnf clean all && rm -rf /var/cache/dnf/*


# find & fetch the latest Linux 7.x.x tarball (gfx1151)
WORKDIR /tmp
ARG ROCM_MAJOR_VER=7
ARG GFX=gfx1151
RUN set -euo pipefail; \
  BASE="https://therock-nightly-tarball.s3.amazonaws.com"; \
  PREFIX="therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}"; \
  KEY="$(curl -s "${BASE}?list-type=2&prefix=${PREFIX}" \
  | tr '<' '\n' \
  | grep -o "therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}\..*\.tar\.gz" \
  | sort -V | tail -n1)"; \
  echo "Latest tarball: ${KEY}"; \
  aria2c -x 16 -s 16 -j 16 --file-allocation=none "${BASE}/${KEY}" -o therock.tar.gz
RUN mkdir -p /opt/rocm-7.0 \
  && tar xzf therock.tar.gz -C /opt/rocm-7.0 --strip-components=1

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

# Install the Ollama ROCm drivers (v0.15.2)
RUN wget -P /tmp https://github.com/ollama/ollama/releases/download/v0.15.5-rc3/ollama-linux-amd64-rocm.tar.zst \
 && tar -C /usr --use-compress-program=unzstd -xf /tmp/ollama-linux-amd64-rocm.tar.zst \
 && rm -f /tmp/ollama-linux-amd64-rocm.tar.zst

# Install ollama (v0.15.2)
RUN wget -P /tmp https://github.com/ollama/ollama/releases/download/v0.15.5-rc3/ollama-linux-amd64.tar.zst \
 && tar -C /usr --use-compress-program=unzstd -xf /tmp/ollama-linux-amd64.tar.zst \
 && rm -f /tmp/ollama-linux-amd64.tar.zst

# Make Ollama + ROCm shared libs visible to the runtime linker
RUN printf '%s\n' \
  /usr/lib/ollama \
  /opt/rocm-7.0/lib \
  > /etc/ld.so.conf.d/ollama-rocm.conf \
  && ldconfig

# Create /opt/rocm symlink that Ollama expects
RUN ln -sfn /opt/rocm-7.0 /opt/rocm

# Data directory
RUN mkdir -p /root/.ollama

# Expose Ollama API port
EXPOSE 11434

# profile
RUN printf '%s\n' \
  'export ROCBLAS_USE_HIPBLASLT=1' \
  > /etc/profile.d/rocm.sh && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

ENV OLLAMA_HOST=0.0.0.0 \
    OLLAMA_ORIGINS="*"

# Start the server
CMD ["/usr/bin/ollama", "serve"]

The ROCm backend fails with "out of memory" errors even when the system reports ~111GB available VRAM. This appears to be a regression in ROCm 7.2 affecting Ollama's memory calculation.

Current Status: Awaiting upstream fix from AMD/ROCm team.

llama.cpp with Vulkan

Dockerfile: llamacpp/Dockerfile.llamacpp-strix-vulkan

# build stage
FROM registry.fedoraproject.org/fedora:43 AS builder

# deps
RUN dnf -y --nodocs --setopt=install_weak_deps=False install \
      git vim \
      make gcc cmake ninja-build lld clang clang-devel compiler-rt libcurl-devel \
      vulkan-loader-devel vulkaninfo mesa-vulkan-drivers \
      radeontop glslc \
    && dnf clean all && rm -rf /var/cache/dnf/*

# llama.cpp
WORKDIR /opt/llama.cpp
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git .

# build
RUN git clean -xdf \
 && git submodule update --recursive \
 && cmake -S . -B build -G Ninja \
      -DGGML_VULKAN=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DGGML_RPC=ON \
      -DCMAKE_INSTALL_PREFIX=/usr \
      -DLLAMA_BUILD_TESTS=OFF \
      -DLLAMA_BUILD_EXAMPLES=ON \
      -DLLAMA_BUILD_SERVER=ON \
 && cmake --build build --config Release \
 && cmake --install build --config Release

# libs
RUN find /opt/llama.cpp/build -type f -name 'lib*.so*' -exec cp {} /usr/lib64/ \; \
 && ldconfig


# runtime stage
FROM registry.fedoraproject.org/fedora-minimal:43

# runtime deps
RUN microdnf -y --nodocs --setopt=install_weak_deps=0 install \
      bash ca-certificates libatomic libstdc++ libgcc \
      vulkan-loader vulkan-loader-devel vulkaninfo mesa-vulkan-drivers radeontop \
  && microdnf clean all && rm -rf /var/cache/dnf/*

# copy
COPY --from=builder /usr/ /usr/
COPY --from=builder /usr/local/ /usr/local/
COPY --from=builder /opt/llama.cpp/build/bin/rpc-* /usr/local/bin/

# ld
RUN echo "/usr/local/lib"  > /etc/ld.so.conf.d/local.conf \
 && echo "/usr/local/lib64" >> /etc/ld.so.conf.d/local.conf \
 && ldconfig \
 && cp -n /usr/local/lib/libllama*.so* /usr/lib64/ 2>/dev/null || true \
 && ldconfig

# shell
CMD ["/bin/bash"]

The Vulkan backend for llama.cpp is stable and well-tested. It provides reliable performance for all model sizes.

Docker Compose Entry:

qwen-3-coder-vulkan:
  image: llamacpp-strix-vulkan
  container_name: llamacpp
  restart: unless-stopped
  devices:
    - /dev/dri:/dev/dri
  group_add:
    - "video"
  volumes:
    - /home/mark/running-llms/:/root/running-llms
  ports:
    - "8080:8080"
  security_opt:
    - seccomp=unconfined
  command: >
    bash -c "llama-server --alias Qwen3-Coder-30B -m /root/running-llms/hf-models/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-BF16/BF16/Qwen3-Coder-30B-A3B-Instruct-1M-BF16-00001-of-00002.gguf --ctx-size 262144 -fa 1 --no-mmap --host 0.0.0.0 --port 8080 --temp 0.7 --top-k 20 --min-p 0.01 --top-p 0.8 --repeat-penalty 1.05 --jinja -ngl 99 --threads -1"

llama.cpp with ROCm (Faster - ~30% Performance Boost)

Dockerfile: llamacpp/Dockerfile.llamacpp-rocm

# build
FROM registry.fedoraproject.org/fedora:43 AS builder

RUN dnf -y --nodocs --setopt=install_weak_deps=False install \
  make gcc cmake lld clang clang-devel compiler-rt libcurl-devel \
  radeontop git vim patch curl ninja-build tar xz aria2c \
  && dnf clean all && rm -rf /var/cache/dnf/*

# find & fetch the latest Linux 7.x.x tarball (gfx1151)
WORKDIR /tmp
ARG ROCM_MAJOR_VER=7
ARG GFX=gfx1151
RUN set -euo pipefail; \
  BASE="https://therock-nightly-tarball.s3.amazonaws.com"; \
  PREFIX="therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}"; \
  KEY="$(curl -s "${BASE}?list-type=2&prefix=${PREFIX}" \
  | tr '<' '\n' \
  | grep -o "therock-dist-linux-${GFX}-${ROCM_MAJOR_VER}\..*\.tar\.gz" \
  | sort -V | tail -n1)"; \
  echo "Latest tarball: ${KEY}"; \
  aria2c -x 16 -s 16 -j 16 --file-allocation=none "${BASE}/${KEY}" -o therock.tar.gz
RUN mkdir -p /opt/rocm-7.0 \
  && tar xzf therock.tar.gz -C /opt/rocm-7.0 --strip-components=1

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

WORKDIR /opt/llama.cpp
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git . \
  && git clean -xdf \
  && git submodule update --recursive

RUN cmake -S . -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release \
  -DGGML_RPC=ON \
  -DLLAMA_HIP_UMA=ON \
  && cmake --build build --config Release -- -j$(nproc) \
  && cmake --install build --config Release

# keep bin; drop headers/docs/static libs (retain llama.cpp for rpc binaries)
RUN find /opt/rocm-7.0 -type f -name '*.a' -delete \
  && rm -rf /opt/rocm-7.0/include /opt/rocm-7.0/share \
  /opt/rocm-7.0/llvm/include /opt/rocm-7.0/llvm/share

# runtime
FROM registry.fedoraproject.org/fedora-minimal:43

RUN microdnf -y --nodocs --setopt=install_weak_deps=0 install \
  bash ca-certificates libatomic libstdc++ libgcc radeontop vim procps-ng \
  && microdnf clean all && rm -rf /var/cache/dnf/*

COPY --from=builder /opt/rocm-7.0 /opt/rocm-7.0
COPY --from=builder /usr/local/ /usr/local/
COPY --from=builder /opt/llama.cpp/build/bin/rpc-* /usr/local/bin/

# COPY gguf-vram-estimator.py /usr/local/bin/
# RUN chmod +x /usr/local/bin/gguf-vram-estimator.py

ENV ROCM_PATH=/opt/rocm-7.0 \
  HIP_PLATFORM=amd \
  HIP_PATH=/opt/rocm-7.0 \
  HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin \
  HIP_INCLUDE_PATH=/opt/rocm-7.0/include \
  HIP_LIB_PATH=/opt/rocm-7.0/lib \
  HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode \
  PATH=/opt/rocm-7.0/bin:/opt/rocm-7.0/llvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
  LD_LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64:/opt/rocm-7.0/llvm/lib \
  LIBRARY_PATH=/opt/rocm-7.0/lib:/opt/rocm-7.0/lib64 \
  CPATH=/opt/rocm-7.0/include \
  PKG_CONFIG_PATH=/opt/rocm-7.0/lib/pkgconfig

RUN printf '%s\n' \
  'export ROCM_PATH=/opt/rocm-7.0' \
  'export HIP_PLATFORM=amd' \
  'export HIP_PATH=/opt/rocm-7.0' \
  'export HIP_CLANG_PATH=/opt/rocm-7.0/llvm/bin' \
  'export HIP_INCLUDE_PATH=/opt/rocm-7.0/include' \
  'export HIP_LIB_PATH=/opt/rocm-7.0/lib' \
  'export HIP_DEVICE_LIB_PATH=/opt/rocm-7.0/lib/llvm/amdgcn/bitcode' \
  'export PATH="$ROCM_PATH/bin:$HIP_CLANG_PATH:$PATH"' \
  'export LD_LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64:$ROCM_PATH/llvm/lib"' \
  'export LIBRARY_PATH="$HIP_LIB_PATH:$ROCM_PATH/lib:$ROCM_PATH/lib64"' \
  'export CPATH="$HIP_INCLUDE_PATH"' \
  'export PKG_CONFIG_PATH="$ROCM_PATH/lib/pkgconfig"' \
  > /etc/profile.d/rocm.sh \
  && chmod +x /etc/profile.d/rocm.sh \
  && echo 'source /etc/profile.d/rocm.sh' >> /etc/bashrc

# make /usr/local libs visible without touching env
RUN echo "/usr/local/lib"  > /etc/ld.so.conf.d/local.conf \
  && echo "/usr/local/lib64" >> /etc/ld.so.conf.d/local.conf \
  && ldconfig

CMD ["/bin/bash"]

The ROCm backend for llama.cpp provides approximately 30% better performance than Vulkan. This is because ROCm is AMD's native GPU computing platform, optimized for compute-heavy workloads like LLM inference.

Docker Compose Entry:

qwen-3-coder-rocm:
  image: llamacpp-rocm
  container_name: llamacpp
  restart: unless-stopped
  devices:
    - /dev/dri:/dev/dri
    - /dev/kfd:/dev/kfd
  group_add:
    - "video"
    - "render"
  volumes:
    - /home/mark/running-llms/:/root/running-llms
  ports:
    - "8080:8080"
  security_opt:
    - seccomp=unconfined
  command: >
    bash -c "llama-server --alias Qwen3-Coder-30B -m /root/running-llms/hf-models/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-BF16/BF16/Qwen3-Coder-30B-A3B-Instruct-1M-BF16-00001-of-00002.gguf --ctx-size 262144 -fa 1 --no-mmap --host 0.0.0.0 --port 8080 --temp 0.7 --top-k 20 --min-p 0.01 --top-p 0.8 --repeat-penalty 1.05 --jinja -ngl 99 --threads -1"

Running Models

Qwen3-Coder-Next (80B MoE)

This is where the Framework Desktop really shines. I can run the full UD-Q8_K_XL version of Qwen3-Coder-Next with the complete 256K context window.

Links:

The model specs:

  • Architecture: 80B MoE (3B active parameters)
  • Context Window: 262,144 tokens
  • Memory Required: ~93.4 GB for UD-Q8_K_XL (8-bit)
  • Recommended Settings: temp=1.0, top_p=0.95, top_k=40, min_p=0.01

Current Model Setup

My docker-compose.yml defines multiple services:

ServiceBackendModelContext
qwen-3-coder-next-rocmROCmQwen3-Coder-Next (UD-Q8_K_XL)262k
qwen-3-coder-next-vulkanVulkanQwen3-Coder-Next262k
qwen-3-next-rocmROCmQwen3-Next-80B-A3B-Thinking32k
gpt-oss-rocmROCmgpt-oss-120b-GGUF131k
glm-4.7VulkanGLM-4.716k

Launch Commands

Start Ollama service:

cd ollama-vulkan
docker compose up -d

Start llama.cpp for a specific model:

docker compose up -d qwen-3-coder-next-rocm

Stop all services:

docker compose down

Agentic Workflows and Claude Code

I've been running fully autonomous Claude Code sessions for many hours on this configuration. The Framework Desktop has written the majority of this article through these autonomous sessions.

Claude Code Configuration

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://your-framework-desktop-ip:11434

claude --model qwen3-coder-next

Note: Replace your-framework-desktop-ip with your actual domain or IP address.

Current Issues

I've encountered one issue with Ollama in agentic workflows - see Ollama issue #13939. Claude Code occasionally attempts to use model names that aren't available locally, resulting in timeouts.

Workaround: Using llama.cpp directly provides more reliable results for agentic workflows on my setup. The stability and consistency are noticeably better.

Why I'm Thrilled with This Setup

Infinite "Tokens" for Free

This configuration allows me to "burn" infinite LLM tokens 24x7 for free. While paid models are faster, this setup enables:

  • Endless experimentation with different models and prompts
  • Long-running autonomous agent sessions
  • No per-token costs to worry about
  • Complete data privacy and offline operation

Performance Characteristics

  • Slower than paid models: Yes, but the difference is acceptable for most tasks
  • Better for coding: The local models excel at code completion and understanding
  • Unlimited context: The 256K context window on Qwen3-Coder-Next is game-changing

The Future

This configuration has been incredibly productive. I'm now:

  • Running fully autonomous Claude Code sessions for hours
  • Experimenting with different quantizations and model architectures
  • Developing custom agent workflows that leverage the local GPU

I'll be adding an article soon about my Claude Code setup and how I configure it to work with these local models.

Attribution and Thanks

This setup would not have been possible without the incredible work of the open-source community. A huge thank you to:

These resources represent significant effort and represent the cutting edge of local LLM inference on AMD hardware.