
AMD Radeon RX 7600 (Navi 33)
PyTorch Setup on Ubuntu 25

A guide to Native Docker, ROCm (HIP), and GFX Masquerading.

This guide documents the successful setup of a local AI environment on Ubuntu 25 using Native Docker (not Docker Desktop) to run PyTorch with ROCm (HIP) acceleration on an RX 7600.

Note: The RX 7600 (gfx1102) is not officially supported by the PyTorch ROCm binaries, which target gfx1100 (RX 7900). This setup works around that by setting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the Docker image, so the ROCm runtime treats the card as gfx1100.

1. Prerequisites & Host Setup

A. Driver Verification (Ubuntu 25)

Ubuntu 25's kernel (6.10+) includes the necessary drivers by default. Verify your kernel has loaded the compute drivers:

ls -l /dev/kfd /dev/dri

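Success: You see /dev/kfd and a render node such as /dev/dri/renderD128 (the number can differ if the system has more than one GPU).
Troubleshooting: If /dev/kfd is missing, run sudo modprobe amdgpu. On current kernels the KFD compute driver is part of the amdgpu module rather than a separate amdkfd module.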
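
The same check can be scripted. The following is a minimal sketch rather than part of the original setup; the function name find_compute_nodes and its dev_root parameter are illustrative assumptions. It reports whether the kfd node exists and lists any renderD* nodes it finds.

```python
import glob
import os


def find_compute_nodes(dev_root="/dev"):
    """Return (kfd_present, render_nodes) for the given device directory."""
    kfd_present = os.path.exists(os.path.join(dev_root, "kfd"))
    # Render nodes are named renderD128, renderD129, ... under dri/
    render_nodes = sorted(glob.glob(os.path.join(dev_root, "dri", "renderD*")))
    return kfd_present, render_nodes


if __name__ == "__main__":
    kfd, nodes = find_compute_nodes()
    print(f"/dev/kfd present: {kfd}")
    print(f"render nodes: {nodes or 'none found'}")
```

Passing a different dev_root makes the function easy to try against a scratch directory before running it on the real /dev.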

B. User Permissions

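Your user must belong to the render, video, and docker groups. The docker group is created when Docker is installed, so if the command below reports that the group does not exist, complete step C first and run it again.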

sudo usermod -aG docker,video,render $USER

Log out and log back in (or reboot) to apply changes.
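
To confirm that the new groups are active in the current session, a small check along these lines may help. It is a sketch under the assumption that the three group names above are the ones required; the helper names missing_groups and current_group_names are hypothetical.

```python
import grp
import os


def missing_groups(required, current):
    """Return the required group names that are absent from `current`."""
    return sorted(set(required) - set(current))


def current_group_names():
    """Names of the groups the running process actually belongs to."""
    names = []
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A GID with no entry in the group database; skip it.
            pass
    return names


if __name__ == "__main__":
    missing = missing_groups(["docker", "video", "render"], current_group_names())
    print("All groups active" if not missing else f"Missing groups: {missing}")
```

Because os.getgroups() reflects the running session rather than /etc/group, a group added with usermod keeps showing up as missing here until you have logged out and back in.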

C. Docker Setup (Native)

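Uninstall Docker Desktop if present and install Native Docker. Docker Desktop runs containers inside a virtual machine, so host devices such as /dev/kfd and /dev/dri cannot be passed through to them.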

# Remove Docker Desktop
sudo apt-get remove docker-desktop
rm -r $HOME/.docker/desktop
mv ~/.docker ~/.docker_backup

# Install Native Engine
sudo apt update
sudo apt install docker.io docker-compose-v2 docker-buildx

2. Project Configuration

Create a workspace:

mkdir -p ~/rocm-lab/workspace
cd ~/rocm-lab

File 1: Dockerfile

This image extends the official ROCm PyTorch image and adds the gfx1100 override.

# Base image: Official ROCm 6.0 with PyTorch 2.1.1
FROM rocm/pytorch:rocm6.0_ubuntu22.04_py3.9_pytorch_2.1.1

# --- CRITICAL OVERRIDE ---
# Masquerade RX 7600 (gfx1102) as RX 7900 (gfx1100)
ENV HSA_OVERRIDE_GFX_VERSION=11.0.0

# Install Python dependencies
RUN pip install --upgrade pip && \
    pip install transformers accelerate streamlit sentence-transformers pandas scikit-learn jupyterlab

# --- BITSANDBYTES FIX ---
# Required for loading 4-bit/8-bit LLMs on ROCm
RUN pip install https://github.com/ROCm/bitsandbytes/releases/download/rocm_enabled_beta_0.61.0/bitsandbytes-0.61.0+rocm6.0-py3-none-any.whl

WORKDIR /app
EXPOSE 8888 8501
CMD ["tail", "-f", "/dev/null"]
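
The value 11.0.0 in the override follows from the target name. As a rough illustration, and assuming the usual convention that a gfx name encodes major version, minor version, and stepping (with the last two characters read as single hexadecimal digits), the conversion can be sketched as below. The function name gfx_to_override is hypothetical.

```python
def gfx_to_override(target):
    """Convert a target name such as 'gfx1102' to 'major.minor.stepping'."""
    if not target.startswith("gfx"):
        raise ValueError(f"not a gfx target: {target!r}")
    digits = target[3:]
    if len(digits) < 3:
        raise ValueError(f"target name too short: {target!r}")
    major = int(digits[:-2])         # everything before the last two characters
    minor = int(digits[-2], 16)      # second-to-last character
    stepping = int(digits[-1], 16)   # last character
    return f"{major}.{minor}.{stepping}"


if __name__ == "__main__":
    print(gfx_to_override("gfx1102"))  # the RX 7600's real target: 11.0.2
    print(gfx_to_override("gfx1100"))  # the target it masquerades as: 11.0.0
```

The RX 7600 would report itself as 11.0.2, so setting the override to 11.0.0 is what makes the runtime load the gfx1100 kernels shipped in the PyTorch binaries.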

File 2: docker-compose.yml

Handles hardware passthrough and security privileges.

services:
  lab:
    build: .
    container_name: rocm-rx7600
    
    # --- HARDWARE ACCESS ---
    privileged: true             
    security_opt:
      - seccomp:unconfined       
    cap_add:
      - SYS_PTRACE               
    
    # Device Mapping
    devices:
      - "/dev/kfd:/dev/kfd"      
      - "/dev/dri:/dev/dri"      
    
    # Group Passthrough
    group_add:
      - video
      - render
    
    ipc: host
    
    volumes:
      - ./workspace:/app
    
    ports:
      - "8888:8888"
      - "8501:8501"
    
    command: tail -f /dev/null
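
One caveat with group_add: as far as I know, Docker resolves group names against the container's own /etc/group, so a name such as render may be absent in the image or map to a different GID than on the host. A common workaround is to list the host's numeric GIDs instead. The sketch below prints them in compose list form; the helper names are hypothetical, and it reads only /etc/group, so groups supplied by a directory service would not appear.

```python
def parse_group_file(text):
    """Map group name -> numeric GID from /etc/group-formatted text."""
    gids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Format is name:password:GID:member-list
        if len(fields) >= 3 and fields[2].isdigit():
            gids[fields[0]] = int(fields[2])
    return gids


def group_add_entries(text, names=("video", "render")):
    """Return numeric GID strings suitable for a compose group_add list."""
    gids = parse_group_file(text)
    return [str(gids[name]) for name in names if name in gids]


if __name__ == "__main__":
    with open("/etc/group") as fh:
        for gid in group_add_entries(fh.read()):
            print(f'      - "{gid}"')
```

Run it on the host and paste the printed lines under group_add in place of the group names if the container fails to start with a group lookup error.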

3. Launching and Verification

Start the Lab:

docker compose up -d --build
docker exec -it rocm-rx7600 bash

Verify Hardware Visibility:

ls -l /dev/kfd
rocm-smi

The Final Test (Python):

python3 -c "import torch; print(f'Device: {torch.cuda.get_device_name(0)}'); x = torch.randn(1024, 1024).cuda(); print(x @ x)"

Success: The command prints the device name followed by a tensor of numbers, with no errors.
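
For a slightly more informative check, the one-liner can be expanded into a small script. This is a sketch, not a definitive diagnostic; the function name check_gpu and its status strings are illustrative. It reports a missing PyTorch install or an invisible GPU instead of raising, and synchronizes after the matrix multiply so that any kernel failure surfaces at that point.

```python
import os


def check_gpu():
    """Return a short status string describing what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "torch_missing"
    if not torch.cuda.is_available():
        return "no_gpu"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip = getattr(torch.version, "hip", None)
    name = torch.cuda.get_device_name(0)
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x
    torch.cuda.synchronize()  # wait for the kernel so errors surface here
    return f"ok: {name} (HIP {hip}), result shape {tuple(y.shape)}"


if __name__ == "__main__":
    print("HSA_OVERRIDE_GFX_VERSION =", os.environ.get("HSA_OVERRIDE_GFX_VERSION"))
    print(check_gpu())
```

Inside the container, the first line should show 11.0.0; if it prints None, the ENV line from the Dockerfile did not make it into the running image.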

4. Troubleshooting

Issue: GPU: False / No HIP GPUs

Cause: Container started before permissions applied or host /dev/kfd blocked.

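Fix: Run sudo usermod -aG render,video $USER, then log out and back in so the new groups take effect. Ensure privileged: true is in the compose file, then recreate the container with docker compose down followed by docker compose up -d.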

Issue: ls: cannot access '/dev/kfd' (Inside container)

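Fix: Confirm that /dev/kfd exists on the host, and that your compose file contains the devices entries for /dev/kfd and /dev/dri as well as security_opt: seccomp:unconfined. Recreate the container after any change.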
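
The two issues above have different causes: a device node that is missing entirely versus one that exists but cannot be opened. A short sketch like the following, run inside the container or on the host, can tell them apart. The function name device_status and its return values are hypothetical.

```python
import os


def device_status(path):
    """Classify a device node as 'missing', 'no_access', or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    # ROCm needs to open the node for both reading and writing.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no_access"
    return "ok"


if __name__ == "__main__":
    for node in ("/dev/kfd", "/dev/dri"):
        print(f"{node}: {device_status(node)}")
```

A result of missing points at the device mapping or the host driver, while no_access points at group membership or permissions.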

Issue: jupyter: command not found

Fix: Rebuild the image with docker compose build --no-cache, then recreate the container with docker compose up -d.
