# LXC VDI with GPU and Flatpak

This guide explains how to get GPU-accelerated applications and full desktop remoting working with unprivileged containers. It further explains how to get nested containers to run. This not only includes the likes of docker, podman, or kubernetes, but also flatpaks, as these too (by their use of namespaces and cgroups) are containers.

---

## GPU Acceleration

The first and easiest hurdle is GPU acceleration, and the UI can be used for it. Since PVE 9 one simply has to add `/dev/dri/cardX` and `/dev/dri/renderD12Y`, with X and Y corresponding to the GPU to be used. On single-GPU systems this is usually card0 and renderD128; on multi-GPU systems (or on special hardware like the Raspi) it can also be card1 or renderD127. The Raspi, for example, splits the GPU into card0 and card1 in addition to renderD128.

The dialogue now has the option to supply the container with a user and group id (uid and gid). This option exists to make the (unprivileged) container aware of how to map the host's uid/gid for the given device node onto a corresponding id inside the container. This is vital for direct rendering interface (dri) devices, as these can only be accessed by root or by applications (or users) in the `video` or `render` groups. It also means that the uid for the `card0` and `renderD128` device nodes can be ignored here, as it is usually just 0 for root. The gid, however, is vital. For this to work one needs to know the gid of `video` and `render` inside the container, which is easy to find: just look into `/etc/group`.

### Hardware Specific Tips

* For AMD cards one can also add `/dev/kfd`, as this is required for ROCm.
* For Raspis, if the `v4l2video` module is loaded and therefore `/dev/video{10..30}` exists, these can also be passed through.
* The `/dev/video{10..30}` device nodes (one suffices) can, for example, be used for video transcoding in Jellyfin.
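On the host side such a passthrough ends up as `dev[n]` entries in the container config. A minimal sketch of what `/etc/pve/lxc/<CTID>.conf` might look like afterwards; the gids are assumptions (44 for `video` and 993 for `render` are common Debian defaults; check with `getent group video render` inside the container):

```text
dev0: /dev/dri/card0
dev1: /dev/dri/renderD128,gid=993
```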
  This is required there because modern Jellyfin versions no longer implement the Raspi-specific `mmal` en/decoder and instead use the V4L2 implementation on the platform-agnostic `/dev/videoX` interfaces.

---

## GPU Accelerated Graphical Remoting

This can be accomplished very easily via `waypipe`: simply install it on both the container and the device meant to connect from, and you are golden. Invocation looks something like this:

```bash
# Start waypipe connection (user and host are placeholders)
waypipe -c none ssh <user>@<host>
```

This basically produces an SSH session and behaves like `ssh -X` or `ssh -Y` did in Xorg/X11 times, only that with wayland this is now naturally GPU accelerated, without hacky GLX indirection tricks like VirtualGL. To start a desktop session from there one simply runs something like:

```bash
# Start a wayland session
dbus-run-session -- startplasma-wayland
```

NOTE: This will have neither a shared clipboard nor audio. Both can be added manually though, via the `wl-clipboard` package on both ends as well as networked audio sinks via pipewire. Clipboard sharing can then be done via something like: `wl-paste | ssh <user>@<host> wl-copy`.

---

## Nested cgroups, podman and docker

One would think that simply providing the `nesting` feature flag is enough for this to work, as according to the docs it "[...] expose(s) procfs and sysfs contents of the host to the guest". A look into `/var/lib/lxc/<CTID>/config` however shows the following relevant line:

```text
lxc.mount.auto = sys:mixed
```

When actually trying to run a container like a flatpak, the error will say something about not being able to write to `/proc/sys/user/max_cgroup_namespaces`. The solution therefore is to add the following line to `/etc/pve/lxc/<CTID>.conf`:

```text
lxc.mount.auto: sys:mixed proc:rw
```

This will (upon generation of the `config` file under `/var/lib/lxc/<CTID>/` at container start) modify the line accordingly. Now nested cgroups will work.
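Whether the change took effect can be verified from inside the container; a quick sketch (the exact limit values are distribution dependent):

```shell
# Inside the container: flatpak and podman need these namespace limits to be
# readable and non-zero; a value of 0 means nested namespaces are disabled.
cat /proc/sys/user/max_user_namespaces
cat /proc/sys/user/max_cgroup_namespaces
```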
---

## Docker, Podman in rootless mode and kubernetes

If one wants to run docker then, in accordance with the docs, the `keyctl` feature flag needs to be supplied as well, as docker is special insofar as it needs to be able to use this syscall. Under the hood this feature flag adds the following line to `/var/lib/lxc/<CTID>/rules.seccomp`:

```text
keyctl errno 38
```

Podman does not need this feature flag. Running podman in rootless mode however will throw the following error unless additional changes are made:

```text
ERRO[0000] running `/usr/bin/newuidmap 1068 0 1000 1 1 100000 65536`: newuidmap: write to uid_map failed: Operation not permitted
Error: cannot set up namespace using "/usr/bin/newuidmap": exit status 1
```

The reason for this error is that the default subuid and subgid maps configured inside the container under `/etc/sub{g,u}id` cannot be applied, because the host side only assigns 65536 ids to the container.

### Fixing ID Mapping

**Variant 1: Host side extension**

The assigned id range on the host side is extended by the default offset inside the guest (by another 100,000). Change the content of `/etc/sub{u,g}id` from `root:100000:65536` to `root:100000:165536` and propagate this to `/etc/pve/lxc/<CTID>.conf`:

```text
lxc.idmap: u 0 100000 165536
lxc.idmap: g 0 100000 165536
```

**Variant 2: Container side change**

Change `/etc/sub{g,u}id` on the container side so as not to exceed the allotted 65536 ids, by changing both the offset and the range to something within those 65536 ids (e.g. `:20000:10000`).

### Network and Kubernetes

If a tun network interface error appears, add this to `/etc/pve/lxc/<CTID>.conf`:

```text
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net dev/net none bind,create=dir
```

Kubernetes (or at least the k3s distribution of it) has the special caveat of writing log output to `/dev/kmsg`, which does not exist in a container.
This however is trivial to fix by creating a symlink at that path pointing to `/dev/console` via `ln -s /dev/console /dev/kmsg`.

---

## Proxying Mountpoints from the Host

Mounts can easily be proxied from the host by mounting with `-o uid=101000` on the host, so that the owner matches the id-mapped first unprivileged user (uid 1000 on Debian) inside the container. Add the following line to `/etc/pve/lxc/<CTID>.conf`:

```text
mp0: <host-path>,mp=<container-path>
```

---

## Use cases

* Using an AMD-based Mini-PC's full GPU power for tasks like video editing, panorama stitching, etc.
* Running LocalAI with the Vulkan backend.
* Playing games.
* Running a Kubernetes node or Docker in an unprivileged container.
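The 101000 in the mountpoint section above is nothing magic, just the container's id-map base plus the in-container uid. A minimal sketch of the arithmetic, assuming the PVE default base of 100000 from `/etc/subuid`:

```shell
# Host-side uid of a container uid in an unprivileged container:
#   host_uid = idmap_base + container_uid
idmap_base=100000     # PVE default: first host uid assigned to container root
container_uid=1000    # first regular user inside a Debian container
echo $((idmap_base + container_uid))   # prints 101000
```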