Coral TPU in a docker container inside incus (container Ubuntu Noble)

Hello,

I am trying to use a Coral USB TPU in a Docker container. The Incus config has the following defined:

config:
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  security.syscalls.intercept.sysinfo: "true"
devices:
  coral:
    gid: "46"
    productid: "9302"
    type: usb
    vendorid: 18d1
  coral1:
    gid: "46"
    productid: 089a
    type: usb
    vendorid: 1a6e
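
For reference, the same config can be applied from the CLI (the container name `mycontainer` is an assumption here):

```shell
# Enable nesting and syscall interception (hypothetical container name)
incus config set mycontainer security.nesting=true
incus config set mycontainer security.syscalls.intercept.mknod=true
incus config set mycontainer security.syscalls.intercept.setxattr=true
incus config set mycontainer security.syscalls.intercept.sysinfo=true

# Pass through both USB IDs the Coral can appear as
incus config device add mycontainer coral usb vendorid=18d1 productid=9302 gid=46
incus config device add mycontainer coral1 usb vendorid=1a6e productid=089a gid=46
```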

With this I can see the devices in the Incus container.

# lsusb 
Bus 004 Device 004: ID 1a6e:089a Global Unichip Corp. 
# ls -la /dev/bus/usb/004/002 
crw-rw-rw- 1 root plugdev 189, 385 Nov  5 10:15 /dev/bus/usb/004/002

Next, I tried to use this device inside a Docker container (running inside Incus) following the instructions here. Unfortunately, the device is not usable inside the container.

# docker run -ti --rm --entrypoint=/bin/bash --privileged \
    --mount type=bind,source=/dev,target=/dev \
    ghcr.io/blakeblackshear/frigate:stable
# cd ~ \
  && apt-get -y update \
  && apt-get -y install curl \
  && mkdir test_data \
  && cd test_data \
  && curl -LO https://coral.ai/static/docs/images/parrot.jpg \
  && curl -LO https://raw.githubusercontent.com/google-coral/test_data/master/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
  && curl -LO https://raw.githubusercontent.com/google-coral/test_data/master/inat_bird_labels.txt \
  && cd .. \
  && curl -LO https://raw.githubusercontent.com/google-coral/pycoral/master/examples/classify_image.py \
  && python3 classify_image.py \
       --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
       --labels test_data/inat_bird_labels.txt \
       --input test_data/parrot.jpg

Traceback (most recent call last):
  File "/root/classify_image.py", line 121, in <module>
    main()
  File "/root/classify_image.py", line 71, in main
    interpreter = make_interpreter(*args.model.split('@'))
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 87, in make_interpreter
    delegates = [load_edgetpu_delegate({'device': device} if device else {})]
  File "/usr/lib/python3/dist-packages/pycoral/utils/edgetpu.py", line 52, in load_edgetpu_delegate
    return tflite.load_delegate(_EDGETPU_SHARED_LIB, options or {})
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1

This used to work with LXD, but I am not able to make it work now. Any idea why this doesn't work?

Thanks

This seems to be happening because the Google TPU changes its USB ID on first use, so the device does not get picked up by Incus. Is it possible to pass the whole /dev/bus/usb into the Incus container rather than a particular USB device?
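
One commonly suggested workaround for exposing the whole USB tree is to bind-mount /dev/bus/usb with a disk device (a sketch; the container name `mycontainer` is assumed, and this is much coarser-grained than a per-device usb entry):

```shell
# Bind the full USB device tree into the container (assumed container name)
incus config device add mycontainer usbbus disk source=/dev/bus/usb path=/dev/bus/usb
```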

Ideally there should be udev rules on the host to shift the device into the final mode, then take it over in the container.

This is common with those USB 3G/4G sticks that first appear as CD-ROM devices, then switch into modem devices with udev magic.

I think the (GitHub) issue has such guidance.


Right. This script seems to do that for Proxmox. I wonder if there is a hook script for an Incus container?

I am trying this on the host:

# cat /etc/systemd/system/coral.service
[Unit]
Description=Load coral firmware
DefaultDependencies=no
Before=incus.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/dfu-util -D /home/user/apex_latest_single_ep.bin -d 1a6e:089a -R
StandardInput=tty-force

[Install]
WantedBy=incus.service
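
Once the unit file is in place, it would be enabled like any other oneshot service (a sketch):

```shell
# Reload systemd so it sees the new unit, then enable it for the next boot
systemctl daemon-reload
systemctl enable coral.service

# One-off test run without rebooting
systemctl start coral.service
systemctl status coral.service
```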

That looks OK.

What is happening is that your host is configured to load the Coral firmware through a systemd service.
The command is the following; you need to set it up on your host, test it, and then make it a systemd service.

/usr/bin/dfu-util -D /home/user/apex_latest_single_ep.bin -d 1a6e:089a -R

That is,

  1. you have set up your container with the correct final IDs for the device. You stop the container if it is running.
  2. you then connect the device on the host. The device has the wrong, pre-init, IDs.
  3. you run the above command, and verify that the IDs have changed to the ones configured in the container.
  4. you can now start the container and verify that through this manual process, the container works.
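
The manual steps above might look like this on the host (the container name `frigate` and the firmware path are assumptions carried over from earlier in the thread):

```shell
incus stop frigate    # step 1: container already configured with the final IDs
lsusb | grep 1a6e     # step 2: device shows the pre-init ID 1a6e:089a

# step 3: load the firmware, then verify the ID changed to 18d1:9302
dfu-util -D /home/user/apex_latest_single_ep.bin -d 1a6e:089a -R
lsusb | grep 18d1

incus start frigate   # step 4: start the container and test the TPU
```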

Once it works manually, you can perform the subsequent steps to automate the process.

The systemd service is configured in such a way that

  1. the device is always connected to your host.
  2. when the Incus service is started on bootup, systemd will run that firmware-loading command to configure the device.
  3. you are ready to use the device in the container.

Personally I would opt for a udev solution (autoload the firmware upon connecting the device to the host). But first do the above steps so that you get a baseline setup.

stgraber@castiana:~$ incus config show s-shf-cluster:frigate01 --project stgraber
architecture: x86_64
config:
  boot.autorestart: "true"
  cluster.evacuate: stop
  environment.FRIGATE_RTSP_PASSWORD: 2378c8aa-1f3c-4d20-94c2-86b067c0e17a
  environment.HOME: /root
  environment.LIBVA_DRIVER_NAME: radeonsi
  environment.NVIDIA_DRIVER_CAPABILITIES: compute,video,utility
  environment.NVIDIA_VISIBLE_DEVICES: all
  environment.PATH: /usr/lib/btbn-ffmpeg/bin:/usr/local/go2rtc/bin:/usr/local/nginx/sbin:/usr/local/tempio/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  environment.PLUS_API_KEY: ad34fa39-f958-4ede-822b-8c5da0f241cf:bafc658170ac1f11768c0c6ca11a0113cf7b0a42
  environment.S6_LOGGING_SCRIPT: T 1 n0 s10000000 T
  environment.TERM: xterm
  environment.TZ: America/Toronto
  image.architecture: x86_64
  image.description: ghcr.io/blakeblackshear/frigate (OCI)
  image.type: oci
  limits.cpu: "48"
  limits.cpu.allowance: 800ms/100ms
  limits.memory: 8GiB
  limits.processes: "50000"
  linux.kernel_modules: amdgpu
  security.syscalls.intercept.sysinfo: "true"
  volatile.base_image: 22e3d0b486df52c3d669682254c2b1bf4205fa6ad8bd8f8c9f7fe76b1517005d
  volatile.cloud-init.instance-id: b22d372d-21c7-4476-b031-7ff53c7b73b3
  volatile.container.oci: "true"
  volatile.cpu.nodes: "1"
  volatile.eth0.host_name: veth40ecc482
  volatile.eth0.hwaddr: 00:16:3e:db:90:40
  volatile.eth0.last_state.ip_addresses: 10.1.154.6,2602:fc62:b:8005:216:3eff:fedb:9040
  volatile.idmap.base: "1720896"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1720896,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1720896,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 3766623e-d835-4314-8cdb-647de53224f5
  volatile.uuid.generation: 3766623e-d835-4314-8cdb-647de53224f5
devices:
  coral-early:
    productid: 089a
    required: "false"
    type: usb
    vendorid: 1a6e
  coral-late:
    productid: "9302"
    required: "false"
    type: usb
    vendorid: 18d1
  eth0:
    ipv6.routes.external: 2602:fc62:b:10::7/128
    name: eth0
    network: default
    security.acls: frigate
    type: nic
  frigate-config:
    path: /config
    pool: ssd
    source: frigate-config/config/
    type: disk
  frigate-storage:
    path: /media/frigate
    pool: hdd
    source: frigate-storage
    type: disk
  frigate-templates:
    path: /usr/local/nginx/templates/
    pool: ssd
    source: frigate-config/templates/
    type: disk
  gpu:
    pci: "0000:03:00.0"
    type: gpu
ephemeral: false
profiles:
- default
stateful: false
description: ""

The relevant part here is:

  coral-early:
    productid: 089a
    required: "false"
    type: usb
    vendorid: 1a6e
  coral-late:
    productid: "9302"
    required: "false"
    type: usb
    vendorid: 18d1

Which mostly matches yours. Here it’s been working quite reliably for me even with frequent server reboots and the like.

Something like this?

SUBSYSTEMS=="usb", ATTRS{idVendor}=="1a6e", ATTRS{idProduct}=="089a", OWNER="root", SYMLINK+="coral", MODE="0666", GROUP="uucp", RUN+="/usr/bin/dfu-util -D /home/user/apex_latest_single_ep.bin -d 1a6e:089a -R"
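
To try a rule like that, it would typically be saved under /etc/udev/rules.d/ and then reloaded (the file name here is an assumption):

```shell
# Install the rule, reload udev, and replay events for already-connected USB devices
cp 99-coral.rules /etc/udev/rules.d/
udevadm control --reload-rules
udevadm trigger --subsystem-match=usb
```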

@stgraber Coral stopped working for me on two servers. In both cases the host is Arch Linux and the container is Ubuntu running Docker.

Yep, something like that.