Incus: are we macOS yet?

So, as you may have noticed from the changelogs, a bit of work was done to give low-level enthusiasts more control over their VMs by allowing them to interact with the QEMU monitor. I am the author of the feature, and honestly, probably one of its few users. Still, I think it’s pretty neat if you have very specific needs. Mine is being able to run macOS on Incus while still having Incus manage both storage and network devices.

Disclaimer: read and understand your Apple software license before joining the journey; chances are you don’t have the right to do it. This post assumes you have the right to do it.

Disclaimer 2: this is absolutely not a guide; rather a call for suggestions.

Context

SOTA

Running macOS on QEMU is not new at all; it’s actually pretty easy if your hypervisor lets you do whatever you want with devices. Incus works a bit differently, as the project doesn’t intend to support non-VirtIO virtual devices. Many people think macOS doesn’t ship VirtIO device drivers, but that’s simply not true; what macOS doesn’t support is PCI hotplug. And Incus sure loves its PCI hotplug!

Incus and QEMU

Virtual devices in QEMU are made of two distinct parts: a frontend device, which the guest OS sees, and a backend device, directly linked to the host. Thanks to some pretty dirty QMP trickery, we can actually hot-remap a backend disk (a blockdev) to an already-defined frontend disk (in our case, a good old non-hotpluggable virtual SATA drive). For network devices, however, the frontend devices seem more tightly coupled to the backend ones; this is where I’m all ears for good ideas, as mine are running pretty dry…

Once network devices are realized, there is no way to link them to another netdev, and the QMP API is very lacking when it comes to network device introspection… This means that we have to hotplug them, but as I said, macOS doesn’t do PCI hotplug. What bus can we hotplug them to, then? A USB bus! Unfortunately, it doesn’t work 82% of the time; more on that later.

Let’s get dirty!

QEMU command-line

I tried not to directly touch QEMU’s command-line, but it was unfortunately impossible:

raw.qemu: -blockdev node-name=devzero,driver=raw,file.driver=host_device,file.filename=/dev/zero

This line is needed to set up a temporary block device pointing to /dev/zero. This block device is only used as a placeholder by the cold-plugged SATA drives until the hotplugged disks are remapped onto them.

QEMU configuration file

I got pretty liberal when configuring QEMU, mostly because everything is still at a very early testing stage.

raw.qemu.conf: |-
  [device "qemu_keyboard"]
  [device "qemu_tablet"]
  [chardev "qemu_spice-usb-chardev1"]
  [device "qemu_spice-usb1"]
  [chardev "qemu_spice-usb-chardev2"]
  [device "qemu_spice-usb2"]
  [chardev "qemu_spice-usb-chardev3"]
  [device "qemu_spice-usb3"]
  [device "qemu_gpu"]

  [device "usb"]
  driver = "qemu-xhci"

  [device "usb_keyboard"]
  driver = "usb-kbd"
  bus = "usb.0"
  port = "1"

  [device "usb_tablet"]
  driver = "usb-tablet"
  bus = "usb.0"
  port = "2"

  [device "qemu_sata"]
  driver = "ich9-ahci"

  [device "qemu_vga"]
  driver = "VGA"

  [device "apple_smc"]
  driver = "isa-applesmc"
  osk = "<REPLACE THIS WITH APPLE OSK>"

  [device "sata0"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata1"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata2"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata3"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata4"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata5"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata6"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

  [device "sata7"]
  driver = "ide-hd"
  drive = "devzero"
  share-rw = "on"

This basically removes the devices that macOS doesn’t know how to handle and creates 8 SATA disks. It also sets up an xHCI controller plugged directly into the PCI root.

QMP scriptlet

This is where the fun begins :)

What I basically want to do is:

  • Remap all my disks to dummy SATA drives;
  • Remap all my NICs to usb-net devices.

What currently works

Storage remapping

raw.qemu.scriptlet: |-
  def remap_storage(dev, drive):
    """
    Remap a storage device onto a SATA port
    :param dev: The dictionary representing the original device
    :param drive: The SATA drive
    """
    # Get data from the device
    qdev = dev["qdev"]
    inserted = dev["inserted"]
    name = inserted["node-name"]
    driver = inserted["drv"]
    path = inserted["file"]
    fdset = path.split('/')[-1]

    log_info("[MacOS subsystem] Remapping disk {} [fdset{}] to {}".format(name, fdset, drive))

    # Add a blockdev with the same FDset
    run_qmp({"execute": "blockdev-add",
             "arguments": {"driver": driver,
                           "node-name": "fdset{}".format(fdset),
                           "filename": path}})

    # Attach this blockdev to the SATA drive
    qom_set(path="/machine/peripheral/{}".format(drive), property="drive", value="fdset{}".format(fdset))

    # Unplug the original device
    if qdev.endswith("/virtio-backend"):
      qdev = qdev[:-15]
    log_info("[MacOS subsystem] Unplugging {}".format(qdev))
    device_del(id=qdev)


  def qemu_hook(stage):
    if stage == "pre-start":
      log_info("[MacOS subsystem] Started")

      # Initialize device IDs
      sata_id = 0

      # For each block device
      for dev in run_command("query-block"):
        # If the device is a non-CD-ROM Incus disk
        if dev["inserted"]["node-name"].startswith("incus_") and "tray_open" not in dev:
          # Remap it
          remap_storage(dev, "sata{}".format(sata_id))
          sata_id += 1

      log_info("[MacOS subsystem] Done")

Plenty of code for something not that hard to phrase: find out which file descriptors each block device has open, create clone block devices on the same file descriptors, attach them to our SATA drives, and delete the original devices.
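The two string manipulations the scriptlet relies on can be checked in isolation. This is a minimal sketch in plain Python rather than Starlark; the helper names are mine, and the sample values (`/dev/fdset/3` and the `dev-incus_root` QOM path) are hypothetical, chosen only to illustrate the shapes the scriptlet assumes.

```python
# Suffix QEMU appends to the QOM path of a VirtIO device's backend half.
VIRTIO_BACKEND_SUFFIX = "/virtio-backend"


def fdset_id(path):
    """Return the last path component, i.e. N for a path like /dev/fdset/N."""
    return path.split("/")[-1]


def frontend_path(qdev):
    """Strip a trailing /virtio-backend so the path names the frontend device."""
    if qdev.endswith(VIRTIO_BACKEND_SUFFIX):
        return qdev[:-len(VIRTIO_BACKEND_SUFFIX)]
    return qdev


# Hypothetical sample values, not taken from a real VM.
print(fdset_id("/dev/fdset/3"))
print(frontend_path("/machine/peripheral/dev-incus_root/virtio-backend"))
```

Computing the slice length from the suffix itself avoids the hard-coded 15 used in the scriptlet, which is easy to get wrong if the suffix ever changes.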

I’m actually surprised it works; our disks get remapped and can be seen and handled by macOS!

Kext for usb-net macOS support

macOS doesn’t know which driver to use for usb-net devices by default. I’ve created a very simple codeless Kext which I’ll release once I have tested it on more versions of macOS.

Basically, we need to tell macOS that for idVendor=1317, idProduct=42146 and bcdDevice=0, it should use the appropriate driver (on my machine™, AppleUSBCDCCompositeDevice, in com.apple.driver.usb.cdc).
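For readers who want to picture what such a codeless Kext matches on, here is a hedged sketch that builds a matching dictionary and serializes it as a plist fragment. The three numeric IDs, the driver class and the bundle identifier come from the paragraph above; the personality name `QEMUUsbNet` and the `IOProviderClass` value are assumptions of mine, not the contents of the actual Kext.

```python
import plistlib

# Decimal IDs from the post; USB tooling usually prints them in hex.
ID_VENDOR = 1317    # 0x0525
ID_PRODUCT = 42146  # 0xa4a2
BCD_DEVICE = 0

# Hypothetical personality: the layout follows the usual IOKit matching
# dictionary, but the personality name and IOProviderClass are assumptions.
personality = {
    "QEMUUsbNet": {
        "CFBundleIdentifier": "com.apple.driver.usb.cdc",
        "IOClass": "AppleUSBCDCCompositeDevice",
        "IOProviderClass": "IOUSBHostDevice",
        "idVendor": ID_VENDOR,
        "idProduct": ID_PRODUCT,
        "bcdDevice": BCD_DEVICE,
    }
}

# Serialize to the XML plist format used by Info.plist files.
info_plist_fragment = plistlib.dumps({"IOKitPersonalities": personality}).decode()

print(hex(ID_VENDOR), hex(ID_PRODUCT))
print(info_plist_fragment)
```

The hex forms (0x0525 and 0xa4a2) are handy when cross-checking against what the guest reports for the emulated device.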

What doesn’t work

… or at least doesn’t 82% of the time:

  def hmp(command):
    """
    Run an HMP command
    :param command: The command to execute
    """
    return run_qmp({"execute": "human-monitor-command",
                    "arguments": {"command-line": command}})["return"].strip().split("\r\n")


  def remap_network(netdev, dev_name, net_id, fds):
    """
    Remap a network device onto a USB card
    :param netdev: The original netdev name
    :param dev_name: The original device name
    :param net_id: The USB card number
    :param fds: The TAP FDs
    """
    # Get data from the device
    mac = qom_get(path="/machine/peripheral/{}".format(dev_name), property="mac")
    name = "net{}".format(net_id)

    log_info("[MacOS subsystem] Remapping NIC {} [{}] to {}".format(netdev, mac, name))

    if not mac.startswith("40:"):
      # QEMU replaces the first byte of usb-net devices with 0x40, for some reason.
      # We must therefore restrict ourselves to MAC addresses starting with 40:.
      log_error("Network device MAC address must start with 40:. Got {}.".format(mac))
      run_command("<CRITICAL ERROR, CHECK YOUR LOGS>")

    # Add a netdev with the same FDs
    run_qmp({"execute": "netdev_add",
             "arguments": {"type": "tap",
                           "id": name,
                           "fds": ":".join(fds)}})

    # Attach this netdev to a new USB card
    run_qmp({"execute": "device_add",
             "arguments": {"driver": "usb-net",
                           "id": name,
                           "netdev": name,
                           "mac": mac,
                           "bus": "usb.0",
                           "port": net_id + 3,
                          }})

    # Unplug the original device
    run_command("set_link", name=dev_name, up=False)
    log_info("[MacOS subsystem] Unplugging {}".format(dev_name))
    device_del(id=dev_name)


  def qemu_hook(stage):
    ...
      ...
      net_id = 0
      ...
      # Scan the network FDs
      fds = {}
      for line in hmp("info network"):
        if line.startswith(" \\ "):
          netdev = line.split(":")[0][3:]
          if netdev not in fds:
            fds[netdev] = []
          if "fd=" in line:
            fds[netdev].append(line.split("fd=")[1])

      # For each device
      for dev in qom_list(path="/machine/peripheral"):
        # If the device is a VirtIO PCI network device
        if dev["type"] == "child<virtio-net-pci>":
          dev_name = dev["name"]
          # Get its backend netdev
          netdev = qom_get(path="/machine/peripheral/{}".format(dev_name), property="netdev")
          # And remap it
          remap_network(netdev, dev_name, net_id, fds[netdev])
          net_id += 1
      ...

This one is very tricky. First, we scan the output of the HMP command info network (there is no equivalent in QMP unfortunately) and do some dark magic on it to identify the FDs associated with Incus’ TAP devices. We then do something pretty similar to what we did with drives and blockdevs. Strangely, QEMU’s usb-net device implementation hardcodes the first byte of the MAC address to 40:; I’ll have to ask the devs why (if you have any idea, please tell!).
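The "dark magic" is easier to follow on a concrete input. The sketch below lifts the parsing logic out of the hook into a plain Python function and runs it on made-up lines; the sample text only imitates the general layout of `info network` output (a NIC line followed by indented ` \ ` peer lines) and is not captured from a real VM. The function names are mine.

```python
def scan_tap_fds(lines):
    """Map each backend netdev name to the list of fd numbers found on its lines."""
    fds = {}
    for line in lines:
        # Backend (peer) lines are indented and start with " \ ".
        if line.startswith(" \\ "):
            netdev = line.split(":")[0][3:]
            fds.setdefault(netdev, [])
            if "fd=" in line:
                fds[netdev].append(line.split("fd=")[1])
    return fds


def has_usb_net_prefix(mac):
    """True if the MAC already starts with the 40: byte that usb-net forces."""
    return mac.lower().startswith("40:")


# Hypothetical sample: one NIC with two TAP queues.
SAMPLE = [
    "dev-incus_eth0: index=0,type=nic,model=virtio-net-pci,macaddr=40:16:3e:00:00:01",
    " \\ incus_eth0: index=0,type=tap,fd=34",
    " \\ incus_eth0: index=1,type=tap,fd=35",
]

print(scan_tap_fds(SAMPLE))
print(has_usb_net_prefix("40:16:3e:00:00:01"))
```

Note that the parser takes everything after `fd=` as the descriptor, so it only works as long as `fd=` is the last field on the line, which is a fragile assumption about a human-oriented output format.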

As unbelievable as it may seem, our virtual cards actually get an IP address (well, I only tested with one card)!
But I lose approximately 7 packets out of 8 (the network works for 20 seconds, then doesn’t work for 2 minutes or so). I don’t really know what to blame… Is it the multiqueue? Some strange race condition? An obscure coupling between frontend and backend devices? macOS is not (yet) to blame here: the behavior is exactly the same on Debian.

Wanna help?

If you have any idea other than “that’s impossible”, I would really love to read it! And if we meet at the next FOSDEM, I can offer a nice amount of beer :)

I’m actually also one of its users, though not because of macOS but because of VMware ESXi, which I need to run on occasion to work on VMware-to-Incus migrations. Not having to run it on dedicated hardware is quite nice :)

(ESXi is very picky about what devices it supports, so the normal virtio devices don’t really cut it)

Interesting post. I wonder whether the old OSX-KVM approach I used a few years back might give some ideas on how to achieve or improve things?

It looks like this project is still active. One day, if I have time, I will try it out…

Thanks! OSX-KVM doesn’t limit itself to what Incus allows us to do with QEMU. To get the network working, passing the right options on the QEMU command-line (or in the config file) works, but it then prevents us from using Incus’ network configuration. It may work for some people (e.g. if you’re just plugging into an existing bridge), but it’s a nightmare if you want to use OVN or even SR-IOV “cleverly”.

The real struggle here is to make macOS happy with what’s available to us. I really think that with the current QMP API, we can’t do much dark magic with network devices, because there’s basically no introspection beyond the info network HMP command. USB emulation seems to be the way: it works with my Kext if I define the device in the QEMU config file and plug it into a bridge. It doesn’t work with my dirty remapping, however.

I’ll have to recompile Incus without multiqueue support to see if that’s the problem. The “good” thing is that you don’t need to virtualize macOS to see the bad behavior, as it’s reproducible in Debian. As I said, I’ll take any idea at this point :), although I’m really hoping it’s just the mq, in which case it’s fixable with a simple (new) configuration key…