Incus: are we macOS yet?

So, as you may have noticed from the changelogs, a bit of work was done to give low-level enthusiasts more control over their VMs by allowing them to interact with the QEMU monitor. I am the author of the feature and, honestly, probably one of its few users. Still, I think it’s pretty neat if you have very specific needs. Mine is being able to run macOS on Incus while still having Incus manage both storage and network devices.

Disclaimer: read and understand your Apple software license before joining the journey; chances are you don’t have the right to do it. This post assumes you have the right to do it.

Disclaimer 2: this is absolutely not a guide; rather a call for suggestions.

Context

SOTA

Running macOS on QEMU is not new at all; it’s actually pretty easy if your hypervisor software lets you do whatever you want with devices. Incus works a bit differently, as the project doesn’t intend to provide support for non-VirtIO virtual devices. Many people think macOS doesn’t ship VirtIO device drivers, but that’s simply not true; what macOS doesn’t support is PCI hotplug. And Incus sure loves its PCI hotplug!

Incus and QEMU

Virtual devices in QEMU are made of two distinct devices: a frontend device, which the guest OS sees, and a backend device, directly linked to the host. Thanks to some pretty dirty QMP trickery, we can actually hot-remap a backend disk (a blockdev) to an already-defined frontend disk (in our case, a good old non-hotpluggable virtual IDE drive). For network devices, however, the frontend devices seem more tightly coupled to the backend ones; this is where I’m all ears for good ideas, as mine are running pretty dry…

Once network devices are realized, there is no way to link them to another netdev; the QMP API is very lacking when it comes to network device introspection… This means that we have to hotplug them, but as I said, macOS doesn’t know PCI hotplugging. What bus can we then hotplug them to? A USB bus! Unfortunately, it doesn’t work 82% of the time; more on that later.

Let’s get dirty!

QEMU command-line

I tried not to directly touch QEMU’s command-line, but it was unfortunately impossible:

raw.qemu: -blockdev node-name=devzero,driver=raw,file.driver=host_device,file.filename=/dev/zero

This line is needed to set up a temporary block device pointing to /dev/zero. This block device is only used by the cold-plugged SATA drives until the hotplugged ones are remapped onto them.

QEMU configuration file

I got pretty liberal when configuring QEMU, mostly because everything is still at a very early testing stage.

raw.qemu.conf: |-
  [device "qemu_keyboard"]
  [device "qemu_tablet"]
  [chardev "qemu_spice-usb-chardev1"]
  [device "qemu_spice-usb1"]
  [chardev "qemu_spice-usb-chardev2"]
  [device "qemu_spice-usb2"]
  [chardev "qemu_spice-usb-chardev3"]
  [device "qemu_spice-usb3"]
  [device "qemu_gpu"]

  [device "usb"]
  driver = "qemu-xhci"

  [device "usb_keyboard"]
  driver = "usb-kbd"
  bus = "usb.0"
  port = "1"

  [device "usb_tablet"]
  driver = "usb-tablet"
  bus = "usb.0"
  port = "2"

  [device "qemu_sata"]
  driver = "ich9-ahci"

  [device "qemu_vga"]
  driver = "VGA"

  [device "apple_smc"]
  driver = "isa-applesmc"
  osk = "<REPLACE THIS WITH APPLE OSK>"

  [device "sata0"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata1"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata2"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata3"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata4"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata5"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata6"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

  [device "sata7"]
  driver = "virtio-blk-pci"
  drive = "devzero"
  share-rw = "on"

This basically removes the devices that macOS doesn’t know how to handle (the empty sections at the top) and creates 8 SATA disks. It also sets up an xHCI controller plugged directly into the PCI root.
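The eight near-identical placeholder sections are tedious to write by hand. As a minimal sketch (not part of the original setup), the snippet below renders them from a loop; the helper name `sata_stanzas` is hypothetical, and the driver, drive and `share-rw` values are simply copied from the configuration above.

```python
def sata_stanzas(count, driver="virtio-blk-pci", drive="devzero"):
    """Render `count` placeholder disk sections in QEMU config-file syntax."""
    sections = []
    for index in range(count):
        sections.append("\n".join([
            '[device "sata{}"]'.format(index),
            'driver = "{}"'.format(driver),
            'drive = "{}"'.format(drive),
            'share-rw = "on"',
        ]))
    # Sections are separated by a blank line, as in the hand-written config
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Paste the output (indented) under the raw.qemu.conf key
    print(sata_stanzas(8))
```

The output still has to be indented and pasted under `raw.qemu.conf` by hand.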

QMP scriptlet

This is where the fun begins :slight_smile:

What I basically want to do is:

  • Remap all my disks to dummy SATA drives;
  • Remap all my NICs to usb-net devices.

What currently works

Storage remapping

raw.qemu.scriptlet: |-
  def remap_storage(dev, drive):
    """
    Remap a storage device onto a SATA port
    :param dev: The dictionary representing the original device
    :param drive: The SATA drive
    """
    # Get data from the device
    qdev = dev["qdev"]
    inserted = dev["inserted"]
    name = inserted["node-name"]
    driver = inserted["drv"]
    path = inserted["file"]
    fdset = path.split('/')[-1]

    log_info("[MacOS subsystem] Remapping disk {} [fdset{}] to {}".format(name, fdset, drive))

    # Add a blockdev with the same FDset
    run_qmp({"execute": "blockdev-add",
             "arguments": {"driver": driver,
                           "node-name": "fdset{}".format(fdset),
                           "filename": path}})

    # Attach this blockdev to the SATA drive
    qom_set(path="/machine/peripheral/{}".format(drive), property="drive", value="fdset{}".format(fdset))

    # Unplug the original device
    if qdev.endswith("/virtio-backend"):
      qdev = qdev[:-15]
    log_info("[MacOS subsystem] Unplugging {}".format(qdev))
    device_del(id=qdev)


  def qemu_hook(stage):
    if stage == "pre-start":
      log_info("[MacOS subsystem] Started")

      # Initialize device IDs
      sata_id = 0

      # For each block device
      for dev in run_command("query-block"):
        # If the device is a non-CD-ROM Incus disk
        if dev["inserted"]["node-name"].startswith("incus_") and "tray_open" not in dev:
          # Remap it
          remap_storage(dev, "sata{}".format(sata_id))
          sata_id += 1

      log_info("[MacOS subsystem] Done")

Plenty of code for something not that hard to phrase: look at which file descriptors are opened by which block devices, create clone block devices, attach them to our SATA drives, and delete the old block devices.

I’m actually surprised it works; our disks get remapped and can be seen and handled by macOS!
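To see what the remapping does without a running VM, here is a dry-run sketch in plain Python. It only builds the QMP messages for one disk instead of sending them. The function name `plan_storage_remap` and the sample `query-block` entry are made up for illustration; the real entries returned by QEMU contain many more fields.

```python
def plan_storage_remap(dev, drive):
    """Return the QMP commands needed to move one disk onto `drive`."""
    inserted = dev["inserted"]
    # "/dev/fdset/3" -> "fdset3": reuse the FD set number as the node name
    node = "fdset{}".format(inserted["file"].split("/")[-1])

    # The frontend ID is the qdev path without its "/virtio-backend" suffix
    qdev = dev["qdev"]
    suffix = "/virtio-backend"
    if qdev.endswith(suffix):
        qdev = qdev[:-len(suffix)]

    return [
        # 1. Clone the backend on top of the same FD set
        {"execute": "blockdev-add",
         "arguments": {"driver": inserted["drv"],
                       "node-name": node,
                       "filename": inserted["file"]}},
        # 2. Point the placeholder drive at the clone
        {"execute": "qom-set",
         "arguments": {"path": "/machine/peripheral/{}".format(drive),
                       "property": "drive",
                       "value": node}},
        # 3. Unplug the original frontend
        {"execute": "device_del", "arguments": {"id": qdev}},
    ]


# Hypothetical, heavily trimmed query-block entry
sample = {
    "qdev": "/machine/peripheral/dev-incus_root/virtio-backend",
    "inserted": {"node-name": "incus_root",
                 "drv": "host_device",
                 "file": "/dev/fdset/3"},
}

if __name__ == "__main__":
    for command in plan_storage_remap(sample, "sata0"):
        print(command)
```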

Kext for usb-net macOS support

macOS doesn’t know which driver to use for usb-net devices by default. I’ve created a very simple codeless Kext which I’ll release once I have tested it on more versions of macOS.

Basically, we need to tell macOS that for idVendor=1317, idProduct=42146 and bcdDevice=0, it should use the appropriate driver (on my machine™, AppleUSBCDCCompositeDevice, in com.apple.driver.usb.cdc).
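For readers who have never seen a codeless Kext, the matching rule boils down to one IOKit personality in the bundle's Info.plist. The sketch below builds such a dictionary with Python's `plistlib`. It is a guess at the general shape, not the unreleased Kext itself: the personality name `QEMUUSBNet` and the `IOProviderClass` value are assumptions, while the IDs, driver class and bundle identifier are the ones quoted above.

```python
import plistlib

# USB IDs quoted above, in decimal as Info.plist expects integers
ID_VENDOR = 1317    # 0x0525
ID_PRODUCT = 42146  # 0xa4a2
BCD_DEVICE = 0

personality = {
    "CFBundleIdentifier": "com.apple.driver.usb.cdc",
    "IOClass": "AppleUSBCDCCompositeDevice",
    "IOProviderClass": "IOUSBHostDevice",  # assumption, not from the post
    "idVendor": ID_VENDOR,
    "idProduct": ID_PRODUCT,
    "bcdDevice": BCD_DEVICE,
}

# "QEMUUSBNet" is a hypothetical personality name
info = {"IOKitPersonalities": {"QEMUUSBNet": personality}}
xml = plistlib.dumps(info).decode("utf-8")

if __name__ == "__main__":
    print(xml)
```

A complete Info.plist needs more keys than this (bundle name, version, and so on); only the matching part is shown.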

What doesn’t work

… or at least doesn’t 82% of the time:

  def hmp(command):
    """
    Run an HMP command
    :param command: The command to execute
    """
    return run_qmp({"execute": "human-monitor-command",
                    "arguments": {"command-line": command}})["return"].strip().split("\r\n")


  def remap_network(netdev, dev_name, net_id, fds):
    """
    Remap a network device onto a USB card
    :param netdev: The original netdev name
    :param dev_name: The original device name
    :param net_id: The USB card number
    :param fds: The TAP FDs
    """
    # Get data from the device
    mac = qom_get(path="/machine/peripheral/{}".format(dev_name), property="mac")
    name = "net{}".format(net_id)

    log_info("[MacOS subsystem] Remapping NIC {} [{}] to {}".format(netdev, mac, name))

    if not mac.startswith("40:"):
      # QEMU replaces the first byte of usb-net devices with 0x40, for some reason.
      # We must therefore restrict ourselves to MAC addresses starting with 40:.
      log_error("Network device MAC address must start with 40:. Got {}.".format(mac))
      run_command("<CRITICAL ERROR, CHECK YOUR LOGS>")

    # Add a netdev with the same FDs
    run_qmp({"execute": "netdev_add",
             "arguments": {"type": "tap",
                           "id": name,
                           "fds": ":".join(fds)}})

    # Attach this netdev to a new USB card
    run_qmp({"execute": "device_add",
             "arguments": {"driver": "usb-net",
                           "id": name,
                           "netdev": name,
                           "mac": mac,
                           "bus": "usb.0",
                           "port": net_id + 3,
                          }})

    # Unplug the original device
    run_command("set_link", name=dev_name, up=False)
    log_info("[MacOS subsystem] Unplugging {}".format(dev_name))
    device_del(id=dev_name)


  def qemu_hook(stage):
    ...
      ...
      net_id = 0
      ...
      # Scan the network FDs
      fds = {}
      for line in hmp("info network"):
        if line.startswith(" \\ "):
          netdev = line.split(":")[0][3:]
          if netdev not in fds:
            fds[netdev] = []
          if "fd=" in line:
            fds[netdev].append(line.split("fd=")[1])

      # For each device
      for dev in qom_list(path="/machine/peripheral"):
        # If the device is a VirtIO PCI network device
        if dev["type"] == "child<virtio-net-pci>":
          dev_name = dev["name"]
          # Get its backend netdev
          netdev = qom_get(path="/machine/peripheral/{}".format(dev_name), property="netdev")
          # And remap it
          remap_network(netdev, dev_name, net_id, fds[netdev])
          net_id += 1
      ...

This one is very tricky. First, we scan the output of the HMP command info network (there is no equivalent in QMP unfortunately) and do some dark magic on it to identify the FDs associated with Incus’ TAP devices. We then do something pretty similar to what we did with drives and blockdevs. Strangely, QEMU’s usb-net device implementation hardcodes the first byte of the MAC address to 40:; I’ll have to ask the devs why (if you have any idea, please tell!).
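The "dark magic" is easier to follow on a concrete input. The sketch below runs the same parsing in plain Python on a hand-written sample; the sample text, device names and FD numbers are invented for illustration and only mimic the general shape of the `info network` output (peer lines start with ` \ ` and end with `fd=N`).

```python
# Invented sample: one NIC with a two-queue TAP backend
SAMPLE = (
    "dev-incus_eth0: index=0,type=nic,model=virtio-net-pci,macaddr=40:16:3e:00:00:01\r\n"
    " \\ incus_eth0: index=0,type=tap,fd=38\r\n"
    " \\ incus_eth0: index=1,type=tap,fd=39"
)


def parse_tap_fds(lines):
    """Map each backend netdev name to the list of its TAP FDs."""
    fds = {}
    for line in lines:
        # Backend (peer) lines are prefixed with space, backslash, space
        if not line.startswith(" \\ "):
            continue
        netdev = line.split(":")[0][3:]
        fds.setdefault(netdev, [])
        if "fd=" in line:
            fds[netdev].append(line.split("fd=")[1])
    return fds


if __name__ == "__main__":
    result = parse_tap_fds(SAMPLE.strip().split("\r\n"))
    # One entry per queue; the scriptlet joins them with ":" for netdev_add
    print(result, ":".join(result["incus_eth0"]))
```

With multiqueue enabled, one netdev name shows up once per queue, which is why the FDs are collected into a list.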

As unbelievable as it may seem, our virtual cards actually get an IP address (well, I only tested with one card)!
But I lose approximately 7 packets out of 8 (the network works for 20 seconds, then doesn’t work for 2 minutes or so). I don’t really know what to blame… is it the multiqueue? some strange race condition? an obscure coupling between frontend and backend devices? macOS is not (yet) to blame there, the behavior is the exact same on Debian.

Wanna help?

If you have any idea which doesn’t say “that’s impossible”, I would really love to read it! And if we meet at the next FOSDEM, I can offer a nice amount of beer :slight_smile:


I’m actually also one of its users, though not because of macOS but because of VMware ESXi, which I need to run on occasion to work on VMware to Incus migrations. Not having to run it on dedicated hardware is quite nice :slight_smile:

(ESXi is very picky about what devices it supports, so the normal virtio devices don’t really cut it)


Interesting post! I wonder if the old OSX-KVM approach I used a few years back might give some ideas on how to achieve or improve things?

It looks like this project is still active. One day, if I have time, I will try it out…

Thanks! OSX-KVM doesn’t limit itself to what Incus lets us do with QEMU. To get the network working, passing the right options on the QEMU command-line (or in the config file) works, but it then prevents us from using Incus’ network configuration. It may work for some people (e.g. if you’re just plugging into an existing bridge), but it’s a nightmare if you want to use OVN or even SR-IOV “cleverly”.

The real struggle here is to make macOS happy with what’s available to us. I really think that with the current QMP API, we can’t do much dark magic with network devices, because there’s basically no introspection beyond the info network HMP command. USB emulation seems to be the way: it works with my Kext if I define the device in the QEMU config file and plug it into a bridge; it doesn’t work with my dirty remapping, however.

I’ll have to recompile Incus without multiqueue support to see if that’s the problem. The “good” thing is that you don’t need to virtualize macOS to see the bad behavior, as it’s reproducible in Debian. As I said, I’ll take any idea at this point :), although I’m really hoping it’s just the mq, in which case it’s fixable with a simple (new) configuration key…

So as I wrote here, turns out I was very wrong and my NIC remapping method actually works well. I’m quite upset I didn’t accuse the kernel before and to be fair, I’m surprised by the tight coupling between QEMU and the kernel in this particular case. Oh well, I’ve wasted a lot of time on that, but it was quite informative.

Anyway, now is the time for a status update: I’m getting closer to working macOS support. And with some of my contributions to Incus in the previous months, I can now propose a scriptlet-only solution (although, please note that it’s still an early WIP).

The scriptlet

# Some devices don’t make a lot of sense in the macOS world, at least for now
DELETED_DEVICES = {
  'chardev': ['spice-usb-chardev1', 'spice-usb-chardev2', 'spice-usb-chardev3'],
  'device': ['gpu', 'keyboard', 'spice-usb1', 'spice-usb2', 'spice-usb3', 'tablet', 'usb']
}

# On the other hand, a few devices need to be added
ADDED_DEVICES = {
  'device': {'apple_smc': {'driver': 'isa-applesmc', 'osk': '<REPLACE THIS WITH APPLE OSK>'},
             'qemu_sata': {'driver': 'ich9-ahci'},
             'qemu_vga': {'driver': 'VGA'},
             'qemu_usb': {'driver': 'qemu-xhci'},
             'usb_keyboard': {'driver': 'usb-kbd', 'bus': 'qemu_usb.0'},
             'usb_tablet': {'driver': 'usb-tablet', 'bus': 'qemu_usb.0'}}
}


def remap_storage(dev, drive):
  """
  Remap a storage device onto a SATA port
  :param dev: The dictionary representing the original device
  :param drive: The SATA drive
  """
  # Get data from the device
  qdev = dev['qdev']
  inserted = dev['inserted']
  fdset = 'fdset{}'.format(inserted['file'].split('/')[-1])

  log_info('[macOS scriptlet] Remapping disk {} to {}'.format(inserted['node-name'], drive))

  # Add a blockdev with the same FDset
  run_qmp({'execute': 'blockdev-add',
           'arguments': {'aio': 'native',
                         'cache': {'direct': True, 'no-flush': False},
                         'discard': 'unmap',
                         'driver': inserted['drv'],
                         'filename': inserted['file'],
                         'locking': 'off',
                         'node-name': fdset,
                         'read-only': inserted['ro']}})

  # Attach this blockdev to the SATA drive
  qom_set(path='/machine/peripheral/{}'.format(drive), property='drive', value=fdset)

  # Unplug the original device
  if qdev.endswith('/virtio-backend'):
    qdev = qdev[:-15]
  log_info('[macOS scriptlet] Unplugging {}'.format(qdev))
  device_del(id=qdev)


def hmp(command):
  """
  Run an HMP command
  :param command: The command to execute
  """
  return run_qmp({'execute': 'human-monitor-command',
                  'arguments': {'command-line': command}})['return'].strip().split('\r\n')


def remap_network(netdev, dev_name, net_id, fds):
  """
  Remap a network device onto a USB card
  :param netdev: The original netdev name
  :param dev_name: The original device name
  :param net_id: The USB card number
  :param fds: The TAP FDs
  """
  # Get data from the device
  mac = qom_get(path='/machine/peripheral/{}'.format(dev_name), property='mac')
  name = 'net{}'.format(net_id)

  log_warn('[macOS scriptlet] Remapping NIC {} [{}] to {}; consider setting `io.bus: usb`'
           .format(netdev, mac, name))

  # Add a netdev with the same FDs
  netdev_add(type='tap', id=name, fds=':'.join(fds))

  # Attach this netdev to a new USB card
  device_add(driver='usb-net', id=name, netdev=name, mac=mac, bus='qemu_usb.0')

  # Unplug the original device
  run_command('set_link', name=dev_name, up=False)
  log_info('[macOS scriptlet] Unplugging {}'.format(dev_name))
  device_del(id=dev_name)


def patch_config(devices):
  """
  Patch QEMU configuration
  :param devices: The expanded devices dictionary
  """
  log_info('[macOS scriptlet] Reconfiguring QEMU')

  # Initialize a dummy block device for hot-remapping purposes
  set_qemu_cmdline(get_qemu_cmdline() +
                   ['-blockdev', 'node-name=devzero,driver=raw,'+
                                 'file.driver=host_device,file.filename=/dev/zero'])

  # Get initial QEMU configuration
  initial_conf = get_qemu_conf()
  conf = []

  # Remove a few unusable devices
  deleted = ['{} "qemu_{}"'.format(prefix, name)
             for (prefix, devices) in DELETED_DEVICES.items() for name in devices]
  for device in initial_conf:
    name = device['name']
    if name in deleted:
      continue
    conf.append(device)

  # Add necessary devices
  added = {'{} "{}"'.format(prefix, name): value
           for (prefix, devices) in ADDED_DEVICES.items()
           for (name, value) in devices.items()}
  for (name, entries) in added.items():
    conf.append({'name': name, 'entries': entries})

  # Add placeholder SATA disks
  sata_count = 0
  for device in devices.values():
    if device['type'] == 'disk':
      conf.append({'name': 'device "sata{}"'.format(sata_count),
                   'comment': 'Automatically generated SATA disk',
                   'entries': {'driver': 'virtio-blk-pci', 'drive': 'devzero',
                               'share-rw': 'on'}})
      sata_count += 1

  # Set the new configuration
  set_qemu_conf(conf)


def remap_devices():
  """Remap QEMU devices"""
  log_info('[macOS scriptlet] Remapping devices')

  # Initialize device numbers
  sata_id = 0
  net_id = 0

  # For each block device
  for dev in run_command('query-block'):
    # If the device is a non-CD-ROM Incus disk
    if dev['inserted']['node-name'].startswith('incus_') and 'tray_open' not in dev:
      # Remap it
      remap_storage(dev, 'sata{}'.format(sata_id))
      sata_id += 1

  # Scan the network FDs
  fds = {}
  for line in hmp('info network'):
    if line.startswith(' \\ '):
      netdev = line.split(':')[0][3:]
      if netdev not in fds:
        fds[netdev] = []
      if 'fd=' in line:
        fds[netdev].append(line.split('fd=')[1])

  # For each device
  for dev in qom_list(path='/machine/peripheral'):
    # If the device is a VirtIO PCI network device
    if dev['type'] == 'child<virtio-net-pci>':
      dev_name = dev['name']
      # Get its backend netdev
      netdev = qom_get(path='/machine/peripheral/{}'.format(dev_name), property='netdev')
      # And remap it
      remap_network(netdev, dev_name, net_id, fds[netdev])
      net_id += 1


def qemu_hook(instance, stage):
  if stage == 'config':
    patch_config(instance.expanded_devices)
  elif stage == 'pre-start':
    remap_devices()

I’ve basically kept everything, and ported the code to the latest Incus version, using a single scriptlet without polluting other raw keys. There’s still quite a bit of work to do, but the project is back on track!
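To make the configuration patching step easier to picture, here is a toy version that runs outside Incus. It applies the same delete-then-append logic to a small hand-written section list. The section list, the trimmed device tables and the function name `patch` are illustrative only; the `{'name': ..., 'entries': ...}` shape mirrors what the scriptlet above manipulates.

```python
# Trimmed-down tables, same structure as in the scriptlet
DELETED = {"device": ["gpu", "usb"]}
ADDED = {"device": {"qemu_vga": {"driver": "VGA"},
                    "qemu_usb": {"driver": "qemu-xhci"}}}


def patch(initial_conf, deleted, added):
    """Drop the deleted sections, then append the added ones."""
    # Incus-generated sections are named e.g. 'device "qemu_gpu"'
    doomed = ['{} "qemu_{}"'.format(kind, name)
              for kind, names in deleted.items() for name in names]
    conf = [section for section in initial_conf if section["name"] not in doomed]
    for kind, sections in added.items():
        for name, entries in sections.items():
            conf.append({"name": '{} "{}"'.format(kind, name), "entries": entries})
    return conf


# Hypothetical initial configuration (driver values are placeholders)
initial = [
    {"name": 'device "qemu_gpu"', "entries": {"driver": "placeholder-gpu"}},
    {"name": 'device "qemu_usb"', "entries": {"driver": "placeholder-usb"}},
    {"name": 'device "qemu_balloon"', "entries": {"driver": "placeholder-balloon"}},
]

if __name__ == "__main__":
    for section in patch(initial, DELETED, ADDED):
        print(section["name"], section["entries"])
```

Note that `qemu_usb` is first removed and then re-added with a different driver, which is how the scriptlet swaps the default USB controller for an xHCI one.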

What’s next?

I can now install macOS without any issue, so next I’ll need to test interacting with the OS and plugging devices, and debug SPICE features. Then, some packaging work will be required, as getting macOS running is a bit trickier than just inserting an ISO and clicking “next”. I hope to give more news soon!


Great to see those scriptlets working well!

Looking forward to having this be reliable and for us to start providing some example scriptlets with the Incus documentation.

I guess the next question will be whether we can get this working on a Mac too.
I have one of those cheap M4 Mac Minis here, and they support nested virtualization, so I can use Colima to get Incus set up, which can then run nested arm64 VMs. No idea how different the arm64 builds of macOS are, though, but that may be an interesting experiment to future-proof the work, as Apple will likely soon abandon x86.


Well, for now the scriptlet itself is good enough, but I guess I’ll have a bit of work to do to simplify the installation. The usual virtualized macOS setup initially requires 3 disks: a blank one to install macOS, one for the OpenCore bootloader, and one for the recovery medium (that actually downloads and installs macOS). Then it gets reduced to 2 after preinstallation (macOS + OpenCore), which can be merged with some manual work. I’m still hoping to build an ISO combining OpenCore and the recovery medium, then patch the EFI partition after macOS has been preinstalled. I’m imagining this installation workflow:

  • Boot to a custom EFI program setting the BootNext variable to boot to another EFI program after the preinstallation (explained after) and chainloading OpenCore
  • Boot to OpenCore launching macOS recovery and rebooting after preinstallation
  • Boot thanks to BootNext to a secondary EFI program patching the preinstalled macOS drive to install OpenCore in its EFI partition, (bonus point if I manage to eject the ISO from the EFI program,) clear BootNext, then chainload the newly installed OpenCore
  • Finish the installation

It’s gonna be tough, but a good EDK II refresher for me :slight_smile:

Let’s not rush into that just yet. As you can see, the roadmap is already quite busy :smiley:

Yeah and I suspect they’ve changed quite a few things with the move to arm64 :slight_smile:

Quick update to announce that all the code and configuration are now available on GitHub. There’s no guide to bootstrap it, but it shouldn’t be too hard for people already familiar with OpenCore.

I’m glad that I now can build single ISOs that directly boot into the macOS recovery image without any interaction from the user, thanks to some funky NVRAM manipulation. The macOS ecosystem sure loves its NVRAM :slight_smile:

I’m gonna pause the project for probably a few weeks, as I now have an Incus+LINSTOR cluster to deploy at work. The next step requires some EDK II programming, which will take me quite some time as I haven’t done that for a good while. I hope to give some good news next month!