LXD VM not able to PXE boot over IPv4

Hi,

I have the following problem with PXE booting a machine using MAAS.


What comes next is attached as a console log. To sum up: PXE booting via MAAS starts the LXD agent, all network services fail to start, and then the final cloud-init stage fails with a NoDataSource error.

My LXD network setup relevant to the problem is:

lxc network show lxdfan0
config:
  bridge.mode: fan
  fan.underlay_subnet: 10.10.11.0/24
  ipv4.nat: "true"
description: ""
name: lxdfan0
type: bridge
managed: true

lxc network show maasbr0
config:
  ipv4.address: 10.41.229.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:63:cfc3:e14e::1/64
  ipv6.nat: "true"
description: ""
name: maasbr0
type: bridge
managed: true


lxc profile show juju-controller-pxe
config:
  limits.cpu: "2"
  limits.memory: 4GB
  security.secureboot: "false"
description: LXD profile for juju controllers starting from pxe
devices:
  eth0:
    boot.priority: "10"
    name: eth0
    network: maasbr0
    type: nic
  root:
    path: /
    pool: remote-lvm
    size: 30GB
    type: disk
name: juju-controller-pxe

Following "How to troubleshoot MAAS" in the MAAS documentation and the video, I tried:

lxc network set maasbr0 ipv6.dhcp=false
lxc network set maasbr0 ipv4.dhcp=false
lxc network set maasbr0 dns.mode=none

But it had no effect.
The only difference was that it now starts over HTTP (the end result is the same, a cloud-init error):

lxc console juju-maas-3
To detach from the console, press: <ctrl>+a q

>>Start PXE over IPv4.
  Station IP address is 10.41.229.210

  Server IP address is 10.41.229.1
  NBP filename is bootx64.efi
  NBP filesize is 0 Bytes
  PXE-E99: Unexpected network error.
BdsDxe: failed to load Boot0002 "UEFI PXEv4 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6..
  Station IP address is FD42:63:CFC3:E14E:217:0:0:E66
  Server IP address is FD42:63:CFC3:E14E:216:3EFF:FE69:83CE
  NBP filename is bootx64.efi
  NBP filesize is 955656 Bytes
 Downloading NBP file...

  NBP file downloaded successfully.
BdsDxe: loading Boot0003 "UEFI PXEv6 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)
BdsDxe: starting Boot0003 "UEFI PXEv6 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)
Fetching Netboot Image
Unable to fetch TFTP image: Time out
start_image() returned Time out
BdsDxe: failed to start Boot0003 "UEFI PXEv6 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Time out

>>Start HTTP Boot over IPv4....
  Station IP address is 10.41.229.210

  URI: http://10.41.229.1:5248/images/bootx64.efi

  Error: Could not retrieve NBP file size from HTTP server.

  Error: Unexpected network error.
BdsDxe: failed to load Boot0004 "UEFI HTTPv4 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6....
  Station IPv6 address is FD42:63:CFC3:E14E:217:0:0:C31

  URI: http://[fd42:63:cfc3:e14e:216:3eff:fe69:83ce]:5248/images/bootx64.efi
  File Size: 955656 Bytes
  Downloading...100%BdsDxe: loading Boot0005 "UEFI HTTPv6 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri()
BdsDxe: starting Boot0005 "UEFI HTTPv6 (MAC:00163EAAC84E)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163EAAC84E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri()
error: file `/grub/x86_64-efi/command.lst' not found.
error: file `/grub/x86_64-efi/fs.lst' not found.
error: file `/grub/x86_64-efi/crypto.lst' not found.
error: file `/grub/x86_64-efi/terminal.lst' not found.
error: file `/grub/grub.cfg-00:16:3e:aa:c8:4e' not found.
Booting under MAAS direction...
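
The console output above is long, so it can help to reduce it to one entry per boot attempt. The following is a minimal Python sketch, not part of MAAS or LXD, that assumes the output of `lxc console` has been saved as text. The name `summarize_boot_log` and the error markers it looks for are hypothetical choices and may miss other firmware messages.

```python
import re

# Headers printed by the UEFI firmware, e.g. ">>Start PXE over IPv4."
ATTEMPT_RE = re.compile(r"^>>Start (PXE|HTTP Boot) over (IPv4|IPv6)\.*$")
# Reported size of the network boot program (NBP) for PXE or HTTP boot
SIZE_RE = re.compile(r"(?:NBP filesize is|File Size:)\s*(\d+)\s*Bytes")
# Substrings treated as firmware-level errors (an assumption, not exhaustive)
ERROR_MARKERS = ("PXE-E", "Error:", "Unable to fetch")


def summarize_boot_log(text):
    """Return one dict per boot attempt found in a console capture."""
    attempts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ATTEMPT_RE.match(line)
        if header:
            current = {
                "method": header.group(1),
                "family": header.group(2),
                "nbp_bytes": None,
                "errors": [],
            }
            attempts.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first attempt
        size = SIZE_RE.search(line)
        if size:
            current["nbp_bytes"] = int(size.group(1))
        if any(marker in line for marker in ERROR_MARKERS):
            current["errors"].append(line)
    return attempts


# A shortened excerpt of the log above, used only as sample input
SAMPLE = """\
>>Start PXE over IPv4.
  Station IP address is 10.41.229.210
  Server IP address is 10.41.229.1
  NBP filename is bootx64.efi
  NBP filesize is 0 Bytes
  PXE-E99: Unexpected network error.
>>Start PXE over IPv6..
  NBP filename is bootx64.efi
  NBP filesize is 955656 Bytes
"""

if __name__ == "__main__":
    for attempt in summarize_boot_log(SAMPLE):
        print(attempt)
```

Run against the full log, this should make it easier to see that both IPv4 attempts end without a usable NBP, while both IPv6 attempts report a 955656-byte bootx64.efi.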

then

My MAAS network settings:




Netplan:

network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true
        eth1:
            addresses:
            - 10.10.10.103/24
            gateway4: 10.10.10.1
            nameservers:
                addresses:
                - 1.1.1.1
                search: []
        eth2:
            addresses:
            - 10.10.99.103/24
            nameservers:
                addresses:
                - 1.1.1.1
                search: []
        eth3:
            addresses:
            - 10.41.229.1/24
            dhcp4: false
            dhcp6: false

lxc config show maas3
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu jammy amd64 (20221028_07:42)
  image.os: Ubuntu
  image.release: jammy
  image.serial: "20221028_07:42"
  image.type: squashfs
  image.variant: cloud
  user.network-config: |-
    version: 1
    config:
      - type: physical
        name: eth0
        subnets:
          - type: dhcp
      - type: physical
        name: eth1
        subnets:
          - type: static
            ipv4: true
            address: 10.10.10.103/24
            netmask: 255.255.255.0
            gateway: 10.10.10.1
            control: auto
      - type: physical
        name: eth2
        subnets:
          - type: static
            ipv4: true
            address: 10.10.99.103/24
            netmask: 255.255.255.0
            control: auto
      - type: nameserver
        address: 1.1.1.1
  volatile.base_image: 5ebc511ee7955cf789d0bfd702deb09bc90adb20acf4cdc41af2705c272fec7f
  volatile.cloud-init.instance-id: 161633d9-b19e-40f1-a554-c3fc7ea6d9c8
  volatile.eth0.host_name: veth25c60791
  volatile.eth0.hwaddr: 00:16:3e:36:02:5a
  volatile.eth1.host_name: macd6a61d05
  volatile.eth1.hwaddr: 00:16:3e:29:8f:94
  volatile.eth1.last_state.created: "false"
  volatile.eth2.host_name: macc64a5511
  volatile.eth2.hwaddr: 00:16:3e:c2:8c:5e
  volatile.eth2.last_state.created: "false"
  volatile.eth3.host_name: veth1988fbdd
  volatile.eth3.hwaddr: 00:16:3e:69:83:ce
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 29d32482-29c1-4a01-a1db-2f77877da5d7
devices: {}
ephemeral: false

The only idea I have left is to enable dhcp4 at the netplan level, but that would make no sense at all.

I tried some other settings of dns.mode on maasbr0, and setting it explicitly to managed got me a couple of steps further: the machine was visible in the MAAS machines section, and the system loaded and started downloading some packages, but it was not able to resolve any Ubuntu repo address and eventually failed.

Any help appreciated

Thanks :slight_smile:

I will reply to myself; maybe someone will find this useful in the future.
I recreated a POC environment on AZ cloud, with a much simpler setup (one VM host, not an entire LXD cluster).
The rest was 100 percent the same as in the "Exploring MAAS with LXD" video on YouTube.
I got to the point that every VM created on my LXD host with the profile pxe:

lxc profile show pxe           
config:
  limits.cpu: "2"
  limits.memory: 2GB
  security.secureboot: "false"
description: Default LXD profile
devices:
  eth0:
    boot.priority: "5"
    name: eth0
    network: maasbr0
    type: nic
  root:
    path: /
    pool: lvm-local
    size: 20GB
    type: disk
name: pxe
used_by:
- /1.0/instances/test

got only an IPv6 address :\ and that is something I cannot fathom at all, given this configuration:

lxc network show lxdbr0 
config:
  ipv4.address: 11.11.11.11/24
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/maas
- /1.0/profiles/default
- /1.0/profiles/maas
managed: true
status: Created
locations:
- none
root@lxd-vm:~# lxc network show maasbr0
config:
  ipv4.address: 10.28.87.1/24
  ipv4.nat: "true"
  ipv4.dhcp: "false"
  ipv6.address: fd42:4122:1df5:3403::1/64
  ipv6.nat: "true"
  ipv6.dhcp: "false"
description: ""
name: maasbr0
type: bridge
used_by:
- /1.0/instances/maas
- /1.0/instances/test
- /1.0/profiles/pxe
managed: true
status: Created
locations:
- none
root@lxd-vm:~# 
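
To compare a bridge against the three settings tried earlier (ipv4.dhcp=false, ipv6.dhcp=false, dns.mode=none), here is a minimal Python sketch. It only builds the `lxc network set` command lines and does not run them. `WANTED` and `pending_changes` are hypothetical names, and whether all three values are right for a given MAAS setup is an assumption.

```python
# Settings tried in this thread for a bridge where MAAS should provide
# DHCP and DNS. This is an assumption, not an official list.
WANTED = {
    "ipv4.dhcp": "false",
    "ipv6.dhcp": "false",
    "dns.mode": "none",
}


def pending_changes(network, config):
    """Return argv lists for every key whose value differs from WANTED.

    `config` is the `config:` mapping printed by `lxc network show`.
    Nothing is executed; the caller decides what to do with the result.
    """
    commands = []
    for key, value in WANTED.items():
        if config.get(key) != value:
            commands.append(["lxc", "network", "set", network, f"{key}={value}"])
    return commands


# The POC maasbr0 configuration shown above, copied by hand
POC_MAASBR0 = {
    "ipv4.address": "10.28.87.1/24",
    "ipv4.nat": "true",
    "ipv4.dhcp": "false",
    "ipv6.address": "fd42:4122:1df5:3403::1/64",
    "ipv6.nat": "true",
    "ipv6.dhcp": "false",
}

if __name__ == "__main__":
    for argv in pending_changes("maasbr0", POC_MAASBR0):
        print(" ".join(argv))
```

For the POC maasbr0 above, the only command it prints is `lxc network set maasbr0 dns.mode=none`, since both DHCP keys are already set to "false".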

DHCP is turned off for both; nevertheless, the machine gets discovered via HTTPv6, which makes no sense at all.
When the machine was discovered and passed commissioning, I attempted to install the OS but got my favourite cloud-init error. This time I did not give up and changed the network layout


by adding an auto-assigned (MAAS-provided in the image) IPv6 address and an IPv4 address as an alias, since you cannot assign IPv4 and IPv6 addresses to one MAC any other way (at least none that I know of).
That then let me install the machine.

Then I created a VM in the project configured for the LXD KVM host (under KVM in MAAS), with one difference: explicitly telling it to get an IPv4 address with:

lxc profile show  pxe-kvm
config:
  limits.cpu: "2"
  limits.memory: 2GB
  security.secureboot: "false"
description: ""
devices:
  eth0:
    boot.priority: "5"
    ipv4.address: 10.28.87.15
    name: eth0
    network: maasbr0
    type: nic
  root:
    path: /
    pool: lvm-local
    size: 20GB
    type: disk
name: pxe-kvm

The outcome is the same, though:

 NBP file downloaded successfully.
BdsDxe: loading Boot0003 "UEFI PXEv6 (MAC:00163E74FBEE)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163E74FBEE,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)
BdsDxe: starting Boot0003 "UEFI PXEv6 (MAC:00163E74FBEE)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163E74FBEE,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)
Fetching Netboot Image
Unable to fetch TFTP image: Time out
start_image() returned Time out
BdsDxe: failed to start Boot0003 "UEFI PXEv6 (MAC:00163E74FBEE)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163E74FBEE,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Time out

>>Start HTTP Boot over IPv4.....

But my question is: WHY DID I HAVE TO STRUGGLE WITH THIS IPV6? If someone can explain it, I would greatly appreciate it.

UPDATE from 4.11.2022
Using my POC environment I was able to move some steps further:

>>Start PXE over IPv4.
  Station IP address is 10.41.229.115

  Server IP address is 10.41.229.111
  NBP filename is bootx64.efi
  NBP filesize is 955656 Bytes
 Downloading NBP file...

  NBP file downloaded successfully.
BdsDxe: loading Boot0002 "UEFI PXEv4 (MAC:00163E314F05)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163E314F05,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)
BdsDxe: starting Boot0002 "UEFI PXEv4 (MAC:00163E314F05)" from PciRoot(0x0)/Pci(0x1,0x4)/Pci(0x0,0x0)/MAC(00163E314F05,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)
Fetching Netboot Image
Booting under MAAS direction...

I am now PXE booting over IPv4.
I achieved this by removing the dhcp4: false and dhcp6: false lines from netplan and using an address other than 10.41.229.1/24:

network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true
        eth1:
            addresses:
            - 10.41.229.111/24
        eth2:
            addresses:
            - 10.10.10.102/24
            gateway4: 10.10.10.1
            nameservers:
                addresses:
                - 1.1.1.1
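
Compared with the earlier netplan, the relevant change appears to be that the MAAS interface no longer uses 10.41.229.1, the address that maasbr0 itself holds. A minimal Python sketch of that check follows, assuming the addresses are collected by hand from `lxc network show` and netplan. `find_duplicate_hosts` and the owner labels are hypothetical names, not LXD or MAAS tooling.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_hosts(assignments):
    """Map each host address claimed more than once to its owners.

    `assignments` is an iterable of (owner, "address/prefix") pairs.
    """
    owners = defaultdict(list)
    for owner, cidr in assignments:
        # ip_interface keeps the host part, unlike ip_network
        host = ipaddress.ip_interface(cidr).ip
        owners[host].append(owner)
    return {str(ip): names for ip, names in owners.items() if len(names) > 1}


# Addresses copied from the configurations in this thread
BEFORE = [
    ("maasbr0 (LXD bridge)", "10.41.229.1/24"),
    ("maas3 eth3 (netplan)", "10.41.229.1/24"),
]
AFTER = [
    ("maasbr0 (LXD bridge)", "10.41.229.1/24"),
    ("MAAS eth1 (netplan)", "10.41.229.111/24"),
]

if __name__ == "__main__":
    print("before:", find_duplicate_hosts(BEFORE))
    print("after:", find_duplicate_hosts(AFTER))
```

With the original layout the check reports 10.41.229.1 as claimed by both the bridge and the MAAS container; with the new netplan it reports nothing.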

My maasbr0 now looks like this:

lxc network show maasbr0
config:
  dns.mode: none
  ipv4.address: 10.41.229.1/24
  ipv4.dhcp: "false"
  ipv4.nat: "true"
  ipv6.address: fd42:63:cfc3:e14e::1/64
  ipv6.nat: "true"
description: ""
name: maasbr0
type: bridge

I am disabling DNS on that network as well as DHCPv4.
It helped up to a point, but I was getting either:


when no networks were allowed to use MAAS as DNS (makes sense, since we disabled DNS on the network)

or, in the latter scenario:

when the network was specified:

I did not give up and tried changing something at the subnet level:

It turned out to be the key setting (combined with leaving Settings > DNS > "List of external networks (not previously known), that will be allowed to use MAAS for DNS resolution" empty); the packages started downloading:

I also had to make sure that the proxy was turned off, as it was on for some strange reason:

Then the commissioning scripts come into play:


The last log entry is
Script result - maas-capture-lldpd changed status from 'Running' to 'Passed'
and then the machine turns itself off :smiley:
I attach the entire log.

The funny thing is that the entire commissioning seems to have succeeded:


The node now has status NEW.

What worries me now is the problem with installing the LXD agent. It does not seem to be the reason the machine turns off (that seems to be standard MAAS behaviour). The fact that I am not able to add a power type in the configuration section seems to be blocking deployment, but that is rather my prod environment's fault.

Nevertheless, the LXD agent problem during initialization does not seem to be standard behaviour. I saw some posts addressing the issue, and in most cases the kernel was to blame. Has anyone tested it? In my case the problem occurs on both Ubuntu 22 and 20.
@stgraber have you maybe experienced the same when recording the MAAS + LXD video?
Thanks

========================================================================

link update to the first post