Linux for Tegra (L4T) networking issues

Hey @tomp any red flags in my config output?

No that all seems fine.

Can you add an IP manually inside the container and try pinging the bridge, e.g.

lxc exec test -- ip a add 10.26.226.2/24 dev eth0
lxc exec test -- ping 10.26.226.1

Assigning the IP address worked (and is reflected in lxc ls), but ping fails with: ping: socket: Operation not permitted

This is consistent with my description – I suspect that if I make the container privileged, ping will work and the container will also get an IP address from DHCP.

I wonder if some additional security setting on the host OS is preventing unprivileged users from sending raw packets (most likely used by both ping and DHCP).

@stgraber @brauner is there anything you can think of that could cause this behavior?

Feels like it could be a kernel restriction or LSM maybe…

Anything suspicious in dmesg?

@brauner

Is ping either setuid or does it have the CAP_NET_RAW capability set?

Running as root though
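
For anyone wanting to check the setuid / file capability question on their own system, here is a minimal sketch in Python. The function name describe_privileges is made up for illustration, and it only detects whether the security.capability extended attribute is present; it does not decode which capabilities are set (getcap does that).

```python
import os
import shutil
import stat


def describe_privileges(path):
    """Report whether a binary is setuid and whether it carries file capabilities."""
    st = os.stat(path)
    info = {
        "path": path,
        "owner_uid": st.st_uid,
        "setuid": bool(st.st_mode & stat.S_ISUID),
        "file_caps": False,
    }
    try:
        # File capabilities live in the security.capability xattr (Linux only).
        os.getxattr(path, "security.capability")
        info["file_caps"] = True
    except (OSError, AttributeError):
        # No such xattr, xattrs unsupported, or os.getxattr missing on this platform.
        pass
    return info


if __name__ == "__main__":
    ping = shutil.which("ping")
    print(describe_privileges(ping) if ping else "ping not found in PATH")
```

If the process is already root inside the container, neither bit should matter for opening the socket, which is why the question turned out not to be the cause here.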

Hey just following up that I’m still interested in working through this topic – LXD on Linux for Tegra will be critical for building snaps targeting the platform.

Do let me know if there’s additional information I can collect!

Can we get a login to it?

I’ll do my best to set that up today. However in the recent past I was unsuccessful getting SSH ports forwarded through my Verizon FIOS router so I’m not optimistic.

@tomp – actually looks like ssh port forwarding to the device is working. No idea what was going on the last time I tried.

Anyway – how do you want to handle access? Maybe you can share a public key and I’ll create a sudoers account on the device for you?

FYI as mentioned over on the snapcraft thread I’ve managed to get snapcraft building snaps inside an LXD container on L4T through a combination of a privileged container, exclusion from apparmor, and some other raw.lxc changes.

I’m not sure whether this is actually the solution to the core LXD problem on L4T, but I also don’t really care since my objective is just to build snaps inside LXD containers on this platform. I’m happy to finish helping debug if the core LXD team is interested in doing so, though – let me know.

My ssh public key is here: https://launchpad.net/~tomparrott/+sshkeys

sudo root access would be needed.

Thanks

ssh uskellse@108.52.92.202 -p 55555 should get you in. Passwordless sudo enabled. Feel free to modify anything; no critical data or state is stored on the machine.

Please let me know when you’re done so I can lock everything back down… :smiley:

Thanks, so I took a look.

You’re running the 4.9.140-tegra kernel, which isn’t a standard ubuntu kernel, so there could be something unusual in its configuration.

I did notice however that if I add an IP inside the container manually using:

ip a add 10.1.185.2/24 dev eth0

Then I can ping that IP from the LXD host successfully, proving the bridge and veth pair are working OK.

Additionally I can actually make DNS requests from inside the container to the LXD bridge, e.g.:

dig @10.1.185.1 www.google.com

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> @10.1.185.1 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49501
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		123	IN	A	172.217.7.228

So outbound traffic is allowed too.

Also, interestingly, if I remove the static IP and then run the dhclient command manually, DHCP succeeds and a dynamic IP and default route are configured.

This then allows me to do curl http://www.google.com successfully also.

So it seems that networking is up and running OK, but that any application that tries to access (I’m guessing) raw sockets is denied.

@stgraber @brauner is there anything that can block raw socket access running as root inside a user namespace?

I ran strace ping 10.1.185.1, and it does look like raw sockets are blocked:

socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) = -1 EACCES (Permission denied)
socket(AF_INET, SOCK_RAW, IPPROTO_ICMP) = -1 EPERM (Operation not permitted)
socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = -1 EACCES (Permission denied)
socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6) = -1 EPERM (Operation not permitted)
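
The same four socket() calls can be reproduced without ping or strace. This is a rough sketch, assuming Python 3 is available inside the container; probe_icmp_sockets is just an illustrative name. As far as I know, the SOCK_DGRAM ICMP variant is gated separately by the net.ipv4.ping_group_range sysctl, which would explain EACCES there, while the SOCK_RAW variant is the one that needs CAP_NET_RAW.

```python
import errno
import socket

# The same (family, type, protocol) combinations ping attempted in the strace output.
ATTEMPTS = [
    ("AF_INET/SOCK_DGRAM/ICMP", socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_ICMP),
    ("AF_INET/SOCK_RAW/ICMP", socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP),
    ("AF_INET6/SOCK_DGRAM/ICMPV6", socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_ICMPV6),
    ("AF_INET6/SOCK_RAW/ICMPV6", socket.AF_INET6, socket.SOCK_RAW, socket.IPPROTO_ICMPV6),
]


def probe_icmp_sockets():
    """Try to open each socket and return (label, "ok" or errno name) pairs."""
    results = []
    for label, family, kind, proto in ATTEMPTS:
        try:
            sock = socket.socket(family, kind, proto)
        except OSError as exc:
            # Map the numeric errno to its symbolic name, e.g. EPERM or EACCES.
            results.append((label, errno.errorcode.get(exc.errno, str(exc.errno))))
        else:
            sock.close()
            results.append((label, "ok"))
    return results


if __name__ == "__main__":
    for label, status in probe_icmp_sockets():
        print(f"{label}: {status}")
```

Running it both inside the container and on the host as root makes it easy to compare which calls are refused in each place.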

The container’s root user does have cap_net_raw:

capsh --print | grep cap_net_raw
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
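
If capsh is not installed, the same information can be read from /proc/self/status. A small sketch, assuming a Linux /proc filesystem; the helper names are invented for this example. CAP_NET_RAW is bit 13 in linux/capability.h.

```python
CAP_NET_RAW_BIT = 13  # value of CAP_NET_RAW in linux/capability.h

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_masks(status_text):
    """Extract the capability bitmasks from the text of /proc/<pid>/status."""
    masks = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            masks[key] = int(value.strip(), 16)  # masks are printed as hex
    return masks


def has_cap(mask, bit=CAP_NET_RAW_BIT):
    """Return True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as fh:
        masks = parse_cap_masks(fh.read())
    for name, mask in masks.items():
        print(name, "has cap_net_raw" if has_cap(mask) else "lacks cap_net_raw")
```

Note that these masks describe capabilities relative to the process's own user namespace, so a full set inside an unprivileged container says nothing about capabilities on the host.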

So I asked @stgraber whether he had any ideas on what could be causing the issue. He downloaded the Tegra custom kernel source and tracked the issue down to what appears to be a bug introduced in the custom kernel. When opening raw sockets, rather than checking the namespace capabilities (as the vanilla kernel does), it checks the global capabilities in the root namespace. As the container is running unprivileged, it does not have the global CAP_NET_RAW capability, so the socket call fails.
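
To illustrate the difference between the two kinds of check, here is a toy model in Python. It is not kernel code and is not taken from the diff; the classes and function names are invented, and it leaves out details such as the rule that the owner of a child user namespace holds capabilities over it. In the vanilla kernel the namespaced check is done with ns_capable() against the user namespace that owns the network namespace, whereas capable() always checks against the initial user namespace.

```python
from dataclasses import dataclass, field
from typing import Optional

CAP_NET_RAW = "cap_net_raw"


@dataclass
class UserNamespace:
    name: str
    parent: Optional["UserNamespace"] = None


@dataclass
class Process:
    user_ns: UserNamespace
    caps: set = field(default_factory=set)  # capabilities held inside user_ns


INIT_USER_NS = UserNamespace("init")


def has_cap_in(proc, target_ns, cap):
    """Simplified namespaced check: walk up from target_ns looking for proc's namespace."""
    ns = target_ns
    while ns is not None:
        if ns is proc.user_ns:
            return cap in proc.caps
        ns = ns.parent
    return False  # proc lives below or beside target_ns, so it has no power over it


def namespaced_check(proc, net_owner_ns, cap=CAP_NET_RAW):
    """What the vanilla kernel is described as doing: check in the owning namespace."""
    return has_cap_in(proc, net_owner_ns, cap)


def global_check(proc, cap=CAP_NET_RAW):
    """What the custom kernel appears to do: check in the initial namespace."""
    return has_cap_in(proc, INIT_USER_NS, cap)


if __name__ == "__main__":
    container_ns = UserNamespace("container", parent=INIT_USER_NS)
    container_root = Process(container_ns, {CAP_NET_RAW})
    print("namespaced check:", namespaced_check(container_root, container_ns))
    print("global check:", global_check(container_root))
```

In this model, root in an unprivileged container passes the namespaced check but fails the global one, which matches the EPERM seen on the raw socket calls.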

I’m not sure if you can see this, but the diff is here: https://paste.ubuntu.com/p/rJZ8hfhFHD/

So this is a custom kernel issue and not something we can fix I’m afraid.

@tomp, @stgraber – big thanks for your time spent digging in to this one, and for the detailed writeup. Understood you won’t be able to fix the custom kernel issue.

Is running in privileged mode a reasonable workaround for this issue or are other dragons lurking (at least from what you’ve seen – I know you can’t answer that question with 100% certainty)?

If so, I’ll take this back to the snapcraft discourse to work through the next round of issues with snapcraft --use-lxd with privileged containers.

It’s OK to run privileged as long as you’re not running untrusted workloads and can accept the potential for escaping the container and/or the possibility of influencing the host OS from the container.

If you are just using it for logical separation of workloads then that may be acceptable to you.

Cool. Thanks very much for the help.