Let's expose local VMs/containers under ._dhcp by default. Let's also
expose WIFI AP clients under .home.arpa (i.e. the RFC8375 domain for
home networks).
These networks are not connections to upstream routers, but where we are
ourselves are the upstream router, hence it doesn't make too much sense
to require them to be up as default to determine if we are "online",
because they lead "in the wrong direction".
Also, disable DHCP lease persistency for these networks, since
container/VM/namespaces are generally shortlived, and typically have no
persistent identity. Moreover, the IP range we assign each VM/container
connection is just too small to permit persistency, as otherwise we'll
run out of leases way too quickly if VM/containers are restarted a bunch of
times with different MAC addresses (which I ran into).
I think these are better defaults, but of course these are only
defaults.
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.