systemd-nspawn now optionally supports colon-separated pair of
host interface name and container interface name for --network-macvlan, --network-ipvlan and --network-interface options.
Also supported in .nspawn configuration files (i.e Interface=, MACVLAN=, IPVLAN= parameters).
man page changed for ntwk interface naming
Previously, when the NULL (all zero) machine ID is configured in the
container, nspawn refused to execute.
Now id128_get_machine() is used, so NULL machine ID is refused with
-ENOMEDIUM, and fallback to specified UUID or randomly generated one.
Otherwise, if getopt() and friends are used before parse_argv(), then
the GNU extensions may be ignored.
This should not change any behavior at least now, as we usually use
getopt_long() only once per invocation. But in the next commit,
getopt_long() will be used for other arrays, hence this change will
become necessary.
Chasing symlinks is a core function that's used in a lot of places
so it deservers a less verbose names so let's rename it to chase()
and chaseat().
We also slightly change the pattern used for the chaseat() helpers
so we get chase_and_openat() and similar.
Whenever we're going to close all file descriptors, we tend to close
the log and set it into open when needed mode. When this is done with
the logging target set to LOG_TARGET_AUTO, we run into issues because
for every logging call, we'll check if stderr is connected to the
journal to determine where to send the logging message. This check
obviously stops working when we close stderr, so we settle the log
target before we do that so that we keep using the same logging
target even after stderr is closed.
We bind mount two selected inodes from the host into our container.
Let's turn off propagation for that, since we just want those inodes,
nothing else.
With this change "grep master: /proc/self/mountinfo" should list only
the mount propagation "tunnel" dir, and nothing else anymore.
@brauner noticed that in invoked containers the root directory is set to
still receive mounts from the host. We should disable that, and
guarantee we live in our own world, because that's what an
(nspawn-style) container *is* after all: a whole new world.
This hence mounts the container subtree to MS_PRIVATE after getting the
root dir in place. Note that this will later be set to MS_SHARED again.
The MS_PRIVATE disconnects mounts from the host, the MS_SHARED then
establishes a new peer group for mount propagation events, so that
payload service managers (such as systemd) can take benefit of
propagation further down the tree.
Since quite a while the propagation from the DDI arch into the
personality() wasn't hooked up anymore. Let's fix that: when the DDI has
a determined arch, automatically propagate this into the personality.
Although this slightly more verbose it makes it much easier to reason
about. The code that produces the tests heavily benefits from this.
Test lists are also now sorted by test name.
When using --private-users, we have to create bind mount points as
the user that will become root in the user namespace, so let's take
that into account.
If we're in a user namespace but not unsharing the network namespace,
we won't be able to bind any privileged ports even with
CAP_NET_BIND_SERVICE, so let's drop it from the retained capabilities
so services can condition themselves on that.
Let's port one more over.
Note that this changes behaviour of file_in_same_dir() in some regards.
Specifically, a trailing slash of the input path will be treated
differently: previously we'd operate below that dir then, instead of the
parent. I think that makes little sense however, and I think the code
using this function doesn't expect that either.
Moroever, addresses some corner cases if the path is specified as "/" or
".", i.e. where e cannot extract a parent. These will now be treated as
error, which I think is much cleaner.
-1 was used everywhere, but -EBADF or -EBADFD started being used in various
places. Let's make things consistent in the new style.
Note that there are two candidates:
EBADF 9 Bad file descriptor
EBADFD 77 File descriptor in bad state
Since we're initializating the fd, we're just assigning a value that means
"no fd yet", so it's just a bad file descriptor, and the first errno fits
better. If instead we had a valid file descriptor that became invalid because
of some operation or state change, the other errno would fit better.
In some places, initialization is dropped if unnecessary.
This is an octal number. We used the 0 prefix in some places inconsistently.
The kernel always interprets in base-8, so this has no effect, but I think
it's nicer to use the 0 to remind the reader that this is not a decimal number.