In Arch Linux we currently have a half-baked apparmor support,
in particular we cannot link systemd to libapparmor for service
context integration, since that will pull apparmor into base system.
Hence, let's turn this into a dlopen dep.
Ref: https://gitlab.archlinux.org/archlinux/packaging/packages/systemd/-/issues/22
This extends systemd-import-generator to not only download a disk image
at boot, but also attach it to a loopback device, so that we can boot
from it.
We have most of the pieces already in place, this just polishes some
things, to make this round.
The topmost commit contains example command lines that just work to make
`systemd-vmspawn` boot from a `mkosi serve` call.
Note that this does not address how to get the UKI running on the target
system, this only deals with the later boot phase once the UKI is
already running.
This is WIP, because it lacks docs, and I want to do some more
polishing. But it works great.
Ultimate goal, provide a complete solution so that we also can do uefi
http boot for ukis
With this we can now do:
systemd-vmspawn -n -i foobar.raw -s io.systemd.boot.entries-extra:particleos-current.conf=$'title ParticleOS Current\nuki-url http://example.com/somedir/uki.efi'
Assuming sd-boot is available inside the ESP of foobar.raw a new item
will show up in the boot menu that allows booting directly into the
specified UKI.
This one is between "efi" and "linux": we'll recognize such entries as
linux, but we'll just invoke them as EFI binaries.
This creates a high-level concept for invoking UKIs via indirection of a
bls type #1 entry, for example to permit invocation from a non-standard
path or for giving entries a different name.
Companion BLS spec PR:
https://github.com/uapi-group/specifications/pull/135
(Let's rename LOADER_UNIFIED_LINUX to LOADER_TYPE2_UKI at the same time
to reduce confusion what is what)
So far our login in gpt-auto-generator when run in the initrd has been
to generate the units that wait for /dev/gpt-auto-root to show up and
mount them only if we have the loader partition EFI variables set. This
is of course not the case for network boots with a UKI kernel, which
means auto-gpt would not work for mounting the rootfs.
What's nasty is that we don't know for sure whether the "rootdisk"
loopback device will shown up eventually, as it needs explicit
configuration by the user via the kernel cmdline, or could be configured
entirely indepdenently. Hence, let's tweak the logic when we wat for
/dev/gpt-auto-root as device to mount: make the gpt auto root logic a
tristate: if root=gpt-auto is specified on the cmdline *definitely*
enable the logic. If root= is specified and set to anyting else,
*definitely* disable the logic. And if root= is not specified check for
the EFI partition vars – as before – to conditionalized things.
Or in other words, you can now boot the same image either via ESP/local
boot or via netboot with a kernelcmdline image like this:
rd.systemd.pull=verify=no,machine,blockdev,bootorigin,raw:rootdisk:image.raw root=gpt-auto rootflags=x-systemd.device-timeout=infinity ip=any
So far the gpt-auto logic only looked for the partition table of devices
that the ESP/XBOOTLDR partition used to boot was on. This works great
for local boots, but is more problematic if we boot a UKI via UEFI HTTP
boot, because there is no ESP in play in that case.
Let's introduce an alternative to communicate the intended default root
disk to cover for this situation: any loopback block device whose
backing file field (i.e. the userspace controlled freeform field we use
for /dev/disk/by-loop-ref/ naming) is set to "rootdisk" will be consider
for gpt-auto will be consider for gpt-auto.
With this in place we should have nice automatic behaviour:
1. If we are booted locally we'll get the ESP/XBOOTLDR data, and derive
the root disk from that.
2. If we are booted via UEFI HTTP boot we expect that the caller makes
the loopback device appear with the right loop-ref identifier, and
then will use that.
Previously, we'd name the import services numerically. Let's instead use
the local target file name, i.e. the object we are creating with these
services locally. That's useful so that we can robustely order against
these service instances, should we need to one day.
This is useful for bind mounting a freshly downloaded and unpacked tar
disk images to /sysroot to mount into.
Specifically, with a kernel command line like this one:
rd.systemd.pull=verify=no,machine,tar:root:http://_gateway:8081/image.tar root=bind:/run/machines/root ip=any
The first parameter downloads the root image, the second one then binds
it to /sysroot so that we can boot into it.
This is useful in particular in the initrd, as this ensures any
downloaded images are not deleted during the initrd→host transition
(where /var/ does not survive, but /run/ does). Might be useful in other
cases too, for example for transiently deployed confexts and such.
This is useful for booting from a freshly downloaded disk image: just
specify
rd.systemd.pull=verify=no,machine,blockdev,raw:image:https://192.168.100.1:8081/image.raw
root=/dev/disk/by-loop-ref/image.raw-part2
on the kernel command line, and we'll download that in the initrd and boot from it.
(note the above disables download-time verification, putting trust in
verity and image policy that this won#t do harm)
Here's a more complete example. From a git checkout do:
ninja -C build && mkosi -f -T serve
and then from another terminal do within the same checkout:
./build/systemd-vmspawn \
--ram=16G \
--register=no \
-n \
-i ./build/mkosi.output/image.raw \
rd.systemd.pull=verify=no,machine,blockdev,raw:image:http://192.168.100.1:8081/image.raw \
root=/dev/disk/by-loop-ref/image.raw-part2 \
rootflags=x-systemd.device-timeout=infinity \
ip=any
This will then boot via the ESP of the specified image, then download
the image via HTTP from the mkosi instance running in the first
terminal, attach it to a loopback block device, and then use its second
partition as root fs, and boot into it.
(this assumes your host is 192.168.100.1, of course)
Note that downloading the full image takes a bit of time (this downloads
it uncompressed after all), hence we turn off the timeout to wait for
the device.
This also introduces a new "imports.target" unit (and associated
"imports-pre.target") between imports are grouped, and which ensure the
imports actually are ordered correctly both on the host and in the
initrd.
This is mostly just a friendly unit wrapper around "systemd-dissect
--attach".
This is useful so that we can automatically attach disk images as
block device at boot.
This ensures the child process is immediately re-parented to the
manager process which avoids a "Supervising process xxx which is
not our child. We'll most likely not notice when it exits." warning
which can currently happen if the parent systemd-executor parent
process sends the pid namespace child process pidref to the manager
process and the manager process dispatches the child process pidref
before the systemd-executor parent process exits, since at that point
the pid namespace child process's parent will still be the
systemd-executor parent process and not the manager process.
This ensures the child process is immediately re-parented to the
manager process which avoids a "Supervising process xxx which is
not our child. We'll most likely not notice when it exits." warning
which can currently happen if the parent systemd-executor parent
process sends the pid namespace child process pidref to the manager
process and the manager process dispatches the child process pidref
before the systemd-executor parent process exits, since at that point
the pid namespace child process's parent will still be the
systemd-executor parent process and not the manager process.
Let's switch things around, and move the internals of safe_fork_full() into
pidref_safe_fork_full() and make safe_fork_full() a trivial wrapper on top
of pidref_safe_fork_full().
This started out as a simple attempt to fix a race in an existing test
case for io.systemd.Machine.Open(). To address it nicely I added some
machinery to varlinkctl and systemd-notify though, and because of that I
refacting reception of sd_notify() messages in various places of our
codebase. So it became much much bigger.
This ports all receivers of sd_notify() messages over to a new common
implementation, except for one: the one in PID1. It's more powerful than
the others, since it accepts fds too. I think we should generalize
notify_recv() to cover that too, it's not that much more work, but for
now I didn't want to add even more refactorings on top, and this can
easily happen later separately, hence I left it out for now.
While this is obvious if you spend a few minutes thinking about how
D-Bus signals work (in this case, they are broadcast from a system
service, so cannot apply to a specific user/session/seat), it’s a bit
easy to overlook this while putting code together which uses the login1
D-Bus API, so it’s helpful to point this hazard out specifically in the
docs.
The signals can only be emitted on the canonical objects. The
convenience objects are useful for method calls, as the calling context
can be used to dereference ‘self’ and ‘auto’, but this can’t work for
signals.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
Sometimes it's interesting to condition units not just on the
installation but on the physical device. Let's make ConditionHost=
useful for that kind of checks, and while we are at it, also allow it to
be used for condition checks on the boot id.
Overloading like this is safe, since UUIDs are globally unique after
all, and hence there should be no conflicts between the namespace of
boot ids, machine ids and product ids.
Finally, relax rules on uuid checking: if the specified string parses
as uuid or id, also check it against the hostname, for setups where
people name hosts after uuids. I wouldn't know why anyone would do that,
but also, why not? shouldn'rt hurt allowing them and should not create
ambiguity conflicts.