systemd-coredump sandbox already has ProtectSystem=strict hence all non
API filesystems are made read-only, thus RestrictSUIDSGID= doesn't buy
us much.
On top of that systemd-coredump's EnterNamespace= feature requires
openat2() to work correctly and that is implicitly blocked by
RestrictSUIDSGID=.
Follow-up for 8f8148cb08
This combines nicely with the X_SYSTEMD_UNIT_INACTIVE= notification
we send out, to ensure when all sshd units go down the actual
status is always reflected on the target.
We already check existences of quotaon in quotaon@.service and
quotacheck in systemd-quotacheck@.service.
Let's also check if kmod command exists.
Closes#38179.
Systemd-stub supports loading addons, credentials, system and
configuration
extensions from ESP and while addons and credentials can be both global
and
per-UKI, sysext/confext are only per-UKI.
Add support for global sysext/confext to systemd-stub/systemd-sysext.
Fixes#37993
vmspawn systems might take quite a while to boot in particular if they
go through uefi and wait for a network lease. Hence let's increase the
start timeout to 2min (from 45s). We'll do that for both nspawn and
vmspawn, even though the UEFI thing certainly doesn't apply there (but
the DHCP thing still does).
Load global sysext/confext from /.extra/global_{sysext,confext} which
systemd-stub puts there from ESP/loader/credentials/*.{sysext,confext}.raw.
Global extensions are handled the exact same way as per-UKI ones.
To implement --bind-user in systemd-vmspawn, we need a transient
version of these credentials. These are useful when the home directory
of the user is mounted into the container/vm and every trace of the user
will be (mostly) gone again when the container/vm is shut down.
closes#37602, see there for extra motivation and considered
alternatives.
On typical systems, only few services need to create SUID/SGID files.
This often is limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a
default to globally restrict creation of suid/sgid files makes it easier
to apply this restriction precisely.
## testing done
- built on aarch64-linux and x86_64-linux
- ran a VM test on x86_64-linux, checking for:
- VM system boots successfully
- defaults apply (both `yes`, `no`, and undefined)
- systemd tmpfiles can set suid/sgid on journal log path
- Other services explicitly defining `RestrictSUIDSGID=no` can create
suid files
This is very similar to 327cd2d3db:
If emergency.target is started while initrd-cleanup.service/start is queued,
the initrd-cleanup job did not get canceled. In parallel to the emergency
units, it eventually runs the service, which in turn isolates and starts
initrd-switch-root.target. This stops the emergency units and effectively
starts the initrd boot process again, which likely fails again like the
initial attempt. The system is thus stuck in a loop, never really reaching
emergency.target.
This can be triggered if a service in between initrd-parse-etc.service
and initrd.target fails.
With this conflict added, starting emergency.target automatically cancels
initrd-cleanup.service/start, avoiding the loop.
ssh-generator: generate /etc/issue.d/ with VSOCK ssh info data
I find myself trying to log into a fresh ParticleOS VM started via
systemd-vmspawn all the time, but I don't know its CID. Let's show it on
the getty screen, to make it immediately visible.
By default agetty will not display /run/issue.d/ if /etc/issue exists.
This is quite unfortunate and has actually been fixed upstream in:
508fb0e7ac
However, no release has been tagged with this yet, and it doesn't look
like this will happen any time soon. Hence, for now, let's add a
work-around and manually override the issue files to include.
This should be reverted once a new util-linux/agetty release has been
tagged, and found its way into the relevant distributions. Given this is
mostly about cosmetics we do not have to precisely sync the package
updates on this, but only roughly.
Refer to d766c75acd for the rationale
behind the udevd change.
systemd-journald.service conflicts with soft-reboot.target,
so make sure anything surviving soft-reboot and trying
to log to journal doesn't fail the socket units.
This partially reverts d766c75acd.
The offending commit tries to block systemd-udevd.service
from being activated during switch-root, but it is a dirty hack
and causes problems with e.g. Ctrl-Alt-Delete handling which
actually need to start a conflicting target. Let's revert
this here, and the original issue will be resolved in a cleaner
fashion in later commits.
[1] says:
> Since 0.60.0 the name argument is optional and defaults to the basename of
> the first output
We specify >= 0.62 as the supported version, so drop the duplicate name in all cases
where it is the same as outputs[0], i.e. almost all cases.
[1] https://mesonbuild.com/Reference-manual_functions.html#custom_target
We need access to /dev/net/tun, hence make sure we can actually see
/dev/. Also make sure the module is properly loaded before we operate,
given that we run with limit caps. But then again give the CAP_NET_ADMIN
cap, since we need to configure the network tap/tun devices.
Follow-up for: 1365034727
This also makes initrd-cleanup.service explicitly start
initrd-switch-root.service with replace-irreversibly mode, to avoid
systemd-udevd.service being triggered by kernel events and the start
job of initrd-switch-root.service being cancelled.
Follow-ups for 676fb42aae.
Addresses https://github.com/systemd/systemd/pull/37374#issuecomment-2875990471.
The Synchronize() function is just too useful for clients, so that we
can make "systemd-run -v --user" actually useful. Hence let's make the
socket accessible without privs. Deny most method calls however, except
for the Synchronize() call.
You can configure the sysfail boot entry using the bootctl command:
$ bootctl set-sysfail sysfail.conf
The value will be stored in the `LoaderEntrySysFail` EFI variable.
The `LoaderEntrySysFail` EFI variable would be unset automatically
during next boot by `systemd-boot-clear-sysfail.service` if no
system failure occured, otherwise it would be kept as it is and a system
failure reason will be saved to `LoaderSysFailReason` EFI variable.
Signed-off-by: Igor Opaniuk <igor.opaniuk@foundries.io>
Otherwise, initrd-cleanup.service requests isolation thus the socket
is stopped before switching root, and several early events after
switching root may be lost.
We usually don't care, but here the existence of socket
is public API to a certain degree and signals availability
of the service (userdbd in particular, oomd is checked in
core-varlink.c). Hence let's be more careful and remove them
if stopped.
The current arrangement of service and socket units is
sort of all over the place. Let's clean it up a little,
roughly following the principles below:
- socket units have implicit ordering deps (not to be confused
with default ones which are subject to DefaultDependencies=)
before associated service, so drop any explicit After=
- If socket can be enabled, remember to link to it in service
via Also= and Sockets= (the latter replaces Wants=).
If the service Requires= socket however, Sockets= is omitted.
- If socket is statically enabled, no need for service
to pull it in - machined
Add two new socket units, one for each of systemd-resolved's varlink
servers:
systemd-resolved-varlink.socket
systemd-resolved-monitor.socket
Add logic to grab socket fds via sd_varlink_server_listen_name(), but
fallback to the existing sd_varlink_server_listen_address() calls if no
fds were given.
This will be used to make systemd-networkd-wait-online --dns more robust
against systemd-resolved restarts etc.
Delegation is enabled for udev so that it can mess around with the
cgroup hierarchy to avoid killing control processes when it calls
cg_kill in on_post() when it goes idle. We don't actually care about
any specific cgroup controllers in udev, so set Delegate= to enable
delegation without delegating any controllers
Follow up for https://github.com/systemd/systemd/pull/22752
This new tool looks for a three xattr on the root inode of a file system
that encode mount constraints of the file system. The tool is supposed
to be hooke into the mount logic and is supposed to protect against
misappropriating trusted file systems in unintended ways.
Consider the following scenario: we boot up on first boot and create a
tpm-locked pair of /var/ and /srv/ partitions via systemd-repart. An
attacker then offline modifies the partition table, exchanging the
metadata of the /var/ and /srv/ partition. So far we'd happily accept
that, honour the modified metadata and boot up. This could be used to
revert changes to /var/ or similar. And all that even though both
partitions are encrypted and locked to TPM!
With this new mechanism we can encode in the protected contents of the
file systems the ways it can be used: the partition type uuid, the
partition label and the intended mount point can be stored in xattrs,
and we can check them automatically on mount, and take action on
mismatch. (action would typically be immediate reboot).
PCR extensions are supposed to be useful for "destroying" the ability to
access TPM bound secrets. Hence, if for some reason we fail to extend a
PCR, it's safer to just reboot, instead of going on without the
extension, leaving secrets potentially accessible which should not be
accessible.
Note that the services exit gracefully if no TPM is found, hence this
should not be triggered on TPM-less systems. However, this enforces that
if there is a TPM that is accessible to Linux and that works properly,
the PCR measurement must complete too.
Inspired by this thread:
https://lists.freedesktop.org/archives/systemd-devel/2025-March/051244.html