Before calling io.systemd.MachineImage.List.
The systemd-nspawn process takes a lock in the run() function in
nspawn.c and holds it for the entire runtime of that function. If we
call `machinectl terminate` the machine gets unregistered _before_ we
release the lock, so the original `machinectl status` check would return
early, allowing for a race where we call io.systemd.MachineImage.List
over Varlink when systemd-nspawn still holds the lock because the
process is still running.:
```
[ 41.691826] TEST-13-NSPAWN.sh[1102]: + machinectl terminate long-running
[ 41.695009] systemd-nspawn[2171]: Trying to halt container by sending TERM to container PID 1. Send SIGTERM again to trigger immediate termination.
[ 41.698235] systemd-machined[1192]: Machine long-running terminated.
[ 41.709520] TEST-13-NSPAWN.sh[1102]: + systemctl kill --signal=KILL systemd-nspawn@long-running.service
[ 41.709169] systemd-nspawn[2171]: Failed to unregister machine: No machine 'long-running' known
[ 41.720869] TEST-13-NSPAWN.sh[2346]: + varlinkctl --more call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{}'
[ 41.723359] TEST-13-NSPAWN.sh[2347]: + grep long-running
...
[ 41.735453] TEST-13-NSPAWN.sh[2352]: + varlinkctl call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{"name":"long-running", "acquireMetadata": "yes"}'
[ 41.736222] TEST-13-NSPAWN.sh[2353]: + grep OSRelease
[ 41.739500] TEST-13-NSPAWN.sh[2352]: Method call io.systemd.MachineImage.List() failed: Device or resource busy
[ 41.740641] systemd[1]: Received SIGCHLD.
[ 41.740670] systemd[1]: Child 2171 (systemd-nspawn) died (code=killed, status=9/KILL)
[ 41.740725] systemd[1]: systemd-nspawn@long-running.service: Child 2171 belongs to systemd-nspawn@long-running.service.
[ 41.740748] systemd[1]: systemd-nspawn@long-running.service: Main process exited, code=killed, status=9/KILL
[ 41.740755] systemd[1]: systemd-nspawn@long-running.service: Will spawn child (service_enter_stop_post): systemd-nspawn
[ 41.740872] systemd[1]: systemd-nspawn@long-running.service: About to execute: systemd-nspawn --cleanup --machine=long-running
...
```
Let's mitigate this by waiting until the corresponding
systemd-nspawn@.service instance enters the 'inactive' state where the
lock should be properly released.
Resolves: https://github.com/systemd/systemd/issues/39547
It should exit on its own anyway and this will work even if the job has
already finished* (unlike kill).
[*] assuming job control is off, as it's the case when running the
test suite
Resolves: #39543
Before calling io.systemd.MachineImage.List.
The systemd-nspawn process takes a lock in the run() function in
nspawn.c and holds it for the entire runtime of that function. If we
call `machinectl terminate` the machine gets unregistered _before_ we
release the lock, so the original `machinectl status` check would return
early, allowing for a race where we call io.systemd.MachineImage.List
over Varlink when systemd-nspawn still holds the lock because the
process is still running.:
[ 41.691826] TEST-13-NSPAWN.sh[1102]: + machinectl terminate long-running
[ 41.695009] systemd-nspawn[2171]: Trying to halt container by sending TERM to container PID 1. Send SIGTERM again to trigger immediate termination.
[ 41.698235] systemd-machined[1192]: Machine long-running terminated.
[ 41.709520] TEST-13-NSPAWN.sh[1102]: + systemctl kill --signal=KILL systemd-nspawn@long-running.service
[ 41.709169] systemd-nspawn[2171]: Failed to unregister machine: No machine 'long-running' known
[ 41.720869] TEST-13-NSPAWN.sh[2346]: + varlinkctl --more call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{}'
[ 41.723359] TEST-13-NSPAWN.sh[2347]: + grep long-running
...
[ 41.735453] TEST-13-NSPAWN.sh[2352]: + varlinkctl call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{"name":"long-running", "acquireMetadata": "yes"}'
[ 41.736222] TEST-13-NSPAWN.sh[2353]: + grep OSRelease
[ 41.739500] TEST-13-NSPAWN.sh[2352]: Method call io.systemd.MachineImage.List() failed: Device or resource busy
[ 41.740641] systemd[1]: Received SIGCHLD.
[ 41.740670] systemd[1]: Child 2171 (systemd-nspawn) died (code=killed, status=9/KILL)
[ 41.740725] systemd[1]: systemd-nspawn@long-running.service: Child 2171 belongs to systemd-nspawn@long-running.service.
[ 41.740748] systemd[1]: systemd-nspawn@long-running.service: Main process exited, code=killed, status=9/KILL
[ 41.740755] systemd[1]: systemd-nspawn@long-running.service: Will spawn child (service_enter_stop_post): systemd-nspawn
[ 41.740872] systemd[1]: systemd-nspawn@long-running.service: About to execute: systemd-nspawn --cleanup --machine=long-running
...
Let's mitigate this by waiting until the corresponding
systemd-nspawn@.service instance enters the 'inactive' state where the
lock should be properly released.
Resolves: #39547
There might be a delay between an umount and a refcounted device
to disappear, so the test can be flaky:
[ 36.107128] TEST-50-DISSECT.sh[1662]: ++ dmsetup ls
[ 36.108314] TEST-50-DISSECT.sh[1663]: ++ grep loop
[ 36.109283] TEST-50-DISSECT.sh[1664]: ++ grep -c verity
[ 36.110284] TEST-50-DISSECT.sh[1360]: + test 1 -eq 1
[ 36.111555] TEST-50-DISSECT.sh[1360]: + umount -R /tmp/TEST-50-IMAGES.hxm/mount
[ 36.112237] TEST-50-DISSECT.sh[1668]: ++ dmsetup ls
[ 36.113039] TEST-50-DISSECT.sh[1669]: ++ grep loop
[ 36.113833] TEST-50-DISSECT.sh[1670]: ++ grep -c verity
[ 36.114517] TEST-50-DISSECT.sh[1360]: + test 0 -eq 1
[ 36.116734] TEST-50-DISSECT.sh[1000]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-50-DISSECT.dissect.sh failed'
https://github.com/systemd/systemd/actions/runs/19062162467/job/54444112653?pr=39540#logs
Switch to searching for the dm entry and check for it specifically,
and wait for it to disappear before checking that it is no longer
in the dm table.
Follow-up for 10fc43e504
TEST-07-PID.user-namespace-path.sh is flaky as Type=simple is used
(implicitly), explicitly use Type=exec instead to ensure the namespaces
are created before starting another service reusing the same namespaces.
Fixes#39546.
Both sysext and confext used the host's /etc/initrd-release file even
when --root=/somewhere was specified. A workaround was the
SYSTEMD_IN_INITRD= env var but without knowing this it was quite
confusing. Aside from users validating their extensions, the primary
use case for this to matter is when the extensions are set up from the
initrd where the initrd-release file is present when running but we want
to prepare the extensions for the final system and thus should match
for the right scope.
Make systemd-sysext check for /etc/initrd-release inside the given
--root= tree. An alternative would be to always ignore the
initrd-release check when --root= is passed but this way it is more
consistent. The image policy logic for EFI-loader-passed extensions
won't take effect when --root= is used, though.
The last sysext test leaked things into new tests added later,
uncovered by any new tests leftover check.
Remove the mutable folder state through a trap as done in other tests.
This allows a service to reuse the user namespace created for an
existing service, similarly to NetworkNamespacePath=. The configuration
is the initial user namespace (e.g. ID mapping) is preserved.
I recently found out (the hard way) that on an older version
there was a bug when the verity sharing is disabled: the
deferred close flag was not set correctly, so verity devices
were leaked.
This is not an issue in main currently, but add a test case
to cover it just in case, to avoid future regressions.
RootDirectory= but via a open_tree() file descriptor. This allows
setting up the execution environment for a service by the client in a
mount namespace and then starting a transient unit in that execution
environment using the new property.
We also add --root-directory= and --same-root-dir= to systemd-run to
have it run services within the given root directory. As systemd-run
might be invoked from a different mount namespace than what systemd is
running in, systemd-run opens the given path with open_tree() and then
sends it to systemd using the new RootDirectoryFileDescriptor= property.
RootDirectory= but via a open_tree() file descriptor. This allows
setting up the execution environment for a service by the client in
a mount namespace and then starting a transient unit in that execution
environment using the new property.
We also add --root-directory= and --same-root-dir= to systemd-run to
have it run services within the given root directory. As systemd-run
might be invoked from a different mount namespace than what systemd is
running in, systemd-run opens the given path with open_tree() and then
sends it to systemd using the new RootDirectoryFileDescriptor= property.
--empower gives full privileges to a non-root user. Currently this
includes all capabilities but we leave the option open to add more
privileges via this option in the future.
Why is this useful? When running privileged development or debugging
commands from your home directory (think bpftrace, strace and such),
you want any files written by these tools to be owned by your current
user, and not by the root user. run0 --empower will allow you to run
all privileged operations (assuming the tools check for capabilities
and not UIDs), while any files written by the tools will still be owned
by the current user.
As 'systemctl stop' is called with --no-block, previously systemd-resolved
might not be stopped when 'resolvectl' is called, and the DBus connection
might be closed during the call:
```
TEST-07-PID1.sh[5643]: + systemctl stop --no-block systemd-resolved.service
TEST-07-PID1.sh[5643]: + resolvectl
TEST-07-PID1.sh[5732]: Failed to get global data: Remote peer disconnected
```
Follow-up for 8eefd0f4de.
Fixes https://github.com/systemd/systemd/pull/39388#issuecomment-3439277442.
As the modified service requires about ~10 seconds for stopping, the
service never hit the start limit even if we tried to restart the
service more than 5 times.
This also checks that the service is actually triggered by dbus method
call.
Follow-up for 8eefd0f4de.
Otherwise, e.g. requesting to start a unit that is under stopping may
enter the failed state.
This makes
- rename .can_start() -> .test_startable(), and make it allow to return
boolean and refuse to start units when it returns false,
- refuse earlier to start units that are in the deactivating state, so
several redundant conditions in .start() can be dropped,
- move checks for unit states mapped to UNIT_ACTIVATING from .start() to
.test_startable().
Fixes#39247.
The process forked off by `systemd-notify --fork` is not a child of the
current shell, so using `wait` doesn't work. This then later causes a
race, when the test occasionally fails because it attempts to start a
new systemd-socket-activate instance before the old one is completely
gone:
[ 1488.947744] TEST-74-AUX-UTILS.sh[1938]: Child 1947 died with code 0
[ 1488.947952] TEST-74-AUX-UTILS.sh[1933]: + assert_eq hello hello
[ 1488.949716] TEST-74-AUX-UTILS.sh[1948]: + set +ex
[ 1488.950112] TEST-74-AUX-UTILS.sh[1950]: ++ cat /proc/1938/comm
[ 1488.945555] systemd[1]: Started systemd-networkd.service - Network Management.
[ 1488.950365] TEST-74-AUX-UTILS.sh[1933]: + assert_in systemd-socket systemd-socket-
[ 1488.950563] TEST-74-AUX-UTILS.sh[1951]: + set +ex
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + kill 1938
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + wait 1938
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: .//usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh: line 14: wait: pid 1938 is not a child of this shell
[ 1488.950766] TEST-74-AUX-UTILS.sh[1933]: + :
[ 1488.951486] TEST-74-AUX-UTILS.sh[1952]: ++ systemd-notify --fork -- systemd-socket-activate -l 1234 --now socat ACCEPT-FD:3 PIPE
[ 1488.952222] TEST-74-AUX-UTILS.sh[1953]: Failed to listen on [::]🔢 Address already in use
[ 1488.952222] TEST-74-AUX-UTILS.sh[1953]: Failed to open '1234': Address already in use
[ 1488.956831] TEST-74-AUX-UTILS.sh[1933]: + PID=1953
[ 1488.957078] TEST-74-AUX-UTILS.sh[102]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh failed'
[ 1488.957078] TEST-74-AUX-UTILS.sh[102]: Subtest /usr/lib/systemd/tests/testdata/units/TEST-74-AUX-UTILS.socket-activate.sh failed
This is useful when we start to call mountfsd from root, for example
from the tests where we just use a simple squashfs/erofs.
Note that this requires the caller to be root, and it will be rejected
otherwise, as such images are classified as 'unprotected' and the
enforced policy does not accept them for unprivileged users.
Fixes the following warning:
TEST-75-RESOLVED.sh[2251]: ++ restart_resolved
TEST-75-RESOLVED.sh[2251]: ++ systemctl stop systemd-resolved.service
TEST-75-RESOLVED.sh[2271]: Stopping 'systemd-resolved.service', but its triggering units are still active:
TEST-75-RESOLVED.sh[2271]: systemd-resolved-monitor.socket, systemd-resolved-varlink.socket
This test occasionally fails due to a race where systemd processes
kernel's SIGKILL before the OOM notification, so the test service dies
with Result=signal instead of the expected Result=oom-kill:
[ 51.008765] TEST-55-OOMD.sh[906]: + systemd-run --wait --unit oom-kill -p OOMPolicy=kill -p Delegate=yes -p DelegateSubgroup=init.scope /tmp/script.sh
[ 51.048747] TEST-55-OOMD.sh[907]: Running as unit: oom-kill.service; invocation ID: 456645347d554ea2878463404b181bd8
[ 51.066296] sysrq: Manual OOM execution
[ 51.066596] kworker/1:0 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=-1, oom_score_adj=0
[ 51.066915] CPU: 1 UID: 0 PID: 27 Comm: kworker/1:0 Not tainted 6.17.1-arch1-1 #1 PREEMPT(full) d2b229857b2eb4001337041f41d3c4f131433540
[ 51.066919] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.17.0-2-2 04/01/2014
[ 51.066921] Workqueue: events moom_callback
[ 51.066928] Call Trace:
[ 51.066931] <TASK>
[ 51.066936] dump_stack_lvl+0x5d/0x80
[ 51.066942] dump_header+0x43/0x1aa
<...snip...>
[ 51.087814] 47583 pages reserved
[ 51.087969] 0 pages cma reserved
[ 51.088208] 0 pages hwpoisoned
[ 51.088519] Out of memory: Killed process 908 (sleep) total-vm:3264kB, anon-rss:256kB, file-rss:1916kB, shmem-rss:0kB, UID:0 pgtables:44kB oom_score_adj:1000
[ 51.090263] TEST-55-OOMD.sh[907]: Finished with result: signal
[ 51.094416] TEST-55-OOMD.sh[907]: Main processes terminated with: code=killed, status=9/KILL
[ 51.094898] TEST-55-OOMD.sh[907]: Service runtime: 58ms
[ 51.095436] TEST-55-OOMD.sh[907]: CPU time consumed: 22ms
[ 51.095854] TEST-55-OOMD.sh[907]: Memory peak: 1.6M (swap: 0B)
[ 51.096722] TEST-55-OOMD.sh[912]: ++ systemctl show oom-kill -P Result
[ 51.106549] TEST-55-OOMD.sh[879]: + assert_eq signal oom-kill
[ 51.107394] TEST-55-OOMD.sh[913]: + set +ex
[ 51.108256] TEST-55-OOMD.sh[913]: FAIL: expected: 'oom-kill' actual: 'signal'
[FAILED] Failed to start TEST-55-OOMD.service.
To mitigate this, let's spawn a child process and move it to the
subcgroup to get killed instead of the main process, so systemd has more
time to react to the OOM notification and terminate the service with the
expected oom-kill result.
Needed to implement support for RootHashSignature=/RootVerity=/RootHash=
and friends when going through mountfsd, for example with user units,
so that system and user units provide the same features at the same
level
RootDirectory= and other options already implicitly enable PrivateUsers=
since 6ef721cbc7 if they are set in user
units, so that they can work out of the box.
Now with mountfsd support we can do the same for the images settings,
so enable them and document them.
It looks like the 4 second sleep might not be enough on some slower
machines (like the ARM GH Actions nodes) which can lead to the DS RRs
propagation to clash with the manual test zone edit, and the
signed.test zone then might end up not properly signed:
TEST-75-RESOLVED.sh[749]: + : '--- ZONE: signed.test (static DNSSEC) ---'
TEST-75-RESOLVED.sh[749]: + run_delv @ns1.unsigned.test signed.test
TEST-75-RESOLVED.sh[749]: + run delv -a /etc/bind.keys @ns1.unsigned.test signed.test
TEST-75-RESOLVED.sh[778]: + delv -a /etc/bind.keys @ns1.unsigned.test signed.test
TEST-75-RESOLVED.sh[779]: + tee /tmp/tmp.2KOIiyrgth
TEST-75-RESOLVED.sh[779]: ;; /etc/bind.keys:1: option 'managed-keys' is deprecated
TEST-75-RESOLVED.sh[779]: ;; validating signed.test/DS: no valid signature found
TEST-75-RESOLVED.sh[779]: ;; validating signed.test/A: no valid signature found
TEST-75-RESOLVED.sh[779]: ; unsigned answer
TEST-75-RESOLVED.sh[779]: signed.test. 86400 IN A 10.0.0.10
TEST-75-RESOLVED.sh[779]: signed.test. 86400 IN RRSIG A 13 2 86400 20251028114356 20251014101356 39330 signed.test. oo3ca8WPusbBPRhzsEKw3bsBBqFtI8i4bckoMVNzt7lY+udGW6PlaSYj OjpQGgY9oglowVM9bteNtwJKHUbvtw==
TEST-75-RESOLVED.sh[749]: + grep -qF '; fully validated' /tmp/tmp.2KOIiyrgth
[FAILED] Failed to start TEST-75-RESOLVED.service - TEST-75-RESOLVED.
Let's explicitly wait for the DS records propagation to finish before we
start editing the test zone to avoid this.
I'm still not completely sure if this is the root cause, but it's the
best shot I currently have, so I'll let the CIs decide.