With 430cc5d3ab,
the value of GENHD_FL_NO_PART, previously named as GENHD_FL_NO_PART_SCAN,
is changed from 0x0200 to 0x0004. So, we need to check both flags.
Follow-up for 14631951ce
Before this commit, if WorkingDirectory= is empty or literally "-",
'simplified' is not populated, resulting in the ASSERT_PTR
in unit_write_settingf() below getting triggered.
Also, do not accept "-", so that the parser is consistent
with load-fragment.c
Fixes#33015
If the runtime journal is opened, we will anyway write journal entries
to the runtime journal, even if the persistent journal is writable.
Hence, we need to flush the runtime journal file later.
If an initrd has an empty or uninitialized /etc/machine-id file,
then PID1 write a valid machine ID. So, the logic is important only on
soft-reboot. Let's mention that explicitly.
Follow-up for 16718dcf78.
This effectively reverts ba540e9f1c.
https://github.com/systemd/systemd/pull/32915#discussion_r1608258136
> In many cases we allow --root=/ as a mechanism for forcing an "offline" mode,
> while still operating on the root dir. if we do the getenv_for_pid() thing
> below I'd claim this is very much an "online" operation, and hence --root=/
> should really disable that.
Before:
/etc/kernel/install.conf:6: Unknown key name 'asdf' in section '(null)', ignoring.
After:
/etc/kernel/install.conf:6: Unknown key 'asdf', ignoring.
Also make the message a bit better.
If both literal and signed PCR bindings are not used then we won't
determine a PCR bank to use, and hence we shouldnt attempt to serialize
it either.
Hence, if the bank is zero, skip serialization.
(And while we are at it, also skip serialization of the primary
algorithm if not set, purely to make things systematic).
[This effectively results in little change, as previously we'd then
seralize a json "null", while now we simply won't genreate the field]
We so far derived the PCR bank to use from the PCR values specified fr
literal PCR binding. However, when that's not used then we left the bank
uninitialized – which will break if signed PCR binds are used (where we
need to pick a bank too after all).
Hence, let's explicitly pick a bank to use if literal PCR values are not
used, to make things just work.
Fixes: #32946
In varlink.c we generally do not make failing callback functions fatal,
since that should be up to the app. Hence, in case of varlinkctl (where
we want failures to be fatal), make sure to propagate the error back
explicitly.
Before this change a failing call to "varlinkctl --more call …" would result in
a zero exit code. With this it will correctly exit with a non-zero exit
code.
When running in LXC with AppArmor we'll most likely get an error when creating
a network namespace due to a kernel regression in < v6.2 affecting AppArmor,
resulting in denials. Like other tests, avoid failing in case of permission
issues and handle it gracefully.
As per the documentation, EACCES is only returned when F_SETLK is
used, and only on some platforms, which doesn't seem to include
Linux:
https://github.com/torvalds/linux/blob/master/fs/locks.c
F_OFD_SETLK is documented to only return EAGAIN, and F_SETLKW/F_OFD_SETLKW
are blocking operations so this logic doesn't apply to them in the
first place.
Hence, only automatically convert EACCES into EAGAIN for F_SETLK
operations, and propagate the original error in the other cases.
This is important because in some cases we catch permission errors
and gracefully fallback, which is not possible if the original error
is lost.
This is an issue in practice because, due to a kernel bug present
before v6.2, AppArmor denies locking on file descriptors to LXC
containers. We support all currently maintained LTS kernels,
including v6.1, where despite a lot of effort and attempts over almost
a year, the bugfix still hasn't been backported, as it is complex and
requires large changes to AppArmor.
On affected kernels, all services running with PrivateNetwork=yes
fail and do not recover, instead of the normal behaviour of gracefully
downgrading to PrivateNetwork=no.
The integration tests in the Debian CI fail due to this issue:
https://ci.debian.net/packages/s/systemd/testing/arm64/46828037/
Coverity gets confused since the iterator change, so add an
assert to indicate that this is allocated if n_old_groups is > 0
CID#1545922
Follow-up for 125cca1b51
Follow-up for ade0789fab
The change in behavior was partly intentional, as I think
if both --wait and --pty are used, manually disconnecting
from PTY forwarder should not result in systemd-run exiting
with "Finished with ..." log. But we should check for
--wait here.
Closes#32953
Fixes https://github.com/systemd/systemd/issues/28514.
Quoting https://github.com/systemd/systemd/issues/28514#issuecomment-1831781486:
> Whenever PAM is enabled for a service, we set up the PAM session and then
> fork off a process whose only job is to eventually close the PAM session when
> the service dies. That services we run with service privileges, both to
> minimize attack surface and because we want to use PR_SET_DEATHSIG to be get
> a notification via signal whenever the main process dies. But that only works
> if we have the same credentials as that main process.
>
> Now, if pam_systemd runs inside the PAM stack (which it normally does) it's
> session close hook will ask logind to synchronously end the session via a bus
> call. Currently that call is not accessible to unprivileged clients. And
> that's the part we need to relax: allow users to end their own sessions.
The check is implemented in a way that allows the kill if the sender is in
the target session.
I found 'sudo systemctl --user -M "zbyszek@" is-system-running' to
be a convenient reproducer.
Before:
May 16 16:25:26 x1c systemd[1]: run-u24754.service: Deactivated successfully.
May 16 16:25:26 x1c dbus-broker[1489]: A security policy denied :1.24757 to send method call /org/freedesktop/login1:org.freedesktop.login1.Manager.ReleaseSession to org.freedesktop.login1.
May 16 16:25:26 x1c (sd-pam)[3036470]: pam_systemd(login:session): Failed to release session: Access denied
May 16 16:25:26 x1c systemd[1]: Stopping session-114.scope...
May 16 16:25:26 x1c systemd[1]: session-114.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd[1]: Stopped session-114.scope.
May 16 16:25:26 x1c systemd[1]: session-c151.scope: Deactivated successfully.
May 16 16:25:26 x1c systemd-logind[1513]: Session c151 logged out. Waiting for processes to exit.
May 16 16:25:26 x1c systemd-logind[1513]: Removed session c151.
After:
May 16 17:02:15 x1c systemd[1]: run-u24770.service: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopping session-115.scope...
May 16 17:02:15 x1c systemd[1]: session-c153.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: session-115.scope: Deactivated successfully.
May 16 17:02:15 x1c systemd[1]: Stopped session-115.scope.
May 16 17:02:15 x1c systemd-logind[1513]: Session c153 logged out. Waiting for processes to exit.
May 16 17:02:15 x1c systemd-logind[1513]: Removed session c153.
Edit: this seems to also fix https://github.com/systemd/systemd/issues/8598.
It seems that with the call to ReleaseSession, we wait for the pam session
close hooks to finish. I inserted a 'sleep(10)' after the call to ReleaseSession
in pam_systemd, and things block on that, nothing is killed prematurely.
We have the following timestamp status:
$ systemctl show systemd-fsck-root.service | grep InactiveExitTimestamp
InactiveExitTimestamp=Thu 2023-11-02 12:27:24 CET
InactiveExitTimestampMonotonic=15143158
$ systemctl show | grep UserspaceTimestamp
UserspaceTimestamp=Thu 2023-11-02 12:27:25 CET
UserspaceTimestampMonotonic=15804273
i.e. UserspaceTimestamp is before InactiveExit of systemd-fsck-root.service.
This is fine, but on display, we'd subtract those values and print a huge
negative value bogusly:
$ build/systemd-analyze critical-chain systemd-remount-fs.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
systemd-remount-fs.service +137ms
└─systemd-fsck-root.service @584542y 2w 2d 20h 1min 48.890s +45ms
└─systemd-journald.socket
└─system.slice
└─-.slice
In fact, list_dependencies_print() already had a branch where the check that
'times->activating > boot->userspace_time', but it didn't cover all cases. So
make it cover both branches, and also change to '>=', since it's fine if
something happened with the same timestamp.
With the patch:
$ build/systemd-analyze critical-chain systemd-remount-fs.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
systemd-remount-fs.service +42ms
└─systemd-fsck-root.service
└─systemd-journald.socket
└─system.slice
└─-.slice
Fixes https://github.com/systemd/systemd/issues/17191.
When running inside an LXC container the 'su' process will not be part of
any unit or slice.
manager_get_user_by_pid() which was used until v255 (included) does not fail
if it cannot find a unit/slice, but simply returns 'not found'. Do the same
in manager_get_session_by_pidref().
This was not detected as Semaphore CI does not reboot the testbed before
the logind test, so the session is started by the old logind from the base
distro, instead of the one being tested.
Follow-up for 8494f562c8
Follow-up for 5099a50d43
Fixes https://github.com/systemd/systemd/issues/32929