This also ports over things to use chase() to create/pin the underlying
to mount, and in particular checks that the path does not contain any
symlinks. That's crucial since we cannot allow mounts to be established
with that, since it would mean we couldn't recognize the entries in
/proc/self/mountinfo anymore.
09fbff57fc introduced new knob
for such functionality. However, that seems unnecessary.
The mount option string is ubiquitous in that all of fstab,
kernel cmdline, credentials, systemd-mount, ... speak it.
And we already have x-systemd.device-bound= that's parsed
by pid1 instead of fstab-generator. It feels hence more natural
for graceful options to be an extension of that, rather than
its own property.
There's also one nice side effect that the setting itself
is now more graceful for systemd versions not supporting
such feature.
Follow-up for a1d315730f
and 6ac62d61db
With the aforementioned commits, unit_release_resources()
is dispatched in a dedicated queue, and Service.n_keep_fd_store
has been dropped, hence the comment is outdated. Moreover,
the unit is added to GC queue in unit_notify() already.
No other unit types do this in corresponding _enter_dead()
functions, nor does Service need it anymore.
I.e. don't perform any action if we can't spawn mount task anyway.
Later the same check would be added to mount_can_start/reload(),
so this makes things more coherent too.
There are cases where templated Socket unit files are used for network services
with interface name used as an instance. This patch allows using %i for
BindToDevice setting to limit the scope automatically.
Some bind mounts, e.g. /tmp bind mount when PrivateTmp=disconnected,
must be explicitly relabeled because now it would have incorrect SELinux
label. /tmp is expected to have well-known SELinux label, tmp_t. Now it
has label inherited from the source directory of the bind mount.
This follows the same recent change in util-linux:
https://github.com/util-linux/util-linux/pull/3354
i.e. we generally want that PAM modules can override $HOME and it is
honoured for the CWD after login.
(This renames the 'home' variable we maintained sofar to 'pwent_home',
to clarify that it's the home directory listed in the struct passwd
entry, and thus not necessarily the one actually used)
In clock_apply_epoch() function, the /usr/lib/clock-epoch timestamp was set to timesyncd_usec instead of epoch_usec and vice-versa which produced a misleading log message about the clock source systemd used for early clock sanitization. This trivial commit fix the mistake.
This introduces libmount_parse_mountinfo() and libmount_parse_with_utab().
The former one parses only mountinfo, but the latter one also parse
utab. Hopefully this avoids pitfalls like issue #35949.
systemctl has a --job-mode= argument, and adding the same argument to
systemd-run is useful for starting transient scopes with dependencies.
For example, if a transient scope BindsTo a service that is stopping,
specifying --job-mode=replace will wait for the service to stop before
starting it again, while the default job mode of "fail" will cause the
systemd-run invocation to fail.
Let consider the following udev rules:
```
PROGRAM="/usr/bin/systemd-escape foo-bar-baz", ENV{SYSTEMD_WANTS}+="test1@$result.service"
PROGRAM="/usr/bin/systemd-escape aaa-bbb-ccc", ENV{SYSTEMD_WANTS}+="test2@$result.service"
```
Then, a device expectedly gains a property:
```
SYSTEMD_WANTS=test1@foo\x2dbar\x2dbaz.service test2@aaa\x2dbbb\x2dccc.service
```
After the event being processed by udevd, PID1 processes the device, the
property previously was parsed with
`extract_first_word(EXTRACT_UNQUOTE)`, then the device unit gained the
following dependencies:
```
Wants=test1@foox2dbarx2dbaz.servicetest2@aaax2dbbbx2dccc.service
```
So both `%i` and `%I` for the template services did not match with the
original data, and it was hard to use `systemd-escape` in `PROGRAM=`
udev rule token.
This makes the property parsed with
`extract_first_word(EXTRACT_UNQUOTE|EXTRACT_RETAIN_ESCAPE)`, hence the
device unit now gains the following dependencies:
```
Wants=test1@foo\x2dbar\x2dbaz.service test2@aaa\x2dbbb\x2dccc.service
```
and `%I` for the template services match with the original data.
Fixes a bug caused by ceed8f0c8b (v233).
Fixes#16735.
Replaces #16737 and #35768.
Follow-up for 656bbffc6c
The commit reworked job merging logic so that reload jobs
won't get merged. However, they might get dropped from
transaction due to being deemed redundant, i.e. way before
it even hits job_install(). Let's make sure reload jobs
are always kept during transaction construction stage, too.
Let consider the following udev rules:
===
PROGRAM="/usr/bin/systemd-escape foo-bar-baz", ENV{SYSTEMD_WANTS}+="test1@$result.service"
PROGRAM="/usr/bin/systemd-escape aaa-bbb-ccc", ENV{SYSTEMD_WANTS}+="test2@$result.service"
===
Then, a device expectedly gains a property:
===
SYSTEMD_WANTS=test1@foo\x2dbar\x2dbaz.service test2@aaa\x2dbbb\x2dccc.service
===
After the event being processed by udevd, PID1 processes the device, the
property previously was parsed with extract_first_word(EXTRACT_UNQUOTE),
then the device unit gained the following dependencies:
===
Wants=test1@foox2dbarx2dbaz.servicetest2@aaax2dbbbx2dccc.service
===
So both '%i' and '%I' for the template services did not match with the original
data, and it was hard to use systemd-escape in PROGRAM= udev rule token.
This makes the property parsed with extract_first_word(EXTRACT_UNQUOTE|EXTRACT_RETAIN_ESCAPE),
hence the device unit now gains the following dependencies:
===
Wants=test1@foo\x2dbar\x2dbaz.service test2@aaa\x2dbbb\x2dccc.service
===
and '%I' for the template services match with the original data.
Fixes a bug caused by ceed8f0c8b (v233).
Fixes#16735.
Replaces #16737 and #35768.
Note that this drops a lot of "const" qualifiers on PidRef arguments.
That's because pidref_is_self() suddenly might end changing the PidRef
because it acquires the pidfd ID.
We had this previously already with pidfd_equal(), but this amplifies
the problem.
I guess we C's "const" doesn't really work for stuff that contains
caches, that is just conceptually constant, but not actually.
Given we have the generic interface, let's hook it up everywhere.
This doesnt bother with the Reload() call usually, since that's more
involved, but hooks up the other relevant functions where applicable.
This new setting can be used to specify mount options that shall only be
added to the mount option string if the kernel supports them.
This shall be used for adding "usrquota" to tmp.mount without breaking compat,
but is generally be useful.
Background: Fedora/RHEL are switching to sysusers.d metadata for
creation of users and groups for system users defined by packages
(https://fedoraproject.org/wiki/Changes/RPMSuportForSystemdSysusers).
Packages carry sysusers files. During package installation, rpm calls an
program to execute on this config. This program may either be
/usr/lib/rpm/sysusers.sh which calls useradd/groupadd, or
/usr/bin/systemd-sysusers. To match the functionality provided by
useradd/groupadd from the shadow-utils project, systemd-sysusers must
emit audit events so that it provides a drop-in replacement.
systemd-sysuers will emit audit events AUDIT_ADD_USER/AUDIT_ADD_GROUP
when adding users and groups. The operation "names" are copied from
shadow-utils, so the format of the events that is generated on success
should be identical. On failure, things are more complicated. We write
the whole file at once, once, so we first generate "success" messages
for each entry, then we try to write the files, and if things fail, we
generate failure messages to all entries that we failed to write.
In containers securityfs is typically not mounted. Our lsm-bpf code
so far detected this situation and claimed the kernel was lacking
lsm-bpf support. Which isn't quite true though, it might very well
support it. This made boots of systemd in systemd-nspawn a bit ugly,
because of the misleading log message at boot.
Let's improve things, and make clearer what is going on.