Some configuration files that need updates are directly under in /etc. To
update them atomically, we need write access to /etc. For Ubuntu Core this is
an issue as /etc is not writable. Only a selection of subdirectories can be
writable. The general solution is symlinks or bind mounts to writable places.
But for atomic writes in /etc, that does not work. So Ubuntu has had a patch
for that that did not age well.
Instead we would like to introduce some environment variables for alternate
paths.
* SYSTEMD_ETC_HOSTNAME: /etc/hostname
* SYSTEMD_ETC_MACHINE_INFO: /etc/machine-info
* SYSTEMD_ETC_LOCALTIME: /etc/localtime
* SYSTEMD_ETC_LOCALE_CONF: /etc/locale.conf
* SYSTEMD_ETC_VCONSOLE_CONF: /etc/vconsole.conf
* SYSTEMD_ETC_ADJTIME: /etc/adjtime
While it is for now expected that there is a symlink from the standard, we
still try to read them from that alternate path. This is important for
`/etc/localtime`, which is a symlink, so we cannot have an indirect symlink or
bind mount for it.
Since machine-id is typically written only once and not updated. This commit
does not cover it. An initrd can properly create it and bind mount it.
Since 435c372ce5 added in v256,
num_entries part of the Range header is mandatory and error is returned
when it's not filled in. This makes using the "follow" argument clumsy,
because for an indefinite following of the logs, arbitrary high number
must be specified. This change makes it possible to omit it again and
documents this behavior in the man page.
Moreover, as the cursor part of the header was never mandatory, enclose
it in square brackets in the documentation as well and elaborate how
indexing works.
Following are some concrete examples of the Range header which are now
accepted:
entries= (or entries=:)
- everything starting from the first event
entries=cursor
- everything starting from `cursor`
entries=:-9:10
- last 10 events and close the connection
If the follow flag is set:
entries=:-4:10
- last 5 events, wait for 5 new and close connection
entries=:-9:
- last 10 events and keep streaming
Note that only the very last one is changing current behavior, but
reintroduces pre-v256 compatibility.
Fixes#37172
When adding a new interface to the object add it at the end of the list.
This way, when iterating over the list, e.g., during handling introspect
call, the order of returned interfaces will mach the order in which they
were added.
Not sure why the test failed, but maybe the test environment is too
slow? Even this does not fix the failure, by enabling debugging logs,
this hopefully provides more useful information for debugging.
For issue #37685.
Currently, if we use a cgroup namespace together with DelegateSubgroup=,
the subgroup becomes the root of the cgroup namespace because we move the
service process to the subgroup before we unshare the cgroup namespace, and
the current cgroup becomes the root of the cgroup namespace when we unshare
the cgroup namespace.
Let's fix the problem by not moving the service process to the subgroup until
we've unshared the cgroup namespace. Note that this doesn't break the primary use
case of CLONE_INTO_CGROUP since we still use it to immediately clone into the service
main cgroup, just not anymore into the subgroup, but this shouldn't matter in practice.
Additionally, we need special handling for control processes, as those *do*
need to get spawned into the subcgroup immediately if delegation is configured to
avoid violating the cgroupsv2 "no inner processes" rule.
Effectively, this leaves us with the following logic:
- In exec_spawn(), spawn into subgroup if we're spawning a control process
that needs to be spawned into a subgroup immediately. Otherwise, spawn into
main service cgroup.
- In exec_invoke(), move into subgroup early if we don't need a cgroup namespace.
Otherwise, move into subgroup after we've unshared the cgroup namespace.
systemd-pcrlock requires support for the PolicyAuthorizeNV command,
which is not implemented in the first TPM2 releases. We also strictly
require SHA-256 support. Hence add a tool for checking for both of
these.
This is a tighter version of "systemd-analyze has-tpm2", that checks for
the precise feature that systemd-pcrlock needs, on top of basic TPM2
functionality.
Fixes: #37607
A new core was added to the test, but the loop counter was not increased
to wait for it, so the test races against systemd-coredump's processing.
This failed at least once in debci:
8015s [ 32.227813] TEST-87-AUX-UTILS-VM.sh[1038]: + coredumpctl info COREDUMP_TIMESTAMP=1679509902000000
8015s [ 32.228684] TEST-87-AUX-UTILS-VM.sh[1723]: No coredumps found.
Follow-up for 0c49e0049b
Fixes https://github.com/systemd/systemd/issues/37666
The kernel provides %d which is documented as
"dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE".
We already query /proc/pid/auxv for this information, but unfortunately this
check is subject to a race, because the crashed process may be replaced by an
attacker before we read this data, for example replacing a SUID process that
was killed by a signal with another process that is not SUID, tricking us into
making the coredump of the original process readable by the attacker.
With this patch, we effectively add one more check to the list of conditions
that need be satisfied if we are to make the coredump accessible to the user.
Reportedy-by: Qualys Security Advisory <qsa@qualys.com>
In principle, %d might return a value other than 0, 1, or 2 in the future.
Thus, we accept those, but emit a notice.
We have already exposed device ID in the output of device ID in J
fields. Also sd_device_get_device_id() and sd_device_new_from_device_id()
are already public. Hence, making udevadm accept device IDs may be
useful.
With this change, as we save several data in /run/udev with device ID,
we can call udevadm something like the following:
```
udevadm info $(ls /run/udev/tags/uaccess)
```
Then, we can show all devices that has uaccess tag.
Add endpoint for listing boots. Output format mimics `journalctl
--list-boots -o json`, so it's a plain array containing index, boot ID
and timestamps of the first and last entry. Initial implementation
returns boots ordered starting with the current one and doesn't allow
any filtering (i.e. equivalent of --lines argument).
Fixes: #37573
per Daan's explanation:
other subtests running as testuser apparently use systemd-run --user
--machine testuser@.host which turns user tracking in logind into "by
pin" mode. when the last pinning session exits it terminates the user.
This was broken in f45b801551. Unfortunately
the review does not talk about backward compatibility at all. There are
two places where it matters:
- During upgrades, the replacement of kernel.core_pattern is asynchronous.
For example, during rpm upgrades, it would be updated a post-transaction
file trigger. In other scenarios, the update might only happen after
reboot. We have a potentially long window where the old pattern is in
place. We need to capture coredumps during upgrades too.
- With --backtrace. The interface of --backtrace, in hindsight, is not
great. But there are users of --backtrace which were written to use
a specific set of arguments, and we can't just break compatiblity.
One example is systemd-coredump-python, but there are also reports of
users using --backtrace to generate coredump logs.
Thus, we require the original set of args, and will use the additional args if
found.
A test is added to verify that --backtrace works with and without the optional
args.
`ExtensionImages=` and `ExtensionDirectories=` now let you specify
vpick-named extensions; however, since they just get set up once when
the service is started, you can't see newer versions without restarting
the service entirely. Here, also reload confext extensions when you
reload a service. This allows you to deploy a new version of some
configuration and have it picked up at reload time without interruption
to your workload.
Right now, we would only reload confext extensions and leave the sysext
ones behind, since it didn't seem prudent to swap out what is likely
program code at reload. This is made possible by only going for the
`SYSTEMD_CONFEXT_HIERARCHIES` overlays (which only contains `/etc`).
This PR:
- Adjusts `service.c` to also refresh extensions when needed.
- Adds integration tests to check that a confext reload actually
occurred.
- Adds to the `systemd.exec` man pages to document this behavior.
This is a follow up to #24864 and #31364. Thank you to @bluca and
@goenkam for help in getting this up.
In TEST-50-DISSECT.dissect, this adds the following cases:
- testservice-50g: vpick extension in ExtensionDirectories
- testservice-50h: vpick extension in ExtensionImages
- testservice-50i: ExtensionDirectories + RootImage
- testservice-50j: ExtensionDirectories + RootDirectory
This integration test demonstrates that a containerized systemd instance can
write to a bind mounted file observable to the host. Specifically, the bash
script uses systemd-run to start a systemd instance as a transient unit
container. This systemd-run command bind mounts a directory the container will
share with the host, and runs an internal service which creates and writes to a
file from the container's view of this directory. When finished writing, the
service runs the exit target, terminating the internal systemd instance, and
ending the lifetime of the container.
The script waits for the container to finish running, then verifies that the
expected file contents were written on the host side of the filesystem mount.
This test employs a workaround, creating an unmasked procfs mount on the host
which enables the privileged guest to create its own mounts internally. This
may indicate a systemd bug, as the privileged container should not rely on
the existence of an unmasked procfs on the host in order to mount its own
filesystems internally.
Our baseline is v5.4 and cgroup v2 is enforced now,
which means CPU accounting is cheap everywhere without
requiring any controller, hence just remove the directive.
capability_quintet_mangle() can be called with capability sets
containing unknown capabilities. Let's not crash when this is the
case but instead ignore the unknown capabilities.
Fixes d5e12dc75e
Follow-up for f44e7a8c11
This breaks the rule stated at the beginning of help_sudo_mode():
> NB: Let's not go overboard with short options: we try to keep a modicum of compatibility with
> sudo's short switches, hence please do not introduce new short switches unless they have a roughly
> equivalent purpose on sudo. Use long options for everything private to run0.