6448 Commits

Author SHA1 Message Date
Franck Bui
c972880640 core: really skip automatic restart when a JOB_STOP job is pending
It's not clear why we rescheduled a service auto restart while a stop job for
the unit was pending. The comment claims that the unit shouldn't be restarted
but the code did reschedule an auto restart meanwhile.

In practice that was rarely an issue because the service waited for the next
auto restart to be rescheduled, letting the queued stop job proceed and
service_stop() be called, which prevented the next restart from completing.

However, when RestartSec=0, the timer expired right away, making PID1
reschedule the unit again, which made the timer expire right away... and so
on. This busy loop prevented PID1 from handling any queued jobs (and hence gave
the start rate limiting no chance to trigger), which made the busy loop last
forever.

This patch breaks this loop by skipping the reschedule of the unit auto restart
and hence not depending on the value of u->restart_usec anymore.

Fixes: #13667
2022-02-22 11:45:12 +01:00
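
A minimal sketch of the decision described in c972880640, not the actual systemd code: the enum and function names are hypothetical. It illustrates that once a stop job is pending no restart timer is armed at all, so the outcome no longer depends on the restart delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of entering the auto-restart state. */
typedef enum {
    RESTART_SKIPPED,   /* a stop job is queued: do not arm any timer */
    RESTART_SCHEDULED, /* arm the timer with the configured delay */
} RestartDecision;

/* Decide whether to arm the restart timer. The delay is deliberately not
 * consulted, so a delay of 0 cannot lead to a reschedule loop while a stop
 * job is pending. */
RestartDecision decide_auto_restart(bool stop_job_pending, uint64_t restart_usec) {
    (void) restart_usec;

    if (stop_job_pending)
        return RESTART_SKIPPED;

    return RESTART_SCHEDULED;
}
```

With a pending stop job the result is the same for any delay, including 0.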
Lennart Poettering
de90700f36 pid1: set SYSTEMD_NSS_DYNAMIC_BYPASS=1 env var for dbus-daemon
There's currently a deadlock between PID 1 and dbus-daemon: in some
cases dbus-daemon will do NSS lookups (which are blocking) at the same
time PID 1 synchronously blocks on some call to dbus-daemon. Let's break
that by setting SYSTEMD_NSS_DYNAMIC_BYPASS=1 env var for dbus-daemon,
which will disable synchronously blocking varlink calls from nss-systemd
to PID 1.

In the long run we should fix this differently: remove all synchronous
calls to dbus-daemon from PID 1. This is not trivial however: so far we
had the rule that synchronous calls from PID 1 to the dbus broker are OK
as long as they only go to interfaces implemented by the broker itself
rather than services reachable through it. Given that the relationship
between PID 1 and dbus is kinda special anyway, this was considered
acceptable for the sake of simplicity, since we quite often need
metadata about bus peers from the broker, and the asynchronous logic
would substantially complicate even the simplest method handlers.

This mostly reworks the existing code that sets SYSTEMD_NSS_BYPASS_BUS=
(which is a similar hack to deal with deadlocks between nss-systemd and
dbus-daemon itself) to set SYSTEMD_NSS_DYNAMIC_BYPASS=1 instead. No code
was checking SYSTEMD_NSS_BYPASS_BUS= anymore anyway, and it used to
solve a similar problem, hence it's an obvious piece of code to rework
like this.

Issue originally tracked down by Lukas Märdian. This patch is inspired
by and closely based on his patch:

       https://github.com/systemd/systemd/pull/22038

Fixes: #15316
Co-authored-by: Lukas Märdian <slyon@ubuntu.com>
2022-02-18 10:49:36 +01:00
Lennart Poettering
e39eb045a5 pid1: lookup owning PID of BusName= name of services asynchronously
A first step of removing blocking calls to the D-Bus broker from PID 1.
There's a lot more to go (i.e. grep src/core/ for sd_bus_creds
basically), but it's a start.

Removing blocking calls to D-Bus broker deals systematically with
deadlocks caused by dbus-daemon blocking on synchronous IPC calls back
to PID1 (e.g. Varlink calls through nss-systemd). Bugs such as #15316.

Also-see: https://github.com/systemd/systemd/pull/22038#issuecomment-1042958390
2022-02-18 10:49:31 +01:00
Lennart Poettering
1e8b312e5a pid1: watch bus name always when we have it
Previously we'd only watch configured service bus names if Type=dbus was
set. Let's also watch it for other types. This is useful to pick up the
main PID of such a service. In fact the code to pick it up was already
in place, alas it didn't do anything given the signal was never received
for it. Fix that.

(It's also useful for debugging)
2022-02-18 10:45:47 +01:00
Luca Boccassi
5d11af60ac Merge pull request #22498 from yuwata/cgroup-threaded-mode
cgroup: ignore error in attaching process when threaded mode is used
2022-02-16 18:59:06 +00:00
Yu Watanabe
e43a418f86 Merge pull request #22271 from keszybz/manager-reexec-freeze
Freeze manager if reexec fails
2022-02-16 23:02:21 +09:00
Zbigniew Jędrzejewski-Szmek
667030bff6 manager: add {} around cpu sets, use range formatting
We would print "Setting NUMA policy to bind, with nodes .".
This is not very clear, change it to "… with nodes {}.".

Also use range formatting for masks to make output shorter.
2022-02-16 08:07:20 +01:00
Zbigniew Jędrzejewski-Szmek
6b1fa53997 manager: add few ", ignoring" and adjust level in one message
2022-02-16 08:07:20 +01:00
Yu Watanabe
702cf08fce core/execute: warn when threaded mode is detected
Prompted by #22486.
2022-02-16 15:59:03 +09:00
Zbigniew Jędrzejewski-Szmek
1e3eee8cf0 manager: if we are reexecuting, do not invoke any fallbacks
For https://bugzilla.redhat.com/show_bug.cgi?id=1986176:
if we are trying to reexecute, and this fails for any reason, we shouldn't
try to execute /sbin/init or /bin/sh. It is better to just freeze.
If we freeze it is easier to diagnose what happened, but if we execute
one of the fallbacks, we don't really know what will happen. In particular
the new init might just return, causing the machine to shut down. Or we
may successfully spawn /bin/sh, which could leave the machine open.
2022-02-15 11:13:26 +01:00
Zbigniew Jędrzejewski-Szmek
5409c6fcc5 manager: do not ignore the return value from the main loop
If manager_loop() fails, we would print an error message, but then actually
ignore the error in main(), and potentially execute the shutdown binary.
I'm not sure how likely this is to happen in practice, but it seems sloppy.
So let's do the cleanup, but actually freeze() if manager_loop() returned
an error.

invoke_main_loop() is refactored to return the manager objective. This way
we don't need to pass a separate parameter to specify whether we are
reexecuting. Subsequent patch will make further use of the returned objective.
2022-02-15 11:13:24 +01:00
Lennart Poettering
5483fca07a pid1: export cgroup ID among per-unit cgroup information
It's really interesting for debugging purposes and we have it already,
hence expose it as dbus property.
2022-02-11 13:36:39 +01:00
Lennart Poettering
1b42022388 cgroup: downgrade warning if we can't get ID off cgroup
The cgroupid feature was not available in old cgroupv2 kernels, hence
try to get it but if we can't because it's not supported, then only
debug log about it and proceed.

(We only need this for cgroup bpf stuff, but that isn't available on
such old kernels anyway)

Fixes: #22483
2022-02-11 13:36:39 +01:00
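
The downgrade in 1b42022388 can be pictured as choosing a log level from the error code. This is only a sketch: the enum and function are hypothetical, and treating `EOPNOTSUPP` as the "not supported" case is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical log levels for this sketch. */
typedef enum {
    SKETCH_LOG_DEBUG,
    SKETCH_LOG_WARNING,
} SketchLogLevel;

/* Pick a log level for a failed cgroup ID lookup. "Not supported" is expected
 * on old kernels and only worth a debug message; any other error still
 * warrants a warning. Assumption: EOPNOTSUPP signals the unsupported case. */
SketchLogLevel cgroup_id_error_level(int r) {
    return r == -EOPNOTSUPP ? SKETCH_LOG_DEBUG : SKETCH_LOG_WARNING;
}
```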
Frantisek Sumsal
da185cd04d tree-wide: move unsigned to the start of type declaration
Even though ISO C11 doesn't mandate in which order the type specifiers
should appear, having `unsigned` at the beginning of each type
declaration feels more natural and, more importantly, it unbreaks
Coccinelle, which has a hard time parsing `long unsigned` and others:

```
init_defs_builtins: /usr/lib64/coccinelle/standard.h
init_defs: /home/mrc0mmand/repos/systemd/coccinelle/macros.h
HANDLING: src/shared/mount-util.c
: 1: strange type1, maybe because of weird order: long unsigned
```

Most of the codebase already "complies", so let's fix the remaining
"offenders".
2022-02-10 21:00:22 +01:00
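
As a small illustration of why the reordering in da185cd04d is purely cosmetic for the compiler, the sketch below uses C11 `_Generic` to check that both spellings name the same type. The macro name is made up for this example.

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type unsigned long, else 0.
 * _Generic (C11) selects a branch based on the type of its operand. */
#define IS_UNSIGNED_LONG(x) _Generic((x), unsigned long: 1, default: 0)

/* Both specifier orders denote the same type, so these hold at compile time. */
_Static_assert(IS_UNSIGNED_LONG((long unsigned) 0), "long unsigned is unsigned long");
_Static_assert(IS_UNSIGNED_LONG((unsigned long) 0), "unsigned long is unsigned long");
_Static_assert(!IS_UNSIGNED_LONG((unsigned) 0), "unsigned int is a different type");
```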
Lennart Poettering
a99a85242c tree-wide: use config_parse_safe_string() at various places
2022-02-09 10:17:33 +01:00
Luca Boccassi
dde009a879 core: simplify freeing list in job_free()
Follow-up for cdebedb4d4
2022-02-02 16:33:25 +00:00
Luca Boccassi
b7b4252443 core: use strextend instead of strextendf when possible
Follow-up for cdebedb4d4
2022-02-02 16:33:25 +00:00
Yu Watanabe
e4de58c823 core/mount: fail early if directory cannot be created
Prompted by #22334.
2022-02-02 15:09:45 +09:00
Luca Boccassi
86838bf08b core: warn on ExitType=cgroup with legacy cgroup setup
'cgroup empty' notifications are not reliable on v1, so log a warning.

See: https://github.com/systemd/systemd/issues/22320
2022-02-02 07:07:47 +09:00
Lennart Poettering
421bb42d1b execute: document that the 'env' param is input *and* output
2022-02-01 13:50:28 +01:00
Lennart Poettering
cafc5ca147 execute: line break comments a bit less aggressively
2022-02-01 13:50:13 +01:00
Lennart Poettering
46e5bbab58 execute: use _cleanup_ logic where appropriate
2022-02-01 13:49:56 +01:00
Lennart Poettering
7feb2b5737 pid1: pass PAM_DATA_SILENT to pam_end() in child
Fixes: #22318
2022-02-01 12:37:51 +01:00
James Hilliard
04660b10d3 meson: use full argument names for bpftool gen commands
This should be a purely cosmetic change.
2022-02-01 12:26:30 +09:00
Lennart Poettering
69339ae9f7 tree-wide: some additional checks to avoid CVE-2021-4034 style weaknesses
2022-01-31 23:07:19 +00:00
Luca Boccassi
0ec3af43f2 Merge pull request #22300 from yuwata/bus-fix-error-handling
tree-wide: fix bus method error handling
2022-01-31 14:03:00 +00:00
Luca Boccassi
9d6d4c305a core: don't fail on EEXIST when creating mount point
systemd[1016]: Failed to mount /tmp/app1 (type n/a) on /run/systemd/unit-extensions/1 (MS_BIND ): No such file or directory
systemd[1016]: Failed to create destination mount point node '/run/systemd/unit-extensions/1': File exists
2022-01-31 13:53:47 +01:00
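
The idea behind 9d6d4c305a, tolerating an already existing mount point, might look roughly like this in plain POSIX C. The function name is hypothetical and this is not the systemd implementation; in particular the sketch does not verify that the existing node is a directory.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a mount point directory, treating "already exists" as success.
 * Returns 0 on success or a negative errno-style value on failure. */
int ensure_mount_point(const char *path) {
    assert(path);

    if (mkdir(path, 0755) < 0 && errno != EEXIST)
        return -errno;

    return 0;
}
```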
Frantisek Sumsal
61b9769bda core: check argc/argv unconditionally
as `assert()` might be dropped with `-DNDEBUG`.

Follow-up to cf3095a and 1637e75.
2022-01-31 20:00:40 +09:00
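
A short sketch of the point made in 61b9769bda: `assert()` expands to nothing when the program is built with `-DNDEBUG`, so a check that must always run has to be an ordinary runtime test. The function name and the choice of `-EINVAL` are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Validate the argument vector with a real runtime check. Unlike assert(),
 * this is not compiled out when building with -DNDEBUG. */
int check_args(int argc, char *const argv[]) {
    if (argc <= 0 || !argv || !argv[0])
        return -EINVAL;

    return 0;
}
```

A caller would invoke this at the top of `main()` and bail out on a negative return value.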
James Hilliard
e3759ac43a meson: use bpftool based strip when available
This should be useable in bpftool v5.13 or newer based on:
d80b2fcbe0
2022-01-31 16:42:07 +09:00
Yu Watanabe
cf3095ac2b core: check if argc > 0 and argv[0] is set
Follow-up for 1637e75707.
2022-01-30 13:07:51 +00:00
Yu Watanabe
3332218555 core/unit: use bus_error_message() at one more place
2022-01-30 05:43:56 +09:00
Julia Kartseva
e0c694c73d bpf: load firewall with name only if supported
BPF firewall is supported starting from v4.9 kernel where
BPF_PROG_TYPE_SOCKET_FILTER support was added [0].

However, program name support was added to v4.15 [1] and BPF_PROG_LOAD
syscall will fail on older kernels if called with prog_name attribute.
BPF_F_ALLOW_MULTI was also added to v4.15 kernel which allows reusing
BPF_F_ALLOW_MULTI probe to indicate that program name is also supported.

It is no problem for BPF_PROG_TYPE_CGROUP_DEVICE since it was added in
v4.15.

[0] https://elixir.bootlin.com/linux/v4.9/source/include/uapi/linux/bpf.h#L92
[1] https://elixir.bootlin.com/linux/v4.15/source/include/uapi/linux/bpf.h#L191

Follow-up of https://github.com/systemd/systemd/pull/22214
2022-01-28 12:42:18 +09:00
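
The selection logic from e0c694c73d can be sketched as below. The function is hypothetical, and the boolean stands in for the result of the BPF_F_ALLOW_MULTI probe mentioned in the commit message; no real kernel or systemd API is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the program name to pass at load time, or NULL to load the program
 * unnamed. The flag represents the outcome of a kernel feature probe that
 * implies program names are accepted (v4.15 or newer per the commit text). */
const char *prog_name_if_supported(bool multi_probe_ok, const char *name) {
    return multi_probe_ok ? name : NULL;
}
```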
Luca Boccassi
3fa80e5e75 core: do not attempt to add 'private' symlinks when RootImage/RootDirectory are used
A bind mount is added directly from private on the host to the actual
destination directory, no need for the symlinks (which cannot be created
as the bind mount happens first and creates the target as an actual directory)

Fixes https://github.com/systemd/systemd/issues/22264
2022-01-28 00:54:10 +00:00
Luca Boccassi
6d7c999ab5 core: add clearer debug log when setting up ExecDirectories symlinks fails
2022-01-27 14:21:29 +00:00
Yu Watanabe
a21440f6d6 Merge pull request #22259 from bluca/exec_cond_restart
core: do not restart a service with Restart=always when ExecCondition fails
2022-01-27 15:09:47 +09:00
Anita Zhang
1d3b68f6e1 tree-wide: don't use strjoina() on getenv() values
Avoid doing stack allocations on environment variables.
2022-01-27 13:45:00 +09:00
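
To illustrate the concern behind 1d3b68f6e1: an environment variable's length is controlled from outside the program, so copying it into stack memory is risky. The sketch below joins the value with a suffix on the heap instead. `env_join()` is a hypothetical helper written for this example, not a systemd function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Join an environment variable's value with a suffix using heap memory.
 * Returns NULL if the variable is unset or if allocation fails; the caller
 * frees the result. */
char *env_join(const char *var, const char *suffix) {
    assert(var);
    assert(suffix);

    const char *value = getenv(var);
    if (!value)
        return NULL;

    size_t n = strlen(value) + strlen(suffix) + 1;
    char *joined = malloc(n);
    if (!joined)
        return NULL;

    snprintf(joined, n, "%s%s", value, suffix);
    return joined;
}
```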
Luca Boccassi
abb99360d3 core: do not restart a service with Restart=always when ExecCondition fails
When a Condition*= fails, and a service has Restart=always,
the service is not restarted.
Follow the same behaviour for ExecCondition= to avoid inconsistencies.

Fixes #22257
2022-01-26 19:02:11 +00:00
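
The rule introduced by abb99360d3 can be written down as a tiny predicate. This is a simplified sketch with hypothetical names; the real service state machine tracks many more result codes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of why a service stopped. */
typedef enum {
    RESULT_SUCCESS,
    RESULT_FAILURE,
    RESULT_EXEC_CONDITION_FAILED, /* ExecCondition= did not pass */
} ServiceResult;

/* With Restart=always, restart after every result except a failed
 * ExecCondition=, matching how a failed Condition*= is handled. */
bool should_restart_always(ServiceResult result) {
    return result != RESULT_EXEC_CONDITION_FAILED;
}
```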
Luca Boccassi
cb94b8acc5 Merge pull request #22203 from brauner/2022-01-21.procsubset.pid
core/namespace: allow using ProcSubset=pid and ProtectHostname=tru…
2022-01-24 13:04:23 +00:00
Christian Brauner
fbf90c0d5c core/namespace: s/normalize_mounts()/drop_unused_mounts()
Rename the normalize_mounts() helper to drop_unused_mounts. All the
helpers called in there get rid of mounts that are unused for a variety
of reasons. And whereas the helpers are aptly prefixed with "drop" the
overall helper isn't and instead uses "normalize".

Make it more obvious what the helper actually does by renaming it from
normalize_mounts() to drop_unused_mounts(). Readers of code calling this
helper will immediately see that it will get rid of unused mounts.

Link: https://github.com/systemd/systemd/issues/22206
2022-01-24 10:22:47 +01:00
Christian Brauner
1361f01577 core/namespace: allow using ProcSubset=pid and ProtectHostname=true together
If a service requests both ProcSubset=pid and ProtectHostname=true
then it will currently fail to start. The ProcSubset=pid option
instructs systemd to mount procfs for the service with subset=pid which
hides all entries other than /proc/<pid>. Consequently trying to
interact with the two files /proc/sys/kernel/{hostname,domainname}
covered by ProtectHostname=true will fail.

Fix this by only performing this check when ProcSubset=pid is not
requested. Essentially ProcSubset=pid implies/provides
ProtectHostname=true.
2022-01-24 09:41:28 +01:00
Zbigniew Jędrzejewski-Szmek
398a500916 core/execute: use _cleanup_ in exec_context_load_environment()
Also rename variables.
2022-01-23 14:39:46 +09:00
Julia Kartseva
8fe9dbb926 bpf: name unnamed bpf programs
bpf-firewall and bpf-devices do not have names. This complicates
debugging with bpftool(8).

Assign names starting with 'sd_' prefix:
* firewall program names are 'sd_fw_ingress' for ingress attach
point and 'sd_fw_egress' for egress.
* 'sd_devices' for devices prog

'sd_' prefix is already used in source-compiled programs, e.g.
sd_restrictif_i, sd_restrictif_e, sd_bind6.

The name must not be longer than 15 characters or BPF_OBJ_NAME_LEN - 1.

Assign names only to programs loaded to kernel by systemd since
programs pinned to bpffs are already loaded.
2022-01-22 16:48:42 +09:00
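
The length limit stated in 8fe9dbb926 can be checked with a few lines of C. The constant below is defined locally so the sketch builds without kernel headers; its value of 16 is taken from the commit's statement that 15 characters equals BPF_OBJ_NAME_LEN - 1. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for the kernel's BPF_OBJ_NAME_LEN (includes the NUL byte). */
#define SKETCH_BPF_OBJ_NAME_LEN 16

/* Return true if the name fits into a BPF object name field. */
bool bpf_prog_name_fits(const char *name) {
    assert(name);
    return strlen(name) <= SKETCH_BPF_OBJ_NAME_LEN - 1;
}
```

All three names from the commit message ('sd_fw_ingress', 'sd_fw_egress', 'sd_devices') pass this check.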
Luca Boccassi
a07b992606 core: add ExtensionDirectories= setting
Add a new setting that follows the same principle and implementation
as ExtensionImages, but using directories as sources.
It will be used to implement support for extending portable images
with directories, since portable services can already use a directory
as root.
2022-01-21 22:53:12 +09:00
Luca Boccassi
071be9701a Merge pull request #22195 from keszybz/more-specifiers
Add unit specifiers for fragment path and directory
2022-01-21 11:22:22 +00:00
Zbigniew Jędrzejewski-Szmek
607f032858 core: add %y/%Y specifiers for the fragment path of the unit
Fixes #6308: people want to be able to link a unit file via 'systemctl enable'
from a git checkout or such and refer to other files in the same repo.
The new specifiers make that easy.

%y/%Y is used because other more obvious choices like %d/%D or %p/%P are
not available because at least one of the two letters is already used.

The new specifiers are only available in units. Technically it would be
trivial to add them in [Install] too, but I don't see how they could be
useful, so I didn't do that.

I added both %y and %Y because both were requested in the issue, and because I
think both could be useful, depending on the case. %Y to refer to other files
in the same repo, and %y in the case where a single repo has multiple unit files,
and e.g. each unit has some corresponding asset named after the unit file.
2022-01-21 08:00:41 +01:00
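
Reading the message of 607f032858 as "%y is the fragment path and %Y is the directory containing it" (an interpretation of the commit text, not a quote of the implementation), the relationship between the two values could be sketched as follows. `fragment_directory()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the directory part of a path, i.e.
 * everything before the last '/'. Returns NULL on allocation failure or if
 * the path contains no '/'. The caller frees the result. */
char *fragment_directory(const char *fragment_path) {
    assert(fragment_path);

    const char *slash = strrchr(fragment_path, '/');
    if (!slash)
        return NULL;

    /* Keep the leading '/' when the file sits directly in the root. */
    size_t n = slash == fragment_path ? 1 : (size_t) (slash - fragment_path);

    char *dir = malloc(n + 1);
    if (!dir)
        return NULL;

    memcpy(dir, fragment_path, n);
    dir[n] = '\0';
    return dir;
}
```

For a unit linked from a git checkout, the directory value is what lets the unit refer to sibling files in the same repository.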
Zbigniew Jędrzejewski-Szmek
601dc59be2 Use ASSERT_PTR() in more places
2022-01-20 17:29:51 +01:00
Luca Boccassi
78ab2b5064 core: refuse to mount ExtensionImages if the base layer doesn't at least have ID in os-release
We can't match an extension if we don't at least have an ID,
so refuse to continue
2022-01-19 00:08:57 +00:00
Yu Watanabe
1fb50408ce pid1,cgroup-show: ignore -EOPNOTSUPP in cg_read_pid()
The function is called in recursion, and cgroup.procs in some subcgroups
may not be read.

Fixes #22089.
2022-01-18 12:34:30 +01:00
Yu Watanabe
adc1b76c30 core: add missing dependency DBus properties
Follow-up for 0bc488c99a.

Also sort dependency properties to make them match the definition of
`enum UnitDependency` in basic/unit-def.h.

Fixes #22133.
2022-01-16 14:05:33 +00:00
Yu Watanabe
cc8943b84a core: update log message
Fixes CID#1469009.
2022-01-16 14:05:18 +00:00