7744 Commits

Author SHA1 Message Date
Mike Yuan
e03975b90f core/execute: use assertion for _done function
As per our usual coding style.
2023-12-20 21:56:49 +08:00
Mike Yuan
a7774a8ccb core/execute: remove unneeded brackets
I did not merge the if-s, since I think it's easier to read
in the current form with those long socketpair() calls.
2023-12-20 21:52:59 +08:00
networkException
4e0db87e4c core: allow interface altnames in RestrictNetworkInterfaces=
This patch enables IFNAME_VALID_ALTERNATIVE for checks guarding the
parsing of RestrictNetworkInterfaces=.

The underlying implementation for this option already supports
altnames.
2023-12-18 15:12:10 +01:00
Mike Yuan
3e5b96eed3 core/unit: clean up unit_log_resources
* Use a unified struct to store accounting fields/suffixes
* Use strextendf_with_separator where appropriate
* Don't mix stack and heap allocation for one iovec array
2023-12-13 20:42:06 +08:00
Mike Yuan
b1b84bc590 core/unit: raise log level for unit_log_resources on certain memory thresholds
We already do this for all other types of accountings. Let's
make this nicer for memory accounting too.
2023-12-13 20:42:06 +08:00
Lennart Poettering
026a8b022e execute: improve log message about TTY ownership reset failures 2023-12-12 16:06:08 +01:00
Lennart Poettering
f121efd392 execute: handle gracefully if we cannot lock /dev/console when resetting tty due to perms
This is the common case in --user instances, hence handle this
gracefully.

This should be safe since user instances won't get access to
/dev/console-related ttys anyway, but only their own ptys.
2023-12-12 22:02:12 +09:00
Mike Yuan
d8deb18720 core/job: emit job start message if we're only waiting for unit state
Currently, start/stop messages for device units are not used, since
job_perform_on_unit() does nothing and we simply wait for unit status
change. I think we still want some nice log messages explaining what
the start jobs for devices are doing, so let's fix this.
2023-12-12 17:04:30 +08:00
Mike Yuan
3f4a7a472f core/device: add stopping job message
The use case for stopping a device unit is indeed narrow,
but we still want to show a clear message.

Preparation for later commits.
2023-12-12 16:45:30 +08:00
Luca Boccassi
1eeaa93de3 executor: don't duplicate FD array to avoid double closing
Just use ExecParam directly, as these are all internal to sd-exec now
anyway. Avoids double close when execution fails after FDs are set up
for inheritance and were already re-arranged.

Fixes https://github.com/systemd/systemd/issues/30412
2023-12-11 15:55:50 +00:00
Mike Yuan
c8f7c9a11d core/exec-invoke: sigwait() returns positive errno and never EINTR
Follow-up for 5b6319dcee (gosh this is
ancient), and effectively reverts 3dead8d925.

sigwait() is documented to "suspend execution of the calling thread
until one of the signals specified in the signal set becomes pending".
And the only error it returns is EINVAL, when "set contains an invalid
signal number". Therefore, there's no need to run it in a loop or
to check for runtime error.
2023-12-10 09:44:44 +01:00
Mike Yuan
ba8245a77a core/executor: do destruct static variables and selinux before exiting
I was wondering why I couldn't trigger the assertion in safe_fclose()
when submitting #30251. It turned out that the static destructor was
not run at all :/

Replace main() with a minimized version of main-func.h. This also
prevents emitting negative exit codes.
2023-12-10 14:13:35 +09:00
Yu Watanabe
f1e89cb9b1 Merge pull request #30399 from YHNdnzj/memory-accounting-always-peak
systemctl-show: always show memory peak if available
2023-12-10 14:11:05 +09:00
Mike Yuan
ad009380e1 core/cgroup: cache the last memory usage values before destroying cgroup
Currently, memory accounting values are only cached if it was queued
at least once before destroying cgroup. Let's always cache it like
what we already do for CPU usage.

Preparation for later changes.
2023-12-09 20:42:48 +08:00
Luca Boccassi
9614dd542b mount: check that MountParameters is valid before use
Follow-up for 6c75eff6af

CID#1530430
2023-12-09 11:57:01 +00:00
Mike Yuan
b041175e08 core/executor: save argv for later use by rename_process()
Partially fixes #30352
2023-12-08 21:49:27 +08:00
Mike Yuan
c0e82e3a23 core/exec-invoke: voidify one rename_process call 2023-12-08 19:46:53 +08:00
Luca Boccassi
6c75eff6af core: create workdir/upperdir when mounting a Type=overlay mount unit
So far we created the target directory, and the source for bind mounts,
but not workdir/upperdir for overlays, so it has to be done separately
and strictly before the unit is started, which is annoying. Check the
options when creating directories, and if upper/work directories are
specified, create them.
2023-12-08 11:22:14 +09:00
Luca Boccassi
ebc7510380 core: relax dependency on RootImage= storage from Requires= to Wants=
If a unit is running in an image and wants to survive a soft-reboot,
then it can't be deactivated by the storage of the image going away.
Relax the dependency to a Wants=. Access to the image is not needed
when the unit is running anyway, so downgrade to Wants=.
2023-12-08 11:16:31 +09:00
Luca Boccassi
ae7482b994 core: do not make private /dev/ read-only too soon
The read-only bit is flipped after setting up all the mounts, so that
bind mounts can be added. Remove the early config, and add a unit
test.

Fixes https://github.com/systemd/systemd/issues/30372
2023-12-08 11:09:14 +09:00
Lennart Poettering
4482ea0c24 Merge pull request #30271 from YHNdnzj/executor-cloexec
fdset,core/executor: ocloexecification ™️
2023-12-06 22:26:40 +01:00
Lennart Poettering
4d56442755 recurse-dir: add new readdir_all_at() helper
This new helper combines open() with readdir_all() to simplify a few
callers.
2023-12-06 22:12:48 +01:00
Lennart Poettering
936fcc4668 show-status: suffix output with CRNL rather than just NL
This is similar to #30183 but focusses on the status output rather than
the log output.

Since the status output always goes to a TTY we don't have to
conditionalize things on isatty().

Fixes: #30184
2023-12-06 22:11:54 +01:00
Lennart Poettering
6498a0c2cc user-util: add new helper fully_set_uid_gid()
Usually when we do setresuid() we also do setresgid() and setgroups().
Let's add a common helper that does all three, and use it everywhere.
2023-12-06 22:11:38 +01:00
Lennart Poettering
ffc1ec73b3 pid1: add ProtectSystem= as system-wide configuration, and default it to true in the initrd
This adds a new ProtectSystem= setting that mirrors the option of the
same name for services, but in a more restrictive way. If enabled, it will remount
/usr/ to read-only, very early at boot. Takes a special value "auto"
(which is the default) which is equivalent to true in the initrd, and
false otherwise.

Unlike the per-service option we don't support full/strict modes, but
the door is open to eventually support that too if it makes sense. It's
not entirely trivial though as we have very little mounted this early,
and hence the mechanism might not apply 1:1. Hence this PR is a
conservative first step.

My primary goal with this is to lock down initrds a bit, since they
conceptually are mostly immutable, but they are unpacked into a mutable
tmpfs. Let's tighten the screws a bit on that, and at least make /usr/
immutable.

This is particularly nice on USIs (i.e. Unified System Images, that pack
a whole OS into a UKI without transitioning out of it), such as
diskomator.
2023-12-06 22:10:20 +01:00
Alan Liang
67001c2534 core: add specifier expansion to AllowedCPUs= and friends 2023-12-06 22:04:28 +01:00
Luca Boccassi
f9a284f02d Merge pull request #30214 from bluca/wants_mounts_for
Add WantsMountsFor= and use it in the cryptsetup generator
2023-12-06 21:00:37 +00:00
Luca Boccassi
cc9f4cad8c executor: apply LogLevelMax earlier
SELinux logs before we have a chance to apply it, move it up as it
breaks TEST-04-JOURNAL:

[  408.578624] testsuite-04.sh[11463]: ++ journalctl -b -q -u silent-success.service
[  408.578743] testsuite-04.sh[11098]: + [[ -z Dec 03 13:38:41 H systemd-executor[11459]: SELinux enabled state cached to: disabled ]]

Follow-up for: bb5232b6a3
2023-12-04 11:45:22 +09:00
Luca Boccassi
ebaf2821e6 Merge pull request #30291 from keszybz/seccomp-unknown-syscall
Backwards-compatibly handle syscalls unknown to us or libseccomp
2023-12-02 02:04:24 +00:00
Mike Yuan
a8aed6a9b6 core/cgroup: for non-cached attrs, don't return ENODATA blindly
Follow-up for f17b07f4d7

Hope I won't break this thing again...
2023-12-02 00:13:46 +00:00
Zbigniew Jędrzejewski-Szmek
86a1ee93f3 core: fix comment 2023-12-01 19:40:26 +01:00
Luca Boccassi
f4a35f2ad9 core: do not drop CAP_SETUID if it is in AmbientCapabilities=
Follow-up for 24832d10b6
2023-12-01 10:48:14 +00:00
Mike Yuan
5a5fdfe3ac core/exec-invoke: prevent potential double-close of exec_fd
If exec_fd is closed in add_shifted_fd() by close_and_replace(),
but something goes wrong later, we may close exec_fd twice
in exec_params_shallow_clear().
2023-12-01 00:14:37 +08:00
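The double-close hazard described in this message is usually avoided by invalidating an fd variable at the moment its descriptor is closed or handed over. The sketch below shows that idiom under stated assumptions: close_and_invalidate() and replace_fd() are hypothetical names chosen for illustration and are not the helpers referenced in the commit.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only (not systemd code).
 * Closes 'fd' if it is valid and always returns -1, so the caller can
 * write "fd = close_and_invalidate(fd);" and never hold a stale number. */
int close_and_invalidate(int fd) {
        if (fd >= 0)
                (void) close(fd);
        return -1;
}

/* Hypothetical helper: closes the descriptor in '*to', then moves
 * ownership of '*from' into it. '*from' is reset to -1 so that a later
 * cleanup pass over both variables closes the descriptor only once. */
void replace_fd(int *to, int *from) {
        assert(to);
        assert(from);

        *to = close_and_invalidate(*to);
        *to = *from;
        *from = -1;
}
```

With this pattern a cleanup function can call close_and_invalidate() on every fd field unconditionally, because fields that were already closed or transferred hold -1.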
Mike Yuan
f38cbaff63 core/exec-invoke: remove redundant fd_cloexec() call 2023-12-01 00:14:37 +08:00
Mike Yuan
a2467ea894 fdset: set all collected fds to CLOEXEC in fdset_new_fill() 2023-12-01 00:14:37 +08:00
Mike Yuan
d8da25b5d9 core/exec-invoke: rename flags_fds to flag_fds 2023-12-01 00:07:04 +08:00
Mike Yuan
a3e8e15480 core/execute-serialize: FOREACH_ARRAY at one more place 2023-12-01 00:07:04 +08:00
Daan De Meyer
ef90e8f9db Make sure we close bpf outer map fd in systemd-executor
Not doing so leaks it into the child service and causes selinux
denials.
2023-12-01 00:06:24 +08:00
Mike Yuan
79bad078bb core/executor: avoid double closing serialization fd
Before this commit, between fdopen() (in parse_argv()) and fdset_remove(),
the serialization fd is owned by both arg_serialization FILE stream and fdset.
Therefore, if something goes wrong between the two calls, or if --deserialize=
is specified more than once, we end up closing the serialization fd twice.
Normally this doesn't matter much, but I still think it's better to fix this.

Hence, let's call fdset_new_fill() after parsing the serialization fd.
We set the fd to CLOEXEC in parse_argv(), so it will be filtered
when the fdset is created.

While at it, also move fdset_new_fill() under the second log_open(), so
that we always log to the log target specified in arguments.
2023-11-30 09:56:59 +00:00
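The message relies on the serialization fd carrying the close-on-exec flag so that it can be told apart from inherited fds. A minimal sketch of setting and querying that flag with fcntl() follows; set_cloexec() and is_cloexec() are hypothetical names used for illustration, and the filtering logic itself is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only (not systemd code).
 * Turns the close-on-exec flag of 'fd' on or off.
 * Returns 0 on success or a negative errno-style value. */
int set_cloexec(int fd, bool on) {
        int flags, nflags;

        assert(fd >= 0);

        flags = fcntl(fd, F_GETFD);
        if (flags < 0)
                return -errno;

        nflags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
        if (nflags == flags)
                return 0; /* already in the requested state */

        if (fcntl(fd, F_SETFD, nflags) < 0)
                return -errno;

        return 0;
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on 'fd', 0 if it
 * is not, or a negative errno-style value on failure. */
int is_cloexec(int fd) {
        int flags;

        assert(fd >= 0);

        flags = fcntl(fd, F_GETFD);
        if (flags < 0)
                return -errno;

        return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Code that collects inherited descriptors could skip any fd for which is_cloexec() returns 1, since such an fd was opened or marked by the process itself.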
Daan De Meyer
5c314412f0 core: Always call log_open() in systemd-executor
log_setup() will open the console in systemd-executor because it's
not pid 1 and it's not connected to the journal. So if the log target
is later changed to kmsg, we have to reopen the log.

But since log_open() won't open the same log twice, let's just call it
unconditionally since it will be a noop if we try to reopen the same log.

This makes sure that systemd-executor will log to the log target passed
via --log-target= after parsing arguments.
2023-11-29 22:56:50 +00:00
Luca Boccassi
8284c2cb68 core: switch var-tmp.mount to WantsMountsFor for PrivateTmp=yes
Align with tmp.mount
2023-11-29 11:04:59 +00:00
Luca Boccassi
46d45f90b3 core: use new WantsMountsFor= for PrivateTmp=yes and tmp.mount 2023-11-29 11:04:59 +00:00
Luca Boccassi
61aa5f707e core: add WantsMountsFor= on WorkingDirectory= if it's allowed to be missing 2023-11-29 11:04:59 +00:00
Luca Boccassi
9e615fa3aa core: add WantsMountsFor=
This is the equivalent of RequiresMountsFor=, but adds Wants= instead
of Requires=. It will be useful for example for the autogenerated
systemd-cryptsetup units.

Fixes https://github.com/systemd/systemd/issues/11646
2023-11-29 11:04:59 +00:00
Yu Watanabe
14338cca99 core/cgroup: fix compile error
With gcc-13,
```
CFLAGS="-O3 -fno-semantic-interposition" meson setup build
```
triggers the following error:
```
../src/core/cgroup.c: In function ‘cgroup_context_dump’:
../src/core/cgroup.c:633:44: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
  633 |                         "%sDeviceAllow: %s %s\n",
      |                                            ^~
cc1: some warnings being treated as errors
```

Fixes #30223.
2023-11-28 10:35:52 +01:00
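The warning quoted above fires when the compiler can prove that a `%s` argument may be NULL. One common way to avoid it is to substitute a placeholder string before formatting. The sketch below assumes that approach and is not necessarily what this commit does; str_or_na() and format_device_allow() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not systemd code).
 * Maps NULL to a printable placeholder so "%s" never receives NULL. */
const char *str_or_na(const char *s) {
        return s ? s : "n/a";
}

/* Hypothetical helper: formats one DeviceAllow line into 'buf'.
 * A NULL prefix is treated as empty; NULL path or perms become "n/a".
 * Returns what snprintf() returns. */
int format_device_allow(char *buf, size_t size,
                        const char *prefix, const char *path, const char *perms) {
        assert(buf);

        return snprintf(buf, size, "%sDeviceAllow: %s %s\n",
                        prefix ? prefix : "",
                        str_or_na(path),
                        str_or_na(perms));
}
```

Passing NULL to `%s` is undefined behavior in standard C even where a libc prints "(null)", which is why the compiler flags it at higher optimization levels.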
Luca Boccassi
04fc5b6047 Merge pull request #30170 from bluca/exec_bpf_fd
core: pass bpf_outer_map_fd to sd-executor only if RestrictFileSystems was set
2023-11-27 15:44:50 +00:00
Mike Yuan
f17b07f4d7 core/cgroup: use the cached memory accounting value when cgroup is gone
Follow-up for 9824ab1f00

Fixes https://github.com/systemd/systemd/issues/28542#issuecomment-1825413237
2023-11-25 00:38:49 +08:00
Mike Yuan
35c08a56a1 core/dbus-unit: don't log cgroup v1 property name 2023-11-24 23:22:40 +08:00
Luca Boccassi
2d042c75ff core: remove redundant check when serializing FDs
The helpers already skip if the FD is < 0
2023-11-23 19:14:52 +00:00
Luca Boccassi
60ef4baeed core: pass bpf_outer_map_fd to sd-executor only if RestrictFileSystems was set
It causes SELinux denials to be raised, so restrict it only where needed

Follow-up for beb4ae8755
2023-11-23 19:08:38 +00:00