Commit Graph

307 Commits

Author SHA1 Message Date
Mike Yuan
d6c8e0ce95 process-util: use procfs_file_get_field() where appropriate 2025-05-01 13:10:26 +09:00
Mike Yuan
15036f8555 fileio: modernize get_proc_field()
- Drop effectively unused "terminator" param, imply whitespace
- Make ret param optional
- Return ENODATA if the requested key is not found, rather than
  ENOENT
- Turn ENOENT -> ENOSYS if /proc/ is not mounted
- Don't skip whitespaces before ':', nothing needs this handling
  anyways
- Remove the special treatment for all "0"s. We don't actually
  use this for capabilities given pidref_get_capability() exists
- Switch away from read_full_virtual_file() - files using "field"
  scheme under /proc/ seem all to be "seq_file"s (refer to
  da65941c3e for details on file types)
2025-05-01 13:10:26 +09:00
Yu Watanabe
147f1d7b26 process-util: make wait_for_terminate() as trivial wrapper of its PidRef version 2025-04-03 00:11:27 +09:00
Lennart Poettering
4ecc87bf1c process-util: add pidref-based version of wait_for_terminate_and_check() 2025-04-03 00:11:27 +09:00
Yu Watanabe
3cf6a3a3d4 tree-wide: check more log message format in log_struct() and friends
This introduce LOG_ITEM() macro that checks arbitrary formats in
log_struct().
Then, drop _printf_ attribute from log_struct_internal(), as it does not
help so much, and compiler checked only the first format string.

Hopefully, this silences false-positive warnings by Coverity.
2025-03-19 01:56:48 +09:00
Mike Yuan
33db9f214b missing_syscall: drop raw_getpid()
This used to be relevant since in old versions of glibc an internal
cache is maintained, while we might sidestep their invalidation
with raw_clone(). After glibc 2.25 getpid() is a trivial wrapper
for the syscall, and hence there's no need to have a separate
raw_getpid().
2025-03-04 23:03:24 +01:00
Yu Watanabe
e75372958d missing_threads.h: threads.h exists since glibc-2.28 2025-03-04 02:24:49 +09:00
Mike Yuan
3ddbc34e15 process-util: refuse FORK_WAIT + FORK_FREEZE combination 2025-02-21 21:35:05 +00:00
Daan De Meyer
dc2f960b78 process-util: Allow setting ret_pid with FORK_DETACH in safe_fork()
Let's allow getting the pid even if the caller sets FORK_DETACH. We
do this via a socketpair() over which we send the inner child pid.
2025-02-20 21:00:52 +01:00
Daan De Meyer
f48103ea61 process-util: Implement safe_fork_full() on top of pidref_safe_fork_full()
Let's switch things around, and move the internals of safe_fork_full() into
pidref_safe_fork_full() and make safe_fork_full() a trivial wrapper on top
of pidref_safe_fork_full().
2025-02-20 20:13:53 +01:00
Lennart Poettering
4ace93da8c pidref: now that we have the cached pidfdid of our own process, use it
Note that this drops a lot of "const" qualifiers on PidRef arguments.
That's because pidref_is_self() suddenly might end changing the PidRef
because it acquires the pidfd ID.

We had this previously already with pidfd_equal(), but this amplifies
the problem.

I guess we C's "const" doesn't really work for stuff that contains
caches, that is just conceptually constant, but not actually.
2025-01-20 21:51:40 +01:00
Yu Watanabe
f0159e2b5b process-util: fix typo
Also rebreak comment.

Follow-up for 03b89cf213.
2025-01-19 04:24:08 +09:00
Lennart Poettering
277255e814 process-util: slightly update comment in freeze() 2025-01-16 11:55:21 +01:00
Lennart Poettering
d6267b9b18 process-util: port pid_from_same_root_fs() to pidref, and port three places over to it 2025-01-16 11:55:21 +01:00
Lennart Poettering
6eeeef9f66 process-util: introduce new FORK_FREEZE flag for safe_fork()
Often we want to fork off a process that just hangs until we kill it,
let's add a simple flag to create one of this type, and use it at
various places.
2025-01-16 11:55:21 +01:00
Lennart Poettering
9ef559a036 tree-wide: drop support for kernels without pidfd_open() and pidfd_send_signal() (#35971) 2025-01-16 11:37:17 +01:00
Yu Watanabe
132a164d97 Follow-ups for recent namespace PRs (#35923) 2025-01-15 14:10:21 +09:00
Mike Yuan
e755cde735 process-util: depend on CLONE_PIDFD 2025-01-12 00:17:20 +01:00
Lennart Poettering
361327e929 convert more code to PidRef (#35895) 2025-01-11 23:14:33 +01:00
Mike Yuan
dfef02c675 process-util: drop duplicate assertions 2025-01-11 15:53:13 +01:00
Lennart Poettering
47e45ea738 process-util: make pidref_safe_fork_full() work with FORK_WAIT
(This is useful for the test case added in the next commit, where it's
kinda nice being able to use pidref_safe_fork_full() and acquiring a
pidref of the child in the child in one go. There's no other value in
this than a bit of synctactic sugar for that test. But otoh thre's no
good reason to prohibit FORK_WAIT use like this, hence either way, this
commit should be a good thing.)
2025-01-10 14:14:17 +01:00
Lennart Poettering
9237a63a80 process-util: add new helper pidref_get_ppid_as_pidref() 2025-01-10 14:14:17 +01:00
Ivan Kruglov
03b89cf213 basic: fixes in read_errno()
follow ups for https://github.com/systemd/systemd/pull/35880
2025-01-10 11:49:49 +01:00
Lennart Poettering
7893362508 process-util: do not unblock unrelated signals while forking
This makes sure when we are blocking signals in preparation for fork()
we'll not temporarily unblock any signals previously set, by mistake.

It's safe for us to block more, but not to unblock signals already
blocked. Fix that.

Fixes: #35470
2025-01-10 16:10:31 +09:00
Ivan Kruglov
64db44f7fb process-util: read_errno() 2025-01-09 10:47:24 +01:00
Lennart Poettering
9ed2725867 process-util: a process from a foreign pidns is definitely not our child
Addresses: https://github.com/systemd/systemd/pull/35242#pullrequestreview-2531712318
2025-01-07 08:55:21 +01:00
Mike Yuan
223d455670 process-util: make pid_is_unwaited() wrapper around pidref version 2025-01-04 17:48:22 +01:00
Mike Yuan
47f64104d1 process-util: port pidref_get_uid() and pidref_is_my_child() to pidfd helpers 2025-01-04 17:48:22 +01:00
Mike Yuan
a33f691374 process-util: move namespace_get_leader() to namespace-util
This allows us to drop the hack for recursive includes.
2025-01-04 17:08:00 +01:00
Mike Yuan
b234026d09 process-util: extract pidfd-related funcs into pidfd-util.[ch] 2025-01-04 16:58:13 +01:00
Lennart Poettering
25b1a73f71 journald: get rid of get_process_capeff(), use pidref_get_capability() instead
This does pretty much the same, but is nicer, since it parses things
properly.
2024-12-17 19:06:54 +01:00
Mike Yuan
61263e1436 process-util: make sure we don't report ppid == 0
Previously, if pid == 0 and we're PID 1, get_process_ppid()
would set ret to getppid(), i.e. 0, which is inconsistent
when pid is explicitly set to 1. Ensure we always handle
such case by returning -EADDRNOTAVAIL.
2024-12-11 14:44:08 +01:00
Mike Yuan
07612aab66 process-util: use our usual tristate semantics for is_main_thread()
While at it, _unlikely_ is dropped, as requested in
https://github.com/systemd/systemd/pull/35242#discussion_r1880096233
2024-12-11 14:44:07 +01:00
Mike Yuan
f87863a8ff process-util: refuse to operate on remote PidRef
Follow-up for 7e3e540b88
2024-11-20 18:10:26 +00:00
Mike Yuan
c8590ad60d process-util: refuse FORK_DETACH + FORK_DEATHSIG_*
There's no synchoronization between the intermediate process
and the double-forked child, and the semantics are not useful.
Refuse such combination.
2024-11-14 12:22:15 +00:00
Lennart Poettering
7bf0149e9b process-util: more gracefully handle oom adjust parsing/setting
Who knows what kind of mount shenanigans people employ, let's gracefully
handle parse failures of proc files, like we alway do otherwsie.
2024-11-12 23:03:40 +01:00
Ivan Kruglov
a567de392d process-util: introduce report_errno_and_exit() as part of src/basic/process-util.{h,c} 2024-11-06 11:18:38 +01:00
Daan De Meyer
406f177501 core: Introduce PrivatePIDs=
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.

Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.

We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.

When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.

Note we disallow Type=forking services from using PrivatePIDs=yes since the
init proess inside the PID namespace must not exit for other processes in
the namespace to exist.

Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.

Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
2024-11-05 05:32:02 -08:00
Mike Gilbert
ff94426f8a posix_spawn_wrapper: do not set POSIX_SPAWN_SETSIGDEF flag
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.

If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.

This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the dispostion gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.

As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
2024-10-31 18:16:58 +01:00
Mike Yuan
e06c5be29a process-util: always retry with pidfd_spawn() w/o cgroup first
Follow-up for 7ac58157ca

With the mentioned commit, iff E2BIG we'd retry pidfd_spawn()
with POSIX_SPAWN_SETCGROUP disabled. However, the same strategy
should actually apply to EOPNOTSUPP/ENOSYS/EPERM too -
they can mean two things here: no clone3() or no CLONE_PIDFD.
Therefore, let's first try clone() + CLONE_PIDFD, and fall further back
to plain clone() (posix_spawn()) only as last resort. Plus, record
the fact so that we don't unnecessarily retry every single time
if CLONE_PIDFD is the one that's unavailable.
2024-08-21 15:27:57 +02:00
Mike Yuan
df99a8ef3d process-util: check the flag instead of 'cgroup' param
We might skip CLONE_INTO_CGROUP wholly if not supported.
2024-08-21 15:17:05 +02:00
Kornilios Kourtis
7ac58157ca process-util: handle pidfd_spawn() returning E2BIG
In some kernels (specifically, 5.4) even though the clone3 syscall is
supported, setting CLONE_INTO_CGROUP is not. The error message returned
in this case is E2BIG.

If posix_spawn_wrapper encounters this error, it does not retry, and
cannot spawn any programs in said kernels.

This commit adds a check for the E2BIG error and retries pidfd_spawn()
without the POSIX_SPAWN_SETCGROUP flag.

If we encounter an E2BIG error, and the pidfd_spawn() succeeds after
removing the POSIX_SPAWN_SETCGROUP flag, then we cache the result so
that we do not retry every time.

Originally, this issue was reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077204.

Signed-off-by: Kornilios Kourtis <kornilios@gmail.com>
2024-08-21 02:04:57 +09:00
Mike Yuan
f32538e1cc basic/process-util: modernize setpriority_closest()
Before this commit, the "Cannot raise nice level" branch
is rather confusing, as we're actually lowering the nice.
Also, it's better to log about the final nice value
for both cases, no matter whether we need to set to limit
or not.
2024-08-18 15:16:03 +02:00
Mike Yuan
8dc303d3c8 process-util: modernize pidfd_get_pid() 2024-07-21 22:48:53 +02:00
Mike Yuan
6fb97a85c7 process-util: make pid*_get_start_time return usec_t 2024-05-22 18:47:16 +08:00
Mike Yuan
58ff2f1e38 core/execute: also check cg_is_threaded for clone3()
Prompted by #32259

We already have this check in exec_invoke(), i.e. child.
But if CLONE_INTO_CGROUP is used, the failure would
occur on parent's side, so do the check there too.
2024-04-14 23:22:13 +08:00
Zbigniew Jędrzejewski-Szmek
418b936d47 various: use strdup_to() after getenv() 2024-03-20 15:18:21 +01:00
Lennart Poettering
234bdd9c99 process-util: use proc_mounted() check at one more place 2024-02-21 09:25:46 +01:00
Adrian Vovk
85f660d46b fd-util: Expose helper to pack fds into 3,4,5,...
This is useful for situations where an array of FDs is to be passed into
a child process (i.e. by passing it through safe_fork). This function
can be called in the child (before calling exec) to pack the FDs to all
be next to each-other starting from SD_LISTEN_FDS_START (i.e. 3)
2024-02-19 11:18:11 +00:00
Frantisek Sumsal
3dc51ab2cf process-util: use only the least significant byte from personality()
The personality() syscall returns a 32-bit value where the top three
bytes are reserved for flags that emulate historical or architectural
quirks, and only the least significant byte reflects the actual
personality we're interested in (in opinionated_personality()).

Use the newly defined mask in the corresponding test as well, otherwise
the test fails on some more "exotic" architectures that set some of the
"quirk" flags:

~# uname -m
armv7l
~# build/test-seccomp
...
/* test_lock_personality */
current personality=0x0
safe_personality(PERSONALITY_INVALID)=0x800000
Assertion '(unsigned long) safe_personality(current) == current' failed at src/test/test-seccomp.c:970, function test_lock_personality(). Aborting.
lockpersonalityseccomp terminated by signal ABRT.
Assertion 'wait_for_terminate_and_check("lockpersonalityseccomp", pid, WAIT_LOG) == EXIT_SUCCESS' failed at src/test/test-seccomp.c:996, function test_lock_personality(). Aborting.
Aborted (core dumped)

See: personality(2) and comments in sys/personality.h
2024-02-07 19:29:53 +01:00