890 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek
e019ea738d pid1: order units using TTYVHangup= after vconsole setup
The goal of this change is to delay getty services until after
systemd-vconsole-setup has finished. systemd-vconsole-setup starts loadkeys,
and it seems that loadkeys is interrupted by the TTY hangup call we do when
starting tty services [1], so that loadkeys starts getting EIO from the
ioctl("/dev/tty1", KDSKBENT) syscall it does.

Fixes #26908.

[1] https://github.com/legionus/kbd/issues/92#issuecomment-1554451788

Initially I wanted to add ordering dependencies to individual units, but
TTYVHangup= can be added to various other external units too. The solution with
an implicit dependency should cover those cases too.
2023-05-19 17:46:30 +02:00
David Tardon
e652663a04 tree-wide: use parse_fd() 2023-05-05 09:10:56 +02:00
Lennart Poettering
3aaa376342 execute: remove credentials dir again when empty
This is closely related to the previous commit: if the credentials dir
is empty and nothing mounted on it, let's remove it again.

This will in particular happen if we decided to not actually install the
mount we prepared for the credentials because it is empty. In that case
the mount point inode is already there, and with this we'll remove it.
Primary effect: users will see ENOENT rather than EACCES when trying to
access it, which should be preferable, given we already handle that
nicely in our credential consumption code.

This should also be useful on systems where we lack any privs to create
mounts, and thus operate on a regular dir anyway.
2023-05-04 12:10:06 +02:00
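The removal step described in the commit above can be sketched with plain rmdir(), which already refuses to remove a directory that is non-empty or has something mounted on it. This is a minimal sketch, not the code from the commit: the helper names remove_dir_if_empty() and demo_errno_after_removal() and the /tmp path are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not systemd's actual code): remove a directory only
 * if it is empty and not a mount point. rmdir() itself reports ENOTEMPTY or
 * EEXIST for a non-empty directory and EBUSY for a mount point, so no
 * separate emptiness check is needed.
 * Returns 1 if removed, 0 if left in place, negative errno on other errors. */
int remove_dir_if_empty(const char *path) {
        assert(path);

        if (rmdir(path) >= 0)
                return 1;
        if (errno == ENOTEMPTY || errno == EEXIST || errno == EBUSY)
                return 0;
        return -errno;
}

/* Demo: after an empty directory has been removed, a later access attempt
 * fails with ENOENT. Returns that errno value, or -1 if the demo could not
 * be carried out. */
int demo_errno_after_removal(void) {
        char path[] = "/tmp/creds-demo-XXXXXX";

        if (!mkdtemp(path))
                return -1;
        if (remove_dir_if_empty(path) != 1)
                return -1;
        if (access(path, F_OK) >= 0)
                return -1;
        return errno;
}
```

A consumer that already treats ENOENT as "no credentials passed" then needs no special case for the empty directory.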
Lennart Poettering
21dd1de659 execute: suppress credentials mount if empty
Let's avoid creating another mount in the system if it's empty anyway.

This is mostly a cosmetic thing in one (pretty common) special case: if
creds settings are used in a unit but no creds are actually available to
be passed.

(While we are at it this also does one more minor optimization: it
adjusts the MS_RDONLY/MS_NOSUID/… flags of the source mount we are about
to MS_MOVE into the right place only if we actually really move it, and
if we instead unmount it again we won't bother with the flags either)
2023-05-04 12:10:01 +02:00
Lennart Poettering
bcd9b98159 core: change ownership of subcgroup we create recursively, it shall be owned by the user delegated to
If we create a subcgroup (regardless of whether it is the '.control'
subgroup we always created or one configured via DelegateSubgroup=) it's inside of
the delegated territory of the cgroup tree, hence it should be owned
fully by the unit's users. Hence do so.
2023-04-27 12:18:32 +02:00
Lennart Poettering
18c1e481b6 execute: don't apply journal + oomd xattrs to subcgroup
We don't need to apply the journal/oomd xattrs to the subcgroups we add,
since those daemons already look for the xattrs up the tree anyway.
Hence remove this.

This is in particular relevant as it means later changes to the xattr
don't need to be replicated on the subcgroup either.
2023-04-27 12:18:32 +02:00
Lennart Poettering
a8b993dc11 core: add DelegateSubgroup= setting
This implements a minimal subset of #24961, but in a much more
restrictive way: we only allow one level of subcgroup (as that's enough
to address the no-processes-in-inner-cgroups rule), and we do not change
anything about threaded cgroup logic or similar, or make any of this new
behaviour mandatory.

All this does is this: all non-control processes we invoke for a unit
we'll invoke in a subgroup by the specified name.

We'll later port all our current services that use cgroup delegation
over to this, i.e. user@.service, systemd-nspawn@.service and
systemd-udevd.service.
2023-04-27 12:18:32 +02:00
Luca Boccassi
6ef721cbc7 user units: implicitly enable PrivateUsers= when sandboxing options are set
Enabling these options when not running as root requires a user
namespace, so implicitly enable PrivateUsers=.
This has a side effect as it changes which users are visible to the unit.
However until now these options did not work at all for user units, and
in practice just a handful of user units in Fedora, Debian and Ubuntu
mistakenly used them (and they have been all fixed since).

This fixes the long-standing confusing issue that the user and system
units take the same options but the behaviour is wildly (and sometimes
silently) different depending on which is which, with user units
requiring manually specifying PrivateUsers= in order for sandboxing
options to actually work and not be silently ignored.
2023-04-13 21:33:48 +01:00
Lennart Poettering
4fb8f1e883 service: allow freeing the fdstore via cleaning
Now that we have a potentially pinned fdstore let's add a concept for
cleaning it explicitly on user request. Let's expose this via
"systemctl clean", i.e. the same way as user directories are cleaned.
2023-04-13 06:44:27 +02:00
Lennart Poettering
3af48a86d9 Merge pull request #25608 from poettering/dissect-moar
dissect: add dissection policies
2023-04-12 13:46:08 +02:00
Yu Watanabe
f643ca1767 Merge pull request #27033 from dtardon/array-cleanup
Use CLEANUP_ARRAY more
2023-04-12 16:43:39 +09:00
David Tardon
29933daf9e execute: use CLEANUP_ARRAY 2023-04-11 16:25:07 +02:00
David Tardon
93404d340e execute: use more automatic cleanup 2023-04-11 16:16:33 +02:00
David Tardon
ed8267c727 execute: use CLEANUP_ARRAY 2023-04-11 16:11:14 +02:00
Lennart Poettering
84be0c710d tree-wide: hook up image dissection policy logic everywhere 2023-04-05 20:45:30 +02:00
Lennart Poettering
e43911a78e execute: add one more assert() 2023-04-04 21:29:22 +02:00
Zbigniew Jędrzejewski-Szmek
3ff67ec43a core: unify two similar paths, avoid formatting of unused string
After 'if (DEBUG_LOGGING)' is added, the two call sites are almost identical,
except that we forgot LOG_UNIT_INVOCATION_ID(unit).

I removed the handling of the log_oom(). It's a debug message only after all,
and it's unlikely to fail.
2023-04-04 15:18:00 +02:00
Zbigniew Jędrzejewski-Szmek
4a055e5a3e core: typos in comments 2023-04-04 15:18:00 +02:00
Daan De Meyer
1522077269 core: Move DynamicCreds into ExecRuntime
This is just another piece of runtime data so let's store it in
ExecRuntime alongside the other runtime data.
2023-03-27 14:47:30 +02:00
Daan De Meyer
28135da3cd core: Introduce unit private exec runtime
Currently, exec runtimes can be shared between units (using
JoinsNamespaceOf=). Let's introduce a concept of a private exec
runtime that isn't shared with JoinsNamespaceOf=. The existing
ExecRuntime struct is renamed to ExecSharedRuntime and becomes a
private member of the new private ExecRuntime.
2023-03-27 14:46:57 +02:00
Daan De Meyer
e52a696a9a execute: Do not pass destroy as a boolean argument to unref()
Let's mimic what we do for DynamicUser and have two separate functions
for unreffing and destroying an ExecSharedRuntime object.
2023-03-27 14:32:58 +02:00
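The pattern of two separate functions instead of one unref() with a boolean "destroy" argument can be illustrated as follows. This is only a sketch of the general idea: the SharedThing type and all function names are hypothetical and not taken from systemd, and the int behind persistent_state merely stands in for state that outlives the in-memory object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, for illustration only. */
typedef struct SharedThing {
        unsigned n_ref;
        int *persistent_state;  /* stand-in for state that outlives the object */
} SharedThing;

SharedThing *shared_thing_new(int *persistent_state) {
        SharedThing *t;

        assert(persistent_state);

        t = calloc(1, sizeof(SharedThing));
        if (!t)
                return NULL;

        t->n_ref = 1;
        t->persistent_state = persistent_state;
        *persistent_state = 1;
        return t;
}

/* Drop one reference. When the last one goes away, free the memory but
 * leave the persistent state alone. Always returns NULL. */
SharedThing *shared_thing_unref(SharedThing *t) {
        if (!t)
                return NULL;

        assert(t->n_ref > 0);
        if (--t->n_ref == 0)
                free(t);
        return NULL;
}

/* Drop one reference. When the last one goes away, also tear down the
 * persistent state. Always returns NULL. */
SharedThing *shared_thing_destroy(SharedThing *t) {
        if (!t)
                return NULL;

        assert(t->n_ref > 0);
        if (--t->n_ref == 0) {
                *t->persistent_state = 0;
                free(t);
        }
        return NULL;
}
```

At the call site, shared_thing_destroy(t) states the intent directly, whereas shared_thing_unref(t, true) would require the reader to look up what the boolean means.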
Daan De Meyer
e76506b748 execute: Rename ExecRuntime to ExecSharedRuntime
Preparation for next commit
2023-03-27 14:05:30 +02:00
Daan De Meyer
f461a28da7 chase-symlinks: Rename chase_symlinks() to chase()
Chasing symlinks is a core function that's used in a lot of places,
so it deserves a less verbose name. Let's rename it to chase()
and chaseat().

We also slightly change the pattern used for the chaseat() helpers
so we get chase_and_openat() and similar.
2023-03-24 13:43:51 +01:00
Lennart Poettering
50a4217bbe core: move encrypted credential check to execute.c
This is an operation on an ExecContext, hence it probably should be
placed there.
2023-03-23 18:22:27 +01:00
Daan De Meyer
a7253c7fec Merge pull request #26916 from DaanDeMeyer/log-context-ref
log: Avoid pushing the same fields more than once on the log context
2023-03-22 22:07:45 +01:00
Daan De Meyer
a3b00f91bb core: Settle log target if we're going to be closing all fds
Whenever we're going to close all file descriptors, we tend to close
the log and set it into open-when-needed mode. When this is done with
the logging target set to LOG_TARGET_AUTO, we run into issues because
for every logging call, we'll check if stderr is connected to the
journal to determine where to send the logging message. This check
obviously stops working when we close stderr, so we settle the log
target before we do that so that we keep using the same logging
target even after stderr is closed.
2023-03-22 13:20:08 +01:00
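The idea of settling the log target can be shown with a toy model. Everything below is hypothetical (the Toy* names, the global flag that stands in for probing stderr); it only illustrates why resolving the "auto" target once, before stderr is closed, keeps later log calls going to the same place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model for illustration; these are not systemd's types or functions. */
typedef enum ToyLogTarget {
        TOY_LOG_TARGET_AUTO,
        TOY_LOG_TARGET_JOURNAL,
        TOY_LOG_TARGET_CONSOLE,
} ToyLogTarget;

ToyLogTarget toy_log_target = TOY_LOG_TARGET_AUTO;

/* Stand-in for "is stderr connected to the journal?". Once stderr has been
 * closed, a real probe can no longer answer this correctly. */
bool toy_stderr_is_journal = true;

/* What every logging call effectively does: with AUTO, probe stderr again. */
ToyLogTarget toy_effective_target(void) {
        if (toy_log_target != TOY_LOG_TARGET_AUTO)
                return toy_log_target;

        return toy_stderr_is_journal ? TOY_LOG_TARGET_JOURNAL
                                     : TOY_LOG_TARGET_CONSOLE;
}

/* Resolve AUTO once and remember the answer, so that later calls no longer
 * depend on the state of stderr. */
void toy_log_settle_target(void) {
        toy_log_target = toy_effective_target();
        assert(toy_log_target != TOY_LOG_TARGET_AUTO);
}
```

Calling toy_log_settle_target() right before closing all file descriptors pins the target; without it, the probe result would flip as soon as stderr goes away.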
Daan De Meyer
4d62ee559d execute: Add kernel cmdline arguments for tty term, rows and columns
Let's allow configuring tty term and size using kernel cmdline arguments
so that when running in a VM we can communicate the terminal TERM and size
from the host via SMBIOS extra kernel cmdline arguments.
2023-03-21 20:50:17 +01:00
Daan De Meyer
4b2af439eb unit: Add LOG_CONTEXT_PUSH_UNIT()
A helper macro to push all unit related fields onto the log context.
We also modify exec_spawn() to use it.
2023-03-21 14:59:16 +01:00
Frantisek Sumsal
1da3cb8141 tree-wide: simplify x ? x : y to x ?: y where applicable 2023-03-18 14:23:11 +01:00
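For readers unfamiliar with the shorthand: `x ?: y` is a GNU C extension (supported by GCC and Clang) that yields x if it is non-zero and y otherwise, evaluating x only once. A small sketch follows; fetch_name() and the other function names are hypothetical and exist only to make the number of evaluations visible.

```c
#include <assert.h>
#include <string.h>

int fetch_calls = 0;

/* Hypothetical lookup with a side effect, so that a double evaluation
 * becomes visible in the counter. */
const char *fetch_name(const char *configured) {
        fetch_calls++;
        return configured;
}

/* Portable form: the tested expression is written twice, and here it is
 * also evaluated twice when it is non-NULL. */
const char *name_long(const char *configured) {
        return fetch_name(configured) ? fetch_name(configured) : "default";
}

/* GNU form: the tested expression is evaluated once and reused as the
 * result when it is non-NULL. */
const char *name_short(const char *configured) {
        return fetch_name(configured) ?: "default";
}

/* Small self-check: both forms agree on the resulting value. */
void check_forms_agree(void) {
        assert(strcmp(name_long("tty1"), name_short("tty1")) == 0);
        assert(strcmp(name_long(NULL), name_short(NULL)) == 0);
}
```

For a plain variable without side effects the two spellings behave identically, which is the case a mechanical tree-wide replacement relies on.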
Luca Boccassi
d4b6ec980d core: make the memory pressure cgroup path writable when ProtectControlGroups=yes
The interface requires services to write to the cgroup file to activate notifications,
but with ProtectControlGroups=yes we make it read-only. Add a writable bind mount.

Follow-up for 6bb0084204
2023-03-15 09:23:17 +01:00
Lennart Poettering
874cdcbcf5 core: rename "mount_flags" → "mount_propagation_flag" internally where appropriate
ExecContext has a field that controls the mount propagation flag of the
mounts in the resulting namespace. This is exposed as "MountFlags="
which is super confusing, as it suggests one could control more than
propagation, and that it was actually a flags field. It's only an enum
though, and nothing else.

We might want to rename this externally one day, but given the compat
kludges this requires and the fact this is somewhat nichey it might not
be worth it. But internally let's rename it, as it makes things much
easier to grok, in particular as part of the codebase already exposed
the concept as mount_propagation_flag.

No actual code flow changes, just some renaming.
2023-03-14 13:00:27 +09:00
Topi Miettinen
7a114ed4b3 execute: use prctl(PR_SET_MDWE) for MemoryDenyWriteExecute=yes
On some ARM platforms, the dynamic linker could use the PROT_BTI memory protection
flag with `mprotect(..., PROT_BTI | PROT_EXEC)` to enable additional memory
protection for executable pages. But `MemoryDenyWriteExecute=yes` blocks this
with a seccomp filter denying all `mprotect(..., x | PROT_EXEC)`.

The newly preferred method is to use prctl(PR_SET_MDWE) on supported kernels. Then
the in-kernel implementation can allow PROT_BTI as necessary, without weakening
MDWE. The in-kernel version may also be extended to more sophisticated protections
in the future.
2023-03-13 18:44:36 +00:00
Daan De Meyer
dcebb015fb execute: Use log_unit_error_errno() instead of log_error_errno() 2023-03-13 12:33:11 +01:00
Lennart Poettering
4870133bfa basic: add RuntimeScope enum
In various tools and services we have a per-system and per-user concept.
So far we sometimes used a boolean indicating whether we are in system
mode, or a reversed boolean indicating whether we are in user mode, or
the LookupScope enum used by the lookup path logic.

Let's address that and introduce a common enum for this that we can use
across the board.

This is mostly just search/replace, no actual code changes.
2023-03-10 09:47:39 +01:00
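The benefit of a dedicated enum over a boolean such as "is_system" or its inverse "is_user" can be sketched like this. The enum and function names below are hypothetical and chosen for illustration; they are not the definitions added by the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scope enum, for illustration only. */
typedef enum MyRuntimeScope {
        MY_RUNTIME_SCOPE_INVALID = -1,
        MY_RUNTIME_SCOPE_SYSTEM,
        MY_RUNTIME_SCOPE_USER,
        MY_RUNTIME_SCOPE_COUNT,
} MyRuntimeScope;

const char *my_runtime_scope_to_string(MyRuntimeScope scope) {
        switch (scope) {
        case MY_RUNTIME_SCOPE_SYSTEM:
                return "system";
        case MY_RUNTIME_SCOPE_USER:
                return "user";
        default:
                return NULL;
        }
}

MyRuntimeScope my_runtime_scope_from_string(const char *s) {
        assert(s);

        for (int i = 0; i < MY_RUNTIME_SCOPE_COUNT; i++)
                if (strcmp(s, my_runtime_scope_to_string((MyRuntimeScope) i)) == 0)
                        return (MyRuntimeScope) i;

        return MY_RUNTIME_SCOPE_INVALID;
}
```

A call such as do_lookup(MY_RUNTIME_SCOPE_USER) (with do_lookup being an imagined caller) reads unambiguously, while do_lookup(true) forces the reader to check which of the two boolean conventions the function uses.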
Yu Watanabe
ce16d177dd tree-wide: replace IOVEC_INIT with IOVEC_MAKE
We use gnu11 to build, hence we can use structured initializer with
casting, and it is not necessary to use different patterns on
initialization and assignment.

Addresses https://github.com/systemd/systemd/pull/26560#discussion_r1118875447.
2023-03-06 09:30:28 +09:00
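The point about using one pattern for both initialization and assignment can be illustrated with a compound literal. The macro name MY_IOVEC_MAKE and the helper functions are hypothetical; this is a sketch of the technique, not the macro from the systemd tree.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Illustrative macro: a compound literal is an expression, so the same
 * spelling works when initializing a variable and when assigning to one. */
#define MY_IOVEC_MAKE(base, len) (struct iovec) { .iov_base = (base), .iov_len = (len) }

/* Sum of the lengths of all entries. */
size_t iovec_total_len(const struct iovec *iov, size_t n) {
        size_t sum = 0;

        assert(iov || n == 0);

        for (size_t i = 0; i < n; i++)
                sum += iov[i].iov_len;
        return sum;
}

size_t demo_iovec(void) {
        static char a[] = "MESSAGE=hello", b[] = "PRIORITY=6";
        struct iovec first = MY_IOVEC_MAKE(a, strlen(a));   /* initialization */
        struct iovec iov[2];

        iov[0] = first;
        iov[1] = MY_IOVEC_MAKE(b, strlen(b));               /* assignment */

        return iovec_total_len(iov, 2);
}
```

A brace-only initializer macro of the form `{ .iov_base = ..., .iov_len = ... }` would be valid in the first position but not in the second, which is why two separate macros would otherwise be needed.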
Lennart Poettering
1406bd66e4 tree-wide: error handling modernizations 2023-03-01 22:52:55 +00:00
Lennart Poettering
6bb0084204 pid1: add unit file settings to control memory pressure logic 2023-03-01 09:43:23 +01:00
Lennart Poettering
638fd8ccb8 execute: pass ambient caps from PAM through to invoked service
If a PAM service sets some ambient caps, we should honour that, hence
query it, and merge it with our own ambient settings.

This needs to be done manually since otherwise dropping privs via
setresuid() will undo all such caps, and we need to manually tweak
things to keep them.
2023-02-23 12:53:09 +01:00
Yu Watanabe
c2da3bf237 core/namespace: mount new sysfs when new network namespace is requested
Even when a mount namespace is created, the host's sysfs was previously
still used, especially with RootDirectory= or RootImage=, thus service
processes could still access the properties of the network interfaces in
the main network namespace through sysfs.

With this change, sysfs is remounted with the new network namespace tag,
except when PrivateMounts= is explicitly disabled. Hence, the properties of
the network interfaces in the main network namespace can no longer be
accessed by service processes through sysfs.

Fixes #26422.
2023-02-23 15:09:13 +09:00
Yu Watanabe
2400212128 core/execute: make PrivateMounts= tristate
No functional change, just preparation for later commits.
2023-02-23 15:09:13 +09:00
Yu Watanabe
fde36d2581 core/execute: introduce exec_needs_ipc_namespace() helper function
This also fixes a missing condition in exec_runtime_make().
2023-02-23 15:09:13 +09:00
Yu Watanabe
fbbb9697b6 core/execute: introduce exec_needs_network_namespace() helper function 2023-02-23 15:09:13 +09:00
Yu Watanabe
06b3a2f6f0 core/namespace: drop unused field in NamespaceInfo 2023-02-23 15:09:13 +09:00
Lennart Poettering
a954b2492e execute: modernizations 2023-02-23 10:11:09 +09:00
Lennart Poettering
3fd5190b5e capability-util: add CAP_MASK_ALL + CAP_MASK_UNSET macros
We should be more careful with distinguishing the cases "all bits set in
caps mask" from "cap mask invalid". We so far mostly used UINT64_MAX for
both, which is not correct though (as it would mean
AmbientCapabilities=~0 followed by AmbientCapabilities=0 would result
in capability 63 being set, which we don't really allow, since that
means unset).
2023-02-20 16:49:45 +01:00
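The distinction between "all bits set" and "unset" can be sketched with two separate constants. The names and values below are assumptions for illustration: the "unset" marker is taken to be UINT64_MAX and the "all" mask to cover bits 0 to 62 only, so that no valid mask can ever collide with the marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define MY_CAP_MASK_UNSET UINT64_MAX
#define MY_CAP_MASK_ALL   UINT64_C(0x7fffffffffffffff)

bool my_cap_mask_is_set(uint64_t mask) {
        return mask != MY_CAP_MASK_UNSET;
}

/* Invert a mask the way a "~" prefix in a setting would, but stay within the
 * valid range, so the result can never be mistaken for "unset". */
uint64_t my_cap_mask_invert(uint64_t mask) {
        assert(my_cap_mask_is_set(mask));
        return ~mask & MY_CAP_MASK_ALL;
}
```

With a single UINT64_MAX constant serving both purposes, inverting the empty mask would produce a value indistinguishable from "unset"; with two constants the inverted empty mask is simply "all capabilities".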
Lennart Poettering
8142d73574 cap-list: rename capability_set_to_string_alloc() → capability_set_to_string()
We typically don't use the _alloc() suffix anymore for anything, hence
drop it here too.
2023-02-20 16:13:49 +01:00
Lennart Poettering
2264a20d91 execute: drop spurious empty line 2023-02-16 11:48:18 +01:00
Yu Watanabe
f0353cf2e9 core/execute: fix comment 2023-02-15 10:10:13 +09:00
Luca Boccassi
71c6f0ac52 Merge pull request #23309 from DaanDeMeyer/log-context
basic: Add log context
2023-01-20 15:01:03 +00:00
Yu Watanabe
d09df6b94e tree-wide: fix typo 2023-01-20 15:32:16 +09:00