Commit Graph

55655 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek
85830b0d62 ukify: fix version detection for aarch64 zboot kernels with gzip or lzma compression
Fixes https://github.com/systemd/systemd/issues/34780. The number in the header
is the size of the *compressed* data, so for gzip we'd read the initial part of
the decompressed data (equal to the size of the compressed data) and not find
the version string. Later on, Fedora switched to zstd compression, and there we
correctly use the number as the size of the compressed data, so we stopped
hitting the issue, but we should still fix it for older kernels.

I verified that the fix works for gzip-compressed kernels. I also made the same
change for the code for lzma compression. I'm pretty sure it is the right thing,
even though I don't have such a kernel at hand to test.
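
A minimal sketch of the idea, not the actual ukify code: the size field bounds the compressed input that is handed to the decompressor, not the decompressed output. The function name, the 4-byte header, and the toy image below are hypothetical and do not reflect the real zboot layout.

```python
import gzip


def decompress_payload(image: bytes, start: int, compressed_size: int) -> bytes:
    """Return the decompressed payload embedded in `image`.

    `compressed_size` counts *compressed* bytes, so it limits the slice fed
    to the decompressor rather than the amount of decompressed output.
    """
    compressed = image[start:start + compressed_size]
    return gzip.decompress(compressed)


# Toy image: a made-up 4-byte header, a gzip payload, then trailing bytes.
payload = b"\0" * 4096 + b"Linux version 6.12.0-example"
blob = gzip.compress(payload)
image = b"HDR!" + blob + b"trailer"

text = decompress_payload(image, 4, len(blob))

# The failure mode described above: keeping only as many decompressed bytes
# as the compressed size cuts the data off before the version string.
truncated = text[:len(blob)]
```

Here `text` contains the version string while `truncated` does not, because the payload compresses to far fewer bytes than its decompressed length.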

>>> ukify.Uname.scrape('/lib/modules/6.12.0-0.rc2.24.fc42.aarch64/vmlinuz')
Real-Mode Kernel Header magic not found
+ readelf --notes /lib/modules/6.12.0-0.rc2.24.fc42.aarch64/vmlinuz
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
Found uname version: 6.12.0-0.rc2.24.fc42.aarch64
2025-07-10 13:37:07 +09:00
Lennart Poettering
03b4a607f6 core: followups for the recent subgroup killing commits
This is a follow-up for 0f23564ad4 and
6b02854f50, as suggested here:

https://github.com/systemd/systemd/pull/37855#pullrequestreview-2997596953
2025-07-10 13:32:51 +09:00
Antonio Alvarez Feijoo
dee77ac201 generate-bpf-delegate-configs: fix compatibility with Python 3.7
- Operator ":=" requires Python 3.8 or newer.
- list[str] requires Python 3.9 or newer.

Follow-up for ea9826eb94
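
As an illustration of the two points above (a sketch only, not the code from the script; `first_match` is a hypothetical helper), the same logic can be written so that it also runs on Python 3.7:

```python
import re
from typing import List, Optional  # typing.List works on 3.7; list[str] needs 3.9


def first_match(pattern: str, lines: List[str]) -> Optional[str]:
    """Return the first substring matching `pattern`, or None."""
    for line in lines:
        # Python 3.8+ could write: if (m := re.search(pattern, line)):
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None
```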
2025-07-10 13:30:44 +09:00
Yu Watanabe
1cf5b39d64 core: add 'DefaultRestrictSUIDSGID' config option (#38126)
closes #37602, see there for extra motivation and considered
alternatives.

On typical systems, only a few services need to create SUID/SGID files.
This often is limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a
default to globally restrict creation of suid/sgid files makes it easier
to apply this restriction precisely.

## testing done
- built on aarch64-linux and x86_64-linux
- ran a VM test on x86_64-linux, checking for:
    - VM system boots successfully
    - defaults apply (`yes`, `no`, and undefined)
    - systemd tmpfiles can set suid/sgid on journal log path
- Other services explicitly defining `RestrictSUIDSGID=no` can create
suid files
2025-07-10 13:30:07 +09:00
Matteo Croce
6b099b8369 man/systemd.exec: use constant instead of literal
Use <constant> instead of <literal> otherwise every configuration item
is wrapped in double quotes.
2025-07-10 01:26:46 +02:00
Grimmauld
aa668230c9 core/varlink-manager: Support 'DefaultRestrictSUIDSGID' option 2025-07-09 21:45:41 +02:00
Grimmauld
97998d1cbe core/dbus-manager: Support 'DefaultRestrictSUIDSGID' option 2025-07-09 21:45:38 +02:00
Lennart Poettering
726183627b cgroup: handle ENODEV on cg_read_pid() gracefully
The recently added test case TEST-07-PID1.subgroup-kill.sh surfaced a
race: if we enumerate PIDs in a cgroup, and the cgroup is unlinked at
the very same time reading will result in ENODEV. We need to handle that
gracefully. Hence let's do so.
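
A rough sketch of the pattern, in Python rather than the C of the actual change: it assumes that "gracefully" means treating ENODEV as the end of the enumeration, which is an interpretation and is not confirmed by the message. `read_cgroup_pids` and its `opener` parameter are hypothetical names.

```python
import errno
from typing import Callable, List


def read_cgroup_pids(procs_path: str, opener: Callable = open) -> List[int]:
    """Read PIDs from a cgroup.procs-style file, one PID per line.

    If the cgroup is removed while we are reading, the kernel may report
    ENODEV. That is treated here as "no more PIDs" instead of an error.
    """
    pids: List[int] = []
    try:
        with opener(procs_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    pids.append(int(line))
    except OSError as e:
        if e.errno != errno.ENODEV:
            raise  # any other error is still a real failure
    return pids
```

The `opener` argument exists only so the racy case can be simulated without a real cgroup.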

Noticed while looking at:

https://github.com/systemd/systemd/actions/runs/16143084441/job/45554929264?pr=38120
2025-07-09 20:45:59 +02:00
Grimmauld
30bbdf0771 core: add 'DefaultRestrictSUIDSGID' config option
closes #37602

On typical systems, only a few services need to create SUID/SGID files.
This often is limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a default
to globally restrict creation of suid/sgid files makes it easier to apply
this restriction precisely.
2025-07-09 11:08:34 +02:00
Mike Yuan
56c6d90f8c mount-util: teach open_tree_attr_fallback() our usual AT_EMPTY_PATH trick
While at it, rename it to _with_fallback following
the naming scheme we use elsewhere.
2025-07-09 10:14:00 +02:00
Mike Yuan
2b4999acb4 mount-util: regroup functions 2025-07-09 10:14:00 +02:00
Mike Yuan
ba010e14f2 recurse-dir: switch to FOREACH_ARRAY 2025-07-09 10:13:59 +02:00
Mike Yuan
8d4b2689ca recurse-dir: use -EBADF as placeholder for invalid fd
As per our coding style.
2025-07-09 10:13:59 +02:00
Matteo Croce
ea9826eb94 core: add options to delegate BPFFS token creation
Add four new options BPFDelegate{Commands,Maps,Programs,Attachments}=
in order to delegate to a BPFFS instance the permission to create tokens.

The value is a list of options taken from:
https://github.com/torvalds/linux/blob/v6.14/include/uapi/linux/bpf.h#L922-L1121
The special value "any" means to allow every possible value.

More information about BPF tokens here:
https://lwn.net/Articles/947173/
2025-07-08 22:35:29 +02:00
Matteo Croce
3a47437fc9 core: Introduce PrivateBPF= to mount a private BPFFS
Add a new option PrivateBPF= to mount a new instance of bpffs within a
namespace.
PrivateBPF= can be set to "no" to use the host bpffs in read-only mode
and "yes" to do a new mount.
The mount is done with the new fsopen()/fsmount() API because in the future
we'll hook some commands between the two calls.
2025-07-08 22:33:28 +02:00
Matteo Croce
2c7dabff50 core: split out setup_private_users_child()
Drop support for kernels older than 3.19, as this is where
/proc/<pid>/setgroups was added.

9cc46516dd
2025-07-08 18:23:46 +02:00
Matteo Croce
a80c06cf02 nspawn: create mountpoint for bpffs
When we mount a tmpfs as /sys, create a mountpoint for bpf, as we
already do for cgroup.
2025-07-08 18:23:46 +02:00
Yu Watanabe
48e0f7bc2f core: fix owner check of PIDFile=, and update document (#38115)
Closes #38108.
2025-07-08 23:58:19 +09:00
Zbigniew Jędrzejewski-Szmek
048a94c8f6 basic/stdio-util: use a fixed message in xsprintf
We put the name of the variable in the message, but it is a local variable
and the name does not have global meaning. We end up with pointless copies
of the error string:

$ strings build/libsystemd.so.0.40.0 | grep 'big enough'
xsprintf: p[] must be big enough
xsprintf: error[] must be big enough
xsprintf: prefix[] must be big enough
xsprintf: pty[] must be big enough
xsprintf: mode[] must be big enough
xsprintf: t[] must be big enough
xsprintf: s[] must be big enough
xsprintf: spid[] must be big enough
xsprintf: header_priority[] must be big enough
xsprintf: header_pid[] must be big enough
xsprintf: path[] must be big enough
xsprintf: buf[] must be big enough

The error message already shows the file, line, and function name, which
is enough to identify the problem:

  Assertion 'xsprintf: buffer too small' failed at src/test/test-string-util.c:20, function test_xsprintf(). Aborting.
2025-07-08 13:02:37 +02:00
Zbigniew Jędrzejewski-Szmek
1e99c4e2be test-string-util: add a small test for xsprintf 2025-07-08 13:02:37 +02:00
Zbigniew Jędrzejewski-Szmek
c179466616 Merge shared/exec-directory-util.? into basic/unit-def.?
Suggested in
https://github.com/systemd/systemd/pull/35892#discussion_r2180322856.

This is a tiny amount of code and does not warrant having a separate file
and spawning a separate instance of the compiler during the build.

Note: it took me a while to confirm that the contents of that table and
function don't end up in libsystemd.so. The issue is that they _are_ present in
it, unless LTO is used. We actually use link_whole[libbasic_static] for
libsystemd, so we end up with all that code there. LTO is needed to clean
that up.
2025-07-08 12:57:33 +02:00
Yu Watanabe
7e26912677 core: allow to use PIDFile= in user session services
Fixes #38108.

Co-authored-by: 铝箔 <38349409+Sodium-Aluminate@users.noreply.github.com>
2025-07-08 18:02:34 +09:00
Zbigniew Jędrzejewski-Szmek
f283459b9f shared/open-file: add line break
We don't generally parenthesize additions, so drop that too.
2025-07-08 10:22:59 +02:00
Zbigniew Jędrzejewski-Szmek
d9a460b2b6 Adjust bitfields in struct Condition
As is usually the case, the bitfields don't create the expected space savings,
because the field that follows needs to be aligned. But we don't want to fully
drop the bitfields here, because then ConditionType and ConditionResult are
each 4 bytes, and the whole struct grows from 32 to 40 bytes (on amd64). We
potentially have lots of little Conditions and that'd waste some memory.

Make each of the four fields one byte. This still allows the compiler to
generate simpler code without changing the struct size:

E.g. in condition_test:
                 c->result = CONDITION_ERROR;
-   78fab:      48 8b 45 e8             mov    -0x18(%rbp),%rax
-   78faf:      0f b6 50 01             movzbl 0x1(%rax),%edx
-   78fb3:      83 e2 03                and    $0x3,%edx
-   78fb6:      83 ca 0c                or     $0xc,%edx
-   78fb9:      88 50 01                mov    %dl,0x1(%rax)
+   78f8b:      48 8b 45 e8             mov    -0x18(%rbp),%rax
+   78f8f:      c6 40 03 03             movb   $0x3,0x3(%rax)
2025-07-08 10:22:59 +02:00
Yu Watanabe
5cc21b78b6 minor fixes to nspawn, machined, vmspawn (#38110)
Nothing earth shattering. Just clean-ups.
2025-07-08 15:54:49 +09:00
Lennart Poettering
18eafedb1a nspawn: Support idmapped mounts on homed managed home directories (#38069)
Christian made this possible in Linux 6.15 with a new system call
open_tree_attr() that combines open_tree() and mount_setattr(). Because
idmapped mounts are (rightfully) not nested, we have to do some extra
shenanigans to make sure we're putting the right source uid in the
userns for any idmapped mounts that we do in nspawn.

Of course we also add the necessary boilerplate to make open_tree_attr()
available in our code and wrap open_tree_attr() and the corresponding
fallback in a new function which we then use everywhere else.
2025-07-08 06:51:41 +02:00
Lennart Poettering
5279acb58d vmspawn: tighten parser of EXIT_STATUS=
The EXIT_STATUS is supposed to encapsulate an ANSI C process exit
status, which is 8-bit unsigned. Hence parse it as such, do not accept
negative values, or values > 255.
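
A sketch of such a strict parser, in Python for illustration only. The real change is in vmspawn's C code and may accept or reject other forms (whitespace, signs, other bases); `parse_exit_status` is a hypothetical name, and rejecting everything except plain ASCII decimal digits is an assumption.

```python
def parse_exit_status(value: str) -> int:
    """Parse an exit status as an unsigned 8-bit decimal number (0..255)."""
    # isdigit() alone also accepts some non-ASCII digits, so check ASCII too.
    # This rejects "", "-1", "+5", " 5" and anything non-numeric.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"invalid exit status: {value!r}")
    status = int(value)
    if status > 255:
        raise ValueError(f"exit status out of range: {value!r}")
    return status
```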
2025-07-08 06:43:17 +02:00
Lennart Poettering
ba4624ff6c nspawn: fix parser of --notify-ready=
This switch takes a bool only, not an enum, hence don't claim otherwise
in the error log message.
2025-07-08 06:42:14 +02:00
Lennart Poettering
3779bdd5a3 nspawn: add argument comments to various calls 2025-07-08 06:42:04 +02:00
Lennart Poettering
93555abe29 nspawn: don't use strjoina() for user controlled strings 2025-07-08 06:40:46 +02:00
Lennart Poettering
a13fda9e67 machinectl: fix status output indentation
All other status output lines use tabs, use that for the ID shift line
too. Otherwise output will appear unaligned if log viewers have fixed
tab stop positions.
2025-07-08 06:40:35 +02:00
Lennart Poettering
0d8f8be2fd add api to kill subcgroups of units (#38102) 2025-07-08 06:33:32 +02:00
Lennart Poettering
a5ddad2795 tree-wide: switch a bunch of sd_bus_error_setf() to sd_bus_error_set() 2025-07-08 06:00:33 +02:00
Lennart Poettering
6b02854f50 systemctl: add --kill-subgroup= switch for killing subcgroup 2025-07-08 03:14:53 +02:00
Lennart Poettering
0f23564ad4 pid1: add ability to kill processes in a subgroup of a unit
This is useful for things like machined, where the system machined wants
to manage a machine owned by the user somewhere down the tree.
2025-07-08 03:14:53 +02:00
Lennart Poettering
9afe65d974 pid1: properly report if we managed to kill a process by cgroup 2025-07-08 02:32:42 +02:00
Yu Watanabe
3ef791876b core: add quota support for State, Cache, and Log exec directories (#35892)
Based on https://github.com/systemd/systemd/issues/7820, this adds support for
quota enforcement to State, Cache, and Log exec directories.
* Add new directives, StateDirectoryQuota=, CacheDirectoryQuota=, and
  LogDirectoryQuota=, to define quotas as percentages (hard limits for
  blocks and inodes) or absolute values (hard limits for blocks only).
* Add new directives, StateDirectoryQuotaAccounting=,
  CacheDirectoryQuotaAccounting= and LogDirectoryQuotaAccounting= to keep
  track of storage quotas but not enforce them (effectively just assigning
  a project ID to defined exec directories).

Example:
```
StateDirectory=quotadir
StateDirectoryQuota=1%

Jan 06 22:55:46 abeltran: Storage quotas set for /var/lib/private/quotadir. Block limit = 2639404, inode limit = 671088

root@abeltran:/var/lib/private# lsattr -pR
3153000189 --------------e----P-- ./quotadir

root@abeltran:/var/lib/private# repquota  -P /datadrive
*** Report for project quotas on device /dev/sdc1
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
#0        --  213200       0       0           4086     0     0         
#3153000189 -- 2639404       0 2639404              2     0 671088   
```
2025-07-08 09:18:20 +09:00
Yu Watanabe
ef6b6f31c7 bootspec: fix string table naming for BootEntryType/BootEntrySource (#38106)
This was all very confusing and not matching our coding style
recommendations. Let's fix that.

Prompted by #37897, which really should make use of BootEntryType, but
we better clean it up first.
2025-07-08 09:11:30 +09:00
Andres Beltran
e8e274c8da Add quota support for systemctl 2025-07-07 17:31:05 +00:00
Andres Beltran
a89afe1948 Add quota support for DBus 2025-07-07 17:31:03 +00:00
Andres Beltran
26c6f3271a core: add quota support for State, Cache, and Log exec directories 2025-07-07 17:28:47 +00:00
Andres Beltran
744086b58d shared: add exec-directory-util.ch 2025-07-07 17:28:47 +00:00
Andres Beltran
81e6b3685a quota-util: add methods to read and set project IDs 2025-07-07 17:28:47 +00:00
Andres Beltran
652ba6e0dc chattr-util: add helpers to read and set project IDs 2025-07-07 17:28:47 +00:00
Lennart Poettering
1e7ba4780d bootspec: boot_entry_source_to_json_string() to boot_entry_source_to_string()
As with the previous changes for BootEntryType, let's also clean up the
naming for BootEntrySource.
2025-07-07 18:26:59 +02:00
Lennart Poettering
2030922e2d bootspec: rename boot_entry_source_to_string() to boot_entry_source_description_to_string()
Similar to the previous changes, let's make clear this string table
contains *descriptive*, i.e. meaningful human-readable strings.
2025-07-07 18:25:22 +02:00
Lennart Poettering
9880c7f103 bootspec: rename BootEntryType values
So we exposed different names for the entry types in JSON than we named
our enum values. Which is very confusing. Let's unify that. Given that
the JSON fields are externally visible let's stick to that naming, even
though I think "unified" and "conf" would have been more descriptive.

This ensures we follow our usual logic that the enum identifiers and the
strings they map to use the same naming.
2025-07-07 18:23:59 +02:00
Lennart Poettering
a1c7aa6a95 bootspec: include 'UKI' in descriptive name for type #2
I am pretty sure that "UKI" is the best known name for type #2 boot
loader spec entries, hence we really should put it in the name.
2025-07-07 18:13:06 +02:00
Lennart Poettering
199989e168 bootspec: rename boot_entry_type_to_string() to boot_entry_type_description_to_string()
This helper does not translate BootEntryType to a string matching the
enum's value names, but instead returns a human readable descriptive
string. Let's make it clearer what this does by including "description" in
the name.
2025-07-07 18:13:06 +02:00
Mike Yuan
f273212797 core/cgroup: unit_realize_cgroup_now_disable() is NOP for non-slice units 2025-07-07 17:55:14 +02:00