Commit Graph

209 Commits

Author SHA1 Message Date
Yu Watanabe
594c383554 cgroup-util: use string_hash_ops_free 2021-09-11 20:29:34 +09:00
Yu Watanabe
dccdbf9b35 cgroup-util: use _cleanup_free_ attribute 2021-09-11 20:26:58 +09:00
Mauricio Vásquez
6f50d4f7d6 core: implement RestrictNetworkInterfaces=
This commit introduces all the logic to load and attach the BPF
programs to restrict network interfaces when a unit specifying it is
loaded.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
2021-08-18 15:55:53 -05:00
Albert Brox
8a513eee30 pid1: add support for cgroup.kill 2021-08-09 12:14:26 +02:00
Yu Watanabe
4ff361cc86 tree-wide: always drop unnecessary dot in path 2021-05-28 13:44:38 +09:00
Lennart Poettering
319a4f4bc4 alloc-util: simplify GREEDY_REALLOC() logic by relying on malloc_usable_size()
We recently started making more use of malloc_usable_size() and rely on
it (see the string_erase() story). Given that we don't really support
sytems where malloc_usable_size() cannot be trusted beyond statistics
anyway, let's go fully in and rework GREEDY_REALLOC() on top of it:
instead of passing around and maintaining the currenly allocated size
everywhere, let's just derive it automatically from
malloc_usable_size().

I am mostly after this for the simplicity this brings. It also brings
minor efficiency improvements I guess, but things become so much nicer
to look at if we can avoid these allocation size variables everywhere.

Note that the malloc_usable_size() man page says relying on it wasn't
"good programming practice", but I think it does this for reasons that
don't apply here: the greedy realloc logic specifically doesn't rely on
the returned extra size, beyond the fact that it is equal or larger than
what was requested.

(This commit was supposed to be a quick patch btw, but apparently we use
the greedy realloc stuff quite a bit across the codebase, so this ends
up touching *a*lot* of code.)
2021-05-19 16:42:37 +02:00
Julia Kartseva
a8e5eb1788 core: add socket-bind cgroup mask harness
Standard cgroup harness for bpf feature.
2021-04-26 16:21:59 -07:00
Julia Kartseva
506ea51b48 core: add bpf-foreign cgroup mask and harness
Add CGROUP_MASK_BPF_FOREIGN to CGROUP_MASK_BPF and standard cgroup
context harness.
2021-04-09 20:28:47 -07:00
Zbigniew Jędrzejewski-Szmek
7756528e9b tree-wide: use the same comment for work-around initializations
This should make it easier to remove those warnings when the compiler
gets smarter. Not sure if I got them all...

Double space before the comment start to make it easier to separate from the
preceding line.
2021-04-07 16:04:22 +02:00
Zbigniew Jędrzejewski-Szmek
bc20c31bbc basic/cgroup-util: silence gcc warning about unitialized variable 2021-03-31 18:24:53 +02:00
Lennart Poettering
627055ce9a tree-wide: use read_full_virtual_file() where appropriate
Wherever we read virtual files we better should use
read_full_virtual_file(), to make sure we get a consistent response
given how weird the kernel's handling with partial read on such file
systems is.
2021-03-17 18:43:42 +01:00
Mike Gilbert
2156061fb3 cg_unified_cached: return ENOMEDIUM if we cannot find a known hierarchy
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.

Bug: https://bugs.gentoo.org/771819
2021-03-17 15:42:22 +01:00
Zbigniew Jędrzejewski-Szmek
e4645ca599 basic/group-util: optimize alloca use
Follow-up for 0fa7b50053.
2021-03-11 14:43:16 +01:00
Zbigniew Jędrzejewski-Szmek
749c4c8ed1 Merge pull request #18553 from Werkov/cgroup-user-instance-controllers
Make (user) instance aware of delegated cgroup controllers
2021-03-10 09:41:40 +01:00
Lennart Poettering
8ca94009f8 basic: tighten two filename length checks
This fixes two checks where we compare string sizes when validating with
FILENAME_MAX. In both cases the check apparently wants to check if the
name fits in a filename, but that's not actually what FILENAME_MAX can
be used for, as it — in contrast to what the name suggests — actually
encodes the maximum length of a path.

In both cases the stricter change doesn't actually change much, but the
use of FILENAME_MAX is still misleading and typically wrong.
2021-03-08 22:47:14 +01:00
Yu Watanabe
f5fbe71d95 tree-wide: use UINT64_MAX or friends 2021-03-05 07:10:13 +09:00
Zbigniew Jędrzejewski-Szmek
b3c57df0f5 Merge pull request #18401 from anitazha/oomdxattr
oomd: implement avoid/omit support for cgroups
2021-02-13 10:00:31 +01:00
Michal Koutný
0fa7b50053 core: Make (user) instance aware of delegated cgroup controllers
systemd user instance assumed same controllers are available to it as to
PID 1. That is not true generally, in v1 (legacy, hybrid) we don't delegate any
controllers to anyone and in v2 (unified) we may delegate only subset of
controllers.
The user instance would fail silently when the controller cgroup cannot
be created or the controller cannot be enabled on the unified hierarchy.

The changes in 7b63961415 ("cgroup: Swap cgroup v1 deletion and
migration") caused some attempts of operating on non-delegated
controllers to be logged.

Make the user instance first check what controllers are availble to it
and narrow operations only to these controllers. The original checks are
kept in place.

Note that daemon-reexec needs to be invoked in order to update the set
of unabled controllers after a change.

Fixes: #18047
Fixes: #17862
2021-02-11 16:58:34 +01:00
Michal Koutný
81504017f4 cgroup: Simplify cg_get_path_and_check
The function controller_is_accessible() doesn't do really much in case
of the unified hierarchy. Move common parts into cg_get_path_and_check
and make controller check v1 specific. This is refactoring only.
2021-02-11 11:51:59 +01:00
Anita Zhang
59331b8e29 oom: implement avoid/omit xattr support
There may be situations where a cgroup should be protected from killing
or deprioritized as a candidate. In FB oomd xattrs are used to bias oomd
away from supervisor cgroups and towards worker cgroups in container
tasks. On desktops this can be used to protect important units with
unpredictable resource consumption.

The patch allows systemd-oomd to understand 2 xattrs:
"user.oomd_avoid" and "user.oomd_omit". If systemd-oomd sees these
xattrs set to 1 on a candidate cgroup (i.e. while attempting to kill something)
AND the cgroup is owned by root, it will either deprioritize the cgroup as
a candidate (avoid) or remove it completely as a candidate (omit).

Usage is restricted to root owned cgroups to prevent situations where an
unprivileged user can set their own cgroups lower in the kill priority than
another user's (and prevent them from omitting their units from
systemd-oomd killing).
2021-02-09 02:27:40 -08:00
Anita Zhang
242d75bdaa cgroup-util: add ManagedOOMPreference enum to use between pid1 and oomd 2021-02-09 02:27:37 -08:00
Yu Watanabe
8087644a13 tree-wide: replace strverscmp() and str_verscmp() with strverscmp_improved() 2021-02-09 14:25:03 +09:00
Susant Sahani
fe96c0f86d treewide: tighten variable scope in loops (#18372)
Also use _cleanup_free_ in one more place.
2021-01-27 08:19:39 +01:00
Lennart Poettering
c2bc710b24 string-util: imply NULL termination of strextend() argument list
The trailing NULL in the argument list is now implied (similar to
what we already have in place in strjoin()).
2021-01-06 17:24:46 +01:00
Yu Watanabe
d51c4fca29 tree-wide: fix "a the" or "the a" 2020-11-13 16:28:47 +09:00
Yu Watanabe
db9ecf0501 license: LGPL-2.1+ -> LGPL-2.1-or-later 2020-11-09 13:23:58 +09:00
Anita Zhang
b41dcc51eb cgroup-util: add cg_get_attribute_as_bool() helper 2020-10-07 17:12:24 -07:00
Anita Zhang
4d824a4e0b core: add ManagedOOM*= properties to configure systemd-oomd on the unit
This adds the hook ups so it can be read with the usual systemd
utilities. Used in later commits by sytemd-oomd.
2020-10-07 16:17:23 -07:00
Zbigniew Jędrzejewski-Szmek
ae7ef63f21 basic/cgroup-util: port over to string_contains_word() 2020-09-09 09:34:54 +02:00
Michal Sekletár
d9e45bc3ab core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
2020-04-30 19:02:51 +02:00
Michal Sekletár
25a1f04c68 basic/cgroup-util: introduce cg_get_keyed_attribute_full()
Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the
missing keyes in cgroup attribute file are OK or not. Also the wrappers for both
strict and graceful version are provided.
2020-04-29 18:41:19 +02:00
Dan Streetman
0bc5f001db cgroup-util: check for SYSFS_MAGIC when detecting cgroup format
When nothing at all is mounted at /sys/fs/cgroup, the fs.f_type is
SYSFS_MAGIC (0x62656572) which results in the confusing debug log:

"Unknown filesystem type 62656572 mounted on /sys/fs/cgroup."

Instead, if the f_type is SYSFS_MAGIC, a more accurate message is:

"No filesystem is currently mounted on /sys/fs/cgroup."
2020-04-25 10:00:43 +02:00
Anita Zhang
baa358df32 cgroup-util: cg_get_xattr_malloc helper
`cg_get_xattr_malloc` to read a cgroup xattr value and allocate space
to hold said value (simple helper combining existing functions).
2020-03-24 16:06:32 -07:00
Anita Zhang
613328c3e2 cgroup-util: helper to cg_get_attribute and convert to uint64_t
A common pattern in the codebase is reading a cgroup memory value
and converting it to a uint64_t. Let's make it a helper and refactor a
few places to use it so it's more concise.
2020-03-24 16:05:16 -07:00
Zbigniew Jędrzejewski-Szmek
2a8020fe9d basic/cgroup-util: modernize cg_split_spec()
Those cryptic one letter variable names, yuck!
2020-03-10 10:50:34 +01:00
Lennart Poettering
bf25f1657f cgroup-util: add new cg_remove_xattr() for removing xattr from cgroup 2019-11-20 17:50:12 +01:00
Yu Watanabe
e30e8b5073 tree-wide: drop stat.h or statfs.h when stat-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe
996f7e1cd0 tree-wide: drop dirent.h when dirent-util.h is included 2019-11-04 00:30:32 +09:00
Yu Watanabe
455fa9610c tree-wide: drop string.h when string-util.h or friends are included 2019-11-04 00:30:32 +09:00
Yu Watanabe
f5947a5e92 tree-wide: drop missing.h 2019-10-31 17:57:03 +09:00
Zbigniew Jędrzejewski-Szmek
86e94d95d0 Merge pull request #13246 from keszybz/add-SystemdOptions-efi-variable
Add efi variable to augment /proc/cmdline
2019-10-03 12:19:44 +02:00
Pavel Hrdina
1fbbb526ee cgroup-util: fix obsolete comment about supported controllers
The list might grow so make the comment more generic to not worry about
it if some controller is implemented.
2019-09-24 15:16:11 +02:00
Pavel Hrdina
047f5d63d7 cgroup: introduce support for cgroup v2 CPUSET controller
Introduce support for configuring cpus and mems for processes using
cgroup v2 CPUSET controller.  This allows users to limit which cpus
and memory NUMA nodes can be used by processes to better utilize
system resources.

The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems
where the requested configuration is written.  However, it doesn't mean
that the requested configuration will be actually used as parent cgroup
may limit the cpus or mems as well.  In order to reflect the real
configuration cgroup v2 provides read-only files cpuset.cpus.effective
and cpuset.mems.effective which are exported to users as well.
2019-09-24 15:16:07 +02:00
Frantisek Sumsal
38288f0bb8 tree-wide: various code-formatting improvements
Reported/found by Coccinelle
2019-09-22 07:17:27 +02:00
Zbigniew Jędrzejewski-Szmek
fdb3decaa7 util-lib: move some functions from basic/cgroup-util to shared/cgroup-setup
This way less stuff needs to be in basic. Initially, I wanted to move all the
parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because
efivars.[ch] is in shared/. Later on, I decide to split efivars.[ch], so the
move done in this patch is not necessary anymore. Nevertheless, it is still
valid on its own. If at some point we want to expose libbasic, it is better to
to not have stuff that belong in libshared there.
2019-09-16 18:08:00 +02:00
Zbigniew Jędrzejewski-Szmek
d4d99bc6e4 basic/cgroup-util: let cgroup_unified_flush() return the detected hierarchy
This avoid the use of the global variable.

Also rename cgroup_unified_update() to cgroup_unified_cached() and
cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
2019-09-16 18:06:20 +02:00
Topi Miettinen
cda5ccdb34 cgroup-util: update comment to reflect stable kernel fixes 2019-08-19 09:46:50 +02:00
Zbigniew Jędrzejewski-Szmek
95b21cff0e Apply empty_to_root() in three more spots for safety 2019-07-15 18:39:26 +02:00
Lennart Poettering
bf21be1050 util-lib: fix comment
As suggested by @ralt.

Fixes: #12896
2019-07-12 09:37:49 +02:00
Lennart Poettering
b910cc72c0 tree-wide: get rid of strappend()
It's a special case of strjoin(), so no need to keep both. In particular
as typing strjoin() is even shoert than strappend().
2019-07-12 14:31:12 +09:00