man/systemd.exec: reword description of SystemCallFilter=

The existing text grew organically as features were added and was
not very organized. Reorder it and break into paragraphs grouped
by topic. The description of the :errno syntax is replaced by a short
reference to the SystemCallErrorNumber= setting. This makes the
text shorter and makes it easier to explain how the two settings combine.
Zbigniew Jędrzejewski-Szmek
2025-05-06 21:04:00 +02:00
parent e4a08721c3
commit 802d23fcfb

@@ -2589,40 +2589,42 @@ RestrictNamespaces=~cgroup net</programlisting>
<varlistentry>
<term><varname>SystemCallFilter=</varname></term>
<listitem><para>Takes a space-separated list of system call names. If this setting is used, all
system calls executed by the unit processes except for the listed ones will result in immediate
process termination with the <constant>SIGSYS</constant> signal (allow-listing). (See
<varname>SystemCallErrorNumber=</varname> below for changing the default action). If the first
character of the list is <literal>~</literal>, the effect is inverted: only the listed system calls
will result in immediate process termination (deny-listing). Deny-listed system calls and system call
groups may optionally be suffixed with a colon (<literal>:</literal>) and <literal>errno</literal>
error number (between 0 and 4095) or errno name such as <constant>EPERM</constant>,
<constant>EACCES</constant> or <constant>EUCLEAN</constant> (see <citerefentry
project='man-pages'><refentrytitle>errno</refentrytitle><manvolnum>3</manvolnum></citerefentry> for a
full list). This value will be returned when a deny-listed system call is triggered, instead of
terminating the processes immediately. Special setting <literal>kill</literal> can be used to
explicitly specify killing. This value takes precedence over the one given in
<varname>SystemCallErrorNumber=</varname>, see below. This feature makes use of the Secure Computing Mode 2
interfaces of the kernel ('seccomp filtering') and is useful for enforcing a minimal sandboxing environment.
Note that the <function>execve()</function>, <function>exit()</function>, <function>exit_group()</function>,
<function>getrlimit()</function>, <function>rt_sigreturn()</function>, <function>sigreturn()</function>
system calls and the system calls for querying time and sleeping are implicitly allow-listed and do not
need to be listed explicitly. This option may be specified more than once, in which case the filter masks are
<listitem><para>Takes a space-separated list of system call names or system call groups. If this
setting is used, system calls executed by the unit processes except for the listed ones will result
in the system call being denied (allow-listing). If the first character of the list is
<literal>~</literal>, the effect is inverted: only the listed system calls will be denied
(deny-listing). This option may be specified more than once, in which case the filter masks are
merged. If the empty string is assigned, the filter is reset, all prior assignments will have no
effect. This does not affect commands prefixed with <literal>+</literal>.</para>
effect.</para>
<para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off
alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
<para>Commands prefixed with <literal>+</literal> are not subject to filtering. The
<function>execve()</function>, <function>exit()</function>, <function>exit_group()</function>,
<function>getrlimit()</function>, <function>rt_sigreturn()</function>,
<function>sigreturn()</function> system calls and the system calls for querying time and sleeping are
implicitly allow-listed and do not need to be listed explicitly.</para>
<para>The default action when a system call is denied is to terminate the processes with a
<constant>SIGSYS</constant> signal. This can be changed using <varname>SystemCallErrorNumber=</varname>,
see below. In addition, deny-listed system calls and system call groups may optionally be suffixed
with a colon (<literal>:</literal>) and an argument in the same format as
<varname>SystemCallErrorNumber=</varname>, to take this action when the matching system call is made.
This takes precedence over the action specified in <varname>SystemCallErrorNumber=</varname>.</para>
<para>This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp
filtering') and is useful for enforcing a minimal sandboxing environment.</para>
<para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn
off alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
option. Specifically, it is recommended to combine this option with
<varname>SystemCallArchitectures=native</varname> or similar.</para>
<para>Note that strict system call filters may impact execution and error handling code paths of the service
invocation. Specifically, access to the <function>execve()</function> system call is required for the execution
of the service binary — if it is blocked service invocation will necessarily fail. Also, if execution of the
service binary fails for some reason (for example: missing service executable), the error handling logic might
require access to an additional set of system calls in order to process and log this failure correctly. It
might be necessary to temporarily disable system call filters in order to simplify debugging of such
failures.</para>
<para>Note that strict system call filters may impact execution and error handling code paths of the
service invocation. Specifically, access to the <function>execve()</function> system call is required
for the execution of the service binary — if it is blocked service invocation will necessarily fail.
Also, if execution of the service binary fails for some reason (for example: missing service
executable), the error handling logic might require access to an additional set of system calls in
order to process and log this failure correctly. It might be necessary to temporarily disable system
call filters in order to allow debugging of such failures.</para>
<para>If you specify both types of this option (i.e. allow-listing and deny-listing), the first
encountered will take precedence and will dictate the default action (termination or approval of a
@@ -2632,8 +2634,8 @@ RestrictNamespaces=~cgroup net</programlisting>
<function>write()</function>, and right after it add a deny list rule for <function>write()</function>,
then <function>write()</function> will be removed from the set.)</para>
<para>As the number of possible system calls is large, predefined sets of system calls are provided. A set
starts with <literal>@</literal> character, followed by name of the set.
<para>As the number of possible system calls is large, predefined groups of system calls are
provided. A group starts with the <literal>@</literal> character, followed by the name of the group.
<table>
<title>Currently predefined system call sets</title>