mirror of
https://github.com/morgan9e/systemd
synced 2026-04-14 00:14:32 +09:00
man/systemd.exec: reword description of SystemCallFilter=
The existing text grew organically as features were added and was not very organized. Reorder it and break into paragraphs grouped by topic. The description of the :errno syntax is replaced by a short reference to the SystemCallErrorNumber= setting. This makes the text shorter and makes it easier to explain how the two settings combine.
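As an illustration of how the two settings combine under the reworded description, a unit might use them roughly as sketched below. This fragment is hypothetical and not part of the commit: the ExecStart= path is a placeholder, and the choice of @mount, ptrace and the error names is an assumption made only for the example.

```ini
[Service]
# Hypothetical service binary, used only for illustration.
ExecStart=/usr/bin/example-daemon
# Recommended alongside SystemCallFilter= so alternative ABIs cannot bypass the filter.
SystemCallArchitectures=native
# Deny-list: system calls in the @mount group are denied with the default action.
SystemCallFilter=~@mount
# Per-entry action: ptrace() fails with EACCES, taking precedence over SystemCallErrorNumber=.
SystemCallFilter=~ptrace:EACCES
# Action for the other denied calls: return EPERM instead of terminating with SIGSYS.
SystemCallErrorNumber=EPERM
```

Here the suffix on ptrace applies only to that entry, while SystemCallErrorNumber= supplies the action for every other denied call.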
@@ -2589,40 +2589,42 @@ RestrictNamespaces=~cgroup net</programlisting>
 <varlistentry>
 <term><varname>SystemCallFilter=</varname></term>

-<listitem><para>Takes a space-separated list of system call names. If this setting is used, all
-system calls executed by the unit processes except for the listed ones will result in immediate
-process termination with the <constant>SIGSYS</constant> signal (allow-listing). (See
-<varname>SystemCallErrorNumber=</varname> below for changing the default action). If the first
-character of the list is <literal>~</literal>, the effect is inverted: only the listed system calls
-will result in immediate process termination (deny-listing). Deny-listed system calls and system call
-groups may optionally be suffixed with a colon (<literal>:</literal>) and <literal>errno</literal>
-error number (between 0 and 4095) or errno name such as <constant>EPERM</constant>,
-<constant>EACCES</constant> or <constant>EUCLEAN</constant> (see <citerefentry
-project='man-pages'><refentrytitle>errno</refentrytitle><manvolnum>3</manvolnum></citerefentry> for a
-full list). This value will be returned when a deny-listed system call is triggered, instead of
-terminating the processes immediately. Special setting <literal>kill</literal> can be used to
-explicitly specify killing. This value takes precedence over the one given in
-<varname>SystemCallErrorNumber=</varname>, see below. This feature makes use of the Secure Computing Mode 2
-interfaces of the kernel ('seccomp filtering') and is useful for enforcing a minimal sandboxing environment.
-Note that the <function>execve()</function>, <function>exit()</function>, <function>exit_group()</function>,
-<function>getrlimit()</function>, <function>rt_sigreturn()</function>, <function>sigreturn()</function>
-system calls and the system calls for querying time and sleeping are implicitly allow-listed and do not
-need to be listed explicitly. This option may be specified more than once, in which case the filter masks are
+<listitem><para>Takes a space-separated list of system call names or system call groups. If this
+setting is used, system calls executed by the unit processes except for the listed ones will result
+in the system call being denied (allow-listing). If the first character of the list is
+<literal>~</literal>, the effect is inverted: only the listed system calls will be denied
+(deny-listing). This option may be specified more than once, in which case the filter masks are
 merged. If the empty string is assigned, the filter is reset, all prior assignments will have no
-effect. This does not affect commands prefixed with <literal>+</literal>.</para>
+effect.</para>

-<para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn off
-alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
+<para>Commands prefixed with <literal>+</literal> are not subject to filtering. The
+<function>execve()</function>, <function>exit()</function>, <function>exit_group()</function>,
+<function>getrlimit()</function>, <function>rt_sigreturn()</function>,
+<function>sigreturn()</function> system calls and the system calls for querying time and sleeping are
+implicitly allow-listed and do not need to be listed explicitly.</para>
+
+<para>The default action when a system call is denied is to terminate the processes with a
+<constant>SIGSYS</constant> signal. This can be changed using <varname>SystemCallErrorNumber=</varname>,
+see below. In addition, deny-listed system calls and system call groups may optionally be suffixed
+with a colon (<literal>:</literal>) and an argument in the same format as
+<varname>SystemCallErrorNumber=</varname>, to take this action when the matching system call is made.
+This takes precedence over the action specified in <varname>SystemCallErrorNumber=</varname>.</para>
+
+<para>This feature makes use of the Secure Computing Mode 2 interfaces of the kernel ('seccomp
+filtering') and is useful for enforcing a minimal sandboxing environment.</para>
+
+<para>Note that on systems supporting multiple ABIs (such as x86/x86-64) it is recommended to turn
+off alternative ABIs for services, so that they cannot be used to circumvent the restrictions of this
 option. Specifically, it is recommended to combine this option with
 <varname>SystemCallArchitectures=native</varname> or similar.</para>

-<para>Note that strict system call filters may impact execution and error handling code paths of the service
-invocation. Specifically, access to the <function>execve()</function> system call is required for the execution
-of the service binary — if it is blocked service invocation will necessarily fail. Also, if execution of the
-service binary fails for some reason (for example: missing service executable), the error handling logic might
-require access to an additional set of system calls in order to process and log this failure correctly. It
-might be necessary to temporarily disable system call filters in order to simplify debugging of such
-failures.</para>
+<para>Note that strict system call filters may impact execution and error handling code paths of the
+service invocation. Specifically, access to the <function>execve()</function> system call is required
+for the execution of the service binary — if it is blocked service invocation will necessarily fail.
+Also, if execution of the service binary fails for some reason (for example: missing service
+executable), the error handling logic might require access to an additional set of system calls in
+order to process and log this failure correctly. It might be necessary to temporarily disable system
+call filters in order to allow debugging of such failures.</para>

 <para>If you specify both types of this option (i.e. allow-listing and deny-listing), the first
 encountered will take precedence and will dictate the default action (termination or approval of a
@@ -2632,8 +2634,8 @@ RestrictNamespaces=~cgroup net</programlisting>
 <function>write()</function>, and right after it add a deny list rule for <function>write()</function>,
 then <function>write()</function> will be removed from the set.)</para>

-<para>As the number of possible system calls is large, predefined sets of system calls are provided. A set
-starts with <literal>@</literal> character, followed by name of the set.
+<para>As the number of possible system calls is large, predefined groups of system calls are
+provided. A group starts with <literal>@</literal> character, followed by name of the set.

 <table>
 <title>Currently predefined system call sets</title>