man: explain coredump handling in context of containers better

We have two different mechanisms; let's discuss them explicitly,
comparing their effects and intended use cases.
Lennart Poettering
2025-04-16 15:32:45 +02:00
committed by Daan De Meyer
parent 505492d61c
commit 80653ba925
2 changed files with 67 additions and 13 deletions


@@ -112,13 +112,30 @@
<varlistentry>
<term><varname>EnterNamespace=</varname></term>
<listitem><para>Controls whether <command>systemd-coredump</command> will attempt to use the mount tree of
a process that crashed in PID namespace. Access to the namespace's mount tree might be necessary to generate
a fully symbolized backtrace. If set to <literal>yes</literal>, then <command>systemd-coredump</command> will
obtain the mount tree from corresponding mount namespace and will try to generate the stack trace using the
binary and libraries from the mount namespace. Note that the coredump of the namespaced process might
still be saved in <filename>/var/lib/systemd/coredump/</filename> even if <varname>EnterNamespace=</varname>
is set to <literal>no</literal>. Defaults to <literal>no</literal>.</para>
<listitem><para>For processes belonging to a PID namespace, controls whether
<command>systemd-coredump</command> shall attempt to process core dumps on the host, using debug
information from the file system hierarchy (i.e. the mount namespace) of the process that
crashed. Access to the process' file system hierarchy might be necessary to generate a fully
symbolized backtrace. If set to <literal>yes</literal>, <command>systemd-coredump</command> will
obtain the tree of mounts from the crashing process' mount namespace and will try to generate the stack
trace in host context using the debug information of binaries and libraries contained in the crashing
process' hierarchy. Defaults to <literal>no</literal>, i.e. no attempt is made to acquire external
debug information from the process' mount namespace, in order to maximize security. This option has
no effect on processes that are part of the host's PID namespace.</para>
<para>Note that the coredump of the namespaced process is still saved in
<filename>/var/lib/systemd/coredump/</filename> on the host even if
<varname>EnterNamespace=</varname> is set to <literal>no</literal> (subject to
<varname>Storage=</varname>).</para>
<para>Note that <varname>EnterNamespace=</varname> only has an effect if a core dump is generated by
a container whose unit does not have <varname>CoredumpReceive=</varname> enabled.</para>
<para>Note that it's typically preferable to let containers and other namespace-based sandboxes
process their own coredumps, if possible, for best security. This may be enabled on the container's
unit via the <varname>CoredumpReceive=</varname> setting, see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for details.</para>
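<para>A minimal sketch of opting in to host-side processing, assuming a drop-in with the hypothetical
file name <filename>/etc/systemd/coredump.conf.d/50-enter-namespace.conf</filename>; only the
<varname>EnterNamespace=</varname> setting itself is taken from the description above:</para>
<programlisting># Hypothetical drop-in file name, chosen for illustration only
[Coredump]
# Let systemd-coredump on the host use debug information from the crashing
# process' mount namespace (off by default, in order to maximize security)
EnterNamespace=yes</programlisting>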
<xi:include href="version-info.xml" xpointer="v257"/>
</listitem>


@@ -39,11 +39,11 @@
stack trace if possible. It may also save the core dump for later processing. See the "Information about
the crashed process" section below.</para>
<para>The behavior of a specific program upon reception of a signal is governed by a few
factors which are described in detail in
<citerefentry project='man-pages'><refentrytitle>core</refentrytitle><manvolnum>5</manvolnum></citerefentry>.
In particular, the core dump will only be processed when the related resource limits are sufficient.
</para>
<para>The behavior of a specific program upon reception of a signal is governed by a few factors which
are described in detail in <citerefentry
project='man-pages'><refentrytitle>core</refentrytitle><manvolnum>5</manvolnum></citerefentry>. In
particular, the core dump will only be processed when the related process resource limits
(<constant>RLIMIT_CORE</constant>) are sufficient.</para>
<para>Core dumps can be written to the journal or saved as a file. In both cases, they can be retrieved
for further processing, for example in
@@ -53,7 +53,7 @@
<para>By default, <command>systemd-coredump</command> will log the core dump to the journal, including a
backtrace if possible, and store the core dump (an image of the memory contents of the process) itself in
an external file in <filename>/var/lib/systemd/coredump</filename>. These core dumps are deleted after a
an external file in <filename>/var/lib/systemd/coredump/</filename>. These core dumps are deleted after a
few days by default; see <filename>/usr/lib/tmpfiles.d/systemd.conf</filename> for details. Note that the
removal of core files from the file system and the purging of journal entries are independent, and the
core file may be present without the journal entry, and journal entries may point to since-removed core
@@ -88,6 +88,43 @@
metadata fields in the same way it does for core dumps received from the kernel. In this mode, no core
dump is stored in the journal.</para>
</refsect2>
<refsect2>
<title>Core dumps in Containers/Namespaces</title>
<para>The <filename>systemd-coredump@.service</filename> service will automatically attempt to extract
a stack trace from a process as it crashes. For this stack trace, symbols are resolved based on debug
information embedded in the crashing ELF image, or equivalent debug information separately available on
the host OS. For processes that crash inside of local containers or other mount namespace-based
sandboxes, this auxiliary debug information is typically not available on the host (simply because
containers typically run different software versions than the
host). <filename>systemd-coredump</filename> provides two mechanisms to address this:</para>
<orderedlist>
<listitem><para>For full-OS containers running systemd inside, it is a good idea to enable
<varname>CoredumpReceive=</varname> on the unit (see
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>),
which causes an attempt to be made to forward core dumps of the container to
<filename>systemd-coredump@.service</filename> running inside the container, i.e. the container gets
to process and store its own core dumps. Note that
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>8</manvolnum></citerefentry>
defaults to this mode if invoked with the <option>--boot</option> switch. This mode of operation is
generally recommended for security reasons: the security-sensitive processing of the core dump is
done within the confines of the container itself, by the container's own code, backed by the
container's own storage.</para></listitem>
<listitem><para>Alternatively, for more restricted containers (that do not run a proper
<filename>init</filename> system as PID 1) it is possible to enable processing of the core dump on
the host, with access to the debug information from the container itself. This mode of operation
must be enabled via <varname>EnterNamespace=</varname> in
<citerefentry><refentrytitle>coredump.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry>,
and defaults to off, for security reasons.</para></listitem>
</orderedlist>
<para>If <varname>CoredumpReceive=</varname> is enabled on the unit of the container the core dump
belongs to and <varname>EnterNamespace=</varname> is also enabled in the <filename>coredump.conf</filename>
configuration file, the former takes precedence.</para>
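<para>As a rough illustration of the first mechanism, a drop-in for a container's service unit might
look like the following. The unit name <filename>mycontainer.service</filename> and the drop-in file
name are hypothetical, and <varname>Delegate=yes</varname> is included on the assumption that
<varname>CoredumpReceive=</varname> requires cgroup delegation; consult
<citerefentry><refentrytitle>systemd.resource-control</refentrytitle><manvolnum>5</manvolnum></citerefentry>
for the authoritative requirements:</para>
<programlisting># /etc/systemd/system/mycontainer.service.d/50-coredump.conf (hypothetical path)
[Service]
# Assumed prerequisite: delegate the cgroup subtree to the container
Delegate=yes
# Ask for core dumps of the container to be forwarded to the
# systemd-coredump@.service instance running inside the container
CoredumpReceive=yes</programlisting>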
</refsect2>
</refsect1>
<refsect1>