Until now, using any form of seccomp while being unprivileged (User=)
resulted in systemd enabling no_new_privs.
There's no need for doing this because:
* We trust the filters we apply
* If User= is set and a process wants to apply a new seccomp filter, it
will need to set no_new_privs itself
An example of application that might want seccomp + !no_new_privs is a
program that wants to run as an unprivileged user but uses file
capabilities to start a web server on a privileged port while
benefitting from a restrictive seccomp profile.
We now keep the privileges needed to do seccomp before calling
enforce_user() and drop them after the seccomp filters are applied.
If the syscall filter doesn't allow the needed syscalls to drop the
privileges, we keep the previous behavior by enabling no_new_privs.
We should be more careful with distinguishing the cases "all bits set in
caps mask" from "cap mask invalid". We so far mostly used UINT64_MAX for
both, which is not correct though (as it would mean
AmbientCapabilities=~0 followed by AmbientCapabilities=0) would result
in capability 63 to be set (which we don't really allow, since that
means unset).
If the cleanup function returns the appropriate type, use that to reset the
variable. For other functions (usually the foreign ones which return void), add
an explicit value to reset to.
This causes a bit of code churn, but I think it might be worth it. In a
following patch static destructors will be called from a fuzzer, and this
change allows them to be called multiple times. But I think such a change might
help with detecting unitialized code reuse too. We hit various bugs like this,
and things are more obvious when a pointer has been set to NULL.
I was worried whether this change increases text size, but it doesn't seem to:
-Dbuildtype=debug:
before "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494520 Feb 16 15:06 build/systemd*
after "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494576 Feb 16 15:10 build/systemd*
now:
-rwxrwxr-x 1 zbyszek zbyszek 4117672 Feb 16 14:36 build/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 4494640 Feb 16 15:15 build/systemd*
-Dbuildtype=release:
before "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:09 build-rawhide/systemd*
after "tree-wide: return NULL from freeing functions":
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:10 build-rawhide/systemd*
now:
-rwxrwxr-x 1 zbyszek zbyszek 5252256 Feb 14 14:47 build-rawhide/libsystemd.so.0.30.0*
-rwxrwxr-x 1 zbyszek zbyszek 1834184 Feb 16 15:16 build-rawhide/systemd*
I would expect that the compiler would be able to elide the setting of a
variable if the variable is never used again. And this seems to be the case:
in optimized builds there is no change in size whatsoever. And the change in
size in unoptimized build is negligible.
Something strange is happening with size of libsystemd: it's bigger in
optimized builds. Something to figure out, but unrelated to this patch.
Up to now the capability CAP_SETPCAP was raised implicitly in the
function capability_bounding_set_drop.
This functionality is moved into a new function
(capability_gain_cap_setpcap).
The new function optionally provides the capability set as it was
before raisining CAP_SETPCAP.
We never return anything higher than 63, so using "long unsigned"
as the type only confused the reader. (We can still use "long unsigned"
and safe_atolu() to parse the kernel file.)
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).
Fixes#12539
linux/capability.h's CAP_TO_MASK potentially shifts a signed int "1"
(i.e. 32bit wide) left by 31 which means it becomes negative. That's
just weird, and ubsan complains about it. Let's introduce our own macro
CAP_TO_MASK_CORRECTED which doesn't fall into this trap, and make use of
it.
Fixes: #10347
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
This way we don't need to repeat the argument twice.
I didn't replace all instances. I think it's better to leave out:
- asserts
- comparisons like x & y == x, which are mathematically equivalent, but
here we aren't checking if flags are set, but if the argument fits in the
flags.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.
You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.
An example system service file might look like this:
[Unit]
Description=Service for testing caps
[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW
After starting the service it has these capabilities:
CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.