Rather than passing seeds up to userspace via EFI variables, pass seeds
directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID.
EFI variables can potentially leak and suffer from forward secrecy
issues, and processing these with userspace means that they are
initialized much too late in boot to be useful. In contrast,
LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so
is hidden from userspace entirely, and is parsed extremely early on by
the kernel, so that every single call to get_random_bytes() by the
kernel is seeded.
In order to do this properly, we use a bit more robust hashing scheme,
and make sure that each input is properly memzeroed out after use. The
scheme is:
key = HASH(LABEL || sizeof(input1) || input1 || ... || sizeof(inputN) || inputN)
new_disk_seed = HASH(key || 0)
seed_for_linux = HASH(key || 1)
The various inputs are:
- LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
- 256 bits of seed from EFI's RNG
- The (immutable) system token, from its EFI variable
- The prior on-disk seed
- The UEFI monotonic counter
- A timestamp
This also adjusts the secure boot semantics, so that the operation is
only aborted if it's not possible to get random bytes from EFI's RNG or
a prior boot stage. With the proper hashing scheme, this should make
boot seeds safe even on secure boot.
There is currently a bug in Linux's EFI stub in which if the EFI stub
manages to generate random bytes on its own using EFI's RNG, it will
ignore what the bootloader passes. That's annoying, but it means that
either way, via systemd-boot or via EFI stub's mechanism, the RNG *does*
get initialized in a good safe way. And this bug is now fixed in the
efi.git tree, and will hopefully be backported to older kernels.
As the kernel recommends, the resultant seeds are 256 bits and are
allocated using pool memory of type EfiACPIReclaimMemory, so that it
gets freed at the right moment in boot.
We currently have a convoluted and complex selection of which random
numbers to use. We can simplify this down to two functions that cover
all of our use cases:
1) Randomness for crypto: this one needs to wait until the RNG is
initialized. So it uses getrandom(0). If that's not available, it
polls on /dev/random, and then reads from /dev/urandom. This function
returns whether or not it was successful, as before.
2) Randomness for other things: this one uses getrandom(GRND_INSECURE).
If it's not available it uses getrandom(GRND_NONBLOCK). And if that
would block, then it falls back to /dev/urandom. And if /dev/urandom
isn't available, it uses the fallback code. It never fails and
doesn't return a value.
These two cases match all the uses of randomness inside of systemd.
I would prefer to make both of these return void, and get rid of the
fallback code, and simply assert in the incredibly unlikely case that
/dev/urandom doesn't exist. But Luca disagrees, so this commit attempts
to instead keep case (1) returning a return value, which all the callers
already check, and fix the fallback code in (2) to be less bad than
before.
For the less bad fallback code for (2), we now use auxval and some
timestamps, together with various counters representing the invocation,
hash it all together and provide the output. Provided that AT_RANDOM is
secure, this construction is probably okay too, though notably it
doesn't have any forward secrecy. Fortunately, it's only used by
random_bytes() and not by crypto_random_bytes().
The actual minimum size of the pool across supported kernel versions is
32 bytes. So adjust this minimum.
I've audited every single usage of random_pool_size(), and cannot see
anywhere that this would have any impact at all on anything. We could
actually just not change the constant and everything would be fine, or
we could change it here and that's fine too. From both a functionality
and crypto perspective, it doesn't really seem to make a substantive
difference any which way, so long as the value is ≥32. However, it's
better to be correct and have the function do what it says, so clamp it
to the right minimum.
/dev/urandom is seeded with RDRAND. Calling genuine_random_bytes(...,
..., 0) will use /dev/urandom as a last resort. Hence, we gain nothing
here by having our own RDRAND wrapper, because /dev/urandom already is
based on RDRAND output, even before /dev/urandom has fully initialized.
Furthermore, RDRAND is not actually fast! And on each successive
generation of new x86 CPUs, from both AMD and Intel, it just gets
slower.
This commit simplifies things by just using /dev/urandom in cases where
we before might use RDRAND, since /dev/urandom will always have RDRAND
mixed in as part of it.
And above where I say "/dev/urandom", what I actually mean is
GRND_INSECURE, which is the same thing but won't generate warnings in
dmesg.
RANDOM_BLOCK has existed for a long time, but RANDOM_ALLOW_INSECURE was
added more recently, leading to an awkward relationship between the two.
It turns out that only one, RANDOM_BLOCK, is needed.
RANDOM_BLOCK means return cryptographically secure numbers no matter
what. If it's not set, it means try to do that, but if it fails, fall
back to using unseeded randomness.
This part of falling back to unseeded randomness is the intent of
GRND_INSECURE, which is what RANDOM_ALLOW_INSECURE previously aliased.
Rather than having an additional flag for that, it makes more sense to
just use it whenever RANDOM_BLOCK is not set. This saves us the overhead
of having to open up /dev/urandom.
Additionally, when getrandom returns too little data, but not zero data,
we currently fall back to using /dev/urandom if RANDOM_BLOCK is not set.
This doesn't quite make sense, because if getrandom returned seeded data
once, then it will forever after return the same thing as whatever
/dev/urandom does. So in that case, we should just loop again.
Since there's never really a time where /dev/urandom is able to return
some easily but more with difficulty, we can also get rid of
RANDOM_EXTEND_WITH_PSEUDO. Once the RNG is initialized, bytes
should just flow normally.
This also makes RANDOM_MAY_FAIL obsolete, because the only case this ran
was where we'd fall back to /dev/urandom on old kernels and return
GRND_INSECURE bytes on new kernels. So also get rid of that flag.
Finally, since we're always able to use GRND_INSECURE on newer kernels,
and we only fall back to /dev/urandom on older kernels, also only fall
back to using RDRAND on those older kernels. There, the only reason to
have RDRAND is to avoid a kmsg entry about unseeded randomness.
The result of this commit is that we now cascade like this:
- Use getrandom(0) if RANDOM_BLOCK.
- Use getrandom(GRND_INSECURE) if !RANDOM_BLOCK.
- Use /dev/urandom if !RANDOM_BLOCK and no GRND_INSECURE support.
- Use /dev/urandom if no getrandom() support.
- Use RDRAND if we would use /dev/urandom for any of the above reasons
and RANDOM_ALLOW_RDRAND is set.
kernel 5.6 added support for a new flag for getrandom(): GRND_INSECURE.
If we set it we can get some random data out of the kernel random pool,
even if it is not yet initializated. This is great for us to initialize
hash table seeds and such, where it is OK if they are crap initially. We
used RDRAND for these cases so far, but RDRAND is only available on
newer CPUs and some archs. Let's now use GRND_INSECURE for these cases
as well, which means we won't needlessly delay boot anymore even on
archs/CPUs that do not have RDRAND.
Of course we never set this flag when generating crypto keys or uuids.
Which makes it different from RDRAND for us (and is the reason I think
we should keep explicit RDRAND support in): RDRAND we don't trust enough
for crypto keys. But we do trust it enough for UUIDs.
The old flag name was a bit of a misnomer, as /dev/urandom cannot be
"drained". Once it's initialized it's initialized and then is good
forever. (Only /dev/random has a concept of 'draining', but we never use
that, as it's an obsolete interface).
The flag is still useful though, since it allows us to suppress accesses
to the random pool while it is not initialized, as that trips up the
kernel and it logs about any such attempts, which we really don't want.
Rename rdrand64 to rdrand, and switch from uint64_t to unsigned long.
This produces code that will compile/assemble on both x86-64 and x86-32.
This could be useful when running a 32-bit copy of systemd on a modern
Intel processor.
RDRAND is inherently arch-specific, so relying on the compiler-defined
'long' type seems reasonable.
We only use this when we don't require the best randomness. The primary
usecase for this is UUID generation, as this means we don't drain
randomness from the kernel pool for them. Since UUIDs are usually not
secrets RDRAND should be goot enough for them to avoid real-life
collisions.
Originally, the high_quality_required boolean argument controlled two
things: whether to extend any random data we successfully read with
pseudo-random data, and whether to return -ENODATA if we couldn't read
any data at all.
The boolean got replaced by RANDOM_EXTEND_WITH_PSEUDO, but this name
doesn't really cover the second part nicely. Moreover hiding both
changes of behaviour under a single flag is confusing. Hence, let's
split this part off under a new flag, and use it from random_bytes().
It's more descriptive, since we also have a function random_bytes()
which sounds very similar.
Also rename pseudorandom_bytes() to pseudo_random_bytes(). This way the
two functions are nicely systematic, one returning genuine random bytes
and the other pseudo random ones.
Pretty much all intel cpus have had RDRAND in a long time. While
CPU-internal RNG are widely not trusted, for seeding hash tables it's
perfectly OK to use: we don't high quality entropy in that case, hence
let's use it.
This is only hooked up with 'high_quality_required' is false. If we
require high quality entropy the kernel is the only source we should
use.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
During early boot, we'd call getrandom(), and immediately fall back to
reading from /dev/urandom unless we got the full requested number of bytes.
Those two sources are the same, so the most likely result is /dev/urandom
producing some pseudorandom numbers for us, complaining widely on the way.
Let's change our behaviour to be more conservative:
- if the numbers are only used to initialize a hash table, a short read is OK,
we don't really care if we get the first part of the seed truly random and
then some pseudorandom bytes. So just do that and return "success".
- if getrandom() returns -EAGAIN, fall back to rand() instead of querying
/dev/urandom again.
The idea with those two changes is to avoid generating a warning about
reading from an /dev/urandom when the kernel doesn't have enough entropy.
- only in the cases where we really need to make the best effort possible
(sd_id128_randomize and firstboot password hashing), fall back to
/dev/urandom.
When calling getrandom(), drop the checks whether the argument fits in an int —
getrandom() should do that for us already, and we call it with small arguments
only anyway.
Note that this does not really change the (relatively high) number of random
bytes we request from the kernel. On my laptop, during boot, PID 1 and all
other processes using this code through libsystemd request:
74780 bytes with high_quality_required == false
464 bytes with high_quality_required == true
and it does not eliminate reads from /dev/urandom completely. If the kernel was
short on entropy and getrandom() would fail, we would fall back to /dev/urandom
for those 464 bytes.
When falling back to /dev/urandom, don't lose the short read we already got,
and just read the remaining bytes.
If getrandom() syscall is not available, we fall back to /dev/urandom same
as before.
Fixes#4167 (possibly partially, let's see).
The only implementation that we care about — glibc — provides us
with 31 bits of entropy. Let's use 24 bits of that, instead of throwing
all but 8 away.