This ensures that shell string escape operations will not produce output
with invalid UTF-8 from the input by escaping invalid UTF-8 data as if
they were single byte characters.
cunescape() sets output on success, so initialization is not necessary. There
was no comment, but I think they may have been added because the compiler
wasn't convinced that the return value is non-negative on success. It could
have been confused by the int return type on escape*(), which was changed by
the one of preceeding commits to ssize_t, or by the length calculation, so add
an assert to help the compiler.
For some reason coverity thinks the output can be leaked here (CID #1458111).
I don't see how.
So far we would append "…" or "..." when the string was wider than the specified
output width. But let's add a mode where the caller knows that the string being
passed is already truncated.
The condition for jumping back in utf8_escape_non_printable_full() was
off-by-one. But we only jumped to that label after doing a check with a
stronger condition, so I think it didn't matter. Now it matters because we'd
output the forced ellipsis one column too early.
The comment in the code said that so far this didn't matter, but I want to use
shell quoting in more places where this will make a difference. So control
characters are now escaped. Normal utf-8 characters are passed through, it
is 2021 after all and pretty much everyone is (or should be) using utf-8.
While touching the code, change 'char *r' → 'char *buf', in line with modern
style.
shell_escape() is mostly used for mount paths and similar, where we assume
no newlines are present in the string. But if any were ever present, we
should escape them. So let's simplify the code by making this unconditional.
I want to tweak behaviour further, and that'll be easier when "style"
is converted to a bitfield.
Some callers used ESCAPE_BACKSLASH_ONELINE, and others not. But the
ones that didn't, simply didn't care, because the argument was assumed to
be one-line anyway (e.g. a service name). In environment-generator, this
could make a difference. But I think it's better to escape the newlines
there too. So newlines are now always escaped, to simplify the code and
the test matrix.
I think it's nicer to move it to the left, since the function
is already a pointer by itself, and it just happens to return a pointer,
and the two concepts are completely separate.
This restores show_pid_array() output in legacy locales on the console.
Only one call to get_process_cmdline() is changed, all others retain
utf8-only mode. This affects systemd-cgls, systemctl status, etc, when
working locally.
Calls to get_process_cmdline() that cross a process boundary always use
utf8. It's the callers responsibility to convert this to some encoding that
they use. This means that we always pass utf8 over the bus.
The function takes a pointer to a random block of memory and
the length of that block. It shouldn't crash every time it sees
a zero byte at the beginning there.
This should help the dev-kmsg fuzzer to keep going.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()' return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
Also called "ANSI-C Quoting" in info:(bash) ANSI-C Quoting.
The escaping rules are a POSIX proposal, and are described in
http://austingroupbugs.net/view.php?id=249. There's a lot of back-and-forth on
the details of escaping of control characters, but we'll be only using a small
subset of the syntax that is common to all proposals and is widely supported.
Unfortunately dash and fish and maybe some other shells do not support it (see
the man page patch for a list).
This allows environment variables to be safely exported using show-environment
and imported into the shell. Shells which do not support this syntax will have
to do something like
export $(systemctl show-environment|grep -v '=\$')
or whatever is appropriate in their case. I think csh and fish do not support
the A=B syntax anyway, so the change is moot for them.
Fixes#5536.
v2:
- also escape newlines (which currently disallowed in shell values, so this
doesn't really matter), and tabs (as $'\t'), and ! (as $'!'). This way quoted
output can be included directly in both interactive and noninteractive bash.
rework C11 utf8.[ch] to use char32_t instead of uint32_t when referring
to unicode chars, to make things more expressive.
[
@zonque:
* rebased to current master
* use AC_CHECK_DECLS to detect availibility of char{16,32}_t
* make utf8_encoded_to_unichar() return int
]