git.baikalelectronics.ru Git

kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS

include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
as a placeholder.

Genksyms writes the version CRCs into the linker script, which will be
used for filling the __crc_* symbols. The linker script format depends
on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
to the reference of CRC.

It is time to get rid of this complexity.

Now that modpost parses text files (.*.cmd) to collect all the CRCs,
it can generate C code that will be linked to the vmlinux or modules.

Generate a new C file, .vmlinux.export.c, which contains the CRCs of
symbols exported by vmlinux. It is compiled and linked to vmlinux in
scripts/link-vmlinux.sh.

Put the CRCs of symbols exported by modules into the existing *.mod.c
files. No additional build step is needed for modules. As before,
*.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.

No linker magic is used here. The new C implementation works in the
same way, whether CONFIG_RELOCATABLE is enabled or not.
CONFIG_MODULE_REL_CRCS is no longer needed.

Previously, Kbuild invoked additional $(LD) to update the CRCs in
objects, but this step is unneeded too.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)

modpost: extract symbol versions from *.cmd files

Currently, CONFIG_MODVERSIONS needs extra link to embed the symbol
versions into ELF objects. Then, modpost extracts the version CRCs
from them.

The following figures show how it currently works, and how I am trying
to change it.

Current implementation
======================
                                                           |----------|
                 embed CRC      -------------------------->| final    |
       $(CC)       $(LD)       /  |---------|              | link for |
       -----> *.o -------> *.o -->| modpost |              | vmlinux  |
      /              /            |         |-- *.mod.c -->| or       |
     / genksyms     /             |---------|              | module   |
  *.c ------> *.symversions                                |----------|

Genksyms outputs the calculated CRCs in the form of linker script
(*.symversions), which is used by $(LD) to update the object.

If CONFIG_LTO_CLANG=y, the build process is much more complex. Embedding
the CRCs is postponed until the LLVM bitcode is converted into ELF,
creating another intermediate *.prelink.o.

However, this complexity is unneeded. There is no reason why we must
embed version CRCs in objects so early.

There is final link stage for vmlinux (scripts/link-vmlinux.sh) and
modules (scripts/Makefile.modfinal). We can link CRCs at the very last
moment.

New implementation
==================
                                                           |----------|
                   --------------------------------------->| final    |
       $(CC)      /    |---------|                         | link for |
       -----> *.o ---->|         |                         | vmlinux  |
      /                | modpost |--- .vmlinux.export.c -->| or       |
     / genksyms        |         |--- *.mod.c ------------>| module   |
  *.c ------> *.cmd -->|---------|                         |----------|

Pass the symbol versions to modpost as separate text data, which are
available in *.cmd files.

This commit changes modpost to extract CRCs from *.cmd files instead of
from ELF objects.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)

modpost: add sym_find_with_module() helper

find_symbol() returns the first symbol found in the hash table. This
table is global, so it may return a symbol from an unexpected module.

There is a case where we want to search for a symbol with a given name
in a specified module.

Add sym_find_with_module(), which receives the module pointer as the
second argument. It is equivalent to find_module() if NULL is passed
as the module pointer.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)

modpost: change the license of EXPORT_SYMBOL to bool type

There were more EXPORT_SYMBOL types in the past. The following commits
removed unused ones.

- 53de25edcfba ("module: remove EXPORT_SYMBOL_GPL_FUTURE")
- a773164eac53 ("module: remove EXPORT_UNUSED_SYMBOL*")

There are 3 remaining in enum export, but export_unknown does not make
any sense because we never expect such a situation like "we do not know
how it was exported".

If the symbol name starts with "__ksymtab_", but the section name
does not start with "___ksymtab+" or "___ksymtab_gpl+", it is not an
exported symbol.

It occurs when a variable starting with "__ksymtab_" is directly defined:

int __ksymtab_foo;

Presumably, there is no practical issue for using such a weird variable
name (but there is no good reason for doing so, either).

Anyway, that is not an exported symbol. Setting export_unknown is not
the right thing to do. Do not call sym_add_exported() in this case.

With pointless export_unknown removed, the export type finally becomes
boolean (either EXPORT_SYMBOL or EXPORT_SYMBOL_GPL).

I renamed the field name to is_gpl_only. EXPORT_SYMBOL_GPL sets it true.
Only GPL-compatible modules can use it.

I removed the orphan comment, "How a symbol is exported", which is
unrelated to sec_mismatch_count. It is about enum export.
See commit c7fdb7d23db2 ("kbuild: export-type enhancement to modpost.c")

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>

modpost: remove left-over cross_compile declaration

This is a remnant of commit 7ea58f45f7cf ("mod/file2alias: make
modalias generation safe for cross compiling").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: record symbol versions in *.cmd files

When CONFIG_MODVERSIONS=y, the output from genksyms is saved in
separate *.symversions files, and will be used much later when
CONFIG_LTO_CLANG=y because it is impossible to update LLVM bit code
here.

This approach is not robust because:

- *.symversions may or may not exist. If *.symversions does not
   exist, we never know if it is missing for legitimate reason
   (i.e. no EXPORT_SYMBOL) or something bad has happened (for
   example, the user accidentally deleted it). Once it occurs,
   it is not self-healing because *.symversions is generated
   as a side effect.

- stale (i.e. invalid) *.symversions might be picked up if an
   object is generated in a non-ordinary way, and corresponding
   *.symversions (, which was generated by old builds) just happen
   to exist.

A more robust approach is to save symbol versions in *.cmd files
because:

- *.cmd always exists (if the object is generated by if_changed
   rule or friends). Even if the user accidentally deletes it,
   it will be regenerated in the next build.

- *.cmd is always re-generated when the object is updated. This
   avoid stale version information being picked up.

I will remove *.symversions later.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>

kbuild: generate a list of objects in vmlinux

A *.mod file lists the member objects of a module, but vmlinux does
not have such a file.

Generate this list to allow modpost to know all the member objects.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>

modpost: move *.mod.c generation to write_mod_c_files()

A later commit will add more code to this list_for_each_entry loop.

Before that, move the loop body into a separate helper function.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>

modpost: merge add_{intree_flag,retpoline,staging_flag} to add_header

add_intree_flag(), add_retpoline(), and add_staging_flag() are small
enough to be merged into add_header().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nathan Chancellor <nathan@kernel.org>

scripts/prune-kernel: Use kernel-install if available

If the new-kernel-pkg utility isn't present, try using kernel-install.
This is what the %preun scriptlet in scripts/package/mkspec does too.

Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: factor out the common installation code into scripts/install.sh

Many architectures have similar install.sh scripts.

The first half is really generic; it verifies that the kernel image
and System.map exist, then executes ~/bin/${INSTALLKERNEL} or
/sbin/${INSTALLKERNEL} if available.

The second half is kind of arch-specific; it copies the kernel image
and System.map to the destination, but the code is slightly different.

Factor out the generic part into scripts/install.sh.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>

modpost: split new_symbol() to symbol allocation and hash table addition

new_symbol() does two things; allocate a new symbol and register it
to the hash table.

Using a separate function for each is easier to understand.

Replace new_symbol() with hash_add_symbol(). Remove the second parameter
of alloc_symbol().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: make sym_add_exported() always allocate a new symbol

Currently, sym_add_exported() does not allocate a symbol if the same
name symbol already exists in the hash table.

This does not reflect the real use cases. You can let an external
module override the in-tree one. In this case, the external module
will export the same name symbols as the in-tree one. However,
modpost simply ignores those symbols, then Module.symvers for the
external module loses its symbols.

sym_add_exported() should allocate a new symbol.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: make multiple export error

This is currently a warning, but I think modpost should stop building
in this case.

If the same symbol is exported multiple times and we let it keep going,
the sanity check becomes difficult.

Only the legitimate case is that an external module overrides the
corresponding in-tree module to provide a different implementation
with the same interface.

Also, there exists an upstream example that exploits this feature.

$ make M=tools/testing/nvdimm

... builds tools/testing/nvdimm/libnvdimm.ko. This is a mocked module
that overrides the symbols from drivers/nvdimm/libnvdimm.ko.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: dump Module.symvers in the same order of modules.order

modpost dumps the exported symbols into Module.symvers, but currently
in random order because it iterates in the hash table.

Add a linked list of exported symbols in struct module, so we can
iterate on symbols per module.

This commit makes Module.symvers much more readable; the outer loop in
write_dump() iterates over the modules in the order of modules.order,
and the inner loop dumps symbols in each module.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: traverse the namespace_list in order

Use the doubly linked list to traverse the list in the added order.
This makes the code more consistent.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: use doubly linked list for dump_lists

This looks easier to understand (just because this is a pattern in
the kernel code). No functional change is intended.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: traverse unresolved symbols in order

Currently, modpost manages unresolved in a singly linked list; it adds
a new node to the head, and traverses the list from new to old.

Use a doubly linked list to keep the order in the symbol table in the
ELF file.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: add sym_add_unresolved() helper

Add a small helper, sym_add_unresolved() to ease the further
refactoring.

Remove the 'weak' argument from alloc_symbol() because it is sensible
only for unresolved symbols.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: traverse modules in order

Currently, modpost manages modules in a singly linked list; it adds
a new node to the head, and traverses the list from new to old.

It works, but the error messages are shown in the reverse order.

If you have a Makefile like this:

obj-m += foo.o bar.o

then, modpost shows error messages in bar.o, foo.o, in this order.

Use a doubly linked list to keep the order in modules.order; use
list_add_tail() for the node addition and list_for_each_entry() for
the list traverse.

Now that the kernel's list macros have been imported to modpost, I will
use them actively going forward.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: import include/linux/list.h

Import include/linux/list.h to use convenient list macros in modpost.

I dropped kernel-space code such as {WRITE,READ}_ONCE etc. and unneeded
macros.

I also imported container_of() from include/linux/container_of.h and
type definitions from include/linux/types.h.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: change mod->gpl_compatible to bool type

Currently, mod->gpl_compatible is tristate; it is set to -1 by default,
then to 1 or 0 when MODULE_LICENSE() is found.

Maybe, -1 was chosen to represent the 'unknown' license, but it is not
useful.

The current code:

if (!mod->gpl_compatible)
check_for_gpl_usage(exp->export, basename, exp->name);

... only cares whether gpl_compatible is zero or not.

Change it to a bool type with the initial value 'true', which has no
functional change.

The default value should be 'true' instead of 'false'.

Since commit 56d718f68b4f ("modpost: turn missing MODULE_LICENSE() into
error"), unknown module license is an error.

The error message, "missing MODULE_LICENSE()" is enough to explain the
issue. It is not sensible to show another message, "GPL-incompatible
module ... uses GPL-only symbol".

Add comments to explain this.

While I was here, I renamed gpl_compatible to is_gpl_compatible for
clarification, and also slightly refactored the code.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: use bool type where appropriate

Use 'bool' to clarify that the valid value is true or false.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

ia64: make the install target not depend on any build artifact

The install target should not depend on any build artifact.

The reason is explained in commit 349cb67cb341 ("arm, kbuild: make
"make install" not depend on vmlinux").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: drop $(objtree)/ prefix support for clean-files

I think this hack is a bad idea. arch/powerpc/boot/Makefile is the
only and last user. Let's stop doing this.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

Makefile: fix 2 typos

Fix typos in comments so that they make sense.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

modpost: move struct namespace_list to modpost.c

There is no good reason to define struct namespace_list in modpost.h

struct module has pointers to struct namespace_list, but that does
not require the definition of struct namespace_list.

Move it to modpost.c.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: retrieve the module dependency and CRCs in check_exports()

Do not repeat the similar code.

It is simpler to do this in check_exports() instead of add_versions().

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: add a separate error for exported symbols without definition

It took me a while to understand the intent of "exp->module == mod".

This code goes back to 2003. [1]

The commit is not in this git repository, and might be worth a little
explanation.

You can add EXPORT_SYMBOL() without having its definition in the same
file (but you need to put a declaration).

This is typical when EXPORT_SYMBOL() is added in a C file, but the
actual implementation is in a separate assembly file.

One example is arch/arm/kernel/armksyms.c

In the old days, EXPORT_SYMBOL() was only available in C files (but
this limitation does not exist any more). If you forget to add the
definition, this error occurs.

Add a separate, clearer message for this case. It should be an error
even if KBUILD_MODPOST_WARN is given.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=2763b6bcb96e6a38a2fe31108fe5759ec5bcc80a

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: remove stale comment about sym_add_exported()

The description,

it may have already been added without a
CRC, in this case just update the CRC

... is no longer valid.

In the old days, this function was used to update the CRC as well.

Commit ecc2647c7c68 ("kbuild: improved modversioning support for
external modules") started to use a separate function (sym_update_crc)
for updating the CRC.

The first part, "Add an exported symbol" is correct, but it is too
obvious from the function name. Drop this comment entirely.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: do not write out any file when error occurred

If an error occurs, modpost will fail anyway. Do not write out
any content (, which might be invalid).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: use snprintf() instead of sprintf() for safety

Use snprintf() to avoid the potential buffer overflow, and also
check the return value to detect the too long path.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

checksyscalls: ignore -Wunused-macros

The macros defined in this file are for testing only and are purposely
not used. When compiled with W=2, both gcc and clang yield some
-Wunused-macros warnings. Ignore them.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

scripts: dummy-tools, add pahole

CONFIG_PAHOLE_VERSION is a part of a config since the commit below. And
when multiple people update the config, this value constantly changes.
Even if they use dummy scripts.

To fix this, add a pahole dummy script returning v99.99. (This is
translated into 9999 later in the process.)

Thereafter, this script can be invoked easily for example as:
make PAHOLE=scripts/dummy-tools/pahole oldconfig

Fixes: 4fd1ff2c0d77 (kbuild: Add CONFIG_PAHOLE_VERSION)
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kheaders: Have cpio unconditionally replace files

For out-of-tree builds, this script invokes cpio twice to copy header
files from the srctree and subsequently from the objtree. According to a
comment in the script, there might be situations in which certain files
already exist in the destination directory when header files are copied
from the objtree:

"The second CPIO can complain if files already exist which can happen
with out of tree builds having stale headers in srctree. Just silence
CPIO for now."

GNU cpio might simply print a warning like "newer or same age version
exists", but toybox cpio exits with a non-zero exit code unless the
command line option "-u" is specified.

To improve compatibility with toybox cpio, add the command line option
"-u" to unconditionally replace existing files in the destination
directory.

Signed-off-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: support W=e to make build abort in case of warning

When developing new code/feature, CONFIG_WERROR is most
often turned off, especially for people using make W=12 to
get more warnings.

In such case, turning on -Werror temporarily would require
switching on CONFIG_WERROR in the configuration, building,
then switching off CONFIG_WERROR.

For this use case, this patch introduces a new 'e' modifier
to W= as a short hand for KCFLAGS+=-Werror" so that -Werror
got added to the kernel (built-in) and modules' CFLAGS.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: read *.mod to get objects passed to $(LD) or $(AR)

ld and ar support @file, which command-line options are read from.

Now that *.mod lists the member objects in the correct order, without
duplication, it is ready to be passed to ld and ar.

By using the @file syntax, people will not be worried about the pitfall
described in the NOTE.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: make *.mod not depend on *.o

The dependency

$(obj)/%.mod: $(obj)/%$(mod-prelink-ext).o

... exists because *.mod files previously contained undefined symbols,
which are computed from *.o files when CONFIG_TRIM_UNUSED_KSYMS=y.

Now that the undefined symbols are put into separate *.usyms files,
there is no reason to make *.mod depend on *.o files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: get rid of duplication in *.mod files

It is allowed to add the same objects multiple times to obj-y / obj-m:

  obj-y += foo.o foo.o foo.o
  obj-m += bar.o bar.o bar.o

It is also allowed to add the same objects multiple times to a composite
module:

  obj-m += foo.o
  foo-y := foo1.o foo2.o foo2.o foo1.o

This flexibility is useful because the same object might be selected by
different CONFIG options, like this:

  obj-m               += foo.o
  foo-y               := foo1.o
  foo-$(CONFIG_FOO_X) += foo2.o
  foo-$(CONFIG_FOO_Y) += foo2.o

The duplicated objects are omitted at link time. It works naturally in
Makefiles because GNU Make removes duplication in $^ without changing
the order.

It is working well, almost...

A small flaw I notice is, *.mod contains duplication in such a case.

This is probably not a big deal. As far as I know, the only small
problem is scripts/mod/sumversion.c parses the same file multiple
times.

I am fixing this because I plan to reuse *.mod for other purposes,
where the duplication can be problematic.

The code change is quite simple. We already use awk to drop duplicated
lines in modules.order (see cmd_modules_order in the same file).
I copied the code, but changed RS to use spaces as record separators.

I also changed the file format to list one object per line.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: split the second line of *.mod into *.usyms

The *.mod files have two lines; the first line lists the member objects
of the module, and the second line, if CONFIG_TRIM_UNUSED_KSYMS=y, lists
the undefined symbols.

Currently, we generate *.mod after constructing composite modules,
otherwise, we cannot compute the second line. No prerequisite is
required to print the first line.

They are orthogonal. Splitting them into separate commands will ease
further cleanups.

This commit splits the list of undefined symbols out to *.usyms files.

Previously, the list of undefined symbols ended up with a very long
line, but now it has one symbol per line.

Use sed like we did before commit 145a73d5e6d9 ("kbuild: avoid split
lines in .mod files").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>

kbuild: reuse real-search to simplify cmd_mod

The first command in cmd_mod is similar to the real-search macro.
Reuse it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: make multi_depend work with targets in subdirectory

Precisely speaking, when you get the stem of the path, you should use
$(patsubst $(obj)/%,%,...) instead of $(notdir ...).

I do not see this usecase, but if you create a composite object in a
subdirectory, the Makefile should look like this:

obj-$(CONFIG_FOO) += dir/foo.o
dir/foo-objs := dir/foo1.o dir/foo2.o

The member objects should be assigned to dir/foo-objs instead of
foo-objs.

This syntax is more consistent with commit 0e3c9dcd0579 ("kbuild:
change *FLAGS_<basetarget>.o to take the path relative to $(obj)").

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: reuse suffix-search to refactor multi_depend

The complicated part of multi_depend is the same as suffix-search.

Reuse it.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: refactor cmd_modversions_S

Split the code into two macros, cmd_gen_symversions_S for running
genksyms, and cmd_modversions for running $(LD) to update the object
with CRCs.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: refactor cmd_modversions_c

cmd_modversions_c implements two parts; run genksyms to calculate CRCs
of exported symbols, run $(LD) to update the object with the CRCs. The
latter is not executed for CONFIG_LTO_CLANG=y since the object is not
ELF but LLVM bit code at this point.

The first part can be unified because we can always use $(NM) instead
of "$(OBJDUMP) -h" to dump the symbols.

Split the code into the two macros, cmd_gen_symversions_c and
cmd_modversions.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: remove annoying namespace_from_kstrtabns()

There are two call sites for sym_update_namespace().

When the symbol has no namespace, s->namespace is set to NULL,
but the conversion from "" to NULL is done in two different places.

[1] read_symbols()

  This gets the namespace from __kstrtabns_<symbol>. If the symbol has
  no namespace, sym_get_data(info, sym) returns the empty string "".
  namespace_from_kstrtabns() converts it to NULL before it is passed to
  sym_update_namespace().

[2] read_dump()

  This gets the namespace from the dump file, *.symvers. If the symbol
  has no namespace, the 'namespace' is the empty string "", which is
  directly passed into sym_update_namespace(). The conversion from
  "" to NULL is done in sym_update_namespace().

namespace_from_kstrtabns() exists only for creating this inconsistency.

Remove namespace_from_kstrtabns() so that sym_update_namespace() is
consistently passed with "" instead of NULL.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: remove redundant initializes for static variables

These are initialized with zeros without explicit initializers.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: move export_from_secname() call to more relevant place

The assigned 'export' is only used when

if (strstarts(symname, "__ksymtab_"))

is met. The else-part of the assignment is the dead code.

Move the export_from_secname() call to where it is used.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

modpost: remove useless export_from_sec()

With commit 0f50406d6d40 ("modpost: stop symbol preloading for
modversion CRC") applied, now export_from_sec() is useless.

handle_symbol() is called for every symbol in the ELF.

When 'symname' does not start with "__ksymtab", export_from_sec() is
called, and the returned value is stored in 'export'.

It is used in the last part of handle_symbol():

    if (strstarts(symname, "__ksymtab_")) {
            name = symname + strlen("__ksymtab_");
            sym_add_exported(name, mod, export);
    }

'export' is used only when 'symname' starts with "__ksymtab_".

So, the value returned by export_from_sec() is never used.

Remove useless export_from_sec(). This makes further cleanups possible.

I put the temporary code:

    export = export_unknown;

Otherwise, I would get the compiler warning:

    warning: 'export' may be used uninitialized in this function [-Wmaybe-uninitialized]

This is apparently false positive because

    if (strstarts(symname, "__ksymtab_")

... is a stronger condition than:

    if (strstarts(symname, "__ksymtab")

Anyway, this part will be cleaned up by the next commit.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

kbuild: do not remove empty *.symtypes explicitly

Presumably, 'test -s $@ || rm -f $@' intends to remove the output when
the genksyms command fails.

It is unneeded because .DELETE_ON_ERROR automatically removes the output
on failure.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>

kbuild: factor out genksyms command from cmd_gensymtypes_{c,S}

The genksyms command part in cmd_gensymtypes_{c,S} is duplicated.
Factor it out into the 'genksyms' macro.

For the readability, I slightly refactor the arguments to genksyms.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>

docs: kbuild: add references on Kconfig semantics

Add references to 1) a research paper which provides a definition of
Kconfig semantics, 2) the kismet tool, which checks for unmet direct
dependency bugs in Kconfig specifications.

Signed-off-by: Paul Gazzillo <paul@pgazz.com>
Signed-off-by: Necip Fazil Yildiran <fazilyildiran@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: Allow kernel installation packaging to override pkg-config

Add HOSTPKG_CONFIG to allow tooling that builds the kernel to override
what pkg-config and parameters are used.

Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

kbuild: uapi: use -fsyntax-only rather than -S

The UAPI header tests are checking that the generated headers do not
have syntax errors. There's no need to run the rest of the compilation
pipeline after semantic analysis has run. Replace -S -o /dev/null with
-fsyntax-only.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Linux 5.18-rc1

Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:

- Rename the staging files to give them some meaning. Just
   stage1,stag2,etc, does not show what they are for

- Check for NULL from allocation in bootconfig

- Hold event mutex for dyn_event call in user events

- Mark user events to broken (to work on the API)

- Remove eBPF updates from user events

- Remove user events from uapi header to keep it from being installed.

- Move ftrace_graph_is_dead() into inline as it is called from hot
   paths and also convert it into a static branch.

* tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Move user_events.h temporarily out of include/uapi
  ftrace: Make ftrace_graph_is_dead() a static branch
  tracing: Set user_events to BROKEN
  tracing/user_events: Remove eBPF interfaces
  tracing/user_events: Hold event_mutex during dyn_event_add
  proc: bootconfig: Add null pointer check
  tracing: Rename the staging files for trace_events

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
"A single revert to fix a boot regression seen when clk_put() started
  dropping rate range requests. It's best to keep various systems
  booting so we'll kick this out and try again next time"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  Revert "clk: Drop the rate range on clk_put()"

Merge tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes and updates:

   - Make the prctl() for enabling dynamic XSTATE components correct so
     it adds the newly requested feature to the permission bitmap
     instead of overwriting it. Add a selftest which validates that.

   - Unroll string MMIO for encrypted SEV guests as the hypervisor
     cannot emulate it.

   - Handle supervisor states correctly in the FPU/XSTATE code so it
     takes the feature set of the fpstate buffer into account. The
     feature sets can differ between host and guest buffers. Guest
     buffers do not contain supervisor states. So far this was not an
     issue, but with enabling PASID it needs to be handled in the buffer
     offset calculation and in the permission bitmaps.

   - Avoid a gazillion of repeated CPUID invocations in by caching the
     values early in the FPU/XSTATE code.

   - Enable CONFIG_WERROR in x86 defconfig.

   - Make the X86 defconfigs more useful by adapting them to Y2022
     reality"

* tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Consolidate size calculations
  x86/fpu/xstate: Handle supervisor states in XSTATE permissions
  x86/fpu/xsave: Handle compacted offsets correctly with supervisor states
  x86/fpu: Cache xfeature flags from CPUID
  x86/fpu/xsave: Initialize offset/size cache early
  x86/fpu: Remove unused supervisor only offsets
  x86/fpu: Remove redundant XCOMP_BV initialization
  x86/sev: Unroll string mmio with CC_ATTR_GUEST_UNROLL_STRING_IO
  x86/config: Make the x86 defconfigs a bit more usable
  x86/defconfig: Enable WERROR
  selftests/x86/amx: Update the ARCH_REQ_XCOMP_PERM test
  x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation

Merge tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RT signal fix from Thomas Gleixner:
"Revert the RT related signal changes. They need to be reworked and
generalized"

* tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"

Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping

Pull more dma-mapping updates from Christoph Hellwig:

- fix a regression in dma remap handling vs AMD memory encryption (me)

- finally kill off the legacy PCI DMA API (Christophe JAILLET)

* tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: move pgprot_decrypted out of dma_pgprot
  PCI/doc: cleanup references to the legacy PCI DMA API
  PCI: Remove the deprecated "pci-dma-compat.h" API

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

- avoid unnecessary rebuilds for library objects

- fix return value of __setup handlers

- fix invalid input check for "crashkernel=" kernel option

- silence KASAN warnings in unwind_frame

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()
  ARM: 9190/1: kdump: add invalid input check for 'crashkernel=0'
  ARM: 9187/1: JIVE: fix return value of __setup handler
  ARM: 9189/1: decompressor: fix unneeded rebuilds of library objects

Revert "clk: Drop the rate range on clk_put()"

This reverts commit 0bc6a8728d95e710c7f6cb09aa25958dfa6f63db. There are
multiple reports that this breaks boot on various systems. The common
theme is that orphan clks are having rates set on them when that isn't
expected. Let's revert it out for now so that -rc1 boots.

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Tony Lindgren <tony@atomide.com>
Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/r/366a0232-bb4a-c357-6aa8-636e398e05eb@samsung.com
Cc: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20220403022818.39572-1-sboyd@kernel.org

Merge tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

- Avoid SEGV if core.cpus isn't set in 'perf stat'.

- Stop depending on .git files for building PERF-VERSION-FILE, used in
   'perf --version', fixing some perf tools build scenarios.

- Convert tracepoint.py example to python3.

- Update UAPI header copies from the kernel sources: socket,
   mman-common, msr-index, KVM, i915 and cpufeatures.

- Update copy of libbpf's hashmap.c.

- Directly return instead of using local ret variable in
   evlist__create_syswide_maps(), found by coccinelle.

* tag 'perf-tools-for-v5.18-2022-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf python: Convert tracepoint.py example to python3
  perf evlist: Directly return instead of using local ret variable
  perf cpumap: More cpu map reuse by merge.
  perf cpumap: Add is_subset function
  perf evlist: Rename cpus to user_requested_cpus
  perf tools: Stop depending on .git files for building PERF-VERSION-FILE
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools kvm headers arm64: Update KVM headers from the kernel sources
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers UAPI: Sync asm-generic/mman-common.h with the kernel
  perf beauty: Update copy of linux/socket.h with the kernel sources
  perf tools: Update copy of libbpf's hashmap.c
  perf stat: Avoid SEGV if core.cpus isn't set

Merge tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- Fix empty $(PYTHON) expansion.

- Fix UML, which got broken by the attempt to suppress Clang warnings.

- Fix warning message in modpost.

* tag 'kbuild-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  modpost: restore the warning message for missing symbol versions
  Revert "um: clang: Strip out -mno-global-merge from USER_CFLAGS"
  kbuild: Remove '-mno-global-merge'
  kbuild: fix empty ${PYTHON} in scripts/link-vmlinux.sh
  kconfig: remove stale comment about removed kconfig_print_symbol()

Merge tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- build fix for gpio

- fix crc32 build problems

- check for failed memory allocations

* tag 'mips_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: crypto: Fix CRC32 code
  MIPS: rb532: move GPIOD definition into C-files
  MIPS: lantiq: check the return value of kzalloc()
  mips: sgi-ip22: add a check for the return of kzalloc()

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

- Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

- Documentation improvements

- Prevent module exit until all VMs are freed

- PMU Virtualization fixes

- Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences

- Other miscellaneous bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  KVM: x86: fix sending PV IPI
  KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
  KVM: x86: Remove redundant vm_entry_controls_clearbit() call
  KVM: x86: cleanup enter_rmode()
  KVM: x86: SVM: fix tsc scaling when the host doesn't support it
  kvm: x86: SVM: remove unused defines
  KVM: x86: SVM: move tsc ratio definitions to svm.h
  KVM: x86: SVM: fix avic spec based definitions again
  KVM: MIPS: remove reference to trap&emulate virtualization
  KVM: x86: document limitations of MSR filtering
  KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
  KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
  KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
  KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
  KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
  KVM: x86: Trace all APICv inhibit changes and capture overall status
  KVM: x86: Add wrappers for setting/clearing APICv inhibits
  KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
  KVM: X86: Handle implicit supervisor access with SMAP
  KVM: X86: Rename variable smap to not_smap in permission_fault()
  ...

modpost: restore the warning message for missing symbol versions

This log message was accidentally chopped off.

I was wondering why this happened, but checking the ML log, Mark
precisely followed my suggestion [1].

I just used "..." because I was too lazy to type the sentence fully.
Sorry for the confusion.

[1]: https://lore.kernel.org/all/CAK7LNAR6bXXk9-ZzZYpTqzFqdYbQsZHmiWspu27rtsFxvfRuVA@mail.gmail.com/

Fixes: 6f2ae4b2b13f ("kbuild: modpost: Explicitly warn about unprototyped symbols")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

Merge tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block

Pull block driver fix from Jens Axboe:
"Got two reports on nbd spewing warnings on load now, which is a
  regression from a commit that went into your tree yesterday.

  Revert the problematic change for now"

* tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block:
  Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"

Merge tag 'pci-v5.18-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fix from Bjorn Helgaas:

- Fix Hyper-V "defined but not used" build issue added during merge
window (YueHaibing)

* tag 'pci-v5.18-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: hv: Remove unused hv_set_msi_entry_from_desc()

Merge tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Benson Leung:
"cros_ec_typec:

   - Check for EC device - Fix a crash when using the cros_ec_typec
     driver on older hardware not capable of typec commands

   - Make try power role optional

   - Mux configuration reorganization series from Prashant

  cros_ec_debugfs:

   - Fix use after free. Thanks Tzung-bi

  sensorhub:

   - cros_ec_sensorhub fixup - Split trace include file

  misc:

   - Add new mailing list for chrome-platform development:

chrome-platform@lists.linux.dev

     Now with patchwork!"

* tag 'tag-chrome-platform-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_debugfs: detach log reader wq from devm
  platform: chrome: Split trace include file
  platform/chrome: cros_ec_typec: Update mux flags during partner removal
  platform/chrome: cros_ec_typec: Configure muxes at start of port update
  platform/chrome: cros_ec_typec: Get mux state inside configure_mux
  platform/chrome: cros_ec_typec: Move mux flag checks
  platform/chrome: cros_ec_typec: Check for EC device
  platform/chrome: cros_ec_typec: Make try power role optional
  MAINTAINERS: platform-chrome: Add new chrome-platform@lists.linux.dev list

Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"

This reverts commit 8e56dcbbc31abe045cd7804ebf9fc363e64fad62.

Both Gabriel and Borislav report that this commit casues a regression
with nbd:

sysfs: cannot create duplicate filename '/dev/block/43:0'

Revert it before 5.18-rc1 and we'll investigage this separately in
due time.

Link: https://lore.kernel.org/all/YkiJTnFOt9bTv6A2@zn.tnic/
Reported-by: Gabriel L. Somlo <somlo@cmu.edu>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

watch_queue: Free the page array when watch_queue is dismantled

Commit 927e597d61c7 ("watch_queue: Free the alloc bitmap when the
watch_queue is torn down") took care of the bitmap, but not the page
array.

  BUG: memory leak
  unreferenced object 0xffff88810d9bc140 (size 32):
  comm "syz-executor335", pid 3603, jiffies 4294946994 (age 12.840s)
  hex dump (first 32 bytes):
    40 a7 40 04 00 ea ff ff 00 00 00 00 00 00 00 00  @.@.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     kmalloc_array include/linux/slab.h:621 [inline]
     kcalloc include/linux/slab.h:652 [inline]
     watch_queue_set_size+0x12f/0x2e0 kernel/watch_queue.c:251
     pipe_ioctl+0x82/0x140 fs/pipe.c:632
     vfs_ioctl fs/ioctl.c:51 [inline]
     __do_sys_ioctl fs/ioctl.c:874 [inline]
     __se_sys_ioctl fs/ioctl.c:860 [inline]
     __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]

Reported-by: syzbot+25ea042ae28f3888727a@syzkaller.appspotmail.com
Fixes: a23d9805b5b3 ("pipe: Add general notification queue support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20220322004654.618274-1-eric.dumazet@gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tracing: mark user_events as BROKEN

After being merged, user_events become more visible to a wider audience
that have concerns with the current API.

It is too late to fix this for this release, but instead of a full
revert, just mark it as BROKEN (which prevents it from being selected in
make config). Then we can work finding a better API. If that fails,
then it will need to be completely reverted.

To not have the code silently bitrot, still allow building it with
COMPILE_TEST.

And to prevent the uapi header from being installed, then later changed,
and then have an old distro user space see the old version, move the
header file out of the uapi directory.

Surround the include with CONFIG_COMPILE_TEST to the current location,
but when the BROKEN tag is taken off, it will use the uapi directory,
and fail to compile. This is a good way to remind us to move the header
back.

Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tracing: Move user_events.h temporarily out of include/uapi

While user_events API is under development and has been marked for broken
to not let the API become fixed, move the header file out of the uapi
directory. This is to prevent it from being installed, then later changed,
and then have an old distro user space update with a new kernel, where
applications see the user_events being available, but the old header is in
place, and then they get compiled incorrectly.

Also, surround the include with CONFIG_COMPILE_TEST to the current
location, but when the BROKEN tag is taken off, it will use the uapi
directory, and fail to compile. This is a good way to remind us to move
the header back.

Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home
Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com
Link: https://lkml.kernel.org/r/20220401143903.188384f3@gandalf.local.home
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ftrace: Make ftrace_graph_is_dead() a static branch

ftrace_graph_is_dead() is used on hot paths, it just reads a variable
in memory and is not worth suffering function call constraints.

For instance, at entry of prepare_ftrace_return(), inlining it avoids
saving prepare_ftrace_return() parameters to stack and restoring them
after calling ftrace_graph_is_dead().

While at it using a static branch is even more performant and is
rather well adapted considering that the returned value will almost
never change.

Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool
by a static branch.

The performance improvement is noticeable.

Link: https://lkml.kernel.org/r/e0411a6a0ed3eafff0ad2bc9cd4b0e202b4617df.1648623570.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Set user_events to BROKEN

After being merged, user_events become more visible to a wider audience
that have concerns with the current API. It is too late to fix this for
this release, but instead of a full revert, just mark it as BROKEN (which
prevents it from being selected in make config). Then we can work finding
a better API. If that fails, then it will need to be completely reverted.

Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/
Link: https://lkml.kernel.org/r/20220330155835.5e1f6669@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/user_events: Remove eBPF interfaces

Remove eBPF interfaces within user_events to ensure they are fully
reviewed.

Link: https://lore.kernel.org/all/20220329165718.GA10381@kbox/
Link: https://lkml.kernel.org/r/20220329173051.10087-1-beaub@linux.microsoft.com
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing/user_events: Hold event_mutex during dyn_event_add

Make sure the event_mutex is properly held during dyn_event_add call.
This is required when adding dynamic events.

Link: https://lkml.kernel.org/r/20220328223225.1992-1-beaub@linux.microsoft.com
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

proc: bootconfig: Add null pointer check

kzalloc is a memory allocation function which can return NULL when some
internal memory errors happen. It is safer to add null pointer check.

Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn
Cc: stable@vger.kernel.org
Fixes: 9d52ced8893c ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracing: Rename the staging files for trace_events

When looking for implementation of different phases of the creation of the
TRACE_EVENT() macro, it is pretty useless when all helper macro
redefinitions are in files labeled "stageX_defines.h". Rename them to
state which phase the files are for. For instance, when looking for the
defines that are used to create the event fields, seeing
"stage4_event_fields.h" gives the developer a good idea that the defines
are in that file.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

KVM: x86: fix sending PV IPI

If apic_id is less than min, and (max - apic_id) is greater than
KVM_IPI_CLUSTER_SIZE, then the third check condition is satisfied but
the new apic_id does not fit the bitmask. In this case __send_ipi_mask
should send the IPI.

This is mostly theoretical, but it can happen if the apic_ids on three
iterations of the loop are for example 1, KVM_IPI_CLUSTER_SIZE, 0.

Fixes: 3c7773d7663 ("KVM: X86: Implement PV IPIs in linux guest")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Message-Id: <1646814944-51801-1-git-send-email-lirongqing@baidu.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/mmu: do compare-and-exchange of gPTE via the user address

FNAME(cmpxchg_gpte) is an inefficient mess.  It is at least decent if it
can go through get_user_pages_fast(), but if it cannot then it tries to
use memremap(); that is not just terribly slow, it is also wrong because
it assumes that the VM_PFNMAP VMA is contiguous.

The right way to do it would be to do the same thing as
hva_to_pfn_remapped() does since commit a59d68b54670 ("KVM: MMU: try to
fix up page faults before giving up", 2016-07-05), using follow_pte()
and fixup_user_fault() to determine the correct address to use for
memremap().  To do this, one could for example extract hva_to_pfn()
for use outside virt/kvm/kvm_main.c.  But really there is no reason to
do that either, because there is already a perfectly valid address to
do the cmpxchg() on, only it is a userspace address.  That means doing
user_access_begin()/user_access_end() and writing the code in assembly
to handle exceptions correctly.  Worse, the guest PTE can be 8-byte
even on i686 so there is the extra complication of using cmpxchg8b to
account for.  But at least it is an efficient mess.

(Thanks to Linus for suggesting improvement on the inline assembly).

Reported-by: Qiuhao Li <qiuhao@sysec.org>
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com
Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: b89f9ae1559d ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Remove redundant vm_entry_controls_clearbit() call

When emulating exit from long mode, EFER_LMA is cleared with
vmx_set_efer(). This will already unset the VM_ENTRY_IA32E_MODE control
bit as requested by SDM, so there is no need to unset VM_ENTRY_IA32E_MODE
again in exit_lmode() explicitly. In case EFER isn't supported by
hardware, long mode isn't supported, so exit_lmode() cannot be reached.

Note that, thanks to the shadow controls mechanism, this change doesn't
eliminate vmread or vmwrite.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-3-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: cleanup enter_rmode()

vmx_set_efer() sets uret->data but, in fact if the value of uret->data
will be used vmx_setup_uret_msrs() will have rewritten it with the value
returned by update_transition_efer(). uret->data is consumed if and only
if uret->load_into_hardware is true, and vmx_setup_uret_msrs() takes care
of (a) updating uret->data before setting uret->load_into_hardware to true
(b) setting uret->load_into_hardware to false if uret->data isn't updated.

Opportunistically use "vmx" directly instead of redoing to_vmx().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-2-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: SVM: fix tsc scaling when the host doesn't support it

It was decided that when TSC scaling is not supported,
the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0'
value.

However in this case kvm_max_tsc_scaling_ratio is not set,
which breaks various assumptions.

Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of
host support. For consistency, do the same for VMX.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: x86: SVM: remove unused defines

Remove some unused #defines from svm.c

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: SVM: move tsc ratio definitions to svm.h

Another piece of SVM spec which should be in the header file

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: SVM: fix avic spec based definitions again

Due to wrong rebase, commit
b4c1e60675cf5 ("KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255")

moved avic spec #defines back to avic.c.

Move them back, and while at it extend AVIC_DOORBELL_PHYSICAL_ID_MASK to 12
bits as well (it will be used in nested avic)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MIPS: remove reference to trap&emulate virtualization

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: document limitations of MSR filtering

MSR filtering requires an exit to userspace that is hard to implement and
would be very slow in the case of nested VMX vmexit and vmentry MSR
accesses. Document the limitation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr

If MSR access is rejected by MSR filtering,
kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED,
and the return value is only handled well for rdmsr/wrmsr.
However, some instruction emulation and state transition also
use kvm_set_msr()/kvm_get_msr() to do msr access but may trigger
some unexpected results if MSR access is rejected, E.g. RDPID
emulation would inject a #UD but RDPID wouldn't cause a exit
when RDPID is supported in hardware and ENABLE_RDTSCP is set.
And it would also cause failure when load MSR at nested entry/exit.
Since msr filtering is based on MSR bitmap, it is better to only
do MSR filtering for rdmsr/wrmsr.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/emulator: Emulate RDPID only if it is enabled in guest

When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.

Fixes: 809be93a1ef1 ("KVM: x86: emulate RDPID")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/pmu: Fix and isolate TSX-specific performance event logic

HSW_IN_TX* bits are used in generic code which are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
using HSW_IN_TX* bits unconditionally in generic code is resulting in
unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
is 0x2, pmc_reprogram_counter() wrongly assumes that
HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.

Also per the SDM, both bits 32 and 33 "may only be set if the processor
supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set
for IA32_PERFEVTSEL2."

Opportunistically eliminate code redundancy, because if the HSW_IN_TX*
bit is set in pmc->eventsel, it is already set in attr.config.

Reported-by: Ravi Bangoria <ravi.bangoria@amd.com>
Reported-by: Jim Mattson <jmattson@google.com>
Fixes: bbe6638bf4f6 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set

It makes more sense to print new SPTE value than the
old value.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220302102457.588450-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs

AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.

When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.

This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.

For example,

root@Ubuntu1804:~# perf stat -e r26 -a sleep 1

Performance counter stats for 'system wide':

                 0      r26

       1.001070977 seconds time elapsed

Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90

Fixes: c97977012838 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: Lotus Fenn <lotusf@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Like Xu <likexu@tencent.com>
Reviewed-by: David Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Trace all APICv inhibit changes and capture overall status

Trace all APICv inhibit changes instead of just those that result in
APICv being (un)inhibited, and log the current state. Debugging why
APICv isn't working is frustrating as it's hard to see why APICv is still
inhibited, and logging only the first inhibition means unnecessary onion
peeling.

Opportunistically drop the export of the tracepoint, it is not and should
not be used by vendor code due to the need to serialize toggling via
apicv_update_lock.

Note, using the common flow means kvm_apicv_init() switched from atomic
to non-atomic bitwise operations. The VM is unreachable at init, so
non-atomic is perfectly ok.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Add wrappers for setting/clearing APICv inhibits

Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well. Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.

For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Make APICv inhibit reasons an enum and cleanup naming

Use an enum for the APICv inhibit reasons, there is no meaning behind
their values and they most definitely are not "unsigned longs". Rename
the various params to "reason" for consistency and clarity (inhibit may
be confused as a command, i.e. inhibit APICv, instead of the reason that
is getting toggled/checked).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: X86: Handle implicit supervisor access with SMAP

There are two kinds of implicit supervisor access
implicit supervisor access when CPL = 3
implicit supervisor access when CPL < 3

Current permission_fault() handles only the first kind for SMAP.

But if the access is implicit when SMAP is on, data may not be read
nor write from any user-mode address regardless the current CPL.

So the second kind should be also supported.

The first kind can be detect via CPL and access mode: if it is
supervisor access and CPL = 3, it must be implicit supervisor access.

But it is not possible to detect the second kind without extra
information, so this patch adds an artificial PFERR_EXPLICIT_ACCESS
into @access. This extra information also works for the first kind, so
the logic is changed to use this information for both cases.

The value of PFERR_EXPLICIT_ACCESS is deliberately chosen to be bit 48
which is in the most significant 16 bits of u64 and less likely to be
forced to change due to future hardware uses it.

This patch removes the call to ->get_cpl() for access mode is determined
by @access.  Not only does it reduce a function call, but also remove
confusions when the permission is checked for nested TDP.  The nested
TDP shouldn't have SMAP checking nor even the L2's CPL have any bearing
on it.  The original code works just because it is always user walk for
NPT and SMAP fault is not set for EPT in update_permission_bitmask.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-5-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: X86: Rename variable smap to not_smap in permission_fault()

Comments above the variable says the bit is set when SMAP is overridden
or the same meaning in update_permission_bitmask(): it is not subjected
to SMAP restriction.

Renaming it to reflect the negative implication and make the code better
readability.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-4-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>