From 2a5dd8cc9085c89f6bfd82704db2edac5453b577 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Wed, 20 Jul 2022 06:13:25 +0200 Subject: [PATCH] docs: admin-guide: for kernel bugs refer to other kernel documentation The current section 'If something goes wrong' makes a number of suggestions for debugging, bug hunting and reporting issues, which are quite briefly described in that section. However, the suggestions are also well covered in other kernel documentation or sometimes simply outdated. Here, each suggestion in that section is summarized, and then followed with its assessment, and the derived action for each suggestion: - use MAINTAINERS and mailing list: covered in 'Reporting issues', summarized in the short guide, detailed in its further section. Reporting issues even provides some specific examples that guides readers well through the needed steps. Refer to 'Reporting issues'. - contact Linus Torvalds: probably outdated as currently described. nevertheless covered in 'Reporting issues'. Reporting issues points out to contact the relevant kernel maintainers first, and after some patience and failed attempts with those maintainers, contacting Linus Torvalds might be okay. Refer to 'Reporting issues'. - tell what kernel, how to duplicate, the setup, if the problem is new or old and when did you notice: covered in 'Reporting issues', especially in Step-by-step guide how to report issues to the kernel maintainers. Refer to 'Reporting issues'. - duplicate kernel bug reports exactly: covered in 'Reporting issues', especially in Write and send the report. Refer to 'Reporting issues'. - read 'Bug hunting': keep this reference. Refer to 'Bug hunting'. - compile the kernel with CONFIG_KALLSYMS: covered in 'Reporting issues', especially in Decode failure messages. Refer to 'Reporting issues'. - alternatively, use ksymoops: ksymoops at the mentioned URL seems not to be maintained anymore. It was released roughly once a year until version 2.4.11 in 2005, but has not seen a new release since then. The information in ./scripts/ksymoops/README is from 1999, and does not give more insight on its actual maintenance state either. Ksymoops is mentioned as system utility in changes.rst, but also not recommended there. Drop the explanation on using ksymoops. - alternatively, lookup dump manually with the EIP and nm to determine the function in which the kernel crashes: this method seems already a quite advanced and low-level debugging method. Even all the further references on bug hunting and debugging do not mention it. Drop this alternative method and limit mentioning methods explained in the other existing kernel documentation. - read 'Reporting issues': keep this reference. Refer to 'Reporting issues'. - use gdb for debugging: some specific details, e.g., edit arch/x86/Makefile, are probably outdated or limited to one (historic important) setup. Using gdb is covered in 'Bug hunting', 'Debugging kernel and modules via gdb' and 'Using kgdb, kdb and the kernel debugger internals'. Refer to those three documents. Overall, it is sufficient to refer to reporting-issues.rst, bug-hunting.rst, gdb-kernel-debugging.rst and kgdb.rst and this way cover the existing suggestions. 'Reporting issues' is quite new and probably up to date. 'Bug hunting', 'Debugging kernel and modules via gdb' and 'Using kgdb, kdb and the kernel debugger internals' might need some revisit and update, but they are generally in an acceptable state for referring to them. Replace the existing suggestions by reference to other existing kernel documentation covering those suggestions---partly even nicely summarized and then explained in greater detail. Signed-off-by: Lukas Bulwahn Link: https://lore.kernel.org/r/20220720041325.15693-3-lukas.bulwahn@gmail.com Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/README.rst | 89 +++------------------------- 1 file changed, 7 insertions(+), 82 deletions(-) diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index b78fe64b39f6c..c47ce4830d4b7 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -330,85 +330,10 @@ Compiling the kernel If something goes wrong ----------------------- - - If you have problems that seem to be due to kernel bugs, please check - the file MAINTAINERS to see if there is a particular person associated - with the part of the kernel that you are having trouble with. If there - isn't anyone listed there, then the second best thing is to mail - them to me (torvalds@linux-foundation.org), and possibly to any other - relevant mailing-list or to the newsgroup. - - - In all bug-reports, *please* tell what kernel you are talking about, - how to duplicate the problem, and what your setup is (use your common - sense). If the problem is new, tell me so, and if the problem is - old, please try to tell me when you first noticed it. - - - If the bug results in a message like:: - - unable to handle kernel paging request at address C0000010 - Oops: 0002 - EIP: 0010:XXXXXXXX - eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx - esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx - ds: xxxx es: xxxx fs: xxxx gs: xxxx - Pid: xx, process nr: xx - xx xx xx xx xx xx xx xx xx xx - - or similar kernel debugging information on your screen or in your - system log, please duplicate it *exactly*. The dump may look - incomprehensible to you, but it does contain information that may - help debugging the problem. The text above the dump is also - important: it tells something about why the kernel dumped code (in - the above example, it's due to a bad kernel pointer). More information - on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst - - - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump - as is, otherwise you will have to use the ``ksymoops`` program to make - sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred). - This utility can be downloaded from - https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ . - Alternatively, you can do the dump lookup by hand: - - - In debugging dumps like the above, it helps enormously if you can - look up what the EIP value means. The hex value as such doesn't help - me or anybody else very much: it will depend on your particular - kernel setup. What you should do is take the hex value from the EIP - line (ignore the ``0010:``), and look it up in the kernel namelist to - see which kernel function contains the offending address. - - To find out the kernel function name, you'll need to find the system - binary associated with the kernel that exhibited the symptom. This is - the file 'linux/vmlinux'. To extract the namelist and match it against - the EIP from the kernel crash, do:: - - nm vmlinux | sort | less - - This will give you a list of kernel addresses sorted in ascending - order, from which it is simple to find the function that contains the - offending address. Note that the address given by the kernel - debugging messages will not necessarily match exactly with the - function addresses (in fact, that is very unlikely), so you can't - just 'grep' the list: the list will, however, give you the starting - point of each kernel function, so by looking for the function that - has a starting address lower than the one you are searching for but - is followed by a function with a higher address you will find the one - you want. In fact, it may be a good idea to include a bit of - "context" in your problem report, giving a few lines around the - interesting one. - - If you for some reason cannot do the above (you have a pre-compiled - kernel image or similar), telling me as much about your setup as - possible will help. Please read - 'Documentation/admin-guide/reporting-issues.rst' for details. - - - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you - cannot change values or set break points.) To do this, first compile the - kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make - clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``). - - After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``. - You can now use all the usual gdb commands. The command to look up the - point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes - with the EIP value.) - - gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly) - disregards the starting offset for which the kernel is compiled. +If you have problems that seem to be due to kernel bugs, please follow the +instructions at 'Documentation/admin-guide/reporting-issues.rst'. + +Hints on understanding kernel bug reports are in +'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel +with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and +'Documentation/dev-tools/kgdb.rst'. -- 2.39.5