Automatic merge of 'next-test' into merge-test (2024-05-03 00:08)
mpe committed May 2, 2024
2 parents cd658a8 + cebb000 commit 05fb2fc
Showing 62 changed files with 1,332 additions and 450 deletions.
11 changes: 11 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,14 @@ Contact: [email protected]
Description: read only
Provide information about the amount of memory reserved by
FADump to save the crash dump in bytes.

What: /sys/kernel/fadump/hotplug_ready
Date: Apr 2024
Contact: [email protected]
Description: read only
Kdump udev rule re-registers fadump on memory add/remove events,
primarily to update the elfcorehdr. This sysfs file indicates to the
kdump udev rule that fadump re-registration is not required on
memory add/remove events because the elfcorehdr is now prepared in
the second/fadump kernel.
User: kexec-tools
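
A consumer of this attribute could be sketched as follows. This is only an illustration: the helper name ``fadump_hotplug_ready`` is hypothetical, and the assumption that the file reads back as ``1`` when re-registration can be skipped is not stated in the entry above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of kexec-tools: returns 1 when the file at
 * `path` exists and starts with '1', otherwise 0.
 * ASSUMPTION: the attribute reads back as "1" when it is present.
 */
static int fadump_hotplug_ready(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return 0;	/* attribute missing: keep re-registering */

	c = fgetc(f);
	fclose(f);
	return c == '1';
}
```

A caller would pass ``/sys/kernel/fadump/hotplug_ready`` and skip fadump re-registration on memory add/remove events when the helper returns 1.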
141 changes: 139 additions & 2 deletions Documentation/arch/powerpc/dexcr.rst
@@ -36,8 +36,145 @@ state for a process.
Configuration
=============

The DEXCR is currently unconfigurable. All threads are run with the
NPHIE aspect enabled.
prctl
-----

A process can control its own userspace DEXCR value using the
``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of
:manpage:`prctl(2)` commands. These calls have the form::

    prctl(PR_PPC_GET_DEXCR, unsigned long which, 0, 0, 0);
    prctl(PR_PPC_SET_DEXCR, unsigned long which, unsigned long ctrl, 0, 0);

The possible 'which' and 'ctrl' values are as follows. Note there is no relation
between the 'which' value and the DEXCR aspect's index.

.. flat-table::
   :header-rows: 1
   :widths: 2 7 1

   * - ``prctl()`` which
     - Aspect name
     - Aspect index

   * - ``PR_PPC_DEXCR_SBHE``
     - Speculative Branch Hint Enable (SBHE)
     - 0

   * - ``PR_PPC_DEXCR_IBRTPD``
     - Indirect Branch Recurrent Target Prediction Disable (IBRTPD)
     - 3

   * - ``PR_PPC_DEXCR_SRAPD``
     - Subroutine Return Address Prediction Disable (SRAPD)
     - 4

   * - ``PR_PPC_DEXCR_NPHIE``
     - Non-Privileged Hash Instruction Enable (NPHIE)
     - 5

.. flat-table::
   :header-rows: 1
   :widths: 2 8

   * - ``prctl()`` ctrl
     - Meaning

   * - ``PR_PPC_DEXCR_CTRL_EDITABLE``
     - This aspect can be configured with PR_PPC_SET_DEXCR (get only)

   * - ``PR_PPC_DEXCR_CTRL_SET``
     - This aspect is set / set this aspect

   * - ``PR_PPC_DEXCR_CTRL_CLEAR``
     - This aspect is clear / clear this aspect

   * - ``PR_PPC_DEXCR_CTRL_SET_ONEXEC``
     - This aspect will be set after exec / set this aspect after exec

   * - ``PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``
     - This aspect will be clear after exec / clear this aspect after exec

Note that

* which is a plain value, not a bitmask. Aspects must be worked with individually.

* ctrl is a bitmask. ``PR_PPC_GET_DEXCR`` returns both the current and onexec
  configuration. For example, ``PR_PPC_GET_DEXCR`` may return
  ``PR_PPC_DEXCR_CTRL_EDITABLE | PR_PPC_DEXCR_CTRL_SET |
  PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC``. This would indicate the aspect is currently
  set, it will be cleared when you run exec, and you can change this with the
  ``PR_PPC_SET_DEXCR`` prctl.

* The set/clear terminology refers to setting/clearing the bit in the DEXCR.
  For example::

      prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_CTRL_SET, 0, 0);

  will set the IBRTPD aspect bit in the DEXCR, causing indirect branch prediction
  to be disabled.

* The status returned by ``PR_PPC_GET_DEXCR`` represents what value the process
  would like applied. It does not include any alternative overrides, such as if
  the hypervisor is enforcing the aspect be set. To see the true DEXCR state
  software should read the appropriate SPRs directly.

* The aspect state when starting a process is copied from the parent's state on
  :manpage:`fork(2)`. The state is reset to a fixed value on
  :manpage:`execve(2)`. The PR_PPC_SET_DEXCR prctl() can control both of these
  values.

* The ``*_ONEXEC`` controls do not change the current process's DEXCR.

Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_CTRL_SET`` or
``PR_PPC_DEXCR_CTRL_CLEAR`` to edit a given aspect.
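
As an illustration of reading the ctrl bitmask described above, the following sketch renders a ``PR_PPC_GET_DEXCR`` return value as text. The helper ``dexcr_ctrl_describe`` is hypothetical, and the fallback flag values are assumptions that should be checked against the ``<linux/prctl.h>`` shipped with your kernel headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED fallback values; prefer the definitions from <linux/prctl.h>. */
#ifndef PR_PPC_DEXCR_CTRL_EDITABLE
#define PR_PPC_DEXCR_CTRL_EDITABLE     0x1
#define PR_PPC_DEXCR_CTRL_SET          0x2
#define PR_PPC_DEXCR_CTRL_CLEAR        0x4
#define PR_PPC_DEXCR_CTRL_SET_ONEXEC   0x8
#define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 0x10
#endif

/*
 * Hypothetical helper: describe the current state, the onexec state and
 * the editable flag carried in one ctrl value. Returns buf.
 */
static const char *dexcr_ctrl_describe(unsigned long ctrl, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	snprintf(buf, len, "%s%s%s",
		 (ctrl & PR_PPC_DEXCR_CTRL_SET) ? "set" :
		 (ctrl & PR_PPC_DEXCR_CTRL_CLEAR) ? "clear" : "unknown",
		 (ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC) ? ", set on exec" :
		 (ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC) ? ", clear on exec" : "",
		 (ctrl & PR_PPC_DEXCR_CTRL_EDITABLE) ? ", editable" : "");
	return buf;
}
```

On a powerpc system, a non-negative result of ``prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_NPHIE, 0, 0, 0)`` could be passed straight to this helper; a negative result indicates one of the errors listed below.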

Common error codes for both getting and setting the DEXCR are as follows:

.. flat-table::
   :header-rows: 1
   :widths: 2 8

   * - Error
     - Meaning

   * - ``EINVAL``
     - The DEXCR is not supported by the kernel.

   * - ``ENODEV``
     - The aspect is not recognised by the kernel or not supported by the
       hardware.

``PR_PPC_SET_DEXCR`` may also report the following error codes:

.. flat-table::
   :header-rows: 1
   :widths: 2 8

   * - Error
     - Meaning

   * - ``EINVAL``
     - The ctrl value contains unrecognised flags.

   * - ``EINVAL``
     - The ctrl value contains mutually conflicting flags (e.g.,
       ``PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR``)

   * - ``EPERM``
     - This aspect cannot be modified with prctl() (check for the
       PR_PPC_DEXCR_CTRL_EDITABLE flag with PR_PPC_GET_DEXCR).

   * - ``EPERM``
     - The process does not have sufficient privilege to perform the operation.
       For example, clearing NPHIE on exec is a privileged operation (a process
       can still clear its own NPHIE aspect without privileges).

This interface allows a process to control its own DEXCR aspects, and also set
the initial DEXCR value for any children in its process tree (up to the next
child to use an ``*_ONEXEC`` control). This allows fine-grained control over the
default value of the DEXCR, for example allowing containers to run with different
default values.
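
The two ``EINVAL`` rules listed for ``PR_PPC_SET_DEXCR`` can be checked in userspace before issuing the call. The sketch below is an illustration only, not the kernel's implementation: ``dexcr_check_set_ctrl`` and ``DEXCR_CTRL_KNOWN`` are hypothetical names, the fallback flag values are assumptions, and treating a set-on-exec plus clear-on-exec pair as conflicting is inferred from the table's "mutually conflicting" wording.

```c
#include <assert.h>
#include <errno.h>

/* ASSUMED fallback values; prefer the definitions from <linux/prctl.h>. */
#ifndef PR_PPC_DEXCR_CTRL_EDITABLE
#define PR_PPC_DEXCR_CTRL_EDITABLE     0x1
#define PR_PPC_DEXCR_CTRL_SET          0x2
#define PR_PPC_DEXCR_CTRL_CLEAR        0x4
#define PR_PPC_DEXCR_CTRL_SET_ONEXEC   0x8
#define PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC 0x10
#endif

/* Hypothetical mask of every flag this sketch recognises. */
#define DEXCR_CTRL_KNOWN (PR_PPC_DEXCR_CTRL_EDITABLE | \
			  PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_CLEAR | \
			  PR_PPC_DEXCR_CTRL_SET_ONEXEC | \
			  PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC)

/* Returns 0 if ctrl passes both documented checks, -EINVAL otherwise. */
static int dexcr_check_set_ctrl(unsigned long ctrl)
{
	if (ctrl & ~(unsigned long)DEXCR_CTRL_KNOWN)
		return -EINVAL;		/* unrecognised flags */
	if ((ctrl & PR_PPC_DEXCR_CTRL_SET) && (ctrl & PR_PPC_DEXCR_CTRL_CLEAR))
		return -EINVAL;		/* conflicting current state */
	if ((ctrl & PR_PPC_DEXCR_CTRL_SET_ONEXEC) &&
	    (ctrl & PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC))
		return -EINVAL;		/* conflicting onexec state */
	return 0;
}

/* Usage example: one accepted and one rejected combination. */
static void dexcr_check_examples(void)
{
	assert(dexcr_check_set_ctrl(PR_PPC_DEXCR_CTRL_SET |
				    PR_PPC_DEXCR_CTRL_CLEAR_ONEXEC) == 0);
	assert(dexcr_check_set_ctrl(PR_PPC_DEXCR_CTRL_SET |
				    PR_PPC_DEXCR_CTRL_CLEAR) == -EINVAL);
}
```

The ``EPERM`` and ``ENODEV`` cases depend on kernel and hardware state, so they cannot be predicted by a check like this.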


coredump and ptrace
91 changes: 42 additions & 49 deletions Documentation/arch/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
memory is held.

If there is no waiting dump data, then only the memory required to
hold CPU state, HPTE region, boot memory dump, FADump header and
elfcore header, is usually reserved at an offset greater than boot
memory size (see Fig. 1). This area is *not* released: this region
will be kept permanently reserved, so that it can act as a receptacle
for a copy of the boot memory content in addition to CPU state and
HPTE region, in the case a crash does occur.
hold CPU state, HPTE region, boot memory dump, and FADump header is
usually reserved at an offset greater than boot memory size (see Fig. 1).
This area is *not* released: this region will be kept permanently
reserved, so that it can act as a receptacle for a copy of the boot
memory content in addition to CPU state and HPTE region, in the case
a crash does occur.

Since this reserved memory area is used only after the system crash,
there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::

o Memory Reservation during first kernel

Low memory Top of memory
0 boot memory size |<--- Reserved dump area --->| |
| | | Permanent Reservation | |
V V | | V
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| | |///|////| DUMP | HDR | ELF |////| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| ^ ^ ^ ^ ^
| | | | | |
\ CPU HPTE / | |
------------------------------ | |
Boot memory content gets transferred | |
to reserved area by firmware at the | |
time of crash. | |
FADump Header |
(meta area) |
Low memory Top of memory
0 boot memory size |<------ Reserved dump area ----->| |
| | | Permanent Reservation | |
V V | | V
+-----------+-----/ /---+---+----+-----------+-------+----+-----+
| | |///|////| DUMP | HDR |////| |
+-----------+-----/ /---+---+----+-----------+-------+----+-----+
| ^ ^ ^ ^ ^
| | | | | |
\ CPU HPTE / | |
-------------------------------- | |
Boot memory content gets transferred | |
to reserved area by firmware at the | |
time of crash. | |
FADump Header |
(meta area) |
|
|
Metadata: This area holds a metadata structure whose
@@ -186,20 +186,33 @@ that were present in CMA region::
0 boot memory size |
| |<------------ Crash preserved area ------------>|
V V |<--- Reserved dump area --->| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| | |///|////| DUMP | HDR | ELF |////| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| |
V V
Used by second /proc/vmcore
kernel to boot
+----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
| |ELF| | |///|////| DUMP | HDR |/////| |
+----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
| | | | | |
----- ------------------------------ ---------------
\ | |
\ | |
\ | |
\ | ----------------------------
\ | /
\ | /
\ | /
/proc/vmcore


+---+
|///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
+---+ figures are not always present. For example, OPAL platform
does not have CPU & HPTE regions while Metadata region is
not supported on pSeries currently.

+---+
|ELF| -> elfcorehdr, it is created in second kernel after crash.
+---+

Note: Memory from 0 to the boot memory size is used by second kernel

Fig. 2


@@ -353,26 +366,6 @@ TODO:
- Need to come up with the better approach to find out more
accurate boot memory size that is required for a kernel to
boot successfully when booted with restricted memory.
- The FADump implementation introduces a FADump crash info structure
in the scratch area before the ELF core header. The idea of introducing
this structure is to pass some important crash info data to the second
kernel which will help second kernel to populate ELF core header with
correct data before it gets exported through /proc/vmcore. The current
design implementation does not address a possibility of introducing
additional fields (in future) to this structure without affecting
compatibility. Need to come up with the better approach to address this.

The possible approaches are:

1. Introduce version field for version tracking, bump up the version
whenever a new field is added to the structure in future. The version
field can be used to find out what fields are valid for the current
version of the structure.
2. Reserve the area of predefined size (say PAGE_SIZE) for this
structure and have unused area as reserved (initialized to zero)
for future field additions.

The advantage of approach 1 over 2 is we don't need to reserve extra space.

Author: Mahesh Salgaonkar <[email protected]>

3 changes: 1 addition & 2 deletions MAINTAINERS
@@ -12470,7 +12470,6 @@ LINUX FOR POWERPC (32-BIT AND 64-BIT)
M: Michael Ellerman <[email protected]>
R: Nicholas Piggin <[email protected]>
R: Christophe Leroy <[email protected]>
R: Aneesh Kumar K.V <[email protected]>
R: Naveen N. Rao <[email protected]>
L: [email protected]
S: Supported
@@ -14893,7 +14892,7 @@ F: drivers/phy/marvell/phy-pxa-usb.c

MMU GATHER AND TLB INVALIDATION
M: Will Deacon <[email protected]>
M: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
M: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
M: Andrew Morton <[email protected]>
M: Nick Piggin <[email protected]>
M: Peter Zijlstra <[email protected]>
31 changes: 29 additions & 2 deletions arch/powerpc/include/asm/fadump-internal.h
@@ -42,13 +42,38 @@ static inline u64 fadump_str_to_u64(const char *str)

#define FADUMP_CPU_UNKNOWN (~((u32)0))

#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF")
/*
* The introduction of new fields in the fadump crash info header has
* led to a change in the magic key from `FADMPINF` to `FADMPSIG` for
* identifying a kernel crash from an old kernel.
*
* To prevent the need for further changes to the magic number in the
* event of future modifications to the fadump crash info header, a
* version field has been introduced to track the fadump crash info
* header version.
*
* Consider a few points before adding new members to the fadump crash info
* header structure:
*
* - Append new members; avoid adding them in between.
* - Non-primitive members should have a size member as well.
* - For every change in the fadump header, increment the
* fadump header version. This helps the updated kernel decide how to
* handle kernel dumps from older kernels.
*/
#define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF")
#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG")
#define FADUMP_HEADER_VERSION 1

/* fadump crash info structure */
struct fadump_crash_info_header {
u64 magic_number;
u64 elfcorehdr_addr;
u32 version;
u32 crashing_cpu;
u64 vmcoreinfo_raddr;
u64 vmcoreinfo_size;
u32 pt_regs_sz;
u32 cpu_mask_sz;
struct pt_regs regs;
struct cpumask cpu_mask;
};
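
/*
 * Editorial sketch, not part of the patch: one way a reader of this header
 * might tell the old and new layouts apart using the magic and version
 * fields described above. ASSUMPTIONS: str_to_u64() packs the string
 * MSB-first the way fadump_str_to_u64() is believed to, and the names
 * classify_header, HDR_* and HEADER_VERSION are hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_VERSION 1	/* mirrors FADUMP_HEADER_VERSION */

/* Pack up to 8 characters into a 64-bit value, first character in the MSB. */
static uint64_t str_to_u64(const char *str)
{
	uint64_t val = 0;
	unsigned int i;

	for (i = 0; i < sizeof(val); i++)
		val = *str ? (val << 8) | (unsigned char)*str++ : val << 8;
	return val;
}

enum hdr_kind { HDR_INVALID, HDR_OLD, HDR_CURRENT, HDR_NEWER };

/* Classify a crash info header by its magic number and version field. */
static enum hdr_kind classify_header(uint64_t magic, uint32_t version)
{
	if (magic == str_to_u64("FADMPINF"))
		return HDR_OLD;		/* old layout, no version field */
	if (magic != str_to_u64("FADMPSIG"))
		return HDR_INVALID;
	return version > HEADER_VERSION ? HDR_NEWER : HDR_CURRENT;
}

/* Usage example: the two magic strings map to different kinds. */
static void classify_examples(void)
{
	assert(classify_header(str_to_u64("FADMPINF"), 0) == HDR_OLD);
	assert(classify_header(str_to_u64("FADMPSIG"), 1) == HDR_CURRENT);
}
```

/*
 * Because new members are only appended and the version only increments,
 * a reader that sees HDR_NEWER can still trust the fields it knows about.
 */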
@@ -94,6 +119,8 @@ struct fw_dump {
u64 boot_mem_regs_cnt;

unsigned long fadumphdr_addr;
u64 elfcorehdr_addr;
u64 elfcorehdr_size;
unsigned long cpu_notes_buf_vaddr;
unsigned long cpu_notes_buf_size;

10 changes: 5 additions & 5 deletions arch/powerpc/include/asm/hvcall.h
@@ -524,7 +524,7 @@ long plpar_hcall_norets_notrace(unsigned long opcode, ...);
* Used for all but the craziest of phyp interfaces (see plpar_hcall9)
*/
#define PLPAR_HCALL_BUFSIZE 4
long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...);
long plpar_hcall(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...);

/**
* plpar_hcall_raw: - Make a hypervisor call without calculating hcall stats
@@ -538,7 +538,7 @@ long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...);
* plpar_hcall, but plpar_hcall_raw works in real mode and does not
* calculate hypervisor call statistics.
*/
long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...);
long plpar_hcall_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...);

/**
* plpar_hcall9: - Make a pseries hypervisor call with up to 9 return arguments
@@ -549,8 +549,8 @@ long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...);
* PLPAR_HCALL9_BUFSIZE to size the return argument buffer.
*/
#define PLPAR_HCALL9_BUFSIZE 9
long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...);
long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...);
long plpar_hcall9(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...);
long plpar_hcall9_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...);
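
/*
 * Editorial sketch, not part of the patch: the `[static N]` form in the
 * new prototypes is the C99 way of saying the caller must pass a pointer
 * to at least N elements, which lets a compiler warn about undersized
 * buffers. fake_hcall and HCALL_BUFSIZE are hypothetical stand-ins; no
 * hypervisor call is made.
 */

```c
#include <assert.h>

#define HCALL_BUFSIZE 4		/* mirrors PLPAR_HCALL_BUFSIZE */

/* Stand-in for a hypervisor call: writes every return slot. */
static long fake_hcall(unsigned long opcode,
		       unsigned long retbuf[static HCALL_BUFSIZE])
{
	int i;

	for (i = 0; i < HCALL_BUFSIZE; i++)
		retbuf[i] = opcode + i;
	return 0;
}

/* Usage example: the buffer is sized with the same macro as the prototype. */
static void fake_hcall_example(void)
{
	unsigned long retbuf[HCALL_BUFSIZE];

	assert(fake_hcall(0x100, retbuf) == 0);
	assert(retbuf[HCALL_BUFSIZE - 1] == 0x100 + HCALL_BUFSIZE - 1);
}
```

/*
 * Declaring the buffer as `unsigned long retbuf[2]` instead would still
 * compile, but a compiler is then entitled to diagnose the call.
 */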

/* pseries hcall tracing */
extern struct static_key hcall_tracepoint_key;
@@ -570,7 +570,7 @@ struct hvcall_mpp_data {
unsigned long backing_mem;
};

int h_get_mpp(struct hvcall_mpp_data *);
long h_get_mpp(struct hvcall_mpp_data *mpp_data);

struct hvcall_mpp_x_data {
unsigned long coalesced_bytes;