From ae86df938d59416ef12a416db681c23135991cd5 Mon Sep 17 00:00:00 2001
From: fuwenxin
Date: Thu, 26 Mar 2026 10:28:07 +0800
Subject: [PATCH 1/3] x86/idle: Disable IBRS when CPU is offline to improve single-threaded performance

ANBZ: #32185

commit 2743fe89d4d41616ffbe1e7e96e443ae7a4b1cc6 upstream.

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU goes
offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
enabled, which hurts the performance of its sibling CPU. Mitigate this
by clearing all the mitigation bits in the SPEC_CTRL MSR when the CPU
goes offline. When the CPU comes back online it is re-initialized, so
restoring the SPEC_CTRL value isn't needed.

Add a comment stating that native_play_dead() is effectively a
__noreturn function even though it can't be marked as such, to avoid
confusion about the missing MSR restoration code.

Consider DPDK running on an isolated CPU thread, processing network
packets in user space while its sibling thread is idle. The performance
of the busy DPDK thread with IBRS on and off in the idle sibling thread
is:

                          IBRS on     IBRS off
                          -------     --------
  packets/second:            7.8M        10.4M
  avg tsc cycles/packet:   282.26       209.86

This is a 25% drop in packets/second. The test system is an Intel Xeon
4114 CPU @ 2.20GHz.

[ mingo: Extended the changelog with performance data from the 0/4 mail. ]

Signed-off-by: Waiman Long
Signed-off-by: Ingo Molnar
Acked-by: Rafael J. Wysocki
Cc: Linus Torvalds
Link: https://lore.kernel.org/r/20230727184600.26768-3-longman@redhat.com
Signed-off-by: fuwenxin
---
 arch/x86/kernel/smpboot.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 62803e392063..3426d6aea42b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -88,6 +88,7 @@
 #include
 #include
 #include
+#include
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1435,8 +1436,15 @@ void __noreturn hlt_play_dead(void)
 		native_halt();
 }
 
+/*
+ * native_play_dead() is essentially a __noreturn function, but it can't
+ * be marked as such as the compiler may complain about it.
+ */
 void native_play_dead(void)
 {
+	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS))
+		__update_spec_ctrl(0);
+
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
-- 
Gitee

From 694257ec92d31e3d1a65b16171fa4fbd252579b9 Mon Sep 17 00:00:00 2001
From: fuwenxin
Date: Thu, 26 Mar 2026 10:32:26 +0800
Subject: [PATCH 2/3] intel_idle: Use __update_spec_ctrl() in intel_idle_ibrs()

ANBZ: #32185

commit 7506203089dceb1d9e1f35d37ad2e46d44798a6d upstream.

When intel_idle_ibrs() is called, it sets the SPEC_CTRL MSR to 0 in
order to disable IBRS. However, the new MSR value isn't reflected in
x86_spec_ctrl_current, which is at odds with the other code that keeps
track of the MSR state in that percpu variable. Use the new
__update_spec_ctrl() so that the x86_spec_ctrl_current percpu value is
properly updated.

Since spec-ctrl.h includes both msr.h and nospec-branch.h, we can remove
those from the include file list.

Signed-off-by: Waiman Long
Signed-off-by: Ingo Molnar
Acked-by: Rafael J. Wysocki
Cc: Linus Torvalds
Link: https://lore.kernel.org/r/20230727184600.26768-4-longman@redhat.com
Signed-off-by: fuwenxin
---
 drivers/idle/intel_idle.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 22445e8cf339..650f7a6e3a8e 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -53,9 +53,8 @@
 #include
 #include
 #include
-#include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -185,12 +184,12 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
 	int ret;
 
 	if (smt_active)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+		__update_spec_ctrl(0);
 
 	ret = __intel_idle(dev, drv, index, true);
 
 	if (smt_active)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+		__update_spec_ctrl(spec_ctrl);
 
 	return ret;
 }
-- 
Gitee

From fccf44b4e4a7922ec0071d3d8c77b9bd9f4f1ab8 Mon Sep 17 00:00:00 2001
From: fuwenxin
Date: Thu, 26 Mar 2026 10:34:03 +0800
Subject: [PATCH 3/3] intel_idle: Add ibrs_off module parameter to force-disable IBRS

ANBZ: #32185

commit aa1567a7e6440b8c3af4b0d8a8219d8fc5028c5f upstream.

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are some
use cases where a customer may want to use max_cstate=1 to lower
latency. Such use cases will suffer from the performance degradation
caused by IBRS being enabled in the idle sibling thread. Add an
"ibrs_off" module parameter to force-disable IBRS, clearing the
CPUIDLE_FLAG_IRQ_ENABLE flag if it is set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency, as IRQs will now
be disabled while idle.

SPECjbb2015 was run with C-states limited to C1 on a Skylake system.

First, with the kernel booted with "intel_idle.ibrs_off":

  max-jOPS = 117828, critical-jOPS = 66047

Then, with the kernel booted without "intel_idle.ibrs_off":

  max-jOPS = 116408, critical-jOPS = 58958

So booting with "intel_idle.ibrs_off" improves performance by:

  max-jOPS:      +1.2%, which could be considered within the noise range.
  critical-jOPS: +12%, which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
of the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long
Signed-off-by: Ingo Molnar
Acked-by: Rafael J. Wysocki
Cc: Linus Torvalds
Link: https://lore.kernel.org/r/20230727184600.26768-5-longman@redhat.com
Signed-off-by: fuwenxin
---
 Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
 drivers/idle/intel_idle.c                   | 11 ++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43da62e..39bd6ecce7de 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
 ``MWAIT`` instruction is not allowed to be used, so the initialization of
 ``intel_idle`` will fail.
 
-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
 itself that can be set via the kernel command line (they cannot be updated via
 sysfs, so that is the only way to change their values).
 
@@ -216,6 +216,21 @@ are ignored). The idle states disabled this way can be enabled (on a per-CPU
 basis) from user space via ``sysfs``.
 
 
+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+
 
 .. _intel-idle-core-and-package-idle-states:
 
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 650f7a6e3a8e..790c69099e85 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -70,6 +70,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
 static unsigned int disabled_states_mask __read_mostly;
 static unsigned int preferred_states_mask __read_mostly;
 static bool force_irq_on __read_mostly;
+static bool ibrs_off __read_mostly;
 
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
@@ -2022,11 +2023,13 @@ static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 	}
 
 	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
-	    state->flags & CPUIDLE_FLAG_IBRS) {
+	    ((state->flags & CPUIDLE_FLAG_IBRS) || ibrs_off)) {
 		/*
 		 * IBRS mitigation requires that C-states are entered
 		 * with interrupts disabled.
 		 */
+		if (ibrs_off && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+			state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
 		WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
 		state->enter = intel_idle_ibrs;
 		return;
@@ -2347,3 +2350,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
  * 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
  */
 module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(ibrs_off, bool, 0444);
+MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");
-- 
Gitee
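
The behaviour of the three patches can be modelled in user space. The sketch below is a minimal single-CPU simulation, not kernel code: the sim_* names, the flag values, and the plain variables standing in for the SPEC_CTRL MSR and the per-CPU x86_spec_ctrl_current shadow are assumptions made for illustration. It shows the shadow staying in sync with every SPEC_CTRL write (patches 1 and 2) and how ibrs_off reroutes a state without the IBRS flag onto the IBRS-clearing entry path while clearing its IRQ_ENABLE flag (patch 3).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_SPEC_CTRL bit 0 is IBRS. */
#define SIM_SPEC_CTRL_IBRS  (UINT64_C(1) << 0)

/* Hypothetical stand-ins for CPUIDLE_FLAG_IBRS and
 * CPUIDLE_FLAG_IRQ_ENABLE; the real values come from kernel headers. */
#define SIM_FLAG_IBRS       (1u << 0)
#define SIM_FLAG_IRQ_ENABLE (1u << 1)

/* Simplified: every non-IBRS enter method is lumped into "default". */
enum sim_enter { SIM_ENTER_DEFAULT, SIM_ENTER_IBRS };

struct sim_state {
	unsigned int flags;
	enum sim_enter enter;
};

/* One simulated CPU: the SPEC_CTRL MSR and its software shadow
 * (x86_spec_ctrl_current in the kernel). */
static uint64_t sim_msr;
static uint64_t sim_shadow;
/* Counts how often the WARN_ON_ONCE() condition would be true. */
static int sim_warnings;

/* Models __update_spec_ctrl(): update the shadow, then "write" the MSR. */
static void sim_update_spec_ctrl(uint64_t val)
{
	sim_shadow = val;
	sim_msr = val;
}

/* Patch 1: clear all SPEC_CTRL bits before going offline. Nothing is
 * restored because the CPU is re-initialized when it comes back. */
static void sim_play_dead(bool kernel_ibrs)
{
	if (kernel_ibrs)
		sim_update_spec_ctrl(0);
}

/* Patch 2: SPEC_CTRL is cleared for the idle period only when SMT is
 * active, and the shadow follows every write. The values observed while
 * "idle" are returned through the out parameters. */
static void sim_idle_ibrs(bool smt_active, uint64_t *msr_in_idle,
			  uint64_t *shadow_in_idle)
{
	uint64_t spec_ctrl = sim_shadow;	/* like spec_ctrl_current() */

	if (smt_active)
		sim_update_spec_ctrl(0);

	/* The real code calls __intel_idle() here. */
	*msr_in_idle = sim_msr;
	*shadow_in_idle = sim_shadow;

	if (smt_active)
		sim_update_spec_ctrl(spec_ctrl);
}

/* Patch 3: the IBRS branch of state_update_enter_method(). */
static void sim_update_enter_method(struct sim_state *s, bool kernel_ibrs,
				    bool ibrs_off)
{
	if (kernel_ibrs && ((s->flags & SIM_FLAG_IBRS) || ibrs_off)) {
		/* IBRS handling requires entering with IRQs disabled. */
		if (ibrs_off && (s->flags & SIM_FLAG_IRQ_ENABLE))
			s->flags &= ~SIM_FLAG_IRQ_ENABLE;
		if (s->flags & SIM_FLAG_IRQ_ENABLE)
			sim_warnings++;
		s->enter = SIM_ENTER_IBRS;
		return;
	}
	s->enter = SIM_ENTER_DEFAULT;
}
```

With ibrs_off unset, a C1-like state that lacks the IBRS flag keeps the default entry path and its IRQ_ENABLE flag; with ibrs_off set, it moves to the IBRS path and IRQ_ENABLE is cleared, which is the latency trade-off the third commit message describes. Reading the saved value from the shadow rather than the MSR only works because every write goes through the update helper, which is the point of the second patch.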