From 62e3f8059dd1e0c6018bbdd70184246a8b14216a Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 14 Jun 2022 21:58:28 +0000 Subject: [PATCH] KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case commit da4d568d4c7482ff663e006e86c0be48d5f0f9c8 upstream. If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in vmcs12. When restoring nested state, e.g. after migration, without a nested run pending, prepare_vmcs02() will propagate nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into vmcs02.GUEST_IA32_DEBUGCTL. If userspace restores nested state before MSRs, then loading garbage is a non-issue as loading DEBUGCTL will also update vmcs02. But if usersepace restores MSRs first, then KVM is responsible for propagating L2's value, which is actually thrown into vmcs01, into vmcs02. Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state is all kinds of bizarre and ideally would not be supported. Sadly, some VMMs do exactly that and rely on KVM to make things work. Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL to vmcs02 across RSM may corrupt L2's DEBUGCTL. But KVM's entire VMX+SMM emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor. Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com Fixes: 1877921b6206 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Message-Id: <20220614215831.3762138-3-seanjc@google.com> Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx/nested.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index a145fa61d40de..149a030b7d773 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3364,7 +3364,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu)) evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu); - if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) + if (!vmx->nested.nested_run_pending || + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || -- 2.39.5