]> git.baikalelectronics.ru Git - kernel.git/commitdiff
net/mlx5e: Fix cleanup null-ptr deref on encap lock
authorPaul Blakey <paulb@nvidia.com>
Sun, 12 Feb 2023 09:01:43 +0000 (11:01 +0200)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 22 Mar 2023 12:33:49 +0000 (13:33 +0100)
[ Upstream commit 301471d798d1670c5a7d434b482e1c06338dcc62 ]

During module is unloaded while a peer tc flow is still offloaded,
first the peer uplink rep profile is changed to a nic profile, and so
neigh encap lock is destroyed. Next during unload, the VF reps netdevs
are unregistered which causes the original non-peer tc flow to be deleted,
which deletes the peer flow. The peer flow deletion detaches the encap
entry and try to take the already destroyed encap lock, causing the
below trace.

Fix this by clearing peer flows during tc eswitch cleanup
(mlx5e_tc_esw_cleanup()).

Relevant trace:
[ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40
[ 4316.851897] Call Trace:
[ 4316.852481]  <TASK>
[ 4316.857214]  mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core]
[ 4316.858258]  mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core]
[ 4316.859134]  mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core]
[ 4316.859867]  clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core]
[ 4316.860605]  mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core]
[ 4316.862609]  __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core]
[ 4316.863394]  mlx5e_tc_del_flow+0x(/0x630 [mlx5_core]
[ 4316.864090]  mlx5e_flow_put+0x5f/0x100 [mlx5_core]
[ 4316.864771]  mlx5e_delete_flower+0x4de/0xa40 [mlx5_core]
[ 4316.865486]  tc_setup_cb_reoffload+0x20/0x80
[ 4316.865905]  fl_reoffload+0x47c/0x510 [cls_flower]
[ 4316.869181]  tcf_block_playback_offloads+0x91/0x1d0
[ 4316.869649]  tcf_block_unbind+0xe7/0x1b0
[ 4316.870049]  tcf_block_offload_cmd.isra.0+0x1ee/0x270
[ 4316.879266]  tcf_block_offload_unbind+0x61/0xa0
[ 4316.879711]  __tcf_block_put+0xa4/0x310

Fixes: de2adb12c9ef ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 519580ce09c1 ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c

index a71eaa0601149078671bd2dbccd2f2eb2b0f7282..73af062a878305f4333a294f73a4f872b4ae28e6 100644 (file)
@@ -5160,6 +5160,16 @@ err_tun_mapping:
 
 void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
 {
+       struct mlx5e_rep_priv *rpriv;
+       struct mlx5_eswitch *esw;
+       struct mlx5e_priv *priv;
+
+       rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+       priv = netdev_priv(rpriv->netdev);
+       esw = priv->mdev->priv.eswitch;
+
+       mlx5e_tc_clean_fdb_peer_flows(esw);
+
        mlx5e_tc_tun_cleanup(uplink_priv->encap);
 
        mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);