I made a mistake with my previous fix, I assumed that we didn't need to
mess with the reloc roots once we were out of the part of relocation where
we are actually moving the extents.
The subtle thing that I missed is that btrfs_init_reloc_root() also
updates the last_trans for the reloc root when we do
btrfs_record_root_in_trans() for the corresponding fs_root. I've added a
comment to make sure future me doesn't make this mistake again.
This showed up as a WARN_ON() in btrfs_copy_root() because our
last_trans didn't == the current transid. This could happen if we
snapshotted a fs root with a reloc root after we set
rc->create_reloc_tree = 0, but before we actually merge the reloc root.
Worth mentioning that the regression produced the following warning
when running snapshot creation and balance in parallel:
BTRFS info (device sdc): relocating block group
30408704 flags metadata|dup
------------[ cut here ]------------
WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191 btrfs_copy_root+0x26f/0x430 [btrfs]
CPU: 0 PID: 12823 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs]
RSP: 0018:
ffffb96e044279b8 EFLAGS:
00010202
RAX:
0000000000000009 RBX:
ffff9da70bf61000 RCX:
ffffb96e04427a48
RDX:
ffff9da733a770c8 RSI:
ffff9da70bf61000 RDI:
ffff9da694163818
RBP:
ffff9da733a770c8 R08:
fffffffffffffff8 R09:
0000000000000002
R10:
ffffb96e044279a0 R11:
0000000000000000 R12:
ffff9da694163818
R13:
fffffffffffffff8 R14:
ffff9da6d2512000 R15:
ffff9da714cdac00
FS:
00007fdeacf328c0(0000) GS:
ffff9da735e00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055a2a5b8a118 CR3:
00000001eed78002 CR4:
00000000003606f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
? create_reloc_root+0x49/0x2b0 [btrfs]
? kmem_cache_alloc_trace+0xe5/0x200
create_reloc_root+0x8b/0x2b0 [btrfs]
btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs]
create_pending_snapshot+0x610/0x1010 [btrfs]
create_pending_snapshots+0xa8/0xd0 [btrfs]
btrfs_commit_transaction+0x4c7/0xc50 [btrfs]
? btrfs_mksubvol+0x3cd/0x560 [btrfs]
btrfs_mksubvol+0x455/0x560 [btrfs]
__btrfs_ioctl_snap_create+0x15f/0x190 [btrfs]
btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs]
? mem_cgroup_commit_charge+0x6e/0x540
btrfs_ioctl+0x12d8/0x3760 [btrfs]
? do_raw_spin_unlock+0x49/0xc0
? _raw_spin_unlock+0x29/0x40
? __handle_mm_fault+0x11b3/0x14b0
? ksys_ioctl+0x92/0xb0
ksys_ioctl+0x92/0xb0
? trace_hardirqs_off_thunk+0x1a/0x1c
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5c/0x280
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fdeabd3bdd7
Fixes: 2abc726ab4b8 ("btrfs: do not init a reloc root if we aren't relocating")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
int clear_rsv = 0;
int ret;
- if (!rc || !rc->create_reloc_tree ||
- root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+ if (!rc)
return 0;
/*
if (reloc_root_is_dead(root))
return 0;
+ /*
+ * This is subtle but important. We do not do
+ * record_root_in_transaction for reloc roots, instead we record their
+ * corresponding fs root, and then here we update the last trans for the
+ * reloc root. This means that we have to do this for the entire life
+ * of the reloc root, regardless of which stage of the relocation we are
+ * in.
+ */
if (root->reloc_root) {
reloc_root = root->reloc_root;
reloc_root->last_trans = trans->transid;
return 0;
}
+ /*
+ * We are merging reloc roots, we do not need new reloc trees. Also
+ * reloc trees never need their own reloc tree.
+ */
+ if (!rc->create_reloc_tree ||
+ root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+ return 0;
+
if (!trans->reloc_reserved) {
rsv = trans->block_rsv;
trans->block_rsv = rc->block_rsv;