jbd2: fix a potential race while discarding reserved buffers after an abort
we got issue as follows:
[ 72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
[ 72.826847] EXT4-fs (sda): Remounting filesystem read-only
fallocate: fallocate failed: Read-only file system
[ 74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
[ 74.793597] ------------[ cut here ]------------
[ 74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
[ 74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-
20220315-dirty #150
[ 74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
[ 74.801971] RSP: 0018:
ffffa828c24a3cb8 EFLAGS:
00010202
[ 74.802694] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 74.803601] RDX:
0000000000000001 RSI:
ffff9cfefe725d90 RDI:
ffff9cfefe725d90
[ 74.804554] RBP:
ffff9cfefe725d90 R08:
0000000000000000 R09:
ffffa828c24a3b20
[ 74.805471] R10:
0000000000000001 R11:
0000000000000001 R12:
ffff9cfefe725d90
[ 74.806385] R13:
ffff9cfefe725d98 R14:
0000000000000000 R15:
ffff9cfe833a4d00
[ 74.807301] FS:
0000000000000000(0000) GS:
ffff9d01afb00000(0000) knlGS:
0000000000000000
[ 74.808338] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 74.809084] CR2:
00007f2b81bf4000 CR3:
0000000100056000 CR4:
00000000000006e0
[ 74.810047] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 74.810981] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 74.811897] Call Trace:
[ 74.812241] <TASK>
[ 74.812566] __jbd2_journal_refile_buffer+0x12f/0x180
[ 74.813246] jbd2_journal_refile_buffer+0x4c/0xa0
[ 74.813869] jbd2_journal_commit_transaction.cold+0xa1/0x148
[ 74.817550] kjournald2+0xf8/0x3e0
[ 74.819056] kthread+0x153/0x1c0
[ 74.819963] ret_from_fork+0x22/0x30
Above issue may happen as follows:
write truncate kjournald2
generic_perform_write
ext4_write_begin
ext4_walk_page_buffers
do_journal_get_write_access ->add BJ_Reserved list
ext4_journalled_write_end
ext4_walk_page_buffers
write_end_fn
ext4_handle_dirty_metadata
***************JBD2 ABORT**************
jbd2_journal_dirty_metadata
-> return -EROFS, jh in reserved_list
jbd2_journal_commit_transaction
while (commit_transaction->t_reserved_list)
jh = commit_transaction->t_reserved_list;
truncate_pagecache_range
do_invalidatepage
ext4_journalled_invalidatepage
jbd2_journal_invalidatepage
journal_unmap_buffer
__dispose_buffer
__jbd2_journal_unfile_buffer
jbd2_journal_put_journal_head ->put last ref_count
__journal_remove_journal_head
bh->b_private = NULL;
jh->b_bh = NULL;
jbd2_journal_refile_buffer(journal, jh);
bh = jh2bh(jh);
->bh is NULL, later will trigger null-ptr-deref
journal_free_journal_head(jh);
After commit
2a629849c6e3, we no longer hold the j_state_lock while
iterating over the list of reserved handles in
jbd2_journal_commit_transaction(). This potentially allows the
journal_head to be freed by journal_unmap_buffer while the commit
codepath is also trying to free the BJ_Reserved buffers. Keeping
j_state_lock held while trying extends hold time of the lock
minimally, and solves this issue.
Fixes: 2a629849c6e3("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>