]> git.baikalelectronics.ru Git - kernel.git/commitdiff
ext4: fix bug_on ext4_mb_use_inode_pa
authorBaokun Li <libaokun1@huawei.com>
Sat, 28 May 2022 11:00:15 +0000 (19:00 +0800)
committerTheodore Ts'o <tytso@mit.edu>
Sat, 18 Jun 2022 23:35:43 +0000 (19:35 -0400)
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
 ext4_mb_new_blocks+0x9df/0x5d30
 ext4_ext_map_blocks+0x1803/0x4d80
 ext4_map_blocks+0x3a4/0x1a10
 ext4_writepages+0x126d/0x2c30
 do_writepages+0x7f/0x1b0
 __filemap_fdatawrite_range+0x285/0x3b0
 file_write_and_wait_range+0xb1/0x140
 ext4_sync_file+0x1aa/0xca0
 vfs_fsync_range+0xfb/0x260
 do_fsync+0x48/0xa0
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
do_fsync
 vfs_fsync_range
  ext4_sync_file
   file_write_and_wait_range
    __filemap_fdatawrite_range
     do_writepages
      ext4_writepages
       mpage_map_and_submit_extent
        mpage_map_one_extent
         ext4_map_blocks
          ext4_mb_new_blocks
           ext4_mb_normalize_request
            >>> start + size <= ac->ac_o_ex.fe_logical
           ext4_mb_regular_allocator
            ext4_mb_simple_scan_group
             ext4_mb_use_best_found
              ext4_mb_new_preallocation
               ext4_mb_new_inode_pa
                ext4_mb_use_inode_pa
                 >>> set ac->ac_b_ex.fe_len <= 0
           ext4_mb_mark_diskspace_used
            >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);

we can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`

The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.

Cc: stable@kernel.org
Fixes: 8d80954c43e8 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
fs/ext4/mballoc.c

index 9f12f29bc346842cdc0e8c5974608d7cfc752480..4d3740fdff90f558fbb3aff096a5efc70ec8e209 100644 (file)
@@ -4104,6 +4104,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
        size = size >> bsbits;
        start = start_off >> bsbits;
 
+       /*
+        * For tiny groups (smaller than 8MB) the chosen allocation
+        * alignment may be larger than group size. Make sure the
+        * alignment does not move allocation to a different group which
+        * makes mballoc fail assertions later.
+        */
+       start = max(start, rounddown(ac->ac_o_ex.fe_logical,
+                       (ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+
        /* don't cover already allocated blocks in selected range */
        if (ar->pleft && start <= ar->lleft) {
                size -= ar->lleft + 1 - start;