In case of devices with at most 64 blocks, the digestion of consecutive
eras uses the writeset of the first era as the writeset of all eras to
digest, leading to lost writes. That is, we lose the information about
what blocks were written during the affected eras.
The digestion code uses a dm_disk_bitset object to access the archived
writesets. This structure includes a one word (64-bit) cache to reduce
the number of array lookups.
This structure is initialized only once, in metadata_digest_start(),
when we kick off digestion.
But, when we insert a new writeset into the writeset tree, before the
digestion of the previous writeset is done, or equivalently when there
are multiple writesets in the writeset tree to digest, then all these
writesets are digested using the same cache and the cache is not
re-initialized when moving from one writeset to the next.
For devices with more than 64 blocks, i.e., the size of the cache, the
cache is indirectly invalidated when we move to a next set of blocks, so
we avoid the bug.
But for devices with at most 64 blocks we end up using the same cached
data for digesting all archived writesets, i.e., the cache is loaded
when digesting the first writeset and it never gets reloaded, until the
digestion is done.
As a result, the writeset of the first era to digest is used as the
writeset of all the following archived eras, leading to lost writes.
Fix this by reinitializing the dm_disk_bitset structure, and thus
invalidating the cache, every time the digestion code starts digesting a
new writeset.
Fixes: b83b01ee643a86 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
ws_unpack(&disk, &d->writeset);
d->value = cpu_to_le32(key);
+ /*
+ * We initialise another bitset info to avoid any caching side effects
+ * with the previous one.
+ */
+ dm_disk_bitset_init(md->tm, &d->info);
+
d->nr_bits = min(d->writeset.nr_bits, md->nr_blocks);
d->current_bit = 0;
d->step = metadata_digest_transcribe_writeset;
return 0;
memset(d, 0, sizeof(*d));
-
- /*
- * We initialise another bitset info to avoid any caching side
- * effects with the previous one.
- */
- dm_disk_bitset_init(md->tm, &d->info);
d->step = metadata_digest_lookup_writeset;
return 0;