Implement support for 1 & 2 byte cmpxchg() using read-modify-write atop
a 4 byte cmpxchg(). This allows us to support these atomic operations
despite the MIPS ISA only providing 4 & 8 byte atomic operations.
This is required in order to support queued rwlocks (qrwlock) in a later
patch, since these make use of a 1 byte cmpxchg() in their slow path.