Implement 1 & 2 byte xchg() using read-modify-write atop a 4 byte
cmpxchg(). This allows us to support these atomic operations despite the
MIPS ISA only providing for 4 & 8 byte atomic operations.
This is required in order to support queued spinlocks (qspinlock) in a
later patch, since these make use of a 2 byte xchg() in their slow path.