Apply some optimizations to speed up bitstream loading
(both for full and split periph/core bitstreams):
* Change the size of the first fs read, so that all the subsequent
reads are aligned to a specific value (called MAX_FIRST_LOAD_SIZE).
This value was chosen so that in subsequent reads the fat fs driver
doesn't have to allocate a temporary buffer in get_contents
(assuming 8KiB clusters).
* Change the buffer size to a larger value when reading to ddr
(but not too large, because large transfers cause a stack overflow
in the dwmmc driver).
Signed-off-by: Paweł Anikiel <pan@semihalf.com> Reviewed-by: Simon Glass <sjg@chromium.org>