Laura Abbott [Tue, 28 Feb 2017 22:07:25 +0000 (14:07 -0800)]
crypto: testmgr - Pad aes_ccm_enc_tv_template vector
Running with KASAN and crypto tests currently gives
BUG: KASAN: global-out-of-bounds in __test_aead+0x9d9/0x2200 at addr ffffffff8212fca0
Read of size 16 by task cryptomgr_test/1107
Address belongs to variable 0xffffffff8212fca0
CPU: 0 PID: 1107 Comm: cryptomgr_test Not tainted 4.10.0+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Call Trace:
dump_stack+0x63/0x8a
kasan_report.part.1+0x4a7/0x4e0
? __test_aead+0x9d9/0x2200
? crypto_ccm_init_crypt+0x218/0x3c0 [ccm]
kasan_report+0x20/0x30
check_memory_region+0x13c/0x1a0
memcpy+0x23/0x50
__test_aead+0x9d9/0x2200
? kasan_unpoison_shadow+0x35/0x50
? alg_test_akcipher+0xf0/0xf0
? crypto_skcipher_init_tfm+0x2e3/0x310
? crypto_spawn_tfm2+0x37/0x60
? crypto_ccm_init_tfm+0xa9/0xd0 [ccm]
? crypto_aead_init_tfm+0x7b/0x90
? crypto_alloc_tfm+0xc4/0x190
test_aead+0x28/0xc0
alg_test_aead+0x54/0xd0
alg_test+0x1eb/0x3d0
? alg_find_test+0x90/0x90
? __sched_text_start+0x8/0x8
? __wake_up_common+0x70/0xb0
cryptomgr_test+0x4d/0x60
kthread+0x173/0x1c0
? crypto_acomp_scomp_free_ctx+0x60/0x60
? kthread_create_on_node+0xa0/0xa0
ret_from_fork+0x2c/0x40
Memory state around the buggy address: ffffffff8212fb80: 00 00 00 00 01 fa fa fa fa fa fa fa 00 00 00 00 ffffffff8212fc00: 00 01 fa fa fa fa fa fa 00 00 00 00 01 fa fa fa
>ffffffff8212fc80: fa fa fa fa 00 05 fa fa fa fa fa fa 00 00 00 00
^ ffffffff8212fd00: 01 fa fa fa fa fa fa fa 00 00 00 00 01 fa fa fa ffffffff8212fd80: fa fa fa fa 00 00 00 00 00 05 fa fa fa fa fa fa
This always happens on the same IV which is less than 16 bytes.
Per Ard,
"CCM IVs are 16 bytes, but due to the way they are constructed
internally, the final couple of bytes of input IV are dont-cares.
Apparently, we do read all 16 bytes, which triggers the KASAN errors."
Fix this by padding the IV with null bytes to be at least 16 bytes.
Cc: stable@vger.kernel.org Fixes: 0bc5a6c5c79a ("crypto: testmgr - Disable rfc4309 test and convert
test vectors") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Tue, 28 Feb 2017 14:36:57 +0000 (14:36 +0000)]
crypto: arm/crc32 - add build time test for CRC instruction support
The accelerated CRC32 module for ARM may use either the scalar CRC32
instructions, the NEON 64x64 to 128 bit polynomial multiplication
(vmull.p64) instruction, or both, depending on what the current CPU
supports.
However, this also requires support in binutils, and as it turns out,
versions of binutils exist that support the vmull.p64 instruction but
not the crc32 instructions.
So refactor the Makefile logic so that this module only gets built if
binutils has support for both.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Tue, 28 Feb 2017 14:36:56 +0000 (14:36 +0000)]
crypto: arm/crc32 - fix build error with outdated binutils
Annotate a vmov instruction with an explicit element size of 32 bits.
This is inferred by recent toolchains, but apparently, older versions
need some help figuring this out.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Mon, 27 Feb 2017 15:30:56 +0000 (15:30 +0000)]
crypto: ccm - move cbcmac input off the stack
Commit f15f05b0a5de ("crypto: ccm - switch to separate cbcmac driver")
refactored the CCM driver to allow separate implementations of the
underlying MAC to be provided by a platform. However, in doing so, it
moved some data from the linear region to the stack, which violates the
SG constraints when the stack is virtually mapped.
So move idata/odata back to the request ctx struct, of which we can
reasonably expect that it has been allocated using kmalloc() et al.
Reported-by: Johannes Berg <johannes@sipsolutions.net> Fixes: f15f05b0a5de ("crypto: ccm - switch to separate cbcmac driver") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
George Cherian [Wed, 15 Feb 2017 12:42:19 +0000 (12:42 +0000)]
crypto: cavium - Fix couple of static checker errors
Fix the following smatch errors
cptvf_reqmanager.c:333 do_post_process() warn: variable dereferenced
before check 'cptvf'
cptvf_main.c:825 cptvf_remove() error: we previously assumed 'cptvf'
could be null
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Milan Broz [Thu, 23 Feb 2017 07:38:26 +0000 (08:38 +0100)]
crypto: xts - Add ECB dependency
Since the
commit f1c131b45410a202eb45cc55980a7a9e4e4b4f40
crypto: xts - Convert to skcipher
the XTS mode is based on ECB, so the mode must select
ECB otherwise it can fail to initialize.
Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
pci_enable_msix has been long deprecated, but this driver adds a new
instance. Convert it to pci_alloc_irq_vectors and greatly simplify
the code, and make sure the prope code properly unwinds.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rob Rice [Tue, 14 Feb 2017 17:45:52 +0000 (12:45 -0500)]
crypto: brcm - Avoid double free in ahash_finup()
In Broadcom SPU driver, in case where incremental hash
is done in software in ahash_finup(), tmpbuf was freed
twice.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Tue, 14 Feb 2017 17:07:31 +0000 (18:07 +0100)]
crypto: cavium - fix Kconfig dependencies
The driver fails to build if MSI support is disabled:
In file included from /git/arm-soc/drivers/crypto/cavium/cpt/cptpf_main.c:18:0:
drivers/crypto/cavium/cpt/cptpf.h:57:20: error: array type has incomplete element type 'struct msix_entry'
struct msix_entry msix_entries[CPT_PF_MSIX_VECTORS];
^~~~~~~~~~~~
drivers/crypto/cavium/cpt/cptpf_main.c: In function 'cpt_enable_msix':
drivers/crypto/cavium/cpt/cptpf_main.c:344:8: error: implicit declaration of function 'pci_enable_msix';did you mean 'cpt_enable_msix'? [-Werror=implicit-function-declaration]
On the other hand, it doesn't seem to have any build dependency on ARCH_THUNDER,
so let's allow compile-testing to catch this kind of problem more easily.
The 64-bit dependency is needed for the use of readq/writeq.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Russell King [Mon, 13 Feb 2017 12:04:08 +0000 (12:04 +0000)]
hwrng: omap - update Kconfig help description
omap-rng also supports Marvell Armada 7k/8k SoCs, but no mention of this
is made in the help text, despite the dependency being added. Explicitly
mention these SoCs in the help description so people know that it covers
more than just TI SoCs.
Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 11 Feb 2017 19:25:22 +0000 (19:25 +0000)]
crypto: ccm - drop unnecessary minimum 32-bit alignment
The CCM driver forces 32-bit alignment even if the underlying ciphers
don't care about alignment. This is because crypto_xor() used to require
this, but since this is no longer the case, drop the hardcoded minimum
of 32 bits.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 11 Feb 2017 19:25:21 +0000 (19:25 +0000)]
crypto: ccm - honour alignmask of subordinate MAC cipher
The CCM driver was recently updated to defer the MAC part of the algorithm
to a dedicated crypto transform, and a template for instantiating such
transforms was added at the same time.
However, this new cbcmac template fails to take the alignmask of the
encapsulated cipher into account, which may result in buffer addresses
being passed down that are not sufficiently aligned.
So update the code to ensure that the digest buffer in the desc ctx
appears at a sufficiently aligned offset, and tweak the code so that all
calls to crypto_cipher_encrypt_one() operate on this buffer exclusively.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Fri, 10 Feb 2017 12:07:25 +0000 (14:07 +0200)]
crypto: caam - fix state buffer DMA (un)mapping
If we register the DMA API debug notification chain to
receive platform bus events:
dma_debug_add_bus(&platform_bus_type);
we start receiving warnings after a simple test like "modprobe caam_jr &&
modprobe caamhash && modprobe -r caamhash && modprobe -r caam_jr":
platform ffe301000.jr: DMA-API: device driver has pending DMA allocations while released from device [count=1938]
One of leaked entries details: [device address=0x0000000173fda090] [size=63 bytes] [mapped with DMA_TO_DEVICE] [mapped as single]
It turns out there are several issues with handling buf_dma (mapping of buffer
holding the previous chunk smaller than hash block size):
-detection of buf_dma mapping failure occurs too late, after a job descriptor
using that value has been submitted for execution
-dma mapping leak - unmapping is not performed in all places: for e.g.
in ahash_export or in most ahash_fin* callbacks (due to current back-to-back
implementation of buf_dma unmapping/mapping)
Fix these by:
-calling dma_mapping_error() on buf_dma right after the mapping and providing
an error code if needed
-unmapping buf_dma during the "job done" (ahash_done_*) callbacks
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Fri, 10 Feb 2017 12:07:23 +0000 (14:07 +0200)]
crypto: caam - fix error path for ctx_dma mapping failure
In case ctx_dma dma mapping fails, ahash_unmap_ctx() tries to
dma unmap an invalid address:
map_seq_out_ptr_ctx() / ctx_map_to_sec4_sg() -> goto unmap_ctx ->
-> ahash_unmap_ctx() -> dma unmap ctx_dma
There is also possible to reach ahash_unmap_ctx() with ctx_dma
uninitialzed or to try to unmap the same address twice.
Fix these by setting ctx_dma = 0 where needed:
-initialize ctx_dma in ahash_init()
-clear ctx_dma in case of mapping error (instead of holding
the error code returned by the dma map function)
-clear ctx_dma after each unmapping
Fixes: 32686d34f8fb6 ("crypto: caam - ensure that we clean up after an error") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Fri, 10 Feb 2017 12:07:22 +0000 (14:07 +0200)]
crypto: caam - fix DMA API leaks for multiple setkey() calls
setkey() callback may be invoked multiple times for the same tfm.
In this case, DMA API leaks are caused by shared descriptors
(and key for caamalg) being mapped several times and unmapped only once.
Fix this by performing mapping / unmapping only in crypto algorithm's
cra_init() / cra_exit() callbacks and sync_for_device in the setkey()
tfm callback.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Fri, 10 Feb 2017 12:07:21 +0000 (14:07 +0200)]
crypto: caam - don't dma_map key for hash algorithms
Shared descriptors for hash algorithms are small enough
for (split) keys to be inlined in all cases.
Since driver already does this, all what's left is to remove
unused ctx->key_dma.
Horia Geantă [Fri, 10 Feb 2017 12:07:19 +0000 (14:07 +0200)]
crypto: caam - replace sg_count() with sg_nents_for_len()
Replace internal sg_count() function and the convoluted logic
around it with the standard sg_nents_for_len() function.
src_nents, dst_nents now hold the number of SW S/G entries,
instead of the HW S/G table entries.
With this change, null (zero length) input data for AEAD case
needs to be handled in a visible way. req->src is no longer
(un)mapped, pointer address is set to 0 in SEQ IN PTR command.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Horia Geantă [Fri, 10 Feb 2017 12:07:17 +0000 (14:07 +0200)]
crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc()
HW S/G generation does not work properly when the following conditions
are met:
-src == dst
-src/dst is S/G
-IV is right before (contiguous with) the first src/dst S/G entry
since "iv_contig" is set to true (iv_contig is a misnomer here and
it actually refers to the whole output being contiguous)
Fix this by setting dst S/G nents equal to src S/G nents, instead of
leaving it set to init value (0).
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 9 Feb 2017 16:51:21 +0000 (17:51 +0100)]
crypto: atmel-sha - fix error management in atmel_sha_start()
This patch clarifies and fixes how errors should be handled by
atmel_sha_start().
For update operations, the previous code wrongly assumed that
(err != -EINPROGRESS) implies (err == 0). It's wrong because that doesn't
take the error cases (err < 0) into account.
This patch also adds many comments to detail all the possible returned
values and what should be done in each case.
Especially, when an error occurs, since atmel_sha_complete() has already
been called, hence releasing the hardware, atmel_sha_start() must not call
atmel_sha_finish_req() later otherwise atmel_sha_complete() would be
called a second time.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch fixes a previous patch: "crypto: atmel-sha - update request
queue management to make it more generic".
Indeed the patch above should have replaced the "return -EINVAL;" lines by
"return atmel_sha_complete(dd, -EINVAL);" but instead replaced them by a
simple call of "atmel_sha_complete(dd, -EINVAL);".
Hence all "return" instructions were missing.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rob Rice [Fri, 3 Feb 2017 17:55:33 +0000 (12:55 -0500)]
crypto: brcm - Add Broadcom SPU driver
Add Broadcom Secure Processing Unit (SPU) crypto driver for SPU
hardware crypto offload. The driver supports ablkcipher, ahash,
and aead symmetric crypto operations.
Signed-off-by: Steve Lin <steven.lin1@broadcom.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rob Rice [Fri, 3 Feb 2017 17:55:32 +0000 (12:55 -0500)]
crypto: brcm - DT documentation for Broadcom SPU hardware
Device tree documentation for Broadcom Secure Processing Unit
(SPU) crypto hardware.
Signed-off-by: Steve Lin <steven.lin1@broadcom.com> Signed-off-by: Rob Rice <rob.rice@broadcom.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
George Cherian [Tue, 7 Feb 2017 14:51:15 +0000 (14:51 +0000)]
crypto: cavium - Enable CPT options crypto for build
Add the CPT options in crypto Kconfig and update the
crypto Makefile
Update the MAINTAINERS file too.
Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
George Cherian [Tue, 7 Feb 2017 14:51:14 +0000 (14:51 +0000)]
crypto: cavium - Add the Virtual Function driver for CPT
Enable the CPT VF driver. CPT is the cryptographic Acceleration Unit
in Octeon-tx series of processors.
Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
George Cherian [Tue, 7 Feb 2017 14:51:13 +0000 (14:51 +0000)]
crypto: cavium - Add Support for Octeon-tx CPT Engine
Enable the Physical Function driver for the Cavium Crypto Engine (CPT)
found in Octeon-tx series of SoC's. CPT is the Cryptographic Accelaration
Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
asymmetric engines (AEs).
Signed-off-by: George Cherian <george.cherian@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
David Daney [Mon, 6 Feb 2017 22:28:46 +0000 (14:28 -0800)]
hwrng: cavium - Use per device name to allow for multiple devices.
Systems containing the Cavium HW RNG may have one device per NUMA
node. A typical configuration is a 2-node NUMA system, which results
in 2 RNG devices. The hwrng subsystem refuses (and rightly so) to
register more than one device with he same name, so we get failure
messages on these systems.
Make the hwrng name unique by including the underlying device name.
Also remove spaces from the name to make it possible to switch devices
via the sysfs knobs.
Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Mon, 6 Feb 2017 12:32:16 +0000 (13:32 +0100)]
crypto: atmel - fix 64-bit build warnings
When we enable COMPILE_TEST building for the Atmel sha and tdes implementations,
we run into a couple of warnings about incorrect format strings, e.g.
In file included from include/linux/platform_device.h:14:0,
from drivers/crypto/atmel-sha.c:24:
drivers/crypto/atmel-sha.c: In function 'atmel_sha_xmit_cpu':
drivers/crypto/atmel-sha.c:571:19: error: format '%d' expects argument of type 'int', but argument 6 has type 'size_t {aka long unsigned int}' [-Werror=format=]
In file included from include/linux/printk.h:6:0,
from include/linux/kernel.h:13,
from drivers/crypto/atmel-tdes.c:17:
drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_dma_stop':
include/linux/kern_levels.h:4:18: error: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'size_t {aka long unsigned int}' [-Werror=format=]
These are all fixed by using the "%z" modifier for size_t data.
There are also a few uses of min()/max() with incompatible types:
drivers/crypto/atmel-tdes.c: In function 'atmel_tdes_crypt_start':
drivers/crypto/atmel-tdes.c:528:181: error: comparison of distinct pointer types lacks a cast [-Werror]
Where possible, we should use consistent types here, otherwise we can use
min_t()/max_t() to get well-defined behavior without a warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Mon, 6 Feb 2017 12:32:15 +0000 (13:32 +0100)]
crypto: atmel - refine Kconfig dependencies
With the new authenc support, we get a harmless Kconfig warning:
warning: (CRYPTO_DEV_ATMEL_AUTHENC) selects CRYPTO_DEV_ATMEL_SHA which has unmet direct dependencies (CRYPTO && CRYPTO_HW && ARCH_AT91)
The problem is that each of the options has slightly different dependencies,
although they all seem to want the same thing: allow building for real AT91
targets that actually have the hardware, and possibly for compile testing.
This makes all four options consistent: instead of depending on a particular
dmaengine implementation, we depend on the ARM platform, CONFIG_COMPILE_TEST
as an alternative when that is turned off. This makes the 'select' statements
work correctly.
Fixes: 89a82ef87e01 ("crypto: atmel-authenc - add support to authenc(hmac(shaX), Y(aes)) modes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sun, 5 Feb 2017 10:06:12 +0000 (10:06 +0000)]
crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
Instead of unconditionally forcing 4 byte alignment for all generic
chaining modes that rely on crypto_xor() or crypto_inc() (which may
result in unnecessary copying of data when the underlying hardware
can perform unaligned accesses efficiently), make those functions
deal with unaligned input explicitly, but only if the Kconfig symbol
HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.
For crypto_inc(), this simply involves making the 4-byte stride
conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
it typically operates on 16 byte buffers.
For crypto_xor(), an algorithm is implemented that simply runs through
the input using the largest strides possible if unaligned accesses are
allowed. If they are not, an optimal sequence of memory accesses is
emitted that takes the relative alignment of the input buffers into
account, e.g., if the relative misalignment of dst and src is 4 bytes,
the entire xor operation will be completed using 4 byte loads and stores
(modulo unaligned bits at the start and end). Note that all expressions
involving misalign are simply eliminated by the compiler when
HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Arnd Bergmann [Fri, 3 Feb 2017 22:33:23 +0000 (23:33 +0100)]
crypto: improve gcc optimization flags for serpent and wp512
An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:
crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.
It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).
Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.
The four columns are:
default: -O2
press: -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)
Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.
I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.
On ARMv8 implementations that do not support the Crypto Extensions,
such as the Raspberry Pi 3, the CCM driver falls back to the generic
table based AES implementation to perform the MAC part of the
algorithm, which is slow and not time invariant. So add a CBCMAC
implementation to the shared glue code between NEON AES and Crypto
Extensions AES, so that it can be used instead now that the CCM
driver has been updated to look for CBCMAC implementations other
than the one it supplies itself.
Also, given how these algorithms mostly only differ in the way the key
handling and the final encryption are implemented, expose CMAC and XCBC
algorithms as well based on the same core update code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Fri, 3 Feb 2017 14:49:36 +0000 (14:49 +0000)]
crypto: ccm - switch to separate cbcmac driver
Update the generic CCM driver to defer CBC-MAC processing to a
dedicated CBC-MAC ahash transform rather than open coding this
transform (and much of the associated scatterwalk plumbing) in
the CCM driver itself.
This cleans up the code considerably, but more importantly, it allows
the use of alternative CBC-MAC implementations that don't suffer from
performance degradation due to significant setup time (e.g., the NEON
based AES code needs to enable/disable the NEON, and load the S-box
into 16 SIMD registers, which cannot be amortized over the entire input
when using the cipher interface)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Fri, 3 Feb 2017 14:49:35 +0000 (14:49 +0000)]
crypto: testmgr - add test cases for cbcmac(aes)
In preparation of splitting off the CBC-MAC transform in the CCM
driver into a separate algorithm, define some test cases for the
AES incarnation of cbcmac.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Thu, 2 Feb 2017 16:37:40 +0000 (16:37 +0000)]
crypto: aes - add generic time invariant AES cipher
Lookup table based AES is sensitive to timing attacks, which is due to
the fact that such table lookups are data dependent, and the fact that
8 KB worth of tables covers a significant number of cachelines on any
architecture, resulting in an exploitable correlation between the key
and the processing time for known plaintexts.
For network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.
For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by switching
to an implementation that only needs 256 bytes of table data (the actual
S-box itself), which can be prefetched at the start of each block to
eliminate data dependent latencies.
This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware). Decryption is substantially slower.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Thu, 2 Feb 2017 15:58:57 +0000 (15:58 +0000)]
crypto: aes-generic - drop alignment requirement
The generic AES code exposes a 32-bit align mask, which forces all
users of the code to use temporary buffers or take other measures to
ensure the alignment requirement is adhered to, even on architectures
that don't care about alignment for software algorithms such as this
one.
So drop the align mask, and fix the code to use get_unaligned_le32()
where appropriate, which will resolve to whatever is optimal for the
architecture.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The flusher and regular multi-buffer computation via mcryptd may race with another.
Add here a lock and turn off interrupt to to access multi-buffer
computation state cstate->mgr before a round of computation. This should
prevent the flusher code jumping in.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Wed, 1 Feb 2017 15:35:40 +0000 (15:35 +0000)]
crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers
The PMULL based CRC32 implementation already contains code based on the
separate, optional CRC32 instructions to fallback to when operating on
small quantities of data. We can expose these routines directly on systems
that lack the 64x64 PMULL instructions but do implement the CRC32 ones,
which makes the driver that is based solely on those CRC32 instructions
redundant. So remove it.
Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
also combines the CRC32 extension based and the PMULL based versions.
Ard Biesheuvel [Thu, 2 Feb 2017 11:38:56 +0000 (11:38 +0000)]
crypto: arm/aes - don't use IV buffer to return final keystream block
The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.
However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.
So use a separate buffer to return the final keystream block.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Thu, 2 Feb 2017 11:38:55 +0000 (11:38 +0000)]
crypto: arm64/aes - don't use IV buffer to return final keystream block
The arm64 bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.
However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.
So use a separate buffer to return the final keystream block.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:39 +0000 (23:25 +0000)]
crypto: arm64/aes - replace scalar fallback with plain NEON fallback
The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.
The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low end cores (which is what
the NEON implementations target, since high end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:38 +0000 (23:25 +0000)]
crypto: arm64/aes-neon-blk - tweak performance for low end cores
The non-bitsliced AES implementation using the NEON is highly sensitive
to micro-architectural details, and, as it turns out, the Cortex-A53 on
the Raspberry Pi 3 is a core that can benefit from this code, given that
its scalar AES performance is abysmal (32.9 cycles per byte).
The new bitsliced AES code manages 19.8 cycles per byte on this core,
but can only operate on 8 blocks at a time, which is not supported by
all chaining modes. With a bit of tweaking, we can get the plain NEON
code to run at 22.0 cycles per byte, making it useful for sequential
modes like CBC encryption. (Like bitsliced NEON, the plain NEON
implementation does not use any lookup tables, which makes it easy on
the D-cache, and invulnerable to cache timing attacks)
So tweak the plain NEON AES code to use tbl instructions rather than
shl/sri pairs, and to avoid the need to reload permutation vectors or
other constants from memory in every round. Also, improve the decryption
performance by switching to 16x8 pmul instructions for the performing
the multiplications in GF(2^8).
To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:36 +0000 (23:25 +0000)]
crypto: arm64/aes - avoid literals for cross-module symbol references
Using simple adrp/add pairs to refer to the AES lookup tables exposed by
the generic AES driver (which could be loaded far away from this driver
when KASLR is in effect) was unreliable at module load time before commit 41c066f2c4d4 ("arm64: assembler: make adr_l work in modules under KASLR"),
which is why the AES code used literals instead.
So now we can get rid of the literals, and switch to the adr_l macro.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:35 +0000 (23:25 +0000)]
crypto: arm64/chacha20 - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:34 +0000 (23:25 +0000)]
crypto: arm64/aes-blk - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:33 +0000 (23:25 +0000)]
crypto: arm64/aes-ce-ccm - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:32 +0000 (23:25 +0000)]
crypto: arm/chacha20 - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Sat, 28 Jan 2017 23:25:31 +0000 (23:25 +0000)]
crypto: arm/aes-ce - remove cra_alignmask
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When VERBOSE_DEBUG is defined and SHA_FLAGS_DUMP_REG flag is set in
dd->flags, this patch prints the register names and values when performing
IO accesses.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:56 +0000 (17:07 +0100)]
crypto: atmel-authenc - add support to authenc(hmac(shaX), Y(aes)) modes
This patchs allows to combine the AES and SHA hardware accelerators on
some Atmel SoCs. Doing so, AES blocks are only written to/read from the
AES hardware. Those blocks are also transferred from the AES to the SHA
accelerator internally, without additionnal accesses to the system busses.
Hence, the AES and SHA accelerators work in parallel to process all the
data blocks, instead of serializing the process by (de)crypting those
blocks first then authenticating them after like the generic
crypto/authenc.c driver does.
Of course, both the AES and SHA hardware accelerators need to be available
before we can start to process the data blocks. Hence we use their crypto
request queue to synchronize both drivers.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:55 +0000 (17:07 +0100)]
crypto: atmel-aes - fix atmel_aes_handle_queue()
This patch fixes the value returned by atmel_aes_handle_queue(), which
could have been wrong previously when the crypto request was started
synchronously but became asynchronous during the ctx->start() call.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch simply defines a helper function to test the 'Data Ready' flag
of the Status Register. It also gives a chance for the crypto request to
be processed synchronously if this 'Data Ready' flag is already set when
polling the Status Register. Indeed, running synchronously avoid the
latency of the 'Data Ready' interrupt.
When the 'Data Ready' flag has not been set yet, we enable the associated
interrupt and resume processing the crypto request asynchronously from the
'done' task just as before.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:49 +0000 (17:07 +0100)]
crypto: atmel-sha - redefine SHA_FLAGS_SHA* flags to match SHA_MR_ALGO_SHA*
This patch modifies the SHA_FLAGS_SHA* flags: those algo flags are now
organized as values of a single bitfield instead of individual bits.
This allows to reduce the number of bits needed to encode all possible
values. Also the new values match the SHA_MR_ALGO_SHA* values hence
the algorithm bitfield of the SHA_MR register could simply be set with:
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:48 +0000 (17:07 +0100)]
crypto: atmel-sha - make atmel_sha_done_task more generic
This patch is a transitional patch. It updates atmel_sha_done_task() to
make it more generic. Indeed, it adds a new .resume() member in the
atmel_sha_dev structure. This hook is called from atmel_sha_done_task()
to resume processing an asynchronous request.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:47 +0000 (17:07 +0100)]
crypto: atmel-sha - update request queue management to make it more generic
This patch is a transitional patch. It splits the atmel_sha_handle_queue()
function. Now atmel_sha_handle_queue() only manages the request queue and
calls a new .start() hook from the atmel_sha_ctx structure.
This hook allows to implement different kind of requests still handled by
a single queue.
Also when the req parameter of atmel_sha_handle_queue() refers to the very
same request as the one returned by crypto_dequeue_request(), the queue
management now gives a chance to this crypto request to be handled
synchronously, hence reducing latencies. The .start() hook returns 0 if
the crypto request was handled synchronously and -EINPROGRESS if the
crypto request still need to be handled asynchronously.
Besides, the new .is_async member of the atmel_sha_dev structure helps
tagging this asynchronous state. Indeed, the req->base.complete() callback
should not be called if the crypto request is handled synchronously.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cyrille Pitchen [Thu, 26 Jan 2017 16:07:46 +0000 (17:07 +0100)]
crypto: atmel-sha - create function to get an Atmel SHA device
This is a transitional patch: it creates the atmel_sha_find_dev() function,
which will be used in further patches to share the source code responsible
for finding a Atmel SHA device.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rabin Vincent [Thu, 26 Jan 2017 15:33:00 +0000 (16:33 +0100)]
crypto: doc - Fix hash export state information
The documentation states that crypto_ahash_reqsize() provides the size
of the state structure used by crypto_ahash_export(). But it's actually
crypto_ahash_statesize() which provides this size.
Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Wed, 1 Feb 2017 14:17:39 +0000 (22:17 +0800)]
crypto: aesni - Fix failure when pcbc module is absent
When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load. However, the pcbc module may not
be present for reasons such as its absence on initramfs. This patch
allows the aesni to function even if the pcbc module is enabled but
not present.
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Gary R Hook [Fri, 27 Jan 2017 23:09:04 +0000 (17:09 -0600)]
crypto: ccp - Fix double add when creating new DMA command
Eliminate a double-add by creating a new list to manage
command descriptors when created; move the descriptor to
the pending list when the command is submitted.
Cc: <stable@vger.kernel.org> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Gary R Hook [Fri, 27 Jan 2017 21:28:45 +0000 (15:28 -0600)]
crypto: ccp - Fix DMA operations when IOMMU is enabled
An I/O page fault occurs when the IOMMU is enabled on a
system that supports the v5 CCP. DMA operations use a
Request ID value that does not match what is expected by
the IOMMU, resulting in the I/O page fault. Setting the
Request ID value to 0 corrects this issue.
Cc: <stable@vger.kernel.org> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Li Zhong [Fri, 20 Jan 2017 08:35:33 +0000 (16:35 +0800)]
crypto: vmx - disable preemption to enable vsx in aes_ctr.c
Some preemptible check warnings were reported from enable_kernel_vsx(). This
patch disables preemption in aes_ctr.c before enabling vsx, and they are now
consistent with other files in the same directory.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>