Arnd Bergmann [Tue, 2 Oct 2018 20:56:19 +0000 (22:56 +0200)]
crypto: caam/qi2 - avoid double export
Both the caam ctrl file and dpaa2_caam export a couple of flags. They
use an #ifdef check to make sure that each flag is only built once,
but this fails if they are both loadable modules:
WARNING: drivers/crypto/caam/dpaa2_caam: 'caam_little_end' exported twice. Previous export was in drivers/crypto/caam/caam.ko
WARNING: drivers/crypto/caam/dpaa2_caam: 'caam_imx' exported twice. Previous export was in drivers/crypto/caam/caam.ko
Change the #ifdef to an IS_ENABLED() check in order to make it work in
all configurations. It may be better to redesign this aspect of the
two drivers in a cleaner way.
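A minimal sketch of the pattern, using the caam config symbol (treat the layout as illustrative, not the actual diff):

#include <linux/export.h>
#include <linux/kconfig.h>
#include <linux/types.h>

/*
 * Sketch: the ctrl driver normally owns these flags.  A plain #ifdef
 * on CONFIG_CRYPTO_DEV_FSL_CAAM is false when that driver is =m, so a
 * modular dpaa2_caam defined and exported the symbols a second time.
 * IS_ENABLED() is true for both =y and =m, closing that hole.
 */
#if !IS_ENABLED(CONFIG_CRYPTO_DEV_FSL_CAAM)
bool caam_little_end;
EXPORT_SYMBOL(caam_little_end);

bool caam_imx;
EXPORT_SYMBOL(caam_imx);
#endif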
Radu Solea [Tue, 2 Oct 2018 19:01:52 +0000 (19:01 +0000)]
crypto: mxs-dcp - Fix AES issues
The DCP driver does not obey cryptlen; when running the Android CTS, this
results in input stream lengths that are not a multiple of the block size
being passed to the hardware.
Add a check to prevent future erroneous stream lengths from reaching the
hardware and adjust the scatterlist walking code to obey cryptlen.
Also properly copy out the IV for chaining.
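A sketch of the kind of length guard this describes (helper name hypothetical):

#include <linux/crypto.h>
#include <crypto/aes.h>

/* DCP AES only processes whole blocks; reject misaligned stream
 * lengths before they reach the hardware. */
static int mxs_dcp_aes_check_len(struct ablkcipher_request *req)
{
	return IS_ALIGNED(req->nbytes, AES_BLOCK_SIZE) ? 0 : -EINVAL;
}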
Signed-off-by: Radu Solea <radu.solea@nxp.com> Signed-off-by: Franck LENORMAND <franck.lenormand@nxp.com> Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Radu Solea [Tue, 2 Oct 2018 19:01:50 +0000 (19:01 +0000)]
crypto: mxs-dcp - Fix SHA null hashes and output length
DCP writes at least 32 bytes to the output buffer instead of the hash length
as documented. Add an intermediate buffer to prevent out-of-bounds writes.
When requested to produce null hashes, DCP fails to produce valid output.
Add a software workaround that bypasses the hardware and returns valid output.
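A sketch of both workarounds (names and sizes are assumptions, not the driver's actual code):

#include <linux/string.h>
#include <linux/types.h>

/* Let the engine write into a worst-case scratch buffer, then copy
 * out only the real digest length. */
#define DCP_SHA_PAY_SZ	32

static void dcp_copy_digest(u8 *dst, const u8 *scratch, size_t digest_len)
{
	memcpy(dst, scratch, digest_len);	/* never DCP_SHA_PAY_SZ */
}

/* For zero-length input, bypass the engine and return the well-known
 * empty-message digest (SHA-256 shown; the byte order may need to
 * match what the engine would have produced). */
static const u8 sha256_null_hash[32] = {
	0xe3, 0xb0, 0xc4, 0x42, 0x98, 0xfc, 0x1c, 0x14,
	0x9a, 0xfb, 0xf4, 0xc8, 0x99, 0x6f, 0xb9, 0x24,
	0x27, 0xae, 0x41, 0xe4, 0x64, 0x9b, 0x93, 0x4c,
	0xa4, 0x95, 0x99, 0x1b, 0x78, 0x52, 0xb8, 0x55,
};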
Signed-off-by: Radu Solea <radu.solea@nxp.com> Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dan Douglass [Tue, 2 Oct 2018 19:01:48 +0000 (19:01 +0000)]
crypto: mxs-dcp - Implement sha import/export
The mxs-dcp driver fails to probe if sha1/sha256 are supported:
[ 2.455404] mxs-dcp 80028000.dcp: Failed to register sha1 hash!
[ 2.464042] mxs-dcp: probe of 80028000.dcp failed with error -22
This happens because, since commit 8996eafdcbad ("crypto: ahash - ensure
statesize is non-zero"), import/export is mandatory and ahash_prepare_alg
fails on statesize == 0.
A set of dummy import/export functions was implemented in commit 9190b6fd5db9 ("crypto: mxs-dcp - Add empty hash export and import"), but
statesize is still zero and the driver fails to probe. That change was
apparently part of some unrelated refactoring.
Fix by actually implementing import/export.
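A sketch of what implementing import/export amounts to (state struct hypothetical): snapshot the per-request state on export, restore it on import, and set .statesize to the snapshot size so ahash_prepare_alg() accepts the algorithm.

#include <crypto/internal/hash.h>
#include <linux/string.h>
#include <linux/types.h>

struct dcp_export_state {
	u32 digest[8];		/* running digest */
	u32 len;		/* bytes hashed so far */
};

static int dcp_sha_export(struct ahash_request *req, void *out)
{
	memcpy(out, ahash_request_ctx(req), sizeof(struct dcp_export_state));
	return 0;
}

static int dcp_sha_import(struct ahash_request *req, const void *in)
{
	memcpy(ahash_request_ctx(req), in, sizeof(struct dcp_export_state));
	return 0;
}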
Signed-off-by: Dan Douglass <dan.douglass@nxp.com> Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Mon, 1 Oct 2018 08:36:38 +0000 (10:36 +0200)]
crypto: aegis/generic - fix for big endian systems
Use the correct __le32 annotation and accessors to perform the
single round of AES encryption performed inside the AEGIS transform.
Otherwise, tcrypt reports:
alg: aead: Test 1 failed on encryption for aegis128-generic 00000000: 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e
alg: aead: Test 1 failed on encryption for aegis128l-generic 00000000: cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28
alg: aead: Test 1 failed on encryption for aegis256-generic 00000000: aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c
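The shape of the fix, as a hedged sketch (helper names hypothetical): AEGIS defines its state blocks in little-endian byte order, so the embedded AES round must go through explicit LE accessors instead of plain u32 loads and stores, which only happen to work on little-endian hosts.

#include <asm/unaligned.h>
#include <linux/types.h>

static void aegis_block_to_words(u32 w[4], const u8 *blk)
{
	int i;

	for (i = 0; i < 4; i++)
		w[i] = get_unaligned_le32(blk + 4 * i);
}

static void aegis_words_to_block(u8 *blk, const u32 w[4])
{
	int i;

	for (i = 0; i < 4; i++)
		put_unaligned_le32(w[i], blk + 4 * i);
}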
Ard Biesheuvel [Mon, 1 Oct 2018 08:36:37 +0000 (10:36 +0200)]
crypto: morus/generic - fix for big endian systems
Omit the endian swabbing when folding the lengths of the assoc and
crypt input buffers into the state to finalize the tag. This is not
necessary given that the memory representation of the state is in
machine native endianness already.
This fixes an error reported by tcrypt running on a big endian system:
alg: aead: Test 2 failed on encryption for morus640-generic 00000000: a8 30 ef fb e6 26 eb 23 b0 87 dd 98 57 f3 e1 4b 00000010: 21
alg: aead: Test 2 failed on encryption for morus1280-generic 00000000: 88 19 1b fb 1c 29 49 0e ee 82 2f cb 97 a6 a5 ee 00000010: 5f
crypto: lrw - fix rebase error after out of bounds fix
Due to an unfortunate interaction between commit fbe1a850b3b1
("crypto: lrw - Fix out-of bounds access on counter overflow") and
commit c778f96bf347 ("crypto: lrw - Optimize tweak computation"),
we ended up with a version of next_index() that always returns 127.
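For reference, a sketch of the intended behaviour, close to the merged resolution: return the position of the lowest zero bit and only fall back to 127 when the whole 128-bit counter wraps.

#include <linux/bitops.h>
#include <linux/types.h>

static int next_index(u32 *counter)
{
	int i, res = 0;

	for (i = 0; i < 4; i++) {
		if (counter[i] + 1 != 0)
			return res + ffz(counter[i]++);

		counter[i] = 0;
		res += 32;
	}

	/* All 128 bits overflowed: the counter wrapped around. */
	return 127;
}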
Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: cavium/nitrox - use pci_alloc_irq_vectors() while enabling MSI-X.
Replace pci_enable_msix_exact() with pci_alloc_irq_vectors(), and get the
required vector count from pci_msix_vec_count().
Use struct nitrox_q_vector as the argument to tasklets.
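A sketch of the resulting setup path (function name hypothetical; the PCI APIs are real):

#include <linux/pci.h>

static int nitrox_enable_msix(struct pci_dev *pdev)
{
	int nr_vecs = pci_msix_vec_count(pdev);

	if (nr_vecs < 0)
		return nr_vecs;

	/* The PCI core now owns the msix_entry bookkeeping that
	 * pci_enable_msix_exact() pushed onto the driver. */
	return pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
}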
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Reviewed-by: Gadam Sreerama <sgadam@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use node-based allocations for the queues, and take the DMA address
alignment changes into account when calculating the queue base address.
Add checks to the cleanup functions, plus minor changes to queue variable names.
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Reviewed-by: Gadam Sreerama <sgadam@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto/morus(640,1280) - make crypto_...-algs static
sparse complains thusly:
CHECK arch/x86/crypto/morus640-sse2-glue.c
arch/x86/crypto/morus640-sse2-glue.c:38:1: warning: symbol 'crypto_morus640_sse2_algs' was not declared. Should it be static?
CHECK arch/x86/crypto/morus1280-sse2-glue.c
arch/x86/crypto/morus1280-sse2-glue.c:38:1: warning: symbol 'crypto_morus1280_sse2_algs' was not declared. Should it be static?
CHECK arch/x86/crypto/morus1280-avx2-glue.c
arch/x86/crypto/morus1280-avx2-glue.c:38:1: warning: symbol 'crypto_morus1280_avx2_algs' was not declared. Should it be static?
and sparse is correct: these don't need to be global and pollute the namespace.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This driver implements (part of) a network driver and fails to build if
networking support is turned off:
drivers/crypto/caam/caamalg_qi2.o: In function `dpaa2_caam_fqdan_cb':
caamalg_qi2.c:(.text+0x577c): undefined reference to `napi_schedule_prep'
caamalg_qi2.c:(.text+0x578c): undefined reference to `__napi_schedule_irqoff'
drivers/crypto/caam/caamalg_qi2.o: In function `dpaa2_dpseci_poll':
caamalg_qi2.c:(.text+0x59b8): undefined reference to `napi_complete_done'
drivers/crypto/caam/caamalg_qi2.o: In function `dpaa2_caam_remove':
caamalg_qi2.c:(.text.unlikely+0x4e0): undefined reference to `napi_disable'
caamalg_qi2.c:(.text.unlikely+0x4e8): undefined reference to `netif_napi_del'
drivers/crypto/caam/caamalg_qi2.o: In function `dpaa2_dpseci_setup':
caamalg_qi2.c:(.text.unlikely+0xc98): undefined reference to `netif_napi_add'
From what I can tell, CONFIG_NETDEVICES is the correct dependency here,
and adding it fixes the randconfig failures.
Arnd reports that with Kees's latest VLA patches applied, the HMAC
handling in the QAT driver uses a worst case estimate of 160 bytes
for the SHA blocksize, allowing the compiler to determine the size
of the stack frame at compile time and throw a warning:
drivers/crypto/qat/qat_common/qat_algs.c: In function 'qat_alg_do_precomputes':
drivers/crypto/qat/qat_common/qat_algs.c:257:1: error: the frame size
of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Given that this worst case estimate is only 32 bytes larger than the
actual block size of SHA-512, the use of a VLA here was hiding the
excessive size of the stack frame from the compiler, and so we should
try to move these buffers off the stack.
So move the ipad/opad buffers and the various SHA state descriptors
into the tfm context struct. Since qat_alg_do_precomputes() is only
called in the context of a setkey() operation, this should be safe.
Using SHA512_BLOCK_SIZE for the size of the ipad/opad buffers allows
them to be used by SHA-1/SHA-256 as well.
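Schematically (field names are illustrative, not the driver's):

#include <crypto/sha.h>
#include <linux/types.h>

/* SHA512_BLOCK_SIZE (128 bytes) is the worst case, so one pair of
 * buffers in the tfm context serves SHA-1 and SHA-256 as well and
 * takes the pads off the setkey() stack frame entirely. */
struct qat_auth_state {
	u8 ipad[SHA512_BLOCK_SIZE];
	u8 opad[SHA512_BLOCK_SIZE];
};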
crypto: ccp - Make function sev_get_firmware() static
Fixes the following sparse warning:
drivers/crypto/ccp/psp-dev.c:444:5: warning:
symbol 'sev_get_firmware' was not declared. Should it be static?
Fixes: e93720606efd ("crypto: ccp - Allow SEV firmware to be chosen based on Family and Model") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The hwrng core credits entropy as rc * current_quality * 8 >> 10, so the
actual definition of the quality field is "bits of entropy per 1024 bits of
input", not per mille as drivers tend to assume (e.g. quality = 512 credits
half of the input bits as entropy).
The current documentation seems to have confused multiple people
in the past; let's fix the documentation to match the code.
An alternative is to change core to match driver expectations, replacing
rc * current_quality * 8 >> 10
with
rc * current_quality / 1000
but that has performance costs, so probably isn't a good option.
Fixes: 0f734e6e768 ("hwrng: add per-device entropy derating") Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
drivers/crypto/ccp/sp-platform.c:36:36: warning: tentative array
definition assumed to have one element
static const struct acpi_device_id sp_acpi_match[];
^
1 warning generated.
Just remove the forward declarations and move the initializations up
so that they can be used in sp_get_of_version and sp_get_acpi_version.
Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: x86/aes-ni - remove special handling of AES in PCBC mode
For historical reasons, the AES-NI based implementation of the PCBC
chaining mode uses a special FPU chaining mode wrapper template to
amortize the FPU start/stop overhead over multiple blocks.
When this FPU wrapper was introduced, it supported widely used
chaining modes such as XTS and CTR (as well as LRW), but currently,
PCBC is the only remaining user.
Since there are no known users of pcbc(aes) in the kernel, let's remove
this special driver, and rely on the generic pcbc driver to encapsulate
the AES-NI core cipher.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a generic version of output feedback (OFB) mode. We already have support
for several hardware-based transformations of this mode and the needed test
vectors, but we somehow missed adding a generic software one. Fix this now.
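For readers unfamiliar with the mode, a sketch of one OFB step: the keystream is the iterated encryption of the IV, and encryption and decryption are the same XOR (the block cipher callback here is hypothetical).

#include <linux/types.h>

/* O_0 = IV;  O_i = E_K(O_{i-1});  C_i = P_i ^ O_i */
static void ofb_step(void (*encrypt_block)(u8 *blk), u8 *keystream,
		     u8 *dst, const u8 *src, unsigned int bsize)
{
	unsigned int i;

	encrypt_block(keystream);	/* O_i = E_K(O_{i-1}) */
	for (i = 0; i < bsize; i++)
		dst[i] = src[i] ^ keystream[i];
}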
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add additional test vectors from "The SM4 Blockcipher Algorithm And Its
Modes Of Operations" draft-ribose-cfrg-sm4-10 and register cipher speed
tests for sm4.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that all the users of the VLA-generating SKCIPHER_REQUEST_ON_STACK()
macro have been moved to SYNC_SKCIPHER_REQUEST_ON_STACK(), we can remove
the former.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In the quest to remove all stack VLA usage from the kernel[1], this
replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage
with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(),
which uses a fixed stack size.
In preparation for removal of VLAs due to skcipher requests on the stack
via SKCIPHER_REQUEST_ON_STACK() usage, this introduces the infrastructure
for the "sync skcipher" tfm, which is for handling the on-stack cases of
skcipher, which are always non-ASYNC and have a known limited request
size.
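A hedged sketch of a caller after conversion (algorithm and buffers illustrative):

#include <crypto/skcipher.h>
#include <linux/err.h>

static int example_encrypt(struct scatterlist *sg, unsigned int len,
			   u8 *iv, const u8 *key, unsigned int keylen)
{
	struct crypto_sync_skcipher *tfm;
	int err;

	tfm = crypto_alloc_sync_skcipher("cbc(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_sync_skcipher_setkey(tfm, key, keylen);
	if (!err) {
		/* Fixed-size request: no VLA behind this macro. */
		SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);

		skcipher_request_set_sync_tfm(req, tfm);
		skcipher_request_set_callback(req, 0, NULL, NULL);
		skcipher_request_set_crypt(req, sg, sg, len, iv);
		err = crypto_skcipher_encrypt(req);
		skcipher_request_zero(req);
	}

	crypto_free_sync_skcipher(tfm);
	return err;
}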
Dan Aloni [Mon, 17 Sep 2018 17:24:32 +0000 (20:24 +0300)]
crypto: fix a memory leak in rsa-pkcs1pad's encryption mode
The encryption mode of pkcs1pad never uses out_sg and out_buf, so
there's no need to allocate the buffer, which presently is not even
being freed.
CC: Herbert Xu <herbert@gondor.apana.org.au> CC: linux-crypto@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for the AES counter (CTR) block cipher mode of operation for
Exynos hardware. In contrast to the ECB and CBC modes, AES-CTR allows
encryption/decryption of request sizes that are not a multiple of 16 bytes.
The hardware requires block sizes that are a multiple of 16 bytes. In order
to achieve this, copy the request source and destination memory and align
its size up to 16. That way the hardware processes the additional bytes,
which are omitted when copying the result back to its original destination.
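A sketch of the bounce-buffer idea (helper hypothetical):

#include <crypto/aes.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Pad the request up to a block multiple for the engine; the caller
 * later copies back only the original 'len' bytes of the result. */
static u8 *s5p_ctr_bounce(const u8 *src, unsigned int len,
			  unsigned int *padded_len)
{
	unsigned int aligned = round_up(len, AES_BLOCK_SIZE);
	u8 *buf = kzalloc(aligned, GFP_KERNEL);

	if (buf)
		memcpy(buf, src, len);
	*padded_len = aligned;
	return buf;
}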
Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.19-rc2 with crypto
run-time self test testmgr.
Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Modifications in s5p-sss.c:
- remove unnecessary 'goto' statements (making code shorter),
- change uint_8 and uint_32 to u8 and u32 types (for consistency in the
driver and making code shorter),
Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Remove a race condition introduced by the error path in the functions
s5p_aes_interrupt and s5p_aes_crypt_start. Setting the busy field of
struct s5p_aes_dev to false made it possible for s5p_tasklet_cb to
change the req field before s5p_aes_complete was called.
Change the first parameter of s5p_aes_complete to struct
ablkcipher_request. Before spin_unlock, make a copy of the currently
handled request to ensure that s5p_aes_complete is called with the
correct request.
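The shape of the fix, sketched (struct trimmed to the relevant fields; s5p_aes_complete is provided elsewhere in the driver):

#include <linux/crypto.h>
#include <linux/spinlock.h>

struct s5p_aes_dev {
	spinlock_t lock;
	bool busy;
	struct ablkcipher_request *req;
	/* ... */
};

static void s5p_aes_complete(struct ablkcipher_request *req, int err);

static void s5p_complete_current(struct s5p_aes_dev *dev, int err)
{
	struct ablkcipher_request *req;
	unsigned long flags;

	spin_lock_irqsave(&dev->lock, flags);
	req = dev->req;		/* snapshot before the tasklet can requeue */
	dev->busy = false;
	spin_unlock_irqrestore(&dev->lock, flags);

	s5p_aes_complete(req, err);
}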
Signed-off-by: Christoph Manszewski <c.manszewski@samsung.com> Acked-by: Kamil Konieczny <k.konieczny@partner.samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stefan Agner [Sun, 16 Sep 2018 04:38:25 +0000 (21:38 -0700)]
crypto: arm/crc32 - avoid warning when compiling with Clang
The table id (second) argument to MODULE_DEVICE_TABLE is often
referenced otherwise. This is not the case for CPU features. This
leads to a warning when building the kernel with Clang:
arch/arm/crypto/crc32-ce-glue.c:239:33: warning: variable
'crc32_cpu_feature' is not needed and will not be emitted
[-Wunneeded-internal-declaration]
static const struct cpu_feature crc32_cpu_feature[] = {
^
Avoid warnings by using __maybe_unused, similar to commit 1f318a8bafcf
("modules: mark __inittest/__exittest as __maybe_unused").
Fixes: 2a9faf8b7e43 ("crypto: arm/crc32 - enable module autoloading based on CPU feature bits") Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stefan Agner [Sun, 16 Sep 2018 04:38:24 +0000 (21:38 -0700)]
cpufeature: avoid warning when compiling with clang
The table id (second) argument to MODULE_DEVICE_TABLE is often
referenced otherwise. This is not the case for CPU features. This
leads to warnings when building the kernel with Clang:
arch/arm/crypto/aes-ce-glue.c:450:1: warning: variable
'cpu_feature_match_AES' is not needed and will not be emitted
[-Wunneeded-internal-declaration]
module_cpu_feature_match(AES, aes_init);
^
Avoid warnings by using __maybe_unused, similar to commit 1f318a8bafcf
("modules: mark __inittest/__exittest as __maybe_unused").
Fixes: 67bad2fdb754 ("cpu: add generic support for CPU feature based module autoloading") Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ccp - Allow SEV firmware to be chosen based on Family and Model
During PSP initialization, there is an attempt to update the SEV firmware
by looking in /lib/firmware/amd/. Currently, sev.fw is the expected name
of the firmware blob.
This patch will allow for firmware filenames based on the family and
model of the processor.
Model-specific firmware files are given the highest priority, followed by
firmware for a subset of models. Lastly, failing the previous two options,
fall back to looking for sev.fw.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Under certain configurations the SEV functions can be defined as no-ops.
In such a case the error variable can be used uninitialized.
Initialize the variable to 0.
Cc: Dan Carpenter <Dan.Carpenter@oracle.com> Reported-by: Dan Carpenter <Dan.Carpenter@oracle.com> Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ondrej Mosnacek [Thu, 13 Sep 2018 08:51:34 +0000 (10:51 +0200)]
crypto: lrw - Do not use auxiliary buffer
This patch simplifies the LRW template to recompute the LRW tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc().
As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.
PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)
The results show that the new code has about the same performance as the
old code. For 512-byte message it seems to be even slightly faster, but
that might be just noise.
Ondrej Mosnacek [Thu, 13 Sep 2018 08:51:33 +0000 (10:51 +0200)]
crypto: lrw - Optimize tweak computation
This patch rewrites the tweak computation to a slightly simpler method
that performs less bswaps. Based on performance measurements the new
code seems to provide slightly better performance than the old one.
PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)
Ondrej Mosnacek [Thu, 13 Sep 2018 08:51:32 +0000 (10:51 +0200)]
crypto: testmgr - Add test for LRW counter wrap-around
This patch adds a test vector for lrw(aes) that triggers wrap-around of
the counter, which is a tricky corner case.
Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ondrej Mosnacek [Thu, 13 Sep 2018 08:51:31 +0000 (10:51 +0200)]
crypto: lrw - Fix out-of bounds access on counter overflow
When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).
Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode") Cc: <stable@vger.kernel.org> # 2.6.20+ Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ghash is a keyed hash algorithm, thus setkey needs to be called.
Otherwise the following error occurs:
$ modprobe tcrypt mode=318 sec=1
testing speed of async ghash-generic (ghash-generic)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates):
tcrypt: hashing failed ret=-126
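The fix amounts to installing a key before the speed test runs; any key works for benchmarking (error -126 is -ENOKEY). A sketch:

#include <crypto/hash.h>
#include <linux/types.h>

static int tcrypt_ghash_setkey(struct crypto_ahash *tfm)
{
	static const u8 key[16];	/* all-zero key is fine here */

	return crypto_ahash_setkey(tfm, key, sizeof(key));
}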
arm64: defconfig: enable CAAM crypto engine on QorIQ DPAA2 SoCs
Enable CAAM (Cryptographic Accelerator and Assurance Module) driver
for QorIQ Data Path Acceleration Architecture (DPAA) v2.
It handles DPSECI (Data Path SEC Interface) DPAA2 objects that sit
on the Management Complex (MC) fsl-mc bus.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add CAAM driver that works using the DPSECI backend, i.e. manages
DPSECI DPAA2 objects sitting on the Management Complex (MC) fsl-mc bus.
Data transfers (crypto requests) are sent/received to/from CAAM crypto
engine via Queue Interface (v2), this being similar to existing caam/qi.
OTOH, configuration/setup (obtaining virtual queue IDs, authorization
etc.) is done by sending commands to the MC f/w.
Note that the CAAM accelerator included in DPAA2 platforms still has
Job Rings. However, the driver being added does not handle access
via this backend. Kconfig & Makefile are updated such that DPAA2-CAAM
(a.k.a. "caam/qi2") driver does not depend on caam/jr or caam/qi
backends - which rely on platform bus support (ctrl.c).
Support for the following aead and authenc algorithms is also added
in this patch:
-aead:
gcm(aes)
rfc4106(gcm(aes))
rfc4543(gcm(aes))
-authenc:
authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede}))
echainiv(authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede})))
authenc(hmac({md5,sha*}),rfc3686(ctr(aes))
seqiv(authenc(hmac({md5,sha*}),rfc3686(ctr(aes)))
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: caam - fix implicit casts in endianness helpers
Fix the following sparse endianness warnings:
drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __le32 [usertype] <noident> @@
drivers/crypto/caam/regs.h:95:1:    expected unsigned int
drivers/crypto/caam/regs.h:95:1:    got restricted __le32 [usertype] <noident>
drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __be32 [usertype] <noident> @@
drivers/crypto/caam/regs.h:95:1:    expected unsigned int
drivers/crypto/caam/regs.h:95:1:    got restricted __be32 [usertype] <noident>
drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __le32
drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __be32
soc: fsl: dpio: add congestion notification support
Add support for Congestion State Change Notifications (CSCN), which
allow DPIO users to be notified when a congestion group changes its
state (due to hitting the entrance / exit threshold).
Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for dpaa2_fd_list format, i.e. dpaa2_fl_entry structure
and accessors.
Frame list entries (FLEs) are similar, but not identical to FDs:
+ "F" (final) bit
- FMT[b'01] is reserved
- DD, SC, DROPP bits (covered by "FD compatibility" field in FLE case)
- FLC[5:0] not used for stashing
Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
soc: fsl: dpio: add back some frame queue functions
This commit adds back functions removed in
commit a211c8170b3c ("staging: fsl-mc/dpio: remove couple of unused functions")
since dpseci object will make use of them.
Acked-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In commit 9f480faec58c ("crypto: chacha20 - Fix keystream alignment for
chacha20_block()"), I had missed that chacha20_block() can be called
directly on the buffer passed to get_random_bytes(), which can have any
alignment. So, while my commit didn't break anything, it didn't fully
solve the alignment problems.
Revert my solution and just update chacha20_block() to use
put_unaligned_le32(), so the output buffer need not be aligned.
This is simpler, and on many CPUs it's the same speed.
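A sketch of the output step after the change (function name hypothetical):

#include <asm/unaligned.h>
#include <linux/types.h>

static void chacha20_write_output(const u32 x[16], const u32 state[16],
				  u8 *stream)
{
	int i;

	/* Works for any alignment of 'stream'; on CPUs with cheap
	 * unaligned stores this compiles to the same code as before. */
	for (i = 0; i < 16; i++)
		put_unaligned_le32(x[i] + state[i], stream + i * 4);
}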
But, I kept the 'tmp' buffers in extract_crng_user() and
_get_random_bytes() 4-byte aligned, since that alignment is actually
needed for _crng_backtrack_protect() too.
Reported-by: Stephan Müller <smueller@chronox.de> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ondrej Mosnacek [Tue, 11 Sep 2018 07:40:08 +0000 (09:40 +0200)]
crypto: xts - Drop use of auxiliary buffer
Since commit acb9b159c784 ("crypto: gf128mul - define gf128mul_x_* in
gf128mul.h"), the gf128mul_x_*() functions are very fast and therefore
caching the computed XTS tweaks has only negligible advantage over
computing them twice.
In fact, since the current caching implementation limits the size of
the calls to the child ecb(...) algorithm to PAGE_SIZE (usually 4096 B),
it is often actually slower than the simple recomputing implementation.
This patch simplifies the XTS template to recompute the XTS tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc().
As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.
PERFORMANCE RESULTS
I measured time to encrypt/decrypt a memory buffer of varying sizes with
xts(ecb-aes-aesni) using a tool I wrote ([2]) and the results suggest
that after this patch the performance is either better or comparable for
both small and large buffers. Note that there is a lot of noise in the
measurements, but the overall difference is easy to see.
The Crypto Extension instantiation of the aes-modes.S collection of
skciphers uses only 15 NEON registers for the round key array, whereas
the pure NEON flavor uses 16 NEON registers for the AES S-box.
This means we have a spare register available that we can use to hold
the XTS mask vector, removing the need to reload it at every iteration
of the inner loop.
Since the pure NEON version does not permit this optimization, tweak
the macros so we can factor out this functionality. Also, replace the
literal load with a short sequence to compose the mask vector.
On Cortex-A53, this results in a ~4% speedup.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: arm64/aes-blk - add support for CTS-CBC mode
Currently, we rely on the generic CTS chaining mode wrapper to
instantiate the cts(cbc(aes)) skcipher. Due to the high performance
of the ARMv8 Crypto Extensions AES instructions (~1 cycle per byte),
any overhead in the chaining mode layers is amplified, and so it pays
off considerably to fold the CTS handling into the SIMD routines.
On Cortex-A53, this results in a ~50% speedup for smaller input sizes.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: arm64/aes-blk - revert NEON yield for skciphers
The reasoning of commit f10dc56c64bb ("crypto: arm64 - revert NEON yield
for fast AEAD implementations") applies equally to skciphers: the walk
API already guarantees that the input size of each call into the NEON
code is bounded to the size of a page, and so there is no need for an
additional TIF_NEED_RESCHED flag check inside the inner loop. So revert
the skcipher changes to aes-modes.S (but retain the mac ones).
For some reason, the asmlinkage prototypes of the NEON routines take
u8[] arguments for the round key arrays, while the actual round keys
are arrays of u32, and so passing them into those routines requires
u8* casts at each occurrence. Fix that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In some cases the zero-length hw_desc array at the end of the
ablkcipher_edesc struct requires 4 bytes of tail padding.
Due to tail padding and the way pointers to S/G table and IV
are computed:
edesc->sec4_sg = (void *)edesc + sizeof(struct ablkcipher_edesc) +
desc_bytes;
iv = (u8 *)edesc->hw_desc + desc_bytes + sec4_sg_bytes;
the first 4 bytes of the IV are overwritten by the S/G table.
Update computation of pointer to S/G table to rely on offset of hw_desc
member and not on sizeof() operator.
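That is, roughly (mirroring the snippet above):

edesc->sec4_sg = (void *)edesc + offsetof(struct ablkcipher_edesc, hw_desc) +
		 desc_bytes;

offsetof() stops at hw_desc and excludes the tail padding that sizeof()
counts, so the S/G table no longer lands on the first 4 bytes of the IV.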
Cc: <stable@vger.kernel.org> # 4.13+ Fixes: 115957bb3e59 ("crypto: caam - fix IV DMA mapping and updating") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: cavium/nitrox - Added support for SR-IOV configuration.
Added support to configure SR-IOV using the sysfs interface. The
supported VF modes are 16, 32, 64 and 128. Grouped the hardware
configuration functions into the "nitrox_hal.h" file.
Changed the driver version to "1.1".
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com> Reviewed-by: Gadam Sreerama <sgadam@cavium.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: aesni - don't use GFP_ATOMIC allocation if the request doesn't cross a page in gcm
This patch fixes gcmaes_crypt_by_sg so that it won't use memory
allocation if the data doesn't cross a page boundary.
Authenticated encryption may be used by dm-crypt. If the encryption or
decryption fails, it would result in I/O error and filesystem corruption.
The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can
fail anytime. This patch fixes the logic so that it won't attempt the
failing allocation if the data doesn't cross a page boundary.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: b76377543b73 ("crc-t10dif: Pick better transform if one becomes available") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Kees Cook [Tue, 7 Aug 2018 21:18:39 +0000 (14:18 -0700)]
dm: Remove VLA usage from hashes
In the quest to remove all stack VLA usage from the kernel[1], this uses
the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
bounds on stack usage.
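A sketch of the pattern (helper hypothetical; the macro and APIs are real):

#include <crypto/hash.h>
#include <linux/string.h>
#include <linux/types.h>

static int dm_digest(struct crypto_shash *tfm, const u8 *data,
		     unsigned int len, u8 *out)
{
	SHASH_DESC_ON_STACK(desc, tfm);
	u8 digest[HASH_MAX_DIGESTSIZE];	/* fixed bound instead of a VLA */
	int r;

	if (crypto_shash_digestsize(tfm) > sizeof(digest))
		return -EINVAL;

	desc->tfm = tfm;
	r = crypto_shash_digest(desc, data, len, digest);
	if (!r)
		memcpy(out, digest, crypto_shash_digestsize(tfm));
	return r;
}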
Ondrej Mosnacek [Wed, 5 Sep 2018 07:26:41 +0000 (09:26 +0200)]
crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
Without this patch the affected modules refuse to load on CPUs with SSE2
but without AVX support.
Fixes: 877ccce7cbe8 ("crypto: x86/aegis,morus - Fix and simplify CPUID checks") Cc: <stable@vger.kernel.org> # 4.18 Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Brijesh Singh [Wed, 15 Aug 2018 21:11:25 +0000 (16:11 -0500)]
crypto: ccp - add timeout support in the SEV command
Currently, the CCP driver assumes that the SEV command issued to the PSP
will always return (i.e. it will never hang). But recently, firmware bugs
have shown that a command can hang. Since some of the SEV commands are used
in probe routines, this can cause boot hangs and/or loss of virtualization
capabilities.
To protect against firmware bugs, add a timeout in the SEV command
execution flow. If a command does not complete within the specified
timeout then return -ETIMEDOUT and stop the driver from executing any
further commands since the state of the SEV firmware is unknown.
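A sketch of the flow (structure and names hypothetical):

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/wait.h>

struct sev_dev_state {
	wait_queue_head_t int_queue;
	bool cmd_done;		/* set from the PSP interrupt handler */
	bool broken;		/* no further commands once set */
};

static int sev_wait_cmd(struct sev_dev_state *sev, unsigned int timeout_s)
{
	if (!wait_event_timeout(sev->int_queue, sev->cmd_done,
				timeout_s * HZ)) {
		sev->broken = true;
		return -ETIMEDOUT;
	}
	return 0;
}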
Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Gary Hook <Gary.Hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Eric Biggers [Sat, 1 Sep 2018 07:17:07 +0000 (00:17 -0700)]
crypto: arm/chacha20 - faster 8-bit rotations and other optimizations
Optimize ChaCha20 NEON performance by:
- Implementing the 8-bit rotations using the 'vtbl.8' instruction.
- Streamlining the part that adds the original state and XORs the data.
- Making some other small tweaks.
On ARM Cortex-A7, these optimizations improve ChaCha20 performance from
about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement.
There is a tradeoff involved with the 'vtbl.8' rotation method since
there is at least one CPU (Cortex-A53) where it's not fastest. But it
seems to be a better default; see the added comment. Overall, this
patch reduces Cortex-A53 performance by less than 0.5%.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc-t10dif: Pick better transform if one becomes available
The T10 CRC library is linked into the kernel thanks to the block and SCSI layers. The
crypto accelerators are typically loaded later as modules and are
therefore not available when the T10 CRC library is initialized.
Use the crypto notifier facility to trigger a switch to a better algorithm
if one becomes available after the initial hash has been registered. Use
RCU to protect the original transform while the new one is being set up.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: api - Introduce notifier for new crypto algorithms
Introduce a facility that can be used to receive a notification
callback when a new algorithm becomes available. This can be used by
existing crypto registrations to trigger a switch from a software-only
algorithm to a hardware-accelerated version.
A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto
notification chain, and the register/unregister functions are exported
so they can be called by subsystems outside of crypto.
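A sketch of a consumer (callback body illustrative):

#include <crypto/algapi.h>
#include <linux/kernel.h>
#include <linux/notifier.h>

static int alg_loaded_cb(struct notifier_block *nb, unsigned long val,
			 void *data)
{
	struct crypto_alg *alg = data;

	if (val != CRYPTO_MSG_ALG_LOADED)
		return NOTIFY_DONE;

	/* e.g. schedule a switch to 'alg' if it beats the current one */
	pr_info("loaded: %s\n", alg->cra_driver_name);
	return NOTIFY_OK;
}

static struct notifier_block alg_nb = { .notifier_call = alg_loaded_cb };

/* crypto_register_notifier(&alg_nb) at init;
 * crypto_unregister_notifier(&alg_nb) at exit. */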
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Mon, 27 Aug 2018 15:38:12 +0000 (17:38 +0200)]
crypto: arm64/crct10dif - implement non-Crypto Extensions alternative
The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit
polynomial multiplication instructions, which are optional in the
architecture, and if these instructions are not available, we fall back
to the C routine which is slow and inefficient.
So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that
uses a sequence of ~40 instructions involving 8x8 bit PMULL and some
shifting and masking. This is a lot slower than the original, but it is
still twice as fast as the current [unoptimized] C code on Cortex-A53,
and it is time invariant and much easier on the D-cache.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Mon, 27 Aug 2018 15:38:11 +0000 (17:38 +0200)]
crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version
Reorganize the CRC-T10DIF asm routine so we can easily instantiate an
alternative version based on 8x8 polynomial multiplication in a
subsequent patch.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ard Biesheuvel [Mon, 27 Aug 2018 11:02:45 +0000 (13:02 +0200)]
crypto: arm64/crc32 - remove PMULL based CRC32 driver
Now that the scalar fallbacks have been moved out of this driver into
the core crc32()/crc32c() routines, we are left with a CRC32 crypto API
driver for arm64 that is based only on 64x64 polynomial multiplication,
which is an optional instruction in the ARMv8 architecture, and is less
and less likely to be available on cores that do not also implement the
CRC32 instructions, given that those are mandatory in the architecture
as of ARMv8.1.
Since the scalar instructions do not require the special handling that
SIMD instructions do, and since they turn out to be considerably faster
on some cores (Cortex-A53) as well, there is really no point in keeping
this code around so let's just remove it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>