]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
5 years agocrypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS
Eric Biggers [Sun, 16 Dec 2018 23:00:30 +0000 (15:00 -0800)]
crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS

Fixes: b7ed7a91e20d ("crypto: cavium/nitrox - Enabled Mailbox support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: salsa20-generic - don't unnecessarily use atomic walk
Eric Biggers [Sat, 15 Dec 2018 20:42:52 +0000 (12:42 -0800)]
crypto: salsa20-generic - don't unnecessarily use atomic walk

salsa20-generic doesn't use SIMD instructions or otherwise disable
preemption, so passing atomic=true to skcipher_walk_virt() is
unnecessary.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: skcipher - add might_sleep() to skcipher_walk_virt()
Eric Biggers [Sat, 15 Dec 2018 20:41:53 +0000 (12:41 -0800)]
crypto: skcipher - add might_sleep() to skcipher_walk_virt()

skcipher_walk_virt() can still sleep even with atomic=true, since that
only affects the later calls to skcipher_walk_done().  But,
skcipher_walk_virt() only has to allocate memory for some input data
layouts, so incorrectly calling it with preemption disabled can go
undetected.  Use might_sleep() so that it's detected reliably.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
Eric Biggers [Sat, 15 Dec 2018 20:40:17 +0000 (12:40 -0800)]
crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()

Passing atomic=true to skcipher_walk_virt() only makes the later
skcipher_walk_done() calls use atomic memory allocations, not
skcipher_walk_virt() itself.  Thus, we have to move it outside of the
preemption-disabled region (kernel_fpu_begin()/kernel_fpu_end()).

(skcipher_walk_virt() only allocates memory for certain layouts of the
input scatterlist, hence why I didn't notice this earlier...)

Reported-by: syzbot+9bf843c33f782d73ae7d@syzkaller.appspotmail.com
Fixes: f562887c7028 ("crypto: x86/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: cavium/nitrox - Added AEAD cipher support
Nagadheeraj Rottela [Fri, 14 Dec 2018 11:19:51 +0000 (11:19 +0000)]
crypto: cavium/nitrox - Added AEAD cipher support

Added support to offload AEAD ciphers to NITROX. Currently supported
AEAD cipher is 'gcm(aes)'.

Signed-off-by: Nagadheeraj Rottela <rnagadheeraj@marvell.com>
Reviewed-by: Srikanth Jampala <jsrikanth@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: mxc-scc - fix build warnings on ARM64
Fabio Estevam [Thu, 13 Dec 2018 09:52:32 +0000 (07:52 -0200)]
crypto: mxc-scc - fix build warnings on ARM64

The following build warnings are seen when building for ARM64 allmodconfig:

drivers/crypto/mxc-scc.c:181:20: warning: format '%d' expects argument of type 'int', but argument 5 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
drivers/crypto/mxc-scc.c:186:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
drivers/crypto/mxc-scc.c:277:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=]
drivers/crypto/mxc-scc.c:339:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/crypto/mxc-scc.c:340:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Fix them by using the %zu specifier to print a size_t variable and using
a plain %x to print the result of a readl().

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: api - document missing stats member
Corentin Labbe [Thu, 13 Dec 2018 08:36:38 +0000 (08:36 +0000)]
crypto: api - document missing stats member

This patchs adds missing member of stats documentation.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - remove unused dump functions
Corentin Labbe [Thu, 13 Dec 2018 08:36:37 +0000 (08:36 +0000)]
crypto: user - remove unused dump functions

This patch removes unused dump functions for crypto_user_stats.
There are remains of the copy/paste of crypto_user_base to
crypto_user_stat and I forgot to remove them.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - Fix wrong error counter increments
Harsh Jain [Tue, 11 Dec 2018 10:51:42 +0000 (16:21 +0530)]
crypto: chelsio - Fix wrong error counter increments

Fix error counter increment in AEAD decrypt operation when
validation of tag is done in Driver instead of H/W.

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - Reset counters on cxgb4 Detach
Harsh Jain [Tue, 11 Dec 2018 10:51:41 +0000 (16:21 +0530)]
crypto: chelsio - Reset counters on cxgb4 Detach

Reset the counters on receiving detach from Cxgb4.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - Handle PCI shutdown event
Harsh Jain [Tue, 11 Dec 2018 10:51:40 +0000 (16:21 +0530)]
crypto: chelsio - Handle PCI shutdown event

chcr receives "CXGB4_STATE_DETACH" event on PCI Shutdown.
Wait for processing of inflight request and Mark the device unavailable.

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - cleanup:send addr as value in function argument
Harsh Jain [Tue, 11 Dec 2018 10:51:39 +0000 (16:21 +0530)]
crypto: chelsio - cleanup:send addr as value in function argument

Send dma address as value to function arguments instead of pointer.

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - Use same value for both channel in single WR
Harsh Jain [Tue, 11 Dec 2018 10:51:38 +0000 (16:21 +0530)]
crypto: chelsio - Use same value for both channel in single WR

Use tx_channel_id instead of rx_channel_id.

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - Swap location of AAD and IV sent in WR
Harsh Jain [Tue, 11 Dec 2018 10:51:37 +0000 (16:21 +0530)]
crypto: chelsio - Swap location of AAD and IV sent in WR

Send input as IV | AAD | Data. It will allow sending IV as Immediate
Data and Creates space in Work request to add more dma mapped entries.

Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chelsio - remove set but not used variable 'kctx_len'
YueHaibing [Tue, 11 Dec 2018 08:11:59 +0000 (08:11 +0000)]
crypto: chelsio - remove set but not used variable 'kctx_len'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/crypto/chelsio/chcr_ipsec.c: In function 'chcr_ipsec_xmit':
drivers/crypto/chelsio/chcr_ipsec.c:674:33: warning:
 variable 'kctx_len' set but not used [-Wunused-but-set-variable]
  unsigned int flits = 0, ndesc, kctx_len;

It not used since commit 0fd466cf4846 ("crypto: chcr - ESN for Inline IPSec Tx")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: ux500 - Use proper enum in hash_set_dma_transfer
Nathan Chancellor [Mon, 10 Dec 2018 23:49:54 +0000 (16:49 -0700)]
crypto: ux500 - Use proper enum in hash_set_dma_transfer

Clang warns when one enumerated type is implicitly converted to another:

drivers/crypto/ux500/hash/hash_core.c:169:4: warning: implicit
conversion from enumeration type 'enum dma_data_direction' to different
enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                        direction, DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
                        ^~~~~~~~~
1 warning generated.

dmaengine_prep_slave_sg expects an enum from dma_transfer_direction.
We know that the only direction supported by this function is
DMA_TO_DEVICE because of the check at the top of this function so we can
just use the equivalent value from dma_transfer_direction.

DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: ux500 - Use proper enum in cryp_set_dma_transfer
Nathan Chancellor [Mon, 10 Dec 2018 23:49:29 +0000 (16:49 -0700)]
crypto: ux500 - Use proper enum in cryp_set_dma_transfer

Clang warns when one enumerated type is implicitly converted to another:

drivers/crypto/ux500/cryp/cryp_core.c:559:5: warning: implicit
conversion from enumeration type 'enum dma_data_direction' to different
enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                                direction, DMA_CTRL_ACK);
                                ^~~~~~~~~
drivers/crypto/ux500/cryp/cryp_core.c:583:5: warning: implicit
conversion from enumeration type 'enum dma_data_direction' to different
enumeration type 'enum dma_transfer_direction' [-Wenum-conversion]
                                direction,
                                ^~~~~~~~~
2 warnings generated.

dmaengine_prep_slave_sg expects an enum from dma_transfer_direction.
Because we know the value of the dma_data_direction enum from the
switch statement, we can just use the proper value from
dma_transfer_direction so there is no more conversion.

DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1
DMA_FROM_DEVICE = DMA_DEV_TO_MEM = 2

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Add scatter/gather avx stubs, and use them in C
Dave Watson [Mon, 10 Dec 2018 19:59:59 +0000 (19:59 +0000)]
crypto: aesni - Add scatter/gather avx stubs, and use them in C

Add the appropriate scatter/gather stubs to the avx asm.
In the C code, we can now always use crypt_by_sg, since both
sse and asm code now support scatter/gather.

Introduce a new struct, aesni_gcm_tfm, that is initialized on
startup to point to either the SSE, AVX, or AVX2 versions of the
four necessary encryption/decryption routines.

GENX_OPTSIZE is still checked at the start of crypt_by_sg.  The
total size of the data is checked, since the additional overhead
is in the init function, calculating additional HashKeys.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Introduce partial block macro
Dave Watson [Mon, 10 Dec 2018 19:59:38 +0000 (19:59 +0000)]
crypto: aesni - Introduce partial block macro

Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.

Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.

The data offset %r11 is also updated after the partial block.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Introduce READ_PARTIAL_BLOCK macro
Dave Watson [Mon, 10 Dec 2018 19:59:26 +0000 (19:59 +0000)]
crypto: aesni - Introduce READ_PARTIAL_BLOCK macro

Introduce READ_PARTIAL_BLOCK macro, and use it in the two existing
partial block cases: AAD and the end of ENC_DEC.   In particular,
the ENC_DEC case should be faster, since we read by 8/4 bytes if
possible.

This macro will also be used to read partial blocks between
enc_update and dec_update calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Move ghash_mul to GCM_COMPLETE
Dave Watson [Mon, 10 Dec 2018 19:59:11 +0000 (19:59 +0000)]
crypto: aesni - Move ghash_mul to GCM_COMPLETE

Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Fill in new context data structures
Dave Watson [Mon, 10 Dec 2018 19:58:56 +0000 (19:58 +0000)]
crypto: aesni - Fill in new context data structures

Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Merge avx precompute functions
Dave Watson [Mon, 10 Dec 2018 19:58:38 +0000 (19:58 +0000)]
crypto: aesni - Merge avx precompute functions

The precompute functions differ only by the sub-macros
they call, merge them to a single macro.   Later diffs
add more code to fill in the gcm_context_data structure,
this allows changes in a single place.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Split AAD hash calculation to separate macro
Dave Watson [Mon, 10 Dec 2018 19:58:19 +0000 (19:58 +0000)]
crypto: aesni - Split AAD hash calculation to separate macro

AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Add GCM_COMPLETE macro
Dave Watson [Mon, 10 Dec 2018 19:57:49 +0000 (19:57 +0000)]
crypto: aesni - Add GCM_COMPLETE macro

Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - support 256 byte keys in avx asm
Dave Watson [Mon, 10 Dec 2018 19:57:36 +0000 (19:57 +0000)]
crypto: aesni - support 256 byte keys in avx asm

Add support for 192/256-bit keys using the avx gcm/aes routines.
The sse routines were previously updated in 3be15ecf2f (Add support
for 192 & 256 bit keys to AESNI RFC4106).

Instead of adding an additional loop in the hotpath as in 3be15ecf2f,
this diff instead generates separate versions of the code using macros,
and the entry routines choose which version once.   This results
in a 5% performance improvement vs. adding a loop to the hot path.
This is the same strategy chosen by the intel isa-l_crypto library.

The key size checks are removed from the c code where appropriate.

Note that this diff depends on using gcm_context_data - 256 bit keys
require 16 HashKeys + 15 expanded keys, which is larger than
struct crypto_aes_ctx, so they are stored in struct gcm_context_data.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Macro-ify func save/restore
Dave Watson [Mon, 10 Dec 2018 19:57:12 +0000 (19:57 +0000)]
crypto: aesni - Macro-ify func save/restore

Macro-ify function save and restore.  These will be used in new functions
added for scatter/gather update operations.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Introduce gcm_context_data
Dave Watson [Mon, 10 Dec 2018 19:57:00 +0000 (19:57 +0000)]
crypto: aesni - Introduce gcm_context_data

Add the gcm_context_data structure to the avx asm routines.
This will be necessary to support both 256 bit keys and
scatter/gather.

The pre-computed HashKeys are now stored in the gcm_context_data
struct, which is expanded to hold the greater number of hashkeys
necessary for avx.

Loads and stores to the new struct are always done unlaligned to
avoid compiler issues, see 21655d23 "Use unaligned loads from
gcm_context_data"

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: aesni - Merge GCM_ENC_DEC
Dave Watson [Mon, 10 Dec 2018 19:56:45 +0000 (19:56 +0000)]
crypto: aesni - Merge GCM_ENC_DEC

The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they
call separate sub-macros.  Pass the macros as arguments, and merge them.
This facilitates additional refactoring, by requiring changes in only
one place.

The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs,
since it will be used by both AVX and AVX2.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: adiantum - fix leaking reference to hash algorithm
Eric Biggers [Mon, 10 Dec 2018 19:45:58 +0000 (11:45 -0800)]
crypto: adiantum - fix leaking reference to hash algorithm

crypto_alg_mod_lookup() takes a reference to the hash algorithm but
crypto_init_shash_spawn() doesn't take ownership of it, hence the
reference needs to be dropped in adiantum_create().

Fixes: 8b2281d1c5cc ("crypto: adiantum - add Adiantum support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - support incremental algorithm dumps
Eric Biggers [Thu, 6 Dec 2018 23:55:41 +0000 (15:55 -0800)]
crypto: user - support incremental algorithm dumps

CRYPTO_MSG_GETALG in NLM_F_DUMP mode sometimes doesn't return all
registered crypto algorithms, because it doesn't support incremental
dumps.  crypto_dump_report() only permits itself to be called once, yet
the netlink subsystem allocates at most ~64 KiB for the skb being dumped
to.  Thus only the first recvmsg() returns data, and it may only include
a subset of the crypto algorithms even if the user buffer passed to
recvmsg() is large enough to hold all of them.

Fix this by using one of the arguments in the netlink_callback structure
to keep track of the current position in the algorithm list.  Then
userspace can do multiple recvmsg() on the socket after sending the dump
request.  This is the way netlink dumps work elsewhere in the kernel;
it's unclear why this was different (probably just an oversight).

Also fix an integer overflow when calculating the dump buffer size hint.

Fixes: b4d06718625c ("crypto: Add userspace configuration API")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: adiantum - adjust some comments to match latest paper
Eric Biggers [Thu, 6 Dec 2018 22:21:59 +0000 (14:21 -0800)]
crypto: adiantum - adjust some comments to match latest paper

The 2018-11-28 revision of the Adiantum paper has revised some notation:

- 'M' was replaced with 'L' (meaning "Left", for the left-hand part of
  the message) in the definition of Adiantum hashing, to avoid confusion
  with the full message
- ε-almost-∆-universal is now abbreviated as ε-∆U instead of εA∆U
- "block" is now used only to mean block cipher and Poly1305 blocks

Also, Adiantum hashing was moved from the appendix to the main paper.

To avoid confusion, update relevant comments in the code to match.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: xchacha20 - fix comments for test vectors
Eric Biggers [Thu, 6 Dec 2018 21:00:08 +0000 (13:00 -0800)]
crypto: xchacha20 - fix comments for test vectors

The kernel's ChaCha20 uses the RFC7539 convention of the nonce being 12
bytes rather than 8, so actually I only appended 12 random bytes (not
16) to its test vectors to form 24-byte nonces for the XChaCha20 test
vectors.  The other 4 bytes were just from zero-padding the stream
position to 8 bytes.  Fix the comments above the test vectors.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: xchacha - add test vector from XChaCha20 draft RFC
Eric Biggers [Thu, 6 Dec 2018 20:31:54 +0000 (12:31 -0800)]
crypto: xchacha - add test vector from XChaCha20 draft RFC

There is a draft specification for XChaCha20 being worked on.  Add the
XChaCha20 test vector from the appendix so that we can be extra sure the
kernel's implementation is compatible.

I also recomputed the ciphertext with XChaCha12 and added it there too,
to keep the tests for XChaCha20 and XChaCha12 in sync.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/chacha - yield the FPU occasionally
Eric Biggers [Wed, 5 Dec 2018 06:20:05 +0000 (22:20 -0800)]
crypto: x86/chacha - yield the FPU occasionally

To improve responsiveness, yield the FPU (temporarily re-enabling
preemption) every 4 KiB encrypted/decrypted, rather than keeping
preemption disabled during the entire encryption/decryption operation.

Alternatively we could do this for every skcipher_walk step, but steps
may be small in some cases, and yielding the FPU is expensive on x86.

Suggested-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/chacha - add XChaCha12 support
Eric Biggers [Wed, 5 Dec 2018 06:20:04 +0000 (22:20 -0800)]
crypto: x86/chacha - add XChaCha12 support

Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have
been refactored to support varying the number of rounds, add support for
XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.  This can be used by Adiantum.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/chacha20 - refactor to allow varying number of rounds
Eric Biggers [Wed, 5 Dec 2018 06:20:03 +0000 (22:20 -0800)]
crypto: x86/chacha20 - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor the x86_64
SIMD implementations of ChaCha20 to support different numbers of rounds.

Reviewed-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/chacha20 - add XChaCha20 support
Eric Biggers [Wed, 5 Dec 2018 06:20:02 +0000 (22:20 -0800)]
crypto: x86/chacha20 - add XChaCha20 support

Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD
implementations of ChaCha20.  This can be used by Adiantum.

An SSSE3 implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha permutation into its own function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305
Eric Biggers [Wed, 5 Dec 2018 06:20:01 +0000 (22:20 -0800)]
crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305

Add a 64-bit AVX2 implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually AVX2-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305
Eric Biggers [Wed, 5 Dec 2018 06:20:00 +0000 (22:20 -0800)]
crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305

Add a 64-bit SSE2 implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually SSE2-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: adiantum - propagate CRYPTO_ALG_ASYNC flag to instance
Eric Biggers [Wed, 5 Dec 2018 00:46:54 +0000 (16:46 -0800)]
crypto: adiantum - propagate CRYPTO_ALG_ASYNC flag to instance

If the stream cipher implementation is asynchronous, then the Adiantum
instance must be flagged as asynchronous as well.  Otherwise someone
asking for a synchronous algorithm can get an asynchronous algorithm.

There are no asynchronous xchacha12 or xchacha20 implementations yet
which makes this largely a theoretical issue, but it should be fixed.

Fixes: 8b2281d1c5cc ("crypto: adiantum - add Adiantum support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/chacha - use combined SIMD/ALU routine for more speed
Ard Biesheuvel [Tue, 4 Dec 2018 13:13:33 +0000 (14:13 +0100)]
crypto: arm64/chacha - use combined SIMD/ALU routine for more speed

To some degree, most known AArch64 micro-architectures appear to be
able to issue ALU instructions in parellel to SIMD instructions
without affecting the SIMD throughput. This means we can use the ALU
to process a fifth ChaCha block while the SIMD is processing four
blocks in parallel.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/chacha - optimize for arbitrary length inputs
Ard Biesheuvel [Tue, 4 Dec 2018 13:13:32 +0000 (14:13 +0100)]
crypto: arm64/chacha - optimize for arbitrary length inputs

Update the 4-way NEON ChaCha routine so it can handle input of any
length >64 bytes in its entirety, rather than having to call into
the 1-way routine and/or memcpy()s via temp buffers to handle the
tail of a ChaCha invocation that is not a multiple of 256 bytes.

On inputs that are a multiple of 256 bytes (and thus in tcrypt
benchmarks), performance drops by around 1% on Cortex-A57, while
performance for inputs drawn randomly from the range [64, 1024)
increases by around 30%.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: tcrypt - add block size of 1472 to skcipher template
Ard Biesheuvel [Tue, 4 Dec 2018 13:13:31 +0000 (14:13 +0100)]
crypto: tcrypt - add block size of 1472 to skcipher template

In order to have better coverage of algorithms operating on block
sizes that are in the ballpark of a VPN  packet, add 1472 to the
block_sizes array.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: cavium/nitrox - Enabled Mailbox support
Srikanth, Jampala [Tue, 4 Dec 2018 12:55:54 +0000 (12:55 +0000)]
crypto: cavium/nitrox - Enabled Mailbox support

Enabled the PF->VF Mailbox support. Mailbox message are interpreted
as {type, opcode, data}. Supported message types are REQ, ACK and NACK.

Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/chacha - add XChaCha12 support
Eric Biggers [Tue, 4 Dec 2018 03:52:52 +0000 (19:52 -0800)]
crypto: arm64/chacha - add XChaCha12 support

Now that the ARM64 NEON implementation of ChaCha20 and XChaCha20 has
been refactored to support varying the number of rounds, add support for
XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.  This can be used by Adiantum.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/chacha20 - refactor to allow varying number of rounds
Eric Biggers [Tue, 4 Dec 2018 03:52:51 +0000 (19:52 -0800)]
crypto: arm64/chacha20 - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor the ARM64
NEON implementation of ChaCha20 to support different numbers of rounds.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/chacha20 - add XChaCha20 support
Eric Biggers [Tue, 4 Dec 2018 03:52:50 +0000 (19:52 -0800)]
crypto: arm64/chacha20 - add XChaCha20 support

Add an XChaCha20 implementation that is hooked up to the ARM64 NEON
implementation of ChaCha20.  This can be used by Adiantum.

A NEON implementation of single-block HChaCha20 is also added so that
XChaCha20 can use it rather than the generic implementation.  This
required refactoring the ChaCha20 permutation into its own function.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305
Eric Biggers [Tue, 4 Dec 2018 03:52:49 +0000 (19:52 -0800)]
crypto: arm64/nhpoly1305 - add NEON-accelerated NHPoly1305

Add an ARM64 NEON implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually NEON-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> # big-endian
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: cavium/nitrox - convert to DEFINE_SHOW_ATTRIBUTE
Yangtao Li [Sat, 1 Dec 2018 09:56:30 +0000 (04:56 -0500)]
crypto: cavium/nitrox - convert to DEFINE_SHOW_ATTRIBUTE

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chcr - ESN for Inline IPSec Tx
Atul Gupta [Fri, 30 Nov 2018 09:02:09 +0000 (14:32 +0530)]
crypto: chcr - ESN for Inline IPSec Tx

Send SPI, 64b seq nos and 64b IV with aadiv drop for inline crypto.
This information is added in outgoing packet after the CPL TX PKT XT
and removed by hardware.
The aad, auth and cipher offsets are then adjusted for ESN enabled tunnel.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: chcr - small packet Tx stalls the queue
Atul Gupta [Fri, 30 Nov 2018 09:01:48 +0000 (14:31 +0530)]
crypto: chcr - small packet Tx stalls the queue

Immediate packets sent to hardware should include the work
request length in calculating the flits. WR occupy one flit and
if not accounted result in invalid request which stalls the HW
queue.

Cc: stable@vger.kernel.org
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - Add crypto_stats_init
Corentin Labbe [Thu, 29 Nov 2018 14:42:26 +0000 (14:42 +0000)]
crypto: user - Add crypto_stats_init

This patch add the crypto_stats_init() function.
This will permit to remove some ifdef from __crypto_register_alg().

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - rename err_cnt parameter
Corentin Labbe [Thu, 29 Nov 2018 14:42:25 +0000 (14:42 +0000)]
crypto: user - rename err_cnt parameter

Since now all crypto stats are on their own structures, it is now
useless to have the algorithm name in the err_cnt member.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - Split stats in multiple structures
Corentin Labbe [Thu, 29 Nov 2018 14:42:24 +0000 (14:42 +0000)]
crypto: user - Split stats in multiple structures

Like for userspace, this patch splits stats into multiple structures,
one for each algorithm class.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - remove intermediate variable
Corentin Labbe [Thu, 29 Nov 2018 14:42:23 +0000 (14:42 +0000)]
crypto: user - remove intermediate variable

The use of the v64 intermediate variable is useless, and removing it
bring to much readable code.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - Fix invalid stat reporting
Corentin Labbe [Thu, 29 Nov 2018 14:42:22 +0000 (14:42 +0000)]
crypto: user - Fix invalid stat reporting

Some error count use the wrong name for getting this data.
But this had not caused any reporting problem, since all error count are shared in the same
union.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - fix use_after_free of struct xxx_request
Corentin Labbe [Thu, 29 Nov 2018 14:42:21 +0000 (14:42 +0000)]
crypto: user - fix use_after_free of struct xxx_request

All crypto_stats functions use the struct xxx_request for feeding stats,
but in some case this structure could already be freed.

For fixing this, the needed parameters (len and alg) will be stored
before the request being executed.
Fixes: c5bb87c02c96 ("crypto: user - Implement a generic crypto statistics")
Reported-by: syzbot <syzbot+6939a606a5305e9e9799@syzkaller.appspotmail.com>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: tool: getstat: convert user space example to the new crypto_user_stat uapi
Corentin Labbe [Thu, 29 Nov 2018 14:42:20 +0000 (14:42 +0000)]
crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi

This patch converts the getstat example tool to the recent changes done in crypto_user_stat
- changed all stats to u64
- separated struct stats for each crypto alg

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - split user space crypto stat structures
Corentin Labbe [Thu, 29 Nov 2018 14:42:19 +0000 (14:42 +0000)]
crypto: user - split user space crypto stat structures

It is cleaner to have each stat in their own structures.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - convert all stats from u32 to u64
Corentin Labbe [Thu, 29 Nov 2018 14:42:18 +0000 (14:42 +0000)]
crypto: user - convert all stats from u32 to u64

All the 32-bit fields need to be 64-bit.  In some cases, UINT32_MAX crypto
operations can be done in seconds.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - CRYPTO_STATS should depend on CRYPTO_USER
Corentin Labbe [Thu, 29 Nov 2018 14:42:17 +0000 (14:42 +0000)]
crypto: user - CRYPTO_STATS should depend on CRYPTO_USER

CRYPTO_STATS is using CRYPTO_USER stuff, so it should depends on it.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: user - made crypto_user_stat optional
Corentin Labbe [Thu, 29 Nov 2018 14:42:16 +0000 (14:42 +0000)]
crypto: user - made crypto_user_stat optional

Even if CRYPTO_STATS is set to n, some part of CRYPTO_STATS are
compiled.
This patch made all part of crypto_user_stat uncompiled in that case.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agoMAINTAINERS: change NX/VMX maintainers
Paulo Flabiano Smorigo [Tue, 27 Nov 2018 17:27:21 +0000 (15:27 -0200)]
MAINTAINERS: change NX/VMX maintainers

Add Breno and Nayna as NX/VMX crypto driver maintainers. Also change my
email address to my personal account and remove Leonidas since he's not
working with the driver anymore.

Signed-off-by: Paulo Flabiano Smorigo <pfsmorigo@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agoMAINTAINERS: ccree: add co-maintainer
Gilad Ben-Yossef [Tue, 13 Nov 2018 09:40:37 +0000 (09:40 +0000)]
MAINTAINERS: ccree: add co-maintainer

Add Yael Chemla as co-maintainer of Arm TrustZone CryptoCell REE
driver.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agodt-bindings: crypto: ccree: add dt bindings for ccree 703
Gilad Ben-Yossef [Tue, 13 Nov 2018 09:40:36 +0000 (09:40 +0000)]
dt-bindings: crypto: ccree: add dt bindings for ccree 703

Add device tree bindings associating Arm TrustZone CryptoCell 703 with the
ccree driver.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agocrypto: ccree - add support for CryptoCell 703
Gilad Ben-Yossef [Tue, 13 Nov 2018 09:40:35 +0000 (09:40 +0000)]
crypto: ccree - add support for CryptoCell 703

Add support for Arm TrustZone CryptoCell 703.
The 703 is a variant of the CryptoCell 713 that supports only
algorithms certified by the Chinesse Office of the State Commercial
Cryptography Administration (OSCCA).

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Herbert Xu [Fri, 7 Dec 2018 05:59:10 +0000 (13:59 +0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Merge crypto tree to pick up crypto stats API revert.

5 years agocrypto: user - Disable statistics interface
Herbert Xu [Fri, 7 Dec 2018 05:56:08 +0000 (13:56 +0800)]
crypto: user - Disable statistics interface

Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: cavium/nitrox - Enable interrups for PF in SR-IOV mode.
Srikanth, Jampala [Wed, 21 Nov 2018 09:52:24 +0000 (09:52 +0000)]
crypto: cavium/nitrox - Enable interrups for PF in SR-IOV mode.

Enable the available interrupt vectors for PF in SR-IOV Mode.
Only single vector entry 192 is valid of PF. This is used to
notify any hardware errors and mailbox messages from VF(s).

Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: cavium/nitrox - crypto request format changes
Nagadheeraj, Rottela [Wed, 21 Nov 2018 07:36:58 +0000 (07:36 +0000)]
crypto: cavium/nitrox - crypto request format changes

nitrox_skcipher_crypt() will do the necessary formatting/ordering of
input and output sglists based on the algorithm requirements.
It will also accommodate the mandatory output buffers required for
NITROX hardware like Output request headers (ORH) and Completion headers.

Signed-off-by: Nagadheeraj Rottela <rottela.nagadheeraj@cavium.com>
Reviewed-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Add a 4-block AVX-512VL variant
Martin Willi [Tue, 20 Nov 2018 16:30:50 +0000 (17:30 +0100)]
crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Add a 2-block AVX-512VL variant
Martin Willi [Tue, 20 Nov 2018 16:30:49 +0000 (17:30 +0100)]
crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single block SSSE3 version
to process a single block. Hence we engage that function for (partial)
single block lengths as well.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Add a 8-block AVX-512VL variant
Martin Willi [Tue, 20 Nov 2018 16:30:48 +0000 (17:30 +0100)]
crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
any data on the stack. It uses ymm registers only to avoid the massive core
throttling on Skylake-X platforms. Nontheless does it bring a ~30% speed
improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions to work with dynamic masks is not part of
the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures provide AVX-512BW and this extension
does not affect core clocking, this seems to be no problem at least for
now.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: do not free algorithm before using
Pan Bian [Thu, 22 Nov 2018 10:00:16 +0000 (18:00 +0800)]
crypto: do not free algorithm before using

In multiple functions, the algorithm fields are read after its reference
is dropped through crypto_mod_put. In this case, the algorithm memory
may be freed, resulting in use-after-free bugs. This patch delays the
put operation until the algorithm is never used.

Fixes: 5e414f2ec18e ("crypto: cbc - Convert to skcipher")
Fixes: f703c2d64a0e ("crypto: cfb - add support for Cipher FeedBack mode")
Fixes: 528fa55fd48a ("crypto: pcbc - Convert to skcipher")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: adiantum - add Adiantum support
Eric Biggers [Sat, 17 Nov 2018 01:26:31 +0000 (17:26 -0800)]
crypto: adiantum - add Adiantum support

Add support for the Adiantum encryption mode.  Adiantum was designed by
Paul Crowley and is specified by our paper:

    Adiantum: length-preserving encryption for entry-level processors
    (https://eprint.iacr.org/2018/720.pdf)

See our paper for full details; this patch only provides an overview.

Adiantum is a tweakable, length-preserving encryption mode designed for
fast and secure disk encryption, especially on CPUs without dedicated
crypto instructions.  Adiantum encrypts each sector using the XChaCha12
stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
function, and an invocation of the AES-256 block cipher on a single
16-byte block.  On CPUs without AES instructions, Adiantum is much
faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
and decryption about 5 times faster.

Adiantum is a specialization of the more general HBSH construction.  Our
earlier proposal, HPolyC, was also a HBSH specialization, but it used a
different εA∆U hash function, one based on Poly1305 only.  Adiantum's
εA∆U hash function, which is based primarily on the "NH" hash function
like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
consequently, Adiantum is about 20% faster than HPolyC.

This speed comes with no loss of security: Adiantum is provably just as
secure as HPolyC, in fact slightly *more* secure.  Like HPolyC,
Adiantum's security is reducible to that of XChaCha12 and AES-256,
subject to a security bound.  XChaCha12 itself has a security reduction
to ChaCha12.  Therefore, one need not "trust" Adiantum; one need only
trust ChaCha12 and AES-256.  Note that the εA∆U hash function is only
used for its proven combinatorical properties so cannot be "broken".

Adiantum is also a true wide-block encryption mode, so flipping any
plaintext bit in the sector scrambles the entire ciphertext, and vice
versa.  No other such mode is available in the kernel currently; doing
the same with XTS scrambles only 16 bytes.  Adiantum also supports
arbitrary-length tweaks and naturally supports any length input >= 16
bytes without needing "ciphertext stealing".

For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
order to make encryption feasible on the widest range of devices.
Although the 20-round variant is quite popular, the best known attacks
on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
security margin; in fact, larger than AES-256's.  12-round Salsa20 is
also the eSTREAM recommendation.  For the block cipher, Adiantum uses
AES-256, despite it having a lower security margin than XChaCha12 and
needing table lookups, due to AES's extensive adoption and analysis
making it the obvious first choice.  Nevertheless, for flexibility this
patch also permits the "adiantum" template to be instantiated with
XChaCha20 and/or with an alternate block cipher.

We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
where currently the only other suitable options are block cipher modes
such as AES-XTS.  A big problem with this is that many low-end mobile
devices (e.g. Android Go phones sold primarily in developing countries,
as well as some smartwatches) still have CPUs that lack AES
instructions, e.g. ARM Cortex-A7.  Sadly, AES-XTS encryption is much too
slow to be viable on these devices.  We did find that some "lightweight"
block ciphers are fast enough, but these suffer from problems such as
not having much cryptanalysis or being too controversial.

The ChaCha stream cipher has excellent performance but is insecure to
use directly for disk encryption, since each sector's IV is reused each
time it is overwritten.  Even restricting the threat model to offline
attacks only isn't enough, since modern flash storage devices don't
guarantee that "overwrites" are really overwrites, due to wear-leveling.
Adiantum avoids this problem by constructing a
"tweakable super-pseudorandom permutation"; this is the strongest
possible security model for length-preserving encryption.

Of course, storing random nonces along with the ciphertext would be the
ideal solution.  But doing that with existing hardware and filesystems
runs into major practical problems; in most cases it would require data
journaling (like dm-integrity) which severely degrades performance.
Thus, for now length-preserving encryption is still needed.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305
Eric Biggers [Sat, 17 Nov 2018 01:26:30 +0000 (17:26 -0800)]
crypto: arm/nhpoly1305 - add NEON-accelerated NHPoly1305

Add an ARM NEON implementation of NHPoly1305, an ε-almost-∆-universal
hash function used in the Adiantum encryption mode.  For now, only the
NH portion is actually NEON-accelerated; the Poly1305 part is less
performance-critical so is just implemented in C.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: nhpoly1305 - add NHPoly1305 support
Eric Biggers [Sat, 17 Nov 2018 01:26:29 +0000 (17:26 -0800)]
crypto: nhpoly1305 - add NHPoly1305 support

Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash
function used in the Adiantum encryption mode.

CONFIG_NHPOLY1305 is not selectable by itself since there won't be any
real reason to enable it without also enabling Adiantum support.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: poly1305 - add Poly1305 core API
Eric Biggers [Sat, 17 Nov 2018 01:26:28 +0000 (17:26 -0800)]
crypto: poly1305 - add Poly1305 core API

Expose a low-level Poly1305 API which implements the
ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305 MAC
and supports block-aligned inputs only.

This is needed for Adiantum hashing, which builds an εA∆U hash function
from NH and a polynomial evaluation in GF(2^{130}-5); this polynomial
evaluation is identical to the one the Poly1305 MAC does.  However, the
crypto_shash Poly1305 API isn't very appropriate for this because its
calling convention assumes it is used as a MAC, with a 32-byte "one-time
key" provided for every digest.

But by design, in Adiantum hashing the performance of the polynomial
evaluation isn't nearly as critical as NH.  So it suffices to just have
some C helper functions.  Thus, this patch adds such functions.

Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: poly1305 - use structures for key and accumulator
Eric Biggers [Sat, 17 Nov 2018 01:26:27 +0000 (17:26 -0800)]
crypto: poly1305 - use structures for key and accumulator

In preparation for exposing a low-level Poly1305 API which implements
the ε-almost-∆-universal (εA∆U) hash function underlying the Poly1305
MAC and supports block-aligned inputs only, create structures
poly1305_key and poly1305_state which hold the limbs of the Poly1305
"r" key and accumulator, respectively.

These structures could actually have the same type (e.g. poly1305_val),
but different types are preferable, to prevent misuse.

Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: arm/chacha - add XChaCha12 support
Eric Biggers [Sat, 17 Nov 2018 01:26:26 +0000 (17:26 -0800)]
crypto: arm/chacha - add XChaCha12 support

Now that the 32-bit ARM NEON implementation of ChaCha20 and XChaCha20
has been refactored to support varying the number of rounds, add support
for XChaCha12.  This is identical to XChaCha20 except for the number of
rounds, which is 12 instead of 20.

XChaCha12 is faster than XChaCha20 but has a lower security margin,
though still greater than AES-256's since the best known attacks make it
through only 7 rounds.  See the patch "crypto: chacha - add XChaCha12
support" for more details about why we need XChaCha12 support.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: arm/chacha20 - refactor to allow varying number of rounds
Eric Biggers [Sat, 17 Nov 2018 01:26:25 +0000 (17:26 -0800)]
crypto: arm/chacha20 - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor the NEON
implementation of ChaCha20 to support different numbers of rounds.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: arm/chacha20 - add XChaCha20 support
Eric Biggers [Sat, 17 Nov 2018 01:26:24 +0000 (17:26 -0800)]
crypto: arm/chacha20 - add XChaCha20 support

Add an XChaCha20 implementation that is hooked up to the ARM NEON
implementation of ChaCha20.  This is needed for use in the Adiantum
encryption mode; see the generic code patch,
"crypto: chacha20-generic - add XChaCha20 support", for more details.

We also update the NEON code to support HChaCha20 on one block, so we
can use that in XChaCha20 rather than calling the generic HChaCha20.
This required factoring the permutation out into its own macro.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: arm/chacha20 - limit the preemption-disabled section
Eric Biggers [Sat, 17 Nov 2018 01:26:23 +0000 (17:26 -0800)]
crypto: arm/chacha20 - limit the preemption-disabled section

To improve responsivesess, disable preemption for each step of the walk
(which is at most PAGE_SIZE) rather than for the entire
encryption/decryption operation.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: chacha - add XChaCha12 support
Eric Biggers [Sat, 17 Nov 2018 01:26:22 +0000 (17:26 -0800)]
crypto: chacha - add XChaCha12 support

Now that the generic implementation of ChaCha20 has been refactored to
allow varying the number of rounds, add support for XChaCha12, which is
the XSalsa construction applied to ChaCha12.  ChaCha12 is one of the
three ciphers specified by the original ChaCha paper
(https://cr.yp.to/chacha/chacha-20080128.pdf: "ChaCha, a variant of
Salsa20"), alongside ChaCha8 and ChaCha20.  ChaCha12 is faster than
ChaCha20 but has a lower, but still large, security margin.

We need XChaCha12 support so that it can be used in the Adiantum
encryption mode, which enables disk/file encryption on low-end mobile
devices where AES-XTS is too slow as the CPUs lack AES instructions.

We'd prefer XChaCha20 (the more popular variant), but it's too slow on
some of our target devices, so at least in some cases we do need the
XChaCha12-based version.  In more detail, the problem is that Adiantum
is still much slower than we're happy with, and encryption still has a
quite noticeable effect on the feel of low-end devices.  Users and
vendors push back hard against encryption that degrades the user
experience, which always risks encryption being disabled entirely.  So
we need to choose the fastest option that gives us a solid margin of
security, and here that's XChaCha12.  The best known attack on ChaCha
breaks only 7 rounds and has 2^235 time complexity, so ChaCha12's
security margin is still better than AES-256's.  Much has been learned
about cryptanalysis of ARX ciphers since Salsa20 was originally designed
in 2005, and it now seems we can be comfortable with a smaller number of
rounds.  The eSTREAM project also suggests the 12-round version of
Salsa20 as providing the best balance among the different variants:
combining very good performance with a "comfortable margin of security".

Note that it would be trivial to add vanilla ChaCha12 in addition to
XChaCha12.  However, it's unneeded for now and therefore is omitted.

As discussed in the patch that introduced XChaCha20 support, I
considered splitting the code into separate chacha-common, chacha20,
xchacha20, and xchacha12 modules, so that these algorithms could be
enabled/disabled independently.  However, since nearly all the code is
shared anyway, I ultimately decided there would have been little benefit
to the added complexity.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: chacha20-generic - refactor to allow varying number of rounds
Eric Biggers [Sat, 17 Nov 2018 01:26:21 +0000 (17:26 -0800)]
crypto: chacha20-generic - refactor to allow varying number of rounds

In preparation for adding XChaCha12 support, rename/refactor
chacha20-generic to support different numbers of rounds.  The
justification for needing XChaCha12 support is explained in more detail
in the patch "crypto: chacha - add XChaCha12 support".

The only difference between ChaCha{8,12,20} are the number of rounds
itself; all other parts of the algorithm are the same.  Therefore,
remove the "20" from all definitions, structures, functions, files, etc.
that will be shared by all ChaCha versions.

Also make ->setkey() store the round count in the chacha_ctx (previously
chacha20_ctx).  The generic code then passes the round count through to
chacha_block().  There will be a ->setkey() function for each explicitly
allowed round count; the encrypt/decrypt functions will be the same.  I
decided not to do it the opposite way (same ->setkey() function for all
round counts, with different encrypt/decrypt functions) because that
would have required more boilerplate code in architecture-specific
implementations of ChaCha and XChaCha.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: chacha20-generic - add XChaCha20 support
Eric Biggers [Sat, 17 Nov 2018 01:26:20 +0000 (17:26 -0800)]
crypto: chacha20-generic - add XChaCha20 support

Add support for the XChaCha20 stream cipher.  XChaCha20 is the
application of the XSalsa20 construction
(https://cr.yp.to/snuffle/xsalsa-20081128.pdf) to ChaCha20 rather than
to Salsa20.  XChaCha20 extends ChaCha20's nonce length from 64 bits (or
96 bits, depending on convention) to 192 bits, while provably retaining
ChaCha20's security.  XChaCha20 uses the ChaCha20 permutation to map the
key and first 128 nonce bits to a 256-bit subkey.  Then, it does the
ChaCha20 stream cipher with the subkey and remaining 64 bits of nonce.

We need XChaCha support in order to add support for the Adiantum
encryption mode.  Note that to meet our performance requirements, we
actually plan to primarily use the variant XChaCha12.  But we believe
it's wise to first add XChaCha20 as a baseline with a higher security
margin, in case there are any situations where it can be used.
Supporting both variants is straightforward.

Since XChaCha20's subkey differs for each request, XChaCha20 can't be a
template that wraps ChaCha20; that would require re-keying the
underlying ChaCha20 for every request, which wouldn't be thread-safe.
Instead, we make XChaCha20 its own top-level algorithm which calls the
ChaCha20 streaming implementation internally.

Similar to the existing ChaCha20 implementation, we define the IV to be
the nonce and stream position concatenated together.  This allows users
to seek to any position in the stream.

I considered splitting the code into separate chacha20-common, chacha20,
and xchacha20 modules, so that chacha20 and xchacha20 could be
enabled/disabled independently.  However, since nearly all the code is
shared anyway, I ultimately decided there would have been little benefit
to the added complexity of separate modules.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: chacha20-generic - don't unnecessarily use atomic walk
Eric Biggers [Sat, 17 Nov 2018 01:26:19 +0000 (17:26 -0800)]
crypto: chacha20-generic - don't unnecessarily use atomic walk

chacha20-generic doesn't use SIMD instructions or otherwise disable
preemption, so passing atomic=true to skcipher_walk_virt() is
unnecessary.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: chacha20-generic - add HChaCha20 library function
Eric Biggers [Sat, 17 Nov 2018 01:26:18 +0000 (17:26 -0800)]
crypto: chacha20-generic - add HChaCha20 library function

Refactor the unkeyed permutation part of chacha20_block() into its own
function, then add hchacha20_block() which is the ChaCha equivalent of
HSalsa20 and is an intermediate step towards XChaCha20 (see
https://cr.yp.to/snuffle/xsalsa-20081128.pdf).  HChaCha20 skips the
final addition of the initial state, and outputs only certain words of
the state.  It should not be used for streaming directly.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations
Eric Biggers [Wed, 14 Nov 2018 20:21:11 +0000 (12:21 -0800)]
crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations

'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC
in the mask to crypto_alloc_shash() has no effect.  Many users therefore
already don't pass it, but some still do.  This inconsistency can cause
confusion, especially since the way the 'mask' argument works is
somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations
Eric Biggers [Wed, 14 Nov 2018 20:19:39 +0000 (12:19 -0800)]
crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations

'cipher' algorithms (single block ciphers) are always synchronous, so
passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
effect.  Many users therefore already don't pass it, but some still do.
This inconsistency can cause confusion, especially since the way the
'mask' argument works is somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: remove useless initializations of cra_list
Eric Biggers [Wed, 14 Nov 2018 19:35:48 +0000 (11:35 -0800)]
crypto: remove useless initializations of cra_list

Some algorithms initialize their .cra_list prior to registration.
But this is unnecessary since crypto_register_alg() will overwrite
.cra_list when adding the algorithm to the 'crypto_alg_list'.
Apparently the useless assignment has just been copy+pasted around.

So, remove the useless assignments.

Exception: paes_s390.c uses cra_list to check whether the algorithm is
registered or not, so I left that as-is for now.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: inside-secure - remove useless setting of type flags
Eric Biggers [Wed, 14 Nov 2018 19:10:53 +0000 (11:10 -0800)]
crypto: inside-secure - remove useless setting of type flags

Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
Commit 917a4da40338 ("crypto: skcipher - remove useless setting of type
flags") took care of this everywhere else, but a few more instances made
it into the tree at about the same time.  Squash them before they get
copy+pasted around again.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: ecc - regularize scalar for scalar multiplication
Vitaly Chikunov [Sun, 11 Nov 2018 17:40:02 +0000 (20:40 +0300)]
crypto: ecc - regularize scalar for scalar multiplication

ecc_point_mult is supposed to be used with a regularized scalar,
otherwise, it's possible to deduce the position of the top bit of the
scalar with timing attack. This is important when the scalar is a
private key.

ecc_point_mult is already using a regular algorithm (i.e. having an
operation flow independent of the input scalar) but regularization step
is not implemented.

Arrange scalar to always have fixed top bit by adding a multiple of the
curve order (n).

References:
The constant time regularization step is based on micro-ecc by Kenneth
MacKay and also referenced in the literature (Bernstein, D. J., & Lange,
T. (2017). Montgomery curves and the Montgomery ladder. (Cryptology
ePrint Archive; Vol. 2017/293). s.l.: IACR. Chapter 4.6.2.)

Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Cc: kernel-hardening@lists.openwall.com
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Add a 4-block AVX2 variant
Martin Willi [Sun, 11 Nov 2018 09:36:30 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Add a 4-block AVX2 variant

This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.

Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.

The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Add a 2-block AVX2 variant
Martin Willi [Sun, 11 Nov 2018 09:36:29 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Add a 2-block AVX2 variant

This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.

This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Use larger block functions more aggressively
Martin Willi [Sun, 11 Nov 2018 09:36:28 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Use larger block functions more aggressively

Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant
Martin Willi [Sun, 11 Nov 2018 09:36:27 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant

Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.

To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant
Martin Willi [Sun, 11 Nov 2018 09:36:26 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant

Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.

As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
6 years agocrypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant
Martin Willi [Sun, 11 Nov 2018 09:36:25 +0000 (10:36 +0100)]
crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant

Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.

The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>