SPEED-B – Software performance enhancement for encryption and decryption, and benchmarking

Abstracts

Diego F. Aranha, Lightweight cryptography on ARM

We describe techniques for implementing cryptographic algorithms in software for resource-constrained ARM devices. The target platforms are the Cortex-M and Cortex-A families of processors typical of embedded systems, which sit in the middle to lower end of the ARM spectrum of architectures. The implementations include the Fantomas and PRESENT lightweight block ciphers and curve-based primitives for key exchange and digital signatures. We substantially improve on the state-of-the-art implementations of these algorithms, in terms of either efficiency or compactness, by making use of novel algorithmic techniques and features specific to the target platforms.

Daniel J. Bernstein, Benchmarking benchmarking, and optimizing optimization

Manual software optimization of a single performance-critical function often involves experimentally evaluating the performance of hundreds of implementations of that function. This talk will analyze the performance of various aspects of the evaluation process.

Agner Fog, Optimizing software performance using vector instructions

Microprocessor manufacturers have trouble keeping up with Moore's law because of physical limitations. Their answer is increased parallelism in the form of multiple CPU cores and vector instructions (Single Instruction, Multiple Data, or SIMD). This is a challenge for software developers, who have to adapt to a moving target of new instruction set additions and increasing vector sizes. Most of the software industry is lagging several years behind the available hardware because of these problems. Further challenges come from tasks that cannot easily be executed with vector instructions, such as sequential algorithms and lookup tables. The talk will discuss methods for overcoming these problems and utilizing the continuously growing power of microprocessors on the market. A few problems relevant to cryptographic software will be covered, and the outlook for the future will be discussed.

Philipp Jovanovic, Improved Masking for Tweakable Blockciphers with Applications to Authenticated Encryption

A popular approach to tweakable blockcipher design is via masking, where a certain primitive (a blockcipher or a permutation) is preceded and followed by an easy-to-compute tweak-dependent mask. We revisit the principle of masking and introduce the tweakable Even-Mansour construction MEM. Its masking function combines the advantages of word-oriented LFSR- and powering-up-based methods. We show in particular how recent advancements in computing discrete logarithms over finite fields of characteristic 2 can be exploited in a constructive way to realize highly efficient, constant-time masking functions. If the masking satisfies a set of simple conditions, then MEM is a secure tweakable blockcipher up to the birthday bound. The strengths of MEM are exhibited by the design of the fully parallelizable authenticated encryption schemes OPP (nonce-respecting) and MRO (misuse-resistant). If instantiated with a reduced-round BLAKE2b permutation, OPP and MRO achieve speeds of up to 0.55 and 1.06 cycles per byte, respectively, on the Intel Haswell microarchitecture, and are able to significantly outperform their closest competitors.
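
To make the "mask before and after the primitive" idea concrete, here is a minimal toy sketch in Python. It is not MEM, OPP, or MRO: the 64-bit permutation `toy_perm`, the way the initial mask is derived from a key and nonce, and all function names are assumptions made up for illustration, and the mask update uses plain powering-up (doubling in GF(2^64)) rather than the word-oriented LFSR masking described in the abstract. It offers no security and is not constant-time.

```python
# Toy illustration of a masked (tweakable Even-Mansour style) call:
#     E(key, tweak, m) = P(m XOR mask) XOR mask
# Everything here is hypothetical and insecure; it only shows the structure.

MASK64 = (1 << 64) - 1
ODD = 0x9E3779B97F4A7C15           # odd constant, so invertible modulo 2**64
ODD_INV = pow(ODD, -1, 1 << 64)    # modular inverse (needs Python 3.8+)


def rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK64


def toy_perm(x):
    """A made-up invertible 64-bit permutation standing in for P."""
    for rc in range(4):
        x = (x * ODD) & MASK64
        x = rotl(x, 23) ^ rc
    return x


def toy_perm_inv(x):
    """Inverse of toy_perm: undo the rounds in reverse order."""
    for rc in reversed(range(4)):
        x = rotr(x ^ rc, 23)
        x = (x * ODD_INV) & MASK64
    return x


def double(x):
    """Multiply by x in GF(2^64) modulo x^64 + x^4 + x^3 + x + 1."""
    x <<= 1
    if x >> 64:
        x ^= (1 << 64) | 0x1B
    return x


def mask(key, nonce, index):
    """Tweak-dependent mask: start from P(key XOR nonce), double index times."""
    delta = toy_perm(key ^ nonce)
    for _ in range(index):
        delta = double(delta)
    return delta


def encrypt_block(key, nonce, index, m):
    d = mask(key, nonce, index)
    return toy_perm(m ^ d) ^ d


def decrypt_block(key, nonce, index, c):
    d = mask(key, nonce, index)
    return toy_perm_inv(c ^ d) ^ d
```

Because the mask for block `index + 1` is one cheap doubling away from the mask for block `index`, each block can be processed independently once its mask is known, which is the property that makes masking-based modes parallelizable.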

Jens-Peter Kaps, eXtended eXternal Benchmarking eXtension (XXBX)

Embedded devices are becoming increasingly interconnected through the move to the Internet of Things (IoT). Formerly "dumb" appliances now have the ability to connect to the Internet. Therefore, their communications need to be secured via encryption.
The eXternal Benchmarking eXtension (XBX) was developed as an extension to the System for Unified Performance Evaluation Related to Cryptographic Operations and Primitives (SUPERCOP) to evaluate the performance of hash functions on several microcontrollers. As part of the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR), we overhauled XBX to cover Authenticated Encryption with Associated Data (AEAD) ciphers, ported the test harness to a more capable platform, and proposed a means to measure power. As part of this work we completely rewrote a portion of the software in Python 3 and now store the results in a SQLite database. An effort is currently underway to support more target devices and to benchmark all round-3 candidates of the CAESAR competition.
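
As a rough idea of what storing benchmark results in SQLite from Python 3 can look like, the sketch below uses only the standard `sqlite3` module. The table layout, column names, and helper functions are assumptions for illustration and are not taken from XXBX; the values in the usage lines are placeholders, not measurements.

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a results database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               id        INTEGER PRIMARY KEY,
               platform  TEXT    NOT NULL,  -- target device name
               algorithm TEXT    NOT NULL,  -- AEAD candidate name
               msg_len   INTEGER NOT NULL,  -- message length in bytes
               cycles    INTEGER NOT NULL   -- measured cycle count
           )"""
    )
    return conn


def record(conn, platform, algorithm, msg_len, cycles):
    """Insert one measurement; the 'with' block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO results (platform, algorithm, msg_len, cycles) "
            "VALUES (?, ?, ?, ?)",
            (platform, algorithm, msg_len, cycles),
        )


def cycles_per_byte(conn, platform, algorithm):
    """Return (msg_len, cycles / msg_len) pairs, ordered by message length."""
    rows = conn.execute(
        "SELECT msg_len, cycles FROM results "
        "WHERE platform = ? AND algorithm = ? AND msg_len > 0 "
        "ORDER BY msg_len",
        (platform, algorithm),
    ).fetchall()
    return [(n, c / n) for n, c in rows]


# Usage with placeholder names and numbers (not real benchmark data).
conn = open_results_db()
record(conn, "board_x", "aead_a", 64, 1280)
record(conn, "board_x", "aead_a", 16, 640)
print(cycles_per_byte(conn, "board_x", "aead_a"))
```

Keeping raw cycle counts in the table and deriving cycles per byte at query time means the same rows can later be re-analyzed in other ways without rerunning the measurements.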

Rich Salz, What we need and want from you

Like the story of the tree falling in the forest, what good is the world's fastest cryptography if nobody uses it?
I'll talk about the major user communities these days -- browsers, standards, and open source. I'll talk about how they work, and how you can work with them.