Parallel Improved Schnorr-Euchner Enumeration SE++ on Shared and Distributed Memory Systems, With and Without Extreme Pruning

Fábio Correia¹⁺, Artur Mariano¹, Alberto Proença², Christian Bischof¹, and Erik Agrell³

¹Darmstadt University of Technology, Darmstadt, Germany

fabio.lei.67@gmail.com, artur.mariano@sc.tu-darmstadt.de, christian.bischof@tu-darmstadt.de

²University of Minho, Braga, Portugal

aproenca@di.uminho.pt

³Chalmers University of Technology, Gothenburg, Sweden

agrell@chalmers.se

Abstract

The security of lattice-based cryptography relies on the hardness of problems based on lattices, such as the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP). This paper presents two parallel implementations for the SE++ with and without extreme pruning. The SE++ is an enumeration-based CVP-solver, which can be easily adapted to solve the SVP. We improved the SVP version of the SE++ with an optimization that avoids symmetric branches, improving its performance by a factor of » 50%, and applied the extreme pruning technique to this improved version. The extreme pruning technique is the fastest way to compute the SVP with enumeration known to date. It solves the SVP for lattices in much higher dimensions in less time than implementations without extreme pruning. Our parallel implementation of the SE++ with extreme pruning targets distributed memory multi-core CPU systems, while our SE++ without extreme pruning is designed for shared memory multi-core CPU systems. These implementations address load balancing problems for optimal performance, with a master-slave mechanism on the distributed memory implementation, and specific bounds for task creation on the shared memory implementation. The parallel implementation for the SE++ without extreme pruning scales linearly for up to 8 threads and almost linearly for 16 threads. In addition, it also achieves super-linear speedups on some instances, as the workload may be shortened, since some threads may find shorter vectors at earlier points in time, compared to the sequential implementation. Tests with our Improved SE++ implementation showed that it outperforms the state of the art implementation by a factor of between 35% and 60%, while maintaining a scalability similar to the SE++ implementation. Our parallel implementation of the SE++ with extreme pruning achieves linear speedups for up to 8 (working) processes and speedups of up to 13x for 16 (working) processes.

Keywords: Enumeration, Parallel, Shared Memory, Distributed Memory, OpenMP, MPI

+: Corresponding author: Fábio Correia
Institute for Scientific Computing, Darmstadt University of Technology, Mornewegstraße 30, 64283 Darmstadt, Germany, Tel: +49 6151 16-27278

Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA)
Vol. 7, No. 4, pp. 1-19, December 2016 [pdf]