Parallel
Improved Schnorr-Euchner Enumeration SE++ on Shared
and Distributed Memory Systems, With and Without Extreme Pruning 1Darmstadt
University of Technology, Darmstadt, Germany fabio.lei.67@gmail.com,
artur.mariano@sc.tu-darmstadt.de, christian.bischof@tu-darmstadt.de 2University
of Minho, Braga, Portugal aproenca@di.uminho.pt 3Chalmers
University of Technology, Gothenburg, Sweden agrell@chalmers.se Abstract The security of lattice-based cryptography relies on
the hardness of problems based on lattices, such as the Shortest Vector
Problem (SVP) and the Closest Vector Problem (CVP). This paper presents two parallel
implementations for the SE++ with and without extreme pruning. The SE++ is an
enumeration-based CVP-solver, which can be easily adapted to solve the SVP.
We improved the SVP version of the SE++ with an optimization that avoids
symmetric branches, improving its performance by a factor of » 50%, and
applied the extreme pruning technique to this improved version. The extreme
pruning technique is the fastest way to compute the SVP with enumeration
known to date. It solves the SVP for lattices in much higher dimensions in
less time than implementations without extreme pruning. Our parallel
implementation of the SE++ with extreme pruning targets distributed memory
multi-core CPU systems, while our SE++ without extreme pruning is designed
for shared memory multi-core CPU systems. These implementations address load
balancing problems for optimal performance, with a master-slave mechanism on
the distributed memory implementation, and specific bounds for task creation
on the shared memory implementation. The parallel implementation for the SE++
without extreme pruning scales linearly for up to 8 threads and almost
linearly for 16 threads. In addition, it also achieves super-linear speedups
on some instances, as the workload may be shortened, since some threads may
find shorter vectors at earlier points in time, compared to the sequential
implementation. Tests with our Improved SE++ implementation showed that it
outperforms the state of the art implementation by a factor of between 35%
and 60%, while maintaining a scalability similar to
the SE++ implementation. Our parallel implementation of the SE++ with extreme
pruning achieves linear speedups for up to 8 (working) processes and speedups
of up to 13x for 16 (working) processes. Keywords: Enumeration, Parallel, Shared Memory,
Distributed Memory, OpenMP, MPI +: Corresponding author: Fábio
Correia Journal
of Wireless Mobile Networks, Ubiquitous Computing, and Dependable
Applications (JoWUA) |