Fastplay-A Parallelization Model and Implementation of SMC on CUDA based GPU Cluster Architecture

نویسندگان

  • Shi Pu
  • Pu Duan
  • Jyh-Charn Liu
چکیده

We propose a four-tiered parallelization model for acceleration of the secure multiparty computation (SMC) on the CUDA based Graphic Processing Unit (GPU) cluster architecture. Specification layer is the top layer, which adopts the SFDL of Fairplay for specification of secure computations. The SHDL file generated by the SFDL compiler of Fairplay is used as inputs to the function layer, for which we developed both multi-core and GPU based control functions for garbling of various types of Boolean gates, and ECC-based 1-out-of-2 Oblivious Transfer (OT). These high level control functions invoke computation of 3-DGG (3-DES gate garbling), EGG (ECC based gate garbling), and ECC based OT that run at the secure protocol layer. An ECC Arithmetic GPU Library (EAGL), which co-run on the GPU cluster and its host, manages utilization of GPUs in parallel computing of ECC arithmetic. Experimental results show highly linear acceleration of ECC related computations when the system is not overloaded; When running on a GPU cluster consisted of 6 Tesla C870 devices, with GPU devices fully loaded with over 3000 execution threads, Fastplay achieved 35~40 times of acceleration over a serial implementation running on a 2.53GHz duo core CPU and 4GB memory. When the execution thread count exceeds this number, the speed up factor remains fairly constant, yet slightly increased. Keywordssecure multiparty computation (SMC), Oblivious Transfer (OT), ECC, Graphic Processing Unit (GPU)

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...

متن کامل

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods to make an efficient strategy for steganalysis of digital images. A very powerful method in this area is rich model consisting of a large number of diverse sub-models in both spatial and transform domain that should be utilized. However, the extraction of a various types of features from an image is so time consuming in some steps, especially for training pha...

متن کامل

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

متن کامل

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, Finite Difference Method (FDM) is widely used for acoustic and elastic full-wave form modeling. An inevitable deficit of the technique, however, is its sever requirement to computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...

متن کامل

K-Means on the Graphics Processor: Design And Experimental Analysis

Apart from algorithmic improvements many intensive machine learning algorithms can gain performance by parallelization. Programmable graphics processing units (GPU) offer a highly data parallel architecture that is suitable for many computational tasks in machine learning. We present an optimized k-means implementation on the graphics processing unit. NVIDIA’s Compute Unified Device Architectur...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IACR Cryptology ePrint Archive

دوره 2011  شماره 

صفحات  -

تاریخ انتشار 2011