On September 28, 2009, a Genesis GE-i940 Tesla workstation, based on both GPGPU* and NVIDIA CUDA** technologies, was installed at DSA/LabMNCP.

It is a testbed for developing advanced simulations in the following research fields:

  • Stochastic simulation;
  • Molecular Dynamics;
  • Atmospheric and climate modeling;
  • Weather forecast investigation;
  • Grid/Cloud Hybrid Virtualization.

* “GPGPU stands for General-Purpose computation on Graphics Processing Units, also known as GPU Computing. Graphics Processing Units (GPUs) are high-performance many-core processors capable of very high computation and data throughput. See more here.”

** “NVIDIA® CUDA™ is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex computational problems in a fraction of the time required on a CPU. See more here.”

Hardware

Mainboard

Asus X58/ICH10R, 3 PCI-Express x16, 6 SATA, 2 SAS, 3+6 USB

CPU

Intel Core i7-940, 2.93 GHz, 133 MHz FSB, quad core, 8 MB cache

RAM

6 x 2 GB DDR3-1333 DIMM

Hard Disk

2 x 500 GB SATA, 16 MB cache, 7,200 RPM

GPU

1 x Quadro FX 5800, 4 GB RAM

2 x Tesla C1060, 4 GB RAM

Software

OS: GNU/Linux CentOS 5.3, 64-bit

Driver: NVIDIA CUDA 180.22, Linux 64-bit

VMware: VMware-server-2.0.2

OUTPUT of First Test:

                              Serial simulation (ms)    GPU (ms)
execution time for malloc                       0.02      175.21
execution time for RndGnr                   51430.92     2283.19
execution time for init                       275.48        0.31
execution time for computing               391391.12      329.19
execution time for I/O                      56822.77    64740.54
execution time for GPU/CPU                                198.43
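
As a worked example of reading the table above, the per-phase speedup is the serial time divided by the GPU time. The sketch below is not part of the original test program: it copies the timings from the table, the names `timings_ms`, `gpu_only_ms`, `speedup` and `totals` are illustrative, and summing the rows into an end-to-end time assumes the phases do not overlap.

```python
# Timings in milliseconds, copied from the first-test table: (serial, GPU).
timings_ms = {
    "malloc": (0.02, 175.21),
    "RndGnr": (51430.92, 2283.19),
    "init": (275.48, 0.31),
    "computing": (391391.12, 329.19),
    "I/O": (56822.77, 64740.54),
}

# The GPU/CPU row has a single value, which applies to the GPU run only.
gpu_only_ms = {"GPU/CPU": 198.43}


def speedup(serial_ms, gpu_ms):
    """Return how many times faster the GPU run is for one phase."""
    return serial_ms / gpu_ms


def totals(timings, gpu_only):
    """Sum all phases (assumption: phases run back to back, no overlap)."""
    serial_total = sum(serial for serial, _ in timings.values())
    gpu_total = sum(gpu for _, gpu in timings.values()) + sum(gpu_only.values())
    return serial_total, gpu_total


if __name__ == "__main__":
    for phase, (serial, gpu) in timings_ms.items():
        print(f"{phase:10s} speedup: {speedup(serial, gpu):10.2f}x")
    serial_total, gpu_total = totals(timings_ms, gpu_only_ms)
    print(f"end to end speedup: {speedup(serial_total, gpu_total):10.2f}x")
```

With these figures the computing phase comes out roughly 1189 times faster on the GPU, while the end-to-end ratio is only about 7.4, because I/O takes roughly a minute in both runs.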

Output using GPU:
device 0           : Quadro FX 5800
device 1           : Tesla C1060
device 2           : Tesla C1060

Selected device: 2 <<<<<<<<<<<<<<<<<<

device 2           : Tesla C1060
major/minor        : 1.3 compute capability
Total global mem   : -262144 bytes
Shared block mem   : 16384 bytes
RegsPerBlock       : 16384
WarpSize           : 32
MaxThreadsPerBlock : 512
TotalConstMem      : 65536 bytes
ClockRate          : 1296000 (kHz)
deviceOverlap      : 1
MultiProcessorCount: 30
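
The negative value printed for Total global mem is most likely a display artifact rather than a hardware reading: a byte count just under 4 GiB does not fit in a signed 32-bit integer. The sketch below assumes that the test program printed the memory size through a signed 32-bit format (the source does not show the printing code); `as_signed_32` and `as_unsigned_32` are illustrative helpers, not part of any CUDA API.

```python
def as_signed_32(value):
    """Reinterpret the low 32 bits of value as a two's-complement signed int."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def as_unsigned_32(value):
    """Reinterpret the low 32 bits of value as an unsigned int."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    printed = -262144                      # value shown in the device output
    recovered = as_unsigned_32(printed)    # byte count under the assumption above
    print(recovered, "bytes =", recovered / 2**20, "MiB")
    # Round trip: the recovered size wraps back to the printed value.
    print(as_signed_32(recovered))
```

Under that assumption the recovered size is 4294705152 bytes, that is 4 GiB minus 256 KiB, which is consistent with the 4 GB listed for the Tesla C1060 in the hardware section. Only the low 32 bits can be recovered this way.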

******
Using 1048576 particles
100 time steps
******
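
The run parameters can be related to the device limits listed earlier. A minimal sketch, assuming one thread per particle and a one-dimensional grid (the source does not state how the simulation maps particles to threads); `blocks_needed` is an illustrative helper, not a function of the test program.

```python
N_PARTICLES = 1048576            # from the run output
MAX_THREADS_PER_BLOCK = 512      # MaxThreadsPerBlock reported for the Tesla C1060


def blocks_needed(n_items, threads_per_block):
    """Ceiling division: smallest block count that covers every item."""
    return (n_items + threads_per_block - 1) // threads_per_block


if __name__ == "__main__":
    blocks = blocks_needed(N_PARTICLES, MAX_THREADS_PER_BLOCK)
    print(f"{blocks} blocks of {MAX_THREADS_PER_BLOCK} threads")
```

Under these assumptions the 1048576 particles (2^20) fill exactly 2048 blocks of 512 threads, which is within the 65535-block limit per grid dimension on compute capability 1.3 devices. A smaller block size would raise the block count proportionally.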