Math 7924 – Readings in Computational Mathematics, Spring 2009

Discovering GPU Computing

 

Jan Mandel

 


 

Course materials

 

Class presentations

 

Jan 27: introduction slides, first sample code, glimpse of my finite difference code

Feb 5 (exception, Thursday): CUDA basics, example from Dr. Dobb’s

Feb 10: compiling and running CUDA code, including the Laplacian challenge

Feb 17: walkthrough of the Laplacian model code; how they did it in SFB-Report-2008-025

Feb 24: running on the installed Tesla cards. Get the code on ws150, tesla1, or tesla2.


Instructions to run


ssh to tesla1, tesla2, or ws150
on tesla*, use device 0 or 1 (Tesla card)
on ws150, use device 0 (8800GT card)


To get the code, copy it into your current directory:

cp -a /mathhome/faculty/jmandel/public_html/jmcuda/laplace .


If you use csh, put these lines in your ~/.cshrc:
setenv CUDA /opt/cuda
setenv PATH "/opt/g95/g95-install/bin:${CUDA}/bin:${PATH}"
setenv LD_LIBRARY_PATH ${CUDA}/lib
setenv CUDA_LAUNCH_BLOCKING 1
setenv F90 g95
setenv FC $F90


If you use bash, put these lines in your ~/.bashrc:
export CUDA="/opt/cuda"
export PATH="/opt/g95/g95-install/bin:${CUDA}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA}/lib"
export CUDA_LAUNCH_BLOCKING="1"
export F90="g95"
export FC="$F90"




 

Codes

 

spmtv - my first CUDA code, sparse matrix-vector multiplication. Quick and dirty; it runs slowly, but at least it compiles and gives correct results.

laplace - a very simple demo for the Laplace equation; runs 2x faster than the host code on a MacBook Pro.
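For orientation, here is a minimal host-side sketch in plain C of the two computations named above. It is not the course code. The storage format used by spmtv and the scheme used by laplace are not stated on this page, so compressed sparse row (CSR) storage and a 5-point Jacobi sweep are assumptions, and the function names csr_spmv and jacobi_sweep are hypothetical.

```c
#include <stddef.h>

/* Assumed CSR storage (not confirmed to be what spmtv uses):
   row i owns entries row_ptr[i] .. row_ptr[i+1]-1 of val and col_idx.
   Computes y = A*x for a matrix A with n rows. */
void csr_spmv(size_t n, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Assumed scheme (not confirmed to be what laplace uses):
   one Jacobi sweep for the 2D Laplace equation on an nx-by-ny grid
   stored row by row. Each interior value becomes the average of its
   four neighbors; boundary values are copied unchanged.
   u and u_new must be distinct arrays of nx*ny doubles. */
void jacobi_sweep(size_t nx, size_t ny, const double *u, double *u_new)
{
    for (size_t j = 0; j < ny; ++j) {
        for (size_t i = 0; i < nx; ++i) {
            size_t c = j * nx + i;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1)
                u_new[c] = u[c];
            else
                u_new[c] = 0.25 * (u[c - 1] + u[c + 1] +
                                   u[c - nx] + u[c + nx]);
        }
    }
}
```

In both functions each output entry depends only on input data, so the outer loops have independent iterations. That is why a common GPU mapping assigns one thread per matrix row or per grid point, although the codes listed here may be organized differently.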

 

Dr. Dobb’s: CUDA, Supercomputing for the Masses by Rob Farber

 

http://www.ddj.com/architect/207200659 Part 1: Introduction, performance to be gained

http://www.ddj.com/architect/207402986 Part 2: The first example

http://www.ddj.com/architect/207603131 Part 3: Error handling, global memory performance limitations

http://www.ddj.com/architect/208401741 Part 4: CUDA execution and memory model

http://www.ddj.com/architect/208801731 Part 5: CUDA memory kinds and performance comparison

http://www.ddj.com/architect/208801731?pgno=2 Part 5-2: Shared memory example

http://www.ddj.com/architect/209601096 Part 6: Global memory and the CUDA profiler

http://www.ddj.com/hpc-high-performance-computing/215900921 Part 11: Texture memory

NVIDIA

 

NVIDIA_CUDA_Programming_Guide_2.0.pdf - the official documentation

CudaReferenceManual_2.0.pdf

http://forums.nvidia.com/index.php?showtopic=83825 Sparse matrix vector multiplication

 

 

Other links

 

http://www.cis.udel.edu/~cavazos/cisc879/Lecture-04.ppt Nvidia CUDA Programming Basics pdf version

http://mc.stanford.edu/cgi-bin/images/0/0a/M02_4.pdf  Optimizing CUDA

www.cardiff.ac.uk/arcca/services/events/NovelArchitecture/Mike-Giles.pdf - GPUs - the next big advance in HPC?

http://www.ast.cam.ac.uk/~stg20/cuda/programming/index.html CUDA programming notes

 

 

From NVIDIA:

 

http://www.nvidia.com/cuda NVIDIA CUDA home page: download SDK, drivers, documentation

 

http://www.nvidia.com/object/cuda_education.html - classes

 

http://www.nvidia.com./tesla - about the Tesla hardware

 

 

From Manfred Liebmann, University of Graz, Austria

 

 

SFB-Report-2008-025 Comparing CUDA and OpenGL for Jacobi Iteration 

 

Seminar Slides:

http://www.uni-graz.at/~liebma/CUDA/CUDA1.pdf

http://www.uni-graz.at/~liebma/CUDA/CUDA2.pdf

http://www.uni-graz.at/~liebma/CUDA/CUDA3.pdf

http://www.uni-graz.at/~liebma/CUDA/CUDA4.pdf

http://www.uni-graz.at/~liebma/CUDA/CUDA5.pdf

http://www.uni-graz.at/~liebma/CUDA/CUDA6.pdf

 

Parallel Toolbox Slides:

http://www.uni-graz.at/~liebma/CUDA/ParallelToolbox.pdf

 

Tutorial:

http://www.uni-graz.at/~liebma/CUDA/NVISION08-Getting_Started_with_CUDA.pdf

 

CUDA Programming Guide:

http://www.uni-graz.at/~liebma/CUDA/NVIDIA_CUDA_Programming_Guide_2.0.pdf