Threads, Blocks and Grids
From NVIDIA's Optimizing CUDA presentation: an "experiment" mode that sweeps launch configurations can discover and save the optimal configuration.
Blocks per Grid Heuristics
- The number of blocks should be greater than the number of multiprocessors, so every multiprocessor has at least one block to execute.
- The ratio of blocks to multiprocessors should be greater than 2 (see the sketch after this list):
  - Multiple blocks can run concurrently on a multiprocessor.
  - Blocks that aren't waiting at a __syncthreads() keep the hardware busy.
  - This is subject to resource availability: registers and shared memory.
- More than 100 blocks will scale to future devices.
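As a rough illustration of these block-count heuristics, here is a minimal sketch that queries the device's multiprocessor count and checks the blocks-per-multiprocessor ratio before launching. The kernel myKernel and the problem size n are hypothetical placeholders, not part of the original presentation.

```cuda
// Minimal sketch: check the blocks / multiprocessors > 2 heuristic.
// myKernel and n are hypothetical placeholders.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder work
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int n = 1 << 20;
    const int threadsPerBlock = 256;   // multiple of warp size
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    // Heuristic check: want blocks / multiprocessors > 2
    float blocksPerSM = (float)blocksPerGrid / prop.multiProcessorCount;
    printf("%d blocks over %d multiprocessors = %.1f blocks/SM\n",
           blocksPerGrid, prop.multiProcessorCount, blocksPerSM);

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```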
Optimizing threads per block
- Choose threads per block as a multiple of the warp size (32) to avoid wasting computation on under-populated warps and to facilitate coalescing.
- Run as many warps as possible per multiprocessor to hide latency.
- A multiprocessor can run up to 8 blocks at a time.
- Heuristics:
  - Minimum: 64 threads per block, and only if there are multiple concurrent blocks per multiprocessor.
  - 192 or 256 threads per block is usually a better choice (there are usually still enough registers to compile and invoke successfully).
- However, this all depends on your computation, so experiment! A timing sketch follows below.
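In the spirit of the "experiment" advice, here is a minimal sketch that times the same hypothetical kernel at several warp-multiple block sizes using CUDA events; the candidate sizes mirror the heuristics above.

```cuda
// Minimal sketch: sweep a few warp-multiple block sizes and time each.
// myKernel and n are hypothetical placeholders from the previous sketch.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder work
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    const int candidates[] = {64, 128, 192, 256, 512};   // all multiples of 32
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int tpb : candidates) {
        int blocks = (n + tpb - 1) / tpb;
        cudaEventRecord(start);
        myKernel<<<blocks, tpb>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("%3d threads/block: %.3f ms\n", tpb, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

On a real kernel you would also warm up the device and average several runs per configuration before trusting the numbers.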