I am trying to come up with a model, so I have to observe the behavior of some other applications. This time it is Matrix Multiplication (without streams). Both matrices had the same dimensions, and all runs were on a GeForce 8800 GT. In every run the matrix width is 5 times BLOCK_SIZE.
BLOCK_SIZE 1 (Matrix size of 5 x 5 elements)
copy time of A = 0.027000 ms
copy time of B = 0.023000 ms
kernel time = 0.037000 ms
time of result copy = 0.026000 ms
BLOCK_SIZE 2 (Matrix size of 10 x 10 elements)
copy time of A = 0.028000 ms
copy time of B = 0.023000 ms
kernel time = 0.059000 ms
time of result copy = 0.031000 ms
BLOCK_SIZE 4 (Matrix size of 20 x 20 elements)
copy time of A = 0.034000 ms
copy time of B = 0.029000 ms
kernel time = 0.059000 ms
time of result copy = 0.028000 ms
BLOCK_SIZE 8 (Matrix size of 40 x 40 elements)
copy time of A = 0.039000 ms
copy time of B = 0.029000 ms
kernel time = 0.060000 ms
time of result copy = 0.044000 ms
BLOCK_SIZE 16 (Matrix size of 80 x 80 elements)
copy time of A = 0.054000 ms
copy time of B = 0.046000 ms
kernel time = 0.056000 ms
time of result copy = 0.093000 ms
BLOCK_SIZE 32 (Matrix size of 160 x 160 elements)
copy time of A = 0.138000 ms
copy time of B = 0.119000 ms
kernel time = 0.111000 ms
time of result copy = 0.247000 ms
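Test FAILED (GPU returns 0.00). A 32 x 32 block means 1024 threads per block, but the 8800 GT (compute capability 1.1) allows at most 512, so the kernel most likely never launched and the result matrix was never written. The BLOCK_SIZE 32 kernel time above is therefore probably not a real measurement.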
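One way to start the model is to treat each copy as a fixed latency plus a bandwidth term, t = t0 + bytes / bw, and fit that line to the timings above. The sketch below does this with plain least squares. It assumes the matrices hold 4-byte floats (the post does not say), it leaves out the failed BLOCK_SIZE 32 run, and the function names are made up for illustration. It also converts each kernel time to GFLOP/s, counting 2n^3 flops for a plain n x n product.

```python
# Sketch: fit t = t0 + bytes / bw to the copy timings reported above.
# Assumption (not stated in the post): elements are 4-byte floats.
BYTES_PER_ELEMENT = 4

# (BLOCK_SIZE, n, copy A ms, copy B ms, kernel ms, result copy ms)
# BLOCK_SIZE 32 is omitted because that run failed.
RUNS = [
    (1, 5, 0.027, 0.023, 0.037, 0.026),
    (2, 10, 0.028, 0.023, 0.059, 0.031),
    (4, 20, 0.034, 0.029, 0.059, 0.028),
    (8, 40, 0.039, 0.029, 0.060, 0.044),
    (16, 80, 0.054, 0.046, 0.056, 0.093),
]


def fit_latency_bandwidth(points):
    """Ordinary least squares on (bytes, ms) pairs.

    Returns (t0 in ms, bandwidth in bytes per ms).
    """
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # ms per byte
    return mean_y - slope * mean_x, 1.0 / slope


def copy_points(runs):
    """Split the runs into host-to-device (A and B) and device-to-host (result) points."""
    h2d, d2h = [], []
    for _, n, copy_a, copy_b, _, copy_c in runs:
        nbytes = n * n * BYTES_PER_ELEMENT
        h2d += [(nbytes, copy_a), (nbytes, copy_b)]
        d2h.append((nbytes, copy_c))
    return h2d, d2h


def kernel_gflops(n, kernel_ms):
    """GFLOP/s for a plain n x n product, counting 2 * n^3 flops."""
    return 2 * n ** 3 / (kernel_ms * 1e-3) / 1e9


if __name__ == "__main__":
    h2d, d2h = copy_points(RUNS)
    for label, points in (("host->device", h2d), ("device->host", d2h)):
        t0, bw = fit_latency_bandwidth(points)
        # bytes per ms divided by 1e6 gives GB/s
        print(f"{label}: t0 = {t0:.4f} ms, bandwidth ~ {bw / 1e6:.2f} GB/s")
    for block_size, n, _, _, kernel_ms, _ in RUNS:
        print(f"BLOCK_SIZE {block_size:2d}, n = {n:3d}: {kernel_gflops(n, kernel_ms):6.2f} GFLOP/s")
```

The largest matrix here is only 80 x 80 (about 25 KB under the float assumption), so the fixed latency term dominates and the fitted bandwidth is rough; larger matrices would pin it down better.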
#####
Next...
The multiplier of BLOCK_SIZE (5 in all the runs above) should be varied to collect results for a wider range of matrix dimensions.
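A small planner for that sweep could look like the sketch below, with matrix width = BLOCK_SIZE * multiplier. The function name and the multiplier values are hypothetical. It assumes the 512-threads-per-block limit of compute capability 1.x GPUs such as the 8800 GT, so any BLOCK_SIZE whose square exceeds 512 (here, 32) is skipped instead of being run.

```python
# Hypothetical planner for the next round of runs (not part of the original
# test program): matrix width = BLOCK_SIZE * multiplier.
MAX_THREADS_PER_BLOCK = 512  # per-block limit on compute capability 1.x GPUs such as the 8800 GT


def plan_sweep(block_sizes, multipliers, max_threads=MAX_THREADS_PER_BLOCK):
    """Return (BLOCK_SIZE, multiplier, matrix width) for every launchable combination."""
    configs = []
    for block_size in block_sizes:
        if block_size * block_size > max_threads:
            # A BLOCK_SIZE x BLOCK_SIZE thread block would exceed the limit and fail to launch.
            continue
        for multiplier in multipliers:
            configs.append((block_size, multiplier, block_size * multiplier))
    return configs


if __name__ == "__main__":
    for block_size, multiplier, width in plan_sweep([1, 2, 4, 8, 16, 32], [5, 10, 20, 40]):
        print(f"BLOCK_SIZE {block_size:2d} x {multiplier:2d} -> {width} x {width} matrices")
```

Keeping the width a multiple of BLOCK_SIZE means every matrix splits into whole blocks, matching the 5 x BLOCK_SIZE layout used so far.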
 