It just occurred to me that the block size in the previous test was fixed at 512, one-dimensional. What would the results be if it changed?
With the number of streams fixed at 8 and the amount of data at 16M, the results are:
512x1
memcopy: 41.35
kernel: 39.56
non-streamed: 80.28 (80.92 expected)
8 streams: 42.80 (44.73 expected)
-------------------------------
256x1
memcopy: 41.36
kernel: 0.13
non-streamed: 40.75 (41.49 expected)
8 streams: 42.80 (5.30 expected)
-------------------------------
128x1
memcopy: 41.36
kernel: 0.13
non-streamed: 40.74 (41.49 expected)
8 streams: 42.81 (5.30 expected)
-------------------------------
64x1
memcopy: 41.36
kernel: 0.12
non-streamed: 40.75 (41.48 expected)
8 streams: 42.82 (5.29 expected)
-------------------------------
The test also failed for a 1024x1 block size. Confused!!
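One possible explanation, not verified here, is that only the 512x1 launch actually ran. The sketch below is plain host-side C++ that checks each block size against a pair of launch limits. Everything in it is an assumption rather than something taken from the test program: that "16M" means 16 * 1024 * 1024 elements, that the kernel uses one thread per element in a one-dimensional grid, and that the device has the limits of compute capability 1.x hardware (512 threads per block, 65535 blocks in the grid's x dimension). The names `Limits`, `blocksNeeded`, `launchFits` and `printReport` are made up for illustration.

```cpp
#include <cstdio>

// Hypothetical launch limits; the values used below are assumptions,
// not numbers read from the device used in the test.
struct Limits {
    long long maxThreadsPerBlock;
    long long maxGridDimX;
};

// Blocks needed to cover n elements with one thread per element (rounded up).
long long blocksNeeded(long long n, long long blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// True when a 1-D launch with this block size stays within both limits.
bool launchFits(long long n, long long blockSize, const Limits& lim) {
    return blockSize <= lim.maxThreadsPerBlock &&
           blocksNeeded(n, blockSize) <= lim.maxGridDimX;
}

// Prints one line per block size tried in the test.
void printReport() {
    const long long n = 16LL * 1024 * 1024;  // assumption: "16M" = 16 Mi elements
    const Limits lim = {512, 65535};         // assumption: compute capability 1.x
    const long long sizes[] = {1024, 512, 256, 128, 64};
    for (long long bs : sizes) {
        std::printf("%4lldx1: %7lld blocks -> %s\n", bs, blocksNeeded(n, bs),
                    launchFits(n, bs, lim) ? "fits" : "exceeds a limit");
    }
}
```

Under these assumptions only 512x1 fits: 1024 threads is over the per-block limit, and 256x1 already needs 65536 blocks, one more than the grid allows. That would match a kernel time of about 0.13 for the smaller block sizes if those launches were being rejected almost immediately. On a real device the limits can be read from the `maxThreadsPerBlock` and `maxGridSize` fields filled in by `cudaGetDeviceProperties`, and calling `cudaGetLastError()` right after the kernel launch would show whether the launch was rejected.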