Monday, September 21, 2009

New GeForce GTX 295

Dr. Box has ordered a new card for the research that I am working on. I have not installed it yet and will probably need help from Himanshu. Here are its specifications.

NVIDIA GeForce GTX 295
480 processor cores (240 per GPU)
Memory: 1792 MB GDDR3 (896 MB per GPU)
Memory bandwidth: 223.8 GB/sec


Friday, September 18, 2009

5 T-Shirts for 5 Bucks!

I had been thinking of buying some new T-shirts, and yesterday I found some on sale at Walgreens. They were labeled 5 for $10 for the adult ones and 5 for $5 for the youth ones. I picked 5 youth shirts because, at my size, I can wear a youth medium. :)

The fabric was okay but, more importantly, they were cheap. ^^


Landscape in LaTeX & TeX to Word

Found in Tex Blog.

To change the whole document use
\documentclass[landscape, 12pt]{report}
or
\usepackage[landscape]{geometry}

To change only single pages
\usepackage{lscape}
with
\begin{landscape}
...
\end{landscape}


TeX to Word
latex2rtf
Converters from LaTeX to PC Textprocessors


Wednesday, September 16, 2009

Wet Wet Wet

I did not get burned by the sun yesterday but I got wet...

Before I left home around noon, I already knew that it was going to rain sooner or later. Still, I forgot my umbrella. Stupid me!

It started sprinkling when I was halfway there, and then it poured. I did not run after all. I knew that I could not avoid getting wet.

Alright. Just soggy...



Saturday, September 05, 2009

First Day On the Road

In the afternoon, James took me out to practice driving. We started around a quarter past two and got home around four thirty. Two hours on the road!

James took me somewhere with very little traffic. I would say... it was almost empty. I started driving at 20 mph. So slow... The area was not flat and the road was not very straight though. I had to drive up and down hills all the time. I could not keep the car in the lane at first, and it took me a while to get it under control. After that I drove faster but never made it over 50 mph. I also could not keep the car in the right lane when I made a turn. Fortunately, the roads were empty. =S

We also passed through some towns that were not busy either. When the streets were narrow, I kept drifting far to the right because the oncoming cars on the left scared me. So James had to push the wheel back for me sometimes. T_T

Anyway, I made my way home safely. =)

===

James just got me the route a moment ago. We started at A and finished at F, which was our house.





Friday, September 04, 2009

Time to Beat Hearts!

Yeah! I just got my learner's permit today. Let's get a car and get out on the street. haha

Well, I did not think I would pass the written test since I got about 5 of the 40 questions wrong. At any rate, I passed. =D

Now all I have to do is just practice driving. Let's do it!


Wednesday, September 02, 2009

One Week In The New House

A week has passed since I moved to the new apartment. Everything is okay.

I walked to Tech on Monday. It took me half an hour to get there. Even though the weather was not so hot, I was sweaty when I got back. I also got some cookies from my neighbor, my landlady's daughter-in-law.

Yesterday, I spent the whole afternoon with my friends, Gan and Bell, in Shreveport. We had lunch at a Vietnamese restaurant, went shopping, and bought some Asian food.


Wednesday, August 26, 2009

Pix of My New House

Here you go!




Tuesday, August 25, 2009

New House

I have moved to the new apartment, next to James and P Noon. I moved on Monday afternoon and have spent all day cleaning. I am sick of it. I pray I will not have to clean again for years. Now I am living on my own. Hopefully everything will be alright.


Thursday, August 13, 2009

They Were Really Raccoons!

When I walked to Aig and Monn's house last night (around 9), I saw them strolling around the trash by Dawg House, and there were two of them. When I saw them I stopped, and they also stopped and gazed at me. They really were raccoons, black and gray, with black circles around their eyes, black rings along their tails, and black feet. They were furry but quite wet because it had rained in the evening. They ran away when I started walking again. They were so cute though. =)


Wednesday, August 12, 2009

It Must Be a Raccoon!

I saw something while I was walking back from Aig and Monn's house last night. It was a small thing that sprang away from the trash by Dawg House (a kind of burger place). It was so fast that I only caught a glimpse of it. It was gray. I thought it was a cat at first. But how could a cat have a bushy tail with a couple of black rings at the end like that? So I supposed it had to be a raccoon, MUST be a raccoon for sure. There are no raccoons in Thailand, so this was the first time I had seen one in the wild. How interesting. Hopefully, I can see it again tonight.


Monday, August 10, 2009

It Has Been Destroyed

There was a building that used to be an apartment block. No one had lived there for months, and today it was being demolished. The workers used a kind of vehicle with an arm for digging and scooping. It gently hit the wall and took it down while someone sprayed water on the wall.

And today is the last day of class. I will have three days to finish my homework and prepare for the final test. Pray for me. =(


Sunday, August 02, 2009

New Pan!

Just got a new pan from Wal-Mart.





Compared to the old ones, it is even bigger and not too heavy.


Okay. Let its first dish be fried tilapia.


Wednesday, July 29, 2009

Threads, Blocks and Grids

From NVIDIA Optimizing CUDA
“Experiment” mode discovers and saves optimal configuration


Blocks per Grid Heuristics
  • Number of blocks should be more than number of multiprocessors, so all multiprocessors have at least one block to execute.
  • Number of blocks divided by number of multiprocessors should be more than 2
    • Multiple blocks can run concurrently in a multiprocessor
    • Blocks that aren’t waiting at a __syncthreads() keep the hardware busy
    • Subject to resource availability (registers, shared memory)
  • More than 100 blocks will scale to future devices


Optimizing threads per block
  • Choose threads per block as a multiple of warp size to avoid wasting computation on under-populated warps and to facilitate coalescing
  • Run as many warps as possible per multiprocessor (hide latency)
  • Multiprocessor can run up to 8 blocks at a time
  • Heuristics
    • Minimum: 64 threads per block (Only if multiple concurrent blocks)
    • 192 or 256 threads a better choice (Usually still enough regs to compile and invoke successfully)
    • However, this all depends on your computation, so experiment!
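
The rules of thumb above can be checked with a little host-side arithmetic. The following is a minimal Python sketch, not CUDA code and not from the NVIDIA slides: the function names are made up for illustration, a warp size of 32 is assumed, and the 30 multiprocessors in the example are an assumed device (for instance 240 cores at eight cores per multiprocessor).

```python
WARP_SIZE = 32  # assumed warp size (threads per warp)


def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def round_up_to_warp(threads):
    """Round a thread count up to the next multiple of the warp size."""
    return ceil_div(threads, WARP_SIZE) * WARP_SIZE


def launch_config(n_elements, threads_per_block=256):
    """Return (blocks, threads) so that one thread handles one element."""
    # At least 64 threads per block, always a whole number of warps.
    threads = max(64, round_up_to_warp(threads_per_block))
    blocks = ceil_div(n_elements, threads)
    return blocks, threads


def check_heuristics(blocks, multiprocessors):
    """Report which blocks-per-grid rules of thumb are satisfied."""
    return {
        "every multiprocessor gets a block": blocks > multiprocessors,
        "more than 2 blocks per multiprocessor": blocks / multiprocessors > 2,
        "more than 100 blocks": blocks > 100,
    }


# Hypothetical example: 100,000 elements, a requested block size of 200.
blocks, threads = launch_config(100_000, threads_per_block=200)
print(blocks, threads)  # 447 224
for rule, ok in check_heuristics(blocks, multiprocessors=30).items():
    print(rule, ok)
```

The requested 200 threads are rounded up to 224 (seven full warps) so that no warp is under-populated. Whether 224 really beats 192 or 256 still depends on the kernel, so this only narrows down what to experiment with.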

Zero Copy (Mapped Memory): Directly Access Host Memory from The Device

I came across Zero Copy, which was just introduced in CUDA 2.2, while I was reading up on the concept of threads and blocks.

Quoted from NVIDIA Optimizing CUDA


  • Access host memory directly from device code
  • Transfers implicitly performed as needed by device code
  • All set-up is done on host using mapped memory

What should be considered when using Zero Copy
  • Zero copy will always be a win for integrated devices that utilize CPU memory (check this using the integrated field in cudaDeviceProp)
  • Zero copy will be faster if data is only read/written from/to global memory once: Copy input to GPU, One kernel run, Copy output to CPU
  • Potentially easier and faster alternative to using cudaMemcpyAsync
  • Current devices use pointers that are 32-bit so there is a limit of 4GB per context



Quoted from section 3.2.5.3 Mapped Memory in CUDA Programming Guide 2.2

On some devices, a block of page-locked host memory can also be mapped into the device’s address space by passing flag cudaHostAllocMapped to cudaHostAlloc(). Such a block has therefore two addresses: one in host memory and one in device memory. The host memory pointer is returned by cudaHostAlloc() and the device memory pointer can be retrieved using cudaHostGetDevicePointer() and used to access the block from within a kernel.

Accessing host memory directly from within a kernel has several advantages:
  • There is no need to allocate a block in device memory and copy data between this block and the block in host memory; data transfers are implicitly performed as needed by the kernel;
  • There is no need to use streams (see Section 3.2.6.1) to overlap data transfers with kernel execution; the kernel-originated data transfers automatically overlap with kernel execution.

Since mapped page-locked memory is shared between host and device however, the application must synchronize memory accesses using streams or events (see Section 3.2.6) to avoid any potential read-after-write, write-after-read, or write-after-write hazards.

A block of page-locked host memory can be allocated as both mapped and portable (see Section 3.2.5.1), in which case each host thread that needs to map the block to its device address space must call cudaHostGetDevicePointer() to retrieve a device pointer, as device pointers will generally differ from one host thread to the other.

To be able to retrieve the device pointer to any mapped page-locked memory within a given host thread, page-locked memory mapping must be enabled by calling cudaSetDeviceFlags() with the cudaDeviceMapHost flag before any other CUDA operations is performed by the thread. Otherwise, cudaHostGetDevicePointer() will return an error.

cudaHostGetDevicePointer() also returns an error if the device does not support mapped page-locked host memory.

Applications may query whether a device supports mapped page-locked host memory or not by calling cudaGetDeviceProperties() and checking the canMapHostMemory property.



CUDA Warps

Quoted from Section 4.1 of CUDA Programming Guide 2.2.1: A Set of SIMT Multiprocessors with On-Chip Shared Memory


The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity as illustrated in Figure 4-1. The threads of a thread block execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.

A multiprocessor consists of eight Scalar Processor (SP) cores, two special function units for transcendentals, a multithreaded instruction unit, and on-chip shared memory. The multiprocessor creates, manages, and executes concurrent threads in hardware with zero scheduling overhead. It implements the __syncthreads() barrier synchronization intrinsic with a single instruction. Fast barrier synchronization together with lightweight thread creation and zero-overhead thread scheduling efficiently support very fine-grained parallelism, allowing, for example, a low granularity decomposition of problems by assigning one thread to each data element (such as a pixel in an image, a voxel in a volume, a cell in a grid-based computation).

To manage hundreds of threads running several different programs, the multiprocessor employs a new architecture we call SIMT (single-instruction, multiple-thread). The multiprocessor maps each thread to one scalar processor core, and each scalar thread executes independently with its own instruction address and register state. The multiprocessor SIMT unit creates, manages, schedules, and executes threads in groups of 32 parallel threads called warps. (This term originates from weaving, the first parallel thread technology. A half-warp is either the first or second half of a warp.) Individual threads composing a SIMT warp start together at the same program address but are otherwise free to branch and execute independently.

When a multiprocessor is given one or more thread blocks to execute, it splits them into warps that get scheduled by the SIMT unit. The way a block is split into warps is always the same; each warp contains threads of consecutive, increasing thread IDs with the first warp containing thread 0. Section 2.2 describes how thread IDs relate to thread indices in the block.
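
To make the splitting concrete, here is a small Python sketch (plain arithmetic, not CUDA code). It assumes the linear thread ID convention from Section 2.2 of the Programming Guide, where thread (x, y, z) in a block of size (Dx, Dy, Dz) has ID x + y·Dx + z·Dx·Dy, and a warp size of 32. The function names are made up for illustration.

```python
WARP_SIZE = 32  # threads per warp


def thread_id(x, y, z, dim_x, dim_y):
    """Linear thread ID of thread (x, y, z) in a block of size (dim_x, dim_y, dim_z)."""
    return x + y * dim_x + z * dim_x * dim_y


def warp_of(tid):
    """Return (warp index, lane within the warp) for a linear thread ID."""
    return divmod(tid, WARP_SIZE)


def warps_per_block(dim_x, dim_y=1, dim_z=1):
    """Number of warps needed for a block; the last one may be partly empty."""
    total = dim_x * dim_y * dim_z
    return (total + WARP_SIZE - 1) // WARP_SIZE


# In a 16x16 block, each warp covers two consecutive rows of 16 threads.
print(warp_of(thread_id(15, 1, 0, 16, 16)))  # (0, 31): last lane of the first warp
print(warp_of(thread_id(0, 2, 0, 16, 16)))   # (1, 0): first lane of the second warp
print(warps_per_block(16, 16))               # 8
print(warps_per_block(100))                  # 4, the last warp has only 4 threads
```

The last line shows why a block size that is not a multiple of 32 wastes part of a warp.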

Every instruction issue time, the SIMT unit selects a warp that is ready to execute and issues the next instruction to the active threads of the warp. A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path. If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path. Branch divergence occurs only within a warp; different warps execute independently regardless of whether they are executing common or disjointed code paths.
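
A toy model helps to see when divergence costs something. The Python sketch below is a simplification of my own, not how the hardware is actually programmed: it assumes a warp size of 32, labels each thread with the branch path it takes, and counts one serial pass per distinct path inside a warp. It ignores how long each path is. The function name `passes_per_warp` is hypothetical.

```python
WARP_SIZE = 32  # threads per warp


def passes_per_warp(n_threads, branch_of):
    """For each warp, count the distinct branch paths its threads take.

    branch_of maps a thread ID to a label for the path that thread follows.
    In this simplified model a warp needs one serial pass per distinct path.
    """
    counts = []
    for start in range(0, n_threads, WARP_SIZE):
        lanes = range(start, min(start + WARP_SIZE, n_threads))
        counts.append(len({branch_of(tid) for tid in lanes}))
    return counts


# Branching on odd/even thread ID splits every warp into two paths.
print(passes_per_warp(64, lambda tid: tid % 2))          # [2, 2]

# Branching on the warp index keeps each warp on a single path.
print(passes_per_warp(64, lambda tid: (tid // 32) % 2))  # [1, 1]
```

Both conditions send half of the 64 threads down each path, but only the first one makes threads inside the same warp disagree, which is the case the quoted paragraph warns about.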

SIMT architecture is akin to SIMD (Single Instruction, Multiple Data) vector organizations in that a single instruction controls multiple processing elements. A key difference is that SIMD vector organizations expose the SIMD width to the software, whereas SIMT instructions specify the execution and branching behavior of a single thread. In contrast with SIMD vector machines, SIMT enables programmers to write thread-level parallel code for independent, scalar threads, as well as data-parallel code for coordinated threads. For the purposes of correctness, the programmer can essentially ignore the SIMT behavior; however, substantial performance improvements can be realized by taking care that the code seldom requires threads in a warp to diverge. In practice, this is analogous to the role of cache lines in traditional code: Cache line size can be safely ignored when designing for correctness but must be considered in the code structure when designing for peak performance. Vector architectures, on the other hand, require the software to coalesce loads into vectors and manage divergence manually.



I do not quite understand it yet, but I am trying to read more.


Monday, July 20, 2009

Out to Swim (Again)

I intended to go swimming this afternoon... I really did!

I left my house at nearly four, glad that the sun was not as strong as on other days. I am not sure how long I swam, perhaps about half an hour. Then the rain came. Although it is an indoor swimming pool, the staff called everyone out of the pool. Do not ask me why. I do not know.

At least I got some exercise after all. I walked to the pool, swam, and walked back. hehe


Wednesday, July 15, 2009

Summer Class

It has been three days since the class began. I have class every day this quarter and there are only six weeks until the end of the quarter. The weather is so hot that I do not want to step outside. :(

These days I have been addicted to comics that Pao (my senior) introduced me to. It is bad. I should be focusing on studying and my research.


Wednesday, July 08, 2009

At McDonald's

Today, I had lunch at McDonald's. It was the first meal I have eaten out since I got back to Ruston, and only because I wanted somewhere to hide from the sun. I had an appointment with the landlady of my new apartment at 10 AM, so I left my house around 9:10, walked, and got there around 9:40. It was terribly hot in the strong sun. After talking with the landlady, I walked back. She actually offered me a ride, but I wanted to stop somewhere on the way to buy a few things (which, I found out afterward, is closed on Wednesdays :( So annoying!).

I stopped by County Market around 10:30 and grabbed some bread, instant noodles, and a few other things. Then I crossed the street, walked into McDonald's, ordered, paid, and waited. I stood in front of the counter for a couple of minutes, but no one brought me my meal. :(

At last, a black girl noticed that I was waiting for something. She asked me, and I told her I was waiting for my burger and fries. She got me my cheeseburger, and another girl told me to wait for the fries because they were not done yet. A couple of minutes later, when the fries were done, the first girl asked me again because she saw me still waiting there.

"My fries..." I told her.

I finally got my fries, with a bunch of ketchup packets. :)