Is __threadfence() needed for CUDA volatile variables?


Does each read/write of a volatile shared/global variable go directly to shared/global memory? Is a thread fence then performed automatically? For example:

  volatile __shared__ int s; s = 2; s = 10;  

Then is no __threadfence() needed between "s = 2" and "s = 10"?

Can we say that for a volatile variable, __threadfence() is never needed? If not, is there an example?

Given a shared memory definition like this:

  volatile __shared__ int s;  

any access by a thread in the threadblock after the execution of the following line:

  s = 2;  

will see s as 2, assuming there are no further updates to s. volatile does not remove the need for execution barriers such as __syncthreads() where the threads in question must wait for that write to have happened, and it does not replace __threadfence() where the order in which updates to shared and global memory become visible to other threads must be guaranteed.
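As a minimal sketch of the single-write case, assuming one block of 64 threads: thread 0 writes the volatile shared variable, a __syncthreads() barrier makes every other thread wait until that write has executed, and each thread then reads the value. The kernel name broadcast and the buffer out are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 writes s, every thread then reads it.
__global__ void broadcast(int *out)
{
    volatile __shared__ int s;

    if (threadIdx.x == 0) {
        s = 2;                 // the single write discussed above
    }

    // volatile alone does not make the other threads wait for the write;
    // the execution barrier does.
    __syncthreads();

    out[threadIdx.x] = s;      // read after the barrier
}

int main()
{
    const int n = 64;          // assumed block size, for illustration only
    int h_out[n];
    int *d_out = nullptr;

    cudaMalloc(&d_out, n * sizeof(int));
    broadcast<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Check on the host that every thread observed the written value.
    int ok = 1;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 2) ok = 0;
    }
    printf("all threads saw 2: %s\n", ok ? "yes" : "no");

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

Without the barrier, a thread in another warp could reach its read before thread 0 has executed the write at all, which is a matter of execution order rather than of memory visibility.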

However, the following sequence:

  s = 2; s = 10;  

has no guarantee as to what other threads will see (except in the warp-synchronous case, and barring further description of the scenario, which you have not provided), except that they will see either 2 or 10 (and again, assuming there are no further updates to s).
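To the request for an example where volatile is not enough: one common case is ordering writes to two different variables. The sketch below is an assumption-laden illustration, not code from the question. The names data, flag and handoff are hypothetical, and it assumes both blocks are resident on the device at the same time, since the consumer spins until the producer has run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical global variables: a payload and a "ready" flag.
__device__ volatile int data = 0;
__device__ volatile int flag = 0;

// Block 0 produces, block 1 consumes. Only thread 0 of each block works.
__global__ void handoff(int *out)
{
    if (threadIdx.x != 0) return;

    if (blockIdx.x == 0) {
        data = 10;             // write the payload
        __threadfence();       // order the payload write before the flag write
        flag = 1;              // publish
    } else {
        while (flag == 0) { }  // volatile forces a fresh load on each iteration
        __threadfence();       // order the flag read before the payload read
        *out = data;
    }
}

int main()
{
    int h_out = -1;
    int *d_out = nullptr;

    cudaMalloc(&d_out, sizeof(int));
    handoff<<<2, 1>>>(d_out);  // two blocks, assumed to be co-resident
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    printf("consumer read %d\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

Here volatile keeps the compiler from caching flag in a register, while __threadfence() is what orders the write to data before the write to flag as observed by the other block.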
