Saturday, March 25, 2017

Memory Barriers - What / Why / How ?

Memory Barriers 



I struggled for a long time to understand the use of memory barriers.
Today I am attempting to explain them for everyone who has gone through the same experience.

[1000] a = 10  
[1004] b = 20

Modern CPUs can actually store b before a, which means the instructions are reordered.
The primary reason this is allowed is that these are two different locations, and the CPU assumes that nothing can change them without its knowledge.
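
To make the two stores concrete, here is a minimal sketch in C11. The variables a and b and the function names are made up for illustration, and the C11 fence is only a portable stand-in for whatever barrier a given platform provides. The first function lets the stores become visible in either order; the second asks for a to be published before b.

```c
#include <stdatomic.h>

/* Hypothetical shared variables standing in for locations [1000] and [1004]. */
static atomic_int a;
static atomic_int b;

/* No ordering requested: another core may observe b == 20 while a is still 0. */
void store_unordered(void)
{
    atomic_store_explicit(&a, 10, memory_order_relaxed);
    atomic_store_explicit(&b, 20, memory_order_relaxed);
}

/*
 * Release fence between the stores: a reader that sees b == 20 and then
 * issues an acquire fence is guaranteed to also see a == 10.
 */
void store_ordered(void)
{
    atomic_store_explicit(&a, 10, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b, 20, memory_order_relaxed);
}

/* Within a single thread the result is the same either way. */
int read_sum(void)
{
    return atomic_load_explicit(&a, memory_order_relaxed) +
           atomic_load_explicit(&b, memory_order_relaxed);
}
```

Called from one thread, both variants leave a == 10 and b == 20; the difference only shows up to an observer on another core.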

c = a + b 

From the CPU's perspective:

  • # 1 fetch the value from the memory location of a and store it in a register
  • # 2 fetch the value from the memory location of b and store it in a register

In this situation the architecture guarantees that memory locations [1000] and [1004] are both read before c is computed.

The really interesting question is: does it matter which location is read first? Whether [1000] or [1004] is read first, the result is still going to be the same.

Now, one fact we need to take into consideration is that there are devices whose registers are mapped into the CPU's address space (memory-mapped I/O).

Device registers such as status and control registers are inherently asynchronous: their contents can change without the CPU writing to them.


#1 Set particular values in the control register. Once the operation is complete, the device sets bits in the status register.

#2 Read the status register and perform a certain action.


Again, from the CPU's perspective these are two different locations, so the CPU thinks it can reorder these operations and nothing will change under the hood. The problem is that these locations are memory-mapped and point to device registers, which __CAN__ change based on external conditions or on write operations performed on the control register.

Now imagine what would happen if, because of this optimization, the CPU executes #2 first and then performs #1.

This would be a big problem: even though the addresses of the two registers are totally different, the operations depend on each other and must occur in sequence.

In this case we need to stop the CPU from reordering these particular operations. This can be done by placing a memory barrier before the read operation in #2, as shown below.

Sequence of operations : 


#1 Set particular values in the control register. Once the operation is complete, the device sets bits in the status register.

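mb() - a full memory barrier. This makes sure that #1 is executed before #2, so the status register we read always reflects the write performed in #1. Note that rmb() on its own only orders reads against other reads; because #1 is a write and #2 is a read, the full barrier is the one needed here.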

#2 Read the status register and perform a certain action.
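
The sequence above can be sketched in user-space C. Everything here is an assumption for illustration: the register layout, the STATUS_DONE bit and the function names are hypothetical, and atomic_thread_fence(memory_order_seq_cst) is only a stand-in for a kernel-style barrier. A real driver would use the barrier and I/O accessors its platform provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; a real device defines its own. */
struct dev_regs {
    volatile uint32_t control;
    volatile uint32_t status;
};

#define STATUS_DONE 0x1u   /* hypothetical "operation complete" bit */

/* Write a command to the control register, then read the status register. */
uint32_t start_and_read_status(struct dev_regs *dev, uint32_t cmd)
{
    assert(dev != NULL);

    dev->control = cmd;                          /* #1: write control register */
    atomic_thread_fence(memory_order_seq_cst);   /* barrier: keep #1 before #2 */
    return dev->status;                          /* #2: read status register   */
}

/* Returns 1 if the status value has the completion bit set, 0 otherwise. */
int operation_done(uint32_t status)
{
    return (status & STATUS_DONE) != 0;
}
```

In this sketch dev points at an ordinary struct in memory, so nothing updates status behind the program's back; with real hardware the device itself would set the bit after seeing the control write.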


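Note - rmb() - read memory barrier (orders reads against reads).

           wmb() - write memory barrier (orders writes against writes).

           mb()  - full memory barrier (orders both reads and writes).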

Resources: