Atomic Instruction | 易学教程

What do you mean by Atomic instructions?

How does the following become Atomic?

TestAndSet

int TestAndSet(int *x){
   register int temp = *x;
   *x = 1;
   return temp;
}

From a software perspective, if one does not want to use non-blocking synchronization primitives, how can one ensure Atomicity of instruction? is it possible only at Hardware or some assembly level directive optimization can be used?

Some machine instructions are intrinsically atomic - for example, reading and writing properly aligned values of the native processor word size is atomic on many architectures.

This means that hardware interrupts, other processors and hyper-threads cannot interrupt the read or store and read or write a partial value to the same location.

More complicated things such as reading and writing together atomically can be achieved by explicit atomic machine instructions e.g. LOCK CMPXCHG on x86.

Locking and other high-level constructs are built on these atomic primitives, which typically only guard a single processor word.

Some clever concurrent algorithms can be built using just the reading and writing of pointers e.g. in linked lists shared between a single reader and writer, or with effort, multiple readers and writers.

Atomic comes from the Greek ἄτομος (atomos) which means "indivisible". (Caveat: I don't speak Greek, so maybe it's really something else, but most English speakers citing etymologies interpret it this way. :-)

In computing, this means that the operation, well, happens. There isn't any intermediate state that's visible before it completes. So if your CPU gets interrupted to service hardware (IRQ), or if another CPU is reading the same memory, it doesn't affect the result, and these other operations will observe it as either completed or not started.

As an example... let's say you wanted to set a variable to something, but only if it has not been set before. You might be inclined to do this:

if (foo == 0)
{
   foo = some_function();
}

But what if this is run in parallel? It could be that the program will fetch foo, see it as zero, meanwhile thread 2 comes along and does the same thing and sets the value to something. Back in the original thread, the code still thinks foo is zero, and the variable gets assigned twice.

For cases like this, the CPU provides some instructions that can do the comparison and the conditional assignment as an atomic entity. Hence, test-and-set, compare-and-swap, and load-linked/store-conditional. You can use these to implement locks (your OS and your C library has done this.) Or you can write one-off algorithms that rely on the primitives to do something. (There's cool stuff to be done here, but most mere mortals avoid this for fear of getting it wrong.)

Atomicity is a key concept when you have any form of parallel processing (including different applications cooperating or sharing data) that includes shared resources.

The problem is well illustrated with an example. Let's say you have two programs that want to create a file but only if the file doesn't already exists. Any of the two program can create the file at any point in time.

If you do (I'll use C since it's what's in your example):

 ...
 f = fopen ("SYNCFILE","r");
 if (f == NULL) {
   f = fopen ("SYNCFILE","w");
 }
 ...

you can't be sure that the other program hasn't created the file between your open for read and your open for write.

There's no way you can do this on your own, you need help from the operating system, that usually provide syncronization primitives for this purpose, or another mechanism that is guaranteed to be atomic (for example a relational database where the lock operation is atomic, or a lower level mechanism like processors "test and set" instructions).

Below are some of my notes on Atomicity that may help you understand the meaning. The notes are from the sources listed at the end and I recommend reading some of them if you need a more thorough explanation rather than point-form bullets as I have. Please point out any errors so that I may correct them.

Definition :

From the Greek meaning "not divisible into smaller parts"
An "atomic" operation is always observed to be done or not done, but never halfway done.
An atomic operation must be performed entirely or not performed at all.
In multi-threaded scenarios, a variable goes from unmutated to mutated directly, with no "halfway mutated" values

Example 1 : Atomic Operations

Consider the following integers used by different threads :

 int X = 2;
 int Y = 1;
 int Z = 0;

 Z = X;  //Thread 1

 X = Y;  //Thread 2

In the above example, two threads make use of X, Y, and Z
Each read and write are atomic
The threads will race :
- If thread 1 wins, then Z = 2
- If thread 2 wins, then Z=1
- Z will will definitely be one of those two values

Example 2 : Non-Atomic Operations : ++/-- Operations

Consider the increment/decrement expressions :
```
i++;  //increment
i--;  //decrement
```
The operations translate to :
1. Read i
2. Increment/decrement the read value
3. Write the new value back to i
The operations are each composed of 3 atomic operations, and are not atomic themselves
Two attempts to increment i on separate threads could interleave such that one of the increments is lost

Example 3 - Non-Atomic Operations : Values greater than 4-Bytes

Consider the following immutable struct :

  struct MyLong
   {
       public readonly int low;
       public readonly int high;

       public MyLong(int low, int high)
       {
           this.low = low;
           this.high = high;
       }
   }

We create fields with specific values of type MyLong :

MyLong X = new MyLong(0xAAAA, 0xAAAA);   
MyLong Y = new MyLong(0xBBBB, 0xBBBB);     
MyLong Z = new MyLong(0xCCCC, 0xCCCC);

We modify our fields in separate threads without thread safety :

X = Y; //Thread 1                                  
Y = X; //Thread 2

In .NET, when copying a value type, the CLR doesn't call a constructor - it moves the bytes one atomic operation at a time
Because of this, the operations in the two threads are now four atomic operations
If there is no thread safety enforced, the data can be corrupted

Consider the following execution order of operations :

X.low = Y.low;      //Thread 1 - X = 0xAAAABBBB            
Y.low = Z.low;      //Thread 2 - Y = 0xCCCCBBBB              
Y.high = Z.high;    //Thread 2 - Y = 0xCCCCCCCC             
X.high = Y.high;    //Thread 1 - X = 0xCCCCBBBB   <-- corrupt value for X

Reading and writing values greater than 32-bits on multiple threads on a 32-bit operating system without adding some sort of locking to make the operation atomic is likely to result in corrupt data as above

Processor Operations

On all modern processors, you can assume that reads and writes of naturally aligned native types are atomic as long as :
- 1 : The memory bus is at least as wide as the type being read or written
- 2 : The CPU reads and writes these types in a single bus transaction, making it impossible for other threads to see them in a half-completed state
On x86 and X64 there is no guarantee that reads and writes larger than eight bytes are atomic
Processor vendors define the atomic operations for each processor in a Software Developer's Manual
In single processors / single core systems it is possible to use standard locking techniques to prevent CPU instructions from being interrupted, but this can be inefficient
Disabling interrupts is another more efficient solution, if possible
In multiprocessor / multicore systems it is still possible to use locks but merely using a single instruction or disabling interrupts does not guarantee atomic access
Atomicity can be achieved by ensuring that the instructions used assert the 'LOCK' signal on the bus to prevent other processors in the system from accessing the memory at the same time

Language Differences

C# guarantees that operations on any built-in value type that takes up to 4-bytes are atomic
Operations on value types that take more than four bytes (double, long, etc.) are not guaranteed to be atomic
The CLI guarantees that reads and writes of variables of value type that are the size (or smaller) of the processor's natural pointer size are atomic
- Ex - running C# on a 64-bit OS in a 64-bit version of the CLR performs reads and writes of 64-bit doubles and long integers atomically
Creating atomic operations :
- .NET provodes the Interlocked Class as part of the System.Threading namespace
- The Interlocked Class provides atomic operations such as increment, compare, exchange, etc.

using System.Threading;             

int unsafeCount;                          
int safeCount;                           

unsafeCount++;                              
Interlocked.Increment(ref safeCount);

C++

C++ standard does not guarantee atomic behavior
All C / C++ operations are presumed non-atomic unless otherwise specified by the compiler or hardware vendor - including 32-bit integer assignment
Creating atomic operations :
- The C++ 11 concurrency library includes the - Atomic Operations Library ()
- The Atomic library provides atomic types as a template class to use with any type you want
- Operations on atomic types are atomic and thus thread-safe

struct AtomicCounter
{

   std::atomic< int> value;   

   void increment(){                                    
       ++value;                                
   }           

   void decrement(){                                         
       --value;                                                 
   }

   int get(){                                             
       return value.load();                                    
   }

}

Java

Java guarantees that operations on any built-in value type that takes up to 4-bytes are atomic
Assignments to volatile longs and doubles are also guaranteed to be atomic
Java provides a small toolkit of classes that support lock-free thread-safe programming on single variables through java.util.concurrent.atomic
This provides atomic lock-free operations based on low-level atomic hardware primitives such as compare-and-swap (CAS) - also called compare and set :
- CAS form - boolean compareAndSet(expectedValue, updateValue );
  - This method atomically sets a variable to the updateValue if it currently holds the expectedValue - reporting true on success

import java.util.concurrent.atomic.AtomicInteger;

public class Counter
{
     private AtomicInteger value= new AtomicInteger();

     public int increment(){
         return value.incrementAndGet();  
     }

     public int getValue(){
         return value.get();
     }
}

Sources
http://www.evernote.com/shard/s10/sh/c2735e95-85ae-4d8c-a615-52aadc305335/99de177ac05dc8635fb42e4e6121f1d2

Atomicity can only be guaranteed by the OS. The OS uses the underlying processor features to achieve this.

So creating your own testandset function is impossible. (Although I'm not sure if one could use an inline asm snippet, and use the testandset mnemonic directly (Could be that this statement can only be done with OS priviliges))

EDIT: According to the comments below this post, making your own 'bittestandset' function using an ASM directive directly is possible (on intel x86). However, if these tricks also work on other processors is not clear.

I stand by my point: if You want to do atmoic things, use the OS functions and don't do it yourself

来源：https://stackoverflow.com/questions/1762148/atomic-instruction

标签

language-agnostic

synchronization

nonblocking

atomicity

test-and-set