I have a number-crunching application written in C. It is basically a main loop that, for increasing values of "i", calls a function that performs some calculations.
If you are hoping to provide concurrency for a single loop for some kind of scientific computing or similar, OpenMP, as @Novikov says, really is your best bet; this is what it was designed for.
If you're looking to learn the more classical approach that you would more typically see in an application written in C, then on POSIX you want pthread_create() et al. I'm not sure what your background with concurrency in other languages is, but before going too deeply into that, you will want to know your synchronization primitives (mutexes, semaphores, etc.) fairly well, as well as when you will need to use them. That topic could be a whole book or set of SO questions in itself.
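To make the OpenMP route concrete, here is a minimal sketch of what a loop like the one described might look like; do_calc, NTIMES, and results are placeholder names, not anything from the original program:
#include <omp.h>
#include <stdio.h>

/* Hypothetical stand-in for the real per-iteration calculation. */
static double do_calc(int i) {
    return i * 0.5;
}

int main(void) {
    enum { NTIMES = 1000 };
    static double results[NTIMES];

    /* Each iteration is handed to one of the available threads.
       This is only safe because iterations are independent. */
    #pragma omp parallel for
    for (int i = 0; i < NTIMES; ++i)
        results[i] = do_calc(i);

    printf("results[%d] = %f\n", NTIMES - 1, results[NTIMES - 1]);
    return 0;
}
Build with gcc -fopenmp; without that flag the pragma is simply ignored and the loop runs serially, which makes it easy to compare results.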
You can use pthreads to perform multithreading in C. Here is a simple example based on pthreads.
#include <pthread.h>
#include <stdio.h>

/* Thread functions must take a void* argument and return void*
   so that they match the signature pthread_create() expects. */
void *mythread1(void *arg);
void *mythread2(void *arg);

int main(void) {
    pthread_t thread[2];

    /* start the threads */
    pthread_create(&thread[0], NULL, mythread1, NULL);
    pthread_create(&thread[1], NULL, mythread2, NULL);

    /* wait for both to complete */
    pthread_join(thread[0], NULL);
    pthread_join(thread[1], NULL);
    return 0;
}

/* thread definitions */
void *mythread1(void *arg) {
    for (int i = 0; i < 5; i++)
        printf("Thread 1 Running\n");
    return NULL;
}

void *mythread2(void *arg) {
    for (int i = 0; i < 5; i++)
        printf("Thread 2 Running\n");
    return NULL;
}
Reference: C program to implement multithreading in C
To specifically address the "automatically multithreaded" part of the OP's question:
One really interesting view of how to program parallelism was designed into a language called Cilk Plus, which grew out of Cilk (developed at MIT) and is now owned by Intel. To quote Wikipedia, the idea is that
"the programmer should be responsible for exposing the parallelism, identifying elements that can safely be executed in parallel; it should then be left to the run-time environment, particularly the scheduler, to decide during execution how to actually divide the work between processors."
Cilk Plus is a superset of standard C++. It adds just a few extra keywords (_Cilk_spawn, _Cilk_sync, and _Cilk_for) that allow the programmer to tag parts of their program as parallelizable. The programmer does not mandate that any code be run on a new thread; they merely allow the lightweight runtime scheduler to spawn a new thread if and only if it is actually the right thing to do under the particular runtime conditions.
To use Cilk Plus, just add its keywords into your code, and build with Intel's C++ compiler.
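As a purely illustrative sketch (not code from the original answer), a loop tagged with _Cilk_for might look like this; do_calc and NTIMES are made-up names, and the code assumes a compiler that still supports Cilk Plus (Intel's compiler, or GCC 4.9 through 7 with -fcilkplus):
#include <stdio.h>

/* hypothetical stand-in for the real per-iteration calculation */
static double do_calc(int i) {
    return i * 0.5;
}

int main(void) {
    enum { NTIMES = 1000 };
    static double results[NTIMES];

    /* Tag the loop as parallelizable; the Cilk runtime scheduler
       decides at run time how to divide iterations among workers. */
    _Cilk_for (int i = 0; i < NTIMES; ++i)
        results[i] = do_calc(i);

    printf("results[%d] = %f\n", NTIMES - 1, results[NTIMES - 1]);
    return 0;
}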
If each iteration of the loop is independent of the ones before it, then there's a very simple approach: try multi-processing rather than multi-threading.
Say you have 2 cores and ntimes is 100; then 100/2 = 50, so create 2 versions of the program where the first iterates from 0 to 49 and the other from 50 to 99. Run them both, and your cores should be kept quite busy.
This is a very simplistic approach, yet you don't have to mess with thread creation, synchronization, etc.
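If maintaining two separately built copies of the program feels clumsy, the same idea can be sketched with fork(); here do_calc is a hypothetical stand-in for the real per-iteration work:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical stand-in for the real per-iteration calculation */
static void do_calc(int i) {
    printf("pid %d handled i=%d\n", (int)getpid(), i);
}

int main(void) {
    enum { NTIMES = 100, NPROCS = 2 };
    int chunk = NTIMES / NPROCS;

    for (int p = 0; p < NPROCS; ++p) {
        if (fork() == 0) {                      /* child: work on its own slice */
            for (int i = p * chunk; i < (p + 1) * chunk; ++i)
                do_calc(i);
            _exit(0);
        }
    }
    for (int p = 0; p < NPROCS; ++p)            /* parent: wait for the children */
        wait(NULL);
    return 0;
}
The trade-off versus threads is that each child has its own address space, so getting results back to the parent requires files, pipes, or shared memory.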
Depending on the OS, you could use POSIX threads. You could instead implement stack-less multithreading using state machines. There is a really good book entitled "Embedded Multitasking" by Keith E. Curtis: it's just a neatly crafted set of switch-case statements. It works great; I've used it on everything from Apple Macs to Rabbit Semiconductor boards, AVR, and PC.
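For a flavour of that style (a minimal sketch in the spirit of the technique, not code from the book): each "task" is an ordinary function that does one small step per call and remembers its progress in a state variable, so a plain loop can interleave tasks without any thread support:
#include <stdio.h>

/* Each task advances one step per call and keeps its place
   in a static state variable instead of on its own stack. */
static void task_blink(void) {
    static int state = 0;
    switch (state) {
    case 0: printf("LED on\n");  state = 1; break;
    case 1: printf("LED off\n"); state = 0; break;
    }
}

static void task_count(void) {
    static int n = 0;
    printf("count = %d\n", n++);
}

int main(void) {
    for (int i = 0; i < 4; ++i) {   /* trivial round-robin "scheduler" */
        task_blink();
        task_count();
    }
    return 0;
}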
C11 threads, added in glibc 2.28.
Tested on Ubuntu 18.04 (which ships glibc 2.27) by compiling glibc 2.28 from source: Multiple glibc libraries on a single host
Example from: https://en.cppreference.com/w/c/language/atomic
#include <stdio.h>
#include <threads.h>
#include <stdatomic.h>

atomic_int acnt;
int cnt;

int f(void* thr_data)
{
    for (int n = 0; n < 1000; ++n) {
        ++cnt;
        ++acnt;
        // for this example, relaxed memory order is sufficient, e.g.
        // atomic_fetch_add_explicit(&acnt, 1, memory_order_relaxed);
    }
    return 0;
}

int main(void)
{
    thrd_t thr[10];
    for (int n = 0; n < 10; ++n)
        thrd_create(&thr[n], f, NULL);
    for (int n = 0; n < 10; ++n)
        thrd_join(thr[n], NULL);

    printf("The atomic counter is %u\n", acnt);
    printf("The non-atomic counter is %u\n", cnt);
}
GitHub upstream.
Compile and run:
gcc -std=c11 main.c -pthread
./a.out
Possible output:
The atomic counter is 10000
The non-atomic counter is 8644
The non-atomic counter is very likely to be smaller than the atomic one due to racy access across threads to the non-atomic variable.
TODO: disassemble and see what ++acnt; compiles to.
POSIX threads
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <pthread.h>

enum CONSTANTS {
    NUM_THREADS = 1000,
    NUM_ITERS = 1000
};

int global = 0;
int fail = 0;
pthread_mutex_t main_thread_mutex = PTHREAD_MUTEX_INITIALIZER;

void* main_thread(void *arg) {
    int i;
    for (i = 0; i < NUM_ITERS; ++i) {
        if (!fail)
            pthread_mutex_lock(&main_thread_mutex);
        global++;
        if (!fail)
            pthread_mutex_unlock(&main_thread_mutex);
    }
    return NULL;
}

int main(int argc, char **argv) {
    pthread_t threads[NUM_THREADS];
    int i;
    /* Any command-line argument disables the mutex and exposes the data race. */
    fail = argc > 1;
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, main_thread, NULL);
    for (i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    /* With the mutex held around each increment, every update is counted. */
    assert(global == NUM_THREADS * NUM_ITERS);
    return EXIT_SUCCESS;
}
Compile and run:
gcc -std=c99 pthread_mutex.c -pthread
./a.out
./a.out 1
The first run works fine, the second fails due to missing synchronization.
Tested on Ubuntu 18.04. GitHub upstream.