I have a number crunching application written in C. It is kind of a main loop that for each value calls, for increasing values of "i", a function that performs some calcul
One way to multithread your code is to use pthreads (which gives you more precise control than OpenMP).
Assuming x, y and result are global arrays:
#include <pthread.h>
#include <stdlib.h>   // malloc
...
void *get_result(void *param) // param is a dummy pointer
{
    ...
}
int main()
{
    ...
    pthread_t *tid = malloc( ntimes * sizeof(pthread_t) );
    // start one thread per call to get_result
    for( i=0; i<ntimes; i++ )
        pthread_create( &tid[i], NULL, get_result, NULL );
    ... // do some tasks unrelated to result
    // wait for every thread to finish before using result
    for( i=0; i<ntimes; i++ )
        pthread_join( tid[i], NULL );
    ...
}
(Compile your code with gcc prog.c -lpthread)
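Because the snippet above passes NULL as the thread argument, a thread has no way of knowing which element it is supposed to work on. A minimal sketch of one way around that, assuming get_result computes result[i] from x[i] and y[i]; the sum is only a stand-in for the real calculation, and NTIMES, compute_all and the array sizes are hypothetical names chosen for illustration:

```c
#include <assert.h>
#include <pthread.h>

#define NTIMES 8   /* hypothetical problem size */

double x[NTIMES], y[NTIMES], result[NTIMES];   /* globals, as assumed above */

/* Thread entry point: param points at the index this thread owns. */
static void *get_result(void *param)
{
    int i = *(int *)param;
    assert(i >= 0 && i < NTIMES);
    result[i] = x[i] + y[i];   /* placeholder for the real calculation */
    return NULL;
}

/* Starts one thread per element and waits for all of them.
   Returns 0 on success, or the pthread_create error code on failure. */
int compute_all(void)
{
    pthread_t tid[NTIMES];
    int index[NTIMES];   /* one slot per thread, so no argument is shared */
    int started = 0, err = 0;

    for (int i = 0; i < NTIMES; i++) {
        index[i] = i;
        err = pthread_create(&tid[i], NULL, get_result, &index[i]);
        if (err != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return err;
}
```

Each thread gets a pointer to its own slot in index rather than a pointer to the loop variable, which would keep changing while the threads start up.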
If the task is highly parallelizable and your compiler is modern, you could try OpenMP. http://en.wikipedia.org/wiki/OpenMP
You should have a look at OpenMP for this. The C/C++ example on this page is similar to your code: https://computing.llnl.gov/tutorials/openMP/#SECTIONS
#include <omp.h>
#define N 1000
int main(void)
{
    int i;
    float a[N], b[N], c[N], d[N];
    /* Some initializations */
    for (i=0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }
    #pragma omp parallel shared(a,b,c,d) private(i)
    {
        /* each section below is executed by one thread of the team */
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i=0; i < N; i++)
                c[i] = a[i] + b[i];
            #pragma omp section
            for (i=0; i < N; i++)
                d[i] = a[i] * b[i];
        } /* end of sections */
    } /* end of parallel section */
    return 0;
}
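For a single loop whose iterations are independent of each other, the work-sharing "parallel for" construct is often a closer fit than sections. A minimal sketch, assuming the per-element calculation is a plain sum; vector_add and its parameters are hypothetical names. Build it with gcc -fopenmp; without that flag the pragma is ignored and the loop simply runs serially:

```c
#include <assert.h>
#include <stddef.h>

/* Computes out[i] = a[i] + b[i] for i in [0, n).
   Every iteration touches only its own element, so the iterations can be
   handed out to different threads without any locking. */
void vector_add(const float *a, const float *b, float *out, int n)
{
    assert(a != NULL && b != NULL && out != NULL);

    /* The loop variable of a parallel for is private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```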
If you prefer not to use OpenMP, you could use either pthreads or clone/wait directly.
No matter which route you choose you are just dividing up your arrays into chunks which each thread will process. If all of your processing is purely computational (as suggested by your example function) then you should do well to have only as many threads as you have logical processors.
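A rough sketch of that chunking with pthreads, assuming three equally sized arrays and a product as the placeholder calculation. struct chunk, process_chunk and process_in_chunks are hypothetical names, and sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension (Linux, OS X, the BSDs) rather than something POSIX guarantees:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Half-open index range [begin, end) owned by one thread. */
struct chunk {
    const double *x, *y;
    double *result;
    size_t begin, end;
};

static void *process_chunk(void *param)
{
    struct chunk *c = param;
    for (size_t i = c->begin; i < c->end; i++)
        c->result[i] = c->x[i] * c->y[i];   /* placeholder calculation */
    return NULL;
}

/* Splits n elements into near-equal contiguous chunks, one per thread.
   Returns 0 on success, -1 if allocation or thread creation fails. */
int process_in_chunks(const double *x, const double *y, double *result, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL && result != NULL));

    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);   /* logical processors online */
    size_t nthreads = ncpu > 0 ? (size_t)ncpu : 1;
    if (nthreads > n)
        nthreads = n > 0 ? n : 1;                /* never more threads than elements */

    pthread_t *tid = malloc(nthreads * sizeof *tid);
    struct chunk *chunks = malloc(nthreads * sizeof *chunks);
    if (tid == NULL || chunks == NULL) {
        free(tid);
        free(chunks);
        return -1;
    }

    size_t started = 0;
    int status = 0;
    for (size_t t = 0; t < nthreads; t++) {
        /* Chunk t covers [t*n/nthreads, (t+1)*n/nthreads); the chunks tile [0, n). */
        chunks[t] = (struct chunk){ x, y, result,
                                    t * n / nthreads, (t + 1) * n / nthreads };
        if (pthread_create(&tid[t], NULL, process_chunk, &chunks[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    free(tid);
    free(chunks);
    return status;
}
```

Contiguous chunks keep each thread working on its own stretch of memory, which is usually kinder to the caches than interleaving indices between threads.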
There is some overhead in adding threads to do parallel processing, so make sure that you give each thread enough work to make up for it. Usually you will, but if each thread ends up with only one computation to do and the computations aren't that difficult, then you may actually slow things down. If that is the case, you can always use fewer threads than you have processors.
If you do have some I/O going on in your work, then you may find that having more threads than processors is a win, because while one thread is blocked waiting for some I/O to complete, another thread can be doing its computations. You have to be careful doing I/O to the same file from several threads, though.
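One common way to be careful with a shared file is to serialize only the writes with a mutex, so that a record written with several calls cannot be interleaved with another thread's output. A small sketch under that assumption; out_lock, struct job, worker, run_workers and NWORKERS are all hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4   /* hypothetical thread count */

static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;
static FILE *out;    /* shared output stream, set before the threads start */

struct job { int id; double value; };

static void *worker(void *param)
{
    struct job *j = param;
    double r = j->value * j->value;   /* the computation runs unlocked */

    pthread_mutex_lock(&out_lock);    /* only the write is serialized */
    fprintf(out, "job %d: ", j->id);  /* two calls, but one uninterrupted record */
    fprintf(out, "%.1f\n", r);
    pthread_mutex_unlock(&out_lock);
    return NULL;
}

/* Runs NWORKERS threads that all write to fp; returns 0 on success. */
int run_workers(FILE *fp)
{
    pthread_t tid[NWORKERS];
    struct job jobs[NWORKERS];
    int started = 0;

    assert(fp != NULL);
    out = fp;
    for (int i = 0; i < NWORKERS; i++) {
        jobs[i] = (struct job){ i, i + 1.0 };
        if (pthread_create(&tid[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == NWORKERS ? 0 : -1;
}
```

The order of the records in the file still depends on scheduling; the lock only guarantees that each record stays in one piece.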
A good exercise for learning concurrent programming in any language is to work on a thread pool implementation.
In this pattern you create some threads in advance. Those threads are treated as a resource. A thread pool object/structure is used to assign user-defined tasks to those threads for execution. When a task is finished you can collect its results. You can use a thread pool as a general-purpose design pattern for concurrency.
The main idea could look similar to this:
#define number_of_threads_to_be_created 42
// create some user defined tasks
Tasks_list_t* task_list_elem = CreateTasks();
// Create the thread pool with 42 threads
Thpool_handle_t* pool = Create_pool(number_of_threads_to_be_created);
// populate the thread pool with tasks
for ( ; task_list_elem; task_list_elem = task_list_elem->next) {
    add_a_task_to_thpool (task_list_elem, pool);
}
// kick start the thread pool
thpool_run (pool);
// Now decide on the mechanism for collecting the results from tasks list.
// Some of the candidates are:
// 1. sleep till all is done (naive)
// 2. poll the tasks in the list for some state variable describing that the task has
//    finished. This can work quite well in some situations
// 3. Implement signal/callback mechanism that a task can use to signal that it has
//    finished executing.
The mechanism for collecting data from tasks and the number of threads used in the pool should be chosen to reflect your requirements and the capabilities of the hardware and runtime environment.
Also please note that this pattern says nothing about how you should "synchronize" your tasks with each other or with their surroundings. Error handling can also be a bit tricky (for example: what to do when one task fails). Those two aspects need to be thought through in advance, as they can restrict how the thread pool pattern can be used.
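To make the pattern a bit more concrete, here is a deliberately small sketch built on pthreads with a fixed-size task queue, a mutex and a condition variable. It is not the API from the pseudocode above: struct pool, pool_start, pool_submit, pool_finish, square_task and both size constants are hypothetical names, and results are collected in the simplest way, by waiting until every task has run:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define POOL_THREADS 4    /* hypothetical sizes */
#define QUEUE_CAP    64

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

struct pool {
    pthread_t       threads[POOL_THREADS];
    int             nthreads;            /* threads actually created */
    struct task     queue[QUEUE_CAP];    /* ring buffer of pending tasks */
    int             head, count;
    bool            stopping;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
};

/* Worker loop: sleep until a task is queued, run it outside the lock, repeat. */
static void *pool_worker(void *param)
{
    struct pool *p = param;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->stopping)
            pthread_cond_wait(&p->not_empty, &p->lock);
        if (p->count == 0) {             /* stopping and nothing left to do */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        struct task t = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_mutex_unlock(&p->lock);
        t.fn(t.arg);
    }
}

/* Queues one task; returns 0 on success, -1 if the queue is full. */
int pool_submit(struct pool *p, task_fn fn, void *arg)
{
    int status = -1;
    assert(fn != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->count < QUEUE_CAP) {
        p->queue[(p->head + p->count) % QUEUE_CAP] = (struct task){ fn, arg };
        p->count++;
        pthread_cond_signal(&p->not_empty);
        status = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return status;
}

/* Lets the workers drain the queue, then joins them. Once this returns,
   every queued task has run, so results can be read without locking. */
void pool_finish(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->stopping = true;
    pthread_cond_broadcast(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->not_empty);
}

/* Creates the worker threads in advance; returns 0 on success, -1 on failure. */
int pool_start(struct pool *p)
{
    p->head = p->count = p->nthreads = 0;
    p->stopping = false;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    while (p->nthreads < POOL_THREADS) {
        if (pthread_create(&p->threads[p->nthreads], NULL, pool_worker, p) != 0) {
            pool_finish(p);              /* stop the threads created so far */
            return -1;
        }
        p->nthreads++;
    }
    return 0;
}

/* Example task: each job carries its own input and output slot. */
struct square_job { int in, out; };
void square_task(void *arg)
{
    struct square_job *j = arg;
    j->out = j->in * j->in;
}
```

A caller would fill an array of struct square_job, call pool_start, submit each job with pool_submit, and read the out fields after pool_finish returns. Error handling for a failing task and synchronization between tasks are left out on purpose, since the right choice depends on the application.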
About thread pool:
http://en.wikipedia.org/wiki/Thread_pool_pattern
http://docs.oracle.com/cd/E19253-01/816-5137/ggedn/index.html
Good introductory reading on pthreads:
http://www.advancedlinuxprogramming.com/alp-folder/alp-ch04-threads.pdf
The compiler does not automatically make your code multi-threaded, if that was your question. Please note that the C standards before C11 know nothing about multi-threading (C11 added an optional <threads.h>, which not every implementation provides), since whether you can use multi-threading depends not so much on the language you code in as on the platform you are coding for. Code written in C can run on pretty much anything for which a C compiler exists. There is even a C compiler for the C64 (almost completely C99 conformant). To support multiple threads, however, the platform must have an operating system that supports them, and this usually means that certain CPU functionality must be present. An operating system can do multithreading almost exclusively in software; this will be awfully slow and there won't be memory protection, but it is possible. Even in that case you need at least programmable interrupts.
So how to write multi-threaded C code depends entirely on the operating system of your target platform. There are POSIX-conformant systems (OS X, FreeBSD, Linux, etc.) and systems that have their own library for threading (Windows). Some systems have more than one library for it (e.g. OS X has the POSIX library, but there is also the Carbon Thread Manager, which you can use from C, though I think it is rather legacy nowadays).
Of course there are cross-platform thread libraries, and some modern compilers support things like OpenMP, where the compiler automatically generates the code that creates threads on your chosen target platform; but not every compiler supports it, and those that do are often not feature complete. Usually you get the widest system support by using POSIX threads, more often called "pthreads". The only major platform not supporting them is Windows, and there you can use free third-party libraries like this one. Several other ports exist as well (Cygwin has one for sure). If you will one day have a UI of some kind, you may want to use a cross-platform library like wxWidgets or SDL, both of which offer consistent multi-threading support on all supported platforms.
Intel's C++ compiler is actually capable of automatically parallelizing your code. It's just a compiler switch you need to enable. It doesn't work as well as OpenMP, though (i.e. it doesn't always succeed, or the resulting program is slower). From Intel's website: "Auto-parallelization, which is triggered by the -parallel (Linux* OS and Mac OS* X) or /Qparallel (Windows* OS) option, automatically identifies those loop structures that contain parallelism. During compilation, the compiler automatically attempts to deconstruct the code sequences into separate threads for parallel processing. No other effort by the programmer is needed."