Directives are based on #pragma directives defined in the C and C++ standards. Compilers that support the OpenMP C and C++ API will include a command-line option that activates and allows interpretation of all OpenMP compiler directives.
The syntax of an OpenMP directive is formally specified by the grammar in appendix C, and informally as follows:
#pragma omp directive-name [clause[ [,] clause]...] new-line
Each directive starts with #pragma omp, to reduce the potential for conflict with other (non-OpenMP or vendor extensions to OpenMP) pragma directives with the same names. The rest of the directive follows the conventions of the C and C++ standards for compiler directives. In particular, white space can be used before and after the #, and sometimes white space must be used to separate the words in a directive. Preprocessing tokens following the #pragma omp are subject to macro replacement.
Directives are case-sensitive. The order in which clauses appear in directives isn't significant. Clauses on directives may be repeated as needed, subject to the restrictions listed in the description of each clause. If variable-list appears in a clause, it must specify only variables. Only one directive-name can be specified per directive. For example, the following directive isn't allowed:
/* ERROR - multiple directive names not allowed */
#pragma omp parallel barrier
An OpenMP directive applies to at most one succeeding statement, which must be a structured block.
The _OPENMP macro name is defined by OpenMP-compliant implementations as the decimal constant yyyymm, which will be the year and month of the approved specification. This macro must not be the subject of a #define or a #undef preprocessing directive.
#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif
If vendors define extensions to OpenMP, they may specify additional predefined macros.
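The conditional-compilation pattern above can be wrapped in small helpers so that the same source builds with or without OpenMP enabled. The following is a minimal sketch; the function names `current_thread_id` and `openmp_version` are hypothetical and not part of the OpenMP API.

```c
#ifdef _OPENMP
#include <omp.h>   /* only included when the compiler enables OpenMP */
#endif

/* Returns the calling thread's number within its team, or 0 when the
 * translation unit is compiled without OpenMP support. */
int current_thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Returns the yyyymm constant of the supported specification,
 * or 0 when OpenMP is not enabled. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0L;
#endif
}
```

Called from serial code, `current_thread_id()` returns 0 in both builds, because the thread executing a serial region is thread 0.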
The following directive defines a parallel region, which is a region of the program that's to be executed by many threads in parallel. This directive is the fundamental construct that starts parallel execution.
#pragma omp parallel [clause[ [, ]clause] ...] new-line
   structured-block
The clause is one of the following:
When a thread gets to a parallel construct, a team of threads is created if one of the following cases is true:
This thread becomes the master thread of the team, with a thread number of 0, and all threads in the team, including the master thread, execute the region in parallel. If the value of the if expression is zero, the region is serialized.
To determine the number of threads that are requested, the following rules will be considered in order. The first rule whose condition is met will be applied:
If the num_threads clause is present then it supersedes the number of threads requested by the omp_set_num_threads library function or the OMP_NUM_THREADS environment variable only for the parallel region it's applied to. Later parallel regions aren't affected by it.
The number of threads that execute the parallel region also depends upon whether dynamic adjustment of the number of threads is enabled. If dynamic adjustment is disabled, then the requested number of threads will execute the parallel region. If dynamic adjustment is enabled then the requested number of threads is the maximum number of threads that may execute the parallel region.
If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads requested for the parallel region is more than the number that the run-time system can supply, the behavior of the program is implementation-defined. An implementation may, for example, interrupt the execution of the program, or it may serialize the parallel region.
The omp_set_dynamic library function and the OMP_DYNAMIC environment variable can be used to enable and disable dynamic adjustment of the number of threads.
The number of physical processors actually hosting the threads at any given time is implementation-defined. Once created, the number of threads in the team stays constant for the duration of that parallel region. It can be changed either explicitly by the user or automatically by the run-time system from one parallel region to another.
The statements contained within the dynamic extent of the parallel region are executed by each thread, and each thread can execute a path of statements that's different from the other threads. Directives encountered outside the lexical extent of a parallel region are referred to as orphaned directives.
There's an implied barrier at the end of a parallel region. Only the master thread of the team continues execution at the end of a parallel region.
If a thread in a team executing a parallel region encounters another parallel construct, it creates a new team, and it becomes the master of that new team. Nested parallel regions are serialized by default. As a result, by default, a nested parallel region is executed by a team composed of one thread. The default behavior may be changed by using either the runtime library function omp_set_nested or the environment variable OMP_NESTED. However, the number of threads in a team that execute a nested parallel region is implementation-defined.
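As an illustration of the if and num_threads clauses described above, the following sketch counts how many threads execute a parallel region. The function name `count_team_members` and its parameters are hypothetical. The code also builds without OpenMP, in which case the pragmas are ignored and one thread runs the block.

```c
/* Counts the threads that execute a parallel region.
 * `enable` feeds the if clause: when it is zero, the region is
 * serialized and exactly one thread runs it.
 * `requested` feeds num_threads and must be positive. */
int count_team_members(int enable, int requested)
{
    int members = 0;   /* shared by default */

    #pragma omp parallel if(enable) num_threads(requested)
    {
        /* Every thread in the team executes this block once. */
        #pragma omp atomic
        members++;
    }   /* implied barrier; only the master thread continues */

    return members;
}
```

With `enable` equal to 0 the result is always 1. With `enable` nonzero the result depends on the implementation and on dynamic adjustment, so a caller should treat `requested` as an upper bound rather than a guarantee.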
Restrictions to the parallel directive are as follows:
A work-sharing construct distributes the execution of the associated statement among the members of the team that encounter it. The work-sharing directives don't launch new threads, and there's no implied barrier on entry to a work-sharing construct.
The sequence of work-sharing constructs and barrier directives encountered must be the same for every thread in a team.
OpenMP defines the following work-sharing constructs, and these constructs are described in the sections that follow:
The for directive identifies an iterative work-sharing construct that specifies that the iterations of the associated loop will be executed in parallel. The iterations of the for loop are distributed across threads that already exist in the team executing the parallel construct to which it binds. The syntax of the for construct is as follows:
#pragma omp for [clause[[,] clause] ...] new-line
   for-loop
The clause is one of the following:
The for directive places restrictions on the structure of the corresponding for loop. Specifically, the corresponding for loop must have canonical shape:
for ( init-expr ; var logical-op b ; incr-expr )
init-expr
One of the following:
incr-expr
One of the following:
var
A signed integer variable. If this variable would otherwise be shared, it's implicitly made private for the duration of the for. Do not modify this variable within the body of the for statement. Unless the variable is specified lastprivate, its value after the loop is indeterminate.
logical-op
One of the following:
lb, b, and incr
Loop invariant integer expressions. There's no synchronization during the evaluation of these expressions, so any evaluated side effects produce indeterminate results.
The canonical form allows the number of loop iterations to be computed on entry to the loop. This computation is made with values in the type of var, after integral promotions. In particular, if the value of b - lb + incr can't be represented in that type, the result is indeterminate. Further, if logical-op is < or <=, then incr-expr must cause var to increase on each iteration of the loop. If logical-op is > or >=, then incr-expr must cause var to get smaller on each iteration of the loop.
The schedule clause specifies how iterations of the for loop are divided among threads of the team. The correctness of a program must not depend on which thread executes a particular iteration. The value of chunk_size, if specified, must be a loop invariant integer expression with a positive value. There's no synchronization during the evaluation of this expression, so any evaluated side effects produce indeterminate results. The schedule kind can be one of the following values:
Table 2-1: schedule clause kind values
Value | Description |
---|---|
static | When schedule(static, chunk_size ) is specified, iterations are divided into chunks of a size specified by chunk_size. The chunks are statically assigned to threads in the team in a round-robin fashion in the order of the thread number. When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, with one chunk assigned to each thread. |
dynamic | When schedule(dynamic, chunk_size ) is specified, the iterations are divided into a series of chunks, each containing chunk_size iterations. Each chunk is assigned to a thread that's waiting for an assignment. The thread executes the chunk of iterations and then waits for its next assignment, until no chunks remain to be assigned. The last chunk to be assigned may have a smaller number of iterations. When no chunk_size is specified, it defaults to 1. |
guided | When schedule(guided, chunk_size ) is specified, the iterations are assigned to threads in chunks with decreasing sizes. When a thread finishes its assigned chunk of iterations, it's dynamically assigned another chunk, until none is left. For a chunk_size of 1, the size of each chunk is approximately the number of unassigned iterations divided by the number of threads. These sizes decrease almost exponentially to 1. For a chunk_size with value k greater than 1, the sizes decrease almost exponentially to k, except that the last chunk may have fewer than k iterations. When no chunk_size is specified, it defaults to 1. |
runtime | When schedule(runtime) is specified, the decision regarding scheduling is deferred until runtime. The schedule kind and size of the chunks can be chosen at run time by setting the environment variable OMP_SCHEDULE . If this environment variable isn't set, the resulting schedule is implementation-defined. When schedule(runtime) is specified, chunk_size must not be specified. |
In the absence of an explicitly defined schedule clause, the default schedule is implementation-defined.
An OpenMP-compliant program shouldn't rely on a particular schedule for correct execution. A program shouldn't rely on a schedule kind conforming precisely to the description given above, because it's possible to have variations in the implementations of the same schedule kind across different compilers. The descriptions can be used to select the schedule that's appropriate for a particular situation.
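To make the round-robin description of schedule(static, chunk_size) in Table 2-1 concrete, the following sketch computes which thread number would receive a given iteration. It is an arithmetic illustration of the table's wording only; the function `static_chunk_owner` is hypothetical, iterations are assumed to be numbered from 0, and real implementations may differ in detail.

```c
/* Thread number that would run iteration `iter` under
 * schedule(static, chunk_size) with `nthreads` threads, following the
 * round-robin description in Table 2-1.
 * Assumes iter >= 0, chunk_size > 0, and nthreads > 0. */
int static_chunk_owner(int iter, int chunk_size, int nthreads)
{
    int chunk_index = iter / chunk_size;   /* which chunk holds iter */
    return chunk_index % nthreads;         /* chunks dealt out in thread order */
}
```

For example, with a chunk size of 2 and 3 threads, iterations 0 and 1 go to thread 0, iterations 4 and 5 go to thread 2, and iteration 6 wraps around to thread 0.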
The ordered clause must be present when ordered directives bind to the for construct.
There's an implicit barrier at the end of a for construct unless a nowait clause is specified.
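The following sketch shows a for construct in canonical shape inside a parallel region, with an explicit schedule clause. The function name `fill_squares` and the chunk size of 4 are arbitrary choices for illustration.

```c
/* Writes i*i into out[i] for 0 <= i < n. The iterations are shared
 * among the threads of the enclosing parallel region. */
void fill_squares(int *out, int n)
{
    int i;   /* implicitly made private for the duration of the for */

    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic, 4)
        for (i = 0; i < n; i++) {
            out[i] = i * i;   /* each iteration writes a distinct element */
        }
        /* implicit barrier here, because nowait is not specified */
    }
}
```

The result doesn't depend on which thread runs which iteration, so the same output is produced under any schedule kind.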
Restrictions to the for directive are as follows:
The sections directive identifies a noniterative work-sharing construct that specifies a set of constructs that are to be divided among threads in a team. Each section is executed once by a thread in the team. The syntax of the sections directive is as follows:
#pragma omp sections [clause[[,] clause] ...] new-line
The clause is one of the following:
Each section is preceded by a section directive, although the section directive is optional for the first section. The section directives must appear within the lexical extent of the sections directive. There's an implicit barrier at the end of a sections construct, unless a nowait is specified.
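As a sketch of a sections construct, the following function computes the minimum and maximum of an array in two independent sections. The function name `min_and_max` is hypothetical, and the array is assumed to hold at least one element.

```c
/* Finds the minimum and maximum of a[0..n-1], with n >= 1.
 * Each section is executed once by some thread in the team. */
void min_and_max(const int *a, int n, int *min_out, int *max_out)
{
    int lo = a[0];
    int hi = a[0];

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                int i;
                for (i = 1; i < n; i++)
                    if (a[i] < lo) lo = a[i];   /* only this section writes lo */
            }
            #pragma omp section
            {
                int i;
                for (i = 1; i < n; i++)
                    if (a[i] > hi) hi = a[i];   /* only this section writes hi */
            }
        }   /* implicit barrier at the end of the sections construct */
    }

    *min_out = lo;
    *max_out = hi;
}
```

Because each shared variable is written by exactly one section, no extra synchronization is needed inside the sections.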
Restrictions to the sections directive are as follows:
The single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread). The syntax of the single directive is as follows:
#pragma omp single [clause[[,] clause] ...] new-line
   structured-block
The clause is one of the following:
There's an implicit barrier after the single construct unless a nowait clause is specified.
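The following sketch uses a single construct to initialize shared data once, then relies on the implicit barrier so that every thread sees the initialized data. The function name `setup_ran_once` is hypothetical.

```c
/* Fills table[0..n-1] inside a single construct, then lets every
 * thread check the data. Returns 1 if the setup block ran exactly
 * once and no thread saw uninitialized data, otherwise 0. */
int setup_ran_once(int *table, int n)
{
    int setup_runs = 0;
    int mismatches = 0;

    #pragma omp parallel
    {
        #pragma omp single
        {
            int i;
            for (i = 0; i < n; i++)
                table[i] = 2 * i;
            setup_runs++;          /* executed by only one thread */
        }   /* implicit barrier: table is complete for all threads */

        if (n > 0 && table[n - 1] != 2 * (n - 1)) {
            #pragma omp atomic
            mismatches++;
        }
    }

    return setup_runs == 1 && mismatches == 0;
}
```

If a nowait clause were added to the single directive, the check after the construct could run before the table is filled, so the barrier matters here.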
Restrictions to the single directive are as follows:
Combined parallel work-sharing constructs are shortcuts for specifying a parallel region that has only one work-sharing construct. The semantics of these directives are the same as explicitly specifying a parallel directive followed by a single work-sharing construct.
The following sections describe the combined parallel work-sharing constructs:
The parallel for directive is a shortcut for a parallel region that contains only a single for directive. The syntax of the parallel for directive is as follows:
#pragma omp parallel for [clause[[,] clause] ...] new-line
   for-loop
This directive allows all the clauses of the parallel directive and the for directive, except the nowait clause, with identical meanings and restrictions. The semantics are the same as explicitly specifying a parallel directive immediately followed by a for directive.
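A minimal sketch of the combined form follows. The function name `vec_add` is hypothetical.

```c
/* Element-wise sum c[i] = a[i] + b[i] for 0 <= i < n, using the
 * combined parallel for directive. */
void vec_add(const int *a, const int *b, int *c, int n)
{
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

This behaves like a parallel directive immediately followed by a for directive around the same loop.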
The parallel sections directive provides a shortcut form for specifying a parallel region that has only a single sections directive. The semantics are the same as explicitly specifying a parallel directive immediately followed by a sections directive. The syntax of the parallel sections directive is as follows:
#pragma omp parallel sections [clause[[,] clause] ...] new-line
The clause can be one of the clauses accepted by the parallel and sections directives, except the nowait clause.
The following sections describe:
The master directive identifies a construct that specifies a structured block that's executed by the master thread of the team. The syntax of the master directive is as follows:
#pragma omp master new-line
   structured-block
Other threads in the team don't execute the associated structured block. There's no implied barrier either on entry to or exit from the master construct.
The critical directive identifies a construct that restricts execution of the associated structured block to a single thread at a time. The syntax of the critical directive is as follows:
#pragma omp critical [(name)] new-line
   structured-block
An optional name may be used to identify the critical region. Identifiers used to identify a critical region have external linkage and are in a name space that is separate from the name spaces used by labels, tags, members, and ordinary identifiers.
A thread waits at the beginning of a critical region until no other thread is executing a critical region (anywhere in the program) with the same name. All unnamed critical directives map to the same unspecified name.
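The following sketch uses two differently named critical regions to protect two independent counters, so a thread updating one counter doesn't wait for a thread updating the other. The function name `count_parity` and the region names `even_update` and `odd_update` are hypothetical.

```c
/* Counts even and odd values in data[0..n-1]. Each counter is
 * protected by its own named critical region. */
void count_parity(const int *data, int n, int *evens, int *odds)
{
    int i;
    int even_count = 0;
    int odd_count = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp critical (even_update)
            even_count++;
        } else {
            #pragma omp critical (odd_update)
            odd_count++;
        }
    }

    *evens = even_count;
    *odds = odd_count;
}
```

If both directives were left unnamed, they would map to the same unspecified name and the two updates would exclude each other.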
The barrier directive synchronizes all the threads in a team. When encountered, each thread in the team waits until all of the others have reached this point. The syntax of the barrier directive is as follows:
#pragma omp barrier new-line
After all threads in the team have encountered the barrier, each thread in the team begins executing the statements after the barrier directive in parallel. Because the barrier directive doesn't have a C language statement as part of its syntax, there are some restrictions on its placement within a program. For more information about the formal grammar, see appendix C. The example below illustrates these restrictions.
/* ERROR - The barrier directive cannot be the immediate
 *         substatement of an if statement
 */
if (x!=0)
   #pragma omp barrier
...

/* OK - The barrier directive is enclosed in a
 *      compound statement.
 */
if (x!=0) {
   #pragma omp barrier
}
The atomic directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads. The syntax of the atomic directive is as follows:
#pragma omp atomic new-line
   expression-stmt
The expression statement must have one of the following forms:
In the preceding expressions:
Although it's implementation-defined whether an implementation replaces all atomic directives with critical directives that have the same unique name, the atomic directive permits better optimization. Often hardware instructions are available that can perform the atomic update with the least overhead.
Only the load and store of the object designated by x are atomic; the evaluation of expr isn't atomic. To avoid race conditions, all updates of the location in parallel should be protected with the atomic directive, except those that are known to be free of race conditions.
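As a sketch of the atomic directive, the following function builds a histogram in which several iterations may update the same bin. The function name `histogram` and its parameters are hypothetical; the caller is assumed to zero `bins` and to supply only valid bin indices.

```c
/* For each i in 0..n-1, increments bins[bin_of[i]].
 * The increment is the only statement protected by atomic. */
void histogram(const int *bin_of, int n, int *bins)
{
    int i;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* The index expression bin_of[i] is evaluated without
         * atomicity; only the update of the bin is atomic. */
        #pragma omp atomic
        bins[bin_of[i]]++;
    }
}
```

Every update of a given bin goes through the atomic directive, which is what keeps concurrent increments of the same location from being lost.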
Restrictions to the atomic directive are as follows:
extern float a[], *p = a, b;
/* Protect against races among multiple updates. */
#pragma omp atomic
a[index[i]] += b;
/* Protect against races with updates through a. */
#pragma omp atomic
p[i] -= 1.0f;

extern union {int n; float x;} u;
/* ERROR - References through incompatible types. */
#pragma omp atomic
u.n++;
#pragma omp atomic
u.x -= 1.0f;
The flush directive, whether explicit or implied, specifies a "cross-thread" sequence point at which the implementation is required to ensure that all threads in a team have a consistent view of certain objects (specified below) in memory. This means that previous evaluations of expressions that reference those objects are complete and subsequent evaluations haven't yet begun. For example, compilers must restore the values of the objects from registers to memory, and hardware may need to flush write buffers to memory and reload the values of the objects from memory.
The syntax of the flush directive is as follows:
#pragma omp flush [(variable-list)] new-line
If the objects that require synchronization can all be designated by variables, then those variables can be specified in the optional variable-list. If a pointer is present in the variable-list, the pointer itself is flushed, not the object the pointer refers to.
A flush directive without a variable-list synchronizes all shared objects except inaccessible objects with automatic storage duration. (This is likely to have more overhead than a flush with a variable-list.) A flush directive without a variable-list is implied for the following directives:
The directive isn't implied if a nowait clause is present. It should be noted that the flush directive isn't implied for any of the following:
A reference that accesses the value of an object with a volatile-qualified type behaves as if there were a flush directive specifying that object at the previous sequence point. A reference that modifies the value of an object with a volatile-qualified type behaves as if there were a flush directive specifying that object at the subsequent sequence point.
Because the flush directive doesn't have a C language statement as part of its syntax, there are some restrictions on its placement within a program. For more information about the formal grammar, see appendix C. The example below illustrates these restrictions.
/* ERROR - The flush directive cannot be the immediate
 *         substatement of an if statement.
 */
if (x!=0)
   #pragma omp flush (x)
...

/* OK - The flush directive is enclosed in a
 *      compound statement
 */
if (x!=0) {
   #pragma omp flush (x)
}
Restrictions to the flush directive are as follows:
The structured block following an ordered directive is executed in the order in which iterations would be executed in a sequential loop. The syntax of the ordered directive is as follows:
#pragma omp ordered new-line
   structured-block
An ordered directive must be within the dynamic extent of a for or parallel for construct. The for or parallel for directive to which the ordered construct binds must have an ordered clause specified as described in section 2.4.1. In the execution of a for or parallel for construct with an ordered clause, ordered constructs are executed strictly in the order in which they would be executed in a sequential execution of the loop.
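The following sketch shows an ordered construct inside a parallel for that carries the ordered clause. The function name `ordered_squares` is hypothetical. The work before the ordered block may run in any order, while the ordered block appends results in sequential iteration order.

```c
/* Appends i*i to out for i = 0..n-1 and returns the number of
 * elements written. The append step runs in loop order. */
int ordered_squares(int *out, int n)
{
    int i;
    int next = 0;   /* shared write position */

    #pragma omp parallel for ordered schedule(dynamic)
    for (i = 0; i < n; i++) {
        int sq = i * i;       /* may be computed out of order */

        #pragma omp ordered
        {
            out[next] = sq;   /* executed in sequential iteration order */
            next++;
        }
    }

    return next;
}
```

Because ordered blocks execute one at a time and in order, the shared position `next` needs no further protection in this sketch.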
Restrictions to the ordered directive are as follows:
This section presents a directive and several clauses for controlling the data environment during the execution of parallel regions, as follows:
The threadprivate directive makes the named file-scope, namespace-scope, or static block-scope variables specified in the variable-list private to a thread. variable-list is a comma-separated list of variables that don't have an incomplete type. The syntax of the threadprivate directive is as follows:
#pragma omp threadprivate(variable-list) new-line
Each copy of a threadprivate variable is initialized once, at an unspecified point in the program prior to the first reference to that copy, and in the usual manner (i.e., as the master copy would be initialized in a serial execution of the program). Note that if an object is referenced in an explicit initializer of a threadprivate variable, and the value of the object is modified prior to the first reference to a copy of the variable, then the behavior is unspecified.
As with any private variable, a thread must not reference another thread's copy of a threadprivate object. During serial regions and master regions of the program, references will be to the master thread's copy of the object.
After the first parallel region executes, the data in the threadprivate objects is guaranteed to persist only if the dynamic threads mechanism has been disabled and if the number of threads remains unchanged for all parallel regions.
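As a sketch of a threadprivate variable, the following code gives each thread its own call counter. The names `call_count`, `record_call`, and `each_thread_counts_its_own_calls` are hypothetical.

```c
/* File-scope counter; each thread gets its own copy. */
static int call_count = 0;
#pragma omp threadprivate(call_count)

static void record_call(void)
{
    call_count++;   /* touches only the calling thread's copy */
}

/* Returns 1 if every thread observed exactly its own three calls. */
int each_thread_counts_its_own_calls(void)
{
    int ok = 1;

    #pragma omp parallel reduction(&&: ok)
    {
        int k;
        call_count = 0;                 /* reset this thread's copy */
        for (k = 0; k < 3; k++)
            record_call();
        ok = ok && (call_count == 3);   /* other threads' calls are invisible */
    }

    return ok;
}
```

If `call_count` were an ordinary shared variable, the unsynchronized increments from different threads would race and the check could fail.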
The restrictions to the threadprivate directive are as follows:
The following example illustrates how modifying a variable that appears in an initializer can cause unspecified behavior, and also how to avoid this problem by using an auxiliary object and a copy-constructor.
int x = 1;
T a(x);
const T b_aux(x); /* Capture value of x = 1 */
T b(b_aux);
#pragma omp threadprivate(a, b)

void f(int n) {
   x++;
   #pragma omp parallel for
   /* In each thread:
    * Object a is constructed from x (with value 1 or 2?)
    * Object b is copy-constructed from b_aux
    */
   for (int i=0; i<n; i++) {
      g(a, b); /* Value of a is unspecified. */
   }
}
Several directives accept clauses that allow a user to control the sharing attributes of variables for the duration of the region. Sharing attribute clauses apply only to variables in the lexical extent of the directive on which the clause appears. Not all of the following clauses are allowed on all directives. The list of clauses that are valid on a particular directive are described with the directive.
If a variable is visible when a parallel or work-sharing construct is encountered, and the variable isn't specified in a sharing attribute clause or threadprivate directive, then the variable is shared. Static variables declared within the dynamic extent of a parallel region are shared. Heap allocated memory (for example, using malloc() in C or C++ or the new operator in C++) is shared. (The pointer to this memory, however, can be either private or shared.) Variables with automatic storage duration declared within the dynamic extent of a parallel region are private.
Most of the clauses accept a variable-list argument, which is a comma-separated list of variables that are visible. If a variable referenced in a data-sharing attribute clause has a type derived from a template, and there are no other references to that variable in the program, the behavior is undefined.
All variables that appear within directive clauses must be visible. Clauses may be repeated as needed, but no variable may be specified in more than one clause, except that a variable can be specified in both a firstprivate and a lastprivate clause.
The following sections describe the data-sharing attribute clauses:
The private clause declares the variables in variable-list to be private to each thread in a team. The syntax of the private clause is as follows:
private(variable-list)
The behavior of a variable specified in a private clause is as follows. A new object with automatic storage duration is allocated for the construct. The size and alignment of the new object are determined by the type of the variable. This allocation occurs once for each thread in the team, and a default constructor is invoked for a class object if necessary; otherwise the initial value is indeterminate. The original object referenced by the variable has an indeterminate value upon entry to the construct, must not be modified within the dynamic extent of the construct, and has an indeterminate value upon exit from the construct.
In the lexical extent of the directive construct, the variable references the new private object allocated by the thread.
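A typical use of the private clause is a scratch variable that each thread needs for its own iterations. The following is a minimal sketch; the function name `swap_adjacent_pairs` is hypothetical.

```c
/* Swaps a[0] with a[1], a[2] with a[3], and so on. A trailing
 * unpaired element is left unchanged. */
void swap_adjacent_pairs(int *a, int n)
{
    int i;
    int tmp;   /* each thread gets its own uninitialized copy */

    #pragma omp parallel for private(tmp)
    for (i = 0; i < n - 1; i += 2) {
        tmp = a[i];        /* assigned before use in every iteration */
        a[i] = a[i + 1];
        a[i + 1] = tmp;
    }
}
```

The private copy starts with an indeterminate value, so the loop body assigns `tmp` before reading it. Without the private clause, `tmp` would be shared and concurrent iterations could overwrite each other's saved value.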
The restrictions to the private clause are as follows:
The firstprivate clause provides a superset of the functionality provided by the private clause. The syntax of the firstprivate clause is as follows:
firstprivate(variable-list)
Variables specified in variable-list have private clause semantics, as described in section 2.7.2.1. The initialization or construction happens as if it were done once per thread, prior to the thread's execution of the construct. For a firstprivate clause on a parallel construct, the initial value of the new private object is the value of the original object that exists immediately prior to the parallel construct for the thread that encounters it. For a firstprivate clause on a work-sharing construct, the initial value of the new private object for each thread that executes the work-sharing construct is the value of the original object that exists prior to the point in time that the same thread encounters the work-sharing construct. In addition, for C++ objects, the new private object for each thread is copy constructed from the original object.
The restrictions to the firstprivate clause are as follows:
The lastprivate clause provides a superset of the functionality provided by the private clause. The syntax of the lastprivate clause is as follows:
lastprivate(variable-list)
Variables specified in the variable-list have private clause semantics. When a lastprivate clause appears on the directive that identifies a work-sharing construct, the value of each lastprivate variable from the sequentially last iteration of the associated loop, or the lexically last section directive, is assigned to the variable's original object. Variables that aren't assigned a value by the last iteration of the for or parallel for, or by the lexically last section of the sections or parallel sections directive, have indeterminate values after the construct. Unassigned subobjects also have an indeterminate value after the construct.
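The following sketch combines firstprivate and lastprivate on one loop. The function name `last_shifted_index` is hypothetical, and the loop is assumed to run at least once (n greater than 0), because otherwise the lastprivate variable is never assigned.

```c
/* Returns (n - 1) + offset for n > 0, taken from the sequentially
 * last iteration of the loop. */
int last_shifted_index(int n, int offset)
{
    int i;
    int last = -1;

    /* offset: each private copy starts with the caller's value.
     * last:   the value from iteration n-1 is copied back out. */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (i = 0; i < n; i++) {
        last = i + offset;
    }

    return last;
}
```

Whichever thread happens to execute iteration n-1 supplies the value that is assigned to the original `last` after the loop.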
The restrictions to the lastprivate clause are as follows:
This clause shares variables that appear in the variable-list among all the threads in a team. All threads within a team access the same storage area for shared variables.
The syntax of the shared clause is as follows:
shared(variable-list)
The default clause allows the user to affect the data-sharing attributes of variables. The syntax of the default clause is as follows:
default(shared | none)
Specifying default(shared) is equivalent to explicitly listing each currently visible variable in a shared clause, unless it's threadprivate or const-qualified. In the absence of an explicit default clause, the default behavior is the same as if default(shared) were specified.
Specifying default(none) requires that at least one of the following must be true for every reference to a variable in the lexical extent of the parallel construct:
Specifying a variable on a firstprivate, lastprivate, or reduction clause of an enclosed directive causes an implicit reference to the variable in the enclosing context. Such implicit references are also subject to the requirements listed above.
Only a single default clause may be specified on a parallel directive.
A variable's default data-sharing attribute can be overridden by using the private, firstprivate, lastprivate, reduction, and shared clauses, as demonstrated by the following example:
#pragma omp parallel for default(shared) firstprivate(i)\
   private(x) private(r) lastprivate(i)
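For comparison, the following sketch uses default(none), which forces every variable referenced in the construct to be listed explicitly. The function name `scale_and_shift` is hypothetical. The loop control variable `i` isn't listed because it is made private by the for construct itself.

```c
/* Computes y[i] = a * x[i] + b for 0 <= i < n. */
void scale_and_shift(int *y, const int *x, int n, int a, int b)
{
    int i;

    /* With default(none), omitting any of y, x, n, a, or b from the
     * clauses below is a compile-time error. */
    #pragma omp parallel for default(none) shared(y, x, n) firstprivate(a, b)
    for (i = 0; i < n; i++) {
        y[i] = a * x[i] + b;
    }
}
```

Listing each variable this way makes the data-sharing intent visible at the directive, at the cost of a longer clause list.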
This clause performs a reduction on the scalar variables that appear in variable-list, with the operator op. The syntax of the reduction clause is as follows:
reduction( op : variable-list )
A reduction is typically specified for a statement with one of the following forms:
x
One of the reduction variables specified in the list.
variable-list
A comma-separated list of scalar reduction variables.
expr
An expression with scalar type that doesn't reference x.
op
Not an overloaded operator but one of + , * , - , & , ^ , | , && , or || .
binop
Not an overloaded operator but one of + , * , - , & , ^ , or | .
The following is an example of the reduction clause:
#pragma omp parallel for reduction(+: a, y) reduction(||: am)
for (i=0; i<n; i++) {
   a += b[i];
   y = sum(y, c[i]);
   am = am || b[i] == c[i];
}
As shown in the example, an operator may be hidden inside a function call. The user should be careful that the operator specified in the reduction clause matches the reduction operation.
Although the right operand of the || operator has no side effects in this example, they're permitted, but should be used with care. In this context, a side effect that's guaranteed not to occur during sequential execution of the loop may occur during parallel execution. This difference can occur because the order of execution of the iterations is indeterminate.
The operator is used to determine the initial value of any private variables used by the compiler for the reduction and to determine the finalization operator. Specifying the operator explicitly allows the reduction statement to be outside the lexical extent of the construct. Any number of reduction clauses may be specified on the directive, but a variable may appear in at most one reduction clause for that directive.
A private copy of each variable in variable-list is created, one for each thread, as if the private clause had been used. The private copy is initialized according to the operator (see the following table).
At the end of the region for which the reduction clause was specified, the original object is updated to reflect the result of combining its original value with the final value of each of the private copies using the operator specified. The reduction operators are all associative (except for subtraction), and the compiler may freely reassociate the computation of the final value. (The partial results of a subtraction reduction are added to form the final value.)
The value of the original object becomes indeterminate when the first thread reaches the containing clause and remains so until the reduction computation is complete. Normally, the computation will be complete at the end of the construct; however, if the reduction clause is used on a construct to which nowait is also applied, the value of the original object remains indeterminate until a barrier synchronization has been performed to ensure that all threads have completed the reduction clause.
The following table lists the operators that are valid and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable.
Operator | Initialization |
---|---|
+ | 0 |
* | 1 |
- | 0 |
& | ~0 |
\| | 0 |
^ | 0 |
&& | 1 |
\|\| | 0 |
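The following sketch applies two reduction clauses with different operators to one loop, using the initialization values from the table above (0 for + and 1 for &&). The function name `sum_if_all_positive` is hypothetical.

```c
/* Stores the sum of a[0..n-1] in *sum_out and returns 1 if every
 * element is greater than zero, otherwise 0. */
int sum_if_all_positive(const int *a, int n, long *sum_out)
{
    int i;
    long sum = 0;           /* combined with + ; private copies start at 0 */
    int all_positive = 1;   /* combined with && ; private copies start at 1 */

    #pragma omp parallel for reduction(+: sum) reduction(&&: all_positive)
    for (i = 0; i < n; i++) {
        sum += a[i];
        all_positive = all_positive && (a[i] > 0);
    }

    *sum_out = sum;
    return all_positive;
}
```

Integer addition is used here so that the result doesn't depend on how the compiler reassociates the partial sums.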
The restrictions to the reduction clause are as follows:
#pragma omp parallel private(y)
{ /* ERROR - private variable y cannot be specified
             in a reduction clause */
   #pragma omp for reduction(+: y)
   for (i=0; i<n; i++)
      y += b[i];
}

/* ERROR - variable x cannot be specified in both
           a shared and a reduction clause */
#pragma omp parallel for shared(x) reduction(+: x)
The copyin clause provides a mechanism to assign the same value to threadprivate variables for each thread in the team executing the parallel region. For each variable specified in a copyin clause, the value of the variable in the master thread of the team is copied, as if by assignment, to the thread-private copies at the beginning of the parallel region. The syntax of the copyin clause is as follows:
copyin( variable-list )
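As a sketch of the copyin clause, the following code sets a threadprivate variable in the serial region and copies the master thread's value to every thread on entry to the parallel region. The names `base_value` and `all_threads_see_master_value` are hypothetical.

```c
static int base_value = 0;
#pragma omp threadprivate(base_value)

/* Returns 1 if every thread in the team starts the region with the
 * master thread's value of base_value. */
int all_threads_see_master_value(int value)
{
    int ok = 1;

    base_value = value;   /* serial region: the master thread's copy */

    #pragma omp parallel copyin(base_value) reduction(&&: ok)
    {
        ok = ok && (base_value == value);
    }

    return ok;
}
```

Without copyin, the copies belonging to the other threads would keep whatever values they already had, so the check wouldn't be guaranteed to pass.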
The restrictions to the copyin clause are as follows:
The copyprivate clause provides a mechanism to use a private variable to broadcast a value from one member of a team to the other members. It's an alternative to using a shared variable for the value when providing such a shared variable would be difficult (for example, in a recursion requiring a different variable at each level). The copyprivate clause can only appear on the single directive.
The syntax of the copyprivate clause is as follows:
copyprivate( variable-list )
The effect of the copyprivate clause on the variables in its variable-list occurs after the execution of the structured block associated with the single construct, and before any of the threads in the team have left the barrier at the end of the construct. Then, in all other threads in the team, for each variable in the variable-list, that variable becomes defined (as if by assignment) with the value of the corresponding variable in the thread that executed the construct's structured block.
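The following sketch broadcasts a value computed in a single construct to the private variables of the other threads. The function name `broadcast_from_single` and the value 42 are arbitrary choices for illustration.

```c
/* Returns 1 if every thread's private token holds the value that
 * one thread assigned inside the single construct. */
int broadcast_from_single(void)
{
    int ok = 1;

    #pragma omp parallel reduction(&&: ok)
    {
        int token = 0;   /* declared in the region, so private to each thread */

        #pragma omp single copyprivate(token)
        {
            token = 42;  /* executed by only one thread */
        }
        /* After the construct, every thread's token is 42. */

        ok = ok && (token == 42);
    }

    return ok;
}
```

Here no shared variable is needed to carry the value between threads; the copy happens before any thread leaves the barrier at the end of the single construct.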
Restrictions to the copyprivate clause are as follows:
Dynamic binding of directives must adhere to the following rules:
Dynamic nesting of directives must adhere to the following rules: