Parallelism

A large fraction of studies using Pythia follow the same general steps: first the Pythia object(s) and histograms are initialized. Then there is a loop where events are generated using Pythia::next. These events are subsequently analyzed one by one, and the resulting statistics are stored in histograms. Finally, the histograms are normalized and the results are plotted.

In this procedure the events are usually generated independently of one another, so in principle they can be generated in parallel. The PythiaParallel class provides a simple interface for doing this. Objects of this class behave similarly to Pythia objects in that they can be configured using readString and readFile, and are initialized using init. The difference is that instead of a next method that generates a single event, they have a run method, which generates a number of events in parallel and analyzes them using a user-defined callback function. The program flow using the parallelism framework is as follows:

In addition to the standard header file, the parallel framework header must be included.
```
    #include "Pythia8/Pythia.h"
    #include "Pythia8/PythiaParallel.h"
    // Pythia classes live in the Pythia8 namespace.
    using namespace Pythia8;
```

The next step is to create, configure, and initialize a parallel generator object. This process is identical to that for a standard Pythia object.
```
    PythiaParallel pythia;
    pythia.readString("HardQCD:all = on");
    pythia.init();
```

Finally, the PythiaParallel object is run with a user function which performs the necessary analysis on the events. In this example, an event multiplicity histogram is filled.
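```
    // Histogram of the final-state multiplicity, captured by reference below.
    Hist mult("mult", 100, -0.5, 799.5);
    // Generate 10000 events in parallel; the callback runs once per event.
    pythia.run(10000, [&](Pythia* pythiaPtr) {
      // Count the final-state particles in this event.
      int nFinal = 0;
      for (int i = 0; i < pythiaPtr->event.size(); ++i)
        if (pythiaPtr->event[i].isFinal()) ++nFinal;
      mult.fill( nFinal );
    });
```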
Here the syntax [&](Pythia* pythiaPtr) may be unfamiliar. It defines a lambda function that takes a pointer to a Pythia object as an argument. The [&] indicates that all local variables used by the lambda function (in this case only the mult histogram object) are passed by reference.
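
The capture-by-reference behaviour can be tried out without Pythia. The following is a minimal standalone sketch; forEachItem is a hypothetical stand-in for any routine that accepts a callback and is not part of the Pythia API.

```cpp
#include <functional>

// Hypothetical stand-in for a routine that invokes a callback once per item.
void forEachItem(int nItems, const std::function<void(int)>& callback) {
  for (int i = 0; i < nItems; ++i) callback(i);
}

// Sums 0 + 1 + ... + (nItems - 1) by letting the lambda update a local variable.
int sumOfIndices(int nItems) {
  int total = 0;
  // [&] captures `total` by reference, so the lambda modifies the local itself.
  forEachItem(nItems, [&](int i) { total += i; });
  return total;
}
```

Had the lambda used [=] instead, it would have received a read-only copy of total and the assignment would not compile. This is why the histogram in the example above is captured with [&].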

The scope of this class is to provide a lightweight way of parallelising simple Pythia studies. As such, it may not offer the necessary features to support more complicated use cases. The specific features are documented below, and examples are given in main221, main222, main223, and main404.

vector<long> PythiaParallel::run(long nEvents, function<void(Pythia*)> callback)
vector<long> PythiaParallel::run( function<void(Pythia*)> callback)
this is the main method of PythiaParallel, analogous to Pythia::next. The method generates nEvents events in parallel, distributing the tasks automatically to different Pythia instances. If nEvents is not specified, Main:numberOfEvents is used instead. The callback is a function that will be called when each event is generated. It takes a pointer to the Pythia instance that generated the event as an argument, and the event can then be accessed through Pythia::event. If an event fails to generate successfully, it will not be passed to the callback. By default, callbacks are synchronized. That is, only one callback can be active at the same time, which means it is safe to e.g. write to histograms from within the callback. Asynchronous processing of callbacks can be enabled by setting Parallelism:processAsync = on (see below). Returns a vector<long> containing the number of events successfully generated by each thread. If any events fail to generate, the entries will sum to a number that is smaller than the requested number of events.
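
Since the returned entries sum to the number of successfully generated events, the number of failed events follows by subtraction. A minimal sketch, where countFailedEvents is a hypothetical helper name and the vector is assumed to have the shape returned by run:

```cpp
#include <numeric>
#include <vector>

// Number of requested events that were not generated successfully,
// given per-thread counts in the shape returned by PythiaParallel::run.
long countFailedEvents(const std::vector<long>& perThread, long nRequested) {
  long nGenerated = std::accumulate(perThread.begin(), perThread.end(), 0L);
  return nRequested - nGenerated;
}
```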

bool PythiaParallel::init()
bool PythiaParallel::init( function<void(Pythia*)> customInit)
initializes each Pythia instance and returns whether the initialization was successful.
argument customInit : If specified, this function will be called for each Pythia instance after it has been constructed and its settings have been set, but before calling Pythia::init. This can be useful for example for setting a UserHooks object on each instance.

void PythiaParallel::foreach(function<void(Pythia*)> action)
performs the specified action for each Pythia instance. This can be useful for doing custom finalization on each instance, e.g. combining histograms.

void PythiaParallel::foreachAsync( function<void(Pythia*)> action)
as PythiaParallel::foreach, but the actions are performed for all Pythia instances in parallel.

double PythiaParallel::weightSum() const
returns the sum of weights from all Pythia instances, as given by Info::weightSum().

double PythiaParallel::sigmaGen() const
returns the weighted average of the generated cross sections of the Pythia instances, as given by Info::sigmaGen().
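
The text does not state which weights enter this average. The sketch below assumes, purely for illustration, that each instance's cross section is weighted by that instance's sum of event weights. The function name weightedAverage is hypothetical and the code is not taken from the Pythia source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted average sum(w_i * x_i) / sum(w_i).
// Assumption: x_i is the cross section of instance i and w_i its weight sum.
double weightedAverage(const std::vector<double>& values,
                       const std::vector<double>& weights) {
  assert(values.size() == weights.size());
  double sumW = 0., sumWX = 0.;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sumW  += weights[i];
    sumWX += weights[i] * values[i];
  }
  // Return 0 if no weight has been accumulated, to avoid dividing by zero.
  return (sumW > 0.) ? sumWX / sumW : 0.;
}
```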

The following settings are available for the parallelism framework.

mode Parallelism:numThreads (default = 0; minimum = 0)
Number of threads to run in parallel. If set to 0, the number of threads will be estimated using std::thread::hardware_concurrency; if the program is unable to determine the number of threads this way, initialization will fail.
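
The fallback described here can be illustrated with the standard library alone. In this sketch resolveNumThreads is a hypothetical helper, not a Pythia function; it returns 0 to signal the failure case, relying on the fact that std::thread::hardware_concurrency returns 0 when the value cannot be determined.

```cpp
#include <thread>

// Returns the number of threads to use, or 0 if it cannot be determined.
int resolveNumThreads(int requested) {
  // An explicit positive request is used as is.
  if (requested > 0) return requested;
  // hardware_concurrency() returns 0 when the value is not computable.
  return static_cast<int>(std::thread::hardware_concurrency());
}
```

On systems where the hardware query is unreliable, setting Parallelism:numThreads explicitly avoids the failure case.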

mvec Parallelism:seeds (default = {})
The seeds to use for each Pythia object. If empty, Random:seed will be used, incrementing the seed by 1 for each object. If non-empty, it must have a number of entries equal to Parallelism:numThreads.
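
The seed rule can be sketched as follows. This is an illustration under stated assumptions, not the Pythia implementation: resolveSeeds is a hypothetical helper, the first object is assumed to receive the base seed itself, and special values of Random:seed are not treated.

```cpp
#include <stdexcept>
#include <vector>

// Chooses one seed per thread.
// Assumption: with no user seeds, object i gets baseSeed + i (first object: baseSeed).
std::vector<int> resolveSeeds(const std::vector<int>& userSeeds,
                              int baseSeed, int numThreads) {
  if (userSeeds.empty()) {
    std::vector<int> seeds;
    for (int i = 0; i < numThreads; ++i) seeds.push_back(baseSeed + i);
    return seeds;
  }
  // A non-empty list must provide exactly one seed per thread.
  if (static_cast<int>(userSeeds.size()) != numThreads)
    throw std::invalid_argument("need exactly one seed per thread");
  return userSeeds;
}
```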

flag Parallelism:processAsync (default = off)
By default, PythiaParallel::run will generate events in parallel, which are then processed serially. The advantage of this serial kind of processing is that it prevents race conditions, which can occur for example if two threads are trying to simultaneously write to the same histogram. Normally, event generation is far more time-consuming than analysis, and the time loss from processing in serial is completely negligible. However, there are situations where the analysis takes a non-negligible amount of time compared to event generation. In such scenarios, the user may enable parallel processing of the analysis by setting Parallelism:processAsync = on. In this case, the user is responsible for preventing race conditions, possibly by using mutex and lock_guard objects. An example of how this can be done is shown in main163.cc.
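
The locking pattern mentioned above can be sketched with plain standard-library threads. Here a single counter stands in for a shared histogram, and fillSharedCounter is a hypothetical name. In a real study the lock_guard would be placed inside the callback passed to run, around the histogram fill.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Increments one shared counter from several threads and returns its final value.
long fillSharedCounter(int nThreads, int fillsPerThread) {
  assert(nThreads > 0);
  long shared = 0;   // stands in for a shared histogram
  std::mutex mtx;    // protects `shared`
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t)
    workers.emplace_back([&]() {
      for (int i = 0; i < fillsPerThread; ++i) {
        // Only one thread at a time may pass this point.
        std::lock_guard<std::mutex> lock(mtx);
        ++shared;
      }
    });
  for (std::thread& w : workers) w.join();
  return shared;
}
```

Keeping the locked region as small as possible matters here: expensive analysis should be done before taking the lock, so that only the final write is serialized.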

flag Parallelism:doNext (default = on)
By default, PythiaParallel generates each event and then passes the Pythia object to the callback function. If this flag is turned off, the event is not automatically generated before the callback is called, so the user is responsible for calling Pythia::next. This can be useful for example if you want to change the energy on an event-by-event basis, or otherwise want to do checks at the beginning of the event loop before performing the actual generation. If this flag is off, you must also set Parallelism:processAsync = on.

mode Parallelism:index (default = -1)
When a Pythia instance is passed to the callback function in PythiaParallel::run, this setting will contain an index that is unique to each instance, starting at 0 for the first instance. This index is particularly useful if Parallelism:processAsync = on. For example, if each instance writes to a different histogram, this index can be used to specify which histogram to write to. Note that the user should not write to this setting directly.
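
The per-instance pattern can be sketched without Pythia as follows. Each worker writes only to the slot selected by its own index, so no lock is needed, and the slots are combined once all workers have finished. The name fillPerInstance is hypothetical, and the loop variable index plays the role of the value read from Parallelism:index.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Each worker fills its own counter; the counters are merged at the end.
long fillPerInstance(int nInstances, int fillsPerInstance) {
  assert(nInstances > 0);
  std::vector<long> counts(nInstances, 0);  // one slot per instance
  std::vector<std::thread> workers;
  for (int index = 0; index < nInstances; ++index)
    workers.emplace_back([&counts, index, fillsPerInstance]() {
      // Distinct elements are distinct memory locations, so this is race-free.
      for (int i = 0; i < fillsPerInstance; ++i) ++counts[index];
    });
  for (std::thread& w : workers) w.join();
  // Merge step, analogous to combining per-instance histograms.
  return std::accumulate(counts.begin(), counts.end(), 0L);
}
```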

flag Parallelism:balanceLoad (default = on)
By default, the event generation is divided evenly between the Pythia instances. This way, each instance generates the same number of events in every run with its respective seed, so that the final statistics are exactly the same between runs (although events might be processed in a different order). If this flag is turned off, each Pythia instance instead keeps generating events until the desired total number has been reached. Different runs might then give slightly different statistics, even with exactly the same input settings. The advantage is that this can be considerably more efficient when the generation time varies strongly from event to event (e.g. as it does in central vs. peripheral heavy ion collisions).
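
The two scheduling strategies can be contrasted in a standalone sketch. It is an illustration only, not the Pythia implementation: the function names are hypothetical, and how a remainder is distributed in the even split is an assumption. splitEvenly corresponds to the flag being on, and claimDynamically to the flag being off.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Fixed shares: every worker knows in advance how many events it generates.
// Assumption: any remainder is given to the first workers, one event each.
std::vector<long> splitEvenly(long nEvents, int nWorkers) {
  assert(nWorkers > 0);
  std::vector<long> share(nWorkers, nEvents / nWorkers);
  for (long i = 0; i < nEvents % nWorkers; ++i) ++share[i];
  return share;
}

// Dynamic shares: workers claim events from a shared counter until none remain.
std::vector<long> claimDynamically(long nEvents, int nWorkers) {
  assert(nWorkers > 0);
  std::atomic<long> nextEvent(0);
  std::vector<long> done(nWorkers, 0);
  std::vector<std::thread> workers;
  for (int w = 0; w < nWorkers; ++w)
    workers.emplace_back([&nextEvent, &done, nEvents, w]() {
      // fetch_add returns the previous value, so each event is claimed once.
      while (nextEvent.fetch_add(1) < nEvents) ++done[w];
    });
  for (std::thread& t : workers) t.join();
  return done;
}
```

With fixed shares the per-worker counts are the same in every run. With dynamic claiming only the total is guaranteed, and the split between workers depends on timing, which is the source of the run-to-run differences described above.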