Parallelism
- Code Overview
- Parallelism and Plugins
A large fraction of studies using Pythia follow the same general steps:
first the Pythia object(s) and histograms are initialized. Then there is a
loop where events are generated using Pythia::next. These events are
subsequently analyzed one by one, and the resulting statistics are stored
in histograms. Finally, the histograms are normalized and the results are
plotted.
In this procedure the events are usually generated independently, so in
principle it should be possible to generate them in parallel. The
PythiaParallel class provides a simple interface for doing this. Objects
of this class behave similarly to Pythia objects in that they can be
configured using readString and readFile, and are initialized using init.
The difference is that instead of having a next method that generates a
single event, they have a run method, which generates a number of events
in parallel and analyzes them using a user-defined callback function. The
program flow using the parallelism framework is as follows:
- In addition to the standard header file, the parallel framework header
  must be included.
#include "Pythia8/Pythia.h"
#include "Pythia8/PythiaParallel.h"
- The next step is to create, configure, and initialize a parallel
  generator object. This process is identical to that for a standard
  Pythia object.
PythiaParallel pythia;
pythia.readString("HardQCD:all = on");
pythia.init();
- Finally, the PythiaParallel object is run with a user function which
  performs the necessary analysis on the events. In this example, an event
  multiplicity histogram is filled.
Hist mult("mult", 100, -0.5, 799.5);
pythia.run(10000, [&](Pythia* pythiaPtr) {
  int nFinal = 0;
  for (int i = 0; i < pythiaPtr->event.size(); ++i)
    if (pythiaPtr->event[i].isFinal()) ++nFinal;
  mult.fill( nFinal );
});
Here the syntax [&](Pythia* pythiaPtr) may be unfamiliar. It defines a
lambda function that takes a pointer to a Pythia object as an argument.
The [&] indicates that all local variables used by the lambda function (in
this case only the mult histogram object) are captured by reference.
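The capture-by-reference behaviour can be tried out without Pythia. The following is a minimal sketch using only the standard library; ToyGenerator, runToy, and sumMultiplicities are hypothetical names introduced for illustration and are not part of Pythia. The lambda receives a pointer to the generator, as the callback above does, and updates a local variable through the [&] capture.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an event generator; not a Pythia class.
struct ToyGenerator {
  int nFinal = 0;
};

// Calls `callback` once per entry, passing a pointer to the generator,
// mirroring how a callback receives a Pythia pointer.
void runToy(const std::vector<int>& multiplicities,
            const std::function<void(ToyGenerator*)>& callback) {
  // An empty std::function would throw when called.
  assert(callback);
  ToyGenerator gen;
  for (int n : multiplicities) {
    gen.nFinal = n;
    callback(&gen);
  }
}

// Sums the multiplicities with a lambda that captures `total` by reference.
long sumMultiplicities(const std::vector<int>& multiplicities) {
  long total = 0;
  // [&] lets the lambda modify the local variable `total` directly;
  // with [=] it would only hold a read-only copy.
  runToy(multiplicities,
         [&](ToyGenerator* genPtr) { total += genPtr->nFinal; });
  return total;
}
```

Calling sumMultiplicities with the values 3, 5, and 7 accumulates them into the local total, just as the mult histogram above is filled from inside the callback.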
Code Overview
The scope of this class is to provide a lightweight way of parallelising
simple Pythia studies. As such, it may not offer the necessary features to
support more complicated use cases. The specific features are documented
below, and examples are given in main221, main222, main223, and main404.
vector<long> PythiaParallel::run(long nEvents, function<void(Pythia*)> callback)
vector<long> PythiaParallel::run( function<void(Pythia*)> callback)
This is the main method of PythiaParallel, analogous to Pythia::next.
The method generates nEvents events in parallel, distributing the tasks
automatically to different Pythia instances. If nEvents is not specified,
Main:numberOfEvents is used instead.
The callback is a function that will be called when each event is
generated. It takes a pointer to the Pythia instance that generated the
event as an argument, and the event can then be accessed through
Pythia::event. If an event fails to generate successfully, it will not be
passed to the callback.
By default, callbacks are synchronized. That is, only one callback can be
active at a time, which means it is safe to e.g. write to histograms from
within the callback. Asynchronous processing of callbacks can be enabled
by setting Parallelism:processAsync = on (see below).
Returns a vector<long> containing the number of events successfully
generated by each thread. If any events fail to generate, the entries
will sum to a number that is smaller than the requested number of events.
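Since the returned vector holds one entry per thread, the number of failed events follows from summing its entries. A minimal sketch using only the standard library; the helper name countFailedEvents is hypothetical and not part of Pythia:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Pythia): given the number of requested
// events and the per-thread success counts, as returned by run, compute
// how many events failed to generate.
long countFailedEvents(long nRequested, const std::vector<long>& nPerThread) {
  long nSuccess = std::accumulate(nPerThread.begin(), nPerThread.end(), 0L);
  // Per the description above, the entries sum to at most nRequested.
  assert(nSuccess <= nRequested);
  return nRequested - nSuccess;
}
```

In an analysis the second argument would be the return value of run, so a non-zero result signals that the histograms were filled from fewer events than requested.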
bool PythiaParallel::init()
bool PythiaParallel::init( function<void(Pythia*)> customInit)
Initializes each Pythia instance and returns whether initialization was
successful.
argument customInit : If specified, this function will be called for each
Pythia instance after it has been constructed and its settings have been
set, but before calling Pythia::init. This can be useful for example for
setting a UserHooks object on each instance.
void PythiaParallel::foreach(function<void(Pythia*)> action)
Performs the specified action for each Pythia instance. This can be
useful for doing custom finalization on each instance, e.g. combining
histograms.
void PythiaParallel::foreachAsync( function<void(Pythia*)> action)
As PythiaParallel::foreach, but the actions are performed for all Pythia
instances in parallel.
double PythiaParallel::weightSum() const
Returns the sum of weights from all Pythia instances, as given by
Info::weightSum().
double PythiaParallel::sigmaGen() const
Returns the weighted average of the generated cross sections of the
Pythia instances, as given by Info::sigmaGen().
The following settings are available for the parallelism framework.
mode Parallelism:numThreads (default = 0; minimum = 0)
Number of threads to run in parallel. If set to 0, the number of threads
will be estimated using std::thread::hardware_concurrency; if the program
is unable to determine the number of threads this way, initialization
will fail.
mvec Parallelism:seeds (default = {})
The seeds to use for each Pythia object. If empty, Random:seed will be
used, incrementing the seed by 1 for each object. If non-empty, it must
have a number of entries equal to Parallelism:numThreads.
flag Parallelism:processAsync (default = off)
By default, PythiaParallel::run will generate events in parallel, which
are then processed serially. The advantage of this serial kind of
processing is that it prevents race conditions, which can occur for
example if two threads are trying to simultaneously write to the same
histogram. Normally, event generation is far more time-consuming than
analysis, and the time loss from processing in serial is completely
negligible.
However, there are situations where the analysis takes a non-negligible
amount of time compared to event generation. In such scenarios, the user
may enable parallel processing of the analysis by setting
Parallelism:processAsync = on. In this case, the user is responsible for
preventing race conditions, possibly by using mutex and lock_guard
objects. An example of how this can be done is shown in main163.cc.
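The locking pattern mentioned above can be sketched with the standard library alone. In this illustration BinCounts, runAsync, and fillShared are hypothetical stand-ins: runAsync only mimics unsynchronized callbacks and is not the PythiaParallel implementation. The point is that the lock_guard is held only while the shared histogram is modified, so any expensive per-event analysis can still run in parallel.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a histogram: a fixed number of integer bins.
struct BinCounts {
  std::vector<long> bins;
  explicit BinCounts(int nBins) : bins(nBins, 0) {}
  long total() const {
    long sum = 0;
    for (long b : bins) sum += b;
    return sum;
  }
};

// Runs `callback` nCalls times on each of nThreads threads, mimicking
// asynchronous callbacks. A stand-in, not the PythiaParallel code.
void runAsync(int nThreads, int nCalls,
              const std::function<void(int)>& callback) {
  assert(nThreads > 0 && nCalls >= 0);
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t)
    threads.emplace_back([&callback, nCalls]() {
      for (int i = 0; i < nCalls; ++i) callback(i);
    });
  for (std::thread& th : threads) th.join();
}

// Fills one shared histogram from concurrent callbacks and returns the
// total number of entries.
long fillShared(int nThreads, int nCalls) {
  BinCounts hist(10);
  std::mutex histMutex;
  runAsync(nThreads, nCalls, [&](int value) {
    // Thread-local analysis goes here, outside the lock.
    int bin = value % 10;
    // Hold the lock only while touching shared state; the lock_guard
    // releases the mutex automatically at the end of the scope.
    std::lock_guard<std::mutex> guard(histMutex);
    ++hist.bins[bin];
  });
  return hist.total();
}
```

Without the guard, concurrent increments of the same bin would be a data race and entries could be lost; with it, every callback is counted.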
flag Parallelism:doNext (default = on)
By default, PythiaParallel generates events, then passes the Pythia
object to the callback function. If this flag is turned off, the event is
not automatically generated before calling the callback, so that the user
is responsible for calling Pythia::next. This can be useful for example
if you want to change the energy on an event-by-event basis, or otherwise
want to do checks at the beginning of the event loop before performing
the actual generation.
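If this flag is off, you must also set Parallelism:processAsync = on.
Otherwise the callbacks are synchronized, so the events, which are then
generated inside the callbacks, would be produced one at a time.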
mode Parallelism:index (default = -1)
When a Pythia instance is passed to the callback function in
PythiaParallel::run, this setting will contain an index that is unique to
each instance, starting at 0 for the first instance. This index is
particularly useful if Parallelism:processAsync = on. For example, if
each instance writes to different histograms, this index can be used to
specify which histogram to write to.
Note that the user should not write to this setting directly.
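The per-instance histogram idea can be sketched as follows, using only the standard library. The function fillPerInstance is hypothetical, and plain vectors stand in for histograms. In a real callback the index would be read from the settings of the Pythia instance (presumably via pythiaPtr->settings.mode("Parallelism:index"), stated here as an assumption); in this sketch it is simply the loop variable.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Each worker writes only to the histogram at its own index, so no locking
// is needed; the histograms are merged after all threads have finished.
std::vector<long> fillPerInstance(int nInstances, int nCalls, int nBins) {
  assert(nInstances > 0 && nCalls >= 0 && nBins > 0);
  // One stand-in histogram (a vector of bin counts) per instance.
  std::vector<std::vector<long>> hists(nInstances,
                                       std::vector<long>(nBins, 0));
  std::vector<std::thread> threads;
  for (int index = 0; index < nInstances; ++index)
    threads.emplace_back([&hists, index, nCalls, nBins]() {
      for (int i = 0; i < nCalls; ++i) ++hists[index][i % nBins];
    });
  for (std::thread& th : threads) th.join();
  // Merge serially once no thread is writing any more.
  std::vector<long> merged(nBins, 0);
  for (const std::vector<long>& h : hists)
    for (int b = 0; b < nBins; ++b) merged[b] += h[b];
  return merged;
}
```

Compared with guarding one shared histogram by a mutex, this design trades extra memory for the absence of lock contention.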
flag Parallelism:balanceLoad (default = on)
By default, the event generation is divided evenly between the Pythia
instances. This way, each instance will generate the same number of
events each run with their respective seeds, so that the final statistics
will be exactly the same between runs (except events might be processed
in a different order).
By turning this flag off, instead each Pythia instance will generate
events until the desired number has been generated. Thus, different runs
might result in slightly different statistics, even with the exact same
input settings. The advantage of this is that it can be considerably more
efficient if the event generation time varies strongly from event to
event (e.g. as it does in central vs. peripheral heavy ion collisions).
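The two strategies described above can be contrasted in a small standard-library sketch. The functions splitEvenly and splitDynamically are hypothetical and do not reproduce the PythiaParallel implementation; in particular, how a remainder is distributed in the even split is an assumption made here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Static division: each instance is assigned a fixed share up front, so
// the per-instance counts are reproducible from run to run.
// Assumption: any remainder goes to the lowest-index instances.
std::vector<long> splitEvenly(long nEvents, int nThreads) {
  assert(nThreads > 0 && nEvents >= 0);
  std::vector<long> shares(nThreads, nEvents / nThreads);
  for (int i = 0; i < nEvents % nThreads; ++i) ++shares[i];
  return shares;
}

// Dynamic division: every thread claims the next event from a shared
// counter until none are left, so faster threads end up doing more work
// and the per-thread counts can differ between runs.
std::vector<long> splitDynamically(long nEvents, int nThreads) {
  assert(nThreads > 0 && nEvents >= 0);
  std::atomic<long> nextEvent(0);
  std::vector<long> done(nThreads, 0);
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t)
    threads.emplace_back([&nextEvent, &done, nEvents, t]() {
      // fetch_add returns the previous value, so each event number is
      // handed to exactly one thread.
      while (nextEvent.fetch_add(1) < nEvents) ++done[t];
    });
  for (std::thread& th : threads) th.join();
  return done;
}
```

In both cases the counts sum to the requested number of events; only the dynamic variant lets the distribution over threads depend on timing.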
Parallelism and Plugins
There are a number of Pythia base classes that can be passed as external
pointers to a Pythia instance, for example UserHooks, which are then used
by that instance of Pythia. When working with the parallelism framework,
the PythiaParallel::init method can be used to set external pointers per
Pythia object. It is important to note that this method is threaded, and
so whatever code is provided must be thread safe. This may not always be
possible, in which case an external mutex can be captured by reference in
the function passed to PythiaParallel::init, locked when necessary, and
then unlocked.
mutex initMutex;
pythia.init([&initMutex](Pythia* pythiaPtr) {
  initMutex.lock();
  // Thread unsafe code here.
  shared_ptr<PowhegHooks> powhegHooks =
    make_shared<PowhegHooks>();
  initMutex.unlock();
  // Thread safe code here.
  pythiaPtr->setUserHooksPtr((UserHooksPtr)powhegHooks);
});
In this example, the locking ensures that the PowhegHooks instances are
created one at a time rather than concurrently (which is not actually
necessary here).
A similar issue can arise when loading plugins via the Init:plugins
setting. Here, Pythia will automatically create a new plugin instance per
thread, but again, if the plugin itself is not thread safe this can cause
issues. To overcome this, every PhysicsBase object has a mutexPtr,
corresponding to the top-level PythiaParallel object. While this mutex is
locked, the code that follows is executed by only one thread at a time.
mutexPtr->lock();
// Thread unsafe code can be written here.
mutexPtr->unlock();
// Thread safe code can be written here.
This can be used within any method of a PhysicsBase derived class. All
classes that can be passed to a Pythia instance by external pointer are
derived from PhysicsBase. By default, plugins are assumed to not be thread
safe unless the macro PYTHIA8_PLUGIN_PARALLEL(true) is explicitly set. See
Plugins for more details.
Finally, for plugins, it may be useful to combine the output of the
threads together again. This can be accomplished via the
PythiaParallel::stat(bool combine) method. If combine = true, then the
following method is called for the first instance of the plugin.
void onStat(vector<PhysicsBase*> pluginPtrs, Pythia* pythiaPtrIn) {
  // Loop over the plugin from each thread.
  for (PhysicsBase* pluginPtr : pluginPtrs) {
    // Skip the plugin that this method was called for.
    if (pluginPtr == this) continue;
    // Access all other plugins.
    PluginClass* pluginNow = dynamic_cast<PluginClass*>(pluginPtr);
    // Perform whatever merging is necessary here.
  }
  // Call onStat for this plugin instance.
  onStat();
}
Here, pluginPtrs is a vector of the plugin pointers from each thread.
Each of these plugin pointers then needs to be cast back to the plugin
class. Finally, the onStat() for this instance of the plugin should be
called. This ensures that the onStat behaviour is the same between a
single Pythia instance and PythiaParallel.