## Histograms

The `Hist` class gives a simple implementation of one-dimensional histograms, useful for quick-and-dirty testing, without the need to link to more sophisticated packages. For this reason it is used in many of the sample main programs found in the `examples` subdirectory.

### Basic principles

We here provide a simple overview of what is involved. As a first step you need to declare a histogram, with name, title, number of bins and x range (from, to).
```
Hist ZpT( "Z0 pT spectrum", 100, 0., 100.);
```
Alternatively you can first declare it and later define it:
```
Hist ZpT;
ZpT.book( "Z0 pT spectrum", 100, 0., 100.);
```
Once declared, its contents can be added by repeated calls to `fill`,
```
ZpT.fill( 22.7, 1.);
```
where the first argument is the x value and the second the weight. Since the weight defaults to 1 the last argument could have been omitted in this case.

A set of overloaded operators have been defined, so that histograms can be added, subtracted, divided or multiplied by each other. Then the contents are modified accordingly bin by bin. Thus the relative deviation between two histograms `data` and `theory` can be found as

```
diff = (data - theory) / (data + theory);
```
assuming that `diff`, `data` and `theory` have been booked with the same number of bins and x range. That responsibility rests on the user; some checks are made for compatibility, but not enough to catch all possible mistakes.

Also overloaded operations with double real numbers are available. Again these four operations are defined bin by bin, i.e. the corresponding amount is added to, subtracted from, multiplied by or divided by each bin. The double number can come before or after the histograms, with obvious results. Thus the inverse of a histogram `result` is given by `1. / result`. The two kind of operations can be combined, e.g.

```
allpT = ZpT + 2. * WpT
```
Finally, also the `+=, -+, *=, /=` are overloaded, with the right-hand side being either a histogram or a real number.

### Basic output format

A histogram can be printed by making use of the overloaded << operator, e.g.:
```
cout << ZpT;
```
The printout format is inspired by the old HBOOK one. To understand how to read this format, consider the simplified example
```
3.50*10^ 2  9
3.00*10^ 2  X   7
2.50*10^ 2  X  1X
2.00*10^ 2  X6 XX
1.50*10^ 2  XX5XX
1.00*10^ 2  XXXXX
0.50*10^ 2  XXXXX

Contents
*10^ 2  31122
*10^ 1  47208
*10^ 0  79373

Low edge  --
*10^ 1  10001
*10^ 0  05050
```
The key feature is that the `Contents` and `Low edge` have to be read vertically. For instance, the first bin has the contents `3 * 10^2 + 4 * 10^1 + 7 * 10^0 = 347`. Correspondingly, the other bins have contents 179, 123, 207 and 283. The first bin stretches from `-(1 * 10^1 + 0 * 10^0) = -10` to the beginning of the second bin, at `-(0 * 10^1 + 5 * 10^0) = -5`.

The visual representation above the contents give a simple impression of the shape. An `X` means that the contents are filled up to this level, a digit in the topmost row the fraction to which the last level is filled. So the 9 of the first column indicates this bin is filled 9/10 of the way from `3.00*10^2 = 300` to `3.50*10^2 = 350`, i.e. somewhere close to 345, or more precisely in the range 342.5 to 347.5.

The printout also provides some other information, such as the number of entries, i.e. how many times the histogram has been filled, the total weight inside the histogram, the total weight in underflow and overflow, and the mean value and root-mean-square width (disregarding underflow and overflow). The mean and width assumes that all the contents is in the middle of the respective bin. This is especially relevant when you plot a integer quantity, such as a multiplicity. Then it makes sense to book with limits that are half-integers, e.g.

```
Hist multMPI( "number of multiparton interactions", 20, -0.5, 19.5);
```
so that the bins are centered at 0, 1, 2, ..., respectively. This also avoids ambiguities which bin gets to be filled if entries are exactly at the border between two bins. Also note that the `fill( xValue)` method automatically performs a cast to double precision where necessary, i.e. `xValue` can be an integer.

### Matplotlib output format

Assuming you have Python installed on your platform, it is possible to generate simple Matplotlib/Pyplot Python code from the histograms generated above, which then can be run to produce PDF plots. This should be done near the end of a run, after the histograms have been filled and properly normalized, as an alternative or complement to the basic output format above.

In a first step you must then decide on the name of the Python program, e.g.:

```
HistPlot hpl( "bosonpT");
```
where file ending `.py` is added automatically.

For each new frame the name should be given, which will later give rise to a PDF file, with ending `.pdf` added automatically. If you leave the name field empty the same file will be used as for the latest named one, i.e. producing several frames in one file. One can optionally also give title and x and y axis labels:

```
hpl.frame( "pTdist", "Boson pT distributions", "pT (GeV)", "sigma");
```
Next, existing `Hist` histograms can be added to the frame, one by one:
```
hpl.add( ZpT + 2. * WpT , "", "pT spectrum of Z, W+ and W-");
```
where the second argument tells how each histogram will be plotted. Default is histogram style, "h", but the values can also be connected with full lines "-", dashed ones "--", or dash-dotted ones "-.", or plotted as points "." or crosses "x", to mention some of the many options offered by Pyplot. Here you can also specify the colour, separated by a comma from the line style, to override the normal colour cycle. The most common colours can be given just as a single letter, such as "r", "g", "b", but a more extensive colour palette allow finetuning to nuances such as "orange", "gold", "darkgreen", "royalblue", "crimson", and so on. A third argument can set the legend of each histogram; by default it is taken as the title of histogram.
Finally the plot code itself will be set up by
```
hpl.plot();
```
where optionally it is possible to demand a logarithmic y scale.

The `frame - add - plot` steps can be repeated as needed, each giving rise to a separate PDF file with a plot. In case a plot is to be generated from a single histogram the three steps can be joined into one

```
hpl.plotFrame( "onlyZ", ZpT );
```
where only the name of the PDF file and the histogram are compulsory, while further arguments as discussed above are optional.

At the end, a file `bosonpT.py` has now been generated with the proper plotting commands. Additionally a data file has been generated for each histogram to be plotted, `pTdist-0.dat, pTdist-1.dat`, etc. Now doing

```
python bosonpT.py
```
in a terminal window will produce the plots, such as `pTdist.pdf` and `onlyZ.pdf`. You may of course edit the python file further to improve on the finer details.

Examples are provided in `main03.cc` and `main07.cc`, where the latter is the simpler one. The `main51.cc` example illustrates that the x scale can be chosen logarithmically, by using an optional last argument when histograms are booked.

Alternatively, it also straight forward to produce histograms direcly through the Python interface using the `binEdges` and `binContents` methods of the `Hist` class.

```
import pythia8, random
from matplotlib import pyplot
hst = pythia8.Hist("log", 100, 0.1, 4, True)
for i in range(0, int(1e6)): hst.Fill(random.gauss(0, 1))
pyplot.hist(hst.getBinEdges()[:-1], hst.getBinEdges(),
weights = hst.getBinContents(),
histtype = "step", label = hst.getTitle())
```

### The methods

We here collect a more complete and formal overview of the methods.

Hist::Hist()
declare a histogram, but does not define it.

Hist::Hist(string title, int numberOfBins, double xMin, double xMax, bool logX = false)
declare and define a histogram, where
`argument` title : is a string with the title of the histogram at output,
`argument` numberOfBins : is the number of bin the x range will be subdivided into, limited to be at most 1000,
`argument` xMin : is the lower edge of the histogram,
`argument` xMax : is the upper edge of the histogram,
`argument` logX : gives a logarithmic x scale between `xMin` and `xMax` if set `true`. Note that then `xMin` must be above 1e-20 so as to stay with positive numbers.

Hist::Hist(const Hist& h)
creates an identical copy of the histogram in the argument, including bin contents.

Hist::Hist(string title, const Hist& h)
creates an identical copy of the histogram in the argument, including bin contents, except that a new title is provided as first argument.

Hist& Hist::operator=(const Hist& h)
copies all properties of the histogram in the argument, except that the original histogram title is retained.

void Hist::book(string title, int numberOfBins, double xMin, double xMax, bool logXIn = false)
define a histogram that previously was only declared; see above for the meaning of the arguments.

static Hist Hist::plotFunc(function<double(double)> f, string title, int nBin, double xMin, double xMax, bool logX = false)
create a histogram out of a provided function f(x). The function is evaluated in the middle of each bin.

void Hist::title(string title)
change the title of a histogram, but keep other properties unchanged.

void Hist::null()
reset bin contents, but keep other histogram properties unchanged.

void Hist::fill(double xValue, double weight)
fill the histogram, where
`argument` xValue : is the x position where the filling should occur, and
`argument` weight (`default = 1.`) : is the amount of weight to be added at this x value.

friend ostream& operator<<(ostream& os, const Hist& h)
appends a simple histogram printout (see above for format) to the `ostream`, while leaving the histogram object itself unchanged. At most 100 columns are allowed to be displayed. If the number of bins is larger than 100 then the contents of adjacent bins are added to give the value in each column. (Two by two up to 200 bins, three by three up to 300, and so on, with the very last column possibly summing fewer rows than the others.)
If the histogram has been booked with logarithmic x scale then the log10 of each lower bin border will be displayed rather than the border itself, to avoid undue complication. The Pyplot interface will display correct values, however.

void Hist::table(ostream& os = cout, bool printOverUnder = false, bool xMidBin = true)
void Hist::table(string fileName, bool printOverUnder = false, bool xMidBin = true)
print a two-column table, where the first column gives the center of each bin and the second one the corresponding bin contents. The table may be useful for plotting e.g. with Gnuplot.
The desired output stream or file name can be provided as argument. The former is more flexible (e.g., it allows easy append to an existing file), whereas the latter is simpler for the case that each histogram should be a file of its own.
An optional `printOverUnder = true` argument allows also underflow and overflow contents to be printed. (The arbitrary x coordinates for these are placed as if corresponding to same-size bins just below or above the regular histogram bins.)
An optional `xMidBin = false` argument will have the x value at the beginning of each bin printed, rather than the default midpoint value.

void Hist::rivetTable(ostream& os = cout, bool printError = false)
void Hist::rivetTable(string fileName, bool printError = false)
print a five-column table, where the first two columns give the lower and upper borders of each bin, the third one the bin contents, and the fourth and fifth the error (up and down) associated with the contents. This format matches the one that Rivet uses for its histograms. The choice between the two methods is the same as above for the `table` methods.
The error bins are put to zero by default, since the PYTHIA histogramming is not sophisticated enough to compensate for rescalings or other operations, or for weighted events. With the optional `printError = true` the error will be taken as the square root of the bin content, as is relevant if this content has the same unit weight for each entry to it.

void Hist::pyplotTable(ostream& os = cout, bool isHist = true)
void Hist:pyplotTable(string fileName, bool isHist = true)
prints either a two- or a three-column table, depending on whether the contents are to be plotted as a curve or as a histogram, the latter being default. In either case the first column gives the center of each bin and the second one the corresponding bin contents. For a histogram the third column gives the lower border of each bin. Then also a final line is provided, that gives no further histogram contents but provides the upper border of the last bin. This format is used for plotting with Pyplot, see below. The choice between the two methods is the same as above for the `table` methods.

void Hist::fillTable(ostream& os = cout)
void Hist::fillTable(string fileName)
fill the contents of a two-column table into the histogram. The first column should give the center of each bin and the second one the corresponding bin contents to be filled. This is basically the inverse of the `table` methods above, whereby a written histogram contents can be read back in. Over- and underflow will be read back in if provided, but the `xMidBin = false` option will not work.

friend void table(const Hist& h1, const Hist& h2, ostream& os = cout, bool printOverUnder = false, bool xMidBin = true)
friend void table(const Hist& h1, const Hist& h2, string fileName, bool printOverUnder = false, bool xMidBin = true)
print a three-column table, where the first column gives the center of each bin and the second and third ones the corresponding bin contents of the two histograms. Only works if the two histograms have the same x axis (within a tiny tolerance), else nothing will be done. The optional last two arguments allows also underflow and overflow contents to be printed, and the x to refer to the beginning of the bin rather than the center; see above.

string Hist::getTitle()
return the title of the histogram.

int Hist::getBinNumber()
return the number of bins in the histogram.

int Hist::getNonFinite()
return the number of entries that are neither infinite nor NaN.

bool Hist::getLinX()
return true if the histogram has a linear x scale and false if a logarithmic one.

bool Hist::getXMin()
bool Hist::getXMax()
return the lower and upper edge of the booked x range.

bool Hist::getYMin()
bool Hist::getYMax()
return the highest and lowest value of any bin inside the histogram, i.e. excluding underflow and overflow.

double Hist::getYAbsMin()
returns the smallest absolute value (above 1e-20, to sidestep bins with zero content) in any of the bins inside the histogram.

double Hist::getBinContent(int iBin)
return the value in bin `iBin`, ranging from 1 through `numberOfBins`, with `0` for underflow and `numberOfBins + 1` for overflow.

double Hist::getBinEdge(int iBin)
return the lower edge of bin `iBin`, ranging from 1 through `numberOfBins`.

double Hist::getBinWidth(int iBin)
return the width of bin `iBin`, ranging from 1 through `numberOfBins`.

vector<double> Hist::getBinContents()
return the bin contents.

vector<double> Hist::getBinEdges()
return the bin edges.

int Hist::getEntries(bool alsoNonFinite = true)
return the number of entries, i.e. the number of time that `fill(...)` has been called. If `alsoNonFinite` is false, then entries that are infinity or NaN are ignored.

bool Hist::sameSize(const Hist& h)
checks that the number of bins and upper and lower limits are the same as in the histogram in the argument.

void Hist::takeFunc(function<double(double)> func)
pass an arbitrary function that takes as an argument a `double` and returns a `double`. This function is applied to the current contents bin by bin.

void Hist::takeLog(bool tenLog = true)
by default take 10-logarithm of current contents bin by bin. With optional argument `false` instead take e-logarithm of contents bin by bin. If to be used, then right before the histogram is output.

void Hist::takeSqrt()
take square root of current contents bin by bin, with negative contents set to zero.

void Hist::normalize( double f = 1., bool overflow = true)
Rescale the histogram to a given sum of all bin contents, where
`argument` sum (`default = 1.`) : is the intended sum of all bin contents after rescaling, and
`argument` overflow (`default = on`) : includes the underflow and overflow bins in the definition of the original area to be rescaled, or not. Either way, the under/overflow are always rescaled by the same factor as the histogram proper.

void Hist::normalizeIntegral( double f = 1., bool alsoOverflow = true)
Rescale the histogram so that it integrates to the value given by
`argument` f (`default = 1.`) : . This is the appropriate normalization if the histogram represents a probability distribution. The argument
`argument` overflow (`default = on`) : specifies whether the underflow and overflow bins in the definition of the original area to be rescaled, or not. Either way, the under/overflow are always rescaled by the same factor, except the bin width, as the histogram proper. Bin widths are handled for both linear and log plots.

void Hist::normalizeSpectrum( double wtSum)
Rescale the histogram by a factor `1/(wtSum * dx)` where `dx` is the bin width. This is the appropriate renormalization if the histogram represents a spectrum on the form (1/N_{ev}) dN/dx.
`argument` wtSum (`default = 1.`) : is the total weight of the events, which if events are unweighted is just the total number of successfully generated events. the under/overflow are always rescaled by the same factor, except the bin width, as the histogram proper. Bin widths are handled for both linear and log plots.

Hist& Hist::operator+=(const Hist& h)
Hist& Hist::operator-=(const Hist& h)
adds or subtracts the current histogram by the contents of the histogram in the argument if `sameSize(...)` is true, else does nothing.

Hist& Hist::operator*=(const Hist& h)
Hist& Hist::operator/=(const Hist& h)
multiplies or divides the current histogram by the contents of the histogram in the argument if `sameSize(...)` is true, else does nothing.

Hist& Hist::operator+=(double f)
Hist& Hist::operator-=(double f)
adds or subtracts each bin content by the common offset f.

Hist& Hist::operator*=(double f)
Hist& Hist::operator*=(double f)
multiplies or divides each bin content by the common factor f.

friend Hist operator+(double f, const Hist& h1)
friend Hist operator+(const Hist& h1, double f)
friend Hist operator+(const Hist& h1, const Hist h2)
add a constant to a histogram or two histograms to each other, bin by bin.

friend Hist operator-(double f, const Hist& h1)
friend Hist operator-(const Hist& h1, double f)
friend Hist operator-(const Hist& h1, const Hist h2)
subtract a histogram from a constant, a constant from a histogram, or two histograms from each other, bin by bin.

friend Hist operator*(double f, const Hist& h1)
friend Hist operator*(const Hist& h1, double f)
friend Hist operator*(const Hist& h1, const Hist h2)
multiply a constant by a histogram or two histograms by each other, bin by bin.

friend Hist operator/(double f, const Hist& h1)
friend Hist operator/(const Hist& h1, double f)
friend Hist operator/(const Hist& h1, const Hist h2)
divide a constant by a histogram, a histogram by a constant, or two histograms by each other, bin by bin.

HistPlot::HistPlot(string pythonName)
create a file to which successively Python commands can be written.
`argument` pythonName : the name of the Python code file will become `pythonName.py`. When later executed with `python pythonName.py` the frames defined below will be drawn and output.

void HistPlot::frame( string frameName, string title = "", string xLabel = "", string yLabel = "", double xSize = 8., double ySize = 6.)
command to open a frame where successive histograms can be added and plotted.
`argument` frameName : the name of the frame. When the Python program is run, the output PDF file will be `frameName.pdf`. If this is blank or it is the same name as the previous frame, then the frame will be appended to the most recent previous PDF file. It cannot be left blank for the first call.
`argument` title : an optional title of the histogram, plotted on top of the frame.
`argument` xLabel : an optional label plotted below the x axis.
`argument` yLabel : an optional label plotted to the right of the y axis.
`argument` xSize, ySize : specifies the x and y size, in inches, of the canvas (the full figure area). Default values agree with the Pyplot defaults.
Note: the title and labels may contain special LaTeX symbols, e.g. simple math formulae enclosed by dollar signs \$...\$, that are interpreted by the Pyplot package. A backslash \ needs to be doubled to \\, however, since else it will be intepreted as an escape sequence.

void HistPlot::add(const Hist& hist, string style = "h", string legend = "void")
add a histogram to the current plot frame, where
`argument` hist : is the name of the histogram to be plotted,
`argument` style : tells how the histogram data will be represented in the figure, where default "h" is histogram style, "-" full lines, "--" dashed ones, "-." dash-dotted ones, "." points, "o" circles, "+" and "x" crosses, "*" stars, and so on as described in the Pyplot manual; additionally a colour may also be specified as described earlier, separated by a comma from the symbol; either of both can be used, e.g. "h,r", ",magenta", "+,", "--" or just ""; and
`argument` legend : is the text written in association with the plotting symbol, where the default `void` is replaced by the title provided when the histogram was booked, and LaTex input may be used as described in the note above.

void HistPlot::addFile(string file, string style = "o", string legend = "void", string xyerr = "")
add data in an already existing file to the current plot frame, where
`argument` file : is the name of the file to be plotted,
`argument` style : tells how the data will be represented in the figure, just like in the `add` method above, but by default intended for discrete points rather than histograms or connected lines;
`argument` legend : is the text written in association with the plotting symbol, where the default `void` is replaced by the name of the input file, and LaTex input may be used as described in the note above for `frame`; and
`argument` xyerr : can be used to provide error bars for a point, where the options are to give an `x` to plot a symmetric error bar in the x direction, an `X` for an asymmetric ditto, a `y` for a symmetric error bar in the y direction, and a `Y` for an asymmetric ditto, and where both x and y can be combined or omitted as desired.
Note: this method is primarily intended to overlay data points over existing histograms, and not for standalone usage. The data file is expected to contain at least two columns, where the first two give the (x,y) coordinates of the points. Then up to four error columns can follow, first the ones pertaining to the x error, if any, and then to y. Symmetric errors require one column with length in both directions, whereas asymetric gives first the length downwards and then the length upwards, both as non-negative numbers. For example, to plot the point (1.0, 2.0) with a symmetric error +/-0.5 in the x-direction and an asymmetric error +0.2/-0.5 in the y-direction, the file should contain the row `1.0 2.0 0.5 0.2 -0.5` and the value of `xyerr` might be `"xY"`.

void HistPlot::plot( bool logY = false, bool logX = false)
write the Pyplot commands for the current frame, as defined by the input from the two methods above, and additionally write the data files needed when the Python code is to be run.
`argument` logY : By default a linear y scale is used, but when true the y scale is instead logarithmic, except that a linear scale is used the very last step to zero bin content.
`argument` logX : By default a linear or logarithmic x scale is used depending on how the histograms were booked, i.e. by default linear, and this should not be overridden. For the case that the only input comes from `addFile` commands, however, this option gives the possibility to pick a logarithimic x scale instead of the default linear one. (Presumably you would then use the `plot`variant below.) Note the confusing order of the two arguments, which is caused by the first presumably being much more commonly used than the second.

void HistPlot::plot( double xMin, double xMax, double yMin, double yMax, bool logY = false, bool logX = false)
a variant of the `plot` method above, where the x and y ranges of the plot frame can be set by hand instead of calculated from the histogram information.

void HistPlot::plotFrame( string frameName, const Hist& hist, string title = "", string xLabel = "", string yLabel = "", string style = "h", string legend = "void", bool logY = false)
is an omnibus version combining the three above methods when only a single histogram is to be plotted in a frame. The arguments and their meaning is the same as already explained for them, and the order in which they appear is almost the same, the exception being that the two compulsory arguments `frameName` and `hist` have been put at the beginning.