How to use the code

This section provides a guide on how to use the code, including examples.

Data needed to run a fit

Running a fit requires data. More specifically, one needs:

FK-tables
Binwidths
Event rates
Errors
Grid nodes

The event rates and errors can come either from event rate measurements or from pseudo data. It is also possible to create pseudo data, and to rebin it so that each bin contains a minimum number of events, if one has:

FK-tables
Binwidths
Neutrino flux
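The rebinning mentioned above can be sketched as follows. This is a minimal illustration that merges adjacent bins until each merged bin reaches a minimum number of events; the function name `rebin_min_events`, the left-to-right merging strategy and the toy numbers are assumptions and are not taken from generate_data.py.

```python
def rebin_min_events(events, min_num_events):
    """Merge adjacent bins until each merged bin holds >= min_num_events."""
    merged = []
    current = 0.0
    for n in events:
        current += n
        if current >= min_num_events:
            merged.append(current)
            current = 0.0
    # A leftover tail below the threshold is added to the last merged bin
    # (or kept as a single bin if no bin reached the threshold).
    if current > 0:
        if merged:
            merged[-1] += current
        else:
            merged.append(current)
    return merged


# Toy event rates per bin (assumed values).
events = [3.0, 8.0, 12.0, 25.0, 4.0, 6.0]
rebinned = rebin_min_events(events, min_num_events=20)
print(rebinned)
```

In a real rebinning the FK-table rows and the bin widths would have to be merged consistently with the event rates; that bookkeeping is omitted here.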

The script generate_data.py generates data, optionally rebins it, and writes it to files stored in the Data directory. This data is pseudo data: it requires an input neutrino flux, from which the event rates are computed by convolving the flux with the FK-table.
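As a rough illustration of the convolution step, the sketch below computes pseudo event rates as a matrix-vector product of an FK-table with the flux evaluated at the grid nodes. The helper name `event_rates`, the table layout (one row per bin, one column per grid node) and the numbers are assumptions for illustration, not the implementation in generate_data.py.

```python
def event_rates(fk_table, flux_at_nodes):
    """Return one event rate per bin: N_i = sum_a FK[i][a] * f(x_a)."""
    return [
        sum(fk_ia * f_a for fk_ia, f_a in zip(row, flux_at_nodes))
        for row in fk_table
    ]


# Toy inputs (assumed): 2 bins, 3 grid nodes.
fk_table = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
]
flux_at_nodes = [10.0, 5.0, 2.0]

rates = event_rates(fk_table, flux_at_nodes)
print(rates)
```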

All settings for the data generation can be specified in a yaml file like this:

data:
  pdf: "FASERv_Run3_EPOS+POWHEG_7TeV" 
  min_num_events: 20
  observable: "Eh"
  combine_nu_nub_data: False
  particle_id: 14
  pdf_set: 2
  filename_fk_table: "FK_Eh_final"
  filename_binwidth: "FK_Eh_binsize"
  filename_to_store_events: "FASERv_Run3_EPOS+POWHEG_7TeV_events"
  filename_to_store_stat_error: "FASERv_Run3_EPOS+POWHEG_7TeV_stat_error"
  filename_to_store_sys_error: "FASERv_Run3_EPOS+POWHEG_7TeV_sys_error"
  filename_to_store_cov_matrix: "FASERv_Run3_EPOS+POWHEG_7TeV_cov_matrix"
  multiplication_factor_sys_error: 0.2

where multiplication_factor_sys_error is a factor used to include pseudo systematic uncertainties. Set it to 0 to include only statistical uncertainties. Then type:

python generate_data.py data.yaml

to generate the data.

All the data files should be written to and read from the Data directory.
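To make the role of multiplication_factor_sys_error more concrete, here is a minimal sketch of how pseudo uncertainties could be built from the event rates. It assumes Poisson statistical errors, a systematic error proportional to the event rate, and uncorrelated bins added in quadrature; all three are assumptions for illustration, and generate_data.py may construct the errors and the covariance matrix differently.

```python
import math


def pseudo_uncertainties(events, sys_factor):
    """Return (stat_error, sys_error, covariance) for a list of event rates."""
    # Assumption: Poisson statistics, so the statistical error is sqrt(N).
    stat_error = [math.sqrt(n) for n in events]
    # Assumption: the systematic error is a fixed fraction of the event rate.
    sys_error = [sys_factor * n for n in events]
    # Assumption: bins are uncorrelated, so the covariance matrix is diagonal.
    size = len(events)
    covariance = [
        [stat_error[i] ** 2 + sys_error[i] ** 2 if i == j else 0.0
         for j in range(size)]
        for i in range(size)
    ]
    return stat_error, sys_error, covariance


stat_error, sys_error, covariance = pseudo_uncertainties(
    [100.0, 25.0], sys_factor=0.2
)
print(covariance)
```

With sys_factor set to 0 the covariance reduces to the statistical variances alone, which matches the statement that 0 includes only statistical uncertainties.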

FK-table generation with POWHEG+PYTHIA8

The FK-tables can be generated with a modified version of the neutrino DIS Monte Carlo event generator. This variant replaces the neutrino flux with a set of Lagrange interpolation polynomials, following the procedure described here. To generate the FK-table for an observable, a histogram has to be booked and filled in the analysis subroutine. After the simulation, a differential distribution with the same binning is available for each member of the basis of interpolation polynomials. The spacing of the grids, the blocksize and the dimension of the basis of interpolation polynomials can be adapted by modifying the subroutines interpolation.f90 and lepton_flux.f90. To compile the code, adapt the paths in the Makefile. An example runcard and scripts to run the code are provided in the testrun-fk folder.
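The idea of expanding the flux on a basis of Lagrange interpolation polynomials can be illustrated with a short Python sketch. It uses a single global basis on a handful of nodes, not the block-wise variant suggested by the blocksize setting, and the node values and function names are assumptions; the Fortran routines in interpolation.f90 are not reproduced here.

```python
def lagrange_basis(nodes, a, x):
    """Value at x of the a-th Lagrange polynomial defined on the nodes."""
    value = 1.0
    for b, x_b in enumerate(nodes):
        if b != a:
            value *= (x - x_b) / (nodes[a] - x_b)
    return value


def interpolate(nodes, f_at_nodes, x):
    """Approximate f(x) as sum_a f(x_a) * L_a(x)."""
    return sum(
        f_a * lagrange_basis(nodes, a, x) for a, f_a in enumerate(f_at_nodes)
    )


# Toy grid nodes and a toy function sampled on them (assumed values).
nodes = [0.1, 0.4, 0.7, 1.0]
f_at_nodes = [x_a ** 2 for x_a in nodes]

approx = interpolate(nodes, f_at_nodes, 0.55)
print(approx)
```

Because each basis polynomial equals 1 at its own node and 0 at all others, a distribution computed once per basis member can later be combined with any flux by weighting it with the flux values at the nodes.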

Available Data and Format

All data used in this work is available in the Data directory of the git repository: FK-tables, binning, event rates and statistical uncertainties. The filenames follow this pattern:

datatype_observable_(fine)_geometry_generator_7TeV_nu(mu,bmu,e,be)_W.dat

or

datatype_observable_(fine)_geometry_generator_7TeV_comb_W.dat

The corresponding fluxes are formatted in this way:

geometry_(generator/bsm/IC)_7TeV.dat

This data was used to parametrise the neutrino fluxes, which can be found in the neutrino_pdfs_lhpadf folder. The same data can also be used to run fits.

Running a fit

The fitting code is available in the directory NN_fit/src/NN_fit/. A fit is configured through a yaml file that contains all settings, for example the structure of the NN, the data to be used and the training parameters:

model:
  hidden_layers: [4, 4, 4]
  activation_function: ["softplus","softplus","softplus"]
  preproc: True
  extended_loss: False
  num_output_layers: 1
  num_input_layers: 1

closure_test:
  fit_level: 2
  num_reps: 3
  diff_l1_inst: 3

training:
  patience: 100
  max_epochs: 2500
  lr: 0.03
  optimizer: "Adam"
  wd: 0.001
  range_alpha: 5
  range_beta: 20
  range_gamma: 100
  validation_split: 0.0
  max_chi_sq: 5
  lag_mult_pos: 0.001
  lag_mult_int: 0.001
  x_int: [0.001,0.98]

dataset:
  observable: "Eh"
  filename_data: "FASERv_Run3_EPOS+POWHEG_7TeV_events_comb_min_20_events"
  filename_stat_error: "FASERv_Run3_EPOS+POWHEG_7TeV_stat_error_comb_min_20_events"
  filename_sys_error: "FASERv_Run3_EPOS+POWHEG_7TeV_sys_error_comb_min_20_events"
  filename_cov_matrix: "FASERv_Run3_EPOS+POWHEG_7TeV_cov_matrix_comb_min_20_events"
  filename_binning: "FK_Eh_binsize_nub_min_20_events"
  grid_node: 'x_alpha.dat'
  pdf: "FASERv_Run3_EPOS+POWHEG_7TeV"
  pdf_set: 2
  fit_faser_data: False

postfit:
  postfit_criteria: True
  postfit_measures: True
  dir_for_data: 'test_dir_faserv_Eh_elec_epos'
  neutrino_pdf_fit_name_lhapdf: 'testgrid'
  particle_id_nu: 12
  particle_id_nub: -12
  produce_plot: True

If extended_loss is set to True, the loss also enforces positivity and ensures that the neutrino PDF goes to zero in the low- and high-x regions. The settings lag_mult_pos, lag_mult_int and x_int control this extended loss: the first two are the Lagrange multipliers, and x_int lists the x-points at which large values of the neutrino PDF are penalised. If fit_faser_data is set to True, the highest-energy bins of the muon neutrino and muon antineutrino event rates are combined, reflecting the way FASER measured and published the event rates.
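The structure of such an extended loss can be sketched as a chi-squared term plus two penalty terms weighted by the Lagrange multipliers. The functional form of the penalties below (a linear penalty on negative PDF values and a quadratic penalty on the PDF at the x_int points), the diagonal covariance and the function name `extended_loss` are assumptions for illustration; the fitting code may define these terms differently.

```python
def extended_loss(pred, data, variances, pdf_on_grid, pdf_at_x_int,
                  lag_mult_pos=0.001, lag_mult_int=0.001):
    """Chi-squared plus positivity and endpoint penalties (illustrative)."""
    # Chi-squared with a diagonal covariance matrix (assumption).
    chi2 = sum((p - d) ** 2 / v for p, d, v in zip(pred, data, variances))
    # Positivity: only negative PDF values contribute to the penalty.
    positivity = sum(max(0.0, -f) for f in pdf_on_grid)
    # Endpoints: penalise PDF values away from zero at the x_int points.
    endpoint = sum(f ** 2 for f in pdf_at_x_int)
    return chi2 + lag_mult_pos * positivity + lag_mult_int * endpoint


# Toy numbers (assumed): two bins, three grid points, two x_int points.
loss = extended_loss(
    pred=[10.0, 20.0],
    data=[12.0, 20.0],
    variances=[4.0, 4.0],
    pdf_on_grid=[0.5, -2.0, 1.0],
    pdf_at_x_int=[3.0, 0.0],
)
print(loss)
```

With both multipliers set to zero the expression reduces to the plain chi-squared.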

To run a fit, type:

python execute_fit.py runcards/fit_settings.yaml

This performs the fit and, if requested, the postfit analysis, which consists of postfit measures, postfit criteria and a plot of the result. The results are also written to a separate directory and to a separate LHAPDF grid.

Hyperparameter optimization

A hyperparameter optimization algorithm is also available, based on k-fold cross-validation and Bayesian optimization. To perform hyperparameter optimization for a specific dataset, run:

python perform_hyperopt.py hyperopt_settings.py

The workings of this algorithm are explained in the Framework section.
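As a brief illustration of the k-fold part only, the sketch below splits the data point indices into k contiguous folds and yields a training and a validation set for each. The function name `k_fold_indices` and the contiguous splitting are assumptions; perform_hyperopt.py may partition the data differently, and the Bayesian optimization step is not shown.

```python
def k_fold_indices(num_points, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        num_points // k + (1 if i < num_points % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(num_points) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(num_points=7, k=3))
for train, validation in folds:
    print(train, validation)
```

A typical use is to train on each training set, evaluate the loss on the matching validation set, and take the average validation loss over the folds as the figure of merit for one hyperparameter configuration.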