How to use the code¶
This section explains how to use the code and provides examples.
Data needed to run a fit¶
Running a fit requires the following data:
- FK-tables
- Binwidths
- Event rates
- Errors
- Grid nodes
The event rates and errors can come either from event rate measurements or from pseudo data. Pseudo data can be generated, and optionally rebinned so that each bin contains a minimum number of events, if one has:
- FK-tables
- Binwidths
- Neutrino flux
The script generate_data.py generates pseudo data, optionally rebins it, and writes the result to files in the Data directory. It requires an input neutrino flux, which is convolved with the FK-table to compute the event rates.
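As a rough illustration of the convolution step, the sketch below computes pseudo event rates as a matrix-vector product between an FK-table and the flux evaluated at the grid nodes. The function name, the toy numbers and the assumption that the FK-table is stored as a (bins x grid nodes) matrix are illustrative and are not taken from generate_data.py.

```python
def event_rates(fk_table, flux_at_nodes):
    """Return one event rate per bin: N_i = sum_a FK[i][a] * f(x_a)."""
    return [
        sum(fk * f for fk, f in zip(row, flux_at_nodes))
        for row in fk_table
    ]


# Toy inputs (hypothetical): 3 bins, 2 grid nodes.
fk_table = [[1.0, 2.0], [0.5, 0.5], [0.0, 4.0]]
flux_at_nodes = [10.0, 20.0]

rates = event_rates(fk_table, flux_at_nodes)
print(rates)
```

Because the event rate is linear in the flux, the FK-table only has to be computed once and can be reused for any flux evaluated on the same grid nodes.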
All settings for the data generation can be specified in a yaml file like this:
data:
  pdf: "FASERv_Run3_EPOS+POWHEG_7TeV"
  min_num_events: 20
  observable: "Eh"
  combine_nu_nub_data: False
  particle_id: 14
  pdf_set: 2
  filename_fk_table: "FK_Eh_final"
  filename_binwidth: "FK_Eh_binsize"
  filename_to_store_events: "FASERv_Run3_EPOS+POWHEG_7TeV_events"
  filename_to_store_stat_error: "FASERv_Run3_EPOS+POWHEG_7TeV_stat_error"
  filename_to_store_sys_error: "FASERv_Run3_EPOS+POWHEG_7TeV_sys_error"
  filename_to_store_cov_matrix: "FASERv_Run3_EPOS+POWHEG_7TeV_cov_matrix"
  multiplication_factor_sys_error: 0.2
multiplication_factor_sys_error is a factor used to include pseudo systematic uncertainties. Set it to 0 to include only statistical uncertainties.
Then type:
python generate_data.py data.yaml
All the data files should be written to and read from the Data directory.
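The rebinning controlled by min_num_events can be pictured with the following minimal sketch. It assumes a greedy left-to-right merge of adjacent bins and Poisson statistical errors. Both are illustrative choices, and the function name rebin is hypothetical; the actual procedure in generate_data.py may differ.

```python
import math


def rebin(events, binwidths, min_num_events):
    """Merge adjacent bins left to right until each holds at least min_num_events."""
    merged_events, merged_widths = [], []
    acc_events, acc_width = 0.0, 0.0
    for n, w in zip(events, binwidths):
        acc_events += n
        acc_width += w
        if acc_events >= min_num_events:
            merged_events.append(acc_events)
            merged_widths.append(acc_width)
            acc_events, acc_width = 0.0, 0.0
    # A leftover tail below the threshold is folded into the last merged bin.
    if acc_width > 0.0:
        if merged_events:
            merged_events[-1] += acc_events
            merged_widths[-1] += acc_width
        else:
            merged_events.append(acc_events)
            merged_widths.append(acc_width)
    return merged_events, merged_widths


# Toy event rates and binwidths (hypothetical numbers).
events = [5.0, 8.0, 12.0, 30.0, 3.0]
binwidths = [100.0, 100.0, 200.0, 200.0, 400.0]

new_events, new_widths = rebin(events, binwidths, min_num_events=20)
# Assumed Poisson statistical error per merged bin.
stat_error = [math.sqrt(n) for n in new_events]
print(new_events, new_widths)
```

The merge conserves the total number of events and the total binwidth, so the rebinned data describe the same spectrum at a coarser resolution.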
FK-table generation with POWHEG+PYTHIA8¶
The FK-tables can be generated with the modified version of the neutrino DIS Monte Carlo event generator. This variant replaces the neutrino flux with a set of Lagrange interpolation polynomials, following the procedure described here. To generate the FK-table for an observable, a histogram has to be booked and filled in the analysis subroutine. After the simulation, a differential distribution with the same binning is available for each member of the basis of interpolation polynomials. The grid spacing, the blocksize and the dimension of the basis of interpolation polynomials can be adapted by modifying the subroutines in interpolation.f90 and lepton_flux.f90. To compile the code, adapt the paths in the Makefile. An example runcard and scripts to run the code are provided in the testrun-fk folder.
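To make the role of the interpolation basis concrete, the sketch below builds Lagrange polynomials on a small set of grid nodes and uses them to reconstruct a toy flux from its values at the nodes. It is a simplified picture: it uses a single global basis over all nodes and ignores the block structure mentioned above. The node values and the toy flux are hypothetical.

```python
def lagrange_basis(nodes, alpha, x):
    """Value at x of the polynomial that is 1 at nodes[alpha] and 0 at all other nodes."""
    value = 1.0
    for beta, node in enumerate(nodes):
        if beta != alpha:
            value *= (x - node) / (nodes[alpha] - node)
    return value


def interpolate(nodes, values, x):
    """Approximate f(x) as sum_a f(x_a) * L_a(x)."""
    return sum(
        values[alpha] * lagrange_basis(nodes, alpha, x)
        for alpha in range(len(nodes))
    )


def toy_flux(x):
    # Hypothetical smooth flux shape, used only for this illustration.
    return x * (1.0 - x)


nodes = [0.1, 0.4, 0.7, 1.0]
values = [toy_flux(node) for node in nodes]

approx = interpolate(nodes, values, 0.55)
print(approx, toy_flux(0.55))
```

Since the flux enters only through its values at the nodes, running the generator once per basis polynomial gives one distribution per node, and these distributions together form the FK-table.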
Available Data and Format¶
The Data directory of the git repository contains all data used in this work: FK-tables, binning, event rates and statistical uncertainties. The files are named as follows:
datatype_observable_(fine)_geometry_generator_7TeV_nu(mu,bmu,e,be)_W.dat
or
datatype_observable_(fine)_geometry_generator_7TeV_comb_W.dat
The corresponding fluxes are formatted in this way:
geometry_(generator/bsm/IC)_7TeV.dat
This data was used to parametrise the neutrino fluxes which can be found in the neutrino_pdfs_lhpadf folder. The user can also use this data to make fits.
Running a fit¶
The fitting code is available in the directory NN_fit/src/NN_fit/.
A fit is configured through a yaml file. This file contains all settings, for example the structure of the NN, the data to use and the training parameters:
model:
  hidden_layers: [4, 4, 4]
  activation_function: ["softplus", "softplus", "softplus"]
  preproc: True
  extended_loss: False
  num_output_layers: 1
  num_input_layers: 1
closure_test:
  fit_level: 2
  num_reps: 3
  diff_l1_inst: 3
training:
  patience: 100
  max_epochs: 2500
  lr: 0.03
  optimizer: "Adam"
  wd: 0.001
  range_alpha: 5
  range_beta: 20
  range_gamma: 100
  validation_split: 0.0
  max_chi_sq: 5
  lag_mult_pos: 0.001
  lag_mult_int: 0.001
  x_int: [0.001, 0.98]
dataset:
  observable: "Eh"
  filename_data: "FASERv_Run3_EPOS+POWHEG_7TeV_events_comb_min_20_events"
  filename_stat_error: "FASERv_Run3_EPOS+POWHEG_7TeV_stat_error_comb_min_20_events"
  filename_sys_error: "FASERv_Run3_EPOS+POWHEG_7TeV_sys_error_comb_min_20_events"
  filename_cov_matrix: "FASERv_Run3_EPOS+POWHEG_7TeV_cov_matrix_comb_min_20_events"
  filename_binning: "FK_Eh_binsize_nub_min_20_events"
  grid_node: 'x_alpha.dat'
  pdf: "FASERv_Run3_EPOS+POWHEG_7TeV"
  pdf_set: 2
  fit_faser_data: False
postfit:
  postfit_criteria: True
  postfit_measures: True
  dir_for_data: 'test_dir_faserv_Eh_elec_epos'
  neutrino_pdf_fit_name_lhapdf: 'testgrid'
  particle_id_nu: 12
  particle_id_nub: -12
  produce_plot: True
If extended_loss is set to True, the loss also enforces positivity and drives the neutrino PDF to zero in the low- and high-x regions. The settings lag_mult_pos, lag_mult_int and x_int configure this extended loss: they are the Lagrange multipliers and the x-points at which large values of the neutrino PDF are penalised. If fit_faser_data is set to True, the highest-energy bins of the muon and anti-muon neutrino event rates are combined, reflecting how FASER measured and published the event rates.
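One way to picture the extended loss is as the chi-square plus two penalty terms weighted by the Lagrange multipliers. The sketch below is an assumption about its general shape, not the form implemented in the fitting code: the penalty functions, the function name extended_loss and the toy PDF are all hypothetical.

```python
def extended_loss(chi_sq, pdf, x_grid, x_int, lag_mult_pos, lag_mult_int):
    """Chi-square plus positivity and boundary penalties (illustrative form only)."""
    # Positivity: penalise negative PDF values on the grid.
    positivity = sum(max(0.0, -pdf(x)) for x in x_grid)
    # Boundary: penalise non-zero PDF values at the chosen low- and high-x points.
    boundary = sum(abs(pdf(x)) for x in x_int)
    return chi_sq + lag_mult_pos * positivity + lag_mult_int * boundary


def toy_pdf(x):
    # Hypothetical PDF that is negative at low x, used only to trigger the penalties.
    return x - 0.5


loss = extended_loss(
    chi_sq=2.0,
    pdf=toy_pdf,
    x_grid=[0.25, 0.75],
    x_int=[0.001, 0.98],
    lag_mult_pos=0.001,
    lag_mult_int=0.001,
)
print(loss)
```

Larger multipliers enforce the constraints more strictly, at the cost of giving the data less weight in the total loss.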
To run a fit, type:
python execute_fit.py runcards/fit_settings.yaml
Hyperparameter optimization¶
A hyperparameter optimization algorithm is also available, based on k-fold cross-validation and Bayesian optimization. To perform hyperparameter optimization for a specific dataset, run:
python perform_hyperopt.py hyperopt_settings.py
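The k-fold part of this procedure can be sketched as follows. The example splits a set of data points into k contiguous folds and returns the training and validation indices for each fold. The function name k_fold_indices is hypothetical, and the splitting used by perform_hyperopt.py may differ, for instance by shuffling first. The Bayesian optimization step is not shown.

```python
def k_fold_indices(num_points, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [
        num_points // k + (1 if i < num_points % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(num_points) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(num_points=10, k=3))
for train, validation in folds:
    print(validation)
```

Each candidate set of hyperparameters would then be trained once per fold and scored on the held-out indices, with the average validation score guiding the choice of the next candidate.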