Configuration

This section provides a comprehensive guide to the YAML configuration file used for the analysis. The file specifies the training and validation samples, input variables, general settings, data preparation parameters, validation configuration, and model hyperparameters. Each section is explained in detail to help users understand and customize the analysis settings.

Training and Validation Samples

training_samples: List of training samples, each identified by a label.
- path: [str] Path to the sample data file. Several files can be included by separating them with a ":" symbol.
- event_dataset: [str] Name of the event dataset.
- particle_dataset: [str] Name of the particle dataset.
- weights: [str] Name of the MC weight variable.
- nevents: [int] Number of events to use (currently the same for each included file; to be updated).
- cross_section: [float] Process cross section.
- branching_ratio: [float] Decay branching ratio.
- acceptance_factor: [float] Additional acceptance correction factor.
- legend: [str] Legend label for the sample.
- colour: [str] Color for visualization.
- type: [str] Type of the sample (signal or background).
validation_samples: List of validation samples, similar to training samples.
- path: [str] Path to the sample data file. Several files can be included by separating them with a ":" symbol.
- event_dataset: [str] Name of the event dataset.
- particle_dataset: [str] Name of the particle dataset.
- weights: [str] Name of the MC weight variable.
- nevents: [int] Number of events to use (currently the same for each included file; to be updated).
- cross_section: [float] Process cross section.
- branching_ratio: [float] Decay branching ratio.
- acceptance_factor: [float] Additional acceptance correction factor.
- legend: [str] Legend label for the sample.
- colour: [str] Color for visualization.
- type: [str] Type of the sample (signal or background).
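As an illustration, a sample block might look like the sketch below. It assumes that each sample is a mapping keyed by its label; the labels, file paths, dataset names, weight variable, and all numeric values are hypothetical placeholders, and the file extension is an assumption.

```yaml
# Hypothetical sketch: labels, paths, dataset names and values are placeholders.
training_samples:
  signal:                       # sample label (assumed to be the mapping key)
    path: "data/signal_a.h5:data/signal_b.h5"   # two files joined with ":"
    event_dataset: "events"
    particle_dataset: "particles"
    weights: "mc_weight"
    nevents: 100000             # applied to each file listed in path
    cross_section: 1.0
    branching_ratio: 1.0
    acceptance_factor: 1.0
    legend: "Signal"
    colour: "red"
    type: "signal"
  background:
    path: "data/background.h5"
    event_dataset: "events"
    particle_dataset: "particles"
    weights: "mc_weight"
    nevents: 100000
    cross_section: 10.0
    branching_ratio: 1.0
    acceptance_factor: 1.0
    legend: "Background"
    colour: "blue"
    type: "background"

validation_samples:
  signal:
    path: "data/signal_validation.h5"
    event_dataset: "events"
    particle_dataset: "particles"
    weights: "mc_weight"
    nevents: 50000
    cross_section: 1.0
    branching_ratio: 1.0
    acceptance_factor: 1.0
    legend: "Signal"
    colour: "red"
    type: "signal"
```

The validation block reuses the same keys as the training block, so a training entry can be copied and pointed at a different file.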

Input Variables

input_variables: List of final state particle features used in the analysis.
ghost_variables: List of ghost event features to be included in the prediction file for validation.
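A minimal sketch of these two lists is shown below. The variable names are hypothetical and must match whatever features are stored in the sample files.

```yaml
# Hypothetical variable names, for illustration only.
input_variables:
  - "part_pt"
  - "part_eta"
  - "part_phi"
  - "part_energy"

ghost_variables:
  - "event_number"
```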

General Configuration

general_configuration: General settings for the analysis.
- output_directory: [str] Directory for storing output files.
- training_mode: [str] Accepted modes: "classification" or "regression".
- analysis_title: [str] Title for the analysis.
- use_gpu: [bool] Use GPU device when available.
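For example, a classification run could be configured roughly as follows. The directory name and title are placeholders.

```yaml
# Placeholder values; only the keys and the accepted training modes come from the list above.
general_configuration:
  output_directory: "outputs/example_run"
  training_mode: "classification"   # or "regression"
  analysis_title: "Example analysis"
  use_gpu: true
```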

Preparation Configuration

preparation_configuration: Parameters for data preparation.
- regression_target: [str] Target variable for regression.
- regression_target_label: [str] Label for the regression target.
- nparticles: [int] Number of selected final state particles.
- batch_size: [int] Size of the data batches.
- norm: [bool] Normalize samples, preserving only shape differences.
- duplicate: [bool] Duplicate training events when the input statistics are low.
- validation_plots: [bool] Produce input data validation plots.
- validation_plots_log: [bool] Use log scale for y-axis.
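The fragment below sketches one possible preparation block. The regression target name, its label, and the numeric values are hypothetical.

```yaml
# Hypothetical target name and values, for illustration only.
preparation_configuration:
  regression_target: "target_mass"
  regression_target_label: "Target mass"
  nparticles: 10
  batch_size: 128
  norm: true
  duplicate: false
  validation_plots: true
  validation_plots_log: false
```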

Model and Training Hyperparameters

transformer_classification_parameters: Hyperparameters for the transformer classification model.
- model_name: [str] Name of the model.
- nMHAlayers: [int] Number of multi-head attention layers.
- nheads: [int] Number of attention heads per multi-head attention layer.
- nDlayers: [int] Number of dense layers.
- vdropout: [float] Dropout factor.
- act_fn: [str] Activation function.
- nepochs: [int] Number of training epochs.
- learning_rate: [float] Learning rate.
- verbose: [int] Verbosity level of the information displayed during training.
- embedding: [bool] Include an embedding layer.
- embedding_dim: [int] Dimension of the embedding layer.
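A possible hyperparameter block is sketched below. All values are illustrative rather than recommended settings, and the accepted strings for act_fn are an assumption ("relu" is used here only as a common choice).

```yaml
# Illustrative values only, not tuned recommendations.
transformer_classification_parameters:
  model_name: "example_transformer"
  nMHAlayers: 2
  nheads: 4
  nDlayers: 2
  vdropout: 0.1
  act_fn: "relu"          # assumed to be an accepted activation name
  nepochs: 50
  learning_rate: 0.001
  verbose: 1
  embedding: true
  embedding_dim: 64
```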

Validation Configuration

validation_configuration: Configuration for the validation phase.
- luminosity_scaling: [float] Rescale samples to the given luminosity.
- save_predictions: [bool] Save predictions in HDF5 and ROOT file formats during the validation process.
- save_onnx_model: [bool] Save best model into ONNX format.
- plot_model: [bool] Plot model architecture.
- plot_embedding: [bool] Plot embedding visualization.
- plot_confusion: [bool] Plot confusion matrix.
- plot_scores: [bool] Plot training performance scores (accuracy, precision, recall, F1, etc.).
- plot_discriminant: [bool] Plot LLR discriminant distributions.
- plot_proba: [bool] Plot output probability distributions.
- plot_roc: [bool] Plot ROC curves.
- plot_efficiency: [bool] Plot efficiency curves.
- plot_log_probabilities: [bool] Plot network output distributions with a log-scale y-axis.
- plot_log_discriminant: [bool] Plot LLR discriminant distributions with a log-scale y-axis.
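A validation block enabling a subset of the outputs might look like the sketch below. The luminosity value is a placeholder, and its units are not specified here.

```yaml
# Placeholder luminosity value; enable only the plots that are needed.
validation_configuration:
  luminosity_scaling: 100.0
  save_predictions: true
  save_onnx_model: true
  plot_model: false
  plot_embedding: false
  plot_confusion: true
  plot_scores: true
  plot_discriminant: true
  plot_proba: true
  plot_roc: true
  plot_efficiency: false
  plot_log_probabilities: false
  plot_log_discriminant: false
```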