Library
General
Module to initialize the package and load preprocessed data for training models.
- library.load_and_preprocess_data(data_path)[source]
Load and preprocess the data from the given path.
- Parameters:
data_path (str) – Path to the data file.
- Returns:
A tuple containing the preprocessed training, validation, and test sets for features (X_train, X_val, X_test) and targets (y_train, y_val, y_test).
- Return type:
tuple
Preprocessing
This module contains functions to load and preprocess the dataset for training. It includes functions to load the dataset, balance the class distribution, split the data into training, validation, and test sets, and scale the features using StandardScaler. The preprocessed data is then saved for later use.
- Functions:
load_data: Loads the dataset from a specified path.
split_data: Balances the dataset and separates features and target variable.
preprocess_data: Splits the data into training, validation, and test sets, scales the features, and saves the processed data.
main: Loads and preprocesses the data.
- library.preprocessing.data_preprocessing.load_data(path)[source]
Load the dataset from the given path.
- Parameters:
path (str) – Path to the CSV file containing the dataset.
- Returns:
Loaded dataset as a pandas DataFrame.
- Return type:
pd.DataFrame
- library.preprocessing.data_preprocessing.main()[source]
Main function to load the data and preprocess it.
- library.preprocessing.data_preprocessing.preprocess_data(dataframe)[source]
Preprocess the dataset by balancing the class distribution, splitting it into training, validation, and test sets, and scaling the features using StandardScaler.
- Parameters:
dataframe (pd.DataFrame) – The raw dataset to be preprocessed.
- Returns:
A tuple containing the scaled training, validation, and test features and their corresponding target variables.
- Return type:
tuple
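As a rough illustration of the split-then-scale order described for preprocess_data, the sketch below uses only the standard library. The 70/15/15 proportions, the fixed seed, and the helper names three_way_split, fit_scaler, and transform are assumptions made for this example, not the library's actual values or API. The library itself uses StandardScaler.

```python
import random
import statistics


def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and cut them into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize each value with statistics learned on the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy feature matrix: 20 samples, 2 features.
features = [[float(i), float(i * i)] for i in range(20)]
x_train, x_val, x_test = three_way_split(features)

# Fit on the training split only, then reuse the same statistics everywhere.
means, stds = fit_scaler(x_train)
x_train_scaled = transform(x_train, means, stds)
x_val_scaled = transform(x_val, means, stds)
x_test_scaled = transform(x_test, means, stds)
```

Fitting the scaler on the training split alone keeps validation and test statistics from leaking into training.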
- library.preprocessing.data_preprocessing.split_data(dataframe)[source]
Balance the dataset to have equal numbers of fraud and non-fraud cases, then separate it into features and target variable.
- Parameters:
dataframe (pd.DataFrame) – The dataset to be split.
- Returns:
A tuple containing the features (X) and the target (y).
- Return type:
tuple
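The documentation does not say how split_data balances the classes. One common approach is random undersampling of the majority class, sketched below on plain dictionaries rather than a DataFrame. The target column name "Class", the label values 1 and 0, and the function name balance_and_split are all assumptions made for illustration.

```python
import random


def balance_and_split(rows, target="Class", seed=0):
    """Undersample the majority class, then separate features from the target."""
    fraud = [r for r in rows if r[target] == 1]
    legit = [r for r in rows if r[target] == 0]
    n = min(len(fraud), len(legit))

    rng = random.Random(seed)
    # Keep n rows of each class so both are equally represented.
    balanced = rng.sample(fraud, n) + rng.sample(legit, n)
    rng.shuffle(balanced)

    features = [{k: v for k, v in r.items() if k != target} for r in balanced]
    labels = [r[target] for r in balanced]
    return features, labels


# Toy data: 3 fraud rows among 50 transactions.
rows = [{"Amount": float(i), "Class": 1 if i % 20 == 0 else 0} for i in range(50)]
X, y = balance_and_split(rows)
```

Undersampling discards most of the majority class, which is acceptable for a sketch but reduces the data available for training.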
Models
Module for evaluating the performance of trained models using both Sklearn and FHE implementations.
This module includes functionality to compare the accuracy and execution times of models trained with Scikit-learn and Fully Homomorphic Encryption (FHE). Results are saved as a CSV file.
- library.models.evaluate.evaluate_models(models, datasets, training_times)[source]
Evaluate trained models and compare performance.
This function computes accuracy and execution time metrics for Scikit-learn and FHE models and calculates their ratios.
- Parameters:
models (dict) –
Dictionary where keys are model names and values are tuples containing Scikit-learn and FHE models: {
"model_name": (sklearn_model, fhe_model)
}.
datasets (dict) –
Dictionary containing training and validation datasets: {
"x_train": np.ndarray, "x_val": np.ndarray, "y_train": np.ndarray, "y_val": np.ndarray
}.
training_times (dict) –
Dictionary containing training times for Scikit-learn models: {
"model_name": float
}.
- Returns:
A list of dictionaries, where each dictionary contains evaluation metrics for a model: [
{
"Model": str, "Sklearn Accuracy": float, "FHE Accuracy": float, "Sklearn Time": float, "FHE Time": float, "Time Ratio (FHE/Sklearn)": float, "Accuracy Ratio (FHE/Sklearn)": float
}
].
- Return type:
list
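The following minimal sketch shows how one result dictionary with the documented keys could be assembled. The documentation does not state exactly which operation the time fields measure, so the timings are passed in as plain numbers here. The helper names accuracy and build_row and all values are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def build_row(name, sk_acc, fhe_acc, sk_time, fhe_time):
    """Assemble one result dictionary with the documented keys."""
    return {
        "Model": name,
        "Sklearn Accuracy": sk_acc,
        "FHE Accuracy": fhe_acc,
        "Sklearn Time": sk_time,
        "FHE Time": fhe_time,
        # A time ratio above 1 means the FHE model is slower.
        "Time Ratio (FHE/Sklearn)": fhe_time / sk_time,
        # An accuracy ratio below 1 means the FHE model is less accurate.
        "Accuracy Ratio (FHE/Sklearn)": fhe_acc / sk_acc,
    }


y_val = [0, 1, 1, 0]
row = build_row(
    "Logistic Regression",
    sk_acc=accuracy(y_val, [0, 1, 1, 0]),   # all four correct
    fhe_acc=accuracy(y_val, [0, 1, 0, 0]),  # three of four correct
    sk_time=0.5,   # made-up timing in seconds
    fhe_time=2.0,  # made-up timing in seconds
)
```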
- library.models.evaluate.main()[source]
Main function to load data, evaluate models, and save results.
This function loads processed data and pre-trained models, evaluates their performance using evaluate_models, and saves the results to a CSV file.
Steps:
1. Load the processed datasets and models from serialized files.
2. Evaluate the models on accuracy and execution time metrics.
3. Save the evaluation results in a CSV file for further analysis.
- Outputs:
results.csv: A CSV file containing model evaluation metrics.
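The saving step could look roughly like the sketch below, which uses the standard csv module. The real module may write the file differently (for example through pandas). The function name write_results and the numbers are made up for illustration.

```python
import csv
import io


def write_results(results, handle):
    """Write a list of metric dictionaries as CSV, one row per model."""
    fieldnames = list(results[0].keys())
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)


# Made-up numbers, for illustration only.
results = [
    {
        "Model": "Decision Tree",
        "Sklearn Accuracy": 0.92,
        "FHE Accuracy": 0.9,
        "Sklearn Time": 0.01,
        "FHE Time": 1.5,
        "Time Ratio (FHE/Sklearn)": 150.0,
        "Accuracy Ratio (FHE/Sklearn)": 0.978,
    }
]

# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
write_results(results, buffer)
csv_text = buffer.getvalue()
```

To produce an actual results.csv, pass a handle opened with open("results.csv", "w", newline="") instead of the in-memory buffer.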
Module to train and store FHE models for homomorphic encryption.
This module trains models for Fully Homomorphic Encryption (FHE) and stores them after training. It supports the training of Sklearn models, compiles them for homomorphic encryption, and saves the FHE models for later use.
- library.models.fhe_model.main()[source]
Main function to train FHE models and print training times.
This function loads training data, trains the models for homomorphic encryption, and prints the training times for each model.
- Outputs:
Prints the training times for the trained FHE models.
- library.models.fhe_model.train_fhe_models(models, x_train, y_train)[source]
Train and store FHE models for homomorphic encryption.
This function trains models for homomorphic encryption, compiles them, and stores them for future use.
- Parameters:
models (dict) –
Dictionary containing model names as keys and model tuples as values. The tuples contain the Scikit-learn model and the FHE model: {
"model_name": (sklearn_model, fhe_model)
}.
x_train (array-like) – Feature set used for training the models. Should be a 2D array where each row represents a sample and each column represents a feature.
y_train (array-like) – Target labels for the training data. Should be a 1D array with the same number of elements as the number of samples in x_train.
- Returns:
A dictionary with the training times for each FHE model: {
"model_name": training_time
}, where training_time is the time taken to train and compile the FHE model.
- Return type:
dict
This module contains the models to compare between Sklearn and FHE.
This module defines a function get_models() that returns a dictionary of models from both Sklearn and Fully Homomorphic Encryption (FHE) implementations for comparison purposes. The models include Random Forest, Logistic Regression, Decision Tree, Linear SVC, and XGBoost Classifier.
- library.models.model_comparaison.get_models()[source]
This function returns a dictionary where the keys are the names of the models and the values are tuples containing the Sklearn model and the corresponding FHE model. These models are used to compare the performance of traditional machine learning models (Sklearn) with their Fully Homomorphic Encryption (FHE) counterparts.
- Returns:
- A dictionary where each key is a model name and each value is a tuple containing:
Sklearn model (e.g., RandomForestClassifier, LogisticRegression, etc.)
FHE model (e.g., FHERandomForestClassifier, FHELogisticRegression, etc.)
- Return type:
dict
This script trains multiple models and stores them in a dictionary.
This script loads a set of machine learning models, trains them on a given training dataset, and stores both the trained models and their corresponding training times in a dictionary. The trained models include both Sklearn and Fully Homomorphic Encryption (FHE) versions of various classifiers.
- library.models.train.main()[source]
Main function to train models.
This function loads the training data, trains the models using the train_models function, and stores the trained models and their training times in a file.
- Returns:
None
- library.models.train.train_models(models, x_train, y_train)[source]
This function takes a dictionary that maps each model name to a tuple of its Sklearn and FHE versions, trains the Sklearn models on the provided training data, and returns the trained models together with their training times.
- Parameters:
models (dict) – A dictionary where each key is a model name and each value is a tuple containing the Sklearn model and the FHE model.
x_train (array-like) – The training feature set (input data).
y_train (array-like) – The training target set (labels).
- Returns:
- A tuple containing:
- trained_models (dict): A dictionary where each key is a model name, and each value is a tuple containing the trained Sklearn and FHE models.
- training_times (dict): A dictionary where each key is a model name, and each value is the time taken to train the Sklearn model.
- Return type:
tuple
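A timing loop with the input and output shapes documented for train_models might look like the sketch below. MajorityClassifier is a hypothetical stand-in for a scikit-learn estimator, and train_models_sketch is an illustrative name, not the library's implementation. Only the first model of each tuple is fitted here, following the description that the Sklearn models are the ones trained.

```python
import time


class MajorityClassifier:
    """Hypothetical stand-in exposing a scikit-learn style fit() method."""

    def fit(self, x, y):
        # Remember the most frequent label seen during training.
        self.label_ = max(set(y), key=y.count)
        return self


def train_models_sketch(models, x_train, y_train):
    """Fit the first model of each tuple and record how long fitting took."""
    trained_models, training_times = {}, {}
    for name, (sk_model, fhe_model) in models.items():
        start = time.perf_counter()
        sk_model.fit(x_train, y_train)
        training_times[name] = time.perf_counter() - start
        # The FHE model is carried along unchanged in this sketch.
        trained_models[name] = (sk_model, fhe_model)
    return trained_models, training_times


models = {"Majority": (MajorityClassifier(), MajorityClassifier())}
trained, times = train_models_sketch(models, [[0.0], [1.0], [2.0]], [0, 1, 1])
```

time.perf_counter is used because it is a monotonic, high-resolution clock suited to measuring short durations.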