G.O.D. Framework

Script: ai_cross_validation_hyperparameter_optimization.py - Model Evaluation and Tuning

Introduction

The ai_cross_validation_hyperparameter_optimization.py script is a key module in the G.O.D. Framework, dedicated to optimizing machine learning models through cross-validation and hyperparameter tuning. By automating both processes, the script selects a well-performing configuration for the model under evaluation without manual trial-and-error tuning.

Purpose

Key Features

Logic and Implementation

At its core, this script automates the evaluation and optimization of machine learning models. The workflow follows these steps:

  1. Model Preparation: Receives a model, dataset, and parameter grid for tuning.
  2. Tuning Configuration: Configures a hyperparameter tuning approach (grid search or random search).
  3. Cross-Validation: Validates the model using k-fold cross-validation for each parameter combination (a standalone sketch of this step follows the list).
  4. Evaluation: Records and saves performance metrics (e.g., accuracy, precision, F1 score) for each trial.
  5. Result Selection: Selects the best hyperparameter configuration based on a scoring function (e.g., validation accuracy).
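
Step 3 can also be exercised on its own: scikit-learn's cross_val_score evaluates a single parameter combination with k-fold cross-validation. A minimal sketch, using a synthetic dataset and one illustrative Random Forest configuration (the 5-fold split and the accuracy scorer are example choices, not fixed by the script):

            from sklearn.model_selection import cross_val_score
            from sklearn.ensemble import RandomForestClassifier
            from sklearn.datasets import make_classification

            # Synthetic data and one candidate parameter combination
            X, y = make_classification(n_samples=500, n_features=20, random_state=42)
            model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)

            # 5-fold cross-validation returns one accuracy score per fold
            scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
            print("Fold scores:", scores)
            print("Mean accuracy:", scores.mean())

The optimizer class below wraps this idea, delegating to GridSearchCV or RandomizedSearchCV so the evaluation is repeated for every parameter combination: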

            from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
            from sklearn.ensemble import RandomForestClassifier
            from sklearn.datasets import make_classification

            class HyperparameterOptimizer:
                def __init__(self, model, param_grid, search_type="grid", cv=5):
                    """
                    Initializes the optimizer with a machine learning model and parameter grid.
                    :param model: The machine learning model (e.g., RandomForestClassifier()).
                    :param param_grid: A dictionary of hyperparameter ranges to optimize.
                    :param search_type: Type of search ('grid' or 'random').
                    :param cv: Number of cross-validation folds.
                    """
                    self.model = model
                    self.param_grid = param_grid
                    self.search_type = search_type
                    self.cv = cv

                def perform_search(self, X, y):
                    """
                    Executes the hyperparameter optimization search based on the selected type.
                    :param X: Feature matrix.
                    :param y: Target vector.
                    :return: Tuple of (best_params_, best_score_) from the fitted search.
                    """
                    if self.search_type == "grid":
                        search = GridSearchCV(estimator=self.model, param_grid=self.param_grid, cv=self.cv)
                    elif self.search_type == "random":
                        search = RandomizedSearchCV(estimator=self.model, param_distributions=self.param_grid, cv=self.cv, n_iter=10)
                    else:
                        raise ValueError("Invalid search type. Use 'grid' or 'random'.")

                    search.fit(X, y)
                    return search.best_params_, search.best_score_

            if __name__ == "__main__":
                # Example: Hyperparameter optimization for a Random Forest classifier
                X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
                model = RandomForestClassifier(random_state=42)
                param_grid = {
                    "n_estimators": [50, 100, 150],
                    "max_depth": [5, 10, 15],
                    "min_samples_split": [2, 5, 10]
                }
                optimizer = HyperparameterOptimizer(model=model, param_grid=param_grid, search_type="grid", cv=5)
                best_params, best_score = optimizer.perform_search(X, y)
                print(f"Best Parameters: {best_params}")
                print(f"Best Score: {best_score}")
            

Dependencies

The script requires the following Python libraries, which are common for ML workflows:

  1. scikit-learn: provides GridSearchCV, RandomizedSearchCV, cross_val_score, RandomForestClassifier, and make_classification.
  2. NumPy and SciPy: pulled in automatically as scikit-learn dependencies.

How to Use This Script

  1. Prepare your dataset as feature matrix X and target vector y.
  2. Define a candidate machine learning model (e.g., RandomForest, SVM); an SVM sketch is shown after this list.
  3. Specify a hyperparameter grid or distribution to tune.
  4. Run the perform_search method to start optimization.
  5. Review and apply the best hyperparameters for your final trained model.
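
Step 2 accepts any estimator that follows scikit-learn's fit/predict interface. As a sketch of swapping in a support vector machine (the SVC parameter grid below is an illustrative choice, and HyperparameterOptimizer is assumed to be in scope from the implementation above):

            from sklearn.svm import SVC
            from sklearn.datasets import make_classification

            # Illustrative data; replace with your own X and y from step 1
            X, y = make_classification(n_samples=500, n_features=20, random_state=42)

            svm_optimizer = HyperparameterOptimizer(
                model=SVC(),
                param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
                search_type="grid",
                cv=3,
            )
            svm_best_params, svm_best_score = svm_optimizer.perform_search(X, y)
            print("SVM best:", svm_best_params, svm_best_score)

The script's own usage example with a Random Forest follows: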

            # Example usage (X and y are the prepared feature matrix and target vector from step 1)
            optimizer = HyperparameterOptimizer(
                model=RandomForestClassifier(),
                param_grid={"n_estimators": [100, 200], "max_depth": [10, 20]},
                search_type="grid",
                cv=3
            )
            best_params, best_score = optimizer.perform_search(X, y)
            print("Optimization Complete:", best_params, best_score)
            

Role in the G.O.D. Framework

Future Enhancements