
Experiment Manager

The Experiment Manager is a modular system for logging experiment configurations, results, and metadata. It supports reproducibility, traceability, and analysis of experimental workflows in both research and production environments.

Overview

The Experiment Manager records the essential details of each experiment so they can be reviewed and analyzed later. It is implemented in Python, writes to a local JSON file by default, and can be extended to other storage backends (databases, cloud storage, etc.).

Key Features

Comprehensive Experiment Logging:

Logs configurations and results for individual or batch experiments into JSON files, which makes it easy to import or process later.

Error-Resilient Design:

Catches exceptions raised during logging and reports them through Python's `logging` module, so a failed write does not interrupt the calling pipeline.

Metadata Support:

Timestamps, experiment IDs, and custom metadata can be added to the configuration dictionary to document each experiment in detail.

Customizable Storage:

By default, logs experiments to a JSON file, but it can be modified for databases or cloud integrations.

Ease of Integration:

A single static method, `log_experiment`, adds logging to any research or development pipeline with one call.

Purpose

The goal of the system is to:

1. **Organize Experimental Data**:
    Log experiments consistently for traceability and reproducibility.
2. **Improve Automation**:
    Provide automated tools to reduce the manual overhead of managing experiment logs.
3. **Enable Scalability**:
    Handle both small-scale and large-scale projects through a modular design.
4. **Enhance Results Visibility**:
    Use structured data storage for visualization, analysis, and reporting.

System Design

The Experiment Manager uses two Python standard-library modules: `json` for structured data storage and `logging` for status and error reporting.

Core Design: ExperimentManager Class

The `ExperimentManager` class implements a static method `log_experiment` that takes the following parameters:

config (`dict`): The details of the experiment configuration (e.g., model parameters).
results (`dict`): The outcomes and metrics of the experiment (e.g., accuracy, F1 score).
file_path (`str`): The JSON file path where the logs are stored. (Default: `experiment_logs.json`)

```python
import logging
import json


class ExperimentManager:
    """
    Manages experiments, from setup to result tracking.
    """

    @staticmethod
    def log_experiment(config, results, file_path="experiment_logs.json"):
        """
        Logs configurations and results of an experiment.

        :param config: Configuration of the experiment
        :param results: Results obtained from the experiment
        :param file_path: Path to save the experiment log
        """
        logging.info("Logging experiment data...")
        try:
            experiment_data = {"config": config, "results": results}
            with open(file_path, "a") as log_file:
                json.dump(experiment_data, log_file, indent=4)
                log_file.write("\n")
            logging.info("Experiment data logged successfully.")
        except Exception as e:
            logging.error(f"Failed to log experiment data: {e}")
```

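Each call appends one JSON object to the specified file. After several calls the file holds a sequence of JSON objects rather than a single JSON document, so it has to be read back record by record instead of with a single `json.load` call.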
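Because `log_experiment` appends a separate JSON object on every call, reading the log back needs a decoder that can step through the objects one at a time. The following is a minimal sketch of such a reader, not part of the Experiment Manager itself; the function name `read_experiment_logs` is an assumption made for illustration. It relies on `json.JSONDecoder.raw_decode` from the standard library, which parses one value and reports where it ended.

```python
import json


def read_experiment_logs(file_path="experiment_logs.json"):
    """Return every experiment record stored in an append-style log file.

    Hypothetical helper: assumes the file was written by repeated calls to
    ExperimentManager.log_experiment, i.e. JSON objects separated by newlines.
    """
    with open(file_path, "r") as log_file:
        text = log_file.read()

    decoder = json.JSONDecoder()
    records = []
    position = 0
    while position < len(text):
        # raw_decode does not accept leading whitespace, so skip the
        # newlines and spaces that separate consecutive records.
        if text[position].isspace():
            position += 1
            continue
        # Parse one JSON value; the second item is the index where it ended.
        record, position = decoder.raw_decode(text, position)
        records.append(record)
    return records
```

Calling `read_experiment_logs()` returns a list of dictionaries, each with the `config` and `results` keys written by `log_experiment`, which can then be passed to analysis or reporting code.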

Usage Examples

The system can be used for individual experiments as well as batch processing.

Example 1: Basic Experiment Logging

In this example, we log a basic experiment with model configuration and its results.

```python
from experiment_manager import ExperimentManager

experiment_config = {
    "model": "RandomForest",
    "hyperparameters": {
        "n_estimators": 100,
        "max_depth": 10
    },
    "dataset": "dataset_v1.csv"
}

experiment_results = {
    "accuracy": 0.89,
    "f1_score": 0.87
}

# Log the experiment to the default file
ExperimentManager.log_experiment(experiment_config, experiment_results)

# log_experiment does not raise on failure, so this line runs either way;
# check the logging output to confirm the write succeeded.
print("Experiment logged successfully!")
```

Logged JSON Output:

```json
{
    "config": {
        "model": "RandomForest",
        "hyperparameters": {
            "n_estimators": 100,
            "max_depth": 10
        },
        "dataset": "dataset_v1.csv"
    },
    "results": {
        "accuracy": 0.89,
        "f1_score": 0.87
    }
}
```

Example 2: Custom File Path

Change the log file location by specifying the `file_path` parameter.

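```python
from experiment_manager import ExperimentManager

experiment_config = {"model": "SVM", "parameters": {"C": 1.0, "kernel": "linear"}}
experiment_results = {"accuracy": 0.93}

# Save to a custom location
custom_file = "path/to/custom_experiment_log.json"
ExperimentManager.log_experiment(experiment_config, experiment_results, file_path=custom_file)
```

This stores the JSON data at the provided file path. The parent directory must already exist: `open` creates the file but not missing directories, and in that case the failure is logged and nothing is written.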

Example 3: Adding Metadata

To log additional metadata like timestamps or unique IDs for each experiment, extend the configuration dictionary.

```python
import datetime
import uuid

from experiment_manager import ExperimentManager

experiment_config = {
    "model": "LogisticRegression"
}

experiment_results = {
    "accuracy": 0.85
}

# Add metadata
experiment_config["metadata"] = {
    "timestamp": datetime.datetime.now().isoformat(),
    "experiment_id": str(uuid.uuid4())
}

ExperimentManager.log_experiment(experiment_config, experiment_results)
```

Enhanced JSON Output Example (the timestamp and ID are illustrative and differ on every run):

```json
{
    "config": {
        "model": "LogisticRegression",
        "metadata": {
            "timestamp": "2023-10-12T12:34:56.789000",
            "experiment_id": "e75d48e8-b406-41ed-afa1-0242ac120002"
        }
    },
    "results": {
        "accuracy": 0.85
    }
}
```
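If the same metadata fields are wanted on every experiment, the pattern from Example 3 can be wrapped in a small helper. The sketch below is one possible shape, not part of the Experiment Manager: the name `with_metadata` and its keyword-argument interface are assumptions made for illustration. Unlike Example 3, it records the timestamp in UTC so that logs written on different machines compare directly.

```python
import datetime
import uuid


def with_metadata(config, **custom):
    """Return a copy of config with a "metadata" entry added.

    Hypothetical helper; the original dictionary is left unchanged.
    """
    enriched = dict(config)  # shallow copy so the caller's dict is not modified
    metadata = {
        # Timezone-aware UTC timestamp (an assumption; Example 3 uses local time)
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "experiment_id": str(uuid.uuid4()),
    }
    metadata.update(custom)  # caller-supplied fields, e.g. author or dataset version
    enriched["metadata"] = metadata
    return enriched


config = with_metadata({"model": "LogisticRegression"}, author="alice")
# The enriched config can then be logged as usual:
# ExperimentManager.log_experiment(config, {"accuracy": 0.85})
```

Returning a copy rather than mutating the input keeps the helper safe to call repeatedly on a shared base configuration.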

Example 4: Batch Logging of Multiple Experiments

Log multiple experiments in a batch using iteration.

```python
from experiment_manager import ExperimentManager

experiment_batches = [
    {"config": {"model": "KNN", "k": 5}, "results": {"accuracy": 0.82}},
    {"config": {"model": "GradientBoosting", "learning_rate": 0.05}, "results": {"accuracy": 0.91}}
]

# Each iteration appends one record to the default log file
for experiment in experiment_batches:
    ExperimentManager.log_experiment(experiment["config"], experiment["results"])
```

Example 5: Enhanced Error Handling

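When the file path is invalid or permissions are insufficient, `log_experiment` does not raise. It catches the exception and reports it through the `logging` module, so wrapping the call in `try`/`except` has no effect. Configure logging to make the failure message visible.

```python
import logging

from experiment_manager import ExperimentManager

# Show INFO and ERROR messages emitted by log_experiment
logging.basicConfig(level=logging.INFO)

# The write fails, but the call returns normally; the error appears in the log
ExperimentManager.log_experiment(
    {"model": "XGB"},
    {"accuracy": 0.88},
    file_path="/restricted_path/log.json"
)
```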
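Since `log_experiment` as defined in the System Design section catches every exception itself, a caller's `try`/`except` can only react to a failed write if the method re-raises. The following is a minimal sketch of such a variant; the class name `StrictExperimentManager` is hypothetical and the behavior is an assumption about what a stricter mode could look like, not something the Experiment Manager provides.

```python
import json
import logging


class StrictExperimentManager:
    """Hypothetical variant of ExperimentManager that re-raises logging failures."""

    @staticmethod
    def log_experiment(config, results, file_path="experiment_logs.json"):
        experiment_data = {"config": config, "results": results}
        try:
            # Serialize before opening the file so that a non-serializable
            # value cannot leave a half-written record behind.
            payload = json.dumps(experiment_data, indent=4)
            with open(file_path, "a") as log_file:
                log_file.write(payload + "\n")
        except (OSError, TypeError, ValueError) as e:
            logging.error(f"Failed to log experiment data: {e}")
            raise  # let the caller decide how to react


try:
    StrictExperimentManager.log_experiment(
        {"model": "XGB"},
        {"accuracy": 0.88},
        file_path="/restricted_path/log.json",
    )
except OSError as e:
    print(f"Failed to log due to: {e}")
```

Whether to swallow or re-raise is a design choice: swallowing keeps a long training run alive when logging fails, while re-raising guarantees that a missing record never goes unnoticed.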

Advanced Features

1. **Cloud Storage**:
    Save logs directly in storage services like AWS S3 or Azure Blob.
2. **Database Integration**:
    Replace JSON files with SQL/NoSQL storage.
3. **Live Visualization**:
    Use log files to feed dashboards using libraries like Dash/Streamlit.
4. **Log Summarization**:
    Automatically summarize experiment logs with key statistics.
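As an illustration of the database integration idea, the sketch below stores experiments in SQLite using only the standard library. It is one possible approach under stated assumptions, not an existing part of the Experiment Manager: the function names, the `experiments` table, its columns, and the default `experiments.db` path are all hypothetical. The configuration and results are kept as JSON text so that the schema does not depend on which keys an experiment uses.

```python
import json
import sqlite3


def log_experiment_to_sqlite(config, results, db_path="experiments.db"):
    """Store one experiment as a row (hypothetical schema)."""
    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS experiments ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "config TEXT NOT NULL, "
                "results TEXT NOT NULL)"
            )
            connection.execute(
                "INSERT INTO experiments (config, results) VALUES (?, ?)",
                (json.dumps(config), json.dumps(results)),
            )
    finally:
        connection.close()


def load_experiments_from_sqlite(db_path="experiments.db"):
    """Return all stored experiments as dictionaries, oldest first."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT config, results FROM experiments ORDER BY id"
        ).fetchall()
    finally:
        connection.close()
    return [{"config": json.loads(c), "results": json.loads(r)} for c, r in rows]
```

The loaded records have the same `config`/`results` shape as the JSON log, so downstream summarization or dashboard code would not need to know which backend produced them.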

Conclusion

The Experiment Manager is a modular, extensible solution for managing experiment logs. Its small interface and pluggable storage make it suitable for research and industrial applications that require traceability and reproducibility.

