Experiment Manager
The Experiment Manager provides a structured, extensible way to manage experiments by logging configurations, results, and metadata reproducibly and at scale. It gives researchers traceable experiment records while integrating cleanly into machine learning or research workflows.
Overview
The Experiment Manager simplifies the process of:
- Logging experimental configurations and results in a consistent, structured format.
- Storing logs in a JSON file for further analysis or sharing.
- Extending functionality to include additional metadata or integrate with different storage backends.
This documentation provides instructions for using the Experiment Manager system efficiently, with advanced examples and practical usage guidance.
Key Features
- Customizable Experiment Logs: Logs detailed configuration and results for experiments while supporting additional metadata on demand.
- Error Handling: Catches and reports logging failures so they do not interrupt the calling process.
- JSON-Based Logs: Outputs scalable and structured data compatible with visualization and analytics tools.
- Extensibility: Easy to extend or adapt for complex workflows.
- Plug-and-Play Design: Simple integration into research pipelines or machine learning processes.
Purpose and Goals
The Experiment Manager was designed to:
1. Facilitate Reproducibility: Record complete experiment details for accurate reproduction of results.
2. Enable Systematic Logging: Automate the tracking of configurations and results to reduce human error.
3. Support Scalable Workflows: Handle multiple experiments with ease.
4. Empower Transparent Research: Maintain an accessible log of experiments for analysis, sharing, or validation.
System Design
The Experiment Manager relies on Python's `logging` and `json` modules to ensure:
- Structured Output: All experiments are appended as JSON objects into a log file.
- Seamless Processing: The system is ready for extensions, from cloud integrations to storage backends like databases.
Here is the core implementation:
```python
import logging
import json


class ExperimentManager:
    """
    Handles experiment management: logs configurations and results.
    """

    @staticmethod
    def log_experiment(config, results, file_path="experiment_logs.json"):
        """
        Logs an experiment's configurations and results into a file.

        :param config: Dictionary describing the experiment's settings.
        :param results: Dictionary representing the outcomes of the experiment.
        :param file_path: Path to save the experiment log (default = experiment_logs.json).
        """
        logging.info("Logging experiment data...")
        try:
            experiment_data = {"config": config, "results": results}
            with open(file_path, "a") as log_file:
                json.dump(experiment_data, log_file, indent=4)
                log_file.write("\n")
            logging.info("Experiment data logged successfully.")
        except Exception as e:
            logging.error(f"Failed to log experiment data: {e}")
```
Implementation and Usage
Example 1: Logging a Basic Experiment
Log a single experiment with a simple configuration and results.
```python
from experiment_manager import ExperimentManager

experiment_config = {
    "model": "RandomForest",
    "hyperparameters": {
        "n_estimators": 100,
        "max_depth": 10
    },
    "dataset": "train_v1.csv"
}

experiment_results = {
    "accuracy": 0.89,
    "f1_score": 0.87
}

# Log the experiment
ExperimentManager.log_experiment(experiment_config, experiment_results)
print("Experiment logged successfully!")
```
Expected JSON Output (Default: `experiment_logs.json`):
```json
{
    "config": {
        "model": "RandomForest",
        "hyperparameters": {
            "n_estimators": 100,
            "max_depth": 10
        },
        "dataset": "train_v1.csv"
    },
    "results": {
        "accuracy": 0.89,
        "f1_score": 0.87
    }
}
```
Example 2: Using a Custom Log File
Change the storage location for experiment logs by supplying a different file path. Note that the target directory (e.g. `logs/`) must already exist, since the logger opens the file in append mode without creating directories.
experiment_config = { "model": "SVM", "parameters": { "C": 1.0, "kernel": "linear" } } experiment_results = { "accuracy": 0.91 } # Specify a custom path for logging custom_file_path = "logs/svm_experiment.json" ExperimentManager.log_experiment(experiment_config, experiment_results, file_path=custom_file_path)
JSON Output (Example: `logs/svm_experiment.json`):
```json
{
    "config": {
        "model": "SVM",
        "parameters": {
            "C": 1.0,
            "kernel": "linear"
        }
    },
    "results": {
        "accuracy": 0.91
    }
}
```
Example 3: Enhanced Logging with Metadata
Add additional fields like `timestamp` or `experiment_id` for traceability.
```python
import datetime
import uuid

from experiment_manager import ExperimentManager

experiment_config = {"model": "Logistic Regression"}
experiment_results = {"accuracy": 0.85}

# Add metadata
timestamp = datetime.datetime.now().isoformat()
experiment_id = str(uuid.uuid4())
experiment_config["metadata"] = {
    "timestamp": timestamp,
    "experiment_id": experiment_id
}

# Log with metadata
ExperimentManager.log_experiment(experiment_config, experiment_results)
```
Enhanced JSON Output:
```json
{
    "config": {
        "model": "Logistic Regression",
        "metadata": {
            "timestamp": "2023-10-12T12:34:56.789123",
            "experiment_id": "b1c95b89-d03e-4d5e-832f-4a5d4124e238"
        }
    },
    "results": {
        "accuracy": 0.85
    }
}
```
Example 4: Batch Logging of Multiple Experiments
Log a series of experiments from a pipeline in a single loop.
experiments = [ { "config": {"model": "KNN", "parameters": {"k": 3}}, "results": {"accuracy": 0.78} }, { "config": {"model": "XGBoost", "parameters": {"learning_rate": 0.01}}, "results": {"accuracy": 0.92} } ] for experiment in experiments: ExperimentManager.log_experiment(experiment["config"], experiment["results"])
Advanced Features
1. Extensible Storage Backends:
Use SQLite, PostgreSQL, or a NoSQL database like MongoDB when logging large numbers of experiments.
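The core class only writes to a JSON file; a database backend is not part of it. As one possible extension, the sketch below logs experiments into a SQLite table using Python's built-in `sqlite3` module. The table name `experiments` and the helper `log_experiment_sqlite` are illustrative, not part of the Experiment Manager.
```python
import json
import sqlite3


def log_experiment_sqlite(config, results, db_path="experiment_logs.db"):
    """Hypothetical SQLite backend: stores config and results as JSON text columns."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS experiments ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, config TEXT, results TEXT)"
            )
            conn.execute(
                "INSERT INTO experiments (config, results) VALUES (?, ?)",
                (json.dumps(config), json.dumps(results)),
            )
    finally:
        conn.close()


log_experiment_sqlite({"model": "SVM"}, {"accuracy": 0.91})
```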
2. Integrations with Cloud Storage:
Save experiment logs to cloud-based solutions like AWS S3, Azure Blob, or Google Drive.
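Cloud upload is likewise left to the user. A minimal sketch for AWS S3 using `boto3` might look like the following; the bucket name `my-experiment-logs` is a placeholder, and AWS credentials are assumed to be configured in the environment.
```python
import boto3

from experiment_manager import ExperimentManager

# Log locally first, then upload the resulting log file to S3.
ExperimentManager.log_experiment({"model": "KNN"}, {"accuracy": 0.78})

s3 = boto3.client("s3")
s3.upload_file("experiment_logs.json", "my-experiment-logs", "experiment_logs.json")
```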
3. Data Visualization:
Process the logged experiments to generate analyses or plots with Seaborn, Matplotlib, or Plotly.
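For example, accuracy across runs can be compared with a simple Matplotlib bar chart. The sketch below operates on an in-memory list of records in the same `{"config": ..., "results": ...}` shape the manager logs; adapt the loading step to your own log file.
```python
import matplotlib.pyplot as plt

# Sample records in the same shape the Experiment Manager writes to its log.
experiments = [
    {"config": {"model": "KNN"}, "results": {"accuracy": 0.78}},
    {"config": {"model": "XGBoost"}, "results": {"accuracy": 0.92}},
    {"config": {"model": "SVM"}, "results": {"accuracy": 0.91}},
]

models = [e["config"]["model"] for e in experiments]
accuracies = [e["results"]["accuracy"] for e in experiments]

plt.bar(models, accuracies)
plt.ylabel("Accuracy")
plt.title("Accuracy by model")
plt.show()
```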
4. Summarization Tools:
Include summarization techniques to extract key metrics (e.g., highest accuracy).
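Because each entry is appended as a pretty-printed JSON object, the log file is a stream of concatenated objects rather than a single JSON document. One way to read it back and extract the best-performing run is sketched below using `json.JSONDecoder.raw_decode`; the `load_experiments` helper is an illustration, not part of the Experiment Manager.
```python
import json


def load_experiments(file_path="experiment_logs.json"):
    """Parse a file of concatenated, pretty-printed JSON objects one by one."""
    decoder = json.JSONDecoder()
    with open(file_path) as log_file:
        text = log_file.read()
    experiments, pos = [], 0
    while pos < len(text):
        # Skip whitespace between objects, then decode the next one.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        experiments.append(obj)
    return experiments


experiments = load_experiments()
best = max(experiments, key=lambda e: e["results"].get("accuracy", 0))
print("Best accuracy:", best["results"]["accuracy"], "from", best["config"].get("model"))
```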
Best Practices
- Always define custom experiment IDs for traceability in larger pipelines.
- Regularly back up your logs to avoid data loss.
- Use structured metadata to inject contextual details (e.g., timestamps, execution environments).
Conclusion
The Experiment Manager offers a straightforward way to log, manage, and analyze experimental data. Whether you run small, standalone experiments or large machine learning pipelines, the system can be customized and extended to fit your needs.
