Introduction
The config.yaml file is the central configuration file for the G.O.D Framework.
It defines global settings such as database connections, logging levels, file storage paths, and pipeline
parameters. Keeping these settings in one file lets developers update or customize the framework's behavior
without modifying code.
Purpose
The key purposes of config.yaml are:
- To act as a single source of truth for global configuration settings used throughout the framework.
- To enable easy customization for different environments (e.g., production, staging, development).
- To decouple configuration parameters from the core codebase, improving maintainability and scalability.
Structure
The configuration file follows a YAML format with a hierarchical structure. Below is an annotated example:
# config.yaml
system:
  name: "G.O.D Framework" # Name of the framework
  version: "1.0.0" # Framework version
  environment: "production" # Current environment (production, development, staging)
database:
  type: "mongodb" # Database type (e.g., mongodb, mysql)
  host: "localhost" # Database host
  port: 27017 # Database port
  username: "admin" # Database username
  password: "secure_password" # Database password (use environment variables for sensitive data)
  database_name: "god_framework_db" # Name of the database instance
logging:
  level: "INFO" # Logging level (DEBUG, INFO, WARNING, ERROR)
  handlers: # Logging handlers
    - console # Output logs to the console
    - file # Write logs to a file
paths:
  data_root: "/data" # Base directory for data storage
  models: "/data/models" # Path to store machine learning models
  logs: "/data/logs" # Path to store log files
  cache: "/data/cache" # Path to store temporary files
pipeline:
  default_batch_size: 64 # Default batch size for processing
  retries: 3 # Number of retries for failed pipeline steps
  timeout: 300 # Timeout (in seconds) for long-running operations
The example contains five top-level sections:
- System: General framework-level metadata such as name, version, and environment.
- Database: Settings for database connectivity.
- Logging: Sets the log level and output handlers.
- Paths: Defines directory paths for various data and model storage needs.
- Pipeline: Contains default parameters for pipeline configuration.
Core Fields
- system: High-level framework metadata and runtime environment control.
- database: Critical settings required for connecting and authenticating to the framework's primary database.
- logging: Directs the behavior of logging mechanisms, including log levels and output targets.
- paths: Centralizes directory paths for consistent file organization and access.
- pipeline: Parameterizes processing behaviors and fault tolerance mechanisms.
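One way to consume these fields is to wrap a section in a small typed object, so that a missing or misspelled key fails at load time rather than deep inside a pipeline run. The sketch below is an illustration, not the framework's own loader: the PipelineSettings class and pipeline_settings function are hypothetical names, and the config dictionary stands in for the mapping that a YAML parser (for example PyYAML's yaml.safe_load) would return for the file above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PipelineSettings:
    """Typed, read-only view of the `pipeline` section (hypothetical class)."""
    default_batch_size: int
    retries: int
    timeout: int  # seconds


def pipeline_settings(config: Mapping[str, Any]) -> PipelineSettings:
    """Build PipelineSettings from a parsed config; raises KeyError on a missing key."""
    section = config["pipeline"]
    return PipelineSettings(
        default_batch_size=int(section["default_batch_size"]),
        retries=int(section["retries"]),
        timeout=int(section["timeout"]),
    )


# Stand-in for the mapping a YAML parser would return for config.yaml.
config = {
    "system": {"name": "G.O.D Framework", "version": "1.0.0", "environment": "production"},
    "pipeline": {"default_batch_size": 64, "retries": 3, "timeout": 300},
}

settings = pipeline_settings(config)
print(settings.default_batch_size, settings.retries, settings.timeout)
```

The same pattern could be repeated for the system, database, logging, and paths fields.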
Integration with the G.O.D Framework
The following components read their settings from config.yaml:
- ai_pipeline_orchestrator.py: Reads pipeline configurations such as batch size and retry limits.
- ai_data_registry.py: Fetches paths for storing and retrieving data and model artifacts.
- logging system: Uses the specified logging level and handlers for monitoring execution.
- database modules: Extracts necessary credentials and details from the database field.
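As an illustration of how a component might consume its part of the file, the sketch below maps the logging field onto Python's standard logging module. It is an assumption about how such wiring could look, not the framework's actual logging system: the configure_logging function, the "god_framework" logger name, and the framework.log file name are all hypothetical. The file handler writes under the directory given by paths.logs.

```python
import logging
import os
import tempfile
from typing import Any, Mapping

# Levels listed in the config.yaml comment.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(config: Mapping[str, Any]) -> logging.Logger:
    """Configure a logger from the `logging` and `paths` sections of a parsed config."""
    log_cfg = config.get("logging", {})
    level_name = str(log_cfg.get("level", "INFO")).upper()
    if level_name not in LEVELS:
        raise ValueError(f"unknown logging level: {level_name!r}")

    logger = logging.getLogger("god_framework")  # hypothetical logger name
    logger.setLevel(LEVELS[level_name])

    # Remove handlers left over from an earlier call so output is not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for name in log_cfg.get("handlers", ["console"]):
        if name == "console":
            handler: logging.Handler = logging.StreamHandler()
        elif name == "file":
            log_dir = config["paths"]["logs"]
            os.makedirs(log_dir, exist_ok=True)
            # "framework.log" is an assumed file name.
            handler = logging.FileHandler(os.path.join(log_dir, "framework.log"))
        else:
            raise ValueError(f"unknown logging handler: {name!r}")
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


demo_config = {
    "logging": {"level": "WARNING", "handlers": ["console", "file"]},
    "paths": {"logs": tempfile.mkdtemp()},  # temporary directory standing in for /data/logs
}
logger = configure_logging(demo_config)
logger.warning("pipeline step retried")
```

Rejecting unknown level or handler names keeps a typo in config.yaml from silently disabling log output.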
Best Practices
- Store sensitive credentials (e.g., passwords) in environment variables and reference them in the YAML file.
- Validate the configuration file syntax after updates to prevent runtime issues.
- Keep environment-specific configurations in separate files (e.g., config-prod.yaml). Use a loader script to dynamically select the correct config.
- Use descriptive comments within the YAML file to improve readability and reduce misconfigurations.
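The first and third practices can be sketched in a few lines of Python. YAML has no built-in environment variable substitution, so a placeholder such as ${DB_PASSWORD} has to be resolved by the loader after parsing. Everything below is an assumption made for illustration: the ${NAME} placeholder syntax, the GOD_ENV variable, the file names other than config-prod.yaml, and the function names select_config_path and expand_env.

```python
import os
import re
from pathlib import Path
from typing import Any, Mapping, Optional

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# Hypothetical mapping; only config-prod.yaml is named in the text above.
CONFIG_FILES = {
    "production": "config-prod.yaml",
    "staging": "config-staging.yaml",
    "development": "config-dev.yaml",
}


def select_config_path(config_dir: str, environment: Optional[str] = None) -> Path:
    """Pick the config file for an environment (GOD_ENV is an assumed variable name)."""
    env_name = environment or os.environ.get("GOD_ENV", "development")
    if env_name not in CONFIG_FILES:
        raise ValueError(f"unknown environment: {env_name!r}")
    return Path(config_dir) / CONFIG_FILES[env_name]


def expand_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Return a copy of a parsed config with ${NAME} placeholders replaced from env."""
    source = os.environ if env is None else env
    if isinstance(value, str):
        def replace(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in source:
                raise KeyError(f"environment variable {name} is not set")
            return source[name]
        return _PLACEHOLDER.sub(replace, value)
    if isinstance(value, dict):
        return {key: expand_env(item, source) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, source) for item in value]
    return value  # numbers, booleans, None pass through unchanged


# Stand-in for the parsed `database` section, with the password moved to a placeholder.
raw = {"database": {"host": "localhost", "port": 27017, "password": "${DB_PASSWORD}"}}
resolved = expand_env(raw, env={"DB_PASSWORD": "s3cret"})
print(resolved["database"]["password"])
print(select_config_path("conf", "production"))
```

Failing on an unset variable, rather than substituting an empty string, surfaces a missing secret at startup instead of at the first database connection.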
Future Enhancements
- Automate environment-specific configuration loading using runtime scripts.
- Introduce support for hierarchical overrides (e.g., local settings overriding global parameters).
- Implement validation scripts to detect invalid keys or values in the YAML file.
- Support dynamic reloading of configurations without restarting the system.
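Two of these enhancements, hierarchical overrides and key validation, are small enough to sketch. The code below is one possible shape for them under stated assumptions, not a planned design: deep_merge and validate_config are hypothetical names, and the expected key set is simply copied from the example file in the Structure section.

```python
from typing import Any, Dict, List, Mapping

# Expected keys per section, taken from the example config.yaml (an assumption,
# not a published schema).
EXPECTED_KEYS = {
    "system": {"name", "version", "environment"},
    "database": {"type", "host", "port", "username", "password", "database_name"},
    "logging": {"level", "handlers"},
    "paths": {"data_root", "models", "logs", "cache"},
    "pipeline": {"default_batch_size", "retries", "timeout"},
}


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in override win; nested mappings merge key by key."""
    merged = dict(base)  # untouched nested values are shared with base, not copied
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def validate_config(config: Mapping[str, Any]) -> List[str]:
    """Return a list of problems: missing sections, missing keys, unknown keys."""
    problems: List[str] = []
    for section, expected in EXPECTED_KEYS.items():
        if section not in config:
            problems.append(f"missing section '{section}'")
            continue
        body = config[section]
        if not isinstance(body, Mapping):
            problems.append(f"section '{section}' is not a mapping")
            continue
        for key in sorted(expected - set(body)):
            problems.append(f"{section}: missing key '{key}'")
        for key in sorted(set(body) - expected):
            problems.append(f"{section}: unknown key '{key}'")
    for section in sorted(set(config) - set(EXPECTED_KEYS)):
        problems.append(f"unknown section '{section}'")
    return problems


# Global settings with a local override of a single pipeline parameter.
base = {"pipeline": {"default_batch_size": 64, "retries": 3, "timeout": 300}}
local = {"pipeline": {"retries": 5}}
merged = deep_merge(base, local)
print(merged)

# A misspelled key ("timout") is reported as unknown, and "timeout" as missing.
for problem in validate_config({"pipeline": {"default_batch_size": 64, "retries": 3, "timout": 300}}):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a validation script report every issue in a single pass.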