Introduction
The ai_pre_execution_validator.py module is a critical component of the G.O.D Framework, responsible for verifying the readiness of data, configurations, and system resources before an AI pipeline is initiated. By confirming that all preconditions are met, it prevents avoidable execution failures and supports robust, error-free pipeline runs.
Purpose
The key purpose of ai_pre_execution_validator.py is to perform comprehensive validation checks, such as:
- Data Quality Assessment: Ensures input data conforms to expected formats, schemas, and statistical properties.
- Configuration Validation: Verifies the correctness of system configurations, parameters, and dependencies.
- System Readiness: Confirms resource availability (e.g., memory, CPU, GPU) for the pipeline run.
- Error Prevention: Detects potential issues early, preventing runtime failures.
Key Features
- Comprehensive Data Validation: Supports schema validation, range checks, and null value analysis.
- Dependency Checker: Ensures required third-party libraries and APIs are accessible.
- Dynamic Configuration Validation: Cross-checks configurations against predefined rules.
- Detailed Reports: Generates descriptive validation reports for debugging and traceability.
- Flexible Plugin System: Enables developers to add custom validation logic using plugins (see the sketch after this list).
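The plugin system is not part of the code listing below, so the following is only a minimal sketch of how custom validation logic might be registered; the PluginRegistry name and its methods are assumptions for illustration, not the framework's confirmed API:

# Hypothetical plugin hook: callables that accept the pipeline config and
# return True/False are registered and run alongside the built-in checks.
class PluginRegistry:
    def __init__(self):
        self._plugins = []

    def register_plugin(self, check):
        self._plugins.append(check)

    def run_plugins(self, config):
        return all(check(config) for check in self._plugins)

registry = PluginRegistry()
registry.register_plugin(lambda cfg: cfg.get("threads", 1) <= 64)  # custom rule
print(registry.run_plugins({"pipeline_name": "example", "threads": 4}))  # True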
Logic and Implementation
The module takes a modular approach: individual validation checks are implemented as methods, organized into stages, and a central class, PreExecutionValidator, orchestrates the checks.
import json
import os
import psutil


class PreExecutionValidator:
    """
    Validates data, configurations, and system readiness before running the pipeline.
    """

    # Maps the type names used in JSON schema files to Python types,
    # since isinstance() cannot check against the string "integer" directly.
    TYPE_MAP = {
        "string": str,
        "integer": int,
        "number": (int, float),
        "boolean": bool,
        "array": list,
        "object": dict,
    }

    def validate_data(self, data_file, schema_file):
        """
        Validate input data against a schema file.

        Args:
            data_file (str): Path to the input data file.
            schema_file (str): Path to the schema file.

        Returns:
            bool: True if validation passes, False otherwise.
        """
        if not (os.path.exists(data_file) and os.path.exists(schema_file)):
            print("Error: Data or schema file not found.")
            return False
        with open(data_file, 'r') as data_fh, open(schema_file, 'r') as schema_fh:
            data = json.load(data_fh)
            schema = json.load(schema_fh)
        for field, properties in schema.items():
            if field not in data:
                print(f"Error: Missing field {field} in data.")
                return False
            # Translate the schema's type name into a Python type before checking.
            expected_type = self.TYPE_MAP.get(properties["type"])
            if expected_type is not None and not isinstance(data[field], expected_type):
                print(f"Error: Field {field} has incorrect type.")
                return False
        return True
    def check_system_resources(self, min_memory_mb):
        """
        Checks whether the system has enough free memory to proceed.

        Args:
            min_memory_mb (int): Minimum memory required in MB.

        Returns:
            bool: True if sufficient memory is available, False otherwise.
        """
        available_memory = psutil.virtual_memory().available / (1024 * 1024)
        if available_memory < min_memory_mb:
            print(f"Error: Insufficient memory. Required: {min_memory_mb} MB, Available: {available_memory:.1f} MB.")
            return False
        return True
    def validate_config(self, config):
        """
        Validates that the pipeline configuration meets minimum criteria.

        Args:
            config (dict): Pipeline configuration settings.

        Returns:
            bool: True if all validations pass.
        """
        if "pipeline_name" not in config:
            print("Error: 'pipeline_name' is required in configuration.")
            return False
        if "threads" in config and config["threads"] <= 0:
            print("Error: 'threads' must be greater than 0.")
            return False
        return True
if __name__ == "__main__":
    validator = PreExecutionValidator()

    # Validate input data
    is_data_valid = validator.validate_data("input_data.json", "data_schema.json")

    # Check system resources
    is_memory_sufficient = validator.check_system_resources(min_memory_mb=1024)

    # Validate configuration
    configuration = {
        "pipeline_name": "example_pipeline",
        "threads": 4,
    }
    is_config_valid = validator.validate_config(configuration)

    if is_data_valid and is_memory_sufficient and is_config_valid:
        print("Validation successful. Ready to execute pipeline.")
    else:
        print("Validation failed. Fix errors before execution.")
Dependencies
The module depends on the following libraries:
- json: For parsing and validating JSON data and schema files.
- psutil: To retrieve system memory details and validate resource availability.
- os: For file and path management when working with input/output files.
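Note that json and os are part of the Python standard library; only psutil needs to be installed separately:

pip install psutil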
Usage
The script can be executed directly for pre-execution validation. The flags in the examples below assume a command-line wrapper around PreExecutionValidator, such as the argparse sketch that follows the examples:
# Validate input data against schema
python ai_pre_execution_validator.py --data input_data.json --schema data_schema.json
# Check system resource availability (e.g., minimum 2GB memory)
python ai_pre_execution_validator.py --check-memory 2048
# Validate configurations
python ai_pre_execution_validator.py --config pipeline_config.json
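The __main__ block shown earlier uses hardcoded paths, so a small argument parser is needed to support these flags. Here is a minimal sketch, assuming only the flag names from the examples above; everything else (the main function, exit behavior) is an illustrative assumption rather than the framework's confirmed CLI:

import argparse
import json

# Hypothetical CLI wrapper for PreExecutionValidator. Flag names follow
# the usage examples above; the rest is an illustrative assumption.
def main():
    parser = argparse.ArgumentParser(description="Pre-execution validation for AI pipelines.")
    parser.add_argument("--data", help="Path to the input data file.")
    parser.add_argument("--schema", help="Path to the schema file.")
    parser.add_argument("--check-memory", type=int, dest="min_memory_mb",
                        help="Minimum free memory required, in MB.")
    parser.add_argument("--config", help="Path to a JSON pipeline configuration file.")
    args = parser.parse_args()

    validator = PreExecutionValidator()
    ok = True
    if args.data and args.schema:
        ok = validator.validate_data(args.data, args.schema) and ok
    if args.min_memory_mb:
        ok = validator.check_system_resources(args.min_memory_mb) and ok
    if args.config:
        with open(args.config, "r") as fh:
            ok = validator.validate_config(json.load(fh)) and ok
    print("Validation successful." if ok else "Validation failed.")

if __name__ == "__main__":
    main()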
Integration with the System
This module integrates with various components of the G.O.D Framework, including:
- ai_pipeline_orchestrator.py: Triggers this module to validate readiness before executing pipeline stages (see the sketch after this list).
- ai_error_tracker.py: Logs errors and validation issues detected during execution.
- ai_pipeline_audit_logger.py: Records validation successes and failures in audit logs.
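As an illustration of the orchestrator hand-off: the actual interface of ai_pipeline_orchestrator.py is not shown here, so the call pattern below is a hedged sketch under that assumption, not the framework's confirmed code.

# Hypothetical pre-flight gate inside a pipeline orchestrator.
validator = PreExecutionValidator()

def run_pipeline(config):
    # Abort early if configuration or system-readiness checks fail.
    if not (validator.validate_config(config)
            and validator.check_system_resources(min_memory_mb=1024)):
        raise RuntimeError("Pre-execution validation failed; aborting pipeline.")
    # ... execute pipeline stages here ...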
Future Enhancements
- Introduce machine learning-based anomaly detection for data validation.
- Enhance plugin architecture for better extensibility.
- Integrate with cloud systems to validate remote server resources.
- Develop a user interface to display validation results in real time.