ai_disaster_recovery
Differences
This shows you the differences between two versions of the page.
| ai_disaster_recovery [2025/05/26 14:38] – [Key Features] eagleeyenebula | ai_disaster_recovery [2025/05/26 14:44] (current) – [Best Practices] eagleeyenebula | ||
|---|---|---|---|
| Line 46: | Line 46: | ||
| ==== Core Components ==== | ==== Core Components ==== | ||
| - | 1. **`save_checkpoint(step_name, | + | 1. **save_checkpoint(step_name, |
| - | - Saves the current state of the pipeline for the specified step. | + | * Saves the current state of the pipeline for the specified step. |
| - | - Logs the operation to ensure visibility in execution traces. | + | * Logs the operation to ensure visibility in execution traces. |
| - | 2. **`rollback_to_checkpoint(step_name)`**: | + | 2. **rollback_to_checkpoint(step_name)**: |
| - | - Retrieves the saved state for the specified step name. | + | * Retrieves the saved state for the specified step name. |
| - | - Allows the pipeline to resume execution from the last known good state. | + | * Allows the pipeline to resume execution from the last known good state. |
| - | 3. **Checkpoints Store (`self.checkpoints`)**: | + | 3. **Checkpoints Store (self.checkpoints)**: |
| - | - Maintains the in-memory storage for all pipeline checkpoints. | + | * Maintains the in-memory storage for all pipeline checkpoints. |
| - | - Key: `step_name` (uniquely identifies the pipeline step). | + | * Key: **step_name** (uniquely identifies the pipeline step). |
| - | - Value: Serialized state (`data`) to restore the pipeline. | + | * Value: Serialized state (**data**) to restore the pipeline. |
| ==== Class Definition ==== | ==== Class Definition ==== | ||
| - | ```python | + | < |
| + | python | ||
| import logging | import logging | ||
| Line 89: | Line 90: | ||
| logging.info(f" | logging.info(f" | ||
| return self.checkpoints.get(step_name, | return self.checkpoints.get(step_name, | ||
| - | ``` | + | </ |
| - | + | ||
| - | --- | + | |
| ===== Usage Examples ===== | ===== Usage Examples ===== | ||
| Line 101: | Line 100: | ||
| The following example demonstrates how to save pipeline checkpoints and perform a rollback: | The following example demonstrates how to save pipeline checkpoints and perform a rollback: | ||
| - | ```python | + | < |
| + | python | ||
| from ai_disaster_recovery import DisasterRecovery | from ai_disaster_recovery import DisasterRecovery | ||
| - | + | </ | |
| - | # Initialize the recovery manager | + | # **Initialize the recovery manager** |
| + | < | ||
| recovery_manager = DisasterRecovery() | recovery_manager = DisasterRecovery() | ||
| - | + | </ | |
| - | # Save checkpoints for pipeline steps | + | # **Save checkpoints for pipeline steps** |
| + | < | ||
| recovery_manager.save_checkpoint(" | recovery_manager.save_checkpoint(" | ||
| recovery_manager.save_checkpoint(" | recovery_manager.save_checkpoint(" | ||
| - | + | </ | |
| - | # Rollback to a checkpoint | + | # **Rollback to a checkpoint** |
| + | < | ||
| state_step_2 = recovery_manager.rollback_to_checkpoint(" | state_step_2 = recovery_manager.rollback_to_checkpoint(" | ||
| print(f" | print(f" | ||
| - | + | </ | |
| - | # Rollback to an undefined checkpoint | + | # **Rollback to an undefined checkpoint** |
| + | < | ||
| state_invalid = recovery_manager.rollback_to_checkpoint(" | state_invalid = recovery_manager.rollback_to_checkpoint(" | ||
| print(f" | print(f" | ||
| - | ``` | + | </ |
| **Expected Output:** | **Expected Output:** | ||
| Line 127: | Line 131: | ||
| In scenarios requiring persistent storage of checkpoints, | In scenarios requiring persistent storage of checkpoints, | ||
| - | ```python | + | < |
| + | python | ||
| import pickle | import pickle | ||
| from ai_disaster_recovery import DisasterRecovery | from ai_disaster_recovery import DisasterRecovery | ||
| Line 152: | Line 157: | ||
| logging.warning(f" | logging.warning(f" | ||
| return None | return None | ||
| - | + | < | |
| - | # Usage | + | # **Usage** |
| + | < | ||
| persistent_recovery = PersistentDisasterRecovery() | persistent_recovery = PersistentDisasterRecovery() | ||
| - | + | </ | |
| - | # Save and rollback with disk persistence | + | # **Save and rollback with disk persistence** |
| + | < | ||
| persistent_recovery.save_checkpoint(" | persistent_recovery.save_checkpoint(" | ||
| restored_data = persistent_recovery.rollback_to_checkpoint(" | restored_data = persistent_recovery.rollback_to_checkpoint(" | ||
| print(f" | print(f" | ||
| - | ``` | + | </ |
| **Expected Output:** | **Expected Output:** | ||
| - | |||
| - | |||
| - | --- | ||
| - | |||
| ===== Use Cases ===== | ===== Use Cases ===== | ||
| 1. **AI Model Training Pipelines**: | 1. **AI Model Training Pipelines**: | ||
| - | - Save model state after every training epoch for fault recovery. | + | * Save model state after every training epoch for fault recovery. |
| 2. **Data Processing Pipelines**: | 2. **Data Processing Pipelines**: | ||
| - | - Save intermediate transformation results to prevent reprocessing from scratch in the event of failure. | + | * Save intermediate transformation results to prevent reprocessing from scratch in the event of failure. |
| 3. **Workflow Management Systems**: | 3. **Workflow Management Systems**: | ||
| - | - Use checkpoints to incrementally save the state of a multi-step workflow. | + | * Use checkpoints to incrementally save the state of a multi-step workflow. |
| 4. **Debugging Complex Errors**: | 4. **Debugging Complex Errors**: | ||
| - | - Roll back to a known-good state for error analysis and testing. | + | * Roll back to a known-good state for error analysis and testing. |
| - | + | ||
| - | --- | + | |
| ===== Best Practices ===== | ===== Best Practices ===== | ||
| 1. **Granular Checkpoints**: | 1. **Granular Checkpoints**: | ||
| - | - Save checkpoints at critical pipeline steps (e.g., post-feature extraction, model training). | + | * Save checkpoints at critical pipeline steps (e.g., post-feature extraction, model training). |
| 2. **Logging and Debugging**: | 2. **Logging and Debugging**: | ||
| - | - Leverage logging to monitor checkpoint creation and rollback actions. | + | * Leverage logging to monitor checkpoint creation and rollback actions. |
| 3. **Serialization**: | 3. **Serialization**: | ||
| - | - Use serialization (e.g., | + | * Use serialization (e.g., |
| 4. **Version Control**: | 4. **Version Control**: | ||
| - | - Employ versioning for checkpoints to avoid overwriting critical recovery points. | + | * Employ versioning for checkpoints to avoid overwriting critical recovery points. |
| 5. **Secure Recovery**: | 5. **Secure Recovery**: | ||
| - | - When using external storage (e.g., cloud), ensure encryption to secure sensitive pipeline states. | + | * When using external storage (e.g., cloud), ensure encryption to secure sensitive pipeline states. |
| - | + | ||
| - | --- | + | |
| ===== Conclusion ===== | ===== Conclusion ===== | ||
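
The "Version Control" best practice in the diff above (avoid overwriting critical recovery points) can be illustrated with a minimal sketch. It is not part of the page's `DisasterRecovery` class: the class name `VersionedCheckpointStore`, its `save`/`load` methods, and the 1-based version numbers are all assumptions made for illustration, and the store is in-memory only.

```python
import copy
from collections import defaultdict


class VersionedCheckpointStore:
    """Hypothetical helper: keeps every saved state per step instead of overwriting."""

    def __init__(self):
        # Maps step_name -> list of saved states, oldest first.
        self._versions = defaultdict(list)

    def save(self, step_name, state):
        """Append a deep copy of `state` and return its 1-based version number."""
        self._versions[step_name].append(copy.deepcopy(state))
        return len(self._versions[step_name])

    def load(self, step_name, version=None):
        """Return a copy of the requested version (latest by default), or None if absent."""
        history = self._versions.get(step_name)
        if not history:
            return None
        if version is None:
            version = len(history)
        if not 1 <= version <= len(history):
            return None
        return copy.deepcopy(history[version - 1])


# Usage: two saves for the same step are both retained.
store = VersionedCheckpointStore()
v1 = store.save("step_2", {"rows": 100})
v2 = store.save("step_2", {"rows": 250})
latest = store.load("step_2")            # most recent state
first = store.load("step_2", version=1)  # earlier recovery point is still available
missing = store.load("step_9")           # unknown step yields None
```

Deep copies are taken on both save and load so that later mutation of a pipeline object cannot silently alter a stored recovery point. This costs memory for large states, which is one reason a disk-backed variant may be preferable in practice.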
ai_disaster_recovery.1748270295.txt.gz · Last modified: 2025/05/26 14:38 by eagleeyenebula
