ai_data_detection
Differences
This shows you the differences between two versions of the page.
| ai_data_detection [2025/05/25 15:06] – [Key Features] eagleeyenebula | ai_data_detection [2025/05/25 15:09] (current) – [Best Practices] eagleeyenebula | ||
|---|---|---|---|
| Line 233: | Line 233: | ||
| ===== Best Practices ===== | ===== Best Practices ===== | ||
| 1. **Use Incremental Checks:** Perform quality checks at different stages of the pipeline (e.g., after loading raw data and after preprocessing steps). | 1. **Use Incremental Checks:** Perform quality checks at different stages of the pipeline (e.g., after loading raw data and after preprocessing steps). | ||
| + | |||
| 2. **Automate Logging:** Set up centralized logging for tracking data issues across multiple datasets. | 2. **Automate Logging:** Set up centralized logging for tracking data issues across multiple datasets. | ||
| + | |||
| 3. **Adapt Custom Methods:** Extend the module for domain-specific checks, such as outlier detection, range checks, or invalid category detection. | 3. **Adapt Custom Methods:** Extend the module for domain-specific checks, such as outlier detection, range checks, or invalid category detection. | ||
| + | |||
| 4. **Handle Issues Early:** Address identified data issues before training machine learning models. | 4. **Handle Issues Early:** Address identified data issues before training machine learning models. | ||
| Line 246: | Line 249: | ||
| **Example: Adding Invalid Category Detection** | **Example: Adding Invalid Category Detection** | ||
| - | ```python | + | < |
| + | python | ||
| def has_invalid_categories(data, | def has_invalid_categories(data, | ||
| for col in data.select_dtypes(include=[' | for col in data.select_dtypes(include=[' | ||
| Line 254: | Line 258: | ||
| return True | return True | ||
| return False | return False | ||
| - | ``` | + | </ |
| ---- | ---- | ||
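
The invalid-category example in the diff above is cut off, so its full signature and body are not recoverable from this page. As a rough illustration of Best Practices 1 to 3 (incremental checks, logging, and a custom category check), here is a minimal sketch. It assumes the data is a pandas DataFrame; the names `find_invalid_categories`, `check_stage`, `allowed`, and the `data_quality` logger are hypothetical and are not taken from the module.

```python
import logging

import pandas as pd

# Hypothetical shared logger name; not part of the documented module.
logger = logging.getLogger("data_quality")


def find_invalid_categories(data: pd.DataFrame, allowed: dict) -> dict:
    """Map each checked column to the values found outside its allowed set.

    `allowed` maps a column name to an iterable of permitted values.
    Missing values (NaN/None) are ignored; columns absent from `data` are skipped.
    """
    problems = {}
    for col, permitted in allowed.items():
        if col not in data.columns:
            continue
        observed = set(data[col].dropna().unique())
        unexpected = observed - set(permitted)
        if unexpected:
            # Convert to str so mixed-type values can still be sorted.
            problems[col] = sorted(map(str, unexpected))
    return problems


def check_stage(stage: str, data: pd.DataFrame, allowed: dict) -> bool:
    """Run the category check for one pipeline stage and log the findings.

    Returns True when the stage is clean, False otherwise.
    """
    problems = find_invalid_categories(data, allowed)
    for col, values in problems.items():
        logger.warning("[%s] column %r has unexpected categories: %s", stage, col, values)
    return not problems


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    raw = pd.DataFrame({"color": ["red", "blue", "purple", None]})
    allowed = {"color": {"red", "blue"}}

    # Check once on the raw data, then again after a preprocessing step.
    check_stage("raw", raw, allowed)
    cleaned = raw[raw["color"].isin(allowed["color"])]
    check_stage("preprocessed", cleaned, allowed)
```

Returning the offending values rather than a bare boolean is a design choice of this sketch: it makes the log entries actionable. Wrapping the result in `bool(...)` gives the True/False behaviour that the example in the diff returns.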
ai_data_detection.1748185583.txt.gz · Last modified: 2025/05/25 15:06 by eagleeyenebula
