ai_data_registry
Differences
This shows you the differences between two versions of the page.
| ai_data_registry [2025/05/25 19:47] – [Basic Use Case] eagleeyenebula | ai_data_registry [2025/05/25 19:55] (current) – [Conclusion] eagleeyenebula | ||
|---|---|---|---|
| Line 139: | Line 139: | ||
| You can add more intricate metadata, such as tags and related datasets, to help track high-level attributes of your datasets. | You can add more intricate metadata, such as tags and related datasets, to help track high-level attributes of your datasets. | ||
| - | < | + | < |
| catalog.add_entry(" | catalog.add_entry(" | ||
| " | " | ||
| Line 151: | Line 151: | ||
| === 2. Custom Storage Paths === | === 2. Custom Storage Paths === | ||
| - | By default, the **DataCatalog** saves the registry in `data_registry.json`. You can configure it to use a different file path when needed. | + | By default, the **DataCatalog** saves the registry in **data_registry.json**. You can configure it to use a different file path when needed. |
| - | <code python> | ||
| - | # Initialize the catalog with a custom file path | ||
| - | catalog = DataCatalog(registry_path="/ | ||
| - | # Add an entry to the catalog | + | # **Initialize the catalog with a custom file path** |
| + | < | ||
| + | catalog = DataCatalog(registry_path="/ | ||
| + | </ | ||
| + | # **Add an entry to the catalog** | ||
| + | < | ||
| catalog.add_entry(" | catalog.add_entry(" | ||
| " | " | ||
| " | " | ||
| }) | }) | ||
| - | + | </ | |
| - | # Output the catalog contents | + | # **Output the catalog contents** |
| + | < | ||
| print(catalog.load_catalog()) | print(catalog.load_catalog()) | ||
| </ | </ | ||
| Line 174: | Line 177: | ||
| Combine **DataCatalog** with versioning capabilities. This allows you to track the progress of specific dataset versions directly in your pipeline. | Combine **DataCatalog** with versioning capabilities. This allows you to track the progress of specific dataset versions directly in your pipeline. | ||
| - | < | + | < |
| def versioned_entry(catalog, | def versioned_entry(catalog, | ||
| """ | """ | ||
| Line 190: | Line 193: | ||
| } | } | ||
| catalog.add_entry(dataset_name, | catalog.add_entry(dataset_name, | ||
| - | + | </ | |
| - | # Initialize catalog | + | # **Initialize catalog** |
| + | < | ||
| catalog = DataCatalog() | catalog = DataCatalog() | ||
| - | + | </ | |
| - | # Add a versioned entry | + | # **Add a versioned entry** |
| + | < | ||
| versioned_entry( | versioned_entry( | ||
| catalog, | catalog, | ||
| Line 202: | Line 207: | ||
| size=" | size=" | ||
| ) | ) | ||
| - | + | </ | |
| - | # Output the catalog | + | # **Output the catalog** |
| + | < | ||
| print(catalog.load_catalog()) | print(catalog.load_catalog()) | ||
| </ | </ | ||
| Line 209: | Line 215: | ||
| **Expected Output:** | **Expected Output:** | ||
| - | < | + | < |
| { | { | ||
| " | " | ||
| Line 228: | Line 234: | ||
| ==== Best Practices ==== | ==== Best Practices ==== | ||
| - | To get the most out of **DataCatalog**, | + | To get the most out of **DataCatalog**, |
| - | * **Use Metadata Consistently**: | + | * **Use metadata consistently** Add fields like **source**, **size**, and **tags** to all datasets |
| - | Ensure that metadata | + | * **Secure |
| - | | + | * **Version |
| - | * **Secure | + | * **Automate |
| - | | + | |
| - | + | ||
| - | * **Version-Control Your Datasets**: | + | |
| - | Use versioning (e.g., | + | |
| - | + | ||
| - | * **Automate | + | |
| - | Integrate registry updates using pipeline automation | + | |
| - | + | ||
| - | --- | + | |
| ==== Extensibility ==== | ==== Extensibility ==== | ||
| Line 261: | Line 258: | ||
| ==== Conclusion ==== | ==== Conclusion ==== | ||
| - | The **DataCatalog** module is a scalable and flexible solution for managing metadata registries. With support for versioning, extensibility, | + | The **DataCatalog** module is a scalable and flexible solution for managing metadata registries. With support for versioning, extensibility, |
| - | + | ||
| - | Whether you’re working on small-scale or enterprise-level pipelines, the **DataCatalog** provides all the tools you need for clean and structured data management. | + | |
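
The custom storage path and versioning patterns in the diff above are truncated in this view, so the following is only a minimal sketch of how they might fit together. The names `DataCatalog`, `registry_path`, `add_entry`, and `load_catalog` come from the page; everything else is an assumption made for illustration, including the class internals, the `add_versioned_entry` helper, the key format, and the sample dataset names and metadata.

```python
import json
import os
import tempfile


class DataCatalog:
    """Hypothetical stand-in for the page's DataCatalog; internals are assumed."""

    def __init__(self, registry_path="data_registry.json"):
        # The default file name matches the one the page mentions.
        self.registry_path = registry_path

    def load_catalog(self):
        # Return an empty registry if nothing has been saved yet.
        if not os.path.exists(self.registry_path):
            return {}
        with open(self.registry_path, "r", encoding="utf-8") as fh:
            return json.load(fh)

    def add_entry(self, name, metadata):
        # Read, update, and rewrite the whole registry file.
        registry = self.load_catalog()
        registry[name] = metadata
        with open(self.registry_path, "w", encoding="utf-8") as fh:
            json.dump(registry, fh, indent=2)


def add_versioned_entry(catalog, dataset_name, version, metadata):
    """Hypothetical helper: store each version under its own key."""
    entry = dict(metadata, version=version)
    catalog.add_entry(f"{dataset_name}_v{version}", entry)


# Usage: write to a custom path inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    custom_path = os.path.join(tmp, "custom_registry.json")
    catalog = DataCatalog(registry_path=custom_path)
    add_versioned_entry(catalog, "sales_data", "1.0", {"source": "example", "tags": ["demo"]})
    add_versioned_entry(catalog, "sales_data", "1.1", {"source": "example", "tags": ["demo"]})
    result = catalog.load_catalog()

print(sorted(result))
```

Keying each version separately keeps earlier versions in the registry instead of overwriting them. The page's own `versioned_entry` function may structure its entries differently.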
ai_data_registry.1748202433.txt.gz · Last modified: 2025/05/25 19:47 by eagleeyenebula
