test_data_ingestion
Differences
This shows you the differences between two versions of the page.
| test_data_ingestion [2025/04/25 23:40] – external edit 127.0.0.1 | test_data_ingestion [2025/06/06 15:16] (current) – [Test Data Ingestion] eagleeyenebula | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Test Data Ingestion ====== | ====== Test Data Ingestion ====== | ||
| - | * **[[https:// | + | **[[https:// |
| - | The **Test Data Ingestion** module is designed to validate the integrity and reliability of the data ingestion pipeline. It ensures | + | The **Test Data Ingestion** module is designed to validate the integrity and reliability of the data ingestion pipeline |
| + | {{youtube> | ||
| + | |||
| + | ------------------------------------------------------------- | ||
| + | |||
| + | Beyond simply verifying data correctness, | ||
| ===== Overview ===== | ===== Overview ===== | ||
| Line 25: | Line 30: | ||
| 1. **Test Initialization**: | 1. **Test Initialization**: | ||
| - | | + | * Import the required |
| 2. **Test Case Creation**: | 2. **Test Case Creation**: | ||
| - | | + | * Define a **unittest.TestCase** class to encapsulate the test cases for data ingestion. |
| 3. **Data Loading Validation**: | 3. **Data Loading Validation**: | ||
| - | Test the `load_data()` method of the `DataIngestion` class to ensure proper functionality. | + | * Test the **load_data()** method of the **DataIngestion** class to ensure proper functionality. |
| 4. **Assertions**: | 4. **Assertions**: | ||
| - | Check the dataset for expected properties, such as row count, column names, and data consistency. | + | * Check the dataset for expected properties, such as row count, column names, and data consistency. |
| ===== Class and Code Skeleton ===== | ===== Class and Code Skeleton ===== | ||
| - | The `TestDataIngestion` class is structured to validate the loading of data files and ensure the module behaves as expected. | + | The **TestDataIngestion** class is structured to validate the loading of data files and ensure the module behaves as expected. |
| < | < | ||
| - | ```python | + | python |
| import unittest | import unittest | ||
| from ai_data_ingestion import DataIngestion | from ai_data_ingestion import DataIngestion | ||
| Line 56: | Line 61: | ||
| data = DataIngestion.load_data(" | data = DataIngestion.load_data(" | ||
| self.assertEqual(len(data), | self.assertEqual(len(data), | ||
| - | ``` | + | |
| </ | </ | ||
| === Test Method Breakdown === | === Test Method Breakdown === | ||
| - | Below is a breakdown of the `test_data_loading` method: | + | Below is a breakdown of the **test_data_loading** method: |
| - | * **Loading the Dataset**: | + | **Loading the Dataset**: |
| - | The `load_data` method loads the CSV file and returns the data as a structured object (e.g., a Pandas DataFrame or similar format). | + | |
| - | * **Validation**: | + | **Validation**: |
| - | The test validates that the dataset contains exactly 1,000 rows, ensuring no data loss during ingestion. | + | |
| === Running the Test Suite === | === Running the Test Suite === | ||
| Line 73: | Line 78: | ||
| To execute the test suite, use the `unittest` CLI command: | To execute the test suite, use the `unittest` CLI command: | ||
| < | < | ||
| - | ```bash | + | bash |
| python -m unittest test_data_ingestion.py | python -m unittest test_data_ingestion.py | ||
| - | ``` | + | |
| </ | </ | ||
| **Expected Output**: | **Expected Output**: | ||
| < | < | ||
| - | ``` | + | |
| - | `. ---------------------------------------------------------------------- Ran 1 test in 0.002s OK ` | + | . ---------------------------------------------------------------------- Ran 1 test in 0.002s OK |
| - | ``` | + | |
| </ | </ | ||
| Line 94: | Line 99: | ||
| < | < | ||
| - | ```python | + | python |
| def test_column_validation(self): | def test_column_validation(self): | ||
| """ | """ | ||
| Line 103: | Line 108: | ||
| for column in required_columns: | for column in required_columns: | ||
| self.assertIn(column, | self.assertIn(column, | ||
| - | ``` | + | |
| </ | </ | ||
| Line 111: | Line 116: | ||
| < | < | ||
| - | ```python | + | python |
| def test_empty_file(self): | def test_empty_file(self): | ||
| """ | """ | ||
| Line 118: | Line 123: | ||
| with self.assertRaises(ValueError): | with self.assertRaises(ValueError): | ||
| DataIngestion.load_data(" | DataIngestion.load_data(" | ||
| - | ``` | + | |
| </ | </ | ||
| Line 126: | Line 131: | ||
| < | < | ||
| - | ```python | + | python |
| def test_invalid_file_path(self): | def test_invalid_file_path(self): | ||
| """ | """ | ||
| Line 133: | Line 138: | ||
| with self.assertRaises(FileNotFoundError): | with self.assertRaises(FileNotFoundError): | ||
| DataIngestion.load_data(" | DataIngestion.load_data(" | ||
| - | ``` | + | |
| </ | </ | ||
| Line 141: | Line 146: | ||
| < | < | ||
| - | ```python | + | python |
| def test_data_integrity(self): | def test_data_integrity(self): | ||
| """ | """ | ||
| Line 149: | Line 154: | ||
| self.assertEqual(data.iloc[0][" | self.assertEqual(data.iloc[0][" | ||
| self.assertAlmostEqual(data.iloc[0][" | self.assertAlmostEqual(data.iloc[0][" | ||
| - | ``` | + | |
| </ | </ | ||
| Line 157: | Line 162: | ||
| < | < | ||
| - | ```python | + | python |
| def test_large_dataset(self): | def test_large_dataset(self): | ||
| """ | """ | ||
| Line 168: | Line 173: | ||
| self.assertEqual(len(data), | self.assertEqual(len(data), | ||
| self.assertLess(end_time - start_time, 10) # Ingestion should complete within 10 seconds | self.assertLess(end_time - start_time, 10) # Ingestion should complete within 10 seconds | ||
| - | ``` | + | |
| </ | </ | ||
| Line 176: | Line 181: | ||
| < | < | ||
| - | ```yaml | + | yaml |
| name: Test Data Ingestion | name: Test Data Ingestion | ||
| Line 199: | Line 204: | ||
| - name: Run Unit Tests | - name: Run Unit Tests | ||
| run: python -m unittest discover tests | run: python -m unittest discover tests | ||
| - | ``` | + | |
| </ | </ | ||
| Line 211: | Line 216: | ||
| 3. **Continuous Testing**: | 3. **Continuous Testing**: | ||
| - | - Integrate the test module into automated CI/CD pipelines to catch regression errors. | + | - Integrate the test module into automated |
| 4. **Extend Framework**: | 4. **Extend Framework**: | ||
| - | - Add new tests as additional ingestion features or file formats (e.g., JSON, Parquet) are supported. | + | - Add new tests as additional ingestion features or file formats (e.g., |
| ===== Advanced Functionalities ===== | ===== Advanced Functionalities ===== | ||
| Line 227: | Line 232: | ||
| ===== Conclusion ===== | ===== Conclusion ===== | ||
| - | The **Test Data Ingestion** | + | The **Test Data Ingestion |
| + | |||
| + | Incorporating this module | ||
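
The test pattern described on this page (row-count check, column validation, empty-file and invalid-path handling) can be sketched as one self-contained suite. This is only an illustration under stated assumptions: the `DataIngestion` class below is a hypothetical stand-in built on the standard `csv` module, not the real `ai_data_ingestion` implementation, and the column names `feature1` and `target`, the file names, and the `_write_csv` helper are invented for the example. The real `load_data()` may return a different structure (the page mentions a Pandas DataFrame as one possibility) and may raise different exceptions.

```python
import csv
import os
import tempfile
import unittest


class DataIngestion:
    """Hypothetical stand-in for the page's DataIngestion class (assumption)."""

    @staticmethod
    def load_data(file_path):
        # open() raises FileNotFoundError when the path does not exist.
        with open(file_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Assumed behavior: a file without data rows is rejected.
        if not rows:
            raise ValueError(f"No data rows found in {file_path}")
        return rows


class TestDataIngestion(unittest.TestCase):
    def setUp(self):
        # Each test writes its fixtures into a throwaway directory.
        self.tmp_dir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp_dir.cleanup)

    def _write_csv(self, name, header, rows):
        # Hypothetical helper: writes a small CSV fixture and returns its path.
        path = os.path.join(self.tmp_dir.name, name)
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        return path

    def test_data_loading(self):
        # 1,000 rows written, so 1,000 rows must come back.
        rows = [[i, i % 2] for i in range(1000)]
        path = self._write_csv("sample.csv", ["feature1", "target"], rows)
        data = DataIngestion.load_data(path)
        self.assertEqual(len(data), 1000)

    def test_column_validation(self):
        path = self._write_csv("sample.csv", ["feature1", "target"], [[1, 0]])
        data = DataIngestion.load_data(path)
        for column in ("feature1", "target"):
            self.assertIn(column, data[0])

    def test_empty_file(self):
        # Header only, no data rows.
        path = self._write_csv("empty.csv", ["feature1", "target"], [])
        with self.assertRaises(ValueError):
            DataIngestion.load_data(path)

    def test_invalid_file_path(self):
        missing = os.path.join(self.tmp_dir.name, "missing.csv")
        with self.assertRaises(FileNotFoundError):
            DataIngestion.load_data(missing)
```

Saved as a test module, the sketch runs with the same `python -m unittest` command shown on the page. Generating fixtures in a temporary directory keeps the tests independent of any checked-in data files, which is a design choice of this sketch and not something the page prescribes.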
test_data_ingestion.1745624455.txt.gz · Last modified: 2025/04/25 23:40 by 127.0.0.1
