Introduction
The ai_lambda_model_inference.py script is a component of the G.O.D Framework responsible for deploying machine learning model inference capabilities in a serverless environment. The module primarily targets **AWS Lambda**, enabling scalable, pay-as-you-go inference for real-time and batch predictions. Using this script, developers can invoke pre-trained models to generate predictions without managing the underlying server infrastructure.
Purpose
- Provide serverless deployment and real-time inference for trained ML models.
- Eliminate the need for manual server management and reduce operational costs.
- Offer integration with APIs or automated pipelines via AWS Lambda.
- Streamline input preprocessing and output postprocessing workflows during inference.
Key Features
- Serverless Model Hosting: Upload and deploy ML models to AWS Lambda so inference workloads can start near-instantly.
- Real-time Prediction: Handle real-time prediction requests from connected APIs or webhooks.
- Cost Efficiency: Automatically scale to handle requests without fixed hardware costs.
- Data Transformation: Perform input preprocessing and output formatting inside the Lambda function (see the sketch after this list).
- Cloud Compatibility: Full compatibility with AWS services (S3, Lambda layers, API Gateway).
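To make the Data Transformation feature concrete, here is a minimal sketch of how the preprocessing and postprocessing steps might be factored into helpers. The helper names are illustrative, not part of the script itself, which inlines this logic in its handler.

import numpy as np

# Illustrative helpers (hypothetical names); the actual script performs
# these steps inline within lambda_handler.
def preprocess(payload):
    """Validate the request payload and reshape features for the model."""
    features = payload.get("features")
    if features is None:
        raise ValueError("Request body must contain a 'features' list.")
    return np.array(features, dtype=float).reshape(1, -1)

def postprocess(predictions):
    """Convert model output into a JSON-serializable payload."""
    return {"predictions": predictions.tolist()}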
Logic and Implementation
The core logic of this script involves defining an AWS Lambda function that loads a pre-trained ML model and processes incoming HTTP or event-based requests. The handler parses the input payload, runs inference with the model, and formats the response for downstream systems.
import boto3
import pickle
import json
import logging

import numpy as np

# Initialize logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# S3 client created once at module load, i.e. during the Lambda cold start
s3 = boto3.client('s3')


def load_model(bucket_name, model_key):
    """
    Load a serialized model from S3.

    :param bucket_name: S3 bucket name.
    :param model_key: Path to the model inside the S3 bucket.
    :return: Loaded machine learning model.
    """
    logger.info(f"Downloading model from S3 bucket: {bucket_name}, key: {model_key}")
    response = s3.get_object(Bucket=bucket_name, Key=model_key)
    model = pickle.loads(response['Body'].read())
    logger.info("Model loaded successfully.")
    return model


def lambda_handler(event, context):
    """
    AWS Lambda entry point for predictions.

    Reads input data, runs model inference, and returns results.
    """
    bucket_name = "my-ml-models"
    model_key = "iris_model.pkl"

    try:
        # Download the model on each invocation (see Future Enhancements
        # for caching across warm invocations)
        model = load_model(bucket_name, model_key)

        # Parse input data
        input_data = json.loads(event['body'])
        features = np.array(input_data['features']).reshape(1, -1)

        # Generate predictions
        predictions = model.predict(features)
        logger.info(f"Prediction successful: {predictions}")

        # Build successful response
        return {
            "statusCode": 200,
            "body": json.dumps({"predictions": predictions.tolist()})
        }
    except Exception as e:
        logger.error(f"Inference failed: {e}")
        return {
            "statusCode": 500,
            "body": json.dumps({"error": "Inference failed. Check input data format."})
        }


# (Optional) Local testing
if __name__ == "__main__":
    event = {
        "body": json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})
    }
    print(lambda_handler(event, None))
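Beyond the __main__ block, a unit test can exercise the handler without touching S3 by stubbing load_model. The sketch below assumes the script is importable as ai_lambda_model_inference; the test name and setup are illustrative.

import json
from unittest.mock import MagicMock, patch

import numpy as np

import ai_lambda_model_inference as infer

def test_lambda_handler_returns_predictions():
    # Stub the model so the test never downloads from S3
    fake_model = MagicMock()
    fake_model.predict.return_value = np.array([0])

    with patch.object(infer, "load_model", return_value=fake_model):
        event = {"body": json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})}
        response = infer.lambda_handler(event, None)

    assert response["statusCode"] == 200
    assert json.loads(response["body"])["predictions"] == [0]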
Dependencies
- boto3: AWS SDK for Python, used here to fetch serialized models from S3 and to interact with other AWS services.
- pickle (standard library): Serializes and deserializes machine learning models; only load trusted model artifacts.
- numpy: Handles input features and numerical data transformation.
- json (standard library): Parses event data and serializes responses.
Usage
This script is deployed directly as an AWS Lambda function using CloudFormation or the AWS CLI. Developers can configure Lambda triggers (e.g., API Gateway or an SQS queue) to invoke the function for predictions. Here is an example CloudFormation template snippet:
Resources:
  PredictLambda:
    Type: AWS::Lambda::Function
    Properties:
      Handler: ai_lambda_model_inference.lambda_handler
      Runtime: python3.x  # replace with a supported runtime, e.g. python3.12
      Code:
        S3Bucket: my-deployment-bucket
        S3Key: deployment-package.zip
      Role: arn:aws:iam::123456789012:role/lambda-execution-role
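Once deployed, the function can also be invoked programmatically with boto3's Lambda client. In this sketch the function name and region are assumptions; adjust them to your deployment (the CloudFormation logical name only equals the function name if FunctionName is set explicitly in the template).

import json
import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")  # region is an assumption

# Payload mirrors the event format that lambda_handler expects
event = {"body": json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})}

response = lambda_client.invoke(
    FunctionName="PredictLambda",  # assumed function name; see note above
    Payload=json.dumps(event),
)
print(json.loads(response["Payload"].read()))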
System Integration
- REST APIs: Invoke the Lambda function from API Gateway for real-time predictions (a client-side sketch follows this list).
- Serverless Pipelines: Chain Lambda in data processing workflows (e.g., SQS -> Lambda -> Model).
- IoT Devices: Use Lambda for lightweight, scalable predictions over IoT event streams.
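As a client-side illustration of the REST API integration, the sketch below posts features to a hypothetical API Gateway endpoint configured with Lambda proxy integration; the URL is a placeholder, and the request body becomes event['body'] inside the handler.

import requests  # third-party HTTP client

# Placeholder endpoint; replace with your API Gateway invoke URL
API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict"

resp = requests.post(API_URL, json={"features": [5.1, 3.5, 1.4, 0.2]})
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [0]}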
Future Enhancements
- Implement multi-model support to handle multiple prediction tasks in one Lambda function.
- Improve performance on deep learning models by offloading GPU-heavy inference to a GPU-capable service, since AWS Lambda does not provide GPUs.
- Implement caching mechanisms for downloaded models to reduce S3 retrievals (a minimal sketch follows this list).
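As one possible shape for the caching enhancement, the sketch below memoizes models at module level so that warm Lambda invocations reuse the object downloaded during the cold start. This is an assumption about the intended design, not existing framework code; it reuses load_model from the script above.

# Module-level cache: persists across warm invocations of the same container
_MODEL_CACHE = {}

def get_model(bucket_name, model_key):
    """Return a cached model, hitting S3 only on the first call per container."""
    cache_key = (bucket_name, model_key)
    if cache_key not in _MODEL_CACHE:
        _MODEL_CACHE[cache_key] = load_model(bucket_name, model_key)
    return _MODEL_CACHE[cache_key]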