Serverless Generative AI on AWS: An Overview

Prasanth Mathesh
3 min read · Mar 31, 2024

Introduction

Generative AI can be adopted across a wide range of industries, unlike Blockchain, which was largely confined to the financial and supply chain domains. Last year, the hyperscalers rolled out native services for Generative AI use cases; from AWS, the headline offering was Amazon Bedrock. In this article, let us take an overview of the Gen AI-specific AWS services and see how one can build a serverless Generative AI application using them.

AWS Services

Amazon Bedrock: A fully managed service specifically designed for building and scaling generative AI applications.

With Amazon Bedrock, you can:

  • Access leading foundation models (such as Claude 3 Haiku) that serve as the basis for generative AI.
  • Customize the foundation models with your own data.
  • Benefit from enterprise-grade security, privacy and responsible AI features.
  • Claude 3 Haiku is one of the foundation models available on Amazon Bedrock. It is Anthropic’s fastest and most compact model in the Claude 3 family, making it well suited for quick, concise responses in latency-sensitive applications.
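As a concrete illustration of calling a model such as Claude 3 Haiku, the sketch below builds the Messages-format request body that Anthropic models on Bedrock expect. The model ID shown was current at the time of writing; check the Bedrock console for the latest value.

```python
import json

# Claude 3 Haiku model ID on Bedrock (verify against the Bedrock console)
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_request(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON request body for a Claude 3 model on Bedrock
    using the Anthropic Messages API format."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

# The body is then passed to the bedrock-runtime client, e.g.:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId=MODEL_ID, body=build_claude_request("Hi"))
```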

Amazon EC2 P5 instances: These GPU-based instances provide high performance for training models.

AWS Trainium and AWS Inferentia: Purpose-built accelerators for cost-efficient model training (Trainium) and inference (Inferentia).

Amazon SageMaker JumpStart: An ML hub with pre-built models and solutions to accelerate your model development.

The developer’s responsibility is to build and deploy the model. For other tasks like LLM Ops, the developer need not write custom code, as AWS provides ready-to-use services.

The key services to complete an AWS based serverless Generative AI architecture are:

  1. Serverless Compute: AWS Lambda allows us to run code in response to events without managing servers.
  2. Generative AI Models: Amazon Bedrock provides access to foundation models from Amazon and select third-party providers via a simple API. These models serve as the building blocks for generative AI applications.
  3. API Endpoint Creation: An API endpoint that accepts POST requests, so that clients can send a prompt to the Generative AI backend. Amazon API Gateway is a good choice for this.
  4. Response Generation: When you send a prompt, Amazon Bedrock generates a response. AWS Lambda, invoked via API Gateway, calls the Bedrock endpoint and returns the generated response.
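The Lambda side of steps 1–4 can be sketched as below. This is a minimal illustration, not a production handler: the event shape assumes an API Gateway proxy integration, and the client is injectable purely so the function can be exercised without AWS credentials.

```python
import json

def lambda_handler(event, context, bedrock_client=None):
    """Sketch of a Lambda handler behind API Gateway (proxy integration).

    bedrock_client is injectable for local testing; inside Lambda it
    defaults to the real bedrock-runtime client.
    """
    if bedrock_client is None:
        import boto3  # available in the Lambda runtime
        bedrock_client = boto3.client("bedrock-runtime")

    # API Gateway proxy integration delivers the POST body as a string
    payload = json.loads(event.get("body") or "{}")
    prompt = payload.get("prompt", "")

    response = bedrock_client.invoke_model(
        modelId="amazon.titan-tg1-large",
        accept="application/json",
        contentType="application/json",
        body=json.dumps({"inputText": prompt}),
    )
    result = json.loads(response["body"].read())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

In a real deployment, the model ID and error handling (throttling, input validation) would come from configuration rather than being hard-coded.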

High-Level Solution:

[Figure: Serverless LLM architecture]

In my previous posts, I have explained how to provision API Gateway, set up endpoints, etc. This solution is in line with those, except that here the target endpoint interacts with Amazon Bedrock Agents. The code snippet to invoke a Titan Text model is given below:

import json

import boto3

session = boto3.Session()
# invoke_model_with_response_stream is served by the "bedrock-runtime"
# client; the "bedrock" client is the control plane and does not expose it
bedrock_client = session.client(service_name="bedrock-runtime")

prompt = "can you get me the previous day sales"
body = json.dumps({"inputText": prompt})
modelId = "amazon.titan-tg1-large"
accept = "application/json"
contentType = "application/json"

response = bedrock_client.invoke_model_with_response_stream(
    body=body, modelId=modelId, accept=accept, contentType=contentType
)


For InvokeModelWithResponseStream, the response is returned as a stream, so the application layer should be designed to consume it chunk by chunk.
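A minimal sketch of consuming that stream is shown below. It assumes the Titan Text streaming format, where each event carries a JSON chunk with an "outputText" field; other model families use different chunk schemas.

```python
import json

def collect_stream_text(response) -> str:
    """Assemble the full text from an invoke_model_with_response_stream
    response. Assumes Titan Text chunks, each a JSON object whose
    "outputText" field holds the partial generation."""
    parts = []
    for event in response["body"]:
        chunk = event.get("chunk")
        if chunk:
            payload = json.loads(chunk["bytes"])
            parts.append(payload.get("outputText", ""))
    return "".join(parts)
```

In practice you would forward each partial chunk to the caller (e.g. over a WebSocket or Lambda response streaming) instead of buffering the whole answer.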

Conclusion

As we have seen above, there is no necessity to provision capacity, manage infrastructure, or write custom model-serving code; Amazon Bedrock handles concerns such as prompt orchestration. Developers need to associate a knowledge base with the agent for end users to interact with the model. Data engineering and architecture will play a pivotal role in designing the data models and creating the training datasets. Foundation model services like Bedrock, along with supervised fine-tuning stacks (like SFT Trainer), will find a place in the existing Data & AI departments of organizations and in customer-facing SaaS products.
