LangChain + AWS Bedrock Workshop — Deploy LLMs on AWS

info

These are my notes for the Workshop Section.

The workshop offers two ways to run the labs: at an AWS event, or in your own account.

Running in my own AWS account

Enable Bedrock

I've already done this.

AWS Cloud9 setup

spin up a t3.small EC2 instance.

pull down the repo:

cd ~/environment/
curl 'https://static.us-east-1.prod.workshops.aws/public/b41bacc3-e25c-4826-8554-b4aa2cb9a2e5/assets/workshop.zip' --output workshop.zip
unzip workshop.zip

install requirements

pip3 install -r ~/environment/workshop/setup/requirements.txt -U

test working

cloudbuilderio:~/environment/workshop $ python3 ./completed/api/bedrock_api.py 

Manchester is the largest and most populous city in New Hampshire.

Local Setup

Please note: I ran a few of the labs in my local Linux environment, which needed some specific setup to get things going.

I still downloaded workshop.zip and followed the instructions as written, but had to tweak my environment along the way.

A few things if you're going to run local, in the root workshop/ directory:

create a virtual env: python3 -m venv .env
activate it: source .env/bin/activate
install dependencies: pip3 install -r requirements.txt

I will list my compiled requirements.txt here:

# requirements
boto3
langchain_community
streamlit
langchain
pypdf

Foundational Concepts

Play around with examples, play with temp, top p, response length.

The "View API request" option isn't available for every example (it's greyed out on some).

Here's one:

aws bedrock-runtime invoke-model \
--model-id meta.llama2-13b-chat-v1 \
--body "{\"prompt\":\"[INST]You are a a very intelligent bot with exceptional critical thinking[/INST]\\nI went to the market and bought 10 apples. I gave 2 apples to your friend and 2 to the helper. I then went and bought 5 more apples and ate 1. How many apples did I remain with?\\n\\nLet's think step by step.\\n\\n\\nFirst, I went to the market and bought 10 apples.\\n\\nThen, I gave 2 apples to your friend.\\n\\nSo, I have 10 - 2 = 8 apples left.\\n\\nNext, I gave 2 apples to the helper.\\n\\nSo, I have 8 - 2 = 6 apples left.\\n\\nNow, I went and bought 5 more apples.\\n\\nSo, I have 6 + 5 = 11 apples left.\\n\\nFinally, I ate 1 apple.\\n\\nSo, I have 11 - 1 = 10 apples left.\\n\\nTherefore, I remain with 10 apples.\",\"max_gen_len\":512,\"temperature\":0.5,\"top_p\":0.9}" \
--cli-binary-format raw-in-base64-out \
--region us-east-1 \
invoke-model-output.txt

note

The API call felt familiar because of my SageMaker LLM project. There I pointed at an inference endpoint, whereas here we pass a --model-id.
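The escaped JSON in the CLI call above is hard to read. A minimal sketch of building the same kind of request body in Python with json.dumps, which does the quote and newline escaping for you. The prompt is shortened for illustration and the variable names are my own; the parameter names (max_gen_len, temperature, top_p) are the ones from the CLI call.

```python
import json

# Shortened version of the prompt from the CLI example (illustrative only).
prompt = (
    "[INST]You are a very intelligent bot with exceptional critical thinking[/INST]\n"
    "I went to the market and bought 10 apples. How many apples did I remain with?\n\n"
    "Let's think step by step."
)

# Same inference parameters as the CLI call.
body = json.dumps({
    "prompt": prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
    "top_p": 0.9,
})

print(body)  # a single JSON string with the newlines escaped
```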

API

import json
import boto3

session = boto3.Session()

bedrock = session.client(service_name='bedrock-runtime') #creates a Bedrock client

bedrock_model_id = "ai21.j2-ultra-v1" #set the foundation model

prompt = "What's the name of the emerald mine that Elon Musk's father owns?" #the prompt to send to the model

body = json.dumps({
    "prompt": prompt, #AI21
    "maxTokens": 1024, 
    "temperature": 0, 
    "topP": 0.5, 
    "stopSequences": [], 
    "countPenalty": {"scale": 0 }, 
    "presencePenalty": {"scale": 0 }, 
    "frequencyPenalty": {"scale": 0 }
}) #build the request payload

# invoke

response = bedrock.invoke_model(body=body, modelId=bedrock_model_id, accept='application/json', contentType='application/json') #send the payload to Bedrock

response_body = json.loads(response.get('body').read()) # read the response

response_text = response_body.get("completions")[0].get("data").get("text") #extract the text from the JSON response

print(response_text)

output

~/R/AWSB/w/l/api ❯ python3 ./bedrock_api.py

Elon Musk's father, Errol Musk, owns the emerald mine in Chivor, Colombia.
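The extraction line in the script chains several .get() calls, which raises if the model returns no completions. A small sketch of the same parsing with a guard, run against a made-up stand-in for the response (no Bedrock call is made here; fake_response and extract_ai21_text are hypothetical names).

```python
import io
import json

# Hypothetical stand-in for what invoke_model returns: a dict whose 'body'
# is a file-like object containing JSON bytes. The text is made up.
fake_payload = {"completions": [{"data": {"text": "Auckland"}}]}
fake_response = {"body": io.BytesIO(json.dumps(fake_payload).encode("utf-8"))}

def extract_ai21_text(response):
    """Read the body once and pull out the first completion's text."""
    response_body = json.loads(response["body"].read())
    completions = response_body.get("completions") or []
    if not completions:
        return ""  # nothing came back; avoid an IndexError
    return completions[0].get("data", {}).get("text", "")

print(extract_ai21_text(fake_response))
```

Note that the body can only be read once, so keep the parsed result around if you need it again.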

I originally set my prompt to "Write a poem about Serena Williams" and this is what I got:

~/R/AWSB/w/l/api ❯ python3 ./bedrock_api.py             took 4s

Manchester is the largest and most populous city in New Hampshire.

~/R/AWSB/w/l/api ❯ python3 ./bedrock_api.py            took 19s

Serena Williams,

A champion on the court,

A role model off,

A fierce competitor,

A fierce advocate for equality,

A fierce advocate for women's rights,

A fierce advocate for social justice,

A fierce advocate for change,

A fierce advocate for herself,

A fierce advocate for others,

A fierce advocate for the game,

A fierce advocate for the sport,

A fierce advocate for the world,

A fierce advocate for humanity,

A fierce advocate for love,

A fierce advocate for life,

A fierce advocate for everything,

A fierce advocate for nothing,

A fierce advocate for everything,

A fierce advocate for nothing,
...

# repeats the everything, nothing line again 263 times!!!

a bit 😬.

Speed

✅ For the single answer questions, the API is really quite fast: ~4s

⚠️ The poem took much longer (~19s), and from the output the model looks to have got stuck in a loop.
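One cheap way to flag a looping response like the poem is to count how often the same line comes back. This is just a sketch of a check I could bolt on after the call; most_repeated_line is my own helper name, and what count is "too many" is a judgment call.

```python
from collections import Counter

def most_repeated_line(text):
    """Return (line, count) for the most frequent non-empty line in text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ("", 0)
    return Counter(lines).most_common(1)[0]

# A few lines in the style of the poem output above.
sample = "\n".join([
    "Serena Williams,",
    "A fierce advocate for everything,",
    "A fierce advocate for nothing,",
    "A fierce advocate for everything,",
    "A fierce advocate for nothing,",
    "A fierce advocate for everything,",
])

line, count = most_repeated_line(sample)
print(f"{count}x {line}")
```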

Langchain

info

	✅ Pros	❌ Cons
boto3	more control, details	have to handle, manage more details
Langchain	abstracted, focus on text in and out	less verbose, granular than boto3

Code:

bedrock_langchain.py
from langchain_community.llms import Bedrock

llm = Bedrock( #create a Bedrock llm client
    model_id="ai21.j2-ultra-v1" #set the foundation model
)

prompt = "What is the largest city in New Zealand?"

response_text = llm.invoke(prompt) #return a response to the prompt

print(response_text)

output

~/R/AWSB/w/l/langchain ❯ python3 ./bedrock_langchain.py 

The largest city in New Zealand is Auckland, with a population of approximately 1.5 million. It is located

The code is much smaller than with boto3.

Inference Parameters

missing updates.

I had to update some details in the workshop code because the default parameters for some models had changed, e.g. for Anthropic I replaced max_tokens with max_tokens_to_sample.

params.py
import sys
from langchain_community.llms import Bedrock

def get_inference_parameters(model): #return a default set of parameters based on the model's provider
    bedrock_model_provider = model.split('.')[0] #grab the model provider from the first part of the model id
    
    if (bedrock_model_provider == 'anthropic'): #Anthropic model
        return { #anthropic
            "max_tokens_to_sample": 512, # my update
            "temperature": 0, 
            "top_k": 250, 
            "top_p": 1, 
            "stop_sequences": ["\n\nHuman:"] 
           }
    
    elif (bedrock_model_provider == 'ai21'): #AI21
        return { #AI21
            "maxTokens": 512, 
            "temperature": 0, 
            "topP": 0.5, 
            "stopSequences": [], 
            "countPenalty": {"scale": 0 }, 
            "presencePenalty": {"scale": 0 }, 
            "frequencyPenalty": {"scale": 0 } 
           }
    
    elif (bedrock_model_provider == 'cohere'): #COHERE
        return {
            "max_tokens": 512,
            "temperature": 0,
            "p": 0.01,
            "k": 0,
            "stop_sequences": [],
            "return_likelihoods": "NONE"
        }
    
    elif (bedrock_model_provider == 'meta'): #META
        return {
            "temperature": 0,
            "top_p": 0.9,
            "max_gen_len": 512
        }
    
    elif (bedrock_model_provider == 'mistral'): #MISTRAL
        return {
            "max_tokens" : 512,
            "stop" : [],    
            "temperature": 0,
            "top_p": 0.9,
            "top_k": 50
        } 

    else: #Amazon
        #For the LangChain Bedrock implementation, these parameters will be added to the 
        #textGenerationConfig item that LangChain creates for us
        return { 
            "maxTokenCount": 512, 
            "stopSequences": [], 
            "temperature": 0, 
            "topP": 0.9 
        }

# setup a function that pulls our request params together
def get_text_response(model, input_content): #text-to-text client function
    
    model_kwargs = get_inference_parameters(model) #get the default parameters based on the selected model
    
    llm = Bedrock( #create a Bedrock llm client
        model_id=model, #use the requested model
        model_kwargs = model_kwargs
    )
    
    return llm.invoke(input_content) #return a response to the prompt

# make a call, capture in response
response = get_text_response(sys.argv[1], sys.argv[2])

print(response)

Run it with args (cos you asked for sys.argv[1] and sys.argv[2]):

python3 ./params.py "ai21.j2-ultra-v1" "Write a haiku:"

output:

~/R/AWSB/w/l/params ❯ python3 ./params.py "ai21.j2-ultra-v1" "Write a haiku:"

leaves rustle in breeze
autumn colors slowly fade
nature's symphony
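The annoying part of params.py is that every provider names the same idea differently. A small sketch that pulls out the provider prefix the same way (split on the first dot) and looks up the "max output tokens" key each one uses in the code above. provider_of and MAX_TOKENS_KEY are my own names, and the table only reflects what params.py uses, not an official list.

```python
def provider_of(model_id):
    """Return the provider prefix, e.g. 'anthropic' for 'anthropic.claude-v2:1'."""
    return model_id.split(".")[0]

# Name of the max-output-tokens parameter per provider, as used in params.py.
MAX_TOKENS_KEY = {
    "anthropic": "max_tokens_to_sample",
    "ai21": "maxTokens",
    "cohere": "max_tokens",
    "meta": "max_gen_len",
    "mistral": "max_tokens",
    "amazon": "maxTokenCount",
}

for model_id in ["ai21.j2-ultra-v1", "anthropic.claude-v2:1", "meta.llama2-13b-chat-v1"]:
    provider = provider_of(model_id)
    print(model_id, "->", provider, "->", MAX_TOKENS_KEY[provider])
```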

Control Response Variability

temperature.py
import sys
from langchain_community.llms import Bedrock

def get_text_response(input_content, temperature): #text-to-text client function
  
  model_kwargs = { #AI21
      "maxTokens": 1024, 
      "temperature": temperature, 
      "topP": 0.5, 
      "stopSequences": [], 
      "countPenalty": {"scale": 0 }, 
      "presencePenalty": {"scale": 0 }, 
      "frequencyPenalty": {"scale": 0 } 
  }
  
  llm = Bedrock( #create a Bedrock llm client
      model_id="ai21.j2-ultra-v1",
      model_kwargs = model_kwargs
  )
  
  return llm.invoke(input_content) #return a response to the prompt

for i in range(3):
  response = get_text_response(sys.argv[1], float(sys.argv[2]))
  print(response)

Basically, you're setting up the function to take a temperature argument from the user and pass it into the model kwargs.

A temperature of 0.0 should give you the same response every time; anything above that should introduce some variety.

output:

/workshop/labs/temperature ❯ python3 ./temperature.py "Write a haiku about China" 1.0s

China - vast and ancient
A land of contrasts and wonders
A tapestry woven

China - vast and ancient
A land of contrasts and mystery
A tapestry woven through time

China - vast and ancient
A land of contrasts and wonders
A tapestry woven
/workshop/labs/temperature ❯ python3 ./temperature.py "Write a haiku about China" 1.0s

China - vast and ancient
A land of contrasts and mystery
A tapestry woven through time

China - vast and ancient
A land of contrasts and wonders
A journey to discovery

China - vast and ancient
A land of contrasts and wonders
A tapestry woven
/workshop/labs/temperature ❯ python3 ./temperature.py "Write a haiku about China" 1.0s

China - vast and ancient
A land of contrasts and wonders
A place to discover

China - vast and ancient
A land of contrasts and wonders
A journey to discovery

China - vast and ancient
A land of contrasts and mystery
A tapestry woven through time
/workshop/labs/temperature ❯ python3 ./temperature.py "Write a haiku about China" 1.0s

China - vast and ancient
A land of contrasts and mystery
A fascinating country

China - vast and ancient
A land of contrasts and wonders
A journey to discovery

China - vast and ancient
A land of contrasts and mystery
A tapestry woven through time
/workshop/labs/temperature ❯ python3 ./temperature.py "Write a haiku about China" 1.0s

China - vast and ancient
A land of contrasts and mystery
A tapestry woven through time

China - vast and ancient
A land of contrasts and wonders
A culture rich and beautiful

China - vast and ancient
A land of contrasts and mystery
A world of wonder

note

Pretty shit tbh 🤣
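For intuition on what the temperature knob does, here is a toy sketch of temperature-scaled softmax over some made-up token scores. This is the general idea only; I'm not claiming it is how AI21 or Bedrock implement sampling, and softmax_with_temperature is my own helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest score.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0 the top candidate always wins, which is why the response repeats; as the temperature rises the probabilities flatten and the other candidates get picked more often.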

Streaming API

streaming.py
import json
import boto3

session = boto3.Session()

bedrock = session.client(service_name='bedrock-runtime') #creates a Bedrock client

def chunk_handler(chunk):
  print(chunk, end='')

def get_streaming_response(prompt, streaming_callback):

  bedrock_model_id = "anthropic.claude-3-sonnet-20240229-v1:0" #set the foundation model

  #Claude 3 models on Bedrock expect the Messages API format ("messages"), not the older "prompt" field
  body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 8000,
    "temperature": 0,
    "messages": [
      {
        "role": "user",
        "content": [{ "type": "text", "text": prompt } ]
      }
    ],
  })
  
  response = bedrock.invoke_model_with_response_stream(modelId=bedrock_model_id, body=body) #invoke the streaming method
  
  for event in response.get('body'):
    chunk = json.loads(event['chunk']['bytes'])

    if chunk['type'] == 'content_block_delta':
      if chunk['delta']['type'] == 'text_delta':
        streaming_callback(chunk['delta']['text'])

prompt = "Tell me a story about two puppies and two kittens who became best friends:"

get_streaming_response(prompt, chunk_handler)

Clunky, but works as expected:

workshop/labs/intro_streaming ❯ python3 ./intro_streaming.py                                                                                                            took  10s  .env at  12:42:30
Here is a story about two puppies and two kittens who became best friends:

Daisy and Buddy were two rambunctious golden retriever puppies who loved to play and get into mischief. One sunny day, they dug their way under the fence into the neighbor's yard. To their surprise, they came face to face with two tiny kittens named Smokey and Ginger who had been born just a few weeks earlier. 

At first, the puppies and kittens were wary of each other, having never seen animals like that before. Daisy barked and Buddy wagged his tail furiously. Smokey arched his back and hissed while little Ginger tried to hide behind a potted plant. But after circling each other cautiously, Daisy plopped down and let out a friendly puppy whine. Smokey was the first to relax, sniffing at the puppies' faces.

From that day on, the four became an inseparable crew. The puppies were infinitely gentle and patient, letting the kittens climb all over them. They taught the kittens to play chase and tug-of-war with old socks. The kittens showed the puppies how to stalk and pounce on toys. They napped together in warm puppy piles, taking turns grooming each other's fur.

As they grew older, their differences didn't matter at all. Daisy, Buddy, Smokey and Ginger were the best of friends who loved romping in the yard, going on walks together, and curling up side-by-side at naptime and bedtime. Their unique little family brought joy to all the neighbors who watched their silly antics and special bond. The four friends proved that differences don't matter when you have fun, caring companions to share your days with.%
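The event filtering is the fiddly bit of the streaming script, so here is a sketch of just that part, run against hand-made events instead of a live stream. The event shapes mirror what the script above reads (chunk -> bytes -> JSON with type and delta); fake_event and iter_text_deltas are my own names and the payloads are invented.

```python
import json

def iter_text_deltas(events):
    """Yield text fragments from Messages API style streaming events."""
    for event in events:
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk.get("type") != "content_block_delta":
            continue  # skip message_start, message_stop, and similar events
        delta = chunk.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta["text"]

def fake_event(payload):
    """Wrap a dict the way the streaming body wraps each chunk (hypothetical data)."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}

events = [
    fake_event({"type": "message_start"}),
    fake_event({"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}),
    fake_event({"type": "content_block_delta", "delta": {"type": "text_delta", "text": ", world"}}),
    fake_event({"type": "message_stop"}),
]

story = "".join(iter_text_deltas(events))
print(story)
```

Writing it as a generator means the caller decides what to do with each fragment (print it, append it to a UI element, and so on) instead of passing a callback in.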

Embeddings

bedrock_embedding.py
from langchain_community.embeddings import BedrockEmbeddings
from numpy import dot
from numpy.linalg import norm

#create an Amazon Titan Embeddings client
belc = BedrockEmbeddings()

class EmbedItem:
  def __init__(self, text):
    self.text = text
    self.embedding = belc.embed_query(text)

class ComparisonResult:
  def __init__(self, text, similarity):
    self.text = text
    self.similarity = similarity

def calculate_similarity(a, b): #See Cosine Similarity: https://en.wikipedia.org/wiki/Cosine_similarity
  return dot(a, b) / (norm(a) * norm(b))

#Build the list of embeddings to compare
items = []

with open("items.txt", "r") as f:
  text_items = f.read().splitlines()

for text in text_items:
  items.append(EmbedItem(text))

# compare
for e1 in items:
  print(f"Closest matches for '{e1.text}'")
  print ("----------------")
  cosine_comparisons = []
  
  for e2 in items:
    similarity_score = calculate_similarity(e1.embedding, e2.embedding)
    
    cosine_comparisons.append(ComparisonResult(e2.text, similarity_score)) #save the comparisons to a list
      
  cosine_comparisons.sort(key=lambda x: x.similarity, reverse=True) # list the closest matches first
  
  for c in cosine_comparisons:
    print("%.6f" % c.similarity, "\t", c.text)
  
  print()

output looks good, ranks match scores accordingly:

python3 ./bedrock_embedding.py                                       took  31s  .env at  13:36:19
Closest matches for 'Felines, canines, and rodents'
----------------
1.000000   Felines, canines, and rodents
0.872856   Cats, dogs, and mice
0.599730   Chats, chiens et souris
0.516598   Lions, tigers, and bears
0.455923   猫、犬、ネズミ
0.068916   パン屋への道順を知りたい
0.061314   パン屋への行き方を教えてください
0.002239   Can you please tell me how to get to the stadium?
-0.003159   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
-0.007595   Can you please tell me how to get to the bakery?
-0.019469   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
-0.020840   I need directions to the bread shop

Closest matches for 'Can you please tell me how to get to the bakery?'
----------------
1.000000   Can you please tell me how to get to the bakery?
0.712236   I need directions to the bread shop
0.541959   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.484672   Can you please tell me how to get to the stadium?
0.455479   パン屋への行き方を教えてください
0.406388   パン屋への道順を知りたい
0.369163   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.078357   猫、犬、ネズミ
0.022138   Cats, dogs, and mice
0.015661   Lions, tigers, and bears
0.005211   Chats, chiens et souris
-0.007595   Felines, canines, and rodents

Closest matches for 'Lions, tigers, and bears'
----------------
1.000000   Lions, tigers, and bears
0.530917   Cats, dogs, and mice
0.516598   Felines, canines, and rodents
0.386125   Chats, chiens et souris
0.337012   猫、犬、ネズミ
0.068164   I need directions to the bread shop
0.056721   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.054695   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.042972   パン屋への道順を知りたい
0.032731   Can you please tell me how to get to the stadium?
0.021517   パン屋への行き方を教えてください
0.015661   Can you please tell me how to get to the bakery?

Closest matches for 'Chats, chiens et souris'
----------------
1.000000   Chats, chiens et souris
0.669460   Cats, dogs, and mice
0.599730   Felines, canines, and rodents
0.498394   猫、犬、ネズミ
0.386125   Lions, tigers, and bears
0.299799   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.156950   パン屋への道順を知りたい
0.131597   パン屋への行き方を教えてください
0.091534   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.025773   I need directions to the bread shop
0.005211   Can you please tell me how to get to the bakery?
-0.036810   Can you please tell me how to get to the stadium?

Closest matches for '猫、犬、ネズミ'
----------------
1.000000   猫、犬、ネズミ
0.503620   Cats, dogs, and mice
0.498394   Chats, chiens et souris
0.487732   パン屋への道順を知りたい
0.460217   パン屋への行き方を教えてください
0.455923   Felines, canines, and rodents
0.337012   Lions, tigers, and bears
0.162600   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.153400   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.078357   Can you please tell me how to get to the bakery?
0.063395   I need directions to the bread shop
0.014240   Can you please tell me how to get to the stadium?

Closest matches for 'Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?'
----------------
1.000000   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.592948   I need directions to the bread shop
0.541959   Can you please tell me how to get to the bakery?
0.530933   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.433526   パン屋への行き方を教えてください
0.383732   パン屋への道順を知りたい
0.299799   Chats, chiens et souris
0.241092   Can you please tell me how to get to the stadium?
0.153400   猫、犬、ネズミ
0.056721   Lions, tigers, and bears
0.031843   Cats, dogs, and mice
-0.019469   Felines, canines, and rodents

Closest matches for 'Kannst du mir bitte sagen, wie ich zur Bäckerei komme?'
----------------
1.000000   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.530933   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.419582   I need directions to the bread shop
0.369163   Can you please tell me how to get to the bakery?
0.360738   パン屋への行き方を教えてください
0.307116   パン屋への道順を知りたい
0.270668   Can you please tell me how to get to the stadium?
0.162600   猫、犬、ネズミ
0.091534   Chats, chiens et souris
0.054695   Lions, tigers, and bears
0.028943   Cats, dogs, and mice
-0.003159   Felines, canines, and rodents

Closest matches for 'パン屋への行き方を教えてください'
----------------
1.000000   パン屋への行き方を教えてください
0.895563   パン屋への道順を知りたい
0.491218   I need directions to the bread shop
0.460217   猫、犬、ネズミ
0.455479   Can you please tell me how to get to the bakery?
0.433526   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.360738   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.220985   Can you please tell me how to get to the stadium?
0.131597   Chats, chiens et souris
0.078212   Cats, dogs, and mice
0.061314   Felines, canines, and rodents
0.021517   Lions, tigers, and bears

Closest matches for 'パン屋への道順を知りたい'
----------------
1.000000   パン屋への道順を知りたい
0.895563   パン屋への行き方を教えてください
0.487732   猫、犬、ネズミ
0.466405   I need directions to the bread shop
0.406388   Can you please tell me how to get to the bakery?
0.383732   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.307116   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.156950   Chats, chiens et souris
0.131994   Can you please tell me how to get to the stadium?
0.101027   Cats, dogs, and mice
0.068916   Felines, canines, and rodents
0.042972   Lions, tigers, and bears

Closest matches for 'Can you please tell me how to get to the stadium?'
----------------
1.000000   Can you please tell me how to get to the stadium?
0.484672   Can you please tell me how to get to the bakery?
0.305550   I need directions to the bread shop
0.270668   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.241092   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.220985   パン屋への行き方を教えてください
0.131994   パン屋への道順を知りたい
0.032731   Lions, tigers, and bears
0.014240   猫、犬、ネズミ
0.002239   Felines, canines, and rodents
-0.008508   Cats, dogs, and mice
-0.036810   Chats, chiens et souris

Closest matches for 'I need directions to the bread shop'
----------------
1.000000   I need directions to the bread shop
0.712236   Can you please tell me how to get to the bakery?
0.592948   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.491218   パン屋への行き方を教えてください
0.466405   パン屋への道順を知りたい
0.419582   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.305550   Can you please tell me how to get to the stadium?
0.068164   Lions, tigers, and bears
0.063395   猫、犬、ネズミ
0.025934   Cats, dogs, and mice
0.025773   Chats, chiens et souris
-0.020840   Felines, canines, and rodents

Closest matches for 'Cats, dogs, and mice'
----------------
1.000000   Cats, dogs, and mice
0.872856   Felines, canines, and rodents
0.669460   Chats, chiens et souris
0.530917   Lions, tigers, and bears
0.503620   猫、犬、ネズミ
0.101027   パン屋への道順を知りたい
0.078212   パン屋への行き方を教えてください
0.031843   Pouvez-vous s'il vous plaît me dire comment me rendre à la boulangerie?
0.028943   Kannst du mir bitte sagen, wie ich zur Bäckerei komme?
0.025934   I need directions to the bread shop
0.022138   Can you please tell me how to get to the bakery?
-0.008508   Can you please tell me how to get to the stadium?
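To make the scores above easier to read, here is the same cosine similarity formula as calculate_similarity, written with only the standard library and run on toy 2-D vectors (real embeddings have many more dimensions). The vectors are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """dot(a, b) / (|a| * |b|), the same formula as calculate_similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: same direction, at right angles, and pointing opposite ways.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print("%.6f" % same_direction)
print("%.6f" % orthogonal)
print("%.6f" % opposite)
```

So a score close to 1 means the two texts point the same way in embedding space, a score near 0 means they are unrelated, and the slightly negative scores in the output are effectively "unrelated" too.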

Streamlit

streamlit.py
#all streamlit commands will be available through the "st" alias
import streamlit as st

st.set_page_config(page_title="🔗🦜 Streamlit Demo") #HTML title
st.title("Streamlit Demo") #page title

color_text = st.text_input("What's your favorite color?") #display a text box
go_button = st.button("Go", type="primary") #display a primary button

if go_button:
    #code in this if block will be run when the button is clicked
    st.write(f"I like {color_text} too!") #display the response content

Run it with Streamlit's command: streamlit run simple_streamlit_app.py --server.port 8080

Model Selection

There are no hard and fast rules about which model is best for a given scenario; all the ones available on Bedrock seem to do the same-ish thing. Each model will have relative strengths and weaknesses based on its training data, overall size, and training approach.

Current Models

As of April 6, 2024

Provider	Model name	Version	Model ID
Amazon	Titan Text G1 - Express	1.x	amazon.titan-text-express-v1
Amazon	Titan Text G1 - Lite	1.x	amazon.titan-text-lite-v1
Amazon	Titan Embeddings G1 - Text	1.x	amazon.titan-embed-text-v1
Amazon	Titan Multimodal Embeddings G1	1.x	amazon.titan-embed-image-v1
Amazon	Titan Image Generator G1	1.x	amazon.titan-image-generator-v1
Anthropic	Claude	2.0	anthropic.claude-v2
Anthropic	Claude	2.1	anthropic.claude-v2:1
Anthropic	Claude 3 Sonnet	1.0	anthropic.claude-3-sonnet-20240229-v1:0
Anthropic	Claude 3 Haiku	1.0	anthropic.claude-3-haiku-20240307-v1:0
Anthropic	Claude Instant	1.x	anthropic.claude-instant-v1
AI21 Labs	Jurassic-2 Mid	1.x	ai21.j2-mid-v1
AI21 Labs	Jurassic-2 Ultra	1.x	ai21.j2-ultra-v1
Cohere	Command	14.x	cohere.command-text-v14
Cohere	Command Light	15.x	cohere.command-light-text-v14
Cohere	Embed English	3.x	cohere.embed-english-v3
Cohere	Embed Multilingual	3.x	cohere.embed-multilingual-v3
Meta	Llama 2 Chat 13B	1.x	meta.llama2-13b-chat-v1
Meta	Llama 2 Chat 70B	1.x	meta.llama2-70b-chat-v1
Mistral AI	Mistral 7B Instruct	0.x	mistral.mistral-7b-instruct-v0:2
Mistral AI	Mixtral 8X7B Instruct	0.x	mistral.mixtral-8x7b-instruct-v0:1
Mistral AI	Mistral Large	1.x	mistral.mistral-large-2402-v1:0
Stability AI	Stable Diffusion XL	0.x	stability.stable-diffusion-xl-v0
Stability AI	Stable Diffusion XL	1.x	stability.stable-diffusion-xl-v1

Basic patterns

B1 Text Generation

Putting together a streamlit app that does text-to-text generation for us.

Creating 2 x files

text_lib.py # the backend functions
text_app.py # the frontend UI

Backend Functions

text_lib.py
from langchain_community.llms import Bedrock

def get_text_response(input_content): #text-to-text client function

    llm = Bedrock( #create a Bedrock llm client
        model_id="cohere.command-text-v14", #set the foundation model
        model_kwargs={
            "max_tokens": 512,
            "temperature": 0,
            "p": 0.01,
            "k": 0,
            "stop_sequences": [],
            "return_likelihoods": "NONE"
        }
    )
    return llm.invoke(input_content) #return a response to the prompt

The streamlit UI

text_app.py
import streamlit as st
import text_lib as glib

# Titles
st.set_page_config(page_title="Text to Text")
st.title("Text to Text") 

# Inputs
input_text = st.text_area("Input text", label_visibility="collapsed")
go_button = st.button("Go", type="primary")

# Outputs
if go_button:
  #show a spinner while the code in this with block runs
  with st.spinner("Working..."):
    #call the model through the supporting library
    response_content = glib.get_text_response(input_content=input_text)
    #display the response content
    st.write(response_content)

Run it: streamlit run text_app.py --server.port 8080

Success

streamlit ui

B2 Image Generation

Same as text generation, we have a _lib.py file (backend) and an _app.py file (frontend)

image_lib.py
import boto3 #import aws sdk and supporting libraries
import json
import base64
from io import BytesIO

# init client, bedrock id
session = boto3.Session()
bedrock = session.client(service_name='bedrock-runtime') #creates a Bedrock client
bedrock_model_id = "stability.stable-diffusion-xl-v1" #use the Stable Diffusion model

# convert the response into something streamlit can display
def get_response_image_from_payload(response): #returns the image bytes from the model response payload

    payload = json.loads(response.get('body').read()) #load the response body into a json object
    images = payload.get('artifacts') #extract the image artifacts
    image_data = base64.b64decode(images[0].get('base64')) #decode image

    return BytesIO(image_data) #return a BytesIO object for client app consumption

# call bedrock from UI
def get_image_response(prompt_content): #text-to-image client function
    
    request_body = json.dumps({"text_prompts": 
                               [ {"text": prompt_content } ], #prompts to use
                               "cfg_scale": 9, #how closely the model tries to match the prompt
                               "steps": 50, }) #number of diffusion steps to perform
    
    response = bedrock.invoke_model(body=request_body, modelId=bedrock_model_id) #call the Bedrock endpoint
    
    output = get_response_image_from_payload(response) #convert the response payload to a BytesIO object for the client to consume
    
    return output
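The decoding step is easy to try without calling Bedrock at all. A sketch using a made-up payload in the same shape image_lib.py reads (an artifacts list with a base64 field); the bytes are a placeholder rather than a real image, and image_from_response is my own helper name.

```python
import base64
import io
import json

# Hypothetical response: 'artifacts' holds base64-encoded image bytes.
# The bytes here are a placeholder, not a real image.
fake_image_bytes = b"\x89PNG placeholder"
fake_payload = {"artifacts": [{"base64": base64.b64encode(fake_image_bytes).decode("ascii")}]}
fake_response = {"body": io.BytesIO(json.dumps(fake_payload).encode("utf-8"))}

def image_from_response(response):
    """Decode the first artifact into a BytesIO, as image_lib.py does."""
    payload = json.loads(response["body"].read())
    artifacts = payload.get("artifacts") or []
    if not artifacts:
        raise ValueError("response contained no image artifacts")
    return io.BytesIO(base64.b64decode(artifacts[0]["base64"]))

image = image_from_response(fake_response)
print(len(image.getvalue()))  # number of decoded bytes
```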

Frontend

image_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import image_lib as glib #reference to local lib script

st.set_page_config(layout="wide", page_title="Image Generation") #set the page width wider to accommodate columns
st.title("Image Generation") #page title
col1, col2 = st.columns(2) #create 2 columns

with col1: #everything in this with block will be placed in column 1
    st.subheader("Image generation prompt") #subhead for this column    
    prompt_text = st.text_area("Prompt text", height=200, label_visibility="collapsed") #display a multiline text box with no label
    process_button = st.button("Run", type="primary") #display a primary button

with col2: #everything in this with block will be placed in column 2
    st.subheader("Result") #subhead for this column    
    if process_button: #code in this if block will be run when the button is clicked
        with st.spinner("Drawing..."): #show a spinner while the code in this with block runs
            generated_image = glib.get_image_response(prompt_content=prompt_text) #call the model through the supporting library
        st.image(generated_image) #display the generated image

Run it: streamlit run image_app.py --server.port 8080

Success

streamlit ui

B3 RAG

rag_lib.py
from langchain_community.embeddings import BedrockEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.llms import Bedrock

def get_llm():
    
    model_kwargs = { #AI21
        "maxTokens": 1024, 
        "temperature": 0, 
        "topP": 0.5, 
        "stopSequences": [], 
        "countPenalty": {"scale": 0 }, 
        "presencePenalty": {"scale": 0 }, 
        "frequencyPenalty": {"scale": 0 } 
    }
    
    llm = Bedrock(
        model_id="ai21.j2-ultra-v1", #set the foundation model
        model_kwargs=model_kwargs) #configure the properties for AI21 Jurassic-2
    
    return llm

def get_index(): #creates and returns an in-memory vector store to be used in the application
    
    embeddings = BedrockEmbeddings() #create a Titan Embeddings client
    pdf_path = "2022-Shareholder-Letter.pdf" #assumes local PDF file with this name
    loader = PyPDFLoader(file_path=pdf_path) #load the pdf file
  
    text_splitter = RecursiveCharacterTextSplitter( #create a text splitter
        separators=["\n\n", "\n", ".", " "], #split chunks at (1) paragraph, (2) line, (3) sentence, or (4) word, in that order
        chunk_size=1000, #divide into 1000-character chunks using the separators above
        chunk_overlap=100 #number of characters that can overlap with previous chunk
    )
    
    index_creator = VectorstoreIndexCreator( #create a vector store factory
        vectorstore_cls=FAISS, #use an in-memory vector store for demo purposes
        embedding=embeddings, #use Titan embeddings
        text_splitter=text_splitter, #use the recursive text splitter
    )
    
    index_from_loader = index_creator.from_loaders([loader]) #create a vector store index from the loaded PDF
    
    return index_from_loader #return the index to be cached by the client app

def get_rag_response(index, question): #rag client function
    
    llm = get_llm()  
    response_text = index.query(question=question, llm=llm) #search against the in-memory index, stuff results into a prompt and send to the llm
    
    return response_text
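To get a feel for what chunk_size and chunk_overlap mean in get_index, here is a deliberately simple fixed-size chunker. It is not what RecursiveCharacterTextSplitter does internally (that one tries the separators first to break at paragraphs and sentences); this sketch only shows how neighbouring chunks share characters. chunk_with_overlap is my own name.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the point of the overlap: a sentence cut at a chunk boundary still appears whole in at least one chunk's neighbourhood.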

streamlit app rag_app.py

rag_app.py

import streamlit as st #all streamlit commands will be available through the "st" alias
import rag_lib as glib #reference to local lib script

# Titles
st.set_page_config(page_title="Retrieval-Augmented Generation") #HTML title
st.title("Retrieval-Augmented Generation") #page title

# Vector Index
if 'vector_index' not in st.session_state: #see if the vector index hasn't been created yet
    with st.spinner("Indexing document..."): #show a spinner while the code in this with block runs
        st.session_state.vector_index = glib.get_index() #retrieve the index through the supporting library and store in the app's session cache

# Inputs
input_text = st.text_area("Input text", label_visibility="collapsed") #display a multiline text box with no label
go_button = st.button("Go", type="primary") #display a primary button

# Outputs
if go_button: #code in this if block will be run when the button is clicked
    
    with st.spinner("Working..."): #show a spinner while the code in this with block runs
        response_content = glib.get_rag_response(index=st.session_state.vector_index, question=input_text) #call the model through the supporting library
        
        st.write(response_content) #display the response content

add requirements: faiss-cpu

Run it: streamlit run rag_app.py --server.port 8080

Success

RAG

B4 Chatbot

Create backend functions

chatbot_lib.py
from langchain.memory import ConversationSummaryBufferMemory
from langchain_community.chat_models import BedrockChat
from langchain.chains import ConversationChain

# setup LLM
def get_llm():
        
    model_kwargs = { #anthropic
        "max_tokens": 512,
        "temperature": 0, 
        "top_k": 250, 
        "top_p": 1, 
        "stop_sequences": ["\n\nHuman:"] 
    }
    
    llm = BedrockChat(
        model_id="anthropic.claude-3-sonnet-20240229-v1:0", #set the foundation model
        model_kwargs=model_kwargs) #configure the properties for Claude
    
    return llm

# init a langchain memory object
def get_memory(): #create memory for this chat session
    
    #ConversationSummaryBufferMemory requires an LLM for summarizing older messages
    #this allows us to maintain the "big picture" of a long-running conversation
    llm = get_llm()
    
    memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=1024) #Maintains a summary of previous messages
    
    return memory

# call bedrock
def get_chat_response(input_text, memory): #chat client function
    
    llm = get_llm()
    
    conversation_with_summary = ConversationChain( #create a chat client
        llm = llm, #using the Bedrock LLM
        memory = memory, #with the summarization memory
        verbose = True #print out some of the internal states of the chain while running
    )
    
    chat_response = conversation_with_summary.invoke(input_text) #pass the user message and summary to the model
    
    return chat_response['response']
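
The idea behind ConversationSummaryBufferMemory (recent messages kept verbatim, older ones folded into a summary) can be sketched without LangChain. This is a toy under stated assumptions: it budgets by characters rather than tokens, and naive_summarize is a placeholder standing in for the LLM summarization call. SummaryBuffer and naive_summarize are hypothetical names.

```python
from typing import Callable


class SummaryBuffer:
    """Toy summary-buffer memory: recent messages verbatim, older ones summarized."""

    def __init__(self, summarize: Callable[[str, list[str]], str], max_chars: int = 200):
        self.summarize = summarize  # stand-in for the LLM summarization call
        self.max_chars = max_chars  # character budget (the real class counts tokens)
        self.summary = ""
        self.messages: list[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        pruned = []
        # Drop the oldest messages until the verbatim buffer fits the budget.
        while sum(len(m) for m in self.messages) > self.max_chars and len(self.messages) > 1:
            pruned.append(self.messages.pop(0))
        if pruned:
            self.summary = self.summarize(self.summary, pruned)

    def context(self) -> str:
        parts = ([f"Summary: {self.summary}"] if self.summary else []) + self.messages
        return "\n".join(parts)


def naive_summarize(existing: str, pruned: list[str]) -> str:
    # Placeholder "summarizer": keeps the first five words of each pruned message.
    snippets = [" ".join(m.split()[:5]) for m in pruned]
    return " | ".join(([existing] if existing else []) + snippets)


memory = SummaryBuffer(naive_summarize, max_chars=60)
memory.add("Human: My name is Ana and I live in Auckland.")
memory.add("AI: Nice to meet you, Ana.")
memory.add("Human: What is my name?")
print(memory.context())
```

In the real class the summary is written by the model, so the quality of the "big picture" depends on the LLM passed to get_memory().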

Setup frontend UI

chatbot_app.py

import streamlit as st #all streamlit commands will be available through the "st" alias
import chatbot_lib as glib #reference to local lib script

# titles

st.set_page_config(page_title="Chatbot") #HTML title
st.title("Chatbot") #page title

# add langchain memory to session cache

if 'memory' not in st.session_state: #see if the memory hasn't been created yet
    st.session_state.memory = glib.get_memory() #initialize the memory

# add ui chat history to session cache

if 'chat_history' not in st.session_state: #see if the chat history hasn't been created yet
    st.session_state.chat_history = [] #initialize the chat history

# render previous chat using a loop

for message in st.session_state.chat_history: #loop through the chat history
    with st.chat_message(message["role"]): #renders a chat line for the given role, containing everything in the with block
        st.markdown(message["text"]) #display the chat content

# Inputs

input_text = st.chat_input("Chat with your bot here") #display a chat input box

if input_text: #run the code in this if block after the user submits a chat message
    
    with st.chat_message("user"): #display a user chat message
        st.markdown(input_text) #renders the user's latest message
    
    st.session_state.chat_history.append({"role":"user", "text":input_text}) #append the user's latest message to the chat history
    
    chat_response = glib.get_chat_response(input_text=input_text, memory=st.session_state.memory) #call the model through the supporting library
    
    with st.chat_message("assistant"): #display a bot chat message
        st.markdown(chat_response) #display bot's latest response
    
    st.session_state.chat_history.append({"role":"assistant", "text":chat_response}) #append the bot's latest message to the chat history

add requirements: anthropic

Run it: streamlit run chatbot_app.py --server.port 8080

Success

chatbot

Text Patterns

T1 Chatbot RAG

Backend functions

from langchain.memory import ConversationBufferWindowMemory
from langchain_community.chat_models import BedrockChat
from langchain.chains import ConversationalRetrievalChain

from langchain_community.embeddings import BedrockEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# setup llm
def get_llm():
        
    model_kwargs = { #anthropic
        "max_tokens": 512,
        "temperature": 0, 
        "top_k": 250, 
        "top_p": 1, 
        "stop_sequences": ["\n\nHuman:"] 
    }
    
    llm = BedrockChat(
        model_id="anthropic.claude-3-sonnet-20240229-v1:0", #set the foundation model
        model_kwargs=model_kwargs) #configure the properties for Claude
    
    return llm

# in-memory vector store

def get_index(): #creates and returns an in-memory vector store to be used in the application
    
    embeddings = BedrockEmbeddings() #create a Titan Embeddings client
    
    pdf_path = "2022-Shareholder-Letter.pdf" #assumes local PDF file with this name

    loader = PyPDFLoader(file_path=pdf_path) #load the pdf file
    
    text_splitter = RecursiveCharacterTextSplitter( #create a text splitter
        separators=["\n\n", "\n", ".", " "], #split chunks at (1) paragraph, (2) line, (3) sentence, or (4) word, in that order
        chunk_size=1000, #divide into 1000-character chunks using the separators above
        chunk_overlap=100 #number of characters that can overlap with previous chunk
    )
    
    index_creator = VectorstoreIndexCreator( #create a vector store factory
        vectorstore_cls=FAISS, #use an in-memory vector store for demo purposes
        embedding=embeddings, #use Titan embeddings
        text_splitter=text_splitter, #use the recursive text splitter
    )
    
    index_from_loader = index_creator.from_loaders([loader]) #create a vector store index from the loaded PDF
    
    return index_from_loader #return the index to be cached by the client app

# init langchain memory object

def get_memory(): #create memory for this chat session
    
    memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True) #Maintains a history of previous messages
    
    return memory

# call bedrock
def get_rag_chat_response(input_text, memory, index): #chat client function
    
    llm = get_llm()
    
    conversation_with_retrieval = ConversationalRetrievalChain.from_llm(llm, index.vectorstore.as_retriever(), memory=memory, verbose=True)
    
    chat_response = conversation_with_retrieval.invoke({"question": input_text}) #pass the user message to the chain; the memory supplies the chat history
    
    return chat_response['answer']
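
Unlike the summary memory in B4, a window memory simply forgets everything older than the last few exchanges. A minimal sketch of that behaviour, assuming a window of k exchanges (WindowMemory is a hypothetical name, not the LangChain class):

```python
from collections import deque


class WindowMemory:
    """Toy window memory: remembers only the last `k` human/AI exchanges."""

    def __init__(self, k: int = 2):
        self.turns = deque(maxlen=k)  # older exchanges fall off the left side

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> list[tuple[str, str]]:
        return list(self.turns)


memory = WindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("What is Bedrock?", "A managed service for foundation models.")
memory.save("And LangChain?", "A framework for building LLM apps.")
print(memory.history())  # the first exchange is no longer present
```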

Frontend UI

rag_chatbot_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import rag_chatbot_lib as glib #reference to local lib script

# titles
st.set_page_config(page_title="RAG Chatbot") #HTML title
st.title("RAG Chatbot") #page title

# add langchain memory to session cache
if 'memory' not in st.session_state: #see if the memory hasn't been created yet
    st.session_state.memory = glib.get_memory() #initialize the memory

# add UI history to session cache
if 'chat_history' not in st.session_state: #see if the chat history hasn't been created yet
    st.session_state.chat_history = [] #initialize the chat history

# add vector index to session cache
if 'vector_index' not in st.session_state: #see if the vector index hasn't been created yet
    with st.spinner("Indexing document..."): #show a spinner while the code in this with block runs
        st.session_state.vector_index = glib.get_index() #retrieve the index through the supporting library and store in the app's session cache

# Output - render chat history
#Re-render the chat history (Streamlit re-runs this script, so need this to preserve previous chat messages)
for message in st.session_state.chat_history: #loop through the chat history
    with st.chat_message(message["role"]): #renders a chat line for the given role, containing everything in the with block
        st.markdown(message["text"]) #display the chat content

# Inputs
input_text = st.chat_input("Chat with your bot here") #display a chat input box

if input_text: #run the code in this if block after the user submits a chat message
    
    with st.chat_message("user"): #display a user chat message
        st.markdown(input_text) #renders the user's latest message
    
    st.session_state.chat_history.append({"role":"user", "text":input_text}) #append the user's latest message to the chat history
    
    chat_response = glib.get_rag_chat_response(input_text=input_text, memory=st.session_state.memory, index=st.session_state.vector_index,) #call the model through the supporting library
    
    with st.chat_message("assistant"): #display a bot chat message
        st.markdown(chat_response) #display bot's latest response
    
    st.session_state.chat_history.append({"role":"assistant", "text":chat_response}) #append the bot's latest message to the chat history
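
Why the history loop matters: Streamlit re-runs the whole script on every interaction, so anything not stored in session state and redrawn is lost. Here is a rough simulation of that with a plain dict standing in for st.session_state. The run_script function and the upper-casing "model" are placeholders of mine, not Streamlit APIs.

```python
from typing import Optional


def run_script(session_state: dict, user_input: Optional[str] = None) -> list[str]:
    """One simulated rerun of a chat script; returns the lines 'rendered' on screen."""
    rendered = []
    if "chat_history" not in session_state:  # first run: create the history
        session_state["chat_history"] = []
    for message in session_state["chat_history"]:  # re-render earlier messages
        rendered.append(f"{message['role']}: {message['text']}")
    if user_input:
        rendered.append(f"user: {user_input}")
        session_state["chat_history"].append({"role": "user", "text": user_input})
        reply = user_input.upper()  # placeholder for the model call
        rendered.append(f"assistant: {reply}")
        session_state["chat_history"].append({"role": "assistant", "text": reply})
    return rendered


state = {}
print(run_script(state, "hi"))
print(run_script(state, "again"))  # earlier messages are redrawn from the history
```

Without the loop over chat_history, the second run would only show the newest message pair.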

add requirements: anthropic

Run it: streamlit run rag_chatbot_app.py --server.port 8080

Success

chatbot

T2 Doc Summary

Backend functions

summarization_lib.py
from langchain.prompts import PromptTemplate
from langchain_community.llms import Bedrock
from langchain.chains.summarize import load_summarize_chain
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

# setup llm
def get_llm():
    
    model_kwargs = { #AI21
        "maxTokens": 8000, 
        "temperature": 0, 
        "topP": 0.5, 
        "stopSequences": [], 
        "countPenalty": {"scale": 0 }, 
        "presencePenalty": {"scale": 0 }, 
        "frequencyPenalty": {"scale": 0 } 
    }
    
    llm = Bedrock(
        model_id="ai21.j2-ultra-v1", #set the foundation model
        model_kwargs=model_kwargs) #configure the properties for AI21 Jurassic-2
    
    return llm

# create doc chunks of PDF
pdf_path = "2022-Shareholder-Letter.pdf"

def get_docs():
    
    loader = PyPDFLoader(file_path=pdf_path)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", ".", " "], chunk_size=4000, chunk_overlap=100 
    )
    docs = text_splitter.split_documents(documents=documents)
    
    return docs

# call bedrock
def get_summary(return_intermediate_steps=False):
    
    map_prompt_template = "{text}\n\nWrite a few sentences summarizing the above:"
    map_prompt = PromptTemplate(template=map_prompt_template, input_variables=["text"])
    
    combine_prompt_template = "{text}\n\nWrite a detailed analysis of the above:"
    combine_prompt = PromptTemplate(template=combine_prompt_template, input_variables=["text"])
    
    llm = get_llm()
    docs = get_docs()
    
    chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=map_prompt, combine_prompt=combine_prompt, return_intermediate_steps=return_intermediate_steps)
    
    if return_intermediate_steps:
        return chain.invoke({"input_documents": docs}, return_only_outputs=True)
    else:
        return chain.invoke(docs, return_only_outputs=True)
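
The map_reduce chain type boils down to: summarize each chunk on its own (map), then summarize the collected summaries (reduce). A minimal sketch of that flow, with plain functions standing in for the two prompted LLM calls. The names map_reduce_summary, first_sentence and join_all are hypothetical.

```python
from typing import Callable


def map_reduce_summary(
    chunks: list[str],
    map_fn: Callable[[str], str],
    combine_fn: Callable[[str], str],
) -> dict:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    intermediate = [map_fn(chunk) for chunk in chunks]  # map step
    combined = combine_fn("\n".join(intermediate))      # reduce step
    return {"intermediate_steps": intermediate, "output_text": combined}


def first_sentence(text: str) -> str:
    # Placeholder for the map prompt: keep only the text up to the first period.
    return text.split(".")[0].strip() + "."


def join_all(text: str) -> str:
    # Placeholder for the combine prompt: put the section summaries on one line.
    return " ".join(text.splitlines())


result = map_reduce_summary(
    ["Revenue grew. Costs fell.", "AWS expanded. Hiring slowed."],
    map_fn=first_sentence,
    combine_fn=join_all,
)
print(result["intermediate_steps"])
print(result["output_text"])
```

The returned keys mirror the ones the frontend reads (output_text and intermediate_steps), which is why the checkbox can toggle the section summaries on and off.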

Frontend UI

summarization_app
import streamlit as st
import summarization_lib as glib

# titles
st.set_page_config(page_title="Document Summarization")
st.title("Document Summarization")

# summarisation elements
return_intermediate_steps = st.checkbox("Return intermediate steps", value=True)
summarize_button = st.button("Summarize", type="primary")


if summarize_button:
    st.subheader("Combined summary")
    with st.spinner("Running..."):
        response_content = glib.get_summary(return_intermediate_steps=return_intermediate_steps)


    if return_intermediate_steps:
        st.write(response_content["output_text"])
        st.subheader("Section summaries")

        for step in response_content["intermediate_steps"]:
            st.write(step)
            st.markdown("---")
    else:
        st.write(response_content["output_text"])

add requirements: transformers

Run it: streamlit run summarization_app.py --server.port 8080

Success

docsumm

T3 Response Streaming

Backend functions

streaming_lib.py
#imports
from langchain.chains import ConversationChain
from langchain_community.llms import Bedrock

# setup llm
def get_llm(streaming_callback):
    model_kwargs = {
        "max_tokens": 4000,
        "temperature": 0,
        "p": 0.01,
        "k": 0,
        "stop_sequences": [],
        "return_likelihoods": "NONE",
        "stream": True
    }
    
    llm = Bedrock(
        model_id="cohere.command-text-v14",
        model_kwargs=model_kwargs,
        streaming=True,
        callbacks=[streaming_callback],
    )
    
    return llm

# call bedrock, stream response
def get_streaming_response(prompt, streaming_callback):
    conversation_with_summary = ConversationChain(
        llm=get_llm(streaming_callback)
    )
    return conversation_with_summary.predict(input=prompt)
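
The callback pattern used for streaming can be sketched in a few lines: the model yields pieces of text and a handler is called for each piece as it arrives. Here fake_model is a placeholder generator, not a Bedrock call, and stream_response is a hypothetical helper.

```python
from typing import Callable, Iterable, Iterator


def stream_response(tokens: Iterable[str], on_token: Callable[[str], None]) -> str:
    """Feed each token to the callback as it arrives and return the full text."""
    pieces = []
    for token in tokens:
        on_token(token)  # e.g. update the UI immediately
        pieces.append(token)
    return "".join(pieces)


def fake_model(prompt: str) -> Iterator[str]:
    # Placeholder generator standing in for a streaming model response.
    for word in f"You said: {prompt}".split(" "):
        yield word + " "


seen = []
full_text = stream_response(fake_model("hello there"), seen.append)
print(seen)
print(full_text)
```

In the app, StreamlitCallbackHandler plays the role of on_token, writing into the container as tokens come in.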

Frontend UI

streaming_app.py
import streaming_lib as glib  # reference to local lib script
import streamlit as st
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler # <<<<<

# titles
st.set_page_config(page_title="Response Streaming")  # HTML title
st.title("Response Streaming")  # page title

# Inputs
input_text = st.text_area("Input text", label_visibility="collapsed")
go_button = st.button("Go", type="primary")  # display a primary button

# Outputs
if go_button:  # code in this if block will be run when the button is clicked
    #use an empty container for streaming output
    st_callback = StreamlitCallbackHandler(st.container())
    streaming_response = glib.get_streaming_response(prompt=input_text, streaming_callback=st_callback)

add requirements: anthropic

Run it: streamlit run streaming_app.py --server.port 8080

Success

response

T4 Embeddings Search

This is similar to the RAG setup, with one important distinction: the user query is only a "search" of the vector database. The matching entries are returned as-is, and no LLM generates a new answer from them.

Note we're using an in-memory FAISS vector store here. In the real world we'd use something more persistent.
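
What the search does under the hood can be sketched with plain lists: every stored text has a vector, the query gets a vector, and the closest vectors win. This toy uses L2 (Euclidean) distance, so a lower score means a closer match, which as far as I can tell is also how the FAISS scores in this lab read. The vectors below are made up; real Titan embeddings have far more dimensions.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index: list[tuple[str, list[float]]], query_vector: list[float], k: int = 4):
    """Return the k stored texts closest to the query, lowest distance first."""
    scored = [(text, l2_distance(vector, query_vector)) for text, vector in index]
    return sorted(scored, key=lambda pair: pair[1])[:k]


toy_index = [
    ("How do I deploy a model?", [0.9, 0.1, 0.0]),
    ("What does training cost?", [0.1, 0.9, 0.0]),
    ("How do I host an endpoint?", [0.8, 0.2, 0.1]),
]
for text, score in search(toy_index, [1.0, 0.0, 0.0], k=2):
    print(round(score, 3), text)
```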

Backend functions

embeddings_search_lib
#imports
from langchain_community.embeddings import BedrockEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders.csv_loader import CSVLoader

# create in-memory store
def get_index(): #creates and returns an in-memory vector store to be used in the application
    
    embeddings = BedrockEmbeddings() #create a Titan Embeddings client
    
    loader = CSVLoader(file_path="sagemaker_answers.csv")

    index_creator = VectorstoreIndexCreator(
        vectorstore_cls=FAISS,
        embedding=embeddings,
        text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
    )

    index_from_loader = index_creator.from_loaders([loader])
    
    return index_from_loader

# call bedrock
def get_similarity_search_results(index, question):
    results = index.vectorstore.similarity_search_with_score(question)
    
    flattened_results = [{"content":res[0].page_content, "score":res[1]} for res in results] #flatten results for easier display and handling
    
    return flattened_results

# get embeddings
def get_embedding(text):
    embeddings = BedrockEmbeddings() #create a Titan Embeddings client
    
    return embeddings.embed_query(text)

Frontend UI

embeddings_search_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import embeddings_search_lib as glib #reference to local lib script

# titles
st.set_page_config(page_title="Embeddings Search", layout="wide") #HTML title
st.title("Embeddings Search") #page title

# add vector index to session cache
if 'vector_index' not in st.session_state: #see if the vector index hasn't been created yet
    with st.spinner("Indexing document..."): #show a spinner while the code in this with block runs
        st.session_state.vector_index = glib.get_index() #retrieve the index through the supporting library and store in the app's session cache

# inputs
input_text = st.text_input("Ask a question about Amazon SageMaker:") #display a single-line text input with a label
go_button = st.button("Go", type="primary") #display a primary button

# outputs
if go_button: #code in this if block will be run when the button is clicked
    
    with st.spinner("Working..."): #show a spinner while the code in this with block runs
        response_content = glib.get_similarity_search_results(index=st.session_state.vector_index, question=input_text)
        
        st.table(response_content) #using table so text will wrap
        
        
        raw_embedding = glib.get_embedding(input_text)
        
        with st.expander("View question embedding"):
            st.json(raw_embedding)

add requirements: anthropic

Run it: streamlit run embeddings_search_app.py --server.port 8080

Success

embeddings1

check out the embeddings values

embeddings2

T5 Personalised Recommendations

In a nutshell: user query -> similarity search against the vector store -> matched results go to the LLM for a "personalised summary".

Backend functions

recommendations_lib.py
from langchain_community.llms import Bedrock
from langchain_community.embeddings import BedrockEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import JSONLoader

# setup llm
def get_llm():
    
    model_kwargs = { #AI21
        "maxTokens": 1024, 
        "temperature": 0, 
        "topP": 0.5, 
        "stopSequences": [], 
        "countPenalty": {"scale": 0 }, 
        "presencePenalty": {"scale": 0 }, 
        "frequencyPenalty": {"scale": 0 } 
    }
    
    llm = Bedrock(
        model_id="ai21.j2-ultra-v1", #set the foundation model
        model_kwargs=model_kwargs) #configure the properties for AI21 Jurassic-2
    
    return llm

#function to identify the metadata to capture in the vectorstore and return along with the matched content
def item_metadata_func(record: dict, metadata: dict) -> dict: 

    metadata["name"] = record.get("name")
    metadata["url"] = record.get("url")

    return metadata

# in-memory vector store
def get_index(): #creates and returns an in-memory vector store to be used in the application
    
    embeddings = BedrockEmbeddings() #create a Titan Embeddings client
    
    loader = JSONLoader(
        file_path="services.json",
        jq_schema='.[]',
        content_key='description',
        metadata_func=item_metadata_func)

    text_splitter = RecursiveCharacterTextSplitter( #create a text splitter
        separators=["\n\n", "\n", ".", " "], #split chunks at (1) paragraph, (2) line, (3) sentence, or (4) word, in that order
        chunk_size=8000, #based on this content, we just want the whole item so no chunking - this could lead to an error if the content is too long
        chunk_overlap=0 #number of characters that can overlap with previous chunk
    )
    
    index_creator = VectorstoreIndexCreator( #create a vector store factory
        vectorstore_cls=FAISS, #use an in-memory vector store for demo purposes
        embedding=embeddings, #use Titan embeddings
        text_splitter=text_splitter, #use the recursive text splitter
    )
    
    index_from_loader = index_creator.from_loaders([loader]) #create a vector store index from the loaded JSON
    
    return index_from_loader #return the index to be cached by the client app

# call bedrock
def get_similarity_search_results(index, question):
    raw_results = index.vectorstore.similarity_search_with_score(question)
    
    llm = get_llm()
    
    results = []
    
    for res in raw_results:
        content = res[0].page_content
        prompt = f"{content}\n\nSummarize how the above service addresses the following needs : {question}"
        
        summary = llm.invoke(prompt)
        
        results.append({"name": res[0].metadata["name"], "url": res[0].metadata["url"], "summary": summary, "original": content})
    
    return results
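
Roughly what the JSONLoader settings in get_index() do to the file, sketched with the standard json module: each element of the top-level array becomes one document, its description becomes the content, and name and url travel along as metadata. The sample data and the load_records helper are assumptions of mine. I'm inferring the shape of services.json from the loader config, and the real loader may add metadata fields of its own.

```python
import json


def load_records(raw_json: str) -> list[dict]:
    """Turn a JSON array into {'page_content', 'metadata'} records."""
    records = []
    for item in json.loads(raw_json):  # roughly what jq '.[]' iterates over
        records.append({
            "page_content": item["description"],  # mirrors content_key='description'
            "metadata": {"name": item.get("name"), "url": item.get("url")},
        })
    return records


sample = json.dumps([
    {"name": "Service A", "url": "https://example.com/a", "description": "Stores objects."},
    {"name": "Service B", "url": "https://example.com/b", "description": "Runs containers."},
])
for record in load_records(sample):
    print(record["metadata"]["name"], "->", record["page_content"])
```

Only page_content gets embedded. The metadata is what lets the app render each hit as a named link.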

Frontend UI

recommendations_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import recommendations_lib as glib #reference to local lib script

# titles
st.set_page_config(page_title="Personalized Recommendations", layout="wide") #HTML title
st.title("Personalized Recommendations") #page title

# add vector index to session cache
if 'vector_index' not in st.session_state: #see if the vector index hasn't been created yet
    with st.spinner("Indexing document..."): #show a spinner while the code in this with block runs
        st.session_state.vector_index = glib.get_index() #retrieve the index through the supporting library and store in the app's session cache

# add inputs
input_text = st.text_input("Name some key features you need from a cloud service:") #display a multiline text box with no label
go_button = st.button("Go", type="primary") #display a primary button

# add outputs
if go_button: #code in this if block will be run when the button is clicked
    
    with st.spinner("Working..."): #show a spinner while the code in this with block runs
        response_content = glib.get_similarity_search_results(index=st.session_state.vector_index, question=input_text)
        
        for result in response_content:
            st.markdown(f"### [{result['name']}]({result['url']})")
            st.write(result['summary'])
            with st.expander("Original"):
                st.write(result['original'])

add requirements: jq

Run it: streamlit run recommendations_app.py --server.port 8080

Success

You can see the recommendation summary compared to the full service documentation in the 'Original' section.

personal

T6 Extract JSON

Backend functions

json_lib.py
import json
from json import JSONDecodeError
from langchain_community.llms import Bedrock

# get llm
def get_llm():

    llm = Bedrock( #create a Bedrock llm client
        model_id="ai21.j2-ultra-v1", #use the AI21 Jurassic-2 Ultra model
        model_kwargs = {"maxTokens": 1024, "temperature": 0.0 } #for data extraction, minimum temperature is best
    )

    return llm

# convert to JSON
def validate_and_return_json(response_text):
    try:
        response_json = json.loads(response_text) #attempt to load text into JSON
        return False, response_json, None #returns has_error, response_content, err 
    
    except JSONDecodeError as err:
        return True, response_text, err #returns has_error, response_content, err 

# call bedrock
def get_json_response(input_content): #text-to-text client function
    
    llm = get_llm()

    response = llm.invoke(input_content) #the text response for the prompt
    
    return validate_and_return_json(response)
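
The (has_error, response_content, err) contract is easy to try without calling Bedrock by feeding the validation function canned strings. The function below is copied from json_lib.py in condensed form; the two sample responses are made up.

```python
import json
from json import JSONDecodeError


def validate_and_return_json(response_text):
    # Same contract as json_lib.py: (has_error, response_content, err)
    try:
        return False, json.loads(response_text), None
    except JSONDecodeError as err:
        return True, response_text, err


good = validate_and_return_json('{"name": "Ana", "age": 30}')
bad = validate_and_return_json("Sure! Here is the JSON you asked for")

for has_error, content, err in (good, bad):
    print("error" if has_error else "ok", "|", content, "|", err)
```

On failure the raw text comes back untouched, so the UI can show both the parse error and what the model actually said.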

Frontend UI

json_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import json_lib as glib #reference to local lib script

# titles
st.set_page_config(page_title="Text to JSON", layout="wide")  #set the page width wider to accommodate columns
st.title("Text to JSON")  #page title
col1, col2 = st.columns(2)  #create 2 columns

# inputs, col layout left
with col1: #everything in this with block will be placed in column 1
    st.subheader("Prompt") #subhead for this column
    
    input_text = st.text_area("Input text", height=500, label_visibility="collapsed")

    process_button = st.button("Run", type="primary") #display a primary button

# output, col layout right
with col2: #everything in this with block will be placed in column 2
    st.subheader("Result") #subhead for this column
    
    if process_button: #code in this if block will be run when the button is clicked
        with st.spinner("Running..."): #show a spinner while the code in this with block runs
            has_error, response_content, err = glib.get_json_response(input_content=input_text) #call the model through the supporting library

        if not has_error:
            st.json(response_content) #render JSON if there was no error
        else:
            st.error(err) #otherwise render the error
            st.write(response_content) #and render the raw response from the model

add requirements:

Run it: streamlit run json_app.py --server.port 8080

Success

JSON

T7 Text to CSV

Backend functions

csv_lib.py
import pandas as pd
from io import StringIO
from langchain_community.llms import Bedrock

# setup llm
def get_llm():

    llm = Bedrock( #create a Bedrock llm client
        model_id="ai21.j2-ultra-v1", #use the AI21 Jurassic-2 Ultra model
        model_kwargs = {"maxTokens": 1024, "temperature": 0.0 } #for data extraction, minimum temperature is best
    )

    return llm

# convert result to pandas dataframe
def validate_and_return_csv(response_text):
    #returns has_error, response_content, err 
    try:
        csv_io = StringIO(response_text)
        return False, pd.read_csv(csv_io), None #attempt to load response CSV into a dataframe
    
    except Exception as err:
        return True, response_text, err

# call bedrock
def get_csv_response(input_content): #text-to-text client function
    
    llm = get_llm()

    response = llm.invoke(input_content) #the text response for the prompt
    
    return validate_and_return_csv(response)
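
If pandas isn't available, the same validate-then-return idea can be sketched with the standard csv module. Note this is a swapped-in technique, not what csv_lib.py does: it returns a list of row dicts instead of a DataFrame, and it rejects any row whose field count differs from the header, which is likely stricter than pd.read_csv in some cases. validate_csv is a hypothetical name.

```python
import csv
from io import StringIO


def validate_csv(response_text: str):
    """Return (has_error, content, err), where content is a list of row dicts on success."""
    rows = list(csv.reader(StringIO(response_text.strip())))
    if not rows:
        return True, response_text, ValueError("empty response")
    header, body = rows[0], rows[1:]
    for line_number, row in enumerate(body, start=2):
        if len(row) != len(header):  # ragged row: treat the whole response as invalid
            err = ValueError(f"line {line_number}: expected {len(header)} fields, got {len(row)}")
            return True, response_text, err
    return False, [dict(zip(header, row)) for row in body], None


print(validate_csv("name,qty\napple,3\npear,5\n"))
print(validate_csv("name,qty\napple,3,extra\n"))
```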

Frontend UI

csv_app.py
import streamlit as st #all streamlit commands will be available through the "st" alias
import csv_lib as glib #reference to local lib script

# titles
st.set_page_config(page_title="Text to CSV", layout="wide")  #set the page width wider to accommodate columns
st.title("Text to CSV")  #page title
col1, col2 = st.columns(2)  #create 2 columns

# inputs
with col1: #everything in this with block will be placed in column 1
    st.subheader("Prompt") #subhead for this column
    
    input_text = st.text_area("Input text", height=500, label_visibility="collapsed")

    process_button = st.button("Run", type="primary") #display a primary button

# outputs col layout, result table top, raw data bottom

with col2: #everything in this with block will be placed in column 2
    st.subheader("Result") #subhead for this column
    
    if process_button: #code in this if block will be run when the button is clicked
        with st.spinner("Running..."): #show a spinner while the code in this with block runs
            has_error, response_content, err = glib.get_csv_response(input_content=input_text) #call the model through the supporting library
        
        if not has_error:
            st.dataframe(response_content)
            
            csv_content = response_content.to_csv(index = False)
            
            st.markdown("#### Raw CSV")
            st.text(csv_content)
            
        else:
            st.error(err)
            st.write(response_content)

add requirements: anthropic

Run it: streamlit run csv_app.py --server.port 8080

Success

CSV

Troubleshooting

Error Messages

When I tried python3 ./params.py "anthropic.claude-v2" "Write a haiku:"

I got this error:

Traceback (most recent call last):
  File "/home/rxhackk/.local/lib/python3.10/site-packages/langchain_community/llms/bedrock.py", line 444, in _prepare_input_and_invoke
    response = self.client.invoke_model(**request_options)
  File "/home/rxhackk/.local/lib/python3.10/site-packages/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/rxhackk/.local/lib/python3.10/site-packages/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Malformed input request: #: extraneous key [max_tokens] is not permitted, please reformat your input and try again.

As of April 6th, 2024, the model's parameters have been updated (per the docs) to the following:

{
  "modelId": "anthropic.claude-v2:1",
  "contentType": "application/json",
  "accept": "*/*",
  "body": "{\"prompt\":\"\\n\\nHuman: Hello world\\n\\nAssistant:\",\"max_tokens_to_sample\":300,\"temperature\":0.5,\"top_k\":250,\"top_p\":1,\"stop_sequences\":[\"\\n\\nHuman:\"],\"anthropic_version\":\"bedrock-2023-05-31\"}"
}
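
One way to avoid this error is to build the request body per model family. The sketch below is an assumption-laden helper of mine (build_claude_body is not part of the workshop): it uses the text-completions shape shown above for claude-v2, and the messages shape with max_tokens for Claude 3 model IDs, which is my understanding of what those models expect on Bedrock.

```python
import json


def build_claude_body(model_id: str, prompt: str, max_tokens: int = 300) -> str:
    """Build an InvokeModel JSON body in the shape the given Claude model expects."""
    if model_id.startswith("anthropic.claude-3"):
        # Messages format (assumed for Claude 3 models)
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
    else:
        # Text-completions format, as in the claude-v2 request above
        body = {
            "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
            "max_tokens_to_sample": max_tokens,
            "stop_sequences": ["\n\nHuman:"],
            "anthropic_version": "bedrock-2023-05-31",
        }
    return json.dumps(body)


print(build_claude_body("anthropic.claude-v2:1", "Write a haiku:"))
print(build_claude_body("anthropic.claude-3-sonnet-20240229-v1:0", "Write a haiku:"))
```

The returned string would be passed as the body of an invoke-model call. The key point is that max_tokens only appears in the Claude 3 shape.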

I tested the other models, and their default params haven't changed:

cohere
/workshop/labs/params ❯ python3 ./params.py "cohere.command-text-v14" "Write a haiku:"                     .env at  12:11:23
 Haiku is a form of Japanese poetry that consists of three lines. The first line has five syllables, the second line has seven syllables, and the third line has five syllables. Here is an example of a haiku:

Spring rain opening
the silent flowers after
a cold, dry winter

Would you like me to write another haiku for you? 

meta
/workshop/labs/params ❯ python3 ./params.py "meta.llama2-13b-chat-v1" "Write a haiku:"          took  4s  .env at  12:16:26
The sun sets slowly
Golden hues upon the sea
Peaceful evening sky

mistral
/workshop/labs/params ❯ python3 ./params.py "mistral.mistral-7b-instruct-v0:2" "Write a haiku:"            .env at  12:17:09

Autumn leaves fall slow
Whispers of the wind’s song
Nature’s symphony

Haiku is a form of traditional Japanese poetry. It consists of three lines with a 5-7-5 syllable count. The haiku should capture a moment in nature and convey a sense of seasonality and imagery. In this haiku, I have tried to capture the feeling of autumn leaves falling slowly and the sound of the wind as it rustles through them. The phrase "Nature's symphony" is used to emphasize the beauty and harmony of the natural world during this season.

amazon
python3 ./params.py "amazon.titan-text-express-v1" "Write a haiku:"                                                                                          .env at  12:17:31
I am a
I am a bookworm
I read a lot
