Skip to main content

IBM watsonx.ai

LiteLLM supports all IBM watsonx.ai foundational models and embeddings.

Environment Variables

os.environ["WATSONX_URL"] = ""  # (required) Base URL of your WatsonX instance
# (required) either one of the following:
os.environ["WATSONX_APIKEY"] = "" # IBM cloud API key
os.environ["WATSONX_TOKEN"] = "" # IAM auth token
# optional - can also be passed as params to completion() or embedding()
os.environ["WATSONX_PROJECT_ID"] = "" # Project ID of your WatsonX instance
os.environ["WATSONX_DEPLOYMENT_SPACE_ID"] = "" # ID of your deployment space to use deployed models

See here for more information on how to get an access token to authenticate to watsonx.ai.

Usage

Open In Colab
import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""

response = completion(
  model="watsonx/ibm/granite-13b-chat-v2",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  project_id="<my-project-id>" # or pass with os.environ["WATSONX_PROJECT_ID"]
)

response = completion(
  model="watsonx/meta-llama/llama-3-8b-instruct",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  project_id="<my-project-id>"
)

Usage - Streaming

import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""

response = completion(
  model="watsonx/ibm/granite-13b-chat-v2",
  messages=[{ "content": "what is your favorite colour?","role": "user"}],
  stream=True
)
for chunk in response:
  print(chunk)

Example Streaming Output Chunk

{
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "delta": {
        "content": "I don't have a favorite color, but I do like the color blue. What's your favorite color?"
      }
    }
  ],
  "created": null,
  "model": "watsonx/ibm/granite-13b-chat-v2",
  "usage": {
    "prompt_tokens": null,
    "completion_tokens": null,
    "total_tokens": null
  }
}

Usage - Models in deployment spaces

Models that have been deployed to a deployment space (e.g.: tuned models) can be called using the deployment/<deployment_id> format (where <deployment_id> is the ID of the deployed model in your deployment space).

The ID of your deployment space must also be set in the environment variable WATSONX_DEPLOYMENT_SPACE_ID or passed to the function as space_id=<deployment_space_id>.

import litellm
response = litellm.completion(
    model="watsonx/deployment/<deployment_id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    space_id="<deployment_space_id>"
)

Usage - Embeddings

LiteLLM also supports making requests to IBM watsonx.ai embedding models. The credential needed for this is the same as for completion.

from litellm import embedding

response = embedding(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input=["What is the capital of France?"],
    project_id="<my-project-id>"
)
print(response)
# EmbeddingResponse(model='ibm/slate-30m-english-rtrvr', data=[{'object': 'embedding', 'index': 0, 'embedding': [-0.037463713, -0.02141933, -0.02851813, 0.015519324, ..., -0.0021367231, -0.01704561, -0.001425816, 0.0035238306]}], object='list', usage=Usage(prompt_tokens=8, total_tokens=8))

OpenAI Proxy Usage

Here's how to call IBM watsonx.ai with the LiteLLM Proxy Server

1. Save keys in your environment

export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_PROJECT_ID=""

2. Start the proxy

$ litellm --model watsonx/meta-llama/llama-3-8b-instruct

# Server running on http://0.0.0.0:4000

3. Test it

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "llama-3-8b",
      "messages": [
        {
          "role": "user",
          "content": "what is your favorite colour?"
        }
      ]
    }
'

Authentication

Passing credentials as parameters

You can also pass the credentials as parameters to the completion and embedding functions.

import os
from litellm import completion

response = completion(
            model="watsonx/ibm/granite-13b-chat-v2",
            messages=[{ "content": "What is your favorite color?","role": "user"}],
            url="",
            api_key="",
            project_id=""
)

Supported IBM watsonx.ai Models

Here are some examples of models available in IBM watsonx.ai that you can use with LiteLLM:

Mode NameCommand
Flan T5 XXLcompletion(model=watsonx/google/flan-t5-xxl, messages=messages)
Flan Ul2completion(model=watsonx/google/flan-ul2, messages=messages)
Mt0 XXLcompletion(model=watsonx/bigscience/mt0-xxl, messages=messages)
Gpt Neoxcompletion(model=watsonx/eleutherai/gpt-neox-20b, messages=messages)
Mpt 7B Instruct2completion(model=watsonx/ibm/mpt-7b-instruct2, messages=messages)
Starcodercompletion(model=watsonx/bigcode/starcoder, messages=messages)
Llama 2 70B Chatcompletion(model=watsonx/meta-llama/llama-2-70b-chat, messages=messages)
Llama 2 13B Chatcompletion(model=watsonx/meta-llama/llama-2-13b-chat, messages=messages)
Granite 13B Instructcompletion(model=watsonx/ibm/granite-13b-instruct-v1, messages=messages)
Granite 13B Chatcompletion(model=watsonx/ibm/granite-13b-chat-v1, messages=messages)
Flan T5 XLcompletion(model=watsonx/google/flan-t5-xl, messages=messages)
Granite 13B Chat V2completion(model=watsonx/ibm/granite-13b-chat-v2, messages=messages)
Granite 13B Instruct V2completion(model=watsonx/ibm/granite-13b-instruct-v2, messages=messages)
Elyza Japanese Llama 2 7B Instructcompletion(model=watsonx/elyza/elyza-japanese-llama-2-7b-instruct, messages=messages)
Mixtral 8X7B Instruct V01 Qcompletion(model=watsonx/ibm-mistralai/mixtral-8x7b-instruct-v01-q, messages=messages)

For a list of all available models in watsonx.ai, see here.

Supported IBM watsonx.ai Embedding Models

Model NameFunction Call
Slate 30membedding(model="watsonx/ibm/slate-30m-english-rtrvr", input=input)
Slate 125membedding(model="watsonx/ibm/slate-125m-english-rtrvr", input=input)

For a list of all available embedding models in watsonx.ai, see here.