Understanding OpenAI API Token & Request Limits
Building your first large language model-based application.
Are you embarking on the journey to create your first large language model (LLM) application? Welcome! This post will guide you through the basics and setup needed to tap into the OpenAI model server.
Join Semis today to network in AI & Bigtech
The Setup
Once you create your OpenAI platform account to use their API, navigate to Settings >> Limits; you'll likely see something like this.
Understanding Usage Tiers
OpenAI currently offers six usage tiers for API access. New accounts start on the free tier. Keep a keen eye on your usage tier, as it significantly limits your API consumption during development.
Each tier has its designated rate limit, a crucial consideration in your product development journey.
Selecting Your Model (LLM)
Your choice of model for the project plays a pivotal role in determining the rate limit and development timeline, especially if you’re crafting a proof of concept.
In this Medium post, I shared some strategies to help you navigate the model selection process.
Differences between Token Limit & Request Limit
In natural language processing, tokens are the units a model breaks text into: whole words, subwords, or individual characters. To illustrate, take a sentence like:
Adebisi is a Teacher in my secondary school in Kano, Nigeria.
A tokenizer can split this at the word level ("Adebisi", "is", "a", and so on), at the subword level ("Ade", "bisi"), or even into individual characters ("A", "d", "e", "b", "i", "s", "i").
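Exact counts require a model-specific tokenizer (OpenAI publishes the tiktoken library for this). As a rough, commonly cited rule of thumb, English text averages about four characters per token. A minimal estimator sketched on that assumption:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of
    thumb for English text. Use a real tokenizer (e.g. tiktoken) for
    exact counts against a specific model."""
    return max(1, len(text) // 4)

sentence = "Adebisi is a Teacher in my secondary school in Kano, Nigeria."
print(estimate_tokens(sentence))  # 61 characters // 4 -> 15
```

This is only a budgeting heuristic; actual token counts vary by model and language.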
With this understanding, your application should chunk documents into smaller pieces while keeping an eye on the total token volume. Why? Depending on your usage tier on OpenAI, there are token limits.
Imagine deploying your app and constantly hitting token limit errors — no one wants that!
The Difference: Token Limit & Request Limit
The token limit refers to the volume of tokens (text broken down for analysis) your application can send to the OpenAI server.
Conversely, the request limit is the maximum number of calls your application can make to the API server within a given period. Simple, right? Yet these request limits can make or break your application during development or production.
What’s Your Play?
First things first, decide on the usage tier for your project. Each tier caps token limits (TPM, tokens per minute) and request limits (RPM/RPD, requests per minute or per day).
Next up, decide on the model type. Take a glance at the image above; notice how each model has different TPM and RPM/RPD limits?
Choose one that aligns with your project’s token request needs.
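One way to stay under a tier's request cap is a client-side throttle. This sketch blocks once a 60-second sliding window is full; the RPM value here is illustrative, and your tier's real cap will differ:

```python
import time
from collections import deque

class RequestThrottle:
    """Client-side throttle that blocks until a call fits under a
    requests-per-minute cap."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest call leaves the window.
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())

throttle = RequestThrottle(rpm=3)
for _ in range(3):
    throttle.wait()  # a 4th call within the minute would block here
```

Call `throttle.wait()` immediately before each API request so the pacing lives in one place.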
Controlling the Request & Token Limits
Yes, you should design your language model application to pace its requests (for example, 30 requests per minute), with built-in waits or retries when it hits the limit. Always experiment.
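A common retry pattern is exponential backoff with jitter. In this sketch, a generic `RuntimeError` stands in for the SDK's actual rate-limit exception, and the delay values are illustrative:

```python
import random
import time

def call_with_backoff(request_fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a callable with exponential backoff plus jitter whenever
    it raises a (simulated) rate-limit error."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for the SDK's rate-limit error
            # Delay doubles each attempt; jitter avoids thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
    raise RuntimeError("rate limit: retries exhausted")

# Simulated flaky endpoint: fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limit")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # prints "ok"
```

In a real app, `request_fn` would wrap your actual API call, and you would catch the SDK's specific rate-limit exception rather than `RuntimeError`.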
Alternatively, you can design your application to gracefully handle set token limits, returning a fallback message or retrying when needed.
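One graceful-handling option is to trim oversized prompts before sending them rather than letting the API reject the request. This sketch reuses the ~4-characters-per-token heuristic; the budget value is an illustrative assumption:

```python
def fit_to_budget(prompt: str, max_tokens: int = 4000) -> str:
    """Trim a prompt to an approximate token budget (~4 chars/token
    heuristic) instead of letting an oversized request fail."""
    max_chars = max_tokens * 4
    if len(prompt) <= max_chars:
        return prompt
    # Keep the most recent text; a real app might summarize instead.
    return prompt[-max_chars:]

long_prompt = "x" * 20000
print(len(fit_to_budget(long_prompt, max_tokens=1000)))  # prints 4000
```

Truncating from the front keeps the newest context, which suits chat-style apps; for document analysis you would more likely chunk and summarize.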
Friendly Reminder: Always check your billing settings. It's easy to run out of OpenAI credits due to request and token usage, leaving you confused about failing API calls.
Let’s Connect!
Thanks a lot for reading. If you're on the lookout for a Machine Learning Engineer or AI Engineer consultant, don't hesitate to reach out. Book a session with me and let's dive into the exciting world of AI together.
You may also buy me a coffee to support my work.
Tolulade Ademisoye, ML Engineer & Consultant