Lately, I have been working on a client project that uses the Gemini API. Since I am on the list of early testers for Gemini 1.5 Pro, I tried to implement the system with Gemini 1.5 Pro instead of Gemini 1.0 Pro. The function relying on the Gemini API extracts key insights (such as a short summary) from the long text of books.
While doing so, I noticed something odd. The documentation for Gemini on Google’s site (linked here) states that the Gemini 1.5 Pro model (preview only) has a maximum input limit of 1,048,576 tokens and a maximum output of 8,192 tokens.
However, during my tests with standard parameters (code attached), the model would not accept more than about 10,000 characters, roughly equivalent to +/- 5,000 tokens or less. Beyond that, it responds with an error:
import google.generativeai as genai

GOOGLE_API_KEY = "XXXXXXXX"  # Your API key, connected to a Google Cloud project
genai.configure(api_key=GOOGLE_API_KEY)

# Relax the safety filters so long-form book text is less likely to be blocked
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

# Set up the model
generation_config = {
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 1,
    "max_output_tokens": 2000,
}

model_name = "gemini-1.5-pro-latest"  # or "gemini-pro"
model = genai.GenerativeModel(model_name=model_name,
                              generation_config=generation_config,
                              safety_settings=safety_settings)

# `guidelines` (the instruction prompt) and `content` (the book text) are defined elsewhere
messages = [
    guidelines,
    content,
]
response = model.generate_content(messages)
Error: “The response.text quick accessor only works for simple (single-part) text responses. This response is not simple text. Use the result.parts accessor or the full result.candidates[index].content.parts lookup instead.”
This is a perplexing situation, because when I try the same prompt with ‘trimmed’ text, e.g., 6,000 characters, it sometimes works. Doing some research, I found the same error reported by other users (here, here, and here).
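The error message itself hints at what to check: when .text fails, the response carries no simple text part, often because the candidate was blocked or cut off. As a minimal diagnostic sketch (assuming the response object from the code above), you can inspect the prompt feedback and candidates directly:

# Minimal diagnostic sketch: inspect why response.text is unavailable.
# Assumes `response` is the result of model.generate_content(messages) above.
print(response.prompt_feedback)  # shows whether the prompt itself was blocked

for candidate in response.candidates:
    print(candidate.finish_reason)   # e.g. STOP, MAX_TOKENS, SAFETY
    print(candidate.safety_ratings)  # per-category safety verdicts
    for part in candidate.content.parts:
        print(part.text)             # the raw text parts, if any exist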
Here is the intriguing question: Can the Gemini model ‘really’ process the advertised 1,048,576 tokens? If so, how? Is it an API backend problem or a byte-encoding issue (like those faced with GPT-3.5 Turbo)? For now, I haven’t found answers to these questions, though I am still experimenting.
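One way to narrow this down is to ask the model itself how many tokens it counts for a given input before calling generate_content. A small sketch using the SDK’s count_tokens method (assuming the model and content variables from earlier):

# Sketch: check the model's own token count for the input before sending it.
# Assumes the `model` and `content` variables from the code above.
token_info = model.count_tokens(content)
print(token_info.total_tokens)  # compare against the documented 1,048,576-token limit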
The temporary fix is to divide the long text into multiple sub-chunks of roughly 5,000 tokens at most. On the other hand, the Gemini API is still limited to 2 requests per minute on the free tier and 5 RPM for paid users, so you can’t really push this to production.
def get_chunked_transcript(text, limit=1000):
    """Return the longest prefix of `text` that fits within `limit` UTF-8 bytes,
    splitting on word boundaries where possible."""
    words = text.split()
    chunked_text = ''
    current_bytes = 0
    for word in words:
        word_bytes = len((word + ' ').encode('utf-8'))  # including the trailing space
        if current_bytes + word_bytes <= limit:
            chunked_text += word + ' '
            current_bytes += word_bytes
        else:
            # If the next word alone exceeds the limit, add as much of it as fits
            if word_bytes > limit:
                for i in range(len(word)):
                    if current_bytes + len((word[:i + 1] + ' ').encode('utf-8')) > limit:
                        break
                    chunked_text += word[i]
            break  # stop once no more words fit within the limit
    return chunked_text.strip()
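Note that this returns only the first chunk. In practice you would walk through the whole text, summarizing one chunk at a time and throttling requests to stay under the rate limit. Here is a rough sketch of that loop (split_into_chunks, book_text, the prompt string, and the sleep interval are my own illustrative choices, not part of the original code):

import time

def split_into_chunks(text, limit=20000):
    # Illustrative helper: split the whole text into successive byte-limited chunks.
    # Whitespace is normalized first so each chunk is an exact prefix of the remainder.
    chunks = []
    remaining = ' '.join(text.split())
    while remaining:
        chunk = get_chunked_transcript(remaining, limit=limit)
        if not chunk:
            break
        chunks.append(chunk)
        remaining = remaining[len(chunk):].strip()
    return chunks

summaries = []
for chunk in split_into_chunks(book_text, limit=20000):  # book_text is your long input
    response = model.generate_content(["Summarize the key insights:", chunk])
    summaries.append(response.text)
    time.sleep(30)  # crude throttle: ~2 requests per minute on the free tier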
We can also implement a similar approach using LangChain’s RecursiveCharacterTextSplitter.
from langchain.text_splitter import RecursiveCharacterTextSplitter

def text_to_docs(data):  # `data` is a LangChain Document
    content = data.page_content
    base_meta = data.metadata
    # Use larger chunks (with more overlap) for longer documents
    if len(content) > 20000:
        chunk_size = 9000
        chunk_overlap = 2000
    else:
        chunk_size = 5000
        chunk_overlap = 700
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["[", "\n\n", "\n", " ", ""],
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        is_separator_regex=False,
    )
    docs_index = text_splitter.create_documents(texts=[content], metadatas=[base_meta])
    return docs_index
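As a quick usage sketch (the Document construction and book_text here are illustrative), wrap the raw text in a LangChain Document, split it, and send each chunk’s page_content to Gemini:

from langchain.schema import Document

# Illustrative usage: wrap raw text in a Document, then split and summarize it
doc = Document(page_content=book_text, metadata={"source": "my_book"})
for chunk_doc in text_to_docs(doc):
    response = model.generate_content(["Summarize the key insights:", chunk_doc.page_content])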
These limits are also quite disruptive. Someone building an app on Gemini can’t realistically scale at 5 RPM; I suggest implementing a tier system like the one provided by OpenAI, which scales according to the use case. Anyway, updates will come.