Troubleshooting Gemini-Pro response.text Error: Insights and Workarounds

By Seifeur Guizeni - CEO & Founder

Lately, I have been working on a project involving the Gemini API for one of my clients. Since I am on the list of early testers for Gemini 1.5 Pro, I was attempting to implement the system using Gemini 1.5 Pro instead of Gemini 1.0 Pro. The function relying on the Gemini API was responsible for extracting key insights (like a short summary) from the long text of books.

That said, I noticed something a bit odd. The documentation for Gemini on Google’s site (linked here) indicates that the Gemini 1.5 Pro model (preview only) has a maximum input limit of 1,048,576 tokens and a maximum output of 8,192 tokens.

However, during my tests with normal parameters (code attached), I noticed that the model does not, in practice, accept more than 10,000 characters, which is roughly equivalent to 5,000 tokens or less. Instead, it responds with an error:

import google.generativeai as genai

GOOGLE_API_KEY = "XXXXXXXX"  # Your API key, tied to your Google Cloud project

genai.configure(api_key=GOOGLE_API_KEY)

# Set up the model
safety_settings = [
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_ONLY_HIGH"
  },
]

generation_config = {
  "temperature": 0.7,
  "top_p": 1,
  "top_k": 1,
  "max_output_tokens": 2000,
}

model_name = "gemini-1.5-pro-latest"  # or "gemini-pro"

model = genai.GenerativeModel(model_name=model_name,
                              generation_config=generation_config,
                              safety_settings=safety_settings)

messages = [
    guidelines,  # the instructions given to the model (defined elsewhere)
    content,     # the long book text to summarise (defined elsewhere)
]

response = model.generate_content(messages)

Error: “The response.text quick accessor only works for simple (single-part) text responses. This response is not simple text. Use the result.parts accessor or the full result.candidates[index].content.parts lookup instead.”

This is a really perplexing situation, because when I try the same prompt with ‘trimmed’ text, e.g., 6,000 characters, it sometimes works. In doing some research, I found that the same error appears for other users (here, here, and here).
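
Before chunking anything, a useful first step is to read the response through the candidates/parts path that the error message itself points to, and to print the prompt feedback and finish reason, which usually reveal whether a safety filter or an empty candidate is behind the failure. Here is a minimal sketch; the extract_text helper is my own, not part of the SDK:

def extract_text(response):
    # Report prompt-level blocking (e.g. safety filters), if any
    if response.prompt_feedback and response.prompt_feedback.block_reason:
        print("Prompt blocked:", response.prompt_feedback.block_reason)
        return None

    if not response.candidates:
        print("No candidates returned.")
        return None

    candidate = response.candidates[0]
    print("Finish reason:", candidate.finish_reason)

    # Join whatever text parts exist instead of relying on the .text accessor
    parts = candidate.content.parts
    return "".join(part.text for part in parts if hasattr(part, "text")) or None

summary = extract_text(response)
if summary:
    print(summary)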

Here is the intriguing question: can the Gemini model ‘really’ process the advertised 1,048,576 tokens? If yes, how? Is it an API backend problem or a byte-encoding issue (like those faced with GPT-3.5 Turbo)? For now, I haven’t found any answers to these questions, though I am still experimenting.
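
One thing I can at least verify on my side is how many tokens the prompt really contains before sending it; the SDK exposes a count_tokens method on the model. A quick sanity-check sketch, reusing the model and content placeholders from the snippet above:

# Sanity check: how many tokens does the API itself count for this prompt?
token_info = model.count_tokens(content)
print("Prompt tokens:", token_info.total_tokens)

# Compare against the documented 1,048,576-token context window
if token_info.total_tokens > 1_048_576:
    print("Prompt exceeds the documented context window for gemini-1.5-pro.")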

The temporary fix is to divide the long text into multiple sub-chunks of at most around 5,000 tokens each. On the other hand, the Gemini API is still limited to 2 requests per minute (5 RPM for paid users), so you can’t really push this to production.

def text_to_binary_string(text):
    # Represent each character as an 8-bit binary string (a rough size proxy)
    return ''.join(format(ord(char), '08b') for char in text)

def get_chunked_transcript(text, limit=1000):
    # Returns the leading chunk of `text` whose binary-string length stays within
    # `limit` (roughly limit / 8 characters); call it repeatedly on the remaining
    # text to produce successive chunks.
    words = text.split()
    chunked_text = ''
    current_bytes = 0

    for word in words:
        word_bytes = len(text_to_binary_string(word + ' '))  # include the trailing space
        if current_bytes + word_bytes <= limit:
            chunked_text += word + ' '
            current_bytes += word_bytes
        else:
            # If the next word would exceed the limit, see whether a portion of it still fits
            if word_bytes > limit:
                for i in range(len(word)):
                    if current_bytes + len(text_to_binary_string(word[:i + 1] + ' ')) > limit:
                        break
                    chunked_text += word[i]
                break  # Exit the loop after adding a portion of the word
            else:
                break  # Exit the loop once no further words fit within the limit

    return chunked_text.strip()
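
To give an idea of how I string this together, here is a rough sketch that peels chunks off the front of the text, summarizes each one, and sleeps between calls to stay under the low request-per-minute quota. The summarize_long_text name, the 40,000 limit (roughly 5,000 characters with the bit-based count above), and the 35-second sleep are my own choices:

import time

def summarize_long_text(model, guidelines, full_text, limit=40000):
    remaining = " ".join(full_text.split())  # normalise whitespace once
    summaries = []

    while remaining:
        chunk = get_chunked_transcript(remaining, limit=limit)
        if not chunk:
            break
        remaining = remaining[len(chunk):].lstrip()

        response = model.generate_content([guidelines, chunk])
        text = extract_text(response)  # helper defined earlier
        if text:
            summaries.append(text)

        time.sleep(35)  # crude throttle for a ~2 requests/minute limit

    return "\n\n".join(summaries)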

We can also implement a similar approach using LangChain’s RecursiveCharacterTextSplitter.

from langchain.text_splitter import RecursiveCharacterTextSplitter

def text_to_docs(data):  # data is a LangChain Document
    content = data.page_content
    base_meta = data.metadata

    # Use larger chunks (and more overlap) for very long texts
    if len(content) > 20000:
        chunk_size = 9000
        chunk_overlap = 2000
    else:
        chunk_size = 5000
        chunk_overlap = 700

    text_splitter = RecursiveCharacterTextSplitter(
        separators=["[", "\n\n", "\n", " ", ""],
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        is_separator_regex=False,
    )
    docs_index = text_splitter.create_documents(texts=[content], metadatas=[base_meta])
    return docs_index
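
A hypothetical usage, wrapping a raw book text in a LangChain Document and then summarizing each chunk with the Gemini model configured earlier (long_book_text and guidelines are placeholders for your own text and instructions):

from langchain.docstore.document import Document

book = Document(page_content=long_book_text, metadata={"title": "Example Book"})

for chunk in text_to_docs(book):
    response = model.generate_content([guidelines, chunk.page_content])
    # Collect the per-chunk summaries here, throttling requests as in the earlier sketch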

I don’t know why these limits are set so low, but they are quite disruptive. Someone building an app on Gemini can’t really scale at 5 RPM; I suggest implementing a tier system like the one provided by OpenAI, which scales limits according to the use case. Anyway, updates will come.
