Google's Gemini API has a free tier that's good enough for real work — transcription, summaries, chat, even audio processing. But there's a catch: if you hit the rate limits too often, Google doesn't just throttle you. They revoke your API key entirely, and you have to create a new one.

This guide shows you how to get a free Gemini API key, what the limits actually are, and how to write code that stays within them. We'll use a pattern we developed for mono that has run for months without a single key revocation.

The short version

  1. Free tier: 10 requests/minute and 250 requests/day
  2. Don't wait for 429 errors: count requests yourself
  3. Save your daily count to a file so it survives restarts
  4. Add a 30-minute backoff if you ever do get a 429
  5. Have a fallback provider ready for when you hit the limit

Step 1 — Get a free API key

Go to Google AI Studio and sign in with your Google account. Click Create API key, pick a project (or create one), and copy the key.

[Screenshot: the Create API key dialog in Google AI Studio. Name the key something you'll recognize, then click Create.]

That's it — no credit card, no billing setup. The key works immediately with the free tier limits.

Keep your key safe. Anyone with your key can use your quota (and get it revoked). Store it in an environment variable or a secrets manager, not in your code.
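
One way to follow this advice is to read the key from the environment at startup and fail early if it is missing. This is a minimal sketch: the variable name GEMINI_API_KEY and the load_api_key helper are our own choices for illustration, not something Google requires.

```python
import os


def load_api_key(env=os.environ, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise if it is not set."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before starting the app")
    return key
```

Failing at startup is usually easier to debug than a request that fails later with an authentication error.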

Step 2 — Understand the limits

The free tier has two limits that matter:

Limit                      Value  Resets
RPM (requests per minute)  10     Rolling 60-second window
RPD (requests per day)     250    Midnight Pacific time

There's also a token limit (about 1 million tokens per minute), but in practice you'll hit RPM or RPD first unless you're sending huge documents.
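
If you want to tell users (or your logs) when the daily quota comes back, you can compute the time left until the next midnight Pacific. The sketch below assumes the reset time in the table above is accurate; seconds_until_daily_reset is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def seconds_until_daily_reset(now=None):
    """Seconds from `now` (an aware datetime) until the next midnight Pacific."""
    now = (now or datetime.now(timezone.utc)).astimezone(PACIFIC)
    tomorrow = now.date() + timedelta(days=1)
    next_midnight = datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=PACIFIC)
    # Subtract in UTC so daylight-saving changes are counted correctly
    delta = next_midnight.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return delta.total_seconds()
```

Passing `now` explicitly makes the function easy to test; in normal use you would call it with no arguments.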

The real risk isn't the 429. When you exceed the limit, Google returns a 429 "Too Many Requests" error. That's fine — you can retry later. The problem is that if your code keeps hitting 429s, Google's abuse detection kicks in and revokes your key. Then you need to create a new one, and your old code stops working.

Step 3 — Track your usage proactively

The key insight is: don't wait for 429 errors. Instead, count your requests yourself and stop before you hit the limit. This way Google never sees you as abusive.

Here's the pattern in Python:

import time
from datetime import datetime
from zoneinfo import ZoneInfo

# Rate limiting state
RPM_LIMIT = 9   # Stay under 10
RPD_LIMIT = 240 # Stay under 250

request_timestamps = []  # Timestamps of requests in the last 60s
daily_requests = 0
last_reset_date = None

def is_available():
    """Check if we can make a request without hitting limits."""
    global daily_requests, last_reset_date

    # Reset daily counter at midnight Pacific
    today = datetime.now(ZoneInfo("America/Los_Angeles")).date()
    if last_reset_date != today:
        daily_requests = 0
        last_reset_date = today
        # Keep request_timestamps: the per-minute window rolls across midnight

    # Check daily limit
    if daily_requests >= RPD_LIMIT:
        return False

    # Check per-minute limit (sliding window)
    cutoff = time.time() - 60
    while request_timestamps and request_timestamps[0] < cutoff:
        request_timestamps.pop(0)

    if len(request_timestamps) >= RPM_LIMIT:
        return False

    return True

def track_request():
    """Record that we made a request."""
    global daily_requests
    daily_requests += 1
    request_timestamps.append(time.time())

Before every API call, check is_available(). If it returns False, don't make the request — either wait, or fall back to another provider.
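
If you choose to wait, it helps to know how long. The sketch below is one way to compute that from the same list of timestamps; seconds_until_slot is a hypothetical helper, and it only covers the per-minute window, not the daily limit.

```python
def seconds_until_slot(timestamps, now, limit=9, window=60.0):
    """How long to wait before one more request fits in the rolling window.

    `timestamps` are send times in ascending order, in the same units as `now`.
    """
    recent = [t for t in timestamps if t > now - window]
    if len(recent) < limit:
        return 0.0
    # The request at this index must age out before a slot opens
    blocking = recent[len(recent) - limit]
    return max(0.0, blocking + window - now)
```

You could then call time.sleep(seconds_until_slot(request_timestamps, time.time())) and check is_available() again afterwards.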

Why use 9 and 240 instead of 10 and 250?

Buffer room. Network timing isn't perfect, and Google's counting might differ slightly from yours. Staying a bit under the limit means you never accidentally go over.

Step 4 — Persist the counter across restarts

If your app restarts, your in-memory counter resets to zero — but Google's counter doesn't. If you've used 200 requests today and your app restarts, it thinks it has 240 left when it actually has 50. That's a fast track to a 429.

The fix: save your daily count to a file.

import json
from datetime import date, datetime
from pathlib import Path
from zoneinfo import ZoneInfo

STATE_FILE = Path("gemini_rate.json")

def save_state():
    STATE_FILE.write_text(json.dumps({
        "date": last_reset_date.isoformat(),
        "requests": daily_requests,
    }))

def load_state():
    global daily_requests, last_reset_date
    try:
        data = json.loads(STATE_FILE.read_text())
        saved_date = date.fromisoformat(data["date"])
        today = datetime.now(ZoneInfo("America/Los_Angeles")).date()
        if saved_date == today:
            daily_requests = data["requests"]
            last_reset_date = today
    except (FileNotFoundError, json.JSONDecodeError, KeyError, ValueError):
        pass  # Missing, corrupt, or malformed state file: start fresh

Call load_state() when your app starts, and save_state() after each request. Now restarts don't lose your count.
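
Because the file is rewritten after every request, a crash in the middle of a write could leave it truncated. One possible hardening, sketched below, is to write to a temporary file and then rename it over the real one. The save_state_atomic name and its parameters are ours; it takes the date and count as arguments so it can stand alone.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state_atomic(path, day, requests):
    """Write the counter to a temp file, then swap it into place."""
    path = Path(path)
    payload = json.dumps({"date": day.isoformat(), "requests": requests})
    # Create the temp file in the same directory so the rename stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(payload)
        os.replace(tmp_name, path)  # Replaces the old file in a single rename
    except BaseException:
        os.unlink(tmp_name)  # Don't leave a stray temp file behind
        raise
```

A reader of the file then sees either the old count or the new one, never a half-written mix.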

Step 5 — Handle 429s gracefully

Even with proactive counting, you might still get a 429. Maybe Google's clock differs from yours, or there's a bug in your code. When that happens, don't just retry immediately — that makes things worse.

from datetime import datetime, timedelta

unavailable_until = None

def mark_rate_limited():
    """Back off for 30 minutes after a 429."""
    global unavailable_until
    unavailable_until = datetime.now() + timedelta(minutes=30)
    print(f"Got 429 — backing off until {unavailable_until}")

def is_available():
    # ... existing checks ...

    # Check if we're in a backoff period
    if unavailable_until and datetime.now() < unavailable_until:
        return False

    # ... rest of function ...

Why 30 minutes? Google's throttling after a 429 is inconsistent — sometimes requests pass, sometimes they don't. A longer backoff avoids hammering the API while it's already unhappy with you.
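
The snippet above measures the backoff with the wall clock, which can jump if the system time changes. As an alternative sketch, not the version used in this guide, the same idea can be built on a monotonic clock. The Backoff class and its method names are hypothetical.

```python
import time


class Backoff:
    """Tracks a cool-down period after a 429 using a monotonic clock."""

    def __init__(self, seconds=30 * 60, clock=time.monotonic):
        self.seconds = seconds
        self.clock = clock
        self.until = None  # Clock reading at which the cool-down ends

    def trigger(self):
        """Start (or restart) the cool-down."""
        self.until = self.clock() + self.seconds

    def active(self):
        """True while requests should still be held back."""
        return self.until is not None and self.clock() < self.until
```

Call trigger() when you see a 429 and check active() inside is_available(). Like the original, this state lives in memory only, so a restart clears the backoff.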

Step 6 — Add a fallback provider

The free tier is great, but 250 requests per day isn't infinite. For a production app, you need a fallback for when the quota runs out.

The pattern is simple:

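# `gemini` is the rate-limit module built in the steps above. `openai` and
# RateLimitError are placeholders for your fallback client and for whatever
# your Gemini client raises on a 429.
def make_llm_request(prompt):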
    if gemini.is_available():
        try:
            response = gemini.generate(prompt)
            gemini.track_request()
            return response
        except RateLimitError:
            gemini.mark_rate_limited()

    # Fall back to paid provider
    return openai.generate(prompt)

This way you use the free tier when available and automatically switch when it's not.
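
To try the pattern without real API clients, you can pass the providers in as plain functions. The sketch below is our own restatement under that assumption; generate_with_fallback, its parameters, and this RateLimitError class are illustrative names rather than part of any SDK.

```python
class RateLimitError(Exception):
    """Raised by the primary provider when it receives a 429."""


def generate_with_fallback(prompt, primary, fallback, is_available, on_rate_limited):
    """Try the primary provider if it has quota; otherwise use the fallback."""
    if is_available():
        try:
            return primary(prompt)
        except RateLimitError:
            on_rate_limited()  # Start the backoff, then fall through
    return fallback(prompt)
```

Injecting the functions this way also lets you unit test the switch-over logic with stubs instead of live network calls.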

Complete example

Here's everything together in a single module:

import json
import os
import time
from datetime import datetime, date, timedelta
from pathlib import Path
from zoneinfo import ZoneInfo
import httpx

GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]  # Read from the environment, never hard-code it
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent"
STATE_FILE = Path("gemini_rate.json")
PACIFIC = ZoneInfo("America/Los_Angeles")

RPM_LIMIT = 9
RPD_LIMIT = 240

_timestamps = []
_daily = 0
_last_date = None
_backoff_until = None

def _load():
    global _daily, _last_date
    try:
        data = json.loads(STATE_FILE.read_text())
        if date.fromisoformat(data["date"]) == datetime.now(PACIFIC).date():
            _daily = data["requests"]
            _last_date = datetime.now(PACIFIC).date()
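    except (FileNotFoundError, json.JSONDecodeError, KeyError, ValueError):
        pass  # Missing or malformed state file: start fresh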

def _save():
    STATE_FILE.write_text(json.dumps({
        "date": (_last_date or datetime.now(PACIFIC).date()).isoformat(),
        "requests": _daily
    }))

def is_available():
    global _daily, _last_date, _timestamps

    # Load state on first call
    if _last_date is None:
        _load()

    # Reset the daily count at midnight Pacific
    today = datetime.now(PACIFIC).date()
    if _last_date != today:
        _daily = 0
        _last_date = today
        # Keep _timestamps: the per-minute window rolls across midnight
        _save()

    # Check backoff
    if _backoff_until and datetime.now(PACIFIC) < _backoff_until:
        return False

    # Check daily limit
    if _daily >= RPD_LIMIT:
        return False

    # Check per-minute limit
    cutoff = time.time() - 60
    _timestamps = [t for t in _timestamps if t >= cutoff]
    if len(_timestamps) >= RPM_LIMIT:
        return False

    return True

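class RateLimitError(Exception):
    """Raised when Gemini answers with HTTP 429."""

def generate(prompt):
    global _daily, _backoff_until

    response = httpx.post(
        API_URL,
        json={"contents": [{"parts": [{"text": prompt}]}]},
        headers={"x-goog-api-key": GEMINI_API_KEY},
        timeout=60
    )

    if response.status_code == 429:
        # Start the 30-minute backoff so is_available() returns False
        _backoff_until = datetime.now(PACIFIC) + timedelta(minutes=30)
        raise RateLimitError("Gemini returned 429; backing off for 30 minutes")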

    response.raise_for_status()

    # Track successful request
    _daily += 1
    _timestamps.append(time.time())
    _save()

    data = response.json()
    return data["candidates"][0]["content"]["parts"][0]["text"]

FAQ

What if I need more than 250 requests per day?

You have three options: create multiple API keys (risky — Google may notice), upgrade to a paid tier (starts at $0.075 per million input tokens), or use a fallback provider for overflow.

Does the free tier work for audio transcription?

Yes. Gemini 2.5 Flash can transcribe audio directly — send the audio as base64 and ask it to transcribe. Same rate limits apply.
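
As a rough sketch of what "send the audio as base64" can look like, the helper below builds a request body in the same shape as the text-only one in the complete example. The inline_data and mime_type field names reflect our understanding of the REST API and should be checked against the current reference; build_transcription_payload is a hypothetical name.

```python
import base64


def build_transcription_payload(audio_bytes, mime_type="audio/mp3"):
    """Build a generateContent request body with inline base64 audio."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": "Transcribe this audio."},
                # Assumed field names; verify against the API reference
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
            ]
        }]
    }
```

You would pass the result as the json= argument of the same POST request. Inline data increases the request size, so long recordings may need a different upload path.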

Why Pacific time for the daily reset?

That's when Google resets the counter. If you're in another timezone and reset at your local midnight, your count won't match Google's.

Can I use this for a production app?

With a fallback, yes. Without one, you'll hit the daily limit during heavy use. The free tier is best for personal projects or as a cost-saver alongside a paid provider.