Tell Flash to slow down and think—quality jumps, cost rises.
Input tokens represent the text you send to the model for processing. Output tokens represent the model's generated response. Pricing is set by Google DeepMind and reflects current API rates as of October 2025.
Gemini 2.5 Flash (Thinking) by Google DeepMind is a multimodal AI model for chat, writing, and understanding images or audio.
Google DeepMind
Multimodal
Supports text, images, and audio
2,000,000 tokens
Maximum tokens per request
$0.15
per 1M input tokens
$3.50
per 1M output tokens
Gemini 2.5 Flash (Thinking) is a multimodal large language model from Google DeepMind designed for real‑world applications where speed, quality and cost all matter. It’s priced at $0.15 per million input tokens and $3.50 per million output tokens, so teams can estimate usage‑based spend with simple math.
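That "simple math" can be sketched in a few lines of Python (rates copied from the pricing above; the token counts in the example are hypothetical):

```python
# Usage-based cost estimate for Gemini 2.5 Flash (Thinking).
# Rates taken from the pricing listed above; example token counts are hypothetical.
INPUT_RATE = 0.15 / 1_000_000   # USD per input token
OUTPUT_RATE = 3.50 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10,000-token prompt that returns a 1,000-token reply.
cost = estimate_cost(10_000, 1_000)
print(f"${cost:.4f}")  # $0.0015 input + $0.0035 output = $0.0050
```

Because output tokens cost roughly 23x more than input tokens here, trimming verbose responses usually saves more than trimming prompts.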
The model supports a context window of about 2,000,000 tokens, which is enough for long chats, multi‑document prompts, or passing rich system instructions without constant truncation.
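A quick pre-flight check can confirm a multi-document prompt fits the window before you send it. This sketch uses the common 4-characters-per-token heuristic, which is only an approximation; for billing-accurate counts, use the provider's own token counter:

```python
# Rough check that a batch of documents fits the ~2,000,000-token window.
# len(text) // 4 is a coarse heuristic, NOT the model's real tokenizer.
CONTEXT_WINDOW = 2_000_000

def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if the documents (plus headroom for the reply) fit the window."""
    used = sum(approx_tokens(d) for d in documents)
    return used + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context(["word " * 50_000]))  # ~62,500 tokens: fits comfortably
```

Reserving headroom for the response matters because the output also consumes part of the request budget on most APIs.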
Because it accepts images (and in some stacks, audio), developers can build experiences like visual question‑answering, document parsing with screenshots, or voice chat that feels instant.
Customer support, education, and creative tools benefit from the faster response times and broader modality coverage.
When choosing a stack, many teams compare Gemini 2.5 Flash (Thinking) against OpenAI's GPT‑4o. The trade‑off usually comes down to latency tolerance, budget per request, and whether you need image understanding or deeper chain‑of‑thought style reasoning. If you're cost‑sensitive, keep prompts concise, cache system instructions, and stream outputs so users perceive faster responses. If quality is the priority, add brief exemplars and explicit success criteria to the prompt; small amounts of guidance often yield outsized gains.
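One way to bake exemplars and explicit success criteria into a prompt, per the advice above (the template layout is illustrative, not part of any SDK):

```python
def build_prompt(task: str,
                 exemplars: list[tuple[str, str]],
                 criteria: list[str]) -> str:
    """Assemble a prompt with few-shot exemplars and explicit success criteria."""
    lines = ["You are a concise assistant.", "", "Success criteria:"]
    lines += [f"- {c}" for c in criteria]
    lines.append("")
    for question, answer in exemplars:
        lines += [f"Q: {question}", f"A: {answer}", ""]
    lines += [f"Q: {task}", "A:"]
    return "\n".join(lines)

prompt = build_prompt(
    "Summarize the attached invoice in one sentence.",
    exemplars=[("Summarize: 'Meeting at 3pm about Q3.'",
                "A 3pm meeting to discuss Q3.")],
    criteria=["One sentence only", "Keep all figures exact"],
)
print(prompt)
```

Keeping the template in code (rather than hand-editing prompts) also makes the comparison tests suggested below easier to run repeatedly.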
For production, pair the model with guardrails (content filters, schema validators) and log prompts and responses for offline evaluation. To control spend, consider tiering workloads: route routine queries to a cheaper sibling and reserve this model for complex or customer‑visible moments. Add retries with temperature control, and prefer JSON mode or tool calling for structured outputs that slot directly into your pipeline without brittle parsing. Finally, build simple comparison tests from five to ten representative tasks in your app to verify this model's answers, latency, and cost against your alternatives before you commit.
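The tiering idea can be sketched as a small router. The model identifiers and the complexity heuristic below are assumptions for illustration; a real router would plug these names into your provider's SDK call:

```python
# Route routine queries to a cheaper sibling; reserve the thinking model
# for complex or customer-visible requests. Model ids are assumed names.
CHEAP_MODEL = "gemini-2.5-flash-lite"        # hypothetical cheaper sibling
PREMIUM_MODEL = "gemini-2.5-flash-thinking"  # hypothetical model id

def choose_model(prompt: str, customer_visible: bool) -> str:
    """Pick a model tier using a naive length/keyword complexity heuristic."""
    complex_prompt = len(prompt) > 2_000 or "step by step" in prompt.lower()
    return PREMIUM_MODEL if (customer_visible or complex_prompt) else CHEAP_MODEL

print(choose_model("What's our refund policy?", customer_visible=False))
print(choose_model("Walk me through this contract step by step.",
                   customer_visible=True))
```

In practice, teams often replace the keyword heuristic with a tiny classifier or a confidence score from the cheap model itself, escalating only when it is unsure.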
Gemini 2.5 Flash (Thinking) is a multimodal AI model from Google DeepMind. Pricing is $0.15 per 1M input tokens and $3.50 per 1M output tokens. The context window is roughly 2,000,000 tokens, allowing long prompts and documents. Common uses include chat assistants, summarization, knowledge search, report drafting, and, when supported, image understanding or tool use. It integrates well into web apps, backends, and automation pipelines where latency and reliability matter.
Join the users who have switched to API.chat and are saving on AI expenses while enjoying a better experience.