gemini-3.5-flash: API, Pricing & Context Window | Vivgrid
gemini-3.5-flash on Vivgrid: Google's fast multimodal model with a 1M-token context window across text, image, video, audio and PDF.
gemini-3.5-flash is Google's fast, fully multimodal model, accepting text, image, video, audio, and PDF inputs within a 1M-token context window. It's a strong agent model for workflows that mix media types at scale.
On Vivgrid, gemini-3.5-flash runs as a globally centralized model reachable through the same OpenAI-compatible endpoint and unified key as every other model in the catalog.
Specifications
| Field | Value |
|---|---|
| Provider | Google |
| Model ID | gemini-3.5-flash |
| Best for | General-purpose |
| Context window | 1,000,000 tokens |
| Max output | 65,536 tokens |
| Modalities | Text, Image, Video, Audio, PDF |
| Tool / function calling | Yes |
| Knowledge cutoff | 2025-01 |
| Acceleration | Global (Centralized) |
Pricing
Pricing in USD per 1M tokens, matching the provider's rates.
| Input | Cached input | Output |
|---|---|---|
| $1.50 | $0.15 | $9.00 |
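To make the per-1M-token rates concrete, here is a minimal sketch that estimates the cost of a single request from the table above. The function name `estimate_cost_usd` and the `PRICE_PER_MILLION` mapping are hypothetical helpers, and the sketch assumes that cached input tokens are billed separately from uncached input tokens; check your actual invoice for how usage is counted.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "input": Decimal("1.50"),
    "cached_input": Decimal("0.15"),
    "output": Decimal("9.00"),
}


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
) -> Decimal:
    """Estimate the cost of one request in USD.

    Assumption: input_tokens counts only uncached input, and
    cached_input_tokens is billed at the cached rate instead.
    """
    million = Decimal(1_000_000)
    total = (
        PRICE_PER_MILLION["input"] * input_tokens
        + PRICE_PER_MILLION["cached_input"] * cached_input_tokens
        + PRICE_PER_MILLION["output"] * output_tokens
    )
    return total / million


if __name__ == "__main__":
    # 200K uncached input tokens and 4K output tokens.
    print(estimate_cost_usd(200_000, 4_000))
```

Under these assumptions, a request with 200,000 input tokens and 4,000 output tokens costs $0.30 for input plus $0.036 for output, or $0.336 in total. `Decimal` is used rather than `float` so that small per-request amounts add up without rounding drift.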
Quick start
Call gemini-3.5-flash through Vivgrid's unified, OpenAI-compatible endpoint. Get an API key from the Vivgrid Console.
curl https://api.vivgrid.com/v1/chat/completions \
  -H "Authorization: Bearer $VIVGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3.5-flash",
    "messages": [
      { "role": "user", "content": "Say hello in English, Chinese and Spanish." }
    ],
    "stream": true
  }'
Ideal use cases
- Multimodal agents combining video, audio, images, and PDFs
- Long-context understanding up to 1M tokens
- Fast, high-volume reasoning and summarization
- Media-heavy ingestion and analysis pipelines
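For the multimodal use cases above, a request could pair a text prompt with an image. The sketch below builds such a request body for the quick start endpoint using only the Python standard library. The content-part layout (`"type": "text"` and `"type": "image_url"`) follows the OpenAI Chat Completions convention; that Vivgrid accepts it unchanged for this model is an assumption, and `build_image_request` and `send` are hypothetical helper names.

```python
import json
import os
import urllib.request

# Endpoint taken from the quick start section.
API_URL = "https://api.vivgrid.com/v1/chat/completions"


def build_image_request(prompt: str, image_url: str) -> dict:
    """Build a chat-completions body with one text part and one image part.

    Assumption: the endpoint accepts OpenAI-style content parts.
    """
    return {
        "model": "gemini-3.5-flash",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send(body: dict, api_key: str) -> dict:
    """POST the body and return the decoded JSON response (non-streaming)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_image_request(
        "Describe this image.", "https://example.com/chart.png"
    )
    key = os.environ.get("VIVGRID_API_KEY")
    if key:
        # Only contacts the API when a key is configured.
        reply = send(body, key)
        print(reply["choices"][0]["message"]["content"])
    else:
        # Without a key, just show the payload that would be sent.
        print(json.dumps(body, indent=2))
```

Keeping payload construction separate from the network call makes the request shape easy to inspect and test without spending tokens. The example URL is a placeholder, and the response path `choices[0].message.content` again assumes an OpenAI-style reply.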
Related models
- gemini-3.1-pro-preview: higher-end Gemini for coding
- gemini-3-flash-preview: prior flash generation
- gpt-5.4-mini: comparable fast multimodal option
gpt-4o
gpt-4o on Vivgrid: OpenAI's multimodal GPT-4o with a 128K context window and image input, accessible through one unified API key.
gemini-3.1-pro-preview
gemini-3.1-pro-preview on Vivgrid: Google's high-end multimodal coding model with a ~1M-token context window across text, image, video, audio and PDF.