Gemma

What is Gemma?

Gemma is a family of open models for developers who need adaptable AI for generation, retrieval, and safety checks. It includes Gemma 4 for text, audio, and image input, EmbeddingGemma, and ShieldGemma 2, with support for on-device and cloud deployment. The stack works with JAX, Keras, PyTorch, TensorFlow, and vLLM, and is available through Kaggle Models, Hugging Face, and Google Cloud.

Last verified: May 17, 2026

At a glance

Best for: Developers who need open models they can adapt and deploy across devices and cloud.
API: Yes. Low-code APIs for common AI tasks are offered through MediaPipe Tasks, covering generative AI, vision, text, and audio.

What does Gemma do?

Gemma is a family of lightweight open models that can be adapted for general and specialized tasks, from generation to retrieval and safety checks. The family includes Gemma 4 for text, audio, and image input, EmbeddingGemma for numerical text representations, and ShieldGemma 2 for policy-based safety evaluation. Google positions the stack around developer tools that support responsible use, customization, and deployment across applications and hardware.

Gemma supports over 140 languages and a 128K context window, with some variants reaching up to 256K. The models can run on mobile devices, hosted services, or your own hardware, with documented deployment paths for Google Cloud, Android, iOS, web, and embedded environments. Models are available for discovery on Kaggle Models and Hugging Face, and Google Cloud options such as Vertex AI Model Garden, Cloud Run, and GKE cover serving and fine-tuning.
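EmbeddingGemma is described as producing numerical text representations for retrieval. As a rough illustration of how such vectors are typically used, the sketch below ranks documents by cosine similarity to a query. It does not call EmbeddingGemma: the three-dimensional vectors, the document ids, and the function names are hypothetical placeholders, and real embeddings would come from the model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as having no similarity.
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


# Hypothetical vectors standing in for real embedding output.
DOC_VECS = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "gift-cards": [0.0, 0.2, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.1]

RANKING = rank_documents(QUERY_VEC, DOC_VECS)
print(RANKING)
```

In a real pipeline the document vectors would usually be computed once and stored in a vector index, with only the query embedded at request time.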

Why use Gemma?

Open model weights let teams customize Gemma for specific tasks instead of treating it as a fixed hosted service.
The same model family spans on-device, cloud, web, and embedded deployment, reducing reimplementation across targets.
Gemma 4 combines text, audio, and image input with long-context support, so teams can build multimodal experiences in one stack.
EmbeddingGemma and ShieldGemma 2 cover retrieval and safety workflows without forcing separate model families.
Google Cloud deployment paths and self-hosting support give teams flexibility over where models run and how they scale.

Who is Gemma for?

ML engineers who want open models they can tune for specific tasks and workflows.
Mobile developers who need on-device AI that works across Android and iOS.
Platform teams who want deployment options spanning cloud, web, and embedded environments.
Applied researchers who need lightweight models with long context and multilingual coverage.
Safety teams who need policy-based evaluation for generative AI outputs.

What are Gemma's key features?

Maximum compute and memory efficiency

Designed for compute and memory efficiency, helping teams fit Gemma into tighter budgets and smaller deployments.

Unprecedented intelligence-per-parameter

Positioned as delivering strong performance relative to parameter count, so teams can get capable models from smaller footprints, with faster inference and lower infrastructure demand.

On device

Supports on-device deployment for Android, iOS, and Chrome, keeping inference local for lower latency and better control over sensitive data.

Cross-platform

Works across Android, iOS, Chrome, Cloud Run, and Google Kubernetes Engine, letting teams ship the same model across mobile, browser, and server environments.

Multi-framework

Fits existing ML stacks with JAX, Keras, PyTorch, TensorFlow, vLLM, and MaxText, reducing migration work for teams standardizing on different frameworks.

Full AI edge stack

Provides a full AI edge stack with MediaPipe Tasks APIs for generative AI, vision, text, and audio, plus low-code deployment paths for edge apps.

Deploy custom models cross-platform

Lets teams convert, deploy, and quantize custom models for cross-platform use, including Vertex AI Model Garden, Vertex AI Training Clusters, and Hugging Face workflows.
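Quantization matters mostly because of weight memory. As a back-of-the-envelope sketch, weight storage is roughly the parameter count times the bits per weight, divided by 8 to get bytes. The parameter count below is a hypothetical example, not the size of any specific Gemma model, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Hypothetical parameter count, chosen only for illustration.
PARAMS = 4_000_000_000

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(PARAMS, bits):.2f} GiB")
```

Halving the bits per weight halves the weight memory, which is why a 4-bit variant can fit on devices where a 16-bit one cannot. Quality after quantization still needs to be checked on your own prompts or datasets.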

Long context window

Handles a 128K context window, with some variants reaching up to 256K, which suits larger prompts, document workflows, and extended conversations.
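A quick way to reason about these limits is to check whether a prompt plus the expected output fits in the window before sending it. The sketch below is illustrative only: it assumes "128K" means 128 × 1024 tokens, and it estimates tokens with a crude four-characters-per-token heuristic rather than the model's actual tokenizer. Both are assumptions, as are the function names.

```python
import math

# Assumption: "K" is read as 1024 tokens; the exact limits may differ.
CONTEXT_128K = 128 * 1024
CONTEXT_256K = 256 * 1024


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; a real tokenizer gives the true count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(prompt_tokens, context_window, reserved_for_output=1024):
    """True if the prompt plus reserved output tokens fit in the window."""
    return prompt_tokens + reserved_for_output <= context_window


# Hypothetical document of 600,000 characters.
document = "x" * 600_000
tokens = estimate_tokens(document)
print(tokens, fits_in_context(tokens, CONTEXT_128K), fits_in_context(tokens, CONTEXT_256K))
```

When a document does not fit, the usual options are to pick a variant with a longer window, or to chunk the document and retrieve only the relevant passages.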

What does Gemma integrate with?

Android
iOS
Chrome
JAX
Keras
PyTorch
TensorFlow
Vertex AI Model Garden
Cloud Run
Google Kubernetes Engine
Agent Development Kit
Vertex AI Training Clusters
vLLM
MaxText
Sovereign Cloud
Kaggle Models
Hugging Face

What are Gemma's use cases?

Mobile AI on-device

Mobile developers use Gemma to ship AI features that run directly on Android and iOS, keeping experiences responsive without a round trip to the cloud. Multi-framework support helps the models fit existing app stacks.

Custom model tuning

ML engineers use Gemma to adapt open models for specific tasks and workflows, then convert, quantize, and deploy them to move from prototype to production across cloud and edge targets. The models' compute and memory efficiency helps keep experiments practical on limited hardware.

Long-context research

Applied researchers use Gemma to test lightweight models on multilingual and long-document workloads, using the long context window to study performance on extended prompts. The lightweight design makes it easier to iterate on experiments without heavy infrastructure.

Policy checks for outputs

Safety teams use ShieldGemma 2 to evaluate generative AI outputs against policy requirements and inspect responses in controlled deployment environments. This helps teams catch unsafe content before it reaches users.
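The policy-check pattern described here can be sketched as a gate that sits between a generator and the user. The example below is a minimal sketch and does not use ShieldGemma 2 or its API: the `keyword_scorer` function is a toy stand-in for a safety model, and the policy names, threshold, and class names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    violated_policy: Optional[str]
    score: float


def check_output(
    content: str,
    scorer: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> PolicyDecision:
    """Block content when any policy's violation score reaches the threshold."""
    scores = scorer(content)
    if not scores:
        return PolicyDecision(True, None, 0.0)
    worst_policy = max(scores, key=scores.get)
    worst_score = scores[worst_policy]
    if worst_score >= threshold:
        return PolicyDecision(False, worst_policy, worst_score)
    return PolicyDecision(True, None, worst_score)


def keyword_scorer(content: str) -> Dict[str, float]:
    """Toy stand-in for a safety model: scores content by keyword only."""
    lowered = content.lower()
    return {
        "dangerous_content": 0.9 if "explosive" in lowered else 0.05,
        "harassment": 0.8 if "idiot" in lowered else 0.05,
    }


print(check_output("Here is a cake recipe.", keyword_scorer))
print(check_output("How to build an explosive", keyword_scorer))
```

Passing the scorer in as a callable keeps the gate independent of any one model, so a real safety classifier could replace the stub without changing the calling code.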

How does Gemma work?

Pick a model in the Gemma family that matches your task, then review its size, context length, and supported modalities before you start building.
Load or convert the model in your preferred framework (JAX, Keras, PyTorch, or TensorFlow) so it fits existing training and inference workflows.
Quantize and optimize the model for compute and memory efficiency, then validate quality with your own prompts or datasets before moving to deployment.
Deploy the model to Android, iOS, Chrome, Cloud Run, or Google Kubernetes Engine, using the MediaPipe Tasks APIs for edge delivery.
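The first step above, matching a model to the task by size, context length, and modalities, can be expressed as a simple filter. This is a sketch under stated assumptions: the catalog entries, their names, and their numbers are hypothetical placeholders rather than real Gemma variants, and `pick_model` is an illustrative helper, not part of any Gemma tooling.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    params_billions: float
    context_tokens: int
    modalities: FrozenSet[str]


def pick_model(
    options: List[ModelOption],
    required_modalities: FrozenSet[str],
    min_context: int,
    max_params_billions: float,
) -> Optional[ModelOption]:
    """Return the smallest option that meets every constraint, or None."""
    candidates = [
        option
        for option in options
        if required_modalities <= option.modalities
        and option.context_tokens >= min_context
        and option.params_billions <= max_params_billions
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda option: option.params_billions)


# Hypothetical catalog; the values are placeholders, not real model specs.
CATALOG = [
    ModelOption("model-small", 2.0, 128 * 1024, frozenset({"text"})),
    ModelOption("model-medium", 8.0, 128 * 1024, frozenset({"text", "image"})),
    ModelOption("model-large", 30.0, 256 * 1024, frozenset({"text", "image", "audio"})),
]

choice = pick_model(CATALOG, frozenset({"text", "image"}), 100_000, 10.0)
print(choice.name if choice else "no match")
```

Choosing the smallest model that satisfies the constraints keeps memory and latency low, which matters most for on-device targets.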

Frequently asked questions

What is Gemma used for? Who is it for?

Gemma is used for text generation, retrieval, and safety checks, with efficient models that can run on-device or in the cloud. It's built for ML engineers, mobile developers, and platform teams.

Does Gemma have an API and what does it integrate with?

Yes. Low-code APIs for common AI tasks are offered through MediaPipe Tasks, covering generative AI, vision, text, and audio. Gemma integrates with Android, iOS, Chrome, JAX, Keras, and 12 more.

Filed under: AI Model Providers, local-ai, open-source, self-hosted

Explore other AI Model Providers

Fireworks AI

Open-model inference, tuning, and deployment in one cloud stack.

Fireworks AI runs open-model inference, tuning, and deployment with Code Assistance, Search, and Agentic Systems. Starts with $1 in free credits.

Poe

Compare AI models, generate media, and search the web in one app.

Poe lets you compare AI models, generate media, and search the web in one app with GPT-4, Claude, Gemini, Runway, and Ideogram.

Claude API

Claude gateway for developers with SDK-compatible routing and low latency.

Claude API routes Claude requests with SDK-compatible access, multi-region routing, and usage-based pricing from $0.8.

OpenAI

AI platform for chat, voice, and API workflows.

OpenAI combines ChatGPT, voice, Web search, and Containers. Pricing starts at $5.00 per 1M tokens for GPT-5.5.

Together AI

AI cloud for inference, fine-tuning, and GPU clusters.

Together AI runs serverless inference, fine-tuning, and GPU clusters for AI teams. Usage-based tiers start at $0.30 for MiniMax M2.7.