Gemma 3 in your browser
Full inference for Gemma 3 270M-Instruct (FP16), running entirely on-device via WebGPU. Optimized with KV-caching and JIT kernel fusion powered by jax-js.
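KV-caching keeps each past token's key/value projections in memory so a decode step only computes attention for the newest token instead of re-running the whole prefix. A minimal sketch of the idea in TypeScript (the names `KVCache`, `append`, and `attend` are illustrative, not the jax-js API, and softmax is omitted for brevity):

```typescript
type Vec = number[];

const dot = (a: Vec, b: Vec): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

class KVCache {
  keys: Vec[] = [];
  values: Vec[] = [];

  // Each decode step appends the new token's key/value once,
  // instead of recomputing projections for the whole prefix.
  append(k: Vec, v: Vec): void {
    this.keys.push(k);
    this.values.push(v);
  }

  // Score the new query against every cached key (raw dot
  // products; a real head would softmax and mix the values).
  attend(q: Vec): number[] {
    return this.keys.map((k) => dot(q, k));
  }
}
```

With a cache, generating N tokens costs one attention row per step rather than recomputing an N×N attention matrix each time.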
View source on GitHub

Inference
Explore the model
Run a forward pass and inspect attention weights layer by layer. Select any attention head to see how the model attends across tokens.
Open →

Chat
Talk to the model
Interact with Gemma 3 through a chat interface. Responses stream token-by-token, generated entirely on your machine.
Open →