LLM

Gemma

Gemma 3

Gemma3 270M Instruct
ID: gemma-3-270m-it
by Google
Gemma License

A lightweight 270M-parameter LLM from Google, optimized for efficient chatbot interactions, code assistance, and text generation directly in the browser (see the usage sketch after this entry).

Ultra-Fast
Lightweight
Chatbot
478
49
543.46 MB
WebGPU
WASM
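
As a minimal sketch of how a small instruct model like this can be driven in the browser: the catalog does not name its JavaScript runtime, so the example below assumes Transformers.js (@huggingface/transformers); the ONNX repo id, dtype, and device options are illustrative assumptions, not values taken from this catalog.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js as the runtime and a hypothetical browser-ready
// ONNX repo id; the catalog does not specify either.
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  { device: "webgpu", dtype: "q4" } // use "wasm" where WebGPU is unavailable
);

const messages = [
  { role: "user", content: "Summarize what WebGPU is in one sentence." },
];
const output = await generator(messages, { max_new_tokens: 64 });

// The reply is the last message appended to the chat history.
console.log(output[0].generated_text.at(-1).content);
```

The same pattern applies to the other chat models in this list; only the repo id and quantization options change.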
Gemma3 1B Instruct
ID: gemma-3-1b-it
by Google
Gemma License

Gemma3 1B Instruct is a medium-scale instruction-tuned language model from Google’s Gemma 3 family. With 1 billion parameters, it strikes a balance between efficiency and capability, making it suitable for chatbot dialogues, code assistance, and general-purpose text generation. While more powerful than compact models like Gemma3 270M, it remains lightweight compared to larger-scale LLMs. Current WebAI support for this model is limited: only quantized precision is available, and only on select devices (primarily WASM). Broader device and precision support may be added in future updates.

High Performance
Multi-Language
23
5
1018.38 MB
WASM

SmolLM

SmolLM V2

SmolLM V2 135M Instruct
New
ID: smollm2-135m-instruct
by HuggingFace
Apache 2.0

An ultra-lightweight, fast, and compact language model designed for basic reasoning and text generation tasks. Ideal for deployment on mobile and edge devices.

Ultra-Fast
Lightweight
Mobile Friendly
72
0
112.24 MB
WebGPU
WASM
SmolLM V2 360M Instruct
New
ID: smollm2-360m-instruct
by HuggingFace
Apache 2.0

With 360 million parameters, this model strikes a balance between speed and performance, making it lightweight, fast, and well-suited for a wide range of tasks on edge devices.

Fast
Lightweight
Balanced Performance
258
24
260.10 MB
WebGPU
WASM
SmolLM V2 1.7B Instruct
New
ID: smollm2-1.7b-instruct
by HuggingFace
Apache 2.0

The most capable model in the SmolLM V2 family, able to handle complex language tasks and generate high-quality text while remaining fast on edge devices.

High Performance
Knowledgeable
Fast
Structured Output
876
42
1.03 GB
WebGPU

SmolLM V3

SmolLM V3 3B Instruct
New
ID: smollm3-3b-instruct
by HuggingFace
Apache 2.0

SmolLM3 is a 3B-parameter language model designed to push the boundaries of small models. It supports dual-mode reasoning, six languages, and long context.

SOTA Performance
Deep Reasoning
564
35
1.98 GB
WebGPU

Qwen

Qwen3

Qwen V3 0.6B
New
ID: qwen3-0.6b
by QwenLM
Apache 2.0

A performant LLM with 0.6 billion parameters, optimized for lightweight tasks like chatbot interactions and code assistance, balancing efficiency and capability.

High Performance
Deep Thinking
Structured Output
Fast
678
79
543.39 MB
WebGPU
WASM

DeepSeek

DeepSeek R1

DeepSeek R1 1.5B
ID: deepseek-r1-distill-qwen-1.5b
by DeepSeek AI
MIT

A powerful LLM designed for reasoning-intensive tasks. It excels at code generation, essay writing, and a variety of other applications. Optimized for speed and efficiency.

High Performance
Deep Think
Multi-Language
Fast
546
20
1.28 GB
WebGPU

Llama

LLaMA 3.2

LLaMA 3.2 1B Instruct
ID: llama-3.2-1b-instruct
by Meta
Llama 3.2 Community

A performant LLM with 1 billion parameters, optimized for lightweight tasks like chatbot interactions and code assistance, balancing efficiency and capability.

High Performance
Multi-Language
Fast
932
41
1.15 GB
WebGPU

NLP

Text Embedding

Flag Embedding

Flag Embedding Small 1.5 (English)
ID: bge-small-en-v1.5
by BAAI
MIT

Lightweight and optimized for English, this model delivers fast, low-latency text embeddings—ideal for local RAG, semantic search, recommendations, and language understanding (see the embedding sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
RAG
232
24
32.44 MB
WebGPU
WASM
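
To make the RAG and semantic-search use case concrete, here is a minimal embedding-plus-cosine-similarity sketch. It assumes Transformers.js and the Xenova/bge-small-en-v1.5 ONNX repo; neither is specified by this catalog.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumption: Transformers.js with a community ONNX export of bge-small-en-v1.5.
const embed = await pipeline("feature-extraction", "Xenova/bge-small-en-v1.5", {
  device: "webgpu", // or "wasm"
});

const query = "How do I run GPU compute in a web page?";
const docs = [
  "WebGPU exposes compute shaders to the browser.",
  "Bananas are rich in potassium.",
];

// Mean-pooled, L2-normalized sentence embeddings.
const vectors = await embed([query, ...docs], { pooling: "mean", normalize: true });
const [q, ...d] = vectors.tolist();

// On normalized vectors, cosine similarity reduces to a dot product.
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const scores = d.map((v: number[]) => dot(q, v)); // higher = more relevant to the query
console.log(scores);
```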
Flag Embedding Small 1.5 (Chinese)
ID: bge-small-zh-v1.5
by BAAI
MIT

Lightweight and optimized for Chinese, this model delivers fast, low-latency text embeddings—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
RAG
42
0
22.90 MB
WebGPU
WASM
Flag Embedding Base 1.5 (English)
ID: bge-base-en-v1.5
by BAAI
MIT

Optimized for English, this model delivers fast, low-latency text embeddings with more balanced performance—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Balanced Performance
Mobile Friendly
RAG
212
29
91.53 MB
WebGPU
WASM
Flag Embedding Base 1.5 (Chinese)
ID: bge-base-zh-v1.5
by BAAI
MIT

Optimized for Chinese, this model delivers fast, low-latency text embeddings with more balanced performance—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Balanced Performance
RAG
12
0
98.10 MB
WebGPU
WASM
Flag Embedding M3 (Multi-Lingual)
ID: bge-m3
by BAAI
MIT

Supports over 100 languages, offering fast, balanced text embeddings across multilingual content—ideal for global RAG, cross-lingual search, and multilingual understanding.

High Performance
100+ Languages
RAG
Fast
782
43
543.30 MB
WebGPU
WASM

Gemma 3

Gemma Embedding 300M (Multi-Lingual)
ID: gemma-embedding-300m
by Google
Gemma License

A compact 300M parameter embedding model from Google, optimized for multilingual text embeddings with low latency—perfect for RAG, semantic search, clustering, and recommendation systems.

SOTA
Ultra-Fast
100+ Languages
RAG
1.4k
20
167.28 MB
WebGPU
WASM

Qwen3

Qwen3 Embedding 0.6B (Multi-Lingual)
ID: qwen3-embedding-0.6b
by QwenLM
Apache 2.0

Compact 0.6B parameter embedding model optimized for efficiency and speed. Delivers quality text embeddings for RAG applications, semantic search, and content understanding with minimal resource requirements.

SOTA
100+ Languages
RAG
Fast
1.9k
62
541.17 MB
WebGPU
WASM

Jina Embeddings

Jina Embeddings V2 Small (English)
ID: jina-embeddings-v2-small-en
by Jina AI
Apache 2.0

Designed for English, this lightweight model provides fast, efficient embeddings with strong semantic performance—ideal for local RAG, search, and retrieval tasks on resource-constrained systems.

Ultra-Fast
Mobile Friendly
Semantic Search
363
5
91.53 MB
WebGPU
WASM

MiniLM

All MiniLM L6 V2
ID: all-minilm-l6-v2
by HuggingFace
Apache 2.0

One of the fastest and most efficient text embedding models, very popular on HuggingFace, ideal for semantic search, clustering, and classification tasks. It provides high-quality embeddings with low latency.

Ultra-Small
Ultra-Fast
Mobile Friendly
Semantic Search
1.2k
47
21.91 MB
WebGPU
WASM

Cross-encoder Reranker

Flag Reranker

Flag Reranker Base (English & Chinese)
ID: bge-reranker-base
by BAAI
MIT

A cross-encoder reranker model that improves the ranking of documents by re-evaluating them in the context of a given query, specifically designed for English and Chinese (see the reranking sketch after this entry).

Ultra-Fast
High Performance
RAG
1.9k
73
266.36 MB
WebGPU
WASM
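
A reranker is typically used after an embedding retriever: the cross-encoder re-scores each query/document pair. A minimal sketch, assuming Transformers.js and the Xenova/bge-reranker-base ONNX repo (both assumptions; the catalog does not name its runtime):

```ts
import {
  AutoTokenizer,
  AutoModelForSequenceClassification,
} from "@huggingface/transformers";

// Assumption: a community ONNX export of bge-reranker-base.
const model_id = "Xenova/bge-reranker-base";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);

const query = "What is the capital of France?";
const docs = [
  "Paris is the capital and largest city of France.",
  "Bananas are rich in potassium.",
];

// Score each query/document pair with the cross-encoder.
const inputs = tokenizer(new Array(docs.length).fill(query), {
  text_pair: docs,
  padding: true,
  truncation: true,
});
const { logits } = await model(inputs);
const scores = logits.sigmoid().tolist().map((s: number[]) => s[0]); // higher = more relevant
console.log(scores);
```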
Flag Reranker Large
ID: bge-reranker-large
by BAAI
MIT

A cross-encoder reranker model that improves the ranking of documents by re-evaluating them in the context of a given query, specifically designed for English and Chinese languages.

Ultra-Fast
High Performance
RAG
86
10
536.86 MB
WebGPU
WASM

Sentiment Analysis

BERT

BERT Sentiment Analysis (English)
ID: distilbert-base-uncased-finetuned-sst-2-english
by HuggingFace
Apache 2.0

An ultra-fast, lightweight DistilBERT model fine-tuned on SST-2 for sentiment analysis in English, ideal for real-time applications with low latency requirements (see the sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
2.3k
72
64.25 MB
WebGPU
WASM
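
As a quick illustration of running such a classifier in the browser, here is a hedged sketch assuming Transformers.js (the catalog does not name its runtime) and a community ONNX export of the model:

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const classify = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
  { device: "webgpu" } // or "wasm"
);

const result = await classify("The new release is impressively fast.");
console.log(result); // e.g. [{ label: "POSITIVE", score: 0.99 }]
```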

RoBERTa

Financial Sentiment Analysis (English)
ID: distilroberta-finetuned-financial-news-sentiment-analysis
by Manuel Romero
Apache 2.0

An ultra-fast, lightweight DistilRoBERTa model fine-tuned on financial news for sentiment analysis in English, ideal for real-time applications with low latency requirements.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Finance
72
4
78.91 MB
WebGPU
WASM

Audio

Speech Recognition

OpenAI Whisper

Whisper Tiny
ID: whisper-tiny
by OpenAI
MIT

The smallest and fastest Whisper model, with surprisingly good accuracy. Ideal for real-time audio transcription and note-taking in resource-constrained environments; supports word-level timestamps (see the transcription sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Multi-Language
338
12
38.94 MB
WebGPU
WASM
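
To show the word-level timestamps mentioned above, here is a minimal transcription sketch; Transformers.js, the Xenova/whisper-tiny repo id, and the audio URL are all assumptions.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js as the runtime and a community ONNX export of whisper-tiny.
const transcribe = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny",
  { device: "webgpu" } // or "wasm"
);

// Input can be a URL or a Float32Array of 16 kHz mono samples.
const out = await transcribe("https://example.com/meeting.wav", {
  return_timestamps: "word", // per-word [start, end] timestamps
  chunk_length_s: 30,        // chunk longer audio before decoding
});

console.log(out.text);
console.log(out.chunks); // [{ text: "Hello", timestamp: [0.0, 0.42] }, ...]
```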
Whisper Tiny (English Only)
ID: whisper-tiny-en
by OpenAI
MIT

Optimized specifically for English, this is the fastest Whisper model—perfect for real-time English transcription and note-taking in resource-constrained environments; supports word-level timestamps.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
225
28
38.94 MB
WebGPU
WASM
Whisper Base
ID: whisper-base
by OpenAI
MIT

Delivers better accuracy than Whisper Tiny while still running at high speed — ideal for noisier audio or more demanding transcription tasks; supports word-level timestamps.

Lightweight
Ultra-Fast
Balanced Accuracy
Multi-Language
2.1k
69
73.31 MB
WebGPU
WASM
Whisper Base (English Only)
ID: whisper-base-en
by OpenAI
MIT

Specifically optimized for English, offering improved English transcription accuracy while maintaining high-speed performance; supports word-level timestamps.

Lightweight
Ultra-Fast
Balanced Accuracy
115
1
73.31 MB
WebGPU
WASM
Whisper Large V3 Turbo
ID: whisper-large-v3-turbo
by OpenAI
MIT

The most powerful and accurate Whisper model — hardware-intensive yet still fast, delivering top-tier performance for accuracy-critical tasks like subtitles and dubbing; supports word-level timestamps.

Most Powerful
Most Accurate
80+ Languages
Fast
4.7k
126
537.49 MB
WebGPU
Whisper Large V3 Fast
ID: whisper-large-v3-turbo-fast
by OpenAI
MIT

A highly optimized Whisper model for speed — delivers top-quality transcription, but does not support word-level timestamps.

Most Powerful
Most Accurate
80+ Languages
Fast
1.4k
36
537.25 MB
WebGPU

Moonshine

Moonshine Tiny Real-Time (English Only)
New
ID: moonshine-tiny-en
by Useful Sensors
MIT

An ultra-lightweight, ultra-fast model for English speech — outperforms Whisper Tiny in both speed and accuracy. Ideal for real-time transcription, on-device apps, and low-power environments; does not currently support timestamp return.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Real-Time
138
2
26.81 MB
WebGPU
WASM
Moonshine Base Real-Time (English Only)
New
ID: moonshine-base-en
by Useful Sensors
MIT

A high-performance, fast English ASR model. Ideal for real-time transcription, on-device apps, and low-power environments; does not currently support timestamp return.

Ultra-Fast
High Performance
Mobile Friendly
Real-Time
32
2
60.00 MB
WebGPU
WASM

Synthetic Speech

Kokoro TTS

Kokoro TTS V1 82M (English)
New
ID: kokoro-tts-82m
by Hexgrad
Apache 2.0

A high-performance text-to-speech model with fast inference speed; supports English with multiple voices and accents, ideal for real-time applications (see the sketch after this entry).

High Performance
Ultra-Fast
Real-Time
Multi-Voices
1.7k
10
88.08 MB
WebGPU
WASM
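
A minimal text-to-speech sketch for this entry, assuming the community kokoro-js package; the package name, ONNX repo id, and voice name are all assumptions, since the catalog does not state which runtime it uses.

```ts
import { KokoroTTS } from "kokoro-js";

// Assumptions: the kokoro-js package, the repo id, and the voice name.
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-v1.0-ONNX", {
  dtype: "q8", // quantized weights for a smaller download
});

const audio = await tts.generate("Hello from the browser!", {
  voice: "af_heart", // one of the bundled English voices
});

// `audio` is a raw waveform: save it to disk in Node, or convert it to a Blob
// and play it through the Web Audio API in the browser.
audio.save("hello.wav");
```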

Image

Background Remover

Portrait BG Remover - MODNet
ID: rmbg-modnet
by Zhanghan Ke et al.
Apache 2.0

An ultra-lightweight, real-time matting model designed to separate foregrounds from backgrounds in images and videos—ideal for background removal, virtual try-on, and video conferencing with limited compute (see the sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Video BG
1.3k
97
6.32 MB
WebGPU
WASM
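
As a sketch of portrait background removal in the browser, the example below assumes Transformers.js and a community ONNX export of MODNet (Xenova/modnet); the catalog does not specify either.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova/modnet ONNX repo id.
const segment = await pipeline("image-segmentation", "Xenova/modnet", {
  device: "webgpu", // or "wasm"
});

// MODNet returns a soft alpha matte rather than labeled classes.
const [{ mask }] = await segment("https://example.com/portrait.jpg");

// `mask` is a single-channel image the size of the input; composite it with
// the original image on a <canvas> to cut the subject out of the background.
console.log(mask.width, mask.height);
```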
BG Remover - BEN2
New
ID: rmbg-ben2
by PramaLLC
MIT

BEN2 (Background Erase Network) is a fast and high-quality background remover that quickly separates people or objects from any background. It delivers clean, accurate results in real time, making it perfect for videos, photos, and apps where speed and visual quality matter.

Top Accuracy
Ultra-Fast
Video BG
604
23
223.85 MB
WebGPU
BG Remover - ORMBG
New
ID: rmbg-ormbg
by Marvin Schirrmacher
Apache 2.0

ORMBG is a high-quality background remover optimized for images with humans. It was trained on synthetic images and fine-tuned on real-world images to deliver accurate and clean results. ORMBG is suitable for various applications, including photo editing, virtual try-on, and video conferencing.

High Accuracy
Lightweight
Video BG
563
36
42.26 MB
WASM

Image Classification

CLIP

CLIP Base (ViT-B/16)
ID: clip-vit-base-patch16
by OpenAI
MIT

A vision-language model that encodes images and text into a shared embedding space—optimized for zero-shot classification, cross-modal retrieval, and visual search. With finer patch granularity, Patch16 offers improved spatial detail and precision, ideal for tasks requiring nuanced visual understanding and quick inference (see the zero-shot sketch after this entry).

Ultra-Fast
Zero-Shot Classification
Real-time Inference
890
32
145.00 MB
WebGPU
WASM
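
To make the zero-shot classification use case concrete: a minimal sketch, assuming Transformers.js and the Xenova/clip-vit-base-patch16 ONNX repo (the catalog does not name its runtime).

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const classify = await pipeline(
  "zero-shot-image-classification",
  "Xenova/clip-vit-base-patch16",
  { device: "webgpu" } // or "wasm"
);

// Candidate labels are free-form text; no fine-tuning is required.
const result = await classify("https://example.com/photo.jpg", [
  "a photo of a cat",
  "a photo of a dog",
  "a photo of a bicycle",
]);
console.log(result); // [{ label: "a photo of a cat", score: 0.97 }, ...]
```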
CLIP Base (ViT-B/32)
ID: clip-vit-base-patch32
by OpenAI
MIT

A vision-language model that embeds images and text into a shared representation space—well-suited for zero-shot classification, cross-modal retrieval, and scalable visual search. Patch32 provides broader visual context with faster processing, making it effective for large-scale or real-time applications with balanced performance.

Ultra-Fast
Zero-Shot Classification
Real-time Inference
261
12
146.58 MB
WebGPU
WASM

Object Detection

DETR

DEtection TRansformer (DETR) ResNet 50
ID: detr-resnet-50
by Facebook
Apache 2.0

A powerful object detection model that uses a transformer architecture to detect objects in images with high accuracy and efficiency. Ideal for real-time applications and resource-constrained environments (see the detection sketch after this entry).

Ultra-Fast
Mobile Friendly
Real-Time Inference
872
34
41.11 MB
WebGPU
WASM
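
A minimal in-browser detection sketch for this entry, assuming Transformers.js and the Xenova/detr-resnet-50 ONNX repo; both are assumptions, as is the image URL.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const detect = await pipeline("object-detection", "Xenova/detr-resnet-50", {
  device: "webgpu", // or "wasm"
});

const detections = await detect("https://example.com/street.jpg", {
  threshold: 0.9,   // keep only confident boxes
  percentage: true, // box coordinates as percentages of image size
});

// [{ label: "car", score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
console.log(detections);
```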

Stay tuned for more models!