LLM

Gemma

Gemma 3

Gemma3 270M Instruct
ID: gemma-3-270m-it
by Google
Gemma License

A lightweight 270M-parameter LLM from Google, optimized for efficient chatbot interactions, code assistance, and text generation directly in the browser (see the usage sketch after this entry).

Ultra-Fast
Lightweight
Chatbot
478
49
543.46 MB
WebGPU
WASM
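
As a minimal sketch of how a small instruct model like this can be driven in the browser: the catalog does not name its JavaScript runtime, so the example below assumes Transformers.js (@huggingface/transformers); the ONNX repo id, dtype, and device options are illustrative assumptions, not values taken from this catalog.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js as the runtime and a hypothetical browser-ready
// ONNX repo id; the catalog does not specify either.
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  { device: "webgpu", dtype: "q4" } // use "wasm" where WebGPU is unavailable
);

const messages = [
  { role: "user", content: "Summarize what WebGPU is in one sentence." },
];
const output = await generator(messages, { max_new_tokens: 64 });

// The reply is the last message appended to the chat history.
console.log(output[0].generated_text.at(-1).content);
```

The same pattern applies to the other chat models in this list; only the repo id and quantization options change.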
Gemma3 1B Instruct
ID: gemma-3-1b-it
by Google
Gemma License

Gemma3 1B Instruct is a medium-scale instruction-tuned language model from Google’s Gemma 3 family. With 1 billion parameters, it strikes a balance between efficiency and capability, making it suitable for chatbot dialogues, code assistance, and general-purpose text generation. While more powerful than compact models like Gemma3 270M, it remains lightweight compared to larger-scale LLMs. Current WebAI support for this model is limited: only quantized precision is available, and only on select devices (primarily WASM). Broader device and precision support may be added in future updates.

High Performance
Multi-Language
23
5
1018.38 MB
WASM

SmolLM

SmolLM V2

SmolLM V2 135M Instruct
New
ID: smollm2-135m-instruct
by HuggingFace
Apache 2.0

An ultra-lightweight, fast, and compact language model designed for basic reasoning and text generation tasks. Ideal for deployment on mobile and edge devices.

Ultra-Fast
Lightweight
Mobile Friendly
72
0
112.24 MB
WebGPU
WASM
SmolLM V2 360M Instruct
New
ID: smollm2-360m-instruct
by HuggingFace
Apache 2.0

With 360 million parameters, this model strikes a balance between speed and performance, making it lightweight, fast, and well-suited for a wide range of tasks on edge devices.

Fast
Lightweight
Balanced Performance
258
24
260.10 MB
WebGPU
WASM
SmolLM V2 1.7B Instruct
New
ID: smollm2-1.7b-instruct
by HuggingFace
Apache 2.0

The most capable model in the SmolLM V2 family, able to handle complex language tasks and generate high-quality text while remaining fast on edge devices.

High Performance
Knowledgeable
Fast
Structured Output
876
42
1.03 GB
WebGPU

SmolLM V3

SmolLM V3 3B Instruct
New
ID: smollm3-3b-instruct
by HuggingFace
Apache 2.0

SmolLM3 is a 3B-parameter language model designed to push the boundaries of small models. It supports dual-mode reasoning, six languages, and long context.

SOTA Performance
Deep Reasoning
564
35
1.98 GB
WebGPU

Qwen

Qwen3

Qwen V3 0.6B
New
ID: qwen3-0.6b
by QwenLM
Apache 2.0

A performant LLM with 0.6 billion parameters, optimized for lightweight tasks like chatbot interactions and code assistance, balancing efficiency and capability.

High Performance
Deep Thinking
Structured Output
Fast
678
79
543.39 MB
WebGPU
WASM

DeepSeek

DeepSeek R1

DeepSeek R1 1.5B
ID: deepseek-r1-distill-qwen-1.5b
by DeepSeek AI
MIT

A powerful LLM designed for reasoning-intensive tasks. It excels at code generation, essay writing, and a variety of other applications. Optimized for speed and efficiency.

High Performance
Deep Think
Multi-Language
Fast
546
20
1.28 GB
WebGPU

Llama

LLaMA 3.2

LLaMA 3.2 1B Instruct
ID: llama-3.2-1b-instruct
by Meta
Llama 3.2 Community

A performant LLM with 1 billion parameters, optimized for lightweight tasks like chatbot interactions and code assistance, balancing efficiency and capability.

High Performance
Multi-Language
Fast
932
41
1.15 GB
WebGPU

NLP

Text Embedding

Flag Embedding

Flag Embedding Small 1.5 (English)
ID: bge-small-en-v1.5
by BAAI
MIT

Lightweight and optimized for English, this model delivers fast, low-latency text embeddings—ideal for local RAG, semantic search, recommendations, and language understanding (see the embedding sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
RAG
232
24
32.44 MB
WebGPU
WASM
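
To make the RAG and semantic-search use case concrete, here is a minimal embedding-plus-cosine-similarity sketch. It assumes Transformers.js and the Xenova/bge-small-en-v1.5 ONNX repo; neither is specified by this catalog.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumption: Transformers.js with a community ONNX export of bge-small-en-v1.5.
const embed = await pipeline("feature-extraction", "Xenova/bge-small-en-v1.5", {
  device: "webgpu", // or "wasm"
});

const query = "How do I run GPU compute in a web page?";
const docs = [
  "WebGPU exposes compute shaders to the browser.",
  "Bananas are rich in potassium.",
];

// Mean-pooled, L2-normalized sentence embeddings.
const vectors = await embed([query, ...docs], { pooling: "mean", normalize: true });
const [q, ...d] = vectors.tolist();

// On normalized vectors, cosine similarity reduces to a dot product.
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const scores = d.map((v: number[]) => dot(q, v)); // higher = more relevant to the query
console.log(scores);
```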
Flag Embedding Small 1.5 (Chinese)
ID: bge-small-zh-v1.5
by BAAI
MIT

Lightweight and optimized for Chinese, this model delivers fast, low-latency text embeddings—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
RAG
42
0
22.90 MB
WebGPU
WASM
Flag Embedding Base 1.5 (English)
ID: bge-base-en-v1.5
by BAAI
MIT

Optimized for English, this model delivers fast, low-latency text embeddings with more balanced performance—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Balanced Performance
Mobile Friendly
RAG
212
29
91.53 MB
WebGPU
WASM
Flag Embedding Base 1.5 (Chinese)
ID: bge-base-zh-v1.5
by BAAI
MIT

Optimized for Chinese, this model delivers fast, low-latency text embeddings with more balanced performance—ideal for local RAG, semantic search, recommendations, and language understanding.

Ultra-Fast
Balanced Performance
RAG
12
0
98.10 MB
WebGPU
WASM
Flag Embedding M3 (Multi-Lingual)
ID: bge-m3
by BAAI
MIT

Supports over 100 languages, offering fast, balanced text embeddings across multilingual content—ideal for global RAG, cross-lingual search, and multilingual understanding.

High Performance
100+ Languages
RAG
Fast
782
43
543.30 MB
WebGPU
WASM

Gemma 3

Gemma Embedding 300M (Multi-Lingual)
ID: gemma-embedding-300m
by Google
Gemma License

A compact 300M parameter embedding model from Google, optimized for multilingual text embeddings with low latency—perfect for RAG, semantic search, clustering, and recommendation systems.

SOTA
Ultra-Fast
100+ Languages
RAG
1.4k
20
167.28 MB
WebGPU
WASM

Qwen3

Qwen3 Embedding 0.6B (Multi-Lingual)
ID: qwen3-embedding-0.6b
by QwenLM
Apache 2.0

Compact 0.6B parameter embedding model optimized for efficiency and speed. Delivers quality text embeddings for RAG applications, semantic search, and content understanding with minimal resource requirements.

SOTA
100+ Languages
RAG
Fast
1.9k
62
541.17 MB
WebGPU
WASM

Jina Embeddings

Jina Embeddings V2 Small (English)
ID: jina-embeddings-v2-small-en
by Jina AI
Apache 2.0

Designed for English, this lightweight model provides fast, efficient embeddings with strong semantic performance—ideal for local RAG, search, and retrieval tasks on resource-constrained systems.

Ultra-Fast
Mobile Friendly
Semantic Search
363
5
91.53 MB
WebGPU
WASM

MiniLM

All MiniLM L6 V2
ID: all-minilm-l6-v2
by HuggingFace
Apache 2.0

One of the fastest and most efficient text embedding models, very popular on HuggingFace, ideal for semantic search, clustering, and classification tasks. It provides high-quality embeddings with low latency.

Ultra-Small
Ultra-Fast
Mobile Friendly
Semantic Search
1.2k
47
21.91 MB
WebGPU
WASM

Cross-encoder Reranker

Flag Reranker

Flag Reranker Base (English & Chinese)
ID: bge-reranker-base
by BAAI
MIT

A cross-encoder reranker model that improves the ranking of documents by re-evaluating them in the context of a given query, specifically designed for English and Chinese (see the reranking sketch after this entry).

Ultra-Fast
High Performance
RAG
1.9k
73
266.36 MB
WebGPU
WASM
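
A reranker is typically used after an embedding retriever: the cross-encoder re-scores each query/document pair. A minimal sketch, assuming Transformers.js and the Xenova/bge-reranker-base ONNX repo (both assumptions; the catalog does not name its runtime):

```ts
import {
  AutoTokenizer,
  AutoModelForSequenceClassification,
} from "@huggingface/transformers";

// Assumption: a community ONNX export of bge-reranker-base.
const model_id = "Xenova/bge-reranker-base";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);

const query = "What is the capital of France?";
const docs = [
  "Paris is the capital and largest city of France.",
  "Bananas are rich in potassium.",
];

// Score each query/document pair with the cross-encoder.
const inputs = tokenizer(new Array(docs.length).fill(query), {
  text_pair: docs,
  padding: true,
  truncation: true,
});
const { logits } = await model(inputs);
const scores = logits.sigmoid().tolist().map((s: number[]) => s[0]); // higher = more relevant
console.log(scores);
```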
Flag Reranker Large
ID: bge-reranker-large
by BAAI
MIT

A cross-encoder reranker model that improves the ranking of documents by re-evaluating them in the context of a given query, specifically designed for English and Chinese languages.

Ultra-Fast
High Performance
RAG
86
10
536.86 MB
WebGPU
WASM

Sentiment Analysis

BERT

BERT Sentiment Analysis (English)
ID: distilbert-base-uncased-finetuned-sst-2-english
by HuggingFace
Apache 2.0

An ultra-fast, lightweight DistilBERT model fine-tuned on SST-2 for sentiment analysis in English, ideal for real-time applications with low latency requirements (see the sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
2.3k
72
64.25 MB
WebGPU
WASM
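
As a quick illustration of running such a classifier in the browser, here is a hedged sketch assuming Transformers.js (the catalog does not name its runtime) and a community ONNX export of the model:

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const classify = await pipeline(
  "sentiment-analysis",
  "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
  { device: "webgpu" } // or "wasm"
);

const result = await classify("The new release is impressively fast.");
console.log(result); // e.g. [{ label: "POSITIVE", score: 0.99 }]
```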

RoBERTa

Financial Sentiment Analysis (English)
ID: distilroberta-finetuned-financial-news-sentiment-analysis
by Manuel Romero
Apache 2.0

An ultra-fast, lightweight DistilRoBERTa model fine-tuned on financial news for sentiment analysis in English, ideal for real-time applications with low latency requirements.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Finance
72
4
78.91 MB
WebGPU
WASM

Audio

Speech Recognition

OpenAI Whisper

Whisper Tiny
ID: whisper-tiny
by OpenAI
MIT

The smallest and fastest Whisper model, with surprisingly good accuracy. Ideal for real-time audio transcription and note-taking in resource-constrained environments; supports word-level timestamps (see the transcription sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Multi-Language
338
12
38.94 MB
WebGPU
WASM
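
To show the word-level timestamps mentioned above, here is a minimal transcription sketch; Transformers.js, the Xenova/whisper-tiny repo id, and the audio URL are all assumptions.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js as the runtime and a community ONNX export of whisper-tiny.
const transcribe = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny",
  { device: "webgpu" } // or "wasm"
);

// Input can be a URL or a Float32Array of 16 kHz mono samples.
const out = await transcribe("https://example.com/meeting.wav", {
  return_timestamps: "word", // per-word [start, end] timestamps
  chunk_length_s: 30,        // chunk longer audio before decoding
});

console.log(out.text);
console.log(out.chunks); // [{ text: "Hello", timestamp: [0.0, 0.42] }, ...]
```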
Whisper Tiny (English Only)
ID: whisper-tiny-en
by OpenAI
MIT

Optimized specifically for English, this is the fastest Whisper model—perfect for real-time English transcription and note-taking in resource-constrained environments; supports word-level timestamps.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
225
28
38.94 MB
WebGPU
WASM
Whisper Base
ID: whisper-base
by OpenAI
MIT

Delivers better accuracy than Whisper Tiny while still running at high speed — ideal for noisier audio or more demanding transcription tasks; supports word-level timestamps.

Lightweight
Ultra-Fast
Balanced Accuracy
Multi-Language
2.1k
69
73.31 MB
WebGPU
WASM
Whisper Base (English Only)
ID: whisper-base-en
by OpenAI
MIT

Specifically optimized for English, offering improved English transcription accuracy while maintaining high-speed performance; supports word-level timestamps.

Lightweight
Ultra-Fast
Balanced Accuracy
115
1
73.31 MB
WebGPU
WASM
Whisper Large V3 Turbo
ID: whisper-large-v3-turbo
by OpenAI
MIT

The most powerful and accurate Whisper model — hardware-intensive yet still fast, delivering top-tier performance for accuracy-critical tasks like subtitles and dubbing; supports word-level timestamps.

Most Powerful
Most Accurate
80+ Languages
Fast
4.7k
126
537.49 MB
WebGPU
Whisper Large V3 Fast
ID: whisper-large-v3-turbo-fast
by OpenAI
MIT

A highly optimized Whisper model for speed — delivers top-quality transcription, but does not support word-level timestamps.

Most Powerful
Most Accurate
80+ Languages
Fast
1.4k
36
537.25 MB
WebGPU

Moonshine

Moonshine Tiny Real-Time (English Only)
New
ID: moonshine-tiny-en
by Useful Sensors
MIT

An ultra-lightweight, ultra-fast model for English speech — outperforms Whisper Tiny in both speed and accuracy. Ideal for real-time transcription, on-device apps, and low-power environments; does not currently support timestamp return.

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Real-Time
138
2
26.81 MB
WebGPU
WASM
Moonshine Base Real-Time (English Only)
New
ID: moonshine-base-en
by Useful Sensors
MIT

A high-performance, fast English ASR model. Ideal for real-time transcription, on-device apps, and low-power environments; does not currently support timestamp return.

Ultra-Fast
High Performance
Mobile Friendly
Real-Time
32
2
60.00 MB
WebGPU
WASM

Synthetic Speech

Kokoro TTS

Kokoro TTS V1 82M (English)
New
ID: kokoro-tts-82m
by Hexgrad
Apache 2.0

A high-performance text-to-speech model with fast inference speed; supports English with multiple voices and accents, ideal for real-time applications (see the sketch after this entry).

High Performance
Ultra-Fast
Real-Time
Multi-Voices
1.7k
10
88.08 MB
WebGPU
WASM
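
A minimal text-to-speech sketch for this entry, assuming the community kokoro-js package; the package name, ONNX repo id, and voice name are all assumptions, since the catalog does not state which runtime it uses.

```ts
import { KokoroTTS } from "kokoro-js";

// Assumptions: the kokoro-js package, the repo id, and the voice name.
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-v1.0-ONNX", {
  dtype: "q8", // quantized weights for a smaller download
});

const audio = await tts.generate("Hello from the browser!", {
  voice: "af_heart", // one of the bundled English voices
});

// `audio` is a raw waveform: save it to disk in Node, or convert it to a Blob
// and play it through the Web Audio API in the browser.
audio.save("hello.wav");
```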

Image

Background Remover

Portrait BG Remover - MODNet
ID: rmbg-modnet
by Zhanghan Ke et al.
Apache 2.0

An ultra-lightweight, real-time matting model designed to separate foregrounds from backgrounds in images and videos—ideal for background removal, virtual try-on, and video conferencing with limited compute (see the sketch after this entry).

Ultra-Fast
Ultra-Lightweight
Mobile Friendly
Video BG
1.3k
97
6.32 MB
WebGPU
WASM
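
As a sketch of portrait background removal in the browser, the example below assumes Transformers.js and a community ONNX export of MODNet (Xenova/modnet); the catalog does not specify either.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova/modnet ONNX repo id.
const segment = await pipeline("image-segmentation", "Xenova/modnet", {
  device: "webgpu", // or "wasm"
});

// MODNet returns a soft alpha matte rather than labeled classes.
const [{ mask }] = await segment("https://example.com/portrait.jpg");

// `mask` is a single-channel image the size of the input; composite it with
// the original image on a <canvas> to cut the subject out of the background.
console.log(mask.width, mask.height);
```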
BG Remover - BEN2
New
ID: rmbg-ben2
by PramaLLC
MIT

BEN2 (Background Erase Network) is a fast and high-quality background remover that quickly separates people or objects from any background. It delivers clean, accurate results in real time, making it perfect for videos, photos, and apps where speed and visual quality matter.

Top Accuracy
Ultra-Fast
Video BG
604
23
223.85 MB
WebGPU
BG Remover - ORMBG
New
ID: rmbg-ormbg
by Marvin Schirrmacher
Apache 2.0

ORMBG is a high-quality background remover optimized for images with humans. It was trained on synthetic images and fine-tuned on real-world images to deliver accurate and clean results. ORMBG is suitable for various applications, including photo editing, virtual try-on, and video conferencing.

High Accuracy
Lightweight
Video BG
563
36
42.26 MB
WASM

Image Classification

CLIP

CLIP Base (ViT-B/16)
ID: clip-vit-base-patch16
by OpenAI
MIT

A vision-language model that encodes images and text into a shared embedding space—optimized for zero-shot classification, cross-modal retrieval, and visual search. With finer patch granularity, Patch16 offers improved spatial detail and precision, ideal for tasks requiring nuanced visual understanding and quick inference (see the zero-shot sketch after this entry).

Ultra-Fast
Zero-Shot Classification
Real-time Inference
890
32
145.00 MB
WebGPU
WASM
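
To make the zero-shot classification use case concrete: a minimal sketch, assuming Transformers.js and the Xenova/clip-vit-base-patch16 ONNX repo (the catalog does not name its runtime).

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const classify = await pipeline(
  "zero-shot-image-classification",
  "Xenova/clip-vit-base-patch16",
  { device: "webgpu" } // or "wasm"
);

// Candidate labels are free-form text; no fine-tuning is required.
const result = await classify("https://example.com/photo.jpg", [
  "a photo of a cat",
  "a photo of a dog",
  "a photo of a bicycle",
]);
console.log(result); // [{ label: "a photo of a cat", score: 0.97 }, ...]
```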
CLIP Base (ViT-B/32)
ID: clip-vit-base-patch32
by OpenAI
MIT

A vision-language model that embeds images and text into a shared representation space—well-suited for zero-shot classification, cross-modal retrieval, and scalable visual search. Patch32 provides broader visual context with faster processing, making it effective for large-scale or real-time applications with balanced performance.

Ultra-Fast
Zero-Shot Classification
Real-time Inference
261
12
146.58 MB
WebGPU
WASM

Object Detection

DETR

DEtection TRansformer (DETR) ResNet 50
ID: detr-resnet-50
by Facebook
Apache 2.0

A powerful object detection model that uses a transformer architecture to detect objects in images with high accuracy and efficiency. Ideal for real-time applications and resource-constrained environments (see the detection sketch after this entry).

Ultra-Fast
Mobile Friendly
Real-Time Inference
872
34
41.11 MB
WebGPU
WASM
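
A minimal in-browser detection sketch for this entry, assuming Transformers.js and the Xenova/detr-resnet-50 ONNX repo; both are assumptions, as is the image URL.

```ts
import { pipeline } from "@huggingface/transformers";

// Assumptions: Transformers.js and the Xenova ONNX repo id below.
const detect = await pipeline("object-detection", "Xenova/detr-resnet-50", {
  device: "webgpu", // or "wasm"
});

const detections = await detect("https://example.com/street.jpg", {
  threshold: 0.9,   // keep only confident boxes
  percentage: true, // box coordinates as percentages of image size
});

// [{ label: "car", score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
console.log(detections);
```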

Stay tuned for more models!