STT (Speech-to-Text)
293 repos
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Robust Speech Recognition via Large-Scale Weak Supervision
Port of OpenAI's Whisper model in C/C++
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
Distribute and run LLMs with a single file.
The open-source AI voice studio. Clone, dictate, create.
Faster Whisper transcription with CTranslate2
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A free, open source, and extensible speech-to-text application that works completely offline.
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees.
🧠 Leon is your open-source personal assistant.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
kaldi-asr/kaldi is the official location of the Kaldi project.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
AI that sees your screen, listens to your conversations and tells you what to do
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages
A PyTorch-based Speech Toolkit
Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization, built on Rust. 100% local processing; no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 self-hosted, open-source AI meeting note taker for macOS & Windows.
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Simultaneous speech-to-text models
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Multilingual Voice Understanding Model
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Silero Models: pre-trained text-to-speech models made embarrassingly simple
Transcribe on your own!
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
On-device wake word detection powered by deep learning
Low-latency AI engine for mobile devices & wearables
Build local voice agents with open-source models
An Open Source text-to-speech system built by inverting Whisper.
Mac app for crushing tech interviews with AI
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
A nearly-live implementation of OpenAI's Whisper.
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Whisper realtime streaming for long speech-to-text transcription and translation
🤖 A Telegram bot that integrates with OpenAI's official ChatGPT APIs to provide answers, written in Python
An open source AI wearable device that captures what you say and hear in the real world and then transcribes and stores it on your own server. You can then chat with Adeus using the app, and it will have all the right context about what you want to talk about - a truly personalized, personal AI.
Instantly generate AI-powered subtitles on your device. Works standalone or connects to DaVinci Resolve.
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Real time transcription with OpenAI Whisper.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
A Web UI for easy subtitle generation using Whisper models.
Self-hosted AI audio transcription
Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.
🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI
Automatically generate and overlay subtitles for any video.
🎤 React Native Voice Recognition library for iOS and Android (Online and Offline Support)
Record voice notes & transcribe, summarize, and get tasks
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
Whisper as a Service (GUI and API with queuing for OpenAI Whisper)
WhisperPlus: Faster, Smarter, and More Capable 🚀
Voice activity detector (VAD) for the browser with a simple API
Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification).
Simple, hackable offline speech to text - using the VOSK-API.
Cross-Platform, GPU Accelerated Whisper 🏎️
Pure C inference of Mistral Voxtral Realtime 4B speech to text model
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
OBS plugin for local speech recognition and captioning using AI
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
SALMONN family: A suite of advanced multi-modal LLMs
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
A free & open tool for transcribing audio interviews
🎙️ AI Dictation App - Open Source and Local-first ⚡ Type 3x faster, no keyboard needed. 🆓 Powered by open source models, works offline, fast and accurate.
Natural (2-way) voice conversations with Claude Code
A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
A GUI tool for offline transcription of speech recordings, including speaker diarization, utilizing state-of-the-art machine learning models.
Native speech-to-text for Linux - Fast, accurate and private system-wide dictation
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models
Whisper.net. Speech to text made simple using Whisper Models
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Open source voice dictation technology
Optimized Whisper models for streaming and on-device use
Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads via yt-dlp integration.
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
Fully local, private and cross platform Speech-to-Text with LLM Post-processing
A 100% private AI voice transcription app that converts speech to text in 100+ languages. Built with Compose Multiplatform for Android & iOS using Whisper AI - no cloud uploads, all processing happens on-device for complete privacy.
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
Conversational voice AI agents
Real-time transcription using faster-whisper
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
Android Input Method Editor (IME) based on Whisper
ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
Take notes with your voice & transform them with AI
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
End-to-end platform for building voice first multimodal agents
This project is a digital human that can talk and listen to you. It uses OpenAI's GPT to generate responses, OpenAI's Whisper to transcribe the audio, Eleven Labs to generate voice, and Rhubarb Lip Sync to generate the lip sync.
🎙️ Speak with AI - Run locally using Ollama, OpenAI, Anthropic or xAI - Speech uses SparkTTS, OpenAI, ElevenLabs, Kokoro, Typecast or xAI
The best way to use AI is on your own computer. Use local or paid API models, and ctrl+k to show/hide the chat UI. Experience the future of AI, and help build it too!
A simple GUI to use Whisper.
A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
Free on-device web app for audio transcribing and rendering subtitles
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.
Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what you said!
Wyoming protocol server for faster whisper speech to text system
A lightweight Python package for Automatic Speech Recognition using ONNX models
AudioBench: A Universal Benchmark for Audio Large Language Models
Android Input Method Editor (IME) based on RTranslator's Whisper implementation
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Your faithful, impartial partner for audio evaluation: genuine benchmarking; know yourself, know your rivals.
Private voice keyboard, AI chat, images, webcam, recordings, voice control with >= 4 GiB of VRAM.
Free, open-source, 100% offline voice dictation for Linux. Speak and type anywhere via whisper.cpp, Whisper & VOSK engines, GPU-accelerated, works on X11 + Wayland!
See where Claude Code is burning tokens - turn raw JSONL transcripts into local cost analytics, hotspot views, and session-level usage insight.
Get started using Deepgram's Live Transcription with this Next.js demo app
VOXD is a speech-to-text, voice-typing, dictation software for linux distributions. It is an open-source, free of charge, USER-FRIENDLY software, for as many linux distros as possible.
Transcribe audio and add subtitles to videos using Whisper in ComfyUI
🎬 Auto-subtitle videos with AI transcription, translation, voice cloning, professional rendering, background image and music generator
On-device Speech Recognition for Android
A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.
A powerful Whisper AI keyboard for reliable speech transcription
Workbench for learning and practicing on-device AI technology in real scenarios with online TV on Android phones, powered by ggml (llama.cpp, whisper.cpp...), FFmpeg, and opencv-mobile
PySimpleGUI based DESKTOP APP to AUTO GENERATE SUBTITLE FILE (using free Google Speech Recognition API) and TRANSLATED SUBTITLE FILE (using unofficial online Google Translate API) for any video or audio file
Packages whisper.cpp into pre-built, pip-installable wheels, for macOS and Linux.
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp OFFLINE. Speak with local LLMs via llama.cpp.
Modern GUI application that transcribes and translates audio files using OpenAI Whisper.
Like ChatGPT's voice conversations with an AI, but entirely offline/private/trade-secret-friendly, using local AI models such as LLama 2 and Whisper
Open models for Coqui STT
MeetEval - A meeting transcription evaluation toolkit
State-of-the-art offline (or networked) voice typing everywhere, plus text terminals (Linux or a WSL session on Windows), with a simple bash script. Usable with X; does not require X.
An AI prompt project that uses AI to extract wisdom from all sorts of text, from podcast transcripts, conversations, talks, lectures, papers, articles, blog posts, essays, presentations, or whatever you can get into text form.
Use Home Assistant Assist on the desktop. Compatible with Windows, MacOS, and Linux
Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive
💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAI’s Whisper on CPU, Nvidia GPU and Apple MLX.
Wayland Speech-to-Text Tool - A minimal signal-driven speech-to-text tool for Wayland environments with PipeWire audio
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.
Offline voice input panel & keyboard with punctuation for Android.
GNOME Shell extension for accurate OFFLINE speech-to-text input in Linux using whisper.cpp. Input text from speech anywhere.
Curated list of open-source speech-to-text and voice typing tools for Linux, macOS, Windows, Android, and iOS. Offline, local, and cloud.
WhisperSubs is an mpv Lua script to generate subtitles at runtime with whisper.cpp on Linux
Speak to AI • Native Linux Speech-to-Text (STT) • Offline, Privacy-Focused
Create subtitles in various languages in mere minutes using Whisper and Qwen3-32b via Groq's lightning-fast inference.
End-to-end workflow to automatically generate show notes from audio/video transcripts
Voice typing for the Linux desktop.
A curated list of awesome disfluency detection publications along with the released code and bibliographical information
קול — Professional Transcription Studio. Hebrew-first, 4 engines, YouTube support, correction studio.
Real-time speech recognition & AI-powered note-taking app for macOS with offline/online modes, multilingual transcription, and Japanese translation support.
A powerful Hebrew Whisper transcription and translation tool
A very simple Whisper Python FastAPI server for the OpenAI API, Android voice typing (konele), Home Assistant (wyoming), and a voice-typing script on Linux and macOS!
Record, transcribe, and transform voice notes into structured insights. Leverage Whisper or AssemblyAI and ChatGPT to fill in gaps, generate summaries, and visualize ideas — all seamlessly integrated within Obsidian.
This is a python script using whisper to type with your voice
Simple GUI around whisper.cpp for voice-to-text on Linux
Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.
A ComfyUI custom node for audio subtitling based on WhisperX and translators
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
Wyoming protocol server for the Whisper API speech to text system
Desktop application for Linux and Windows that utilizes distil-whisper models from HuggingFace, to enable real-time offline speech-to-text dictation.
Benchmarking STT service TTFB and semantic WER for real-time AI applications
A lightweight transcript editor for editing and correcting STT generated timed transcripts
Super STT enables effortless voice-to-text in any application, using the most advanced speech models.
🎯 AI-powered voice assistant for TickTick, enabling natural language task management through speech. Built with OpenAI's speech recognition and TickTick's API integration, this assistant helps you manage your todos hands-free - create tasks, set reminders, and organize your schedule using just your voice.
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
A node module to generate subtitles by segmenting a list of time-coded text - BBC News Labs
An MCP Server for audio transcription using OpenAI
A Flutter library for offline speech-to-text conversion that uses the whisper.cpp model implementation, for Android, iOS, and macOS.
This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.
insanely-fast-whisper with support for AMD GPUs with ROCm 6.1 - 7.1
VoiceTyper-Pro is an advanced speech-to-text dictation tool built with Python and powered by the Deepgram API. Alternative to Mac Whisper, Voice Access, and other voice typing tools.
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
Speech-to-text typing for Linux/Wayland using Whisper.
Linux Voice Assistant to Make Your Work Easier
A real-time, offline voice assistant for Linux and Raspberry Pi. Uses local LLMs (via Ollama), speech-to-text (Vosk), and text-to-speech (Piper) for fast, wake-free voice interaction. No cloud. No APIs. Just Python, a mic, and your voice.
audiov is a speech-to-text, voice-typing, dictation software for linux distributions.
Faster Whisper running on AMD GPUs with modified CTranslate2 libraries, served via the Wyoming protocol
A fully local, open-source voice-to-text tool that acts as a system-wide AI dictation layer, converting speech into clean, formatted text.
Speech-to-text, text-to-speech with ElevenLabs
SuperWhisper-like voice dictation for Linux with waveform UI
Claude Code Skills for podcast/video editing: transcription, content editing, rough/fine cut, final polish
Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using Whisper.
AgenticSeek is a fully local, voice-enabled AI assistant designed to autonomously browse the web, write code, and plan tasks while ensuring complete privacy by keeping all data on your device. Tailored for local reasoning models, it runs entirely on your hardware, eliminating any cloud dependency.
Dockerized Whisper C++ speech-to-text API for easy deployment and rapid integration. Offering the latest stable and nightly builds for efficient audio transcription.
Effortless Push-to-Talk Transcription, Anywhere.
A cross-platform desktop application that records audio and transcribes it to text using OpenAI's Whisper API or compatible services. Perfect for dictation, note-taking, and accessibility.
sherpa-onnx Go package for speech recognition without network access, supporting Linux, macOS, Windows
WhisperX-powered voice transcription tool that types text directly at your cursor position. Hold F9 to record, release to transcribe.
🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.
A curated list of voice AI agent frameworks, tools, resources, and best practices
A stand-alone application with GUI for OpenAI's Whisper
Speech-to-text for Linux that just works
This repository contains code for fine-tuning the Whisper speech-to-text model.
Prompt Management System for Interaction with the ChatGPT API
Handy voice dictation using whisper.
Privacy‑first, real‑time speech‑to‑text dictation. 100% local inference in Rust; hotkey to dictate anywhere (macOS, Linux, Windows).
Automatically create subtitles for any video using the Google Cloud Speech-to-Text API.
The only tool that replays Claude, Codex, Cursor, AND Gemini AI coding sessions in one unified UI. Vibe coding companion for reviewing, searching, and sharing your AI pair programming transcripts.
An interactive AI voice agent that can capture and transcribe speech in real-time, generate intelligent responses using the DeepSeek R1 (7B model) AI, and convert the responses back to natural speech for immediate playback. The agent maintains conversation context and supports cross-platform usage on macOS, Linux, and Windows.
The best Android keyboard for offline speech recognition, using OpenAI's whisper model through whisper.cpp for fast and accurate output.
Turn the Caps Lock key into a dictation key
RunPod Serverless worker for WhisperX
A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.
A Deepgram client for Dart and Flutter, supporting all Speech-to-Text and Text-to-Speech features on every platform.
A lightweight library for normalizing speech transcripts before computing WER
Sonori is a fully local STT app for Linux (Wayland).
Real-time voice input software using the Whisper model.
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.
A small script that types what you say using whisper while holding a hotkey
Expands the boundaries of speech recognition technology for documentation productivity on the Linux PC, with dictation and transcription capabilities as well as control over your system. Written in Python using Whisper.
Linux-based voice-to-text tool using AI (Whisper/DeepGram) for real-time speech transcription. Command-line interface for easy recording, processing, and text output. Ideal for accessibility, dictation, and hands-free text input in Linux environments.
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Privacy-first meeting transcription and voice-to-text tool for Linux. 100% local AI processing with faster-whisper and Ollama.
Node.js app that transcribes WhatsApp voice notes to text using OpenAI's Whisper API. The text can also be translated to the user's preferred language and sent back to their WhatsApp account.
Linux virtual keyboard driver which types what you say using Deepgram Flux STT API
Chrome extension that allows dictating anywhere using OpenAI Whisper
Privacy-first voice dictation for Linux Wayland — press a key to talk, release to type. Powered by Whisper AI, 100% offline, no subscription required.
Cross-platform voice-to-text dictation for Linux and macOS. Local/private STT using Parakeet-TDT 1.1B with NVIDIA CUDA or Apple CoreML acceleration.
Real-time AI dictation using faster-whisper—type anywhere with instant, accurate speech-to-text conversion.
A voice recording and transcription tool for Hyprland, using Whisper for speech-to-text and copying results to clipboard. It's using Faster Whisper (optimized for CPU) and runs fully locally.
Voice to Text Online Notepad: a professional, accurate, and free speech recognition text editor. A distraction-free, fast, easy-to-use web app for dictation and typing.
My custom scripts for using Whisper / OpenAI via keyboard shortcuts and voice input.
Whisper Flutter example: offline speech-to-text on Android and Linux, without an API key and without FFmpeg
An audio/video transcriber with diarization and transcription editing.
Fast, accurate voice typing for Linux — IBus input method engine with streaming STT, Whisper refinement, and CUDA acceleration
A fully local, offline first speech-to-text application made for Linux!
A MCP server that provides audio transcription capabilities using OpenAI's Whisper API
A dictation application for Linux using OpenAI's Whisper. Currently only used on KDE Wayland.
Transcribe Offline by openresearchtools.com is an open source desktop application that allows you to transcribe audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing
Convert audio files (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm) to SRT subtitles with OpenAI Whisper. Easy script for fast, accurate transcription.
Transform your voice into text effortlessly with Whisper Notes
Kivywhisper is a cross platform Python GUI for OpenAI's Whisper.
Wisper - Voice dictation app for Linux. Type directly at cursor with AI-powered transcription.
Add voice-to-text capabilities to Claude Code using OpenAI Whisper for speech recognition.
One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, then paste. Never lose your audio files anymore!
A Linux-first, system-wide dictation tool to transcribe speech to text (STT). Super accurate, fast, and free.
Fine-tuned Whisper that transcribes Hebrew audio into IPA
Pure C wrapper library to use Whisper.cpp with Linux and Windows as simple as possible.
The ultimate PyQt6 application that integrates the power of OpenAI, Google Gemini, Claude, and other open-source AI models
A voice transcription tool using faster-whisper that records audio and converts speech to text on Linux systems.
A powerful, real-time dictation system for Linux
Browser-Based AI Assistant: Speech-to-Text with Whisper and Local AI Answers
Local realtime transcription tool powered by Voxtral Mini
TalkType is a cross-platform application built with Electron, supporting Windows, macOS, and Linux. By combining Automatic Speech Recognition (ASR) with Large Language Models (LLM), it goes beyond simple dictation to offer "Understanding", "Polishing", and "Q&A" capabilities — your all-in-one voice writing assistant.
Rudimentary program for speech transcription, manipulation, and redaction.
AudioWrite: Effortless voice dictation powered by Google's Gemini API. Record, transcribe, and transform rambling audio into polished, multi-language notes. PWA ready.
A modern, lightweight note-taking app powered by Whisper
🎙️ Lightning-fast voice dictation Desktop Web App powered by Groq's Whisper Turbo - Open-source, privacy-first, with real-time audio visualization and intuitive click controls
A Linux / GNOME dictation app that uses fast Whisper for speech-to-text.
Real-Time Transcription System for Niri - MacOS-like dictation for Linux Wayland environments
Twidi Speech To Text (openai, push to talk, linux, wayland, deepgram)
Voice dictation for Linux/Wayland (like wisprflow). 100% offline, GPU-accelerated, and actually works with Wayland compositors.
🗣️ Whispers: Talk. Recall. Repeat. A blazing-fast voice journal that remembers everything you say, searchable with AI. Whispers is a voice-first journaling app powered by 🧠 <300ms-latency streaming transcription (AssemblyAI) and 🔍 Algolia MCP for instant search of your thoughts.
An offline-first desktop app to automatically transcribe and edit video subtitles using OpenAI Whisper. Full control over text, timing, and advanced styling in a powerful, intuitive editor.
WhisperVoice: Covert voice notes. Encrypts text and hides it via LLM-generated acrostic sentences. Murf.ai creates natural audio. Browser extension decrypts with passcode, revealing hidden message or playing decoy for unauthorized listeners. Uses LLM, Murf.ai, STT APIs
Press F9. Speak. Paste. A blazing-fast, offline voice transcription tool for Linux using Whisper.cpp, bound to a global hotkey.
Speech-to-Text/Code using a fast local LLM, for Linux, uses Whisper
Voice input tool for Ubuntu 25.04 with Wayland. Record speech with hotkey, transcribe via Nexara API, and copy to clipboard.
A fully offline, high-performance, streaming speech-to-text tool for developers on Linux.
GPU-accelerated speech-to-text service that types what you say, powered by OpenAI's Whisper AI
Type 10x faster with AI-assisted voice typing
Your voice - VocalFlow dictation, harnessing Whisper and faster-whisper for real-time transcription, adaptive learning, and NLP. Built with Python, it spans Linux, Windows, and macOS, boosting productivity through voice-assisted workflows.
A user-friendly voice dictation application for Linux that supports multiple languages.
Real-time desktop audio transcription using OpenAI Whisper for Arch Linux with CUDA acceleration
A powerful audio transcription server that seamlessly transcribes meeting recordings, generates notes, and intelligently splits audio files for efficient management. Open-source and built with FastMCP and Groq/OpenAI Whisper
MCP server for real-time audio transcription using OpenAI Whisper
Simple Python Tkinter GUI app for Linux that uses OpenAI's Whisper for transcription.
A local, real-time speech-to-text (STT) input tool for Linux, powered by RealtimeSTT and Faster-Whisper. Press a hotkey to dictate directly into any application.
A 100% private AI voice transcription app that converts speech to text in 50+ languages. Built with Compose Multiplatform for Android using Whisper AI - no cloud uploads, all processing happens on-device for complete privacy.
Push-to-talk voice dictation for Linux. Record with PipeWire, transcribe locally via whisper.cpp, and type text into any app using ydotool. Fast, private, and works system-wide with a single hotkey.
Open-Source Speech-to-Text Evaluation Framework
A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.
A push-to-talk wisper-flow service to support voice-based vibe coding with Claude
Speech-to-text for Linux using Whisper
A fast, lightweight Linux tool that converts speech to text and types it into any window using OpenAI's Whisper API.
Voice-to-text input daemon for Linux using OpenAI Whisper
Offline Voice Dictation & Text Enhancement: a lightweight, 100% local Linux tool for real-time voice‑to‑text transcription and LLM‑powered writing improvements.
From microphone directly to your app
macOS-style dictation for Ubuntu using Whisper. Press double-Ctrl, speak, and your words are transcribed to text locally with faster-whisper. Supports clipboard output, customizable hotkeys, and offline models for speed and privacy.
Linux Live Dictation - Real-time speech-to-text with Whisper
Linux voice transcription with hotkey using faster-whisper (local) with optional GPT-4o mini polishing
Linux log interpreter using AI
Langflow-based LLM agent that keeps track of my personal projects. Based on integration with WhatsApp voice messages, Whisper, OpenAI/Mistral models and local MCP.
A Model Context Protocol (MCP) server that provides ASR (Automatic Speech Recognition) capabilities using the Whisper engine. This server exposes ASR functionality through MCP tools, making it easy to integrate speech recognition into your applications.
Whisper + TTS + As many MCP servers as I can stuff in
Blazingly fast audio transcription MCP server using Whisper with Flash Attention 2
MCP server for whisper-cli
App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.
Automation of Whisper fine tuning using ClearML
A desktop program that calls the Whisper API, meant for quick, decent ASR on Linux and Windows.
Transcribe text using whisper.cpp on linux with a key combo & auto-type it
Sample implementation of Whisper AI for Linux with real-time transcription
A Linux utility that provides system-wide speech-to-text functionality by connecting to a remote Whisper API server.