TTS (Text-to-Speech)

109 repos
Web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
★ 62,969 · Python · updated 2026-04-20 · agent, deepseek, fine-tuning, gemma, gemma3
Clone a voice in 5 seconds to generate arbitrary speech in real-time
★ 59,641 · Python · updated 2026-03-09 · deep-learning, python, pytorch, tensorflow, tts
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
★ 56,969 · Python · updated 2026-04-19 · text-to-speech, tts, vits, voice-clone, voice-cloneai
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
★ 45,835 · Go · updated 2026-04-20 · agents, ai, api, audio-generation, decentralized
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
★ 45,178 · Python · updated 2024-08-16 · deep-learning, glow-tts, hifigan, melgan, multi-speaker-tts
A generative speech model for daily dialogue.
★ 39,153 · Python · updated 2026-04-10 · agent, chat, chatgpt, chattts, chinese
Instant voice cloning by MIT and MyShell. Audio foundation model.
★ 36,336 · Python · updated 2025-04-19 · text-to-speech, tts, voice-clone, zero-shot-tts
SOTA Open Source TTS
★ 29,936 · Python · updated 2026-04-06 · llama, transformer, tts, valle, vits
SoTA open-source TTS
★ 24,474 · Python · updated 2026-03-26
The open-source AI voice studio. Clone, dictate, create.
★ 23,403 · TypeScript · updated 2026-04-20 · ai, cuda, mlx, qwen3-tts, qwen3-tts-ui
From the team behind Gatsby, Mastra is a framework for building AI-powered applications and agents with a modern TypeScript stack.
★ 23,325 · TypeScript · updated 2026-04-20 · agents, ai, chatbots, evals, javascript
A TTS model capable of generating ultra-realistic dialogue in one pass.
★ 19,284 · Python · updated 2025-11-19 · ai, open-weight, text-to-speech
🧠 Leon is your open-source personal assistant.
★ 17,188 · TypeScript · updated 2026-04-19 · ai, ai-assistant, artificial-intelligence, assistant, automation
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
★ 17,126 · Python · updated 2026-04-20 · asr, deeplearning, generative-ai, machine-translation, neural-networks
A multi-voice TTS system trained with an emphasis on quality
★ 14,844 · Jupyter Notebook · updated 2024-11-19
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
★ 14,380 · Python · updated 2026-04-20
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
★ 12,594 · Python · updated 2026-04-15 · asr, code-switch, conformer, kws, punctuation-restoration
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages
★ 11,826 · C++ · updated 2026-04-20 · aarch64, android, arm32, asr, cpp
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.
★ 10,962 · Python · updated 2026-03-17
A fast, local neural text to speech system
★ 10,858 · C++ · updated 2025-08-26 · speech-synthesis, text-to-speech, tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
★ 10,696 · Python · updated 2026-03-22 · speech-synthesis, text-to-speech, tts
AI Agent Engineering Platform built on an Open Source TypeScript AI Agent Framework
★ 8,483 · TypeScript · updated 2026-04-15 · agents, ai, ai-agents, ai-agents-framework, aiagentframework
Zero-Shot Speech Editing and Text-to-Speech in the Wild
★ 8,476 · Jupyter Notebook · updated 2025-03-15
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
★ 8,470 · Python · updated 2024-08-13 · ai, deep-learning, emotion, emotivoice, multi-speaker
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
★ 7,857 · C · updated 2026-04-20 · intent-recognition, stt, tts, voice, voice-recognition
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
★ 7,688 · Python · updated 2026-04-17 · agentic-ai, agents, ai, ai-agents, realtime
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
★ 6,708 · Python · updated 2025-12-05 · audiobook, faster-whisper, gradio, karaoke, podcasts
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
★ 6,243 · Python · updated 2024-08-10 · adversarial-training, deep-learning, diffusion-models, gan, latent-diffusion
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
★ 6,167 · Python · updated 2025-06-04 · ai, audio-generation, deep-learning, foundation-models, gpt
Towards Human-Sounding Speech
★ 6,108 · Python · updated 2025-12-05 · llm, realtime, tts
Silero Models: pre-trained text-to-speech models made embarrassingly simple
★ 5,888 · Jupyter Notebook · updated 2026-04-16 · armenian, azerbaijani, belarus, colab, georgian
This repository contains hand-curated resources for Prompt Engineering, with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
★ 5,820 · TypeScript · updated 2026-04-20 · chatgpt, chatgpt-api, deep-learning, few-shot-learning, gpt
250+ Fine-tuning & RL Notebooks for text, vision, audio, embedding, TTS models.
★ 5,279 · Jupyter Notebook · updated 2026-04-18 · unsloth
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
★ 4,764 · Python · updated 2026-01-04 · fastapi, huggingface-spaces, kokoro, kokoro-tts, onnx
An Open Source text-to-speech system built by inverting Whisper.
★ 4,595 · Jupyter Notebook · updated 2025-12-14 · pytorch, speech-synthesis, tts
High-Quality Voice Cloning TTS for 600+ Languages
★ 4,292 · Python · updated 2026-04-20
A nearly-live implementation of OpenAI's Whisper.
★ 3,984 · Python · updated 2026-04-16 · dictation, obs, openai, openvino, openvino-intel
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.
★ 3,247 · Python · updated 2026-04-24 · agent, agentic-ai, ai, claude, copilot
A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!
★ 3,094 · TypeScript · updated 2026-04-19 · ace-step, ai, audio-generation, cosyvoice, generative-ai
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
★ 2,845 · C++ · updated 2026-01-22 · cpp, csharp, go, ios, java
TTS with kokoro and onnx runtime
★ 2,500 · Python · updated 2026-01-30 · kokoro, onnxruntime, python, tts
AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
★ 2,331 · HTML · updated 2026-01-09
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
★ 2,159 · updated 2024-06-06 · audio-dataset, audio-datasets, data, dataset, datasets
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
★ 1,492 · TypeScript · updated 2025-07-23 · ai, assistant-chat-bots, computer-vision, llm, speech-recognition
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
★ 1,466 · Python · updated 2026-02-18 · ai-audio, ai-tts, ai-voice, ai-voice-clone, ai-voice-clonining
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
★ 1,443 · Python · updated 2026-04-15 · image-generation, image-to-video, mcp, mcp-server, mcp-tools
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
★ 1,442 · C++ · updated 2026-04-15 · asr, flatpak-applications, linux-desktop, machine-translation, nmt
A Python/PyTorch app for easily synthesising human voices
★ 1,440 · Python · updated 2024-12-02 · deep-learning, python, pytorch, tacotron2, text-to-speech
Run local LLMs like llama, deepseek-distill, kokoro and more inside your browser
★ 1,404 · TypeScript · updated 2026-04-22 · agents, ai, llama, llm, llm-inference
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
★ 1,314 · Python · updated 2026-02-10 · ai, ai-art, art, asset-generator, chatbot
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
★ 1,187 · Python · updated 2026-04-02 · ai, api-server, audio-generation, chatterbox, chatterbox-tts
Natural (2-way) voice conversations with Claude Code
★ 1,125 · Python · updated 2026-04-19 · anthropic, asr, claude, claudecode, kokoro
A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.
★ 1,119 · Python · updated 2025-11-22
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
★ 913 · Python · updated 2026-04-09 · audio-editing, cross-lingual, emotion-control, paralinguistics, reinforcement-learning
A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
★ 891 · Python · updated 2026-04-17 · ai-audio, audio, audio-editing, audio-generation, audio-processing
Inworld TTS
★ 716 · Python · updated 2026-04-14
High-performance Text-to-Speech server with OpenAI-compatible API, 8 voices, emotion tags, and modern web UI. Optimized for RTX GPUs.
★ 686 · Python · updated 2025-07-05 · ai-text-to-speech, ai-voice, artificial-intelligence
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
★ 650 · C++ · updated 2026-03-18 · android, asr, automatic-speech-recognition, embedded, mobile
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.
★ 647 · Python · updated 2026-04-27 · agent-skills, ai-video, bilibili, claude-code, claude-code-skill
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
★ 551 · Python · updated 2025-04-21 · audiobook, audiobook-creator, audiobook-maker, audiobooks, customtkinterprojects
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
★ 548 · HTML · updated 2025-04-04 · aigc, large-language-models, large-vision-language-models, llm, lvlm
Run Orpheus 3B Locally With LM Studio
★ 535 · Python · updated 2025-03-20 · ai, python, text-to-speech, tts
🎙️ Speak with AI - Run locally using Ollama, OpenAI, Anthropic or xAI - Speech uses SparkTTS, OpenAI, ElevenLabs, Kokoro, Typecast or xAI
★ 427 · Python · updated 2026-04-14 · ai-speech, ai-voice, ai-voice-agent, anthropic-claude, conversational-ai
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
★ 379 · Kotlin · updated 2026-03-15 · ai-personas, android, gguf-models, jetpack-compose, kotlin
VoxNovel: generate audiobooks giving each character a different voice actor.
★ 360 · Python · updated 2025-06-08 · audiobook-creator, audiobooks, booknlp, epub, generative-ai
Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)
★ 353 · Python · updated 2025-04-10
EaseVoice Trainer is a simple and user-friendly voice cloning and speech model trainer.
★ 351 · Python · updated 2025-04-23 · speech-training, text-to-speech, tts, voice-cloning
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
★ 295 · Python · updated 2025-07-09 · assistant, hacktoberfest, home-automation, iot, jarvis
A local implementation of the Kokoro Text-to-Speech model, featuring dynamic module loading, automatic dependency management, and a web interface.
★ 293 · Python · updated 2026-03-07
AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.
★ 284 · TypeScript · updated 2026-04-27 · ai-sdk, ai-video, claude-code, cursor, elevenlabs
🎬 Auto-subtitle videos with AI transcription, translation, voice cloning, professional rendering, background image and music generator
★ 215 · JavaScript · updated 2026-03-13
Blueprint by Mozilla.ai for generating podcasts from documents using local AI
★ 175 · Python · updated 2026-04-13 · local-ai, podcast, text-to-speech, text-to-text
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp OFFLINE. Speak with local LLMs via llama.cpp.
★ 172 · Shell · updated 2025-07-25 · accessibility, ai, bloat-free, chatbot, cli
Like ChatGPT's voice conversations with an AI, but entirely offline/private/trade-secret-friendly, using local AI models such as LLama 2 and Whisper
★ 161 · Python · updated 2024-08-20 · ai-assistant, ai-assistants, android, desktop, koboldai
Automatically generate engaging AI podcasts from nothing but an episode title.
★ 144 · Python · updated 2025-07-28 · anthropic, elevenlabs, gemini, langchain, llm
Use Home Assistant Assist on the desktop. Compatible with Windows, MacOS, and Linux
★ 132 · Svelte · updated 2026-01-15 · assist, cross-platform, desktop, home-assistant, home-assistant-assist
A simple-to-use Python library for creating podcasts with support for many LLM and TTS providers
★ 113 · Python · updated 2026-03-03
The official implementation of "A Language Modeling Approach to Diacritic-Free Hebrew TTS"
★ 109 · Python · updated 2025-06-12 · ai, hebrew, slms, tts
ComfyUI Chatterbox TTS & Voice Conversion Node
★ 93 · Python · updated 2025-08-21
AI-Powered Podcast Generator: A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI API. This project leverages advanced text-to-speech technology to create dynamic, multi-speaker conversations with customizable voices.
★ 55 · Python · updated 2024-12-16 · gemini-2-0-flash-exp, generative-language-api, google-ai, googleapis
OpenAI-compatible TTS API that unifies multiple backends with smart chunking for unlimited-length generation
★ 49 · Python · updated 2025-12-08
Installation script for AI applications using ROCm on Linux.
★ 45 · Shell · updated 2026-04-18 · 3d, ai, amd, amdgpu, audio
Mission to create a Hebrew TTS model as powerful and user-friendly as WaveNet
★ 40 · Python · updated 2025-01-05 · hebrew, israel, pytorch, tts
A real-time, offline voice assistant for Linux and Raspberry Pi. Uses local LLMs (via Ollama), speech-to-text (Vosk), and text-to-speech (Piper) for fast, wake-free voice interaction. No cloud. No APIs. Just Python, a mic, and your voice.
★ 37 · Python · updated 2026-04-20 · android, chatbot, deep-learning, echo, esp-idf
Speech-to-text, text-to-speech with ElevenLabs
★ 35 · Python · updated 2023-12-21 · elevenlabs, pyside6, pytorch, speech-to-text, text-to-speeh
Langchain Voice Agent with Inworld TTS
★ 34 · TypeScript · updated 2026-04-16
Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using Whisper.
★ 30 · Python · updated 2023-05-27
AgenticSeek is a fully local, voice-enabled AI assistant designed to autonomously browse the web, write code, and plan tasks while ensuring complete privacy by keeping all data on your device. Tailored for local reasoning models, it runs entirely on your hardware, eliminating any cloud dependency.
★ 30 · Python · updated 2025-08-27 · ai-agents, ai-assistant, autonomous-web-browsing, chromedriver, coding-assistance
A curated list of voice AI agent frameworks, tools, resources, and best practices
★ 25 · updated 2026-04-06 · agents, realtime-chat, stt, tts, vad
Aivis Voice Model File (.aivm/.aivmx) Utility Library
★ 25 · Python · updated 2025-10-17 · aivis-project, aivm, onnx, python, safetensors
Text-to-speech plugin for Claude Code — multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly, Azure, Kitten, local system TTS) on macOS, Linux, and Windows
★ 24 · Shell · updated 2026-04-15 · accessibility, audio, claude-code, claude-code-plugin, elevenlabs
Claude Code Changelog Tracker with AI analysis, TTS, and email notifications
★ 21 · TypeScript · updated 2026-01-09
A Deepgram client for Dart and Flutter, supporting all Speech-to-Text and Text-to-Speech features on every platform.
★ 19 · Dart · updated 2025-09-12 · dart, deepgram-api, flutter, open-source, sdk
★ 15 · Python · updated 2025-09-10 · g2p, hebrew, tts
Chrome extension that allows dictating anywhere using OpenAI Whisper
★ 13 · JavaScript · updated 2023-09-29 · chrome-extension, dictation, openai, openai-api, text-to-speech
🔊 Intelligent voice notifications for Claude Code using ElevenLabs TTS
★ 12 · Shell · updated 2025-08-12
Xiaomi Mimo TTS Custom Component for Home Assistant
★ 9 · Python · updated 2026-03-21
OpenClaw TTS Provider for Xiaomi MiMo (mimo-v2-tts)
★ 8 · Shell · updated 2026-03-22 · mimo, mimo-v2-tts, openclaw-skills, tts, xiaomi
Voice-cloned smart attention TTS notifications for Claude Code. AI summarizes deep work session responses, speaks in your cloned voice. MLX Chatterbox Turbo on Apple Silicon. Zero config, works out of the box.
★ 8 · Python · updated 2026-01-15 · apple-silicon, chatterbox, claude-code, claude-code-plugin, mlx
A command line utility to easily finetune XTTS models in a fully automated way. Developed for Pandrator.
★ 7 · Python · updated 2025-03-19 · fine-tuning, pandrator, tts, tts-engine, xtts
The ultimate PyQt6 application that integrates the power of OpenAI, Google Gemini, Claude, and other open-source AI models
★ 7 · Python · updated 2025-12-07 · agent, chatgpt, claude, dalle, evaluator-optimizer-workflow
[NVIDIA, MAC, ROCM] Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech application (Minimum Requirements 8GB VRAM / 32 GB RAM, Recommended Requirements 16GB VRAM 24GB RAM)
★ 6 · JavaScript · updated 2025-12-22 · pinokio, tts
Serverless implementation of Text-To-Speech
★ 6 · Python · updated 2025-02-21
Speech-to-Text/Code using a fast local LLM, for Linux, uses Whisper
★ 4 · Python · updated 2025-11-15 · linux, tts, whisper, whisper-ai
Easy one shot installer for configuring Chatterbox's TTS models (Original and Turbo)
★ 3 · Python · updated 2025-12-17
🚨 Israeli Home Front Command real-time alerts via OpenClaw - WhatsApp + TTS, no Home Assistant needed
★ 1 · Python · updated 2026-03-02
XTTS fine-tuning via CLI
★ 1 · Python · updated 2025-10-16 · ai, ai-training, audio, audio-processing, coqui
A Model Context Protocol (MCP) server that provides ASR (Automatic Speech Recognition) capabilities using the Whisper engine. This server exposes TTS functionality through MCP tools, making it easy to integrate speech synthesis into your applications.
★ 1 · Python · updated 2025-03-31
Whisper + TTS + As many MCP servers as I can stuff in
★ 1 · Python · updated 2025-05-01