Audio (general)

182 repos
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
★ 159,926 · Python · updated 2026-04-20 · audio, deep-learning, deepseek, gemma, glm
A feature-rich command-line audio/video downloader
★ 158,626 · Python · updated 2026-04-19 · cli, downloader, python, sponsorblock, youtube-dl
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
★ 45,835 · Go · updated 2026-04-20 · agents, ai, api, audio-generation, decentralized
Video.js - open source HTML5 video player
★ 39,713 · JavaScript · updated 2026-03-11 · dash, hls, html, html5, html5-audio
🔊 Text-Prompted Generative Audio Model
★ 39,092 · Jupyter Notebook · updated 2024-08-19
Instant voice cloning by MIT and MyShell. Audio foundation model.
★ 36,336 · Python · updated 2025-04-19 · text-to-speech, tts, voice-clone, zero-shot-tts
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
★ 33,452 · Python · updated 2026-04-18 · deep-learning, diffusion, flux, image-generation, image2image
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181, with codec support for H.264, H.265, AV1, VP9, AAC, Opus, and G.711.
★ 28,769 · C++ · updated 2026-04-19 · audio, c, c-plus-plus, dash, hevc
The most advanced free and open-source browser fingerprinting library
★ 26,991 · TypeScript · updated 2026-04-14 · audio-fingerprinting, browser, browser-fingerprint, browser-fingerprinting, detection
GUI for a Vocal Remover that uses Deep Neural Networks.
★ 24,423 · Python · updated 2025-03-13 · audio, instrumental, karaoke, kareokee, music
🎧 Your Personal Streaming Service
★ 20,701 · Go · updated 2026-04-20 · airsonic, madsonic, media-server, music, music-server
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
★ 18,900 · Python · updated 2026-04-19 · whisper
Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.
★ 18,581 · Go · updated 2026-04-19 · go, golang, hls, media-server, obs-studio
The free and privacy-friendly screen recorder with no limits 🎥
★ 18,138 · JavaScript · updated 2026-04-08 · annotation, annotation-tool, audio, camera, chrome-extension
Audio Editor
★ 16,885 · C++ · updated 2026-04-28 · audio, cross-platform, editor, gplv2, wxwidgets-applications
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
★ 15,852 · Python · updated 2026-03-17 · audio-visual-speech-recognition, conformer, dfsmn, paraformer, pretrained-model
A PyTorch-based Speech Toolkit
★ 11,475 · Python · updated 2026-04-03 · asr, audio, audio-processing, deep-learning, huggingface
A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion
★ 10,218 · TypeScript · updated 2025-11-13 · audio, dailymotion, dash, facebook, hls
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
★ 9,830 · Jupyter Notebook · updated 2026-04-16 · overlapped-speech-detection, pretrained-models, pytorch, speaker-change-detection, speaker-diarization
🎤⌨️ Acoustic keyboard eavesdropping
★ 8,998 · C++ · updated 2023-01-15 · acoustic, eavesdrop, microphone-audio-capture
HTML5 <audio> or <video> player with support for MP4, WebM, and MP3 as well as HLS, Dash, YouTube, Facebook, SoundCloud and others with a common HTML5 MediaElement API, enabling a consistent UI in all browsers.
★ 8,298 · JavaScript · updated 2025-11-12 · dash, flash, hls, html5, html5-audio
Multilingual Voice Understanding Model
★ 8,041 · Python · updated 2025-12-30 · ai, aigc, asr, audio-event-classification, cross-lingual
Text-audio foundation model from Boson AI
★ 8,029 · Python · updated 2026-01-18
Mumble is open-source, low-latency, high-quality voice chat software.
★ 7,954 · C++ · updated 2026-04-19 · audio, client, cmake, cross-platform, gaming
Synchronous multiroom audio player
★ 7,595 · C++ · updated 2026-03-10 · audio, audio-player, audio-streaming, lms, multiroom-audio
An extensible, plugin-oriented, HTML5-first media player for the web
★ 7,451 · JavaScript · updated 2026-04-20 · clappr, dash, hls, html5-audio, html5-video
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
★ 6,708 · Python · updated 2025-12-05 · audiobook, faster-whisper, gradio, karaoke, podcasts
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
★ 6,241 · Python · updated 2025-12-09 · elevenlabs, gemini, genai, notebooklm, openai
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
★ 6,167 · Python · updated 2025-06-04 · ai, audio-generation, deep-learning, foundation-models, gpt
GNU Radio – the Free and Open Software Radio Ecosystem
★ 6,052 · C++ · updated 2026-04-21 · c-plus-plus, cybersecurity, dsp, gnu, gnuradio
Download web video and audio
★ 5,616 · C# · updated 2026-04-20 · csharp, downloader, flathub, gnome, gtk4
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
★ 5,437 · Rust · updated 2026-04-20 · ai-engineering, ai-pipeline, arrow, artificial-intelligence, big-data
250+ Fine-tuning & RL Notebooks for text, vision, audio, embedding, TTS models.
★ 5,279 · Jupyter Notebook · updated 2026-04-18 · unsloth
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
★ 4,590 · Python · updated 2025-06-21
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.
★ 4,522 · Jupyter Notebook · updated 2026-02-13
Random digital audio effects
★ 4,339 · C · updated 2026-02-26
A curated list of awesome data labeling tools
★ 4,314 · updated 2024-06-17 · 3d-annotation, annotation, annotation-tool, audio-annotation, audio-annotation-tool
Effort-free video editing!
★ 4,206 · Nim · updated 2026-04-17 · audio, audio-editing, audio-processing, automatic, nim
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
★ 4,098 · Python · updated 2025-08-14 · audio, bandwidth-extension, deep-learning, noise-suppression, pytorch
Noise suppression using deep filtering
★ 4,095 · Python · updated 2024-10-17 · audio, deep-learning, noise-suppression, pytorch, rust
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
★ 4,073 · Python · updated 2025-01-08 · audio, speech-recognition, whisper
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
★ 4,063 · Python · updated 2026-04-20 · agi, audio-evaluation, benchmark, evaluation, large-language-models
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
★ 3,981 · Jupyter Notebook · updated 2025-06-12
Soundcloud Music Downloader
★ 3,980 · Python · updated 2026-04-14 · downloader, music, python, soundcloud, soundcloud-music-downloader
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
★ 3,705 · Jupyter Notebook · updated 2026-01-08
Cross-platform audio I/O library in pure Rust
★ 3,678 · Rust · updated 2026-04-19 · audio, rust, sound
UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.
★ 3,485 · TypeScript · updated 2026-04-19 · accessibility, analytics, audio, hls, html
A library for audio and music analysis and feature extraction.
★ 3,302 · C · updated 2026-03-06 · audio, audio-analysis, audio-features, audio-processing, deep-learning
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
★ 3,099 · Shell · updated 2026-04-13 · agent-tools, ai-agents, ai-art, ai-music, ai-video
A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!
★ 3,094 · TypeScript · updated 2026-04-19 · ace-step, ai, audio-generation, cosyvoice, generative-ai
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
★ 2,979 · Svelte · updated 2025-08-15 · ai, audio-to-text, golang, speech-recognition, speech-to-text
Data manipulation and transformation for audio signal processing, powered by PyTorch
★ 2,869 · Python · updated 2026-04-20 · audio, audio-processing, io, machine-learning, python
Use an API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
★ 2,858 · TypeScript · updated 2026-03-06 · ai, music, suno, suno-ai, suno-ai-api
Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
★ 2,776 · Java · updated 2026-04-20 · eps, exif, icc, iptc, java
Self-hosted AI audio transcription
★ 2,605 · Go · updated 2026-03-22 · ai, audio, transcript, transcription
🎚️ Open Source Audio Matching and Mastering
★ 2,500 · Python · updated 2026-04-19 · audio, docker-image, dsp, equalizer, filter
ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help you build multimodal agents faster.
★ 2,184 · TypeScript · updated 2026-04-15 · agents, ai, audio, components, elevenlabs
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
★ 2,159 · updated 2024-06-06 · audio-dataset, audio-datasets, data, dataset, datasets
An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
★ 2,146 · Jupyter Notebook · updated 2025-12-30
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
★ 2,091 · C · updated 2026-02-02 · audio, automatic-speech-recognition, conversational-ai, real-time, silero-vad
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
★ 2,064 · Python · updated 2025-04-21
Voice activity detector (VAD) for the browser with a simple API
★ 1,945 · TypeScript · updated 2026-01-30 · onnxruntime, silero-vad, speech-to-text, typescript, voice-activity-detection
Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification).
★ 1,876 · Python · updated 2026-04-17 · audio-transcription, faster-whisper, interview, pyannote, qualitative-research
Cross-Platform, GPU Accelerated Whisper 🏎️
★ 1,800 · TypeScript · updated 2024-02-27 · audio, machine-learning, rust, speech-recognition, webgpu
MLT Multimedia Framework
★ 1,765 · C · updated 2026-04-20 · audio, audio-processing, c, c-plus-plus, ffmpeg
Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.
★ 1,688 · Perl · updated 2026-04-16 · logitech-media-server, lyrion, lyrion-music-server, music, perl
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
★ 1,591 · Python · updated 2025-01-01 · deep-learning, language-model, machine-learning, multi-modal-learning, natural-language-processing
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
★ 1,466 · Python · updated 2026-02-18 · ai-audio, ai-tts, ai-voice, ai-voice-clone, ai-voice-clonining
SALMONN family: A suite of advanced multi-modal LLMs
★ 1,412 · updated 2026-04-20 · audio, audio-processing, audio-visual-understanding, bytedance, iclr2024
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
★ 1,399 · Python · updated 2026-03-16
A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.
★ 1,291 · Vue · updated 2026-04-08 · audio, comfyui, extension, file-explorer, file-server
Teensy Audio Library
★ 1,233 · C++ · updated 2026-04-26
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a reference image and audio.
★ 1,232 · Python · updated 2026-01-20 · aigc, avatar-generator, video-generation
Unofficial PyTorch implementation of Google AI's VoiceFilter system
★ 1,201 · Python · updated 2024-07-25 · audio-separation, pytorch, source-separation, speech-separation, voicefilter
A free & open tool for transcribing audio interviews
★ 1,192 · JavaScript · updated 2026-04-16
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
★ 1,187 · Python · updated 2026-04-02 · ai, api-server, audio-generation, chatterbox, chatterbox-tts
Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
★ 1,154 · Python · updated 2026-04-20
Open source audio annotation tool for humans
★ 1,133 · TypeScript · updated 2026-02-03 · annotation-tool, audio-annotation, audio-processing, datasets, machine-learning
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
★ 1,103 · updated 2025-12-15 · audio-captioning, audio-language-models, audio-question-answering, audio-reasoning, multimodal-large-language-models
MiMo-Audio: Audio Language Models are Few-Shot Learners
★ 1,029 · Python · updated 2026-03-03
Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS
★ 957 · Python · updated 2024-10-02 · ai, speech-recognition, speech-to-text, websocket
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
★ 930 · updated 2025-07-08 · aigc, artificial-intelligence, audio, audio-effect, audio-generation
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports OpenAI, Azure, Perplexity, LLaMA, and more, with features like streaming, interactive chat, prompt files, image/audio I/O, MCP tool calls, and an experimental agent mode for safe, multi-step automation.
★ 917 · Go · updated 2026-03-22 · agent, agentic-ai, azure, chatgpt, cli
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.
★ 913 · Python · updated 2026-04-09 · audio-editing, cross-lingual, emotion-control, paralinguistics, reinforcement-learning
Audio Large Language Models
★ 912 · Python · updated 2025-07-05 · audio-language, audio-processing, audio-understanding
A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
★ 891 · Python · updated 2026-04-17 · ai-audio, audio, audio-editing, audio-generation, audio-processing
Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads through yt-dlp integration.
★ 809 · JavaScript · updated 2023-03-16 · expressjs, gpu, libretranslate, machine-learning, nodejs
🔈 Sonos Media Player Interface/Client
★ 719 · JavaScript · updated 2026-04-15 · home-automation, javascript, music, nodejs, sonos
A 100% private AI voice transcription app that converts speech to text in 100+ languages. Built with Compose Multiplatform for Android & iOS using Whisper AI - no cloud uploads, all processing happens on-device for complete privacy.
★ 684 · C++ · updated 2026-04-07 · android, audio-player, compose-ios, compose-multiplatform, compose-multiplatform-sample
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
★ 658 · Python · updated 2026-02-26 · audio-language-model, deep-learning, large-language-models, multimodal-large-language-models, vision-language-model
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
★ 614 · JavaScript · updated 2024-02-12 · bbc-news-labs, kaldi, news-labs, react, stt
"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
★ 603 · Python · updated 2025-10-17 · agents, audio-editing, audio-understanding, llm-agents, notebooklm
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
★ 551 · Python · updated 2025-04-21 · audiobook, audiobook-creator, audiobook-maker, audiobooks, customtkinterprojects
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
★ 548 · HTML · updated 2025-04-04 · aigc, large-language-models, large-vision-language-models, llm, lvlm
ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3
★ 504 · Python · updated 2025-08-12 · ai, groq, groq-api, llama3, replit
Open Audio Watermarking Tool
★ 491 · Python · updated 2025-12-22
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
★ 487 · C · updated 2025-07-15 · android, audio-processing, deep-neural-networks, dnn, gmm
Open Source Multiroom Audio Streamer based on Raspberry Pi & Snapcast
★ 460 · Shell · updated 2026-04-19 · airplay, audio, camilladsp, debian, librespot
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
★ 443 · Pascal · updated 2026-04-18 · ai, audio-to-text, blu-ray, captions, editor
This project is a digital human that can talk and listen to you. It uses OpenAI's GPT to generate responses, OpenAI's Whisper to transcribe the audio, Eleven Labs to generate the voice, and Rhubarb Lip Sync to generate the lip sync.
★ 430 · JavaScript · updated 2026-01-18 · ai-avatars, digital-human, elevenlabs, lip-sync, lipsync
A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.
★ 396 · TypeScript · updated 2026-04-19 · diarization, dictation, docker, faster-whisper, linux
Free on-device web app for audio transcribing and rendering subtitles
★ 363 · ReScript · updated 2026-02-01 · ai, rescript, subtitles, webcodecs, whisper
Collection of LADSPA/LV2/VST/JACK audio plugins for high-quality processing
★ 306 · C++ · updated 2026-04-05 · audio-plugin, dpf, lv2, lv2-plugin, vst
AudioBench: A Universal Benchmark for Audio Large Language Models
★ 303 · Python · updated 2025-06-17 · audio-scene-understanding, speech, speech-question-answering, speech-recognition
Your faithful, impartial partner for audio evaluation: honest benchmarking, so you know yourself and know your rivals.
★ 290 · Python · updated 2026-04-08 · evaluation, speech-recognition, speech-to-speech, speech-to-text
OpenShot Audio Library (libopenshot-audio) is a free, open-source project that enables high-quality editing and playback of audio, and is based on the amazing JUCE library.
★ 289 · C++ · updated 2026-03-20 · audio, audio-effects, audio-library, audio-processing, c-plus-plus
The (official) Music Assistant Mobile app is a cross-platform client application designed for Android, iOS, and Java runtime environments. Developed using Kotlin Multiplatform (KMP) and Compose Multiplatform frameworks, this project aims to provide a unified codebase for seamless music management across multiple platforms.
★ 285 · Kotlin · updated 2026-04-15 · android, android-app, home-assistant, ios, ios-app
An audio recording helper for React. Provides a component and a hook to help with audio recording.
★ 260 · TypeScript · updated 2024-06-02 · audio, audio-recorder, audio-recording, download, nextjs
Transcribe audio and add subtitles to videos using Whisper in ComfyUI
★ 229 · Python · updated 2026-01-02 · comfyui, stable-diffusion, whisper-ai
The AI Podcast Studio: generate podcast scripts and their audio versions with a team of AI workers in a Podcast Studio 🎙️📜
★ 224 · Python · updated 2025-03-05 · ag2, ai, audio-generation, autogen, elevenlabs
Snapcast client for Android
★ 222 · Java · updated 2025-09-23 · android, multiroom-audio, snapcast, sonos
A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.
★ 200 · Python · updated 2025-09-13
Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download. Supports txt2img, img2img, txt2vid, img2vid, audio, 3D generation across SD1.5/SDXL/SD3/FLUX/Wan2.2/HunyuanVideo/LTXV/Mochi/Cosmos + LLM integration. Works as a skill for Claude Code, Cursor, and other AI coding agents.
★ 190 · updated 2026-04-09
PySimpleGUI-based desktop app that auto-generates a subtitle file (using the free Google Speech Recognition API) and a translated subtitle file (using the unofficial online Google Translate API) for any video or audio file
★ 189 · Python · updated 2024-05-05 · auto-caption, auto-subtitle, captions, ffmpeg, google-translate
Modern GUI application that transcribes and translates audio files using OpenAI Whisper.
★ 168 · Python · updated 2024-08-12
★ 164 · Python · updated 2025-11-20
The BEST music separation model with the help of A.I. ... to my ears! 👂👂
★ 149 · Python · updated 2024-06-10 · artificial-intelligence, audio, instrumental, instrumentals, karaoke
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
★ 140 · Python · updated 2025-09-19
Cross-platform audio recorder designed for real-time speech audio processing
★ 129 · C · updated 2026-04-18 · audio, c, dotnet, golang, nodejs
Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive
★ 128 · HTML · updated 2026-04-27 · karaoke, karaoke-maker, lyrics, music, video
💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAI's Whisper on CPU, Nvidia GPU and Apple MLX.
★ 121 · Python · updated 2026-04-06 · asr, automatic-speech-recognition, mlx, mlx-audio, speech-recognition
Wayland Speech-to-Text Tool - A minimal signal-driven speech-to-text tool for Wayland environments with PipeWire audio
★ 120 · Rust · updated 2026-03-21
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
★ 114 · Python · updated 2026-03-03
End-to-end workflow to automatically generate show notes from audio/video transcripts
★ 94 · TypeScript · updated 2026-02-25 · assembly-ai, chatgpt, claude, deepgram, gemini
Vapi Blocks is a library of components and API snippets to copy and paste into React applications built with TailwindCSS for integrating Voice AI into your application using Vapi.ai. Vapi lets you develop voice AI fast; Vapi Blocks helps you implement faster.
★ 84 · TypeScript · updated 2024-12-20 · ai, audio-visualizer, conversational-ai, microphone, uikit
AI-powered tool for automatic podcast script and audio generation.
★ 83 · Python · updated 2025-08-14 · artificial-intelligence, chatgpt, podcast
Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.
★ 63 · TypeScript · updated 2026-04-16 · ffmpeg, podcast, profanity-filter, speech-to-text, transformersjs
SVAR - Simple Voice Activated Recorder
★ 63 · C · updated 2025-12-01 · alsa, audio-recorder, mp3, ogg-opus, ogg-vorbis
A ComfyUI custom node for audio subtitling based on WhisperX and translators
★ 62 · Python · updated 2025-04-01 · srt-subtitles, sutitles, translation, whisper
A comprehensive framework to test audio comprehension of Large Audio Language Models.
★ 61 · Python · updated 2026-04-19
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
★ 61 · Svelte · updated 2026-04-07 · gemini, gemini-flash, speaker-diarization, speech-to-text, sveltekit
AI-Powered Podcast Generator: A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI API. This project leverages advanced text-to-speech technology to create dynamic, multi-speaker conversations with customizable voices.
★ 55 · Python · updated 2024-12-16 · gemini-2-0-flash-exp, generative-language-api, google-ai, googleapis
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
★ 53 · TypeScript · updated 2026-03-17 · ffmpeg, openai, openai-whisper, subtitle-generator, subtitles
An MCP Server for audio transcription using OpenAI
★ 52 · Python · updated 2025-10-16
A user-friendly Raspberry Pi baby monitor with cry detection and audio/video streaming.
★ 52 · PHP · updated 2026-02-03
This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.
★ 50 · Python · updated 2025-03-30
Installation script for AI applications using ROCm on Linux.
★ 45 · Shell · updated 2026-04-18 · 3d, ai, amd, amdgpu, audio
MCP server for Fal.ai - Generate images, videos, music and audio with Claude
★ 43 · Python · updated 2026-03-30 · ai-tools, claude, fal-ai, image-generation, llm
Snapcast Multiroom audio docker image
★ 43 · Dockerfile · updated 2026-03-28 · docker, docker-image, linuxserver, mopidy, multiroom
A set of bash scripts to convert audio files into M4B audiobooks with chapter markers, customizable bitrate, book metadata and embedded cover art.
★ 42 · Shell · updated 2026-04-03 · audiobook, bash, m4b, m4b-book, m4b-tool
Android application for data transfer, using sound waves
★ 32 · Java · updated 2025-04-19 · android-development, data-transfer, frequency-modulation, java, sound
Removes silence segments from wav audio files
★ 30 · Python · updated 2020-02-29
Learn how multimodal AI merges text, image, and audio for smarter models
★ 30 · Jupyter Notebook · updated 2025-01-21 · dalle-3, deepgram, diffusion, dreambooth, generative-ai
Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using Whisper.
★ 30 · Python · updated 2023-05-27
The Multi-Language Automatic Translation, Subtitling, and Voice Rendering System uses third-party software to automatically convert audio to text, translate text, render text to video, and render text to audio.
★ 29 · PHP · updated 2024-07-29 · audio, language, php, speech, srt
Dockerized Whisper C++ speech-to-text API for easy deployment and rapid integration. Offering the latest stable and nightly builds for efficient audio transcription.
★ 28 · C++ · updated 2026-02-28 · api, audio-transcription, docker, machine-learning, speech-to-text
A cross-platform desktop application that records audio and transcribes it to text using OpenAI's Whisper API or compatible services. Perfect for dictation, note-taking, and accessibility.
★ 27 · C# · updated 2026-03-21
Audio Cleaner using DeepFilterNet, hosted through Streamlit
★ 27 · Python · updated 2025-05-04 · audio-processing, noise-reduction
A curated list of tools for building AI with rich context from screen recordings, audio, and personal data
★ 24 · updated 2026-01-25
Text-to-speech plugin for Claude Code: multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly, Azure, Kitten, local system TTS) on macOS, Linux, and Windows
★ 24 · Shell · updated 2026-04-15 · accessibility, audio, claude-code, claude-code-plugin, elevenlabs
Prompt Management System for Interaction with the ChatGPT API
★ 23 · JavaScript · updated 2026-04-07 · ai, audio-transcribing, image-generation, prompt-database, prompts
A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.
★ 19 · Python · updated 2026-01-02 · asr, gemini, gemini-api, transcribe
International Public Radio Directory, bringing diversity into audio. Public listing of internet radios from all around the world.
★ 18 · Python · updated 2026-04-27
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
★ 17 · Python · updated 2025-03-22
Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.
★ 17 · Go · updated 2026-01-08 · ffmpeg, ffmpeg-wrapper, fyne, gui, local
An audio/video transcriber with diarization and transcription editing.
★ 10 · JavaScript · updated 2026-03-17
An MCP server that provides audio transcription capabilities using OpenAI's Whisper API
★ 9 · JavaScript · updated 2025-03-25
Transcribe Offline by openresearchtools.com is an open source desktop application that allows you to transcribe audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing
★ 9 · Rust · updated 2026-03-21 · ai, localai, macos, open-source, transcribe
Convert audio files (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm) to SRT subtitles with OpenAI Whisper. Easy script for fast, accurate transcription.
★ 9 · Python · updated 2024-06-11
Song search and download site
★ 8 · Python · updated 2026-04-19 · downloader, israel, jewish, music, singles
One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, then paste. Never lose your audio files again!
★ 8 · Python · updated 2026-03-23 · chatgpt, linux, llm, ollama, open-source
OmniEvalKit is an evaluation framework designed for omni-modal large language models, with a focus on audio and audio-visual understanding. Based on OmniEvalKit, you can quickly reproduce benchmarks, implement your own models or datasets, and conduct fair comparisons with other open-source models. MiniCPM-o is evaluated using this framework.
★ 7 · Python · updated 2026-03-27
[ACL 2025] Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
★ 7 · Python · updated 2025-05-29
Fine-tuned Whisper model that transcribes Hebrew audio into IPA
★ 7 · Python · updated 2026-04-08 · g2p, hebrew, ipa, whisper
A voice transcription tool using faster-whisper that records audio and converts speech to text on Linux systems.
★ 7 · Python · updated 2025-02-20
Rudimentary program for speech transcription, manipulation, and redaction.
★ 5 · Python · updated 2024-07-17 · audio, censor, censorship, pydub, redaction
AudioWrite: Effortless voice dictation powered by Google's Gemini API. Record, transcribe, and transform rambling audio into polished, multi-language notes. PWA ready.
★ 5 · TypeScript · updated 2026-01-05 · ai, audio-recorder, dictation, frontend, gemini-api
LTX-2.3 video generation skill: setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model
★ 4 · Python · updated 2026-03-27
🎙️ Lightning-fast voice dictation Desktop Web App powered by Groq's Whisper Turbo - Open-source, privacy-first, with real-time audio visualization and intuitive click controls
★ 4 · Rust · updated 2026-03-14 · desktop-app, desktop-web-based-app, groq, linux, real-time
WhisperVoice: Covert voice notes. Encrypts text and hides it via LLM-generated acrostic sentences. Murf.ai creates natural audio. Browser extension decrypts with passcode, revealing hidden message or playing decoy for unauthorized listeners. Uses LLM, Murf.ai, STT APIs
★ 4 · JavaScript · updated 2025-06-29 · murf-ai, murf-ai-hackathon
Modern NVR with object/motion/audio detection, push notifications, multi-location, and encrypted local and cloud-based storage support built in.
★ 4 · updated 2024-10-06 · ai, camera, home-assistant, home-automation, ip-camera
AI-powered music production in REAPER via the Model Context Protocol: 163 tools for composition, MIDI, FX, mixing, and mastering.
★ 3 · Python · updated 2026-04-17 · anthropic, audio, claude, composition, daw
Real-time desktop audio transcription using OpenAI Whisper for Arch Linux with CUDA acceleration
★ 3 · Python · updated 2025-08-05
A powerful audio transcription server that seamlessly transcribes meeting recordings, generates notes, and intelligently splits audio files for efficient management. Open-source and built with FastMCP and Groq/OpenAI Whisper
★ 3 · Python · updated 2025-06-13
MCP server for real-time audio transcription using OpenAI Whisper
★ 3 · TypeScript · updated 2025-10-08
Synchronous multiroom audio player
★ 2 · C++ · updated 2025-04-08
A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.
★ 2 · Python · updated 2025-06-10
A deep learning application that classifies the reason for a baby's cry (hunger, pain, etc.) from live or uploaded audio. Built with a TensorFlow/Keras CNN, Librosa for audio processing, and a responsive Flask web UI with real-time recording and visualization. Helps caregivers understand an infant's needs instantly.
★ 2 · updated 2025-08-01
Python API for controlling Snapcast, a multi-room synchronous audio solution.
★ 1 · Python · updated 2025-05-19
XTTS fine-tuning via CLI
★ 1 · Python · updated 2025-10-16 · ai, ai-training, audio, audio-processing, coqui
Blazingly fast audio transcription MCP server using Whisper with Flash Attention 2
★ 1 · Python · updated 2025-12-04
App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.
★ 1 · Python · updated 2025-05-26 · openai-api, streamlit