Audio (general)

182 repos


🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

★ 159,926 · Python · updated 2026-04-20 · audio, deep-learning, deepseek, gemma, glm

yt-dlp/yt-dlp

A feature-rich command-line audio/video downloader

★ 158,626 · Python · updated 2026-04-19 · cli, downloader, python, sponsorblock, youtube-dl

mudler/LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

★ 45,835 · Go · updated 2026-04-20 · agents, ai, api, audio-generation, decentralized

videojs/video.js

Video.js - open source HTML5 video player

★ 39,713 · JavaScript · updated 2026-03-11 · dash, hls, html, html5, html5-audio

suno-ai/bark

🔊 Text-Prompted Generative Audio Model

★ 39,092 · Jupyter Notebook · updated 2024-08-19

myshell-ai/OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

★ 36,336 · Python · updated 2025-04-19 · text-to-speech, tts, voice-clone, zero-shot-tts

huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

★ 33,452 · Python · updated 2026-04-18 · deep-learning, diffusion, flux, image-generation, image2image

ossrs/srs

SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181, with codec support for H.264, H.265, AV1, VP9, AAC, Opus, and G.711.

★ 28,769 · C++ · updated 2026-04-19 · audio, c, c-plus-plus, dash, hevc

fingerprintjs/fingerprintjs

The most advanced free and open-source browser fingerprinting library

★ 26,991 · TypeScript · updated 2026-04-14 · audio-fingerprinting, browser, browser-fingerprint, browser-fingerprinting, detection

Anjok07/ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

★ 24,423 · Python · updated 2025-03-13 · audio, instrumental, karaoke, kareokee, music

navidrome/navidrome

🎧 Your Personal Streaming Service

★ 20,701 · Go · updated 2026-04-20 · airsonic, madsonic, media-server, music, music-server

chidiwilliams/buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

★ 18,900 · Python · updated 2026-04-19 · whisper

bluenviron/mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that allows to read, publish, proxy, record and playback video and audio streams.

★ 18,581 · Go · updated 2026-04-19 · go, golang, hls, media-server, obs-studio

alyssaxuu/screenity

The free and privacy-friendly screen recorder with no limits 🎥

★ 18,138 · JavaScript · updated 2026-04-08 · annotation, annotation-tool, audio, camera, chrome-extension

audacity/audacity

Audio Editor

★ 16,885 · C++ · updated 2026-04-28 · audio, cross-platform, editor, gplv2, wxwidgets-applications

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

★ 15,852 · Python · updated 2026-03-17 · audio-visual-speech-recognition, conformer, dfsmn, paraformer, pretrained-model

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

★ 11,475 · Python · updated 2026-04-03 · asr, audio, audio-processing, deep-learning, huggingface

cookpete/react-player

A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion

★ 10,218 · TypeScript · updated 2025-11-13 · audio, dailymotion, dash, facebook, hls

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

★ 9,830 · Jupyter Notebook · updated 2026-04-16 · overlapped-speech-detection, pretrained-models, pytorch, speaker-change-detection, speaker-diarization

ggerganov/kbd-audio

🎤⌨️ Acoustic keyboard eavesdropping

★ 8,998 · C++ · updated 2023-01-15 · acoustic, eavesdrop, microphone-audio-capture

mediaelement/mediaelement

HTML5 <audio> or <video> player with support for MP4, WebM, and MP3 as well as HLS, Dash, YouTube, Facebook, SoundCloud and others with a common HTML5 MediaElement API, enabling a consistent UI in all browsers.

★ 8,298 · JavaScript · updated 2025-11-12 · dash, flash, hls, html5, html5-audio

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

★ 8,041 · Python · updated 2025-12-30 · ai, aigc, asr, audio-event-classification, cross-lingual

boson-ai/higgs-audio

Text-audio foundation model from Boson AI

★ 8,029 · Python · updated 2026-01-18

mumble-voip/mumble

Mumble is an open-source, low-latency, high quality voice chat software.

★ 7,954 · C++ · updated 2026-04-19 · audio, client, cmake, cross-platform, gaming

snapcast/snapcast

Synchronous multiroom audio player

★ 7,595 · C++ · updated 2026-03-10 · audio, audio-player, audio-streaming, lms, multiroom-audio

clappr/clappr

An extensible, plugin-oriented, HTML5-first media player for the web

★ 7,451 · JavaScript · updated 2026-04-20 · clappr, dash, hls, html5-audio, html5-video

abus-aikorea/voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

★ 6,708 · Python · updated 2025-12-05 · audiobook, faster-whisper, gradio, karaoke, podcasts

souzatharsis/podcastfy

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

★ 6,241 · Python · updated 2025-12-09 · elevenlabs, gemini, genai, notebooklm, openai

multimodal-art-projection/YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

★ 6,167 · Python · updated 2025-06-04 · ai, audio-generation, deep-learning, foundation-models, gpt

gnuradio/gnuradio

GNU Radio – the Free and Open Software Radio Ecosystem

★ 6,052 · C++ · updated 2026-04-21 · c-plus-plus, cybersecurity, dsp, gnu, gnuradio

NickvisionApps/Parabolic

Download web video and audio

★ 5,616 · C# · updated 2026-04-20 · csharp, downloader, flathub, gnome, gtk4

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

★ 5,437 · Rust · updated 2026-04-20 · ai-engineering, ai-pipeline, arrow, artificial-intelligence, big-data

unslothai/notebooks

250+ Fine-tuning & RL Notebooks for text, vision, audio, embedding, TTS models.

★ 5,279 · Jupyter Notebook · updated 2026-04-18 · unsloth

MoonshotAI/Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

★ 4,590 · Python · updated 2025-06-21

AllenDowney/ThinkDSP

Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

★ 4,522 · Jupyter Notebook · updated 2026-02-13

torvalds/AudioNoise

Random digital audio effects

★ 4,339 · C · updated 2026-02-26

HumanSignal/awesome-data-labeling

A curated list of awesome data labeling tools

★ 4,314 · updated 2024-06-17 · 3d-annotation, annotation, annotation-tool, audio-annotation, audio-annotation-tool

WyattBlue/auto-editor

Effort-free video editing!

★ 4,206 · Nim · updated 2026-04-17 · audio, audio-editing, audio-processing, automatic, nim

modelscope/ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

★ 4,098 · Python · updated 2025-08-14 · audio, bandwidth-extension, deep-learning, noise-suppression, pytorch

Rikorose/DeepFilterNet

Noise suppression using deep filtering

★ 4,095 · Python · updated 2024-10-17 · audio, deep-learning, noise-suppression, pytorch, rust

huggingface/distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

★ 4,073 · Python · updated 2025-01-08 · audio, speech-recognition, whisper

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

★ 4,063 · Python · updated 2026-04-20 · agi, audio-evaluation, benchmark, evaluation, large-language-models

QwenLM/Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

★ 3,981 · Jupyter Notebook · updated 2025-06-12

scdl-org/scdl

Soundcloud Music Downloader

★ 3,980 · Python · updated 2026-04-14 · downloader, music, python, soundcloud, soundcloud-music-downloader

QwenLM/Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

★ 3,705 · Jupyter Notebook · updated 2026-01-08

RustAudio/cpal

Cross-platform audio I/O library in pure Rust

★ 3,678 · Rust · updated 2026-04-19 · audio, rust, sound

vidstack/player

UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.

★ 3,485 · TypeScript · updated 2026-04-19 · accessibility, analytics, audio, hls, html

libAudioFlux/audioFlux

A library for audio and music analysis and feature extraction.

★ 3,302 · C · updated 2026-03-06 · audio, audio-analysis, audio-features, audio-processing, deep-learning

SamurAIGPT/Generative-Media-Skills

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

★ 3,099 · Shell · updated 2026-04-13 · agent-tools, ai-agents, ai-art, ai-music, ai-video

rsxdalv/TTS-WebUI

A single Gradio + React WebUI with extensions for ACE-Step, OmniVoice, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, MusicGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, and Bark!

★ 3,094 · TypeScript · updated 2026-04-19 · ace-step, ai, audio-generation, cosyvoice, generative-ai

pluja/whishper

Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!

★ 2,979 · Svelte · updated 2025-08-15 · ai, audio-to-text, golang, speech-recognition, speech-to-text

pytorch/audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

★ 2,869 · Python · updated 2026-04-20 · audio, audio-processing, io, machine-learning, python

gcui-art/suno-api

Use an API to call the music generation AI of suno.ai, and easily integrate it into agents such as GPTs.

★ 2,858 · TypeScript · updated 2026-03-06 · ai, music, suno, suno-ai, suno-ai-api

drewnoakes/metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

★ 2,776 · Java · updated 2026-04-20 · eps, exif, icc, iptc, java

rishikanthc/Scriberr

Self-hosted AI audio transcription

★ 2,605 · Go · updated 2026-03-22 · ai, audio, transcript, transcription

sergree/matchering

🎚️ Open Source Audio Matching and Mastering

★ 2,500 · Python · updated 2026-04-19 · audio, docker-image, dsp, equalizer, filter

elevenlabs/ui

ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help you build multimodal agents faster.

★ 2,184 · TypeScript · updated 2026-04-15 · agents, ai, audio, components, elevenlabs

jim-schwoebel/voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

★ 2,159 · updated 2024-06-06 · audio-dataset, audio-datasets, data, dataset, datasets

dscripka/openWakeWord

An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.

★ 2,146 · Jupyter Notebook · updated 2025-12-30

TEN-framework/ten-vad

Voice Activity Detector (VAD): low-latency, high-performance and lightweight

★ 2,091 · C · updated 2026-02-02 · audio, automatic-speech-recognition, conversational-ai, real-time, silero-vad

QwenLM/Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

★ 2,064 · Python · updated 2025-04-21

ricky0123/vad

Voice activity detector (VAD) for the browser with a simple API

★ 1,945 · TypeScript · updated 2026-01-30 · onnxruntime, silero-vad, speech-to-text, typescript, voice-activity-detection

kaixxx/noScribe

Cutting-edge AI technology for automated audio transcription. A nice GUI for OpenAI's Whisper and pyannote (speaker identification)

★ 1,876 · Python · updated 2026-04-17 · audio-transcription, faster-whisper, interview, pyannote, qualitative-research

FL33TW00D/whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

★ 1,800 · TypeScript · updated 2024-02-27 · audio, machine-learning, rust, speech-recognition, webgpu

mltframework/mlt

MLT Multimedia Framework

★ 1,765 · C · updated 2026-04-20 · audio, audio-processing, c, c-plus-plus, ffmpeg

LMS-Community/slimserver

Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.

★ 1,688 · Perl · updated 2026-04-16 · logitech-media-server, lyrion, lyrion-music-server, music, perl

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

★ 1,591 · Python · updated 2025-01-01 · deep-learning, language-model, machine-learning, multi-modal-learning, natural-language-processing

Enemyx-net/VibeVoice-ComfyUI

A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.

★ 1,466 · Python · updated 2026-02-18 · ai-audio, ai-tts, ai-voice, ai-voice-clone, ai-voice-clonining

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

★ 1,412 · updated 2026-04-20 · audio, audio-processing, audio-visual-understanding, bytedance, iclr2024

stepfun-ai/Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

★ 1,399 · Python · updated 2026-03-16

zanllp/infinite-image-browsing

A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.

★ 1,291 · Vue · updated 2026-04-08 · audio, comfyui, extension, file-explorer, file-server

PaulStoffregen/Audio

Teensy Audio Library

★ 1,233 · C++ · updated 2026-04-26

Francis-Rings/StableAvatar

We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a reference image and audio.

★ 1,232 · Python · updated 2026-01-20 · aigc, avatar-generator, video-generation

maum-ai/voicefilter

Unofficial PyTorch implementation of Google AI's VoiceFilter system

★ 1,201 · Python · updated 2024-07-25 · audio-separation, pytorch, source-separation, speech-separation, voicefilter

oTranscribe/oTranscribe

A free & open tool for transcribing audio interviews

★ 1,192 · JavaScript · updated 2026-04-16

devnen/Chatterbox-TTS-Server

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.

★ 1,187 · Python · updated 2026-04-02 · ai, api-server, audio-generation, chatterbox, chatterbox-tts

nomadkaraoke/python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

★ 1,154 · Python · updated 2026-04-20

midas-research/audino

Open source audio annotation tool for humans

★ 1,133 · TypeScript · updated 2026-02-03 · annotation-tool, audio-annotation, audio-processing, datasets, machine-learning

NVIDIA/audio-flamingo

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

★ 1,103 · updated 2025-12-15 · audio-captioning, audio-language-models, audio-question-answering, audio-reasoning, multimodal-large-language-models

XiaomiMiMo/MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

★ 1,029 · Python · updated 2026-03-03

alesaccoia/VoiceStreamAI

Near-Realtime audio transcription using self-hosted Whisper and WebSocket in Python/JS

★ 957 · Python · updated 2024-10-02 · ai, speech-recognition, speech-to-text, websocket

Yuan-ManX/ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

★ 930 · updated 2025-07-08 · aigc, artificial-intelligence, audio, audio-effect, audio-generation

kardolus/chatgpt-cli

ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports OpenAI, Azure, Perplexity, LLaMA, and more, with features like streaming, interactive chat, prompt files, image/audio I/O, MCP tool calls, and an experimental agent mode for safe, multi-step automation.

★ 917 · Go · updated 2026-03-22 · agent, agentic-ai, azure, chatgpt, cli

stepfun-ai/Step-Audio-EditX

A powerful 3B-parameter, LLM-based reinforcement-learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

★ 913 · Python · updated 2026-04-09 · audio-editing, cross-lingual, emotion-control, paralinguistics, reinforcement-learning

AudioLLMs/Awesome-Audio-LLM

Audio Large Language Models

★ 912 · Python · updated 2025-07-05 · audio-language, audio-processing, audio-understanding

diodiogod/TTS-Audio-Suite

A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools

★ 891 · Python · updated 2026-04-17 · ai-audio, audio, audio-editing, audio-generation, audio-processing

mayeaux/generate-subtitles

Generate transcripts for audio and video content with a user friendly UI, powered by Open AI's Whisper with automatic translations and download videos automatically with yt-dlp integration

★ 809 · JavaScript · updated 2023-03-16 · expressjs, gpu, libretranslate, machine-learning, nodejs

bencevans/node-sonos

🔈 Sonos Media Player Interface/Client

★ 719 · JavaScript · updated 2026-04-15 · home-automation, javascript, music, nodejs, sonos

Notely-Voice/NotelyVoice

A 100% private AI voice transcription app that converts speech to text in 100+ languages. Built with Compose Multiplatform for Android & iOS using Whisper AI - no cloud uploads, all processing happens on-device for complete privacy.

★ 684 · C++ · updated 2026-04-07 · android, audio-player, compose-ios, compose-multiplatform, compose-multiplatform-sample

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

★ 658 · Python · updated 2026-02-26 · audio-language-model, deep-learning, large-language-models, multimodal-large-language-models, vision-language-model

bbc/react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. Work in progress.

★ 614 · JavaScript · updated 2024-02-12 · bbc-news-labs, kaldi, news-labs, react, stt

HKUDS/VideoAgent

VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking

★ 603 · Python · updated 2025-10-17 · agents, audio-editing, audio-understanding, llm-agents, notebooklm

lukaszliniewicz/Pandrator

Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.

★ 551 · Python · updated 2025-04-21 · audiobook, audiobook-creator, audiobook-maker, audiobooks, customtkinterprojects

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

★ 548 · HTML · updated 2025-04-04 · aigc, large-language-models, large-vision-language-models, llm, lvlm

Bklieger/ScribeWizard

ScribeWizard: Generate organized notes from audio using Groq, Whisper, and Llama3

★ 504 · Python · updated 2025-08-12 · ai, groq, groq-api, llama3, replit

resemble-ai/Perth

Open Audio Watermarking Tool

★ 491 · Python · updated 2025-12-22

gkonovalov/android-vad

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

★ 487 · C · updated 2025-07-15 · android, audio-processing, deep-neural-networks, dnn, gmm

byrdsandbytes/beatnik-pi

Open Source Multiroom Audio Streamer based on Raspberry Pi & Snapcast

★ 460 · Shell · updated 2026-04-19 · airplay, audio, camilladsp, debian, librespot

URUWorks/TeroSubtitler

Tero Subtitler is an open source, cross-platform, and free subtitle editing software.

★ 443 · Pascal · updated 2026-04-18 · ai, audio-to-text, blu-ray, captions, editor

asanchezyali/talking-avatar-with-ai

This project is a digital human that can talk and listen to you. It uses OpenAI's GPT to generate responses, OpenAI's Whisper to transcribe the audio, Eleven Labs to generate the voice, and Rhubarb Lip Sync to generate the lip sync.

★ 430 · JavaScript · updated 2026-01-18 · ai-avatars, digital-human, elevenlabs, lip-sync, lipsync

homelab-00/TranscriptionSuite

A fully local and private Speech-To-Text app with cross-platform support, speaker diarization, Audio Notebook mode, LM Studio integration, and both longform and live transcription.

★ 396 · TypeScript · updated 2026-04-19 · diarization, dictation, docker, faster-whisper, linux

dmtrKovalenko/subtitler

Free on-device web app for audio transcribing and rendering subtitles

★ 363 · ReScript · updated 2026-02-01 · ai, rescript, subtitles, webcodecs, whisper

zamaudio/zam-plugins

Collection of LADSPA/LV2/VST/JACK audio plugins for high-quality processing

★ 306 · C++ · updated 2026-04-05 · audio-plugin, dpf, lv2, lv2-plugin, vst

AudioLLMs/AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

★ 303 · Python · updated 2025-06-17 · audio-scene-understanding, speech, speech-question-answering, speech-recognition

OpenBMB/UltraEval-Audio

Your faithful, impartial partner for audio evaluation: honest benchmarking so you know yourself and know your rivals.

★ 290 · Python · updated 2026-04-08 · evaluation, speech-recognition, speech-to-speech, speech-to-text

OpenShot/libopenshot-audio

OpenShot Audio Library (libopenshot-audio) is a free, open-source project that enables high-quality editing and playback of audio, and is based on the amazing JUCE library.

★ 289 · C++ · updated 2026-03-20 · audio, audio-effects, audio-library, audio-processing, c-plus-plus

music-assistant/mobile-app

The (official) Music Assistant Mobile app is a cross-platform client application designed for Android, iOS, and Java runtime environments. Developed using Kotlin Multiplatform (KMP) and Compose Multiplatform frameworks, this project aims to provide a unified codebase for seamless music management across multiple platforms.

★ 285 · Kotlin · updated 2026-04-15 · android, android-app, home-assistant, ios, ios-app

samhirtarif/react-audio-recorder

An audio recording helper for React. Provides a component and a hook to help with audio recording.

★ 260 · TypeScript · updated 2024-06-02 · audio, audio-recorder, audio-recording, download, nextjs

yuvraj108c/ComfyUI-Whisper

Transcribe audio and add subtitles to videos using Whisper in ComfyUI

★ 229 · Python · updated 2026-01-02 · comfyui, stable-diffusion, whisper-ai

leopiney/neuralnoise

The AI Podcast Studio: generate podcasts scripts and their audio version with a team of AI workers in a Podcast Studio 🎙️📜

★ 224 · Python · updated 2025-03-05 · ag2, ai, audio-generation, autogen, elevenlabs

snapcast/snapdroid

Snapcast client for Android

★ 222 · Java · updated 2025-09-23 · android, multiroom-audio, snapcast, sonos

phongthanhbuiit/whisper-realtime-gui

A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.

★ 200 · Python · updated 2025-09-13

twwch/comfyui-workflow-skill

Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download. Supports txt2img, img2img, txt2vid, img2vid, audio, 3D generation across SD1.5/SDXL/SD3/FLUX/Wan2.2/HunyuanVideo/LTXV/Mochi/Cosmos + LLM integration. Works as a skill for Claude Code, Cursor, and other AI coding agents.

★ 190 · updated 2026-04-09

botbahlul/PyAutoSRT

PySimpleGUI-based desktop app to auto-generate subtitle files (using the free Google Speech Recognition API) and translated subtitle files (using the unofficial online Google Translate API) for any video or audio file

★ 189 · Python · updated 2024-05-05 · auto-caption, auto-subtitle, captions, ffmpeg, google-translate

rudymohammadbali/OpenAI-Whisper-GUI

Modern GUI application that transcribes and translates audio files using OpenAI Whisper.

★ 168 · Python · updated 2024-08-12

MoonshotAI/Kimi-Audio-Evalkit

★ 164 · Python · updated 2025-11-20

Captain-FLAM/KaraFan

The BEST music separation model, with the help of A.I. ... to my ears! 👂👂

★ 149 · Python · updated 2024-06-10 · artificial-intelligence, audio, instrumental, instrumentals, karaoke

XiaomiMiMo/MiMo-Audio-Tokenizer

A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.

★ 140 · Python · updated 2025-09-19

Picovoice/pvrecorder

Cross-platform audio recorder designed for real-time speech audio processing

★ 129 · C · updated 2026-04-18 · audio, c, dotnet, golang, nodejs

nomadkaraoke/karaoke-gen

Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive

★ 128 · HTML · updated 2026-04-27 · karaoke, karaoke-maker, lyrics, music, video

tsmdt/whisply

💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAI’s Whisper on CPU, Nvidia GPU and Apple MLX.

★ 121 · Python · updated 2026-04-06 · asr, automatic-speech-recognition, mlx, mlx-audio, speech-recognition

sevos/waystt

Wayland Speech-to-Text Tool - A minimal signal-driven speech-to-text tool for Wayland environments with PipeWire audio

★ 120 · Rust · updated 2026-03-21

NVIDIA/audio-intelligence

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

★ 114 · Python · updated 2026-03-03

autoshow/autoshow

End-to-end workflow to automatically generate show notes from audio/video transcripts

★ 94 · TypeScript · updated 2026-02-25 · assembly-ai, chatgpt, claude, deepgram, gemini

cameronking4/VapiBlocks

Vapi Blocks is a library of components & API snippets to copy and paste into React applications built with TailwindCSS for integrating Voice AI into your application using Vapi.ai. Vapi lets you develop voice AI fast; Vapi Blocks helps you implement it faster.

★ 84 · TypeScript · updated 2024-12-20 · ai, audio-visualizer, conversational-ai, microphone, uikit

aastroza/ai-podcast-generator

AI-powered tool for automatic podcast script and audio generation.

★ 83 · Python · updated 2025-08-14 · artificial-intelligence, chatgpt, podcast

neonwatty/bleep-that-shit

Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.

★ 63 · TypeScript · updated 2026-04-16 · ffmpeg, podcast, profanity-filter, speech-to-text, transformersjs

arkq/svar

SVAR - Simple Voice Activated Recorder

★ 63 · C · updated 2025-12-01 · alsa, audio-recorder, mp3, ogg-opus, ogg-vorbis

AIFSH/ComfyUI-WhisperX

A ComfyUI custom node for audio subtitling based on WhisperX and translators

★ 62 · Python · updated 2025-04-01 · srt-subtitles, sutitles, translation, whisper

ServiceNow/AU-Harness

A comprehensive framework to test audio comprehension of Large Audio Language Models.

★ 61 · Python · updated 2026-04-19

mikeesto/gemini-transcribe

Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash

★ 61 · Svelte · updated 2026-04-07 · gemini, gemini-flash, speaker-diarization, speech-to-text, sveltekit

agituts/gemini-2-tts

AI-Powered Podcast Generator: A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI API. This project leverages advanced text-to-speech technology to create dynamic, multi-speaker conversations with customizable voices.

★ 55 · Python · updated 2024-12-16 · gemini-2-0-flash-exp, generative-language-api, google-ai, googleapis

Eyevinn/auto-subtitles

Automatically generate subtitles from an input audio or video file using OpenAI Whisper

★ 53 · TypeScript · updated 2026-03-17 · ffmpeg, openai, openai-whisper, subtitle-generator, subtitles

arcaputo3/mcp-server-whisper

An MCP Server for audio transcription using OpenAI

★ 52 · Python · updated 2025-10-16

lars-frogner/OpenBabyMonitor

A user-friendly Raspberry Pi baby monitor with cry detection and audio/video streaming.

★ 52 · PHP · updated 2026-02-03

TahaBakhtari/SubtitleGenerator

This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.

★ 50 · Python · updated 2025-03-30

Mateusz-Dera/ROCm-AI-Installer

Installation script for AI applications using ROCm on Linux.

★ 45 · Shell · updated 2026-04-18 · 3d, ai, amd, amdgpu, audio

raveenb/fal-mcp-server

MCP server for Fal.ai - Generate images, videos, music and audio with Claude

★ 43 · Python · updated 2026-03-30 · ai-tools, claude, fal-ai, image-generation, llm

sweisgerber/docker-snapcast

Snapcast Multiroom audio docker image

★ 43 · Dockerfile · updated 2026-03-28 · docker, docker-image, linuxserver, mopidy, multiroom

weak-head/m4b-maker

A set of bash scripts to convert audio files into M4B audiobooks with chapter markers, customizable bitrate, book metadata and embedded cover art.

★ 42 · Shell · updated 2026-04-03 · audiobook, bash, m4b, m4b-book, m4b-tool

MrLaki5/Data-over-sound

Android application for data transfer, using sound waves

★ 32 · Java · updated 2025-04-19 · android-development, data-transfer, frequency-modulation, java, sound

mauriciovander/silence-removal

Removes silence segments from wav audio files

★ 30 · Python · updated 2020-02-29

sinanuozdemir/oreilly-multimodal-ai

Learn how multimodal AI merges text, image, and audio for smarter models

★ 30 · Jupyter Notebook · updated 2025-01-21 · dalle-3, deepgram, diffusion, dreambooth, generative-ai

sushant-t/tts-trainer

Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using Whisper.

★ 30 · Python · updated 2023-05-27

guardian/language-system

The Multi-Language Automatic Translation, Subtitling, and Voice Rendering System uses third party software to automatically convert audio to text, translate text, render text to video, and render text to audio.

★ 29 · PHP · updated 2024-07-29 · audio, language, php, speech, srt

ErcinDedeoglu/WhisperDock

Dockerized Whisper C++ speech-to-text API for easy deployment and rapid integration. Offering the latest stable and nightly builds for efficient audio transcription.

★ 28 · C++ · updated 2026-02-28 · api, audio-transcription, docker, machine-learning, speech-to-text

V0v1kkk/WhisperVoiceInput

A cross-platform desktop application that records audio and transcribes it to text using OpenAI's Whisper API or compatible services. Perfect for dictation, note-taking, and accessibility.

★ 27 · C# · updated 2026-03-21

chuck1z/AudioCleaner

Audio Cleaner using DeepFilterNet, hosted through Streamlit

★ 27 · Python · updated 2025-05-04 · audio-processing, noise-reduction

louis030195/awesome-context-ai

A curated list of tools for building AI with rich context from screen recordings, audio, and personal data

★ 24 · updated 2026-01-25

MatiousCorp/claude-tts

Text-to-speech plugin for Claude Code — multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly, Azure, Kitten, local system TTS) on macOS, Linux, and Windows

★ 24 · Shell · updated 2026-04-15 · accessibility, audio, claude-code, claude-code-plugin, elevenlabs

tubsn/gpt-buddy

Prompt Management System for Interaction with the ChatGPT API

★ 23 · JavaScript · updated 2026-04-07 · ai, audio-transcribing, image-generation, prompt-database, prompts

cxyfer/GeminiASR

A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.

★ 19 · Python · updated 2026-01-02 · asr, gemini, gemini-api, transcribe

iprd-org/iprd

International Public Radio Directory, bringing diversity into audio. Public listing of internet radios from all around the world.

★ 18 · Python · updated 2026-04-27

BigUncle/Fast-Whisper-MCP-Server

A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.

★ 17 · Python · updated 2025-03-22

schnoddelbotz/whisper-ui

Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.

★ 17 · Go · updated 2026-01-08 · Topics: ffmpeg, ffmpeg-wrapper, fyne, gui, local

gsu-library/whisper-scribe

An audio/video transcriber with diarization and transcription editing.

★ 10 · JavaScript · updated 2026-03-17

Ichigo3766/audio-transcriber-mcp

An MCP server that provides audio transcription capabilities using OpenAI's Whisper API.

★ 9 · JavaScript · updated 2025-03-25

openresearchtools/transcribeoffline

Transcribe Offline by openresearchtools.com is an open-source desktop application that transcribes audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing.

★ 9 · Rust · updated 2026-03-21 · Topics: ai, localai, macos, open-source, transcribe

imAbdelhadi/audio2srt

Convert audio files (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm) to SRT subtitles with OpenAI Whisper. Easy script for fast, accurate transcription.

★ 9 · Python · updated 2024-06-11

NHLOCAL/shir-bot

A website for searching and downloading songs.

★ 8 · Python · updated 2026-04-19 · Topics: downloader, israel, jewish, music, singles

RemiFabre/voice2clipboard

One-key voice-to-transcription tool: record speech, transcribe locally with Whisper, then paste. Never lose your audio files again!

★ 8 · Python · updated 2026-03-23 · Topics: chatgpt, linux, llm, ollama, open-source

OpenBMB/OmniEvalKit

OmniEvalKit is an evaluation framework designed for omni-modal large language models, with a focus on audio and audio-visual understanding. Based on OmniEvalKit, you can quickly reproduce benchmarks, implement your own models or datasets, and conduct fair comparisons with other open-source models. MiniCPM-o is evaluated using this framework.

★ 7 · Python · updated 2026-03-27

KuofengGao/ADU-Bench

[ACL 2025] Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

★ 7 · Python · updated 2025-05-29

thewh1teagle/whisper-heb-ipa

Fine-tuned Whisper model that transcribes Hebrew audio into IPA.

★ 7 · Python · updated 2026-04-08 · Topics: g2p, hebrew, ipa, whisper

shinglyu/WhisperNow

A voice transcription tool using faster-whisper that records audio and converts speech to text on Linux systems.

★ 7 · Python · updated 2025-02-20

thunderpoot/audio-censor

Rudimentary program for speech transcription, manipulation, and redaction.

★ 5 · Python · updated 2024-07-17 · Topics: audio, censor, censorship, pydub, redaction

hoomanick/AudioWrite

AudioWrite: Effortless voice dictation powered by Google's Gemini API. Record, transcribe, and transform rambling audio into polished, multi-language notes. PWA ready.

★ 5 · TypeScript · updated 2026-01-05 · Topics: ai, audio-recorder, dictation, frontend, gemini-api

broomva/ltx-video

LTX-2.3 video generation skill — setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model

★ 4 · Python · updated 2026-03-27

lliWcWill/maVoice-Linux

🎙️ Lightning-fast voice dictation Desktop Web App powered by Groq's Whisper Turbo - Open-source, privacy-first, with real-time audio visualization and intuitive click controls

★ 4 · Rust · updated 2026-03-14 · Topics: desktop-app, desktop-web-based-app, groq, linux, real-time

SarwadnyaMahajan/WhisperVoice

WhisperVoice: Covert voice notes. Encrypts text and hides it in LLM-generated acrostic sentences; Murf.ai creates natural audio. A browser extension decrypts with a passcode, revealing the hidden message or playing a decoy for unauthorized listeners. Uses LLM, Murf.ai, and STT APIs.

★ 4 · JavaScript · updated 2025-06-29 · Topics: murf-ai, murf-ai-hackathon

streamshuttle/docker-compose

Modern NVR with object/motion/audio detection, push notifications, multi-location, and encrypted local and cloud-based storage support built in.

★ 4 · updated 2024-10-06 · Topics: ai, camera, home-assistant, home-automation, ip-camera

xDarkzx/Reaper-MCP

AI-powered music production in REAPER via the Model Context Protocol — 163 tools for composition, MIDI, FX, mixing, and mastering.

★ 3 · Python · updated 2026-04-17 · Topics: anthropic, audio, claude, composition, daw

CGAlei/FasterWhisper

Real-time desktop audio transcription using OpenAI Whisper for Arch Linux with CUDA acceleration

★ 3 · Python · updated 2025-08-05

97k/mcp-audio-server

A powerful audio transcription server that seamlessly transcribes meeting recordings, generates notes, and intelligently splits audio files for efficient management. Open-source and built with FastMCP and Groq/OpenAI Whisper

★ 3 · Python · updated 2025-06-13

pmerwin/audio-transcription-mcp

MCP server for real-time audio transcription using OpenAI Whisper

★ 3 · TypeScript · updated 2025-10-08

SantiagoSotoC/snapcast

Synchronous multiroom audio player

★ 2 · C++ · updated 2025-04-08

Anewryzm/transcript-generator-mcp-server

A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.

★ 2 · Python · updated 2025-06-10

Binyameensn/AI-Powered-Infant-Cry-Detector

A deep learning application that classifies the reason for a baby's cry (hunger, pain, etc.) from live or uploaded audio. Built with a TensorFlow/Keras CNN, Librosa for audio processing, and a responsive Flask web UI with real-time recording and visualization. Helps caregivers understand an infant's needs instantly.

★ 2 · updated 2025-08-01

SantiagoSotoC/python-snapcast

Python API for controlling Snapcast, a multi-room synchronous audio solution.

★ 1 · Python · updated 2025-05-19

veralvx/xtts-finetune

XTTS fine-tuning via CLI

★ 1 · Python · updated 2025-10-16 · Topics: ai, ai-training, audio, audio-processing, coqui

samihalawa/insanely-fast-whisper-mcp

Blazingly fast audio transcription MCP server using Whisper with Flash Attention 2

★ 1 · Python · updated 2025-12-04

zeglicz/subtitles-generator

App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.

★ 1 · Python · updated 2025-05-26 · Topics: openai-api, streamlit