Video & Media

232 repos

yt-dlp/yt-dlp

A feature-rich command-line audio/video downloader

★ 158,626 · Python · updated 2026-04-19 · cli, downloader, python, sponsorblock, youtube-dl

Genymobile/scrcpy

Display and control your Android device

★ 139,017 · C · updated 2026-04-19 · android, c, ffmpeg, libav, mirroring

immich-app/immich

High performance self-hosted photo and video management solution.

★ 98,632 · TypeScript · updated 2026-04-20 · backup-tool, flutter, google-photos, google-photos-alternative, javascript

hacksider/Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

★ 92,306 · Python · updated 2026-04-19 · ai, ai-deep-fake, ai-face, ai-webcam, artificial-intelligence

jellyfin/jellyfin

The Free Software Media System - Server Backend & API

★ 50,847 · C# · updated 2026-04-27 · csharp, dotnet, hacktoberfest, jellyfin

ruvnet/RuView

π RuView: WiFi DensePose turns commodity WiFi signals into real-time human pose estimation, vital sign monitoring, and presence detection — all without a single pixel of video.

★ 50,293 · Rust · updated 2026-04-20 · agentic-ai, densepose, esp32, firmware, mcu

huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

★ 49,184 · Ruby · updated 2026-04-19 · agent, automation, feed, feedgenerator, huginn

mudler/LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

★ 45,835 · Go · updated 2026-04-20 · agents, ai, api, audio-generation, decentralized

remotion-dev/remotion

🎥 Make videos programmatically with React

★ 45,095 · TypeScript · updated 2026-04-27 · javascript, react, video

videojs/video.js

Video.js - open source HTML5 video player

★ 39,713 · JavaScript · updated 2026-03-11 · dash, hls, html, html5, html5-audio

xinntao/Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

★ 35,193 · Python · updated 2024-08-06 · amine, denoise, esrgan, image-restoration, jpeg-compression

huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

★ 33,452 · Python · updated 2026-04-18 · deep-learning, diffusion, flux, image-generation, image2image

ossrs/srs

SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181, with codec support for H.264, H.265, AV1, VP9, AAC, Opus, and G.711.

★ 28,769 · C++ · updated 2026-04-19 · audio, c, c-plus-plus, dash, hevc

ArchiveBox/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

★ 27,292 · Python · updated 2026-04-19 · archivebox, backups, bookmark-archiver, browser-bookmarks, chromium

invoke-ai/InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.

★ 27,049 · TypeScript · updated 2026-04-20 · ai-art, artificial-intelligence, generative-art, image-generation, img2img

enescingoz/awesome-n8n-templates

280+ free n8n automation templates — ready-to-use workflows for Gmail, Telegram, Slack, Discord, WhatsApp, Google Drive, Notion, OpenAI, and more. AI agents, RAG chatbots, email automation, social media, DevOps, and document processing. The largest open-source n8n template collection.

★ 21,599 · updated 2026-04-09 · ai-agents, ai-automation, automation, automation-templates, awesome

xbmc/xbmc

Kodi is an award-winning free and open source home theater/media center software and entertainment hub for digital media. With its beautiful interface and powerful skinning engine, it's available for Android, BSD, Linux, macOS, iOS, tvOS and Windows.

★ 20,703 · C++ · updated 2026-04-24 · android, c-plus-plus, entertainment-hub, hacktoberfest, home-theater

navidrome/navidrome

🎧 Your Personal Streaming Service

★ 20,701 · Go · updated 2026-04-20 · airsonic, madsonic, media-server, music, music-server

k4yt3x/video2x

A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.

★ 19,670 · C++ · updated 2026-03-07 · anime4k, frame-interpolation, machine-learning, neural-networks, realcugan

bluenviron/mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.

★ 18,581 · Go · updated 2026-04-19 · go, golang, hls, media-server, obs-studio

livekit/livekit

End-to-end realtime stack for connecting humans and AI

★ 18,378 · Go · updated 2026-04-20 · golang, media-server, sfu, video, voice

alyssaxuu/screenity

The free and privacy-friendly screen recorder with no limits 🎥

★ 18,138 · JavaScript · updated 2026-04-08 · annotation, annotation-tool, audio, camera, chrome-extension

AaronFeng753/Waifu2x-Extension-GUI

Video, image, and GIF upscaling/enlarging (super-resolution) and video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution (VSR), SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.

★ 16,459 · C++ · updated 2026-04-19 · anime, anime4k, esrgan, frame-interpolation, image-enlarger

Wan-Video/Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

★ 15,911 · Python · updated 2026-03-05 · aigc, videogeneration

Tyrrrz/YoutubeDownloader

Downloads videos and playlists from YouTube

★ 14,803 · C# · updated 2026-04-19 · download, downloader, ffmpeg, mp3, mp4

HumanAIGC/AnimateAnyone

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation

★ 14,782 · updated 2025-09-20

oxnr/awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

★ 14,366 · updated 2026-02-05 · awesome, awesome-list, bigdata, data, data-analytics

owainlewis/awesome-artificial-intelligence

A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.

★ 13,518 · updated 2025-08-12 · ai, artificial-intelligence, deep-learning, intelligent-machines, intelligent-systems

AlexxIT/go2rtc

Ultimate camera streaming application

★ 12,882 · Go · updated 2026-03-23 · ffmpeg, go, golang, hassio, hls

zai-org/CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

★ 12,679 · Python · updated 2025-11-04 · cogvideox, image-to-video, llm, sora, text-to-video

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

★ 12,594 · Python · updated 2026-04-15 · asr, code-switch, conformer, kws, punctuation-restoration

streamlink/streamlink

Streamlink is a CLI utility which pipes video streams from various services into a video player

★ 11,433 · Python · updated 2026-04-26 · cli, livestream, python, streaming, streaming-services

QwenLM/Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.

★ 10,962 · Python · updated 2026-03-17

heygen-com/hyperframes

Write HTML. Render video. Built for agents.

★ 10,875 · TypeScript · updated 2026-04-20 · ai, animation, ffmpeg, framework, gsap

perspective-dev/perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

★ 10,457 · C++ · updated 2026-04-23 · analytics, bi, data-visualization, javascript, jupyter

TEN-framework/ten-framework

Open-source framework for conversational voice AI agents

★ 10,447 · Python · updated 2026-04-14 · ai, multi-modal, real-time, video, voice

livekit/agents

A framework for building realtime voice AI agents 🤖🎙️📹

★ 10,222 · Python · updated 2026-04-20 · agents, ai, openai, real-time, video

cookpete/react-player

A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion

★ 10,218 · TypeScript · updated 2025-11-13 · audio, dailymotion, dash, facebook, hls

leeoniya/uPlot

📈 A small, fast chart for time series, lines, areas, ohlc & bars

★ 10,102 · JavaScript · updated 2026-04-22 · analytics, chart, charts, data-visualization, graph

Lightricks/LTX-Video

Official repository for LTX-Video

★ 10,102 · Python · updated 2026-01-05 · diffusion-models, dit, image-to-video, image-to-video-generation, text-to-video

pikvm/pikvm

Open and inexpensive DIY IP-KVM based on Raspberry Pi

★ 9,965 · updated 2026-04-06 · atx, hardware, hdmi, ip-kvm, ipkvm

H-M-H/Weylus

Use your tablet as graphic tablet/touch screen on your computer.

★ 9,120 · Rust · updated 2026-04-14 · android, android-application, app, browser, ffmpeg

YaoFANGUK/video-subtitle-extractor

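Extracts hard-coded subtitles from videos and generates SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files.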

★ 8,695 · Python · updated 2026-04-09 · deep-learning, extract, hardsub, ocr, ripper

Anil-matcha/Open-Generative-AI

Uncensored, open-source alternative to Higgsfield AI, Freepik AI, Krea AI, Openart AI — Free, unrestricted AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.

★ 8,455 · JavaScript · updated 2026-04-23 · ai-art-generator, ai-image-generation, ai-video-generation, creative-tools, flux-1

mediaelement/mediaelement

HTML5 <audio> or <video> player with support for MP4, WebM, and MP3 as well as HLS, Dash, YouTube, Facebook, SoundCloud and others with a common HTML5 MediaElement API, enabling a consistent UI in all browsers.

★ 8,298 · JavaScript · updated 2025-11-12 · dash, flash, hls, html5, html5-audio

nadermx/backgroundremover

Background Remover lets you Remove Background from images and video using AI with a simple command line interface that is free and open source.

★ 7,852 · Python · updated 2026-03-21 · ai, background-removal, background-remover, backgroundremover, photo-editing

GetStream/Vision-Agents

Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

★ 7,688 · Python · updated 2026-04-17 · agentic-ai, agents, ai, ai-agents, realtime

snapcast/snapcast

Synchronous multiroom audio player

★ 7,595 · C++ · updated 2026-03-10 · audio, audio-player, audio-streaming, lms, multiroom-audio

clappr/clappr

An extensible, plugin-oriented, HTML5-first media player for the web

★ 7,451 · JavaScript · updated 2026-04-20 · clappr, dash, hls, html5-audio, html5-video

vladmandic/sdnext

SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing

★ 7,075 · Python · updated 2026-04-20 · ai-art, caption, diffusers, generative-art, python

enricoros/big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

★ 6,935 · TypeScript · updated 2026-04-20 · agi, ai-agents, ai-suite, ai-workspace, anthropic-api

4gray/iptvnator

📺 Cross-platform IPTV player application with multiple features, such as support of m3u and m3u8 playlists, favorites, TV guide, TV archive/catchup and more.

★ 5,899 · TypeScript · updated 2026-04-19 · chromeos, electron, epg, fair-source, iptv

promptslab/Awesome-Prompt-Engineering

This repository contains hand-curated resources for Prompt Engineering, with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.

★ 5,820 · TypeScript · updated 2026-04-20 · chatgpt, chatgpt-api, deep-learning, few-shot-learning, gpt

alpkeskin/mosint

An automated e-mail OSINT tool

★ 5,794 · Go · updated 2024-02-02 · automation, data-breach, email, email-checker, go

OpenShot/openshot-qt

OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.

★ 5,695 · Python · updated 2026-04-18 · c-plus-plus, ffmpeg, gplv3, openshot, python

NickvisionApps/Parabolic

Download web video and audio

★ 5,616 · C# · updated 2026-04-20 · csharp, downloader, flathub, gnome, gtk4

showlab/Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

★ 5,610 · updated 2026-04-03 · awesome, diffusion-models, motion-customization, video-editing, video-generation

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

★ 5,437 · Rust · updated 2026-04-20 · ai-engineering, ai-pipeline, arrow, artificial-intelligence, big-data

open-mmlab/mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

★ 5,002 · Python · updated 2026-03-18 · action-recognition, ava, benchmark, deep-learning, i3d

datarhei/restreamer

The Restreamer is a complete streaming server solution for self-hosting. It has a visually appealing user interface and no ongoing license costs. Upload your live stream to YouTube, Twitch, Facebook, Vimeo, or other streaming solutions like Wowza. Receive video data from OBS and publish it with the RTMP and SRT server.

★ 4,995 · HTML · updated 2025-12-29 · ffmpeg, ffmpeg-api, ffmpeg-server, h264, hls

KDE/kdenlive

Free and open source video editor, based on MLT Framework and KDE Frameworks

★ 4,948 · C++ · updated 2026-04-20

microsoft/VoTT

Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.

★ 4,430 · TypeScript · updated 2021-12-06 · annotation-tool, cntk, deep-learning, detection, detection-model

HumanSignal/awesome-data-labeling

A curated list of awesome data labeling tools

★ 4,314 · updated 2024-06-17 · 3d-annotation, annotation, annotation-tool, audio-annotation, audio-annotation-tool

WyattBlue/auto-editor

Effort free video editing!

★ 4,206 · Nim · updated 2026-04-17 · audio, audio-editing, audio-processing, automatic, nim

Clooos/Bubble-Card

Bubble Card is a minimalist card collection for Home Assistant with a nice pop-up touch.

★ 4,141 · JavaScript · updated 2026-04-16 · button, card, cards, custom-card, custom-cards

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

★ 4,063 · Python · updated 2026-04-20 · agi, audio-evaluation, benchmark, evaluation, large-language-models

QwenLM/Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

★ 3,981 · Jupyter Notebook · updated 2025-06-12

AzuraCast/AzuraCast

A self-hosted web radio management suite, including turnkey installer tools for the full radio software stack and a modern, easy-to-use web app to manage your stations.

★ 3,824 · PHP · updated 2026-04-17 · icecast, liquidsoap, radio, radio-station, shoutcast

QwenLM/Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

★ 3,705 · Jupyter Notebook · updated 2026-01-08

umlx5h/LLPlayer

The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!

★ 3,657 · C# · updated 2026-04-19 · asr, csharp, flyleaf, language-learning, llm

ufal/whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

★ 3,603 · Python · updated 2025-11-12

facebookresearch/pytorchvideo

A deep learning library for video understanding research.

★ 3,554 · Python · updated 2026-01-12

Lightricks/ComfyUI-LTXVideo

LTX-Video Support for ComfyUI

★ 3,517 · Python · updated 2026-04-13 · comfyui, diffusion-models, dit, image-to-video, image-to-video-generation

vidstack/player

UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.

★ 3,485 · TypeScript · updated 2026-04-19 · accessibility, analytics, audio, hls, html

SeaDve/Kooha

Elegantly record your screen

★ 3,344 · Rust · updated 2026-04-20 · gnome, gstreamer, gtk-rs, gtk4, gtk4-rs

calesthio/OpenMontage

World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.

★ 3,247 · Python · updated 2026-04-24 · agent, agentic-ai, ai, claude, copilot

yunlong10/Awesome-LLMs-for-Video-Understanding

🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.

★ 3,162 · updated 2026-03-28

SamurAIGPT/Generative-Media-Skills

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

★ 3,099 · Shell · updated 2026-04-13 · agent-tools, ai-agents, ai-art, ai-music, ai-video

Djdefrag/QualityScaler

QualityScaler - image/video AI upscaler app

★ 3,036 · Python · updated 2026-04-05 · amd, anime, compression-artifact-reduction, deep-learning, directx-12

roflcoopter/viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

★ 3,011 · Python · updated 2026-04-17 · coral, cuda, darknet, edgetpu, face-recognition

ErlichLiu/DeepClaude

Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible. 🌊 Streaming & Non-Streaming Support. ✨ Experience the Future of AI – Today! Click to Try Now! ✨

★ 2,790 · Python · updated 2026-02-23 · ai, claude-3-7-sonnet, deepseek, gemini

drewnoakes/metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files

★ 2,776 · Java · updated 2026-04-20 · eps, exif, icc, iptc, java

datachain-ai/datachain

Data context layer for unstructured data - images, video, sensor data, text and PDFs

★ 2,737 · Python · updated 2026-04-20 · claude-code, codex, data-context-layer, data-processing, harness

HKUDS/ViMax

"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"

★ 2,711 · Python · updated 2026-03-29 · agentic-aigc, video-generation

wendy7756/AI-Video-Transcriber

Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.

★ 2,548 · Python · updated 2026-03-07 · aitool, tiktok, transcribe, videototext, youtube

bytewax/awesome-public-real-time-datasets

A list of publicly available datasets with real-time data maintained by the team at bytewax.io

★ 2,431 · updated 2026-04-13 · awesome-list, data, data-science, data-visualization, datasets

szTheory/exifcleaner

Cross-platform desktop GUI app to clean image metadata

★ 2,419 · Perl · updated 2026-04-03 · concurrency, dark-mode, desktop-app, electron, exif

numz/ComfyUI-SeedVR2_VideoUpscaler

Official SeedVR2 Video Upscaler for ComfyUI

★ 2,380 · Python · updated 2025-12-24 · ai, comfyui, comfyui-nodes, upscaler, video-processing

fal-ai-community/video-starter-kit

Enable AI models for video production in the browser

★ 2,345 · TypeScript · updated 2025-06-12 · ai, media, video

google-research-datasets/Objectron

Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes

★ 2,327 · Jupyter Notebook · updated 2026-03-06 · 3d, 3d-reconstruction, 3d-vision, ai, augmented-reality

m1guelpf/auto-subtitle

Automatically generate and overlay subtitles for any video.

★ 2,203 · Python · updated 2024-07-12 · ffmpeg, openai-whisper, subtitle-generator, subtitles, subtitles-generator

leaperone/MultiPost-Extension

A browser extension that helps users publish content to multiple social media platforms with one click.

★ 2,183 · TypeScript · updated 2026-03-03 · article, automation, content-platform, marketing, marketing-automation

Alfredredbird/tookie-osint

Tookie is an advanced OSINT information-gathering tool that finds social media accounts based on inputs.

★ 2,148 · Python · updated 2026-04-09 · cyber-security, cybersecurity, hacking-tool, hacking-tools, information-gathering

TNTwise/REAL-Video-Enhancer

Interpolate, Upscale, Decompress, and Denoise videos easily on Linux/Windows/MacOS.

★ 1,911 · Python · updated 2026-04-19 · gui, interpolation, linux, macos, real-esrgan

vzhd1701/gridplayer

Play videos side-by-side

★ 1,907 · Python · updated 2026-01-13 · libvlc, livestream, player, player-video, pyqt

Tencent-Hunyuan/HunyuanVideo-I2V

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

★ 1,810 · Python · updated 2026-04-07 · diffusion-models, image-to-video, image-to-video-generation, videogeneration

mltframework/mlt

MLT Multimedia Framework

★ 1,765 · C · updated 2026-04-20 · audio, audio-processing, c, c-plus-plus, ffmpeg

Daisy-Zhang/Awesome-Deepfakes-Detection

A list of tools, papers and code related to Deepfake Detection.

★ 1,763 · updated 2025-09-02 · awesome, code, dataset, deepfake-detection, deepfakes

szczyglis-dev/py-gpt

Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, speech synthesis and recognition, web search, memory, presets, assistants, and more. Linux, Windows, Mac.

★ 1,749 · Python · updated 2026-02-06 · ai, ai-assistant, artificial-intelligence, autonomous-agent, chatbot

kalkih/mini-media-player

Minimalistic media card for Home Assistant Lovelace UI

★ 1,696 · TypeScript · updated 2026-03-06 · automation, custom, hacktoberfest, hassio, home-assistant

LMS-Community/slimserver

Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.

★ 1,688 · Perl · updated 2026-04-16 · logitech-media-server, lyrion, lyrion-music-server, music, perl

scottlamb/moonfire-nvr

Moonfire NVR, a security camera network video recorder

★ 1,688 · Rust · updated 2026-04-07 · camera, ip-camera, javascript, nvr, rtsp

jrottenberg/ffmpeg

Docker build for FFmpeg on Ubuntu / Alpine / Centos / Scratch / nvidia / vaapi

★ 1,627 · Python · updated 2026-04-12 · alpine, centos, docker, ffmpeg, nvidia

Kosinkadink/ComfyUI-VideoHelperSuite

Nodes related to video workflows

★ 1,603 · Python · updated 2026-04-14

lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

★ 1,591 · Python · updated 2025-01-01 · deep-learning, language-model, machine-learning, multi-modal-learning, natural-language-processing

MiniMax-AI/cli

Generate text, images, video, speech, and music by MiniMax.

★ 1,578 · TypeScript · updated 2026-04-20 · ai

music-assistant/server

Music Assistant is a free, open-source media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistant, and must run on an always-on device such as a Raspberry Pi, a NAS, an Intel NUC, or similar.

★ 1,520 · Python · updated 2026-04-20

akai-katto/dandere2x

Dandere2x - Fast Waifu2x Video Upscaling.

★ 1,507 · C++ · updated 2023-08-17 · compression, fast, faster, video, waifu2x

royshil/obs-localvocal

OBS plugin for local speech recognition and captioning using AI

★ 1,458 · C++ · updated 2026-04-09 · ai, live-streaming, livestream, obs, obs-studio

sczhou/Upscale-A-Video

[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

★ 1,457 · Python · updated 2024-09-27 · aigc-enhancement, deflicker, video-diffusion-model, video-super-resolution

MiniMax-AI/MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

★ 1,443 · Python · updated 2026-04-15 · image-generation, image-to-video, mcp, mcp-server, mcp-tools

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

★ 1,412 · updated 2026-04-20 · audio, audio-processing, audio-visual-understanding, bytedance, iclr2024

tin2tin/Pallaidium

PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.

★ 1,368 · Python · updated 2026-04-03 · ai, aicinema, blender, chatterbox, diffusion

video-db/Director

AI video agents framework for next-gen video interactions and workflows.

★ 1,367 · Python · updated 2026-01-23 · agent, agent-framework, ai-agents, framework, llm

pagedjs/pagedjs

Display paginated content in the browser and generate print books using web technology

★ 1,310 · HTML · updated 2026-04-23 · html, paged-media, pdf, polyfill, printing

zanllp/infinite-image-browsing

A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.

★ 1,291 · Vue · updated 2026-04-08 · audio, comfyui, extension, file-explorer, file-server

originalankur/awesome-django-admin

Curated List of Awesome Django Admin Panel Articles, Libraries/Packages, Books, Themes, Videos, Resources.

★ 1,236 · updated 2026-02-24 · article, awesome, awesome-list, django, django-admin

Francis-Rings/StableAvatar

We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a reference image and audio.

★ 1,232 · Python · updated 2026-01-20 · aigc, avatar-generator, video-generation

sugyan/claude-code-webui

Web-based interface for Claude CLI with streaming chat responses

★ 1,053 · TypeScript · updated 2025-11-03 · claude, claude-cli, web-ui

kerberos-io/agent

An open and scalable video surveillance system for anyone making this world a better and more peaceful place.

★ 1,019 · Go · updated 2026-04-13 · docker, golang, ipcamera, motiondetection, motiondetector

chigwell/telegram-mcp

Telegram MCP server powered by Telethon to let MCP clients read chats, manage groups, and send/modify messages, media, contacts, and settings.

★ 1,004 · Python · updated 2026-04-12 · admin, api, chat-management, contacts, groups

AJaySi/ALwrity

ALwrity - AI Digital Marketing Platform. (WIP)

★ 1,000 · Python · updated 2026-04-20 · ai-content-generation, ai-content-marketing, ai-digital-marketing, ai-seo-tools, ai-social-media

ivandokov/phockup

Media sorting tool to organize photos and videos from your camera in folders by year, month and day.

★ 995 · Python · updated 2024-05-06 · camera, exiftool, organize-media-files, organize-photos, photobackup

octimot/StoryToolkitAI

An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models

★ 936 · Python · updated 2025-02-26 · ai, chatgpt, davinci-resolve, editing, film-editing

kardolus/chatgpt-cli

ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports OpenAI, Azure, Perplexity, LLaMA, and more, with features like streaming, interactive chat, prompt files, image/audio I/O, MCP tool calls, and an experimental agent mode for safe, multi-step automation.

★ 917 · Go · updated 2026-03-22 · agent, agentic-ai, azure, chatgpt, cli

HHK1/PryntTrimmerView

A set of tools to trim, crop and select frames inside a video

★ 908 · Swift · updated 2024-12-11 · crop, cropping, ios, swift, thumbnail

thumbsup/thumbsup

Generate static HTML photo / video galleries

★ 852 · JavaScript · updated 2026-02-28 · gallery, photography, photos, static-site-generator, static-website

TheStageAI/TheWhisper

Optimized Whisper models for streaming and on-device use

★ 829 · Python · updated 2026-04-09 · apple-silicon, coreml, mlx, nvidia-gpu, on-device-ai

ableplayer/ableplayer

Fully accessible cross-browser HTML5 media player.

★ 815 · JavaScript · updated 2026-04-20

mayeaux/generate-subtitles

Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads through yt-dlp integration

★ 809 · JavaScript · updated 2023-03-16 · expressjs, gpu, libretranslate, machine-learning, nodejs

nmatter1/smallville

Generative Agents for video games. Based on Generative Agents: Interactive Simulacra of Human Behavior

★ 760 · Java · updated 2023-10-11 · generative-agents

bencevans/node-sonos

🔈 Sonos Media Player Interface/Client

★ 719 · JavaScript · updated 2026-04-15 · home-automation, javascript, music, nodejs, sonos

ModelTC/LightCompress

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.

★ 711 · Python · updated 2026-04-01 · awq, benchmark, deepseek-v3, deployment, evaluation

infinilabs/coco-app

🥥 Coco AI App - Search, Connect, Collaborate, Personal AI Search and Assistant, all in one space.

★ 690 · TypeScript · updated 2026-04-17 · ai-search, ai-search-engine, assistant, cmd-k, deepseek

idefasoft/Emora-Project

Emora is an OSINT tool like Sherlock but with a GUI, which searches for accounts by username across social networks

★ 688 · C# · updated 2026-02-07 · analysis, automation, csharp, cybersecurity, discovery

Haervwe/open-webui-tools

Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. With a suite of over 15 specialized tools, function pipelines, and filters, this project supports academic research, agentic autonomy, multimodal creativity, workflows, and more

★ 681 · Python · updated 2026-04-20 · academic-research, ai-agents, ai-workstation, arxiv, comfyui

jonigl/mcp-client-for-ollama

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.

★ 665 · Python · updated 2026-04-15 · agentic-ai, ai, command-line-tool, generative-ai, linux

talesofai/comfyui-browser

An image/video/workflow browser and manager for ComfyUI.

★ 659 · Svelte · updated 2024-11-11 · comfyui, comfyui-browser, comfyui-manager, stable-diffusion, workflows

Agents365-ai/video-podcast-maker

AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.

★ 647 · Python · updated 2026-04-27 · agent-skills, ai-video, bilibili, claude-code, claude-code-skill

QmiAI/Qmedia

An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos. Allows full local deployment (web app, RAG server, LLM server). Supports multi-modal RAG content Q&A.

★ 619 · TypeScript · updated 2026-04-09 · content, content-search, rag, search, search-engine

bbc/react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress

★ 614 · JavaScript · updated 2024-02-12 · bbc-news-labs, kaldi, news-labs, react, stt

HKUDS/VideoAgent

"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"

★ 603 · Python · updated 2025-10-17 · agents, audio-editing, audio-understanding, llm-agents, notebooklm

bacnet-stack/bacnet-stack

BACnet Protocol Stack library provides a BACnet application layer, network layer and media access (MAC) layer communications services.

★ 553 · C · updated 2026-04-20 · avr, bacnet, bacnet-client, bacnet-ip, bacnet-library

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

★ 548 · HTML · updated 2025-04-04 · aigc, large-language-models, large-vision-language-models, llm, lvlm

zeenolife/ai-baby-monitor

Local Video-LLM powered AI Baby Monitor

★ 500 · Python · updated 2025-05-22 · baby-monitor, video-llm

SWHL/RapidVideOCR

🎦 Extract video hard subtitles and automatically generate corresponding srt files.

★ 498 · Python · updated 2025-09-11 · ocr, rapid-videocr, subtitle, video, videosubfinder

ralph-irving/squeezelite

Lightweight headless squeezebox player for Lyrion Media Server

★ 491 · C · updated 2026-04-20

hollowaykeanho/Upscaler

A consolidation of various compiled open-source AI image/video upscaling products into a working, CLI-friendly image and video upscaling program.

★ 481 · Shell · updated 2025-05-08 · amd64, amd64only, bsd-3-clause, cli, debian

receptron/mulmocast-cli

AI-powered podcast & video generator.

★ 451TypeScriptupdated 2026-04-20

URUWorks/TeroSubtitler

Tero Subtitler is an open source, cross-platform, and free subtitle editing software.

★ 443Pascalupdated 2026-04-18aiaudio-to-textblu-raycaptionseditor

sb2702/free-ai-video-upscaler

Source code for free AI video upscaler tool

★ 408TypeScriptupdated 2026-02-10upscalingvideo-enhancementwebcodecswebglwebgpu

poseljacob/agentic-video-editor

AI-powered video editor that turns raw footage and a creative brief into a polished ad using an ensemble of AI agents (Google Gemini + FFmpeg)

★ 387Pythonupdated 2026-04-14

raindrop313/ComfyUI-WanVideoStartEndFrames

Start and end frames video generation nodes based on the modified Kijai version Wan2.1 nodes

★ 383Pythonupdated 2025-03-22

WeChatCV/Wan-Alpha

[CVPR 2026 Highlight] High-Quality Text-to-Video Generation with Alpha Channel

★ 360Pythonupdated 2026-04-09

freddyaboulton/orpheus-cpp

Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)

★ 353Pythonupdated 2025-04-10

FrankDavis236869/spyder-osint

Spyder OSINT GUI — Graphical open-source intelligence research tool for phone number lookup, IP geolocation, social media reconnaissance, email validation, domain WHOIS, username search, with multi-module architecture and Tkinter-based interface for digital forensics

★ 341Pythonupdated 2026-03-13digital-forensicsdomain-lookupemail-osintinformation-gatheringip-lookup

dheereshag/coloured-icons

The coloured icons library is a collection of brand logos and tech stack logos. It's a handy resource to easily add brand icons to your projects without the hassle of manual attribution.

★ 314CSSupdated 2026-03-21brand-iconcoding-iconcoding-iconscolor-iconscolored-icons

music-assistant/mobile-app

The (official) Music Assistant Mobile app is a cross-platform client application designed for Android, iOS, and Java runtime environments. Developed using Kotlin Multiplatform (KMP) and Compose Multiplatform frameworks, this project aims to provide a unified codebase for seamless music management across multiple platforms.

★ 285Kotlinupdated 2026-04-15androidandroid-apphome-assistantiosios-app

vargHQ/sdk

AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.

★ 284TypeScriptupdated 2026-04-27ai-sdkai-videoclaude-codecursorelevenlabs

streamwall/streamwall

Display a mosaic of livestreams. Built for streaming.

★ 282TypeScriptupdated 2026-02-02facebookinstagramlivestreammultistreamperiscope

daya0576/nightwatcher

Web-based IP camera viewer for fast, simple streaming in any browser 🦇

★ 263Pythonupdated 2026-03-31

zhanghengdev/awesome-video-object-detection

This is a list of awesome articles about object detection from video.

★ 247updated 2019-07-01awesome-listcomputer-visiondeep-learningdeep-neural-networksobject-detection

diffusionstudio/agent

The agentic video editing framework

★ 244Pythonupdated 2025-02-10

BB31420/AI-Auto-Video-Generator

An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using OpenAI's GPT-3, creates images using OpenAI's DALL-E, adds voiceover using ElevenLabs API, and combines the elements into a video.

★ 216Pythonupdated 2024-09-17aiartificial-intelligencedall-eediting-videosopenai

aregrid/frame

Frame is an AI-powered, open-source vibe video editor, offering a professional video cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delivers a seamless vibe video editing experience.

★ 213updated 2025-05-06ai-video-agentcursorvideo-cupvideo-editingvideo-editor

yuvraj108c/4k-video-upscaler-colab

Upscale your videos up to 4K on free Google Colab using Real-ESRGAN

★ 207Jupyter Notebookupdated 2025-05-014krealesrganstable-diffusionsuper-resolutionupscaler

kantv-ai/kantv

Workbench for learning and practicing on-device AI technology in a real scenario with online TV on an Android phone, powered by ggml (llama.cpp, whisper.cpp...), FFmpeg, and opencv-mobile

★ 192C++updated 2025-06-12ggml-hexagonllamacpp-android-portonline-tvqualcomm-npustablediffusioncpp-android-port

botbahlul/PyAutoSRT

PySimpleGUI-based desktop app to auto-generate a subtitle file (using the free Google Speech Recognition API) and a translated subtitle file (using the unofficial online Google Translate API) for any video or audio file

★ 189Pythonupdated 2024-05-05auto-captionauto-subtitlecaptionsffmpeggoogle-translate

WomenDefiningAI/claudecode-writer

A Claude Code workspace that transforms ideas into multi-format content: research → long-form articles → platform-specific versions (LinkedIn, newsletter, social media, podcast Q&A)

★ 178updated 2025-09-06

ccallazans/ai-video-generator

Automate Creation of Story-Based Videos.

★ 173Goupdated 2024-09-22dockergenerative-aigolangllmollama

jamditis/claude-skills-journalism

Claude Code skills for journalism, media, and academia - verification, FOIA, data journalism, academic writing, and more

★ 170HTMLupdated 2026-04-22academic-writingclaudeclaude-codeclaude-skillsdata-journalism

WaveSpeedAI/wavespeed-desktop

A cross-platform desktop application for running AI models from WaveSpeedAI (https://wavespeed.ai), as well as many free local AI models including Z-Image.

★ 147TypeScriptupdated 2026-04-24aiai-image-generationai-image-generatorimagevideo

tgxn/lemmy-explorer

Instance and Community Explorer for Lemmy

★ 145TypeScriptupdated 2026-02-02fediverselemmylemmyversembinnodejs

vivoCameraResearch/Hyper-Motion

HyperMotion is a pose guided human image animation framework based on a large-scale video diffusion Transformer.

★ 140Pythonupdated 2026-03-10dithuman-video-animationhuman-video-generationmotion-generationpose-guided-text-to-image-generation

CIntellifusion/MultiWorld

Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models

★ 136Pythonupdated 2026-04-21action-conditioneddiffusion-modelsgame-generationinteractive-videomulti-agent

video-creator/ffmpeg-mcp

An MCP server built on the ffmpeg command line that makes it convenient to perform local video search, trimming, stitching, playback, clip, overlay, concat, and other functions through dialogue

★ 132Pythonupdated 2025-05-13

nomadkaraoke/karaoke-gen

Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive

★ 128HTMLupdated 2026-04-27karaokekaraoke-makerlyricsmusicvideo

kevinbadi/hyperedit

AI-powered video editor with FFMPEG, Remotion, & Obsidian Agents Baked in

★ 126TypeScriptupdated 2026-04-15

ShmuelRonen/ComfyUI-VideoUpscale_WithModel

A memory-efficient implementation for upscaling videos in ComfyUI using non-diffusion upscaling models. This custom node is designed to handle large video frame sequences without memory bottlenecks.

★ 121Pythonupdated 2025-09-18

GStreamer/gstreamer-vaapi

Hardware-accelerated video decoding, encoding and processing on Intel graphics through VA-API. This module has been merged into the main GStreamer repo for further development.

★ 117Cupdated 2020-04-10

bilalnawaz072/AI-Prompts-200-Ideas

Here are over 200 AI prompts that cover Blog Writing, Email Marketing, YouTube Ad Scripts, Facebook Ad, YouTube Video Ideas, Twitter Thread, Cold DM Ideas, Influencer Marketing, Copywriting, and Instagram Story.

★ 114updated 2023-02-08aibardbingchatgptchatgpt3

noophq/subtitle

Convert subtitles from one format to another format. Supported formats: STL EBU, TTML SMI, VTT, SRT

★ 109Javaupdated 2025-07-12video

autoshow/autoshow

End-to-end workflow to automatically generate show notes from audio/video transcripts

★ 94TypeScriptupdated 2026-02-25assembly-aichatgptclaudedeepgramgemini

blueOkiris/bgrm

Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.

★ 84Pythonupdated 2023-05-16background-removallinuxpythonv4l2loopbackvideo

Xingsandesu/CarrotAI

CarrotAI is a cutting-edge AI agent application that delivers real-time streaming chat via Server-Sent Events (SSE) with built-in Model Context Protocol (MCP) integration. It supports concurrent connections to multiple SSE MCP servers and provides user interfaces in English, Chinese, and Japanese.

★ 83Dartupdated 2025-05-10

voun7/VidSubX

A program for extracting hard-coded (burned-in) subtitles from a video and generating an external subtitle file.

★ 70Pythonupdated 2026-03-21chineseocrhardsubocrocr-pythonsrt

dweve-ai/hedl

Token-efficient data serialization for LLM/AI. 50% fewer tokens than JSON, 93% better value/token. Rust, schema validation, LSP.

★ 65Rustupdated 2026-04-20ai-mlclicsvdata-formatjson-alternative

neonwatty/bleep-that-shit

Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.

★ 63TypeScriptupdated 2026-04-16ffmpegpodcastprofanity-filterspeech-to-texttransformersjs

xychelsea/ffmpeg-docker

FFmpeg compiled inside an NVIDIA-enabled Docker Container

★ 62Dockerfileupdated 2025-12-02

mikeesto/gemini-transcribe

Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash

★ 61Svelteupdated 2026-04-07geminigemini-flashspeaker-diarizationspeech-to-textsveltekit

bitscorp-mcp/mcp-ffmpeg

★ 58TypeScriptupdated 2026-02-22

movieofthenight/streaming-availability-api

Streaming Availability API allows getting streaming availability information of movies and series; and querying the list of available shows on streaming services such as Netflix, Disney+, Apple TV, Max and Hulu across multiple countries!

★ 56Goupdated 2025-09-08amazon-prime-videoapiapi-clientapple-tvappletv

Eyevinn/auto-subtitles

Automatically generate subtitles from an input audio or video file using OpenAI Whisper

★ 53TypeScriptupdated 2026-03-17ffmpegopenaiopenai-whispersubtitle-generatorsubtitles

ChrisRoyse/clipcannon

World's First AI Video Editor and Voice Cloner

★ 52Pythonupdated 2026-04-11

lars-frogner/OpenBabyMonitor

A user-friendly Raspberry Pi baby monitor with cry detection and audio/video streaming.

★ 52PHPupdated 2026-02-03

TahaBakhtari/SubtitleGenerator

This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.

★ 50Pythonupdated 2025-03-30

ozdemir08/youtube-video-summarizer

★ 48TypeScriptupdated 2024-04-15

Mateusz-Dera/ROCm-AI-Installer

Installation script for AI applications using ROCm on Linux.

★ 45Shellupdated 2026-04-183daiamdamdgpuaudio

cloudinary-community/astro-cloudinary

🚀 High-performance image and video delivery and uploading at scale in Astro powered by Cloudinary.

★ 31TypeScriptupdated 2025-11-13astroastro-loadercloudinarycloudinary-sdk

camgraphe/MaxVideoAi

Compare and generate AI videos across Sora, Veo, Kling, Seedance & more.

★ 30TypeScriptupdated 2026-04-26ai-toolsai-video-generationai-video-generatorfal-aigenerative-ai

luoyuweidu1/podcastcut-skills

Claude Code Skills for podcast/video editing: transcription, content editing, rough/fine cut, final polish

★ 30HTMLupdated 2026-03-17

guardian/language-system

The Multi-Language Automatic Translation, Subtitling, and Voice Rendering System uses third party software to automatically convert audio to text, translate text, render text to video, and render text to audio.

★ 29PHPupdated 2024-07-29audiolanguagephpspeechsrt

marc-shade/world-intel-mcp

100+ tool MCP server for real-time global intelligence — markets, FX, bonds, earnings, SEC filings, conflict, military, cyber, climate, news, company enrichment, and 30+ domains. Live Leaflet dashboard with 20 map layers, SSE streaming, and AI situation briefs.

★ 23Pythonupdated 2026-04-06ai-toolsai-watchanthropicclaudecybersecurity

HackerLion123/subtitles_generator

Automatically create subtitles for any video using the Google Speech-to-Text cloud API.

★ 22Pythonupdated 2018-08-23

cxyfer/GeminiASR

A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.

★ 19Pythonupdated 2026-01-02asrgeminigemini-apitranscribe

AlbinTouma/Iran-War-Media

Iran War Media Monitor collects news articles covering the US-Israeli war on Iran and applies sentiment analysis to uncover who supports and opposes the war.

★ 17Pythonupdated 2026-03-24

InboraStudio/Subtitle-Generator-AI

Open AI Video Subtitle Generator Agent. Generate .srt subtitle files for any video: no length limit, 100% free, offline, and runs locally on your machine.

★ 17Pythonupdated 2025-10-20editinggui-applicationpython3subtitlessubtitles-generator

schnoddelbotz/whisper-ui

Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.

★ 17Goupdated 2026-01-08ffmpegffmpeg-wrapperfyneguilocal

dnhen/vidgrid

Play multiple live videos simultaneously in a grid

★ 15TypeScriptupdated 2025-12-14gridlivemultiplayerstream

Ichthyostega/Lumiera

The new emerging Non Linear Video Editor for Linux. Backup of Lumiera master repository

★ 15C++updated 2026-04-19

azkadev/whisper_flutter

Whisper Flutter Example Speech To Text Offline Android Linux Without Api Key Without FFMPEG

★ 10C++updated 2025-08-02aiazkadevdartflutterggml

gsu-library/whisper-scribe

An audio/video transcriber with diarization and transcription editing.

★ 10JavaScriptupdated 2026-03-17

GitJuhb/voice-typing-linux

Fast, accurate voice typing for Linux — IBus input method engine with streaming STT, Whisper refinement, and CUDA acceleration

★ 10Pythonupdated 2026-02-10accessibilityfaster-whisperlinuxnixosspeech-to-text

felores/cloudinary-mcp-server

MCP (Model Context Protocol) server for uploading media to Cloudinary using Claude Desktop

★ 10JavaScriptupdated 2025-03-13

owenguoo/LifeOS

a multi-modal MCP layer for real life — built on continuous video, semantic search and natural language video understanding.

★ 9Pythonupdated 2025-09-03

openresearchtools/transcribeoffline

Transcribe Offline by openresearchtools.com is an open source desktop application that allows you to transcribe audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing

★ 9Rustupdated 2026-03-21ailocalaimacosopen-sourcetranscribe

Raafat-Nagy/YOLO-Object-Detection-App

A modern FastAPI-based web app for real-time object detection using YOLO models, supporting image and video uploads, model selection, live streaming, and interactive UI.

★ 9Pythonupdated 2025-06-28ai-projectback-endcomputer-visionfastapifront-end

SoferAi/torah-dl

Library and tool for downloading media and content from Torah websites.

★ 7Pythonupdated 2026-04-17mediapythontorah

swimmingkiim/video-edit-tools

Deterministic video editing SDK for AI agents. Ships with MCP tools.

★ 5TypeScriptupdated 2026-03-15ai-agentclaudeeditingffmpegmcp

Geun-Oh/s3-mcp-server

⚙️ A Model Context Protocol (MCP) server for accessing Amazon S3 buckets. This server provides seamless integration with S3 storage through MCP, allowing efficient handling of large files including PDFs through streaming capabilities.

★ 5TypeScriptupdated 2025-07-06

idodov/MusicTracker

This Python script is designed to track music played on Home Assistant media players. It stores track information in a SQLite database and provides various statistics

★ 5Pythonupdated 2025-07-19

broomva/ltx-video

LTX-2.3 video generation skill — setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model

★ 4Pythonupdated 2026-03-27

VaishakhVipin/whispers-final

🗣️ Whispers Talk. Recall. Repeat. A blazing-fast voice journal that remembers everything you say — searchable with AI. ✨ What is Whispers? Whispers is a voice-first journaling app powered by: 🧠 <300ms Latency Streaming Transcription (AssemblyAI) 🔍 Algolia MCP for instant search of your thoughts

★ 4TypeScriptupdated 2025-07-28

mateusz-kow/auto-subs-legacy

An offline-first desktop app to automatically transcribe and edit video subtitles using OpenAI Whisper. Full control over text, timing, and advanced styling in a powerful, intuitive editor.

★ 4Pythonupdated 2025-08-16

Wiecek-K/local-dictation-assistant

A fully offline, high-performance, streaming speech-to-text tool for developers on Linux.

★ 3Pythonupdated 2025-10-24

wanfuse123/nvidia-ffmpeg

Installation for NVIDIA with CUDA and FFmpeg encoding on the video card, on Ubuntu 22.04 with a GeForce GTX 1050 Ti

★ 3Shellupdated 2022-08-19

Anewryzm/transcript-generator-mcp-server

A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.

★ 2Pythonupdated 2025-06-10

AnbudanAdithya/Text_Analytics_Iran_Israel_Cross_Layer_Analysis

This repository contains a multi-layer analysis of news articles, editorial opinions, and public comments about the ongoing Iran-Israel war. It synthesizes the dominant themes and perspectives of global media channels and examines where editors' opinions and public reactions to news articles converge or diverge.

★ 1Jupyter Notebookupdated 2026-04-08

BitBOY21/IsraTV-app-android-Israel-IPTV

Watch Israeli TV channels live on Android. A simple, fast, and modern app for Israeli IPTV with PiP support and favorites. Built with Jetpack Compose.

★ 1Kotlinupdated 2026-04-13androidexoplayeriptv-playerisrael-tvjetpack-compose

SoferAi/soferai-openapi

OpenAPI specification for the Sofer.Ai API

★ 1Shellupdated 2026-03-30mediaopenapitorah

zeglicz/subtitles-generator

App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.

★ 1Pythonupdated 2025-05-26openai-apistreamlit

ranjanjyoti152/Smart-NVR

A powerful Network Video Recorder (NVR) application that leverages GPU acceleration for real-time AI object detection, smart recording, and efficient video management. Built with Python, Flask, and YOLOv5, this application provides enterprise-grade surveillance capabilities with a user-friendly interface.

★ 1Pythonupdated 2026-04-08