Video & Media

232 repos
A feature-rich command-line audio/video downloader
★ 158,626Pythonupdated 2026-04-19clidownloaderpythonsponsorblockyoutube-dl
Display and control your Android device
★ 139,017Cupdated 2026-04-19androidcffmpeglibavmirroring
High performance self-hosted photo and video management solution.
★ 98,632TypeScriptupdated 2026-04-20backup-toolfluttergoogle-photosgoogle-photos-alternativejavascript
Real-time face swap and one-click video deepfake with only a single image
★ 92,306Pythonupdated 2026-04-19aiai-deep-fakeai-faceai-webcamartificial-intelligence
The Free Software Media System - Server Backend & API
★ 50,847C#updated 2026-04-27csharpdotnethacktoberfestjellyfin
π RuView: WiFi DensePose turns commodity WiFi signals into real-time human pose estimation, vital sign monitoring, and presence detection — all without a single pixel of video.
★ 50,293Rustupdated 2026-04-20agentic-aidenseposeesp32firmwaremcu
Create agents that monitor and act on your behalf. Your agents are standing by!
★ 49,184Rubyupdated 2026-04-19agentautomationfeedfeedgeneratorhuginn
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
★ 45,835Goupdated 2026-04-20agentsaiapiaudio-generationdecentralized
🎥 Make videos programmatically with React
★ 45,095TypeScriptupdated 2026-04-27javascriptreactvideo
Video.js - open source HTML5 video player
★ 39,713JavaScriptupdated 2026-03-11dashhlshtmlhtml5html5-audio
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
★ 35,193Pythonupdated 2024-08-06aminedenoiseesrganimage-restorationjpeg-compression
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
★ 33,452Pythonupdated 2026-04-18deep-learningdiffusionfluximage-generationimage2image
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181, with codec support for H.264, H.265, AV1, VP9, AAC, Opus, and G.711.
★ 28,769C++updated 2026-04-19audiocc-plus-plusdashhevc
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
★ 27,292Pythonupdated 2026-04-19archiveboxbackupsbookmark-archiverbrowser-bookmarkschromium
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
★ 27,049TypeScriptupdated 2026-04-20ai-artartificial-intelligencegenerative-artimage-generationimg2img
280+ free n8n automation templates — ready-to-use workflows for Gmail, Telegram, Slack, Discord, WhatsApp, Google Drive, Notion, OpenAI, and more. AI agents, RAG chatbots, email automation, social media, DevOps, and document processing. The largest open-source n8n template collection.
★ 21,599updated 2026-04-09ai-agentsai-automationautomationautomation-templatesawesome
Kodi is an award-winning free and open source home theater/media center software and entertainment hub for digital media. With its beautiful interface and powerful skinning engine, it's available for Android, BSD, Linux, macOS, iOS, tvOS and Windows.
★ 20,703C++updated 2026-04-24androidc-plus-plusentertainment-hubhacktoberfesthome-theater
🎧 Your Personal Streaming Service
★ 20,701Goupdated 2026-04-20airsonicmadsonicmedia-servermusicmusic-server
A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.
★ 19,670C++updated 2026-03-07anime4kframe-interpolationmachine-learningneural-networksrealcugan
Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.
★ 18,581Goupdated 2026-04-19gogolanghlsmedia-serverobs-studio
End-to-end realtime stack for connecting humans and AI
★ 18,378Goupdated 2026-04-20golangmedia-serversfuvideovoice
The free and privacy-friendly screen recorder with no limits 🎥
★ 18,138JavaScriptupdated 2026-04-08annotationannotation-toolaudiocamerachrome-extension
Video, Image and GIF upscale/enlarge (Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.
★ 16,459C++updated 2026-04-19animeanime4kesrganframe-interpolationimage-enlarger
Wan: Open and Advanced Large-Scale Video Generative Models
★ 15,911Pythonupdated 2026-03-05aigcvideogeneration
Downloads videos and playlists from YouTube
★ 14,803C#updated 2026-04-19downloaddownloaderffmpegmp3mp4
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
★ 14,782updated 2025-09-20
A curated list of awesome big data frameworks, resources and other awesomeness.
★ 14,366updated 2026-02-05awesomeawesome-listbigdatadatadata-analytics
A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.
★ 13,518updated 2025-08-12aiartificial-intelligencedeep-learningintelligent-machinesintelligent-systems
Ultimate camera streaming application
★ 12,882Goupdated 2026-03-23ffmpeggogolanghassiohls
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
★ 12,679Pythonupdated 2025-11-04cogvideoximage-to-videollmsoratext-to-video
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
★ 12,594Pythonupdated 2026-04-15asrcode-switchconformerkwspunctuation-restoration
Streamlink is a CLI utility which pipes video streams from various services into a video player
★ 11,433Pythonupdated 2026-04-26clilivestreampythonstreamingstreaming-services
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.
★ 10,962Pythonupdated 2026-03-17
Write HTML. Render video. Built for agents.
★ 10,875TypeScriptupdated 2026-04-20aianimationffmpegframeworkgsap
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
★ 10,457C++updated 2026-04-23analyticsbidata-visualizationjavascriptjupyter
Open-source framework for conversational voice AI agents
★ 10,447Pythonupdated 2026-04-14aimulti-modalreal-timevideovoice
A framework for building realtime voice AI agents 🤖🎙️📹
★ 10,222Pythonupdated 2026-04-20agentsaiopenaireal-timevideo
A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion
★ 10,218TypeScriptupdated 2025-11-13audiodailymotiondashfacebookhls
📈 A small, fast chart for time series, lines, areas, ohlc & bars
★ 10,102JavaScriptupdated 2026-04-22analyticschartchartsdata-visualizationgraph
Official repository for LTX-Video
★ 10,102Pythonupdated 2026-01-05diffusion-modelsditimage-to-videoimage-to-video-generationtext-to-video
Open and inexpensive DIY IP-KVM based on Raspberry Pi
★ 9,965updated 2026-04-06atxhardwarehdmiip-kvmipkvm
Use your tablet as graphic tablet/touch screen on your computer.
★ 9,120Rustupdated 2026-04-14androidandroid-applicationappbrowserffmpeg
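Extracts hard-coded subtitles from video and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.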
★ 8,695Pythonupdated 2026-04-09deep-learningextracthardsubocrripper
Uncensored, open-source alternative to Higgsfield AI, Freepik AI, Krea AI, Openart AI — Free, unrestricted AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.
★ 8,455JavaScriptupdated 2026-04-23ai-art-generatorai-image-generationai-video-generationcreative-toolsflux-1
HTML5 <audio> or <video> player with support for MP4, WebM, and MP3 as well as HLS, Dash, YouTube, Facebook, SoundCloud and others with a common HTML5 MediaElement API, enabling a consistent UI in all browsers.
★ 8,298JavaScriptupdated 2025-11-12dashflashhlshtml5html5-audio
Background Remover lets you Remove Background from images and video using AI with a simple command line interface that is free and open source.
★ 7,852Pythonupdated 2026-03-21aibackground-removalbackground-removerbackgroundremoverphoto-editing
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
★ 7,688Pythonupdated 2026-04-17agentic-aiagentsaiai-agentsrealtime
Synchronous multiroom audio player
★ 7,595C++updated 2026-03-10audioaudio-playeraudio-streaminglmsmultiroom-audio
An extensible, plugin-oriented, HTML5-first media player for the web
★ 7,451JavaScriptupdated 2026-04-20clapprdashhlshtml5-audiohtml5-video
SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing
★ 7,075Pythonupdated 2026-04-20ai-artcaptiondiffusersgenerative-artpython
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
★ 6,935TypeScriptupdated 2026-04-20agiai-agentsai-suiteai-workspaceanthropic-api
📺 Cross-platform IPTV player application with multiple features, such as support of m3u and m3u8 playlists, favorites, TV guide, TV archive/catchup and more.
★ 5,899TypeScriptupdated 2026-04-19chromeoselectronepgfair-sourceiptv
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
★ 5,820TypeScriptupdated 2026-04-20chatgptchatgpt-apideep-learningfew-shot-learninggpt
An automated e-mail OSINT tool
★ 5,794Goupdated 2024-02-02automationdata-breachemailemail-checkergo
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.
★ 5,695Pythonupdated 2026-04-18c-plus-plusffmpeggplv3openshotpython
Download web video and audio
★ 5,616C#updated 2026-04-20csharpdownloaderflathubgnomegtk4
A curated list of recent diffusion models for video generation, editing, and various other applications.
★ 5,610updated 2026-04-03awesomediffusion-modelsmotion-customizationvideo-editingvideo-generation
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
★ 5,437Rustupdated 2026-04-20ai-engineeringai-pipelinearrowartificial-intelligencebig-data
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
★ 5,002Pythonupdated 2026-03-18action-recognitionavabenchmarkdeep-learningi3d
The Restreamer is a complete streaming server solution for self-hosting. It has a visually appealing user interface and no ongoing license costs. Upload your live stream to YouTube, Twitch, Facebook, Vimeo, or other streaming solutions like Wowza. Receive video data from OBS and publish it with the RTMP and SRT server.
★ 4,995HTMLupdated 2025-12-29ffmpegffmpeg-apiffmpeg-serverh264hls
Free and open source video editor, based on MLT Framework and KDE Frameworks
★ 4,948C++updated 2026-04-20
Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.
★ 4,430TypeScriptupdated 2021-12-06annotation-toolcntkdeep-learningdetectiondetection-model
A curated list of awesome data labeling tools
★ 4,314updated 2024-06-173d-annotationannotationannotation-toolaudio-annotationaudio-annotation-tool
Effort free video editing!
★ 4,206Nimupdated 2026-04-17audioaudio-editingaudio-processingautomaticnim
Bubble Card is a minimalist card collection for Home Assistant with a nice pop-up touch.
★ 4,141JavaScriptupdated 2026-04-16buttoncardcardscustom-cardcustom-cards
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
★ 4,063Pythonupdated 2026-04-20agiaudio-evaluationbenchmarkevaluationlarge-language-models
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
★ 3,981Jupyter Notebookupdated 2025-06-12
A self-hosted web radio management suite, including turnkey installer tools for the full radio software stack and a modern, easy-to-use web app to manage your stations.
★ 3,824PHPupdated 2026-04-17icecastliquidsoapradioradio-stationshoutcast
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
★ 3,705Jupyter Notebookupdated 2026-01-08
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
★ 3,657C#updated 2026-04-19asrcsharpflyleaflanguage-learningllm
Whisper realtime streaming for long speech-to-text transcription and translation
★ 3,603Pythonupdated 2025-11-12
A deep learning library for video understanding research.
★ 3,554Pythonupdated 2026-01-12
LTX-Video Support for ComfyUI
★ 3,517Pythonupdated 2026-04-13comfyuidiffusion-modelsditimage-to-videoimage-to-video-generation
UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.
★ 3,485TypeScriptupdated 2026-04-19accessibilityanalyticsaudiohlshtml
Elegantly record your screen
★ 3,344Rustupdated 2026-04-20gnomegstreamergtk-rsgtk4gtk4-rs
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.
★ 3,247Pythonupdated 2026-04-24agentagentic-aiaiclaudecopilot
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
★ 3,162updated 2026-03-28
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
★ 3,099Shellupdated 2026-04-13agent-toolsai-agentsai-artai-musicai-video
QualityScaler - image/video AI upscaler app
★ 3,036Pythonupdated 2026-04-05amdanimecompression-artifact-reductiondeep-learningdirectx-12
Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
★ 3,011Pythonupdated 2026-04-17coralcudadarknetedgetpuface-recognition
Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible. 🌊 Streaming & Non-Streaming Support. ✨ Experience the Future of AI – Today! Click to Try Now! ✨
★ 2,790Pythonupdated 2026-02-23aiclaude-3-7-sonnetdeepseekgemini
Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
★ 2,776Javaupdated 2026-04-20epsexificciptcjava
Data context layer for unstructured data - images, video, sensor data, text and PDFs
★ 2,737Pythonupdated 2026-04-20claude-codecodexdata-context-layerdata-processingharness
ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)
★ 2,711Pythonupdated 2026-03-29agentic-aigcvideo-generation
Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.
★ 2,548Pythonupdated 2026-03-07aitooltiktoktranscribevideototextyoutube
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
★ 2,431updated 2026-04-13awesome-listdatadata-sciencedata-visualizationdatasets
Cross-platform desktop GUI app to clean image metadata
★ 2,419Perlupdated 2026-04-03concurrencydark-modedesktop-appelectronexif
Official SeedVR2 Video Upscaler for ComfyUI
★ 2,380Pythonupdated 2025-12-24aicomfyuicomfyui-nodesupscalervideo-processing
Enable AI models for video production in the browser
★ 2,345TypeScriptupdated 2025-06-12aimediavideo
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes
★ 2,327Jupyter Notebookupdated 2026-03-063d3d-reconstruction3d-visionaiaugmented-reality
Automatically generate and overlay subtitles for any video.
★ 2,203Pythonupdated 2024-07-12ffmpegopenai-whispersubtitle-generatorsubtitlessubtitles-generator
A browser extension that helps users publish content to multiple social media platforms with one click.
★ 2,183TypeScriptupdated 2026-03-03articleautomationcontent-platformmarketingmarketing-automation
Tookie is an advanced OSINT information gathering tool that finds social media accounts based on inputs.
★ 2,148Pythonupdated 2026-04-09cyber-securitycybersecurityhacking-toolhacking-toolsinformation-gathering
Interpolate, Upscale, Decompress, and Denoise videos easily on Linux/Windows/MacOS.
★ 1,911Pythonupdated 2026-04-19guiinterpolationlinuxmacosreal-esrgan
Play videos side-by-side
★ 1,907Pythonupdated 2026-01-13libvlclivestreamplayerplayer-videopyqt
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
★ 1,810Pythonupdated 2026-04-07diffusion-modelsimage-to-videoimage-to-video-generationvideogeneration
MLT Multimedia Framework
★ 1,765Cupdated 2026-04-20audioaudio-processingcc-plus-plusffmpeg
A list of tools, papers and code related to Deepfake Detection.
★ 1,763updated 2025-09-02awesomecodedatasetdeepfake-detectiondeepfakes
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, speech synthesis and recognition, web search, memory, presets, assistants, and more. Linux, Windows, Mac
★ 1,749Pythonupdated 2026-02-06aiai-assistantartificial-intelligenceautonomous-agentchatbot
Minimalistic media card for Home Assistant Lovelace UI
★ 1,696TypeScriptupdated 2026-03-06automationcustomhacktoberfesthassiohome-assistant
Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.
★ 1,688Perlupdated 2026-04-16logitech-media-serverlyrionlyrion-music-servermusicperl
Moonfire NVR, a security camera network video recorder
★ 1,688Rustupdated 2026-04-07cameraip-camerajavascriptnvrrtsp
Docker build for FFmpeg on Ubuntu / Alpine / Centos / Scratch / nvidia / vaapi
★ 1,627Pythonupdated 2026-04-12alpinecentosdockerffmpegnvidia
Nodes related to video workflows
★ 1,603Pythonupdated 2026-04-14
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
★ 1,591Pythonupdated 2025-01-01deep-learninglanguage-modelmachine-learningmulti-modal-learningnatural-language-processing
Generate text, images, video, speech, and music by MiniMax.
★ 1,578TypeScriptupdated 2026-04-20ai
Music Assistant is a free, open-source media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistant, and must run on an always-on device such as a Raspberry Pi, a NAS, an Intel NUC, or similar.
★ 1,520Pythonupdated 2026-04-20
Dandere2x - Fast Waifu2x Video Upscaling.
★ 1,507C++updated 2023-08-17compressionfastfastervideowaifu2x
OBS plugin for local speech recognition and captioning using AI
★ 1,458C++updated 2026-04-09ailive-streaminglivestreamobsobs-studio
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
★ 1,457Pythonupdated 2024-09-27aigc-enhancementdeflickervideo-diffusion-modelvideo-super-resolution
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
★ 1,443Pythonupdated 2026-04-15image-generationimage-to-videomcpmcp-servermcp-tools
SALMONN family: A suite of advanced multi-modal LLMs
★ 1,412updated 2026-04-20audioaudio-processingaudio-visual-understandingbytedanceiclr2024
PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.
★ 1,368Pythonupdated 2026-04-03aiaicinemablenderchatterboxdiffusion
AI video agents framework for next-gen video interactions and workflows.
★ 1,367Pythonupdated 2026-01-23agentagent-frameworkai-agentsframeworkllm
Display paginated content in the browser and generate print books using web technology
★ 1,310HTMLupdated 2026-04-23htmlpaged-mediapdfpolyfillprinting
A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.
★ 1,291Vueupdated 2026-04-08audiocomfyuiextensionfile-explorerfile-server
Curated List of Awesome Django Admin Panel Articles, Libraries/Packages, Books, Themes, Videos, Resources.
★ 1,236updated 2026-02-24articleawesomeawesome-listdjangodjango-admin
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a reference image and audio.
★ 1,232Pythonupdated 2026-01-20aigcavatar-generatorvideo-generation
Web-based interface for Claude CLI with streaming chat responses
★ 1,053TypeScriptupdated 2025-11-03claudeclaude-cliweb-ui
An open and scalable video surveillance system for anyone making this world a better and more peaceful place.
★ 1,019Goupdated 2026-04-13dockergolangipcameramotiondetectionmotiondetector
Telegram MCP server powered by Telethon to let MCP clients read chats, manage groups, and send/modify messages, media, contacts, and settings.
★ 1,004Pythonupdated 2026-04-12adminapichat-managementcontactsgroups
ALwrity - AI Digital Marketing Platform. (WIP)
★ 1,000Pythonupdated 2026-04-20ai-content-generationai-content-marketingai-digital-marketingai-seo-toolsai-social-media
Media sorting tool to organize photos and videos from your camera in folders by year, month and day.
★ 995Pythonupdated 2024-05-06cameraexiftoolorganize-media-filesorganize-photosphotobackup
An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models
★ 936Pythonupdated 2025-02-26aichatgptdavinci-resolveeditingfilm-editing
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports OpenAI, Azure, Perplexity, LLaMA, and more, with features like streaming, interactive chat, prompt files, image/audio I/O, MCP tool calls, and an experimental agent mode for safe, multi-step automation.
★ 917Goupdated 2026-03-22agentagentic-aiazurechatgptcli
A set of tools to trim, crop and select frames inside a video
★ 908Swiftupdated 2024-12-11cropcroppingiosswiftthumbnail
Generate static HTML photo / video galleries
★ 852JavaScriptupdated 2026-02-28galleryphotographyphotosstatic-site-generatorstatic-website
Optimized Whisper models for streaming and on-device use
★ 829Pythonupdated 2026-04-09apple-siliconcoremlmlxnvidia-gpuon-device-ai
Fully accessible cross-browser HTML5 media player.
★ 815JavaScriptupdated 2026-04-20
Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads through yt-dlp integration
★ 809JavaScriptupdated 2023-03-16expressjsgpulibretranslatemachine-learningnodejs
Generative Agents for video games. Based on Generative Agents: Interactive Simulacra of Human Behavior
★ 760Javaupdated 2023-10-11generative-agents
🔈 Sonos Media Player Interface/Client
★ 719JavaScriptupdated 2026-04-15home-automationjavascriptmusicnodejssonos
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
★ 711Pythonupdated 2026-04-01awqbenchmarkdeepseek-v3deploymentevaluation
🥥 Coco AI App - Search, Connect, Collaborate, Personal AI Search and Assistant, all in one space.
★ 690TypeScriptupdated 2026-04-17ai-searchai-search-engineassistantcmd-kdeepseek
Emora is an OSINT tool like sherlock but with a GUI, which searches for accounts by username across social networks
★ 688C#updated 2026-02-07analysisautomationcsharpcybersecuritydiscovery
Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. With a suite of over 15 specialized tools, function pipelines, and filters, this project supports academic research, agentic autonomy, multimodal creativity, workflows, and more
★ 681Pythonupdated 2026-04-20academic-researchai-agentsai-workstationarxivcomfyui
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.
★ 665Pythonupdated 2026-04-15agentic-aiaicommand-line-toolgenerative-ailinux
An image/video/workflow browser and manager for ComfyUI.
★ 659Svelteupdated 2024-11-11comfyuicomfyui-browsercomfyui-managerstable-diffusionworkflows
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.
★ 647Pythonupdated 2026-04-27agent-skillsai-videobilibiliclaude-codeclaude-code-skill
An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos. Allows full local deployment (web app, RAG server, LLM server). Supports multi-modal RAG content Q&A.
★ 619TypeScriptupdated 2026-04-09contentcontent-searchragsearchsearch-engine
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
★ 614JavaScriptupdated 2024-02-12bbc-news-labskaldinews-labsreactstt
VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking
★ 603Pythonupdated 2025-10-17agentsaudio-editingaudio-understandingllm-agentsnotebooklm
BACnet Protocol Stack library provides a BACnet application layer, network layer and media access (MAC) layer communications services.
★ 553Cupdated 2026-04-20avrbacnetbacnet-clientbacnet-ipbacnet-library
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
★ 548HTMLupdated 2025-04-04aigclarge-language-modelslarge-vision-language-modelsllmlvlm
Local Video-LLM powered AI Baby Monitor
★ 500Pythonupdated 2025-05-22baby-monitorvideo-llm
🎦 Extract video hard subtitles and automatically generate corresponding srt files.
★ 498Pythonupdated 2025-09-11ocrrapid-videocrsubtitlevideovideosubfinder
Lightweight headless squeezebox player for Lyrion Media Server
★ 491Cupdated 2026-04-20
A consolidation of various compiled open-source AI image/video upscaling products into a working CLI-friendly image and video upscaling program.
★ 481Shellupdated 2025-05-08amd64amd64onlybsd-3-clauseclidebian
AI-powered podcast & video generator.
★ 451TypeScriptupdated 2026-04-20
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
★ 443Pascalupdated 2026-04-18aiaudio-to-textblu-raycaptionseditor
Source code for free AI video upscaler tool
★ 408TypeScriptupdated 2026-02-10upscalingvideo-enhancementwebcodecswebglwebgpu
AI-powered video editor that turns raw footage and a creative brief into a polished ad using an ensemble of AI agents (Google Gemini + FFmpeg)
★ 387Pythonupdated 2026-04-14
Start and end frames video generation nodes based on the modified Kijai version Wan2.1 nodes
★ 383Pythonupdated 2025-03-22
[CVPR 2026 Highlight] High-Quality Text-to-Video Generation with Alpha Channel
★ 360Pythonupdated 2026-04-09
Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)
★ 353Pythonupdated 2025-04-10
Spyder OSINT GUI — Graphical open-source intelligence research tool for phone number lookup, IP geolocation, social media reconnaissance, email validation, domain WHOIS, username search, with multi-module architecture and Tkinter-based interface for digital forensics
★ 341Pythonupdated 2026-03-13digital-forensicsdomain-lookupemail-osintinformation-gatheringip-lookup
The coloured icons library is a collection of brand logos and tech stack logos. It's a handy resource to easily add brand icons to your projects without the hassle of manual attribution.
★ 314CSSupdated 2026-03-21brand-iconcoding-iconcoding-iconscolor-iconscolored-icons
The (official) Music Assistant Mobile app is a cross-platform client application designed for Android, iOS, and Java runtime environments. Developed using Kotlin Multiplatform (KMP) and Compose Multiplatform frameworks, this project aims to provide a unified codebase for seamless music management across multiple platforms.
★ 285Kotlinupdated 2026-04-15androidandroid-apphome-assistantiosios-app
AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.
★ 284TypeScriptupdated 2026-04-27ai-sdkai-videoclaude-codecursorelevenlabs
Display a mosaic of livestreams. Built for streaming.
★ 282TypeScriptupdated 2026-02-02facebookinstagramlivestreammultistreamperiscope
Web-based IP camera viewer for fast, simple streaming in any browser 🦇
★ 263Pythonupdated 2026-03-31
This is a list of awesome articles about object detection from video.
★ 247updated 2019-07-01awesome-listcomputer-visiondeep-learningdeep-neural-networksobject-detection
The agentic video editing framework
★ 244Pythonupdated 2025-02-10
An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using OpenAI's GPT-3, creates images using OpenAI's DALL-E, adds voiceover using ElevenLabs API, and combines the elements into a video.
★ 216Pythonupdated 2024-09-17aiartificial-intelligencedall-eediting-videosopenai
Frame is an AI-powered, open-source vibe video editor, offering a Professional VIDEO cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delivers a seamless vibe video editing experience.
★ 213updated 2025-05-06ai-video-agentcursorvideo-cupvideo-editingvideo-editor
Upscale your videos up to 4k on free google colab using Real-ESRGAN
★ 207Jupyter Notebookupdated 2025-05-014krealesrganstable-diffusionsuper-resolutionupscaler
Workbench for learning and practicing on-device AI technology in real scenarios with online TV on an Android phone, powered by ggml (llama.cpp, whisper.cpp...), FFmpeg, and opencv-mobile
★ 192C++updated 2025-06-12ggml-hexagonllamacpp-android-portonline-tvqualcomm-npustablediffusioncpp-android-port
PySimpleGUI based DESKTOP APP to AUTO GENERATE SUBTITLE FILE (using free Google Speech Recognition API) and TRANSLATED SUBTITLE FILE (using unofficial online Google Translate API) for any video or audio file
★ 189Pythonupdated 2024-05-05auto-captionauto-subtitlecaptionsffmpeggoogle-translate
A Claude Code workspace that transforms ideas into multi-format content: research → long-form articles → platform-specific versions (LinkedIn, newsletter, social media, podcast Q&A)
★ 178updated 2025-09-06
Automate Creation of Story-Based Videos.
★ 173Goupdated 2024-09-22dockergenerative-aigolangllmollama
Claude Code skills for journalism, media, and academia - verification, FOIA, data journalism, academic writing, and more
★ 170HTMLupdated 2026-04-22academic-writingclaudeclaude-codeclaude-skillsdata-journalism
A cross-platform desktop application for running AI models from [WaveSpeedAI](https://wavespeed.ai), as well as many free local AI models including Z-Image.
★ 147TypeScriptupdated 2026-04-24aiai-image-generationai-image-generatorimagevideo
Instance and Community Explorer for Lemmy
★ 145TypeScriptupdated 2026-02-02fediverselemmylemmyversembinnodejs
HyperMotion is a pose guided human image animation framework based on a large-scale video diffusion Transformer.
★ 140Pythonupdated 2026-03-10dithuman-video-animationhuman-video-generationmotion-generationpose-guided-text-to-image-generation
Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models
★ 136Pythonupdated 2026-04-21action-conditioneddiffusion-modelsgame-generationinteractive-videomulti-agent
An MCP server built on the ffmpeg command line, making it convenient to search, cut, stitch, play back, clip, overlay, and concatenate local videos through dialogue, among other functions
★ 132Pythonupdated 2025-05-13
Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive
★ 128HTMLupdated 2026-04-27karaokekaraoke-makerlyricsmusicvideo
AI-powered video editor with FFMPEG, Remotion, & Obsidian Agents Baked in
★ 126TypeScriptupdated 2026-04-15
A memory-efficient implementation for upscaling videos in ComfyUI using non-diffusion upscaling models. This custom node is designed to handle large video frame sequences without memory bottlenecks.
★ 121Pythonupdated 2025-09-18
Hardware-accelerated video decoding, encoding and processing on Intel graphics through VA-API. This module has been merged into the main GStreamer repo for further development.
★ 117Cupdated 2020-04-10
Over 200 AI prompts covering Blog Writing, Email Marketing, YouTube Ad Scripts, Facebook Ads, YouTube Video Ideas, Twitter Threads, Cold DM Ideas, Influencer Marketing, Copywriting, and Instagram Stories.
★ 114updated 2023-02-08aibardbingchatgptchatgpt3
Convert subtitles from one format to another format. Supported formats: STL EBU, TTML SMI, VTT, SRT
★ 109Javaupdated 2025-07-12video
End-to-end workflow to automatically generate show notes from audio/video transcripts
★ 94TypeScriptupdated 2026-02-25assembly-aichatgptclaudedeepgramgemini
Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.
★ 84Pythonupdated 2023-05-16background-removallinuxpythonv4l2loopbackvideo
CarrotAI is a cutting-edge AI agent application that delivers real-time streaming chat via Server-Sent Events (SSE) with built-in Model Context Protocol (MCP) integration. It supports concurrent connections to multiple SSE MCP servers and provides user interfaces in English, Chinese, and Japanese.
★ 83Dartupdated 2025-05-10
A program for extracting hard coded (burned in) subtitle from a video and generating an external subtitle.
★ 70Pythonupdated 2026-03-21chineseocrhardsubocrocr-pythonsrt
Token-efficient data serialization for LLM/AI. 50% fewer tokens than JSON, 93% better value/token. Rust, schema validation, LSP.
★ 65Rustupdated 2026-04-20ai-mlclicsvdata-formatjson-alternative
Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.
★ 63TypeScriptupdated 2026-04-16ffmpegpodcastprofanity-filterspeech-to-texttransformersjs
FFmpeg compiled inside an NVIDIA-enabled Docker Container
★ 62Dockerfileupdated 2025-12-02
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
★ 61Svelteupdated 2026-04-07geminigemini-flashspeaker-diarizationspeech-to-textsveltekit
★ 58TypeScriptupdated 2026-02-22
Streaming Availability API allows getting streaming availability information of movies and series; and querying the list of available shows on streaming services such as Netflix, Disney+, Apple TV, Max and Hulu across multiple countries!
★ 56Goupdated 2025-09-08amazon-prime-videoapiapi-clientapple-tvappletv
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
★ 53TypeScriptupdated 2026-03-17ffmpegopenaiopenai-whispersubtitle-generatorsubtitles
World's First AI Video Editor and Voice Cloner
★ 52Pythonupdated 2026-04-11
A user-friendly Raspberry Pi baby monitor with cry detection and audio/video streaming.
★ 52PHPupdated 2026-02-03
This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.
★ 50Pythonupdated 2025-03-30
★ 48TypeScriptupdated 2024-04-15
Installation script for AI applications using ROCm on Linux.
★ 45Shellupdated 2026-04-183daiamdamdgpuaudio
🚀 High-performance image and video delivery and uploading at scale in Astro powered by Cloudinary.
★ 31TypeScriptupdated 2025-11-13astroastro-loadercloudinarycloudinary-sdk
Compare and generate AI videos across Sora, Veo, Kling, Seedance & more.
★ 30TypeScriptupdated 2026-04-26ai-toolsai-video-generationai-video-generatorfal-aigenerative-ai
Claude Code Skills for podcast/video editing: transcription, content editing, rough/fine cut, final polish
★ 30HTMLupdated 2026-03-17
The Multi-Language Automatic Translation, Subtitling, and Voice Rendering System uses third party software to automatically convert audio to text, translate text, render text to video, and render text to audio.
★ 29PHPupdated 2024-07-29audiolanguagephpspeechsrt
100+ tool MCP server for real-time global intelligence — markets, FX, bonds, earnings, SEC filings, conflict, military, cyber, climate, news, company enrichment, and 30+ domains. Live Leaflet dashboard with 20 map layers, SSE streaming, and AI situation briefs.
★ 23Pythonupdated 2026-04-06ai-toolsai-watchanthropicclaudecybersecurity
Automatically create subtitles for any video using the Google Cloud Speech-to-Text API.
★ 22Pythonupdated 2018-08-23
A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.
★ 19Pythonupdated 2026-01-02asrgeminigemini-apitranscribe
Iran War Media Monitor collects news articles covering the US-Israeli war on Iran and applies sentiment analysis to uncover who supports and opposes the war.
★ 17Pythonupdated 2026-03-24
Open AI Video Subtitle Generator Agent: generate .srt subtitle files for any video with no length limit. 100% free, offline, and runs locally on your machine.
★ 17Pythonupdated 2025-10-20editinggui-applicationpython3subtitlessubtitles-generator
Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.
★ 17Goupdated 2026-01-08ffmpegffmpeg-wrapperfyneguilocal
Play multiple live videos simultaneously in a grid
★ 15TypeScriptupdated 2025-12-14gridlivemultiplayerstream
The new, emerging non-linear video editor for Linux. Backup of the Lumiera master repository
★ 15C++updated 2026-04-19
Whisper Flutter example: offline speech-to-text on Android and Linux, without an API key and without FFmpeg
★ 10C++updated 2025-08-02aiazkadevdartflutterggml
An audio/video transcriber with diarization and transcription editing.
★ 10JavaScriptupdated 2026-03-17
Fast, accurate voice typing for Linux — IBus input method engine with streaming STT, Whisper refinement, and CUDA acceleration
★ 10Pythonupdated 2026-02-10accessibilityfaster-whisperlinuxnixosspeech-to-text
MCP (Model Context Protocol) server for uploading media to Cloudinary using Claude Desktop
★ 10JavaScriptupdated 2025-03-13
a multi-modal MCP layer for real life — built on continuous video, semantic search and natural language video understanding.
★ 9Pythonupdated 2025-09-03
Transcribe Offline by openresearchtools.com is an open source desktop application that allows you to transcribe audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing
★ 9Rustupdated 2026-03-21ailocalaimacosopen-sourcetranscribe
A modern FastAPI-based web app for real-time object detection using YOLO models, supporting image and video uploads, model selection, live streaming, and interactive UI.
★ 9Pythonupdated 2025-06-28ai-projectback-endcomputer-visionfastapifront-end
Library and tool for downloading media and content from Torah websites.
★ 7Pythonupdated 2026-04-17mediapythontorah
Deterministic video editing SDK for AI agents. Ships with MCP tools.
★ 5TypeScriptupdated 2026-03-15ai-agentclaudeeditingffmpegmcp
⚙️ A Model Context Protocol (MCP) server for accessing Amazon S3 buckets. This server provides seamless integration with S3 storage through MCP, allowing efficient handling of large files including PDFs through streaming capabilities.
★ 5TypeScriptupdated 2025-07-06
This Python script is designed to track music played on Home Assistant media players. It stores track information in a SQLite database and provides various statistics
★ 5Pythonupdated 2025-07-19
LTX-2.3 video generation skill — setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model
★ 4Pythonupdated 2026-03-27
🗣️ Whispers Talk. Recall. Repeat. A blazing-fast voice journal that remembers everything you say — searchable with AI. ✨ What is Whispers? Whispers is a voice-first journaling app powered by: 🧠 <300ms Latency Streaming Transcription (AssemblyAI) 🔍 Algolia MCP for instant search of your thoughts
★ 4TypeScriptupdated 2025-07-28
An offline-first desktop app to automatically transcribe and edit video subtitles using OpenAI Whisper. Full control over text, timing, and advanced styling in a powerful, intuitive editor.
★ 4Pythonupdated 2025-08-16
A fully offline, high-performance, streaming speech-to-text tool for developers on Linux.
★ 3Pythonupdated 2025-10-24
Installation for NVIDIA with CUDA and ffmpeg encoding on the video card, on Ubuntu 22.04 with a GeForce GTX 1050 Ti
★ 3Shellupdated 2022-08-19
A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.
★ 2Pythonupdated 2025-06-10
This repository contains a multi-layer analysis of news articles, editorial opinions, and public comments about the ongoing Iran-Israel war. It synthesizes the dominant themes by the perspectives of global media channels and examines where editors' opinions and public reactions to news articles converge or diverge.
★ 1Jupyter Notebookupdated 2026-04-08
Watch Israeli TV channels live on Android. A simple, fast, and modern app for Israeli IPTV with PiP support and favorites. Built with Jetpack Compose.
★ 1Kotlinupdated 2026-04-13androidexoplayeriptv-playerisrael-tvjetpack-compose
OpenAPI specification for the Sofer.Ai API
★ 1Shellupdated 2026-03-30mediaopenapitorah
App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.
★ 1Pythonupdated 2025-05-26openai-apistreamlit
A powerful Network Video Recorder (NVR) application that leverages GPU acceleration for real-time AI object detection, smart recording, and efficient video management. Built with Python, Flask, and YOLOv5, this application provides enterprise-grade surveillance capabilities with a user-friendly interface.
★ 1Pythonupdated 2026-04-08