Video & Media
232 repos
A feature-rich command-line audio/video downloader
Display and control your Android device
High performance self-hosted photo and video management solution.
real time face swap and one-click video deepfake with only a single image
The Free Software Media System - Server Backend & API
π RuView: WiFi DensePose turns commodity WiFi signals into real-time human pose estimation, vital sign monitoring, and presence detection — all without a single pixel of video.
Create agents that monitor and act on your behalf. Your agents are standing by!
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
🎥 Make videos programmatically with React
Video.js - open source HTML5 video player
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
SRS is a simple, high-efficiency, real-time media server supporting RTMP, WebRTC, HLS, HTTP-FLV, HTTP-TS, SRT, MPEG-DASH, and GB28181, with codec support for H.264, H.265, AV1, VP9, AAC, Opus, and G.711.
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
280+ free n8n automation templates — ready-to-use workflows for Gmail, Telegram, Slack, Discord, WhatsApp, Google Drive, Notion, OpenAI, and more. AI agents, RAG chatbots, email automation, social media, DevOps, and document processing. The largest open-source n8n template collection.
Kodi is an award-winning free and open source home theater/media center software and entertainment hub for digital media. With its beautiful interface and powerful skinning engine, it's available for Android, BSD, Linux, macOS, iOS, tvOS and Windows.
🎧 Your Personal Streaming Service
A machine learning-based video super resolution and frame interpolation framework. Est. Hack the Valley II, 2018.
Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS / MPEG-TS / RTP media server and media proxy that lets you read, publish, proxy, record, and play back video and audio streams.
End-to-end realtime stack for connecting humans and AI
The free and privacy-friendly screen recorder with no limits 🎥
Video, Image and GIF upscale/enlarge (Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.
Wan: Open and Advanced Large-Scale Video Generative Models
Downloads videos and playlists from YouTube
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
A curated list of awesome big data frameworks, resources and other awesomeness.
A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.
Ultimate camera streaming application
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Streamlink is a CLI utility which pipes video streams from various services into a video player
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.
Write HTML. Render video. Built for agents.
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
Open-source framework for conversational voice AI agents
A framework for building realtime voice AI agents 🤖🎙️📹
A React component for playing a variety of URLs, including file paths, YouTube, Facebook, Twitch, SoundCloud, Streamable, Vimeo, Wistia and DailyMotion
📈 A small, fast chart for time series, lines, areas, ohlc & bars
Official repository for LTX-Video
Open and inexpensive DIY IP-KVM based on Raspberry Pi
Use your tablet as a graphics tablet/touch screen on your computer.
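A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.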
Uncensored, open-source alternative to Higgsfield AI, Freepik AI, Krea AI, Openart AI — Free, unrestricted AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.
HTML5 <audio> or <video> player with support for MP4, WebM, and MP3 as well as HLS, Dash, YouTube, Facebook, SoundCloud and others with a common HTML5 MediaElement API, enabling a consistent UI in all browsers.
Background Remover lets you Remove Background from images and video using AI with a simple command line interface that is free and open source.
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
Synchronous multiroom audio player
An extensible, plugin-oriented, HTML5-first media player for the web
SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
📺 Cross-platform IPTV player application with multiple features, such as support of m3u and m3u8 playlists, favorites, TV guide, TV archive/catchup and more.
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
An automated e-mail OSINT tool
OpenShot Video Editor is an award-winning free and open-source video editor for Linux, Mac, and Windows, and is dedicated to delivering high quality video editing and animation solutions to the world.
Download web video and audio
A curated list of recent diffusion models for video generation, editing, and various other applications.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
The Restreamer is a complete streaming server solution for self-hosting. It has a visually appealing user interface and no ongoing license costs. Upload your live stream to YouTube, Twitch, Facebook, Vimeo, or other streaming solutions like Wowza. Receive video data from OBS and publish it with the RTMP and SRT server.
Free and open source video editor, based on MLT Framework and KDE Frameworks
Visual Object Tagging Tool: An Electron app for building end-to-end Object Detection Models from Images and Videos.
A curated list of awesome data labeling tools
Effortless video editing!
Bubble Card is a minimalist card collection for Home Assistant with a nice pop-up touch.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
A self-hosted web radio management suite, including turnkey installer tools for the full radio software stack and a modern, easy-to-use web app to manage your stations.
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Whisper realtime streaming for long speech-to-text transcription and translation
A deep learning library for video understanding research.
LTX-Video Support for ComfyUI
UI components and hooks for building video/audio players on the web. Robust, customizable, and accessible. Modern alternative to JW Player and Video.js.
Elegantly record your screen
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.
🔥🔥🔥 [IEEE TCSVT] Latest Papers, Codes and Datasets on Vid-LLMs.
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
QualityScaler - image/video AI upscaler app
Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
Unleash Next-Level AI! 🚀 💻 Code Generation: DeepSeek r1 + Claude 3.7 Sonnet - Unparalleled Performance! 📝 Content Creation: DeepSeek r1 + Gemini 2.5 Pro - Superior Quality! 🔌 OpenAI-Compatible. 🌊 Streaming & Non-Streaming Support. ✨ Experience the Future of AI – Today! Click to Try Now! ✨
Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Data context layer for unstructured data - images, video, sensor data, text and PDFs
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.
A list of publicly available datasets with real-time data maintained by the team at bytewax.io
Cross-platform desktop GUI app to clean image metadata
Official SeedVR2 Video Upscaler for ComfyUI
Enable AI models for video production in the browser
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes
Automatically generate and overlay subtitles for any video.
A browser extension that helps users publish content to multiple social media platforms with one click.
Tookie is an advanced OSINT information gathering tool that finds social media accounts based on inputs.
Interpolate, Upscale, Decompress, and Denoise videos easily on Linux/Windows/MacOS.
Play videos side-by-side
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
MLT Multimedia Framework
A list of tools, papers and code related to Deepfake Detection.
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, speech synthesis and recognition, web search, memory, presets, assistants, and more. Linux, Windows, Mac
Minimalistic media card for Home Assistant Lovelace UI
Server for Squeezebox and compatible players. This server is also called Lyrion Music Server.
Moonfire NVR, a security camera network video recorder
Docker build for FFmpeg on Ubuntu / Alpine / Centos / Scratch / nvidia / vaapi
Nodes related to video workflows
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Generate text, images, video, speech, and music by MiniMax.
Music Assistant is a free, open-source media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistant, and must run on an always-on device such as a Raspberry Pi, a NAS, an Intel NUC, or similar.
Dandere2x - Fast Waifu2x Video Upscaling.
OBS plugin for local speech recognition and captioning using AI
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
SALMONN family: A suite of advanced multi-modal LLMs
PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.
AI video agents framework for next-gen video interactions and workflows.
Display paginated content in the browser and generate print books using web technology
A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.
Curated List of Awesome Django Admin Panel Articles, Libraries/Packages, Books, Themes, Videos, Resources.
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a reference image and audio.
Web-based interface for Claude CLI with streaming chat responses
An open and scalable video surveillance system for anyone making this world a better and more peaceful place.
Telegram MCP server powered by Telethon to let MCP clients read chats, manage groups, and send/modify messages, media, contacts, and settings.
ALwrity - AI Digital Marketing Platform. (WIP)
Media sorting tool to organize photos and videos from your camera in folders by year, month and day.
An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports OpenAI, Azure, Perplexity, LLaMA, and more, with features like streaming, interactive chat, prompt files, image/audio I/O, MCP tool calls, and an experimental agent mode for safe, multi-step automation.
A set of tools to trim, crop and select frames inside a video
Generate static HTML photo / video galleries
Optimized Whisper models for streaming and on-device use
Fully accessible cross-browser HTML5 media player.
Generate transcripts for audio and video content with a user-friendly UI, powered by OpenAI's Whisper, with automatic translations and automatic video downloads through yt-dlp integration.
Generative Agents for video games. Based on Generative Agents: Interactive Simulacra of Human Behavior
🔈 Sonos Media Player Interface/Client
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
🥥 Coco AI App - Search, Connect, Collaborate, Personal AI Search and Assistant, all in one space.
Emora is an OSINT tool like sherlock but with a GUI, which searches for accounts by username across social networks
Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. With a suite of over 15 specialized tools, function pipelines, and filters, this project supports academic research, agentic autonomy, multimodal creativity, workflows, and more
A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.
An image/video/workflow browser and manager for ComfyUI.
AI-powered video podcast creation skill for coding agents. Supports Bilibili & YouTube, multi-language (zh-CN/en-US), 6 TTS engines (Edge/Azure/ElevenLabs/OpenAI/Doubao/CosyVoice), 4K Remotion rendering.
An open-source AI content search engine designed specifically for content creators. Supports extraction of text, images, and short videos. Allows full local deployment (web app, RAG server, LLM server). Supports multi-modal RAG content Q&A.
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"
BACnet Protocol Stack library provides a BACnet application layer, network layer and media access (MAC) layer communications services.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Local Video-LLM powered AI Baby Monitor
🎦 Extract video hard subtitles and automatically generate corresponding srt files.
Lightweight headless squeezebox player for Lyrion Media Server
A consolidation of various compiled open-source AI image/video upscaling products into a working, CLI-friendly image and video upscaling program.
AI-powered podcast & video generator.
Tero Subtitler is an open source, cross-platform, and free subtitle editing software.
Source code for free AI video upscaler tool
AI-powered video editor that turns raw footage and a creative brief into a polished ad using an ensemble of AI agents (Google Gemini + FFmpeg)
Start and end frames video generation nodes based on the modified Kijai version Wan2.1 nodes
[CVPR 2026 Highlight] High-Quality Text-to-Video Generation with Alpha Channel
Fast Streaming TTS with Orpheus + WebRTC (with FastRTC)
Spyder OSINT GUI — Graphical open-source intelligence research tool for phone number lookup, IP geolocation, social media reconnaissance, email validation, domain WHOIS, username search, with multi-module architecture and Tkinter-based interface for digital forensics
The coloured icons library is a collection of brand logos and tech stack logos. It's a handy resource to easily add brand icons to your projects without the hassle of manual attribution.
The (official) Music Assistant Mobile app is a cross-platform client application designed for Android, iOS, and Java runtime environments. Developed using Kotlin Multiplatform (KMP) and Compose Multiplatform frameworks, this project aims to provide a unified codebase for seamless music management across multiple platforms.
AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.
Display a mosaic of livestreams. Built for streaming.
Web-based IP camera viewer for fast, simple streaming in any browser 🦇
This is a list of awesome articles about object detection from video.
The agentic video editing framework
An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using OpenAI's GPT-3, creates images using OpenAI's DALL-E, adds voiceover using ElevenLabs API, and combines the elements into a video.
Frame is an AI-powered, open-source vibe video editor, offering a professional video cutting alternative for creators. With Cursor-like interaction, it automates editing, enhances videos, and delivers a seamless vibe video editing experience.
Upscale your videos up to 4K on free Google Colab using Real-ESRGAN
Workbench for learning and practicing on-device AI technology in real scenarios with online TV on an Android phone, powered by ggml (llama.cpp, whisper.cpp...), FFmpeg, and opencv-mobile
PySimpleGUI-based desktop app that auto-generates a subtitle file (using the free Google Speech Recognition API) and a translated subtitle file (using the unofficial online Google Translate API) for any video or audio file
A Claude Code workspace that transforms ideas into multi-format content: research → long-form articles → platform-specific versions (LinkedIn, newsletter, social media, podcast Q&A)
Automate Creation of Story-Based Videos.
Claude Code skills for journalism, media, and academia - verification, FOIA, data journalism, academic writing, and more
A cross-platform desktop application for running AI models from WaveSpeedAI (https://wavespeed.ai), as well as many free local AI models including Z-Image.
Instance and Community Explorer for Lemmy
HyperMotion is a pose guided human image animation framework based on a large-scale video diffusion Transformer.
Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models
An MCP server built on the ffmpeg command line that makes it convenient to search, trim, stitch, play back, clip, overlay, and concatenate local videos through dialogue.
Generate karaoke videos, by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG and uploading videos to YouTube / Dropbox / Google Drive
AI-powered video editor with FFMPEG, Remotion, & Obsidian Agents Baked in
A memory-efficient implementation for upscaling videos in ComfyUI using non-diffusion upscaling models. This custom node is designed to handle large video frame sequences without memory bottlenecks.
Hardware-accelerated video decoding, encoding and processing on Intel graphics through VA-API. This module has been merged into the main GStreamer repo for further development.
Over 200 AI prompts covering Blog Writing, Email Marketing, YouTube Ad Scripts, Facebook Ads, YouTube Video Ideas, Twitter Threads, Cold DM Ideas, Influencer Marketing, Copywriting, and Instagram Stories.
Convert subtitles from one format to another format. Supported formats: STL EBU, TTML SMI, VTT, SRT
End-to-end workflow to automatically generate show notes from audio/video transcripts
Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.
CarrotAI is a cutting-edge AI agent application that delivers real-time streaming chat via Server-Sent Events (SSE) with built-in Model Context Protocol (MCP) integration. It supports concurrent connections to multiple SSE MCP servers and provides user interfaces in English, Chinese, and Japanese.
A program for extracting hard-coded (burned-in) subtitles from a video and generating an external subtitle file.
Token-efficient data serialization for LLM/AI. 50% fewer tokens than JSON, 93% better value/token. Rust, schema validation, LSP.
Free in-browser audio & video censorship tool. AI-powered transcription with Whisper, 100% private client-side processing. Bleep profanity, custom words, or any phrase.
FFmpeg compiled inside an NVIDIA-enabled Docker Container
Transcribe audio and video files with speaker diarization and logically grouped timestamps using Gemini Flash
Streaming Availability API allows getting streaming availability information of movies and series; and querying the list of available shows on streaming services such as Netflix, Disney+, Apple TV, Max and Hulu across multiple countries!
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
World's First AI Video Editor and Voice Cloner
A user-friendly Raspberry Pi baby monitor with cry detection and audio/video streaming.
This project is a video processing application that extracts audio from videos, performs automatic speech recognition (ASR), and generates subtitles. It allows users to enhance audio quality, correct transcription errors, and convert subtitles into various dialects, all through a user-friendly command-line and web interface.
Installation script for AI applications using ROCm on Linux.
🚀 High-performance image and video delivery and uploading at scale in Astro powered by Cloudinary.
Compare and generate AI videos across Sora, Veo, Kling, Seedance & more.
Claude Code Skills for podcast/video editing: transcription, content editing, rough/fine cut, final polish
The Multi-Language Automatic Translation, Subtitling, and Voice Rendering System uses third party software to automatically convert audio to text, translate text, render text to video, and render text to audio.
100+ tool MCP server for real-time global intelligence — markets, FX, bonds, earnings, SEC filings, conflict, military, cyber, climate, news, company enrichment, and 30+ domains. Live Leaflet dashboard with 20 map layers, SSE streaming, and AI situation briefs.
Automatically create subtitles for any video using the Google Cloud Speech-to-Text API.
A Python tool that uses Google Gemini API to transcribe video or audio files into SRT subtitle files.
Iran War Media Monitor collects news articles covering the US-Israeli war on Iran and applies sentiment analysis to uncover who supports and opposes the war.
Open AI Video Subtitle Generator Agent. Generate .srt subtitle files for any video: no length limit, 100% free, offline, and runs locally on your machine.
Transcribe audio/video to text, locally on macOS, Linux and Windows. A simple whisper.cpp wrapper/UI built with Go/Fyne.
Play multiple live videos simultaneously in a grid
The new emerging Non Linear Video Editor for Linux. Backup of Lumiera master repository
Whisper Flutter Example Speech To Text Offline Android Linux Without Api Key Without FFMPEG
An audio/video transcriber with diarization and transcription editing.
Fast, accurate voice typing for Linux — IBus input method engine with streaming STT, Whisper refinement, and CUDA acceleration
MCP (Model Context Protocol) server for uploading media to Cloudinary using Claude Desktop
a multi-modal MCP layer for real life — built on continuous video, semantic search and natural language video understanding.
Transcribe Offline by openresearchtools.com is an open source desktop application that allows you to transcribe audio and video fully offline, with optional speaker diarisation and word-level alignment. It can also generate subtitles and integrate with local large language models (LLMs) for summarisation and editing
A modern FastAPI-based web app for real-time object detection using YOLO models, supporting image and video uploads, model selection, live streaming, and interactive UI.
Library and tool for downloading media and content from Torah websites.
Deterministic video editing SDK for AI agents. Ships with MCP tools.
⚙️ A Model Context Protocol (MCP) server for accessing Amazon S3 buckets. This server provides seamless integration with S3 storage through MCP, allowing efficient handling of large files including PDFs through streaming capabilities.
This Python script is designed to track music played on Home Assistant media players. It stores track information in a SQLite database and provides various statistics
LTX-2.3 video generation skill — setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model
🗣️ Whispers: Talk. Recall. Repeat. A blazing-fast voice journal that remembers everything you say, searchable with AI. Whispers is a voice-first journaling app powered by streaming transcription with under 300 ms latency (AssemblyAI) and Algolia MCP for instant search of your thoughts.
An offline-first desktop app to automatically transcribe and edit video subtitles using OpenAI Whisper. Full control over text, timing, and advanced styling in a powerful, intuitive editor.
A fully offline, high-performance, streaming speech-to-text tool for developers on Linux.
Installation for NVIDIA with CUDA and FFmpeg encoding on the video card, on Ubuntu 22.04 with a GeForce GTX 1050 Ti
A powerful MCP (Model Context Protocol) server that transcribes audio and video files into text using Groq's Whisper model.
This repository contains a multi-layer analysis of news articles, editorial opinions, and public comments about the ongoing Iran-Israel war. It synthesizes the dominant themes across global media channels and examines where editors' opinions and public reactions to news articles converge or diverge.
Watch Israeli TV channels live on Android. A simple, fast, and modern app for Israeli IPTV with PiP support and favorites. Built with Jetpack Compose.
OpenAPI specification for the Sofer.Ai API
App for transcribing audio/video to editable SRT subtitles using Whisper. Supports mp3/mp4/wav inputs, audio extraction, and local download.
A powerful Network Video Recorder (NVR) application that leverages GPU acceleration for real-time AI object detection, smart recording, and efficient video management. Built with Python, Flask, and YOLOv5, this application provides enterprise-grade surveillance capabilities with a user-friendly interface.