Image Generation & AI Art
129 repos
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
21 Lessons, Get Started Building with Generative AI
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate visual media using the latest AI-driven technologies. It offers an industry-leading WebUI and serves as the foundation for multiple commercial products.
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro 🍌, models based on Gemini-2.5-Flash-Image. We also openly release Nano-consistent-150K to support the community's development of image generation and unified models (see our blog on the website).
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
Enlightened library to convert HTML and CSS to SVG
🍌 World's largest Nano Banana Pro prompt library — 10,000+ curated prompts with preview images, 16 languages. Google Gemini AI image generation. Free & open source.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Uncensored, open-source alternative to Higgsfield AI, Freepik AI, Krea AI, Openart AI — Free, unrestricted AI image & video generation studio with 200+ models (Flux, Midjourney, Kling, Sora, Veo). No content filters. Self-hosted, MIT licensed.
Stable Diffusion built-in to Blender
Multi-Platform Package Manager for Stable Diffusion
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
A repository of models, textual inversions, and more
SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas, AGI functions, world-class Beam multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
The world's first open-source multimodal creative assistant: a privacy-first alternative to Canva and Manus that can run locally.
SD-Trainer: LoRA & DreamBooth training scripts & GUI for diffusion models, built on kohya-ss's trainer.
This repository contains hand-curated resources for prompt engineering, with a focus on Generative Pre-trained Transformers (GPT), ChatGPT, PaLM, etc.
Free prompt engineering online course. ChatGPT and Midjourney tutorials are now included!
Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative
The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters.
Examples of ComfyUI workflows
SwarmUI (formerly StableSwarmUI): a modular Stable Diffusion web user interface, with an emphasis on making power tools easily accessible, high performance, and extensibility.
ComfyUI's ControlNet Auxiliary Preprocessors
An extensive node suite that enables ComfyUI to process 3D inputs (Mesh & UV Texture, etc) using cutting edge algorithms (3DGS, NeRF, etc.)
GGUF Quantization support for native ComfyUI models
LTX-Video Support for ComfyUI
🤖 A Telegram bot that integrates with OpenAI's official ChatGPT APIs to provide answers, written in Python
Improved AnimateDiff for ComfyUI and Advanced Sampling Support
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.
Dead simple FLUX LoRA training UI with LOW VRAM support
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
Custom node pack for ComfyUI that helps conveniently enhance images through Detector, Detailer, Upscaler, Pipe, and more.
Optimizations and integrations of commonly used nodes to make ComfyUI easier to use.
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
Official SeedVR2 Video Upscaler for ComfyUI
A powerful tool that translates ComfyUI workflows into executable Python code.
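For context, a ComfyUI workflow in API format is plain JSON: each node is keyed by an id and carries a `class_type` plus an `inputs` map, where a two-element list like `["1", 0]` links an input to another node's output. A minimal illustrative sketch of inspecting such a graph (the node names and values below are examples, not taken from any specific workflow):

```python
import json

# A tiny ComfyUI workflow in API format: the sampler's "model" input
# references output 0 of the checkpoint-loader node.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "example-model.safetensors"},
    },
    "2": {
        "class_type": "KSampler",
        "inputs": {"model": ["1", 0], "seed": 42, "steps": 20},
    },
}

def linked_inputs(graph):
    """Return (node_id, input_name, source_id) for every node-to-node link."""
    links = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            # Links are encoded as [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2:
                links.append((node_id, name, value[0]))
    return links

print(json.dumps(workflow, indent=2))
print(linked_inputs(workflow))  # [('2', 'model', '1')]
```

A workflow-to-Python translator walks exactly this structure, topologically ordering nodes by their links before emitting code.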
Your Automatic Prompt Engineering Assistant for GenAI Applications
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
A ComfyUI custom node designed for advanced image background removal and object, face, clothes, and fashion segmentation, utilizing multiple models including RMBG-2.0, INSPYRENET, BEN, BEN2, BiRefNet, SDMatte, SAM, SAM2, SAM3 and GroundingDINO.
Curated GPT-Image-2 prompts for the OpenAI API — portraits, posters, UI mockups, game screenshots, character sheets, and more. Ready-to-use prompts for gpt-image-2.
Nodes related to video workflows
An open-source Vercel-like deployment platform for ComfyUI.
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
A ComfyUI extension for organizing and managing all your workflows and models in one place: seamlessly switch between workflows, import and export them, reuse subworkflows, install models, and browse your models in a single workspace.
PALLAIDIUM — a generative AI movie studio, seamlessly integrated into the Blender Video Editor (VSE), enabling end-to-end production from script to screen and back.
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
A full-featured image/video management app with AI-powered organization and semantic search. Supports metadata from SD-webui, ComfyUI, Fooocus, NovelAI, StableSwarmUI, and more. Available as standalone app, SD-webui extension, or library.
Simple shell script to use OpenAI's ChatGPT and DALL-E from the terminal. No Python or JS required. Formerly https://gptshell.cc
[WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast. https://wavespeed.ai/
Examples of programs built using Modal
Portable ComfyUI installer for Windows, macOS and Linux 🔹 Nvidia GPU support 🔹 Pixaroma Community Edition
LoRA Manager for ComfyUI - A powerful extension for organizing, previewing, and integrating LoRA models with metadata and workflow support.
ComfyUI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
Metadata-indexer and Viewer for AI-generated images
ControlNet scheduling and masking nodes with sliding context support
A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual), F5-TTS, Higgs Audio 2 and VibeVoice with unlimited text length, SRT timing, Character support, and many audio tools
The most powerful and modular Stable Diffusion GUI, API, and backend with a graph/nodes interface. Now ZLUDA-enhanced for better AMD GPU performance.
This repository offers various extension nodes for ComfyUI. Nodes here have different characteristics compared to those in the ComfyUI Impact Pack, which has grown too large.
Command Line Interface for Managing ComfyUI
Qwen-Image text to image lora trainer
Open WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. With a suite of over 15 specialized tools, function pipelines, and filters, this project supports academic research, agentic autonomy, multimodal creativity, workflows, and more.
The most advanced Nano Banana image generator and editor application. Your central hub for AI image generation and revisions. Intuitive UI features reference images, editing with image masks, version history, and more. Powered by the Gemini 2.5 Flash Image API.
An image/video/workflow browser and manager for ComfyUI.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
This custom node lets you train LoRA directly in ComfyUI!
ComfyUI nodes for WanAnimate model input preprocessing
Huge AI models catalog. A curated list of AI tools, platforms, and resources across various domains.
A Stable Diffusion WebUI training-aid extension that helps you quickly and visually train models such as LoRA.
Start- and end-frame video generation nodes based on Kijai's modified Wan2.1 nodes.
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable Diffusion), tool calling, AI personas, RAG knowledge packs, TTS/STT. Fully offline, zero subscriptions, open-source.
ComfyUI custom nodes and web utilities for real-time AI generation and interaction
This extension serves as a complement to the Impact Pack, offering features that are not deemed suitable for inclusion by default in the ComfyUI Impact Pack
lightweight Python-based MCP (Model Context Protocol) server for local ComfyUI
Private voice keyboard, AI chat, images, webcam, recordings, voice control with >= 4 GiB of VRAM.
AI video generation SDK — JSX for videos. One API for Kling, Flux, ElevenLabs, Sora. Built on Vercel AI SDK.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
LDSR custom node for ComfyUI
Welcome to the ChatGPT Prompts Library! This repository contains a diverse collection of over 100,000 prompts tailored for ChatGPT. Our prompts cover a wide range of topics, including marketing, business, fun, and much more.
Transcribe audio and add subtitles to videos using Whisper in ComfyUI
An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using OpenAI's GPT-3, creates images using OpenAI's DALL-E, adds voiceover using ElevenLabs API, and combines the elements into a video.
Run Replicate models as nodes in ComfyUI
Upscale your videos up to 4k on free google colab using Real-ESRGAN
An extensive node suite for ComfyUI with over 210 new nodes
Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download. Supports txt2img, img2img, txt2vid, img2vid, audio, 3D generation across SD1.5/SDXL/SD3/FLUX/Wan2.2/HunyuanVideo/LTXV/Mochi/Cosmos + LLM integration. Works as a skill for Claude Code, Cursor, and other AI coding agents.
Custom nodes for using fal API.
Text generation via AI APIs.
VividNode: Multi-purpose Text & Image Generation Desktop Chatbot (supporting various models including GPT).
A cross-platform desktop application for running AI models from [WaveSpeedAI](https://wavespeed.ai), as well as many free local AI models including Z-Image.
HyperMotion is a pose guided human image animation framework based on a large-scale video diffusion Transformer.
A memory-efficient implementation for upscaling videos in ComfyUI using non-diffusion upscaling models. This custom node is designed to handle large video frame sequences without memory bottlenecks.
A DALL-E 3 localhost web UI exposing advanced settings like style (vivid vs. natural) and quality (standard vs. HD). Also handy when ChatGPT's DALL-E throttles you for the day, if you're willing to pay the API call costs. Comes integrated with Prompt Inspirer!
A visual node-based editor for building, sharing, and executing complex AI workflows with Fal.ai and Replicate.
Node to enable seamless multiuser workflow collaboration
ComfyUI Chatterbox TTS & Voice Conversion Node
🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic evaluations of text-to-image models and supports customization with user-defined metrics, datasets, and visualizations.
A powerful ComfyUI node for text-based image editing using Black Forest Labs' Flux Kontext API.
A ComfyUI custom node for Google's Gemini 2.5 Flash Image (aka "Nano Banana") model - the state-of-the-art image generation and editing AI that went viral for its incredible quality and capabilities.
Stable Diffusion Desktop client for Windows, macOS, and Linux built in Embarcadero Delphi.
An Extensive AI & Camera Metadata Viewer
A ComfyUI custom node for audio subtitling based on WhisperX and translators.
Replicate Flux LoRA image editor.
Installation script for AI applications using ROCm on Linux.
MCP server for Fal.ai - Generate images, videos, music and audio with Claude
ComfyUI with AMD ROCm support for GPU-accelerated AI image generation on AMD RX 6000/7000+ GPUs
Self-correcting image generation for Gemini's Nano Banana model
AI-powered image generation using Google Gemini, integrated with Claude Code via Skills or Claude.ai via MCP (Model Context Protocol).
Learn how multimodal AI merges text, image, and audio for smarter models
ROCm-optimized ComfyUI nodes.
Prompt Management System for Interaction with the ChatGPT API
Generate videos from text using various Stable Diffusion Models via Text2Video-Zero.
Linux virtual keyboard driver which types what you say using Deepgram Flux STT API
AMD GPU Monitor for ComfyUI
Agent-native image-editing SDK for Claude Code. 21 MCP tools + /decompose skill — semantic layer splits, L1–L5 cultural scoring, region inpaint. Powered by ComfyUI, Gemini, or mock.
The ultimate PyQt6 application that integrates the power of OpenAI, Google Gemini, Claude, and other open-source AI models
The TeamAI application lets users create a team of AI-powered assistants with individual capabilities and personas. The assistants solve the user's requested task as a team effort, each bot contributing its respective capabilities. Supported providers are Ollama and OpenAI.
LTX-2.3 video generation skill — setup, inference, prompting, ComfyUI integration for Lightricks 22B DiT audio-video model