Getting Started with Stable Diffusion (2026)

15 min read • Updated June 2026

Step-by-step tutorial on installing Stable Diffusion WebUI on Windows, writing prompts that work, and using negative prompts, img2img, inpainting, ControlNet, and LoRA models, all for free.

What Is Stable Diffusion?

Stable Diffusion is an open-source AI image generation model that runs locally on your computer. Unlike Midjourney or DALL-E (which run on cloud servers and charge monthly fees), Stable Diffusion is free to use once you have the hardware to run it. You own every image you create, and because everything runs on your own machine, there are no filters, no censorship, and no generation limits.

As of 2026, Stable Diffusion 3.5 and SDXL produce images that rival or exceed commercial alternatives. Combined with tools like ControlNet and LoRA models, you get control over pose, composition, and style that a text prompt alone cannot provide.

Hardware Requirements

You do not need a supercomputer, but you do need a dedicated GPU:

Minimum specs: NVIDIA GPU with 4GB+ VRAM (GTX 1060 6GB or newer), 8GB system RAM, 10GB free storage (models are 2-7GB each), Windows 10/11 (Linux works too).

Recommended specs: NVIDIA GPU with 8GB+ VRAM (RTX 3060 12GB, RTX 4060, RTX 4070), 16GB system RAM, 25GB+ SSD storage (for multiple models + outputs).

No NVIDIA GPU? You can use CPU-only mode (very slow), Google Colab (free but time-limited), or cloud GPU rentals starting at $0.20/hour.
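The thresholds above can be turned into a quick self-check. The following is a minimal sketch, not an official tool: the function name `classify_hardware` is made up for illustration, and it only compares the numbers you pass in against the minimum and recommended specs listed in this section.

```python
def classify_hardware(vram_gb: float, ram_gb: float, free_disk_gb: float) -> str:
    """Return 'recommended', 'minimum', or 'below minimum' for the given specs.

    Thresholds are copied from this guide: 8GB VRAM / 16GB RAM / 25GB disk
    for recommended, 4GB VRAM / 8GB RAM / 10GB disk for minimum.
    """
    if vram_gb >= 8 and ram_gb >= 16 and free_disk_gb >= 25:
        return "recommended"
    if vram_gb >= 4 and ram_gb >= 8 and free_disk_gb >= 10:
        return "minimum"
    return "below minimum"


if __name__ == "__main__":
    # An RTX 3060 12GB machine with 16GB RAM and 100GB free disk space.
    print(classify_hardware(vram_gb=12, ram_gb=16, free_disk_gb=100))
    # A GTX 1060 6GB machine with 8GB RAM and 20GB free disk space.
    print(classify_hardware(vram_gb=6, ram_gb=8, free_disk_gb=20))
```

A machine that lands in "below minimum" is a candidate for the CPU-only, Colab, or cloud rental options mentioned above.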

Installing AUTOMATIC1111 WebUI

AUTOMATIC1111's Stable Diffusion WebUI is the most popular interface. Here is the fastest way to install it:

Install Python 3.10.6 from python.org (check the "Add Python to PATH" option in the installer)

Install Git from git-scm.com

Create a folder called sd-webui somewhere convenient

Open Command Prompt in that folder and run these commands:


git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
webui-user.bat

Wait: the first run downloads about 5-8GB of dependencies (one-time only)

A browser window opens automatically when ready

That is it. The WebUI runs locally at http://127.0.0.1:7860.
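If the install fails early, the usual suspects are the two prerequisites from the first steps. Here is a small pre-flight sketch you could run before cloning; the helper names are illustrative, and it only checks the interpreter that runs the script, which may differ from the one the WebUI picks up.

```python
import shutil
import sys


def python_version_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter is Python 3.10.x, the series this guide installs."""
    return (version_info[0], version_info[1]) == (3, 10)


def git_available() -> bool:
    """True if a `git` executable can be found on PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python 3.10.x:", python_version_ok())
    print("Git on PATH:", git_available())
```

If either line prints False, revisit the corresponding install step before running webui-user.bat.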

Your First Image: Prompt Basics

Stable Diffusion uses natural language prompts. Here is the formula for great results:

Structure: (subject), (quality tags), (style), (composition), (lighting)

Example prompt showing good structure:
a majestic owl perched on ancient oak branch, intricate feather details, sharp focus, fantasy art style, dappled sunlight through leaves, golden hour lighting, highly detailed, 8k resolution
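If you reuse the same structure often, it can help to assemble prompts from named parts. This is a minimal sketch assuming you keep each part as a separate string; `build_prompt` is a hypothetical helper, not part of the WebUI.

```python
def build_prompt(subject, quality_tags=(), style="", composition="", lighting=""):
    """Join prompt parts in the order: subject, quality tags, style, composition, lighting."""
    parts = [subject, *quality_tags, style, composition, lighting]
    # Drop empty parts and stray whitespace, then join with commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    prompt = build_prompt(
        subject="a golden retriever puppy",
        quality_tags=["highly detailed", "sharp focus"],
        style="watercolor style",
        composition="close-up portrait",
        lighting="golden hour lighting",
    )
    print(prompt)
```

Keeping the subject as the first argument mirrors the formula: whatever matters most ends up at the front of the prompt.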

Key principles:

Put the most important elements first (order matters!)

Be descriptive, not vague ("golden retriever" is better than "dog")

Add artist names for style direction

Include quality boosters: highly detailed, sharp focus, 8k

Negative Prompts: What to Exclude

Negative prompts tell the AI what you do NOT want. A solid default negative prompt includes:

low quality, blurry, distorted anatomy, extra limbs, deformed hands, watermark, signature, text, ugly, duplicate, mutation, bad proportions, cropped, out of frame, oversaturated, underexposed

When to customize negatives:

Portraits: add cross-eyed, asymmetric face, double chin

Architecture: add crooked lines, perspective error

Anime: add 3d render, realistic photo (to keep anime style clean)
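One way to manage this is to keep the default list in one place and append the category-specific terms on demand. The sketch below uses the terms from this section; the names `DEFAULT_NEGATIVE`, `EXTRA_NEGATIVES`, and `build_negative_prompt` are illustrative only.

```python
# Default negative terms, taken from the list in this section.
DEFAULT_NEGATIVE = [
    "low quality", "blurry", "distorted anatomy", "extra limbs",
    "deformed hands", "watermark", "signature", "text", "ugly",
    "duplicate", "mutation", "bad proportions", "cropped",
    "out of frame", "oversaturated", "underexposed",
]

# Category-specific additions, also from this section.
EXTRA_NEGATIVES = {
    "portrait": ["cross-eyed", "asymmetric face", "double chin"],
    "architecture": ["crooked lines", "perspective error"],
    "anime": ["3d render", "realistic photo"],
}


def build_negative_prompt(category=None):
    """Return the default negative prompt, plus extras for a known category."""
    terms = list(DEFAULT_NEGATIVE)
    for term in EXTRA_NEGATIVES.get(category, []):
        if term not in terms:  # avoid repeating a term already in the defaults
            terms.append(term)
    return ", ".join(terms)


if __name__ == "__main__":
    print(build_negative_prompt("portrait"))
```

An unknown category simply falls back to the default list, so the helper is safe to call with free-form input.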

Img2img: Transforming Existing Images

img2img lets you transform an existing image while preserving its composition:

Go to the img2img tab

Upload any image

Set denoising strength (the key parameter):

- 0.25-0.35 means subtle changes, preserves original closely
- 0.45-0.55 means balanced transformation
- 0.65-0.75 means dramatic changes, loose composition
- 0.85+ means essentially ignores input, generates freely

Write your prompt describing the target result

Generate

Use cases: Colorize sketches, change art styles, upscale low-res photos, convert photos to paintings, fix composition issues.
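The same steps can also be scripted. The sketch below assumes the WebUI was started with the `--api` flag (not covered in the install steps above), which exposes an HTTP endpoint at `/sdapi/v1/img2img`. The preset names and helper functions are made up for illustration, and the preset values are simply representative picks from the ranges listed above.

```python
import base64
import json
import urllib.request

# Representative values from the denoising ranges in this section (preset names are ours).
DENOISING_PRESETS = {"subtle": 0.30, "balanced": 0.50, "dramatic": 0.70, "free": 0.90}


def build_img2img_payload(image_bytes, prompt, preset="balanced", negative_prompt=""):
    """Build the JSON body for an img2img request from raw image bytes."""
    return {
        # The API expects input images as base64-encoded strings.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "denoising_strength": DENOISING_PRESETS[preset],
    }


def send_img2img(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a WebUI started with --api; return decoded image bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/img2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    return [base64.b64decode(image) for image in result["images"]]
```

A typical call would read a file with `open("sketch.png", "rb").read()`, pass the bytes to `build_img2img_payload`, and write the first returned image to disk. The file name here is a placeholder.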

Inpainting: Selective Editing

Inpainting lets you modify only parts of an image:

Generate or upload an image

Click Send to inpaint below the image

Use the brush tool to paint over the area you want to change

Write a prompt describing what should go in that area

Generate. Only the masked region changes.

Pro applications: change clothing or accessories on people, replace backgrounds, fix distorted faces or hands, add or remove objects, extend images beyond their borders (outpainting).
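To see why only the painted area changes, it helps to think of the mask as a per-pixel switch. The following toy sketch illustrates that idea on tiny grids of numbers; it is a conceptual model only, and the real pipeline is more involved (for example, it can blur the mask edges so the new content blends in).

```python
def composite_with_mask(original, generated, mask):
    """Take generated pixels where the mask is 1 and original pixels where it is 0."""
    return [
        [gen if m else orig for orig, gen, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[10, 10, 10], [10, 10, 10]]   # the image you uploaded
generated = [[99, 99, 99], [99, 99, 99]]  # what the model produced
mask = [[0, 1, 0], [0, 1, 1]]             # 1 = painted with the brush

result = composite_with_mask(original, generated, mask)
print(result)  # [[10, 99, 10], [10, 99, 99]]
```

Every pixel outside the brushed region keeps its original value, which is what makes inpainting suitable for small, targeted fixes.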

ControlNet: Precision Control

ControlNet gives you precise control over pose, composition, edges, depth, and more by conditioning generation on a reference image:

Popular ControlNet models:

Canny / Lineart - Preserve edges and outlines

Depth - Maintain 3D spatial relationships

Pose - Lock human body positions from a reference

Seg - Control semantic regions (sky, person, building)

Tile - Upscale and add detail to small images

Workflow: Upload a reference image, select ControlNet type, adjust strength (0.5-1.0), and generate.
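To make the idea of an edge-based control image concrete, here is a deliberately simplified sketch. It is not the Canny algorithm and not what the ControlNet preprocessors actually run; it just marks pixels where brightness jumps sharply, which is the kind of outline map an edge-type ControlNet model takes as guidance.

```python
def edge_map(gray, threshold=50):
    """Mark a pixel as an edge (1) if it differs strongly from its right or lower neighbor."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Brightness difference to the neighbor on the right and below (0 at borders).
            right = abs(gray[y][x] - gray[y][x + 1]) if x + 1 < width else 0
            down = abs(gray[y][x] - gray[y + 1][x]) if y + 1 < height else 0
            if max(right, down) >= threshold:
                edges[y][x] = 1
    return edges


# A tiny grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
print(edge_map(image))  # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
```

The output traces the vertical boundary between the dark and bright halves. An edge-guided generation would be pushed to keep a boundary in that position while the prompt decides what fills each side.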

LoRA Models: Fine-Tuned Styles

LoRAs (Low-Rank Adaptation models) are small model files (10-200MB) that add specific styles, characters, or concepts to a base model:

Where to find LoRAs: Civitai.com (largest library, 100K+ models), Hugging Face (open-source community).

Popular categories: Character LoRAs for specific people or anime characters, Style LoRAs for photography styles or artistic mediums, Concept LoRAs for clothing items or aesthetic themes.

Usage: Download the .safetensors file, place it in the models/Lora/ folder, refresh the WebUI, select the LoRA in the Lora tab (this adds a <lora:name:weight> tag to your prompt), and set the weight (0.5-1.0 is typical).
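In the AUTOMATIC1111 WebUI, an active LoRA shows up in the prompt as a tag of the form `<lora:name:weight>`, where the name is the file name without its extension. The sketch below builds such tags programmatically; the helper names and the example file `watercolor_style.safetensors` are hypothetical.

```python
def lora_tag(filename: str, weight: float = 0.8) -> str:
    """Build a <lora:name:weight> prompt tag from a LoRA file name and weight."""
    name = filename.removesuffix(".safetensors")  # requires Python 3.9+
    return f"<lora:{name}:{weight:g}>"


def add_loras(prompt: str, loras: dict) -> str:
    """Append one tag per LoRA (file name -> weight) to the end of the prompt."""
    tags = " ".join(lora_tag(name, weight) for name, weight in loras.items())
    return f"{prompt} {tags}".strip()


if __name__ == "__main__":
    # "watercolor_style.safetensors" is a made-up file name for illustration.
    print(add_loras("a fox in a snowy forest", {"watercolor_style.safetensors": 0.7}))
```

Lowering the weight toward 0.5 lets more of the base model show through, while values near 1.0 apply the LoRA's style or concept at full strength.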

Conclusion

Stable Diffusion has a steeper learning curve than Midjourney, but the payoff is complete creative freedom with zero ongoing costs. Start with the basics (prompt engineering, txt2img), then progressively explore img2img, inpainting, ControlNet, and LoRAs. With a week of regular practice, you can expect to produce images that hold their own against commercial AI art tools.