SpriteDX - Pixel Restoration Model

Right now, SpriteDX generates pseudo pixel art whose pixels are not perfectly aligned to a grid. For SpriteDX to be a production tool, we need to nail this down.

Problem Definition

I want to learn a mapping:

$$f_{\theta}: \mathbf{X}_{\text{degraded}} \rightarrow \mathbf{X}_{\text{clean}}$$

where X_degraded is a low-quality (upscaled, blurry, or noisy) version of an image and X_clean is the perfectly crisp pixel-art original.

This is a supervised image-to-image regression task, akin to super-resolution + denoising + deblurring + quantization alignment, but with the discrete color constraints typical of pixel art.
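
Written as an optimization problem, the mapping above could be learned by minimizing an expected loss over the paired dataset. This is only a sketch: the dataset symbol $\mathcal{D}$ and the generic per-pair loss $\mathcal{L}$ are assumptions introduced here for illustration.

$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{(\mathbf{X}_{\text{degraded}},\, \mathbf{X}_{\text{clean}}) \sim \mathcal{D}} \left[ \mathcal{L}\big(f_{\theta}(\mathbf{X}_{\text{degraded}}),\, \mathbf{X}_{\text{clean}}\big) \right]$$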


Dataset Design

We’ll need paired (degraded, clean) examples.

(a) Start from clean pixel art

Use existing pixel-art datasets:

  • Lospec dataset, Pixel-Art GAN dataset, or scrape open-licensed game sprites from Itch.io / OpenGameArt.

  • Ensure diversity in palette and style.

(b) Generate synthetic degradations

Automate your dataset creation using scripted corruption pipelines:

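import random

import numpy as np
from PIL import Image, ImageFilter

def degrade(img):
    # Work in RGB so the noise array below matches the image shape.
    img = img.convert("RGB")
    # Upscale 4x with bicubic filtering to smear the pixel grid.
    img = img.resize((img.width*4, img.height*4), Image.BICUBIC)
    # Blur by a random radius.
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.2, 1.0)))
    # Add Gaussian noise (sigma = 3 on the 0-255 scale); img.size is (W, H).
    arr = np.array(img) + np.random.normal(0, 3, img.size[::-1] + (3,))
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))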

We can further add:

  • Slight sub-pixel shifts

  • Bilinear resampling

  • Palette diffusion (convert to RGB then reduce palette incorrectly)

  • Random gamma/exposure changes
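
A few of the extra degradations listed above could be sketched as follows. The function names, the shift range, the palette size of 8, and the gamma range are all assumptions chosen for illustration, not tuned values.

```python
import random

import numpy as np
from PIL import Image


def subpixel_shift(img, max_shift=0.5):
    """Translate by a random fraction of a pixel using bilinear resampling."""
    dx = random.uniform(-max_shift, max_shift)
    dy = random.uniform(-max_shift, max_shift)
    # The affine coefficients (a, b, c, d, e, f) map output (x, y) to
    # input (a*x + b*y + c, d*x + e*y + f); this is a pure translation.
    return img.transform(img.size, Image.AFFINE, (1, 0, dx, 0, 1, dy),
                         resample=Image.BILINEAR)


def bad_palette(img, colors=8):
    """Reduce to a small adaptive palette, then go back to RGB."""
    return img.quantize(colors=colors).convert("RGB")


def random_gamma(img, low=0.8, high=1.25):
    """Apply a random gamma curve to simulate exposure drift."""
    gamma = random.uniform(low, high)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    out = np.power(arr, gamma) * 255.0
    return Image.fromarray(np.clip(out, 0, 255).astype(np.uint8))


def extra_degrade(img):
    """Chain the three corruptions above on an RGB copy of the image."""
    img = img.convert("RGB")
    img = subpixel_shift(img)
    img = bad_palette(img)
    return random_gamma(img)
```

These steps keep the image size unchanged, so they can be applied before or after the upscaling step in the main pipeline.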


Model Architecture Options

Option A — Pixel-Aware SR Network (Custom ESRGAN)

Start from ESRGAN or Real-ESRGAN, but modify it:

  • Replace perceptual losses (VGG) with CLIP or LPIPS fine-tuned on pixel art.

  • Add a quantization loss that encourages output colors to snap to discrete palette bins.

Option B — Diffusion-based Restorer

Train or fine-tune a diffusion model (like StableSR or Flux Fill) where conditioning image = degraded art, output = clean art.

This gives high fidelity but is heavier to train.

Option C — U-Net with quantization head

Simple U-Net (like Pix2Pix) with:

  • Residual blocks + pixel-shuffle upsampler

  • Output head applies softmax over a limited palette (~256 colors)

  • Loss = L1 + perceptual + palette quantization penalty
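
To make the quantization head concrete, here is a minimal NumPy sketch of the forward pass only, assuming the network emits one logit per palette entry for each pixel. The function name and array layout are hypothetical; a real implementation would live in the training framework so that gradients flow through the softmax.

```python
import numpy as np


def palette_head(logits, palette):
    """Turn per-pixel palette logits into soft and hard RGB outputs.

    logits:  (H, W, K) float array, one score per palette entry.
    palette: (K, 3) float array of RGB colors.
    """
    # Numerically stable softmax over the palette axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Expected color under the softmax: smooth, usable for L1 during training.
    soft_rgb = probs @ palette
    # Most likely palette entry: exact palette colors, usable at inference.
    hard_rgb = palette[probs.argmax(axis=-1)]
    return soft_rgb, hard_rgb
```

The soft output is a blend of palette colors, while the hard output can only contain colors that exist in the palette, which is the property pixel art needs.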


Loss Functions

Combine several:

  • L1/L2 loss on pixels

  • Perceptual loss on low-level features

  • Palette quantization loss

  • Edge alignment loss (Sobel gradient loss)

  • Optionally: adversarial loss for crispness (PatchGAN)
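
As a rough illustration of how three of these terms combine, the sketch below computes L1, a palette quantization penalty, and a Sobel edge term in NumPy on float images in [0, 1]. The weights `w_pal` and `w_edge` and all function names are assumptions; the perceptual and adversarial terms are left out because they need trained networks, and actual training would express the same math in an autodiff framework.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(gray, kernel):
    """Correlate a 2-D array with a 3x3 kernel, using edge padding."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def sobel_magnitude(rgb):
    """Gradient magnitude of the channel-averaged image."""
    gray = rgb.mean(axis=-1)
    return np.hypot(filter3x3(gray, SOBEL_X), filter3x3(gray, SOBEL_Y))


def palette_penalty(pred, palette):
    """Mean squared distance from each pixel to its nearest palette color."""
    diff = pred[:, :, None, :] - palette[None, None, :, :]  # (H, W, K, 3)
    return (diff ** 2).sum(axis=-1).min(axis=-1).mean()


def total_loss(pred, target, palette, w_pal=0.1, w_edge=0.1):
    """Weighted sum of L1, palette penalty, and Sobel edge difference."""
    l1 = np.abs(pred - target).mean()
    edge = np.abs(sobel_magnitude(pred) - sobel_magnitude(target)).mean()
    return l1 + w_pal * palette_penalty(pred, palette) + w_edge * edge
```

A prediction that matches the target and uses only palette colors scores exactly zero; any blur or color drift raises at least one of the three terms.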


Training Pipeline

  1. Build dataset of (degraded, clean) pairs.

  2. Train with PyTorch Lightning or similar.

  3. Validation metrics

    • PSNR / SSIM

    • Palette color error

    • Edge F1 (compare edge maps)
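
The metrics above could be prototyped as below on `(H, W, 3)` uint8 arrays. Two definitions here are assumptions, since the post does not pin them down: "palette color error" is taken to be the fraction of predicted pixels whose color is not in the palette, and an "edge" is any pixel that differs from its right or lower neighbor. SSIM is omitted; a library implementation is a better choice than a hand-rolled one.

```python
import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def palette_color_error(pred, palette):
    """Fraction of predicted pixels whose color is not in the palette."""
    pixels = pred.reshape(-1, 1, 3)
    in_palette = (pixels == palette[None, :, :]).all(axis=-1).any(axis=-1)
    return 1.0 - in_palette.mean()


def edge_map(img):
    """Mark pixels whose color differs from the right or lower neighbor."""
    edges = np.zeros(img.shape[:2], dtype=bool)
    edges[:, :-1] |= (img[:, :-1] != img[:, 1:]).any(axis=-1)
    edges[:-1, :] |= (img[:-1, :] != img[1:, :]).any(axis=-1)
    return edges


def edge_f1(pred, target):
    """F1 score between the edge maps of prediction and target."""
    p, t = edge_map(pred), edge_map(target)
    tp = np.logical_and(p, t).sum()
    if tp == 0:
        # Both maps empty counts as perfect agreement.
        return 1.0 if p.sum() == 0 and t.sum() == 0 else 0.0
    precision, recall = tp / p.sum(), tp / t.sum()
    return 2 * precision * recall / (precision + recall)
```

Exact color comparison is deliberate: for pixel art, an edge that is off by one pixel or a color that is off by one level both count as errors.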


Inference / Post-Processing

At inference:

  • Apply model on degraded frame

  • Optionally re-quantize colors using K-Means or original palette.

  • Snap pixels to nearest grid alignment (sub-pixel rounding)

  • Apply minor sharpening or edge-aware dithering if desired.
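
The re-quantization step might look like the sketch below: a small K-Means to estimate a palette when the original is unknown, and a nearest-color snap that works with either an estimated or an original palette. The function names, `k=8`, and the iteration count are assumptions; the brute-force distance computation is only practical for sprite-sized images.

```python
import numpy as np


def kmeans_palette(img, k=8, iters=10, seed=0):
    """Estimate a k-color palette from an (H, W, 3) uint8 image."""
    rng = np.random.default_rng(seed)
    pixels = img.reshape(-1, 3).astype(np.float64)
    unique = np.unique(pixels, axis=0)
    k = min(k, len(unique))  # cannot have more clusters than distinct colors
    centers = unique[rng.choice(len(unique), size=k, replace=False)]
    for _ in range(iters):
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = pixels[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers


def snap_to_palette(img, palette):
    """Replace every pixel with its nearest palette color."""
    pixels = img.reshape(-1, 1, 3).astype(np.float64)
    dists = ((pixels - palette[None, :, :]) ** 2).sum(axis=-1)
    snapped = palette[dists.argmin(axis=1)]
    return np.rint(snapped).astype(np.uint8).reshape(img.shape)
```

When the original palette is available, passing it straight to `snap_to_palette` avoids the K-Means step and guarantees the output uses only the intended colors.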


Downsampling

The degraded pixel art is often upscaled 2-5x. We want the model to be able to downsample as part of its prediction.
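
For comparison with a learned downsampler, a non-learned baseline is sketched below: it collapses each block to its most frequent color. It assumes the integer scale factor is already known and that the pixel grid starts at the top-left corner, neither of which holds in general for SpriteDX output; the function name is hypothetical.

```python
import numpy as np


def downsample_mode(img, scale):
    """Collapse each scale x scale block to its most frequent color."""
    h, w = img.shape[0] // scale, img.shape[1] // scale
    out = np.zeros((h, w, 3), dtype=img.dtype)
    for y in range(h):
        for x in range(w):
            block = img[y * scale:(y + 1) * scale, x * scale:(x + 1) * scale]
            colors, counts = np.unique(block.reshape(-1, 3), axis=0,
                                       return_counts=True)
            out[y, x] = colors[counts.argmax()]
    return out
```

A majority vote tolerates a few stray pixels per block, but it fails when the grid is shifted or the scale is not an integer, which is where a trained model should do better.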


— Sprited Dev 🌱

SpriteDX

Part 1 of 50

Tracks development of sprite generator AI tool. https://spritedx.com