SpriteDX

Right now, SpriteDX generates pseudo-pixel art that is not perfectly aligned to the pixel grid. For SpriteDX to be a production tool, we need to nail this down.

Problem Definition

I want to learn a mapping:

$$f_{\theta}: \mathbf{X}_{\text{degraded}} \rightarrow \mathbf{X}_{\text{clean}}$$

where X_degraded are low-quality versions (upscaled, blurry, or noisy) and X_clean are the perfectly crisp pixel-art originals.

This is a supervised image-to-image regression task, akin to super-resolution + denoising + deblurring + quantization alignment, but with the discrete color constraints typical of pixel art.
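Written as an objective, the task could look like the following. This is only a sketch: the loss symbol $\mathcal{L}$ is a placeholder (an assumption, not something fixed yet) for whichever combination of loss terms we end up using.

$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{(\mathbf{X}_{\text{degraded}},\, \mathbf{X}_{\text{clean}})} \left[ \mathcal{L}\big(f_{\theta}(\mathbf{X}_{\text{degraded}}),\, \mathbf{X}_{\text{clean}}\big) \right]$$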

Dataset Design

We’ll need paired (degraded, clean) examples.

(a) Start from clean pixel art

Use existing pixel-art datasets:

Lospec dataset, Pixel-Art GAN dataset, or scrape open-licensed game sprites from Itch.io / OpenGameArt.
Ensure diversity in palette and style.

(b) Generate synthetic degradations

Automate your dataset creation using scripted corruption pipelines:

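from PIL import Image, ImageFilter
import numpy as np
import random

def degrade(img):
    """Return a synthetically degraded 4x upscale of a clean pixel-art image."""
    # Palette ("P") and RGBA sprites would break the 3-channel noise shape below
    img = img.convert("RGB")
    # Bicubic upscaling smears the hard pixel edges
    img = img.resize((img.width * 4, img.height * 4), Image.BICUBIC)
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.2, 1.0)))
    # Additive Gaussian noise (sigma = 3) on a float copy, shape (H, W, 3)
    arr = np.array(img, dtype=np.float32) + np.random.normal(0, 3, img.size[::-1] + (3,))
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))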

We can further add:

Slight sub-pixel shifts
Bilinear resampling
Palette diffusion (convert to RGB then reduce palette incorrectly)
Random gamma/exposure changes
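Two of these extra degradations can be sketched in plain NumPy. The function names `subpixel_shift` and `random_gamma` and the gamma range are illustrative assumptions, and the shift is limited to offsets below one pixel.

```python
import numpy as np

def subpixel_shift(arr, dx, dy):
    """Shift an (H, W, C) float image right by dx and down by dy pixels,
    with 0 <= dx, dy < 1, using bilinear blending of neighbouring pixels."""
    # Pad one replicated row/column at the top and left so borders stay defined
    padded = np.pad(arr, ((1, 0), (1, 0), (0, 0)), mode="edge")
    same = padded[1:, 1:]          # original image
    right = padded[1:, :-1]        # image shifted right by one pixel
    down = padded[:-1, 1:]         # image shifted down by one pixel
    diag = padded[:-1, :-1]        # image shifted down and right by one pixel
    return ((1 - dx) * (1 - dy) * same + dx * (1 - dy) * right
            + (1 - dx) * dy * down + dx * dy * diag)

def random_gamma(arr, rng, low=0.8, high=1.25):
    """Apply a random gamma curve to a float image in the 0..255 range.
    The (low, high) range is an assumed default, not a tuned value."""
    gamma = rng.uniform(low, high)
    return 255.0 * (np.clip(arr, 0, 255) / 255.0) ** gamma

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)
degraded = random_gamma(subpixel_shift(clean, 0.3, 0.6), rng)
```

Both functions work on float arrays, so they can be chained before the final clip and cast back to `uint8`.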

Model Architecture Options

Option A — Pixel-Aware SR Network (Custom ESRGAN)

Start from ESRGAN or Real-ESRGAN, but modify it:

Replace perceptual losses (VGG) with CLIP or LPIPS fine-tuned on pixel art.
Add quantization loss that encourages output colors to snap to discrete palette bins.
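One possible form of the quantization loss is the mean squared distance from each predicted pixel to its nearest palette color. The sketch below uses NumPy to show the arithmetic only. The name `palette_quantization_loss` is hypothetical, and an actual training loop would need the same expression written in an autodiff framework.

```python
import numpy as np

def palette_quantization_loss(pred, palette):
    """Mean squared distance from each pixel to its nearest palette color.

    pred:    (H, W, 3) float array of predicted colors
    palette: (K, 3) float array of allowed colors
    """
    flat = pred.reshape(-1, 1, 3)                           # (H*W, 1, 3)
    d2 = ((flat - palette[None, :, :]) ** 2).sum(axis=-1)   # (H*W, K)
    return d2.min(axis=1).mean()

palette = np.array([[0.0, 0.0, 0.0], [255.0, 255.0, 255.0]])
on_palette = np.zeros((4, 4, 3))
off_palette = np.full((4, 4, 3), 10.0)
loss_on = palette_quantization_loss(on_palette, palette)
loss_off = palette_quantization_loss(off_palette, palette)
```

The loss is zero exactly when every pixel already sits on a palette color, which is the "snap to bins" behavior we want to encourage.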

Option B — Diffusion-based Restorer

Train or fine-tune a diffusion model (like StableSR or Flux Fill) where conditioning image = degraded art, output = clean art.

This gives high fidelity but is heavier to train.

Option C — U-Net with quantization head

Simple U-Net (like Pix2Pix) with:

Residual blocks + pixel-shuffle upsampler
Output head applies softmax over limited palette (~256 colors)
Loss = L1 + perceptual + palette quantization penalty
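To make the palette head concrete, here is a minimal NumPy sketch of how per-pixel logits over K palette entries could be decoded. It assumes the head emits an (H, W, K) logit map and that the palette is known, and `decode_palette_logits` is a hypothetical helper name.

```python
import numpy as np

def decode_palette_logits(logits, palette):
    """Decode palette logits into colors.

    logits:  (H, W, K) raw scores, one per palette entry
    palette: (K, 3) palette colors
    Returns (soft, hard): the softmax-weighted expected color, and the
    exact palette color picked by argmax.
    """
    z = logits - logits.max(axis=-1, keepdims=True)   # stabilise the softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    soft = p @ palette                                # (H, W, 3) expected color
    hard = palette[logits.argmax(axis=-1)]            # (H, W, 3) snapped color
    return soft, hard

palette = np.array([[0.0, 0.0, 0.0], [255.0, 0.0, 0.0], [0.0, 255.0, 0.0]])
logits = np.zeros((2, 2, 3))
logits[..., 1] = 50.0                                 # strongly prefer entry 1
soft, hard = decode_palette_logits(logits, palette)
```

The soft output is the kind of quantity a pixel loss would be computed on during training, while the hard output is what we would keep at inference so every pixel is an exact palette color.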

Loss Functions

Combine several:

L1/L2 loss on pixels
Perceptual loss on low-level features
Palette quantization loss
Edge alignment loss (sobel gradient loss)
Optionally: adversarial loss for crispness (Patch GAN)
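As an illustration of the edge alignment term, the sketch below compares Sobel gradient magnitudes of the prediction and the target. It is written in NumPy for grayscale inputs. The L1 comparison and the edge padding are assumptions on our part, and `sobel_magnitude` and `edge_loss` are illustrative names.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude of an (H, W) float image using 3x3 Sobel kernels."""
    p = np.pad(gray, 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_loss(pred_gray, target_gray):
    """Mean absolute difference between the two Sobel edge maps."""
    return np.abs(sobel_magnitude(pred_gray) - sobel_magnitude(target_gray)).mean()

target = np.zeros((4, 4))
target[:, 2:] = 1.0               # crisp vertical step edge
blurry = np.full((4, 4), 0.5)     # edge completely washed out
loss = edge_loss(blurry, target)
```

A prediction that loses the edge is penalized, while a prediction identical to the target scores zero.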

Training Pipeline

Build dataset of (degraded, clean) pairs.
Train with PyTorch Lightning or similar.
Validation metrics:
- PSNR / SSIM
- Palette color error
- Edge F1 (compare edge maps)
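Two of these metrics are easy to sketch in NumPy. PSNR follows its standard definition. "Palette color error" is not pinned down above, so the version here (the fraction of pixels whose color is not an exact palette entry) is just one assumed interpretation, under the hypothetical name `off_palette_fraction`.

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def off_palette_fraction(pred, palette):
    """Fraction of pixels in an (H, W, 3) image that match no palette entry."""
    flat = pred.reshape(-1, 1, 3)                              # (H*W, 1, 3)
    on_palette = (flat == palette[None, :, :]).all(axis=-1).any(axis=1)
    return 1.0 - on_palette.mean()

palette = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)
target = np.zeros((2, 2, 3), dtype=np.uint8)
pred = target.copy()
pred[0, 0] = [12, 0, 0]            # one pixel drifts off the palette
score = psnr(pred, target)
error = off_palette_fraction(pred, palette)
```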

Inference / Post-Processing

At inference:

Apply model on degraded frame
Optionally re-quantize colors using K-Means or original palette.
Snap pixels to nearest grid alignment (sub-pixel rounding)
Apply minor sharpening or edge-aware dithering if desired.
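When the original palette is known, the re-quantization step can be as simple as a nearest-color lookup. A minimal sketch, assuming RGB Euclidean distance is good enough (a perceptual color space might do better), with `snap_to_palette` as an illustrative name:

```python
import numpy as np

def snap_to_palette(img, palette):
    """Replace every pixel of an (H, W, 3) image with its nearest palette color.

    palette: (K, 3) array; distances are squared Euclidean distances in RGB.
    """
    flat = img.reshape(-1, 1, 3).astype(np.float64)              # (H*W, 1, 3)
    d2 = ((flat - palette[None, :, :].astype(np.float64)) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)                                  # (H*W,)
    return palette[nearest].reshape(img.shape)

palette = np.array([[0, 0, 0], [255, 255, 255], [200, 30, 30]], dtype=np.uint8)
restored = np.array([[[3, 2, 1], [250, 251, 255]],
                     [[198, 33, 28], [10, 0, 5]]], dtype=np.uint8)
snapped = snap_to_palette(restored, palette)
```

If the palette is not known, K-Means on the model output would have to supply it first, as noted in the list above.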

Downsampling

The degraded pixel art is often upscaled 2-5x. We want the model to downsample as part of its prediction, so the output lands at the native pixel resolution.
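For comparison, here is a non-learned baseline: if the integer scale factor is known and the grid is aligned, each block can be collapsed to its most frequent color. This is only a reference point under those assumptions, not the learned downsampling we are after, and `downsample_mode` is a hypothetical name.

```python
import numpy as np

def downsample_mode(img, scale):
    """Downsample an (H, W, C) image by an integer factor, keeping the most
    frequent color in each scale x scale block."""
    h, w, c = img.shape
    h2, w2 = h // scale, w // scale
    # Group pixels into blocks: (h2, w2, scale*scale, C)
    blocks = (img[:h2 * scale, :w2 * scale]
              .reshape(h2, scale, w2, scale, c)
              .transpose(0, 2, 1, 3, 4)
              .reshape(h2, w2, scale * scale, c))
    out = np.empty((h2, w2, c), dtype=img.dtype)
    for i in range(h2):
        for j in range(w2):
            colors, counts = np.unique(blocks[i, j], axis=0, return_counts=True)
            out[i, j] = colors[counts.argmax()]
    return out

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
upscaled = np.repeat(np.repeat(clean, 3, axis=0), 3, axis=1)   # 3x nearest-neighbour
upscaled[0, 0] = clean[0, 0] + 1                               # corrupt a single pixel
recovered = downsample_mode(upscaled, 3)
```

The baseline breaks down once the grid is misaligned or the scale is unknown or non-integer, which is exactly where a learned model should help.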

— Sprited Dev 🌱
