SpriteDX - Pixel Restoration Model

Right now, SpriteDX generates pseudo pixel art that is not perfectly aligned to the pixel grid. For SpriteDX to be a production tool, we need to nail this down.
Problem Definition
I want to learn a mapping:
$$f_{\theta}: \mathbf{X}_{\text{degraded}} \rightarrow \mathbf{X}_{\text{clean}}$$
where $\mathbf{X}_{\text{degraded}}$ are low-quality upscaled, blurry, or noisy versions, and $\mathbf{X}_{\text{clean}}$ are the perfectly crisp pixel-art originals.
This is a supervised image-to-image regression task, akin to super-resolution + denoising + deblurring + quantization alignment, but with the discrete color constraints typical of pixel art.
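Written as an objective, this would look roughly like the following, where $\mathcal{L}$ stands for whichever combination of losses from the Loss Functions section ends up being used (the symbol $\mathcal{L}$ and the expectation over training pairs are my notation, not something fixed yet):
$$\theta^{*} = \arg\min_{\theta} \; \mathbb{E}_{(\mathbf{x}_{\text{degraded}},\, \mathbf{x}_{\text{clean}})} \left[ \mathcal{L}\big(f_{\theta}(\mathbf{x}_{\text{degraded}}),\ \mathbf{x}_{\text{clean}}\big) \right]$$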
Dataset Design
We’ll need paired (degraded, clean) examples.
(a) Start from clean pixel art
Use existing pixel-art datasets:
Lospec dataset, Pixel-Art GAN dataset, or scrape open-licensed game sprites from Itch.io / OpenGameArt.
Ensure diversity in palette and style.
(b) Generate synthetic degradations
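Automate dataset creation with a scripted corruption pipeline:
from PIL import Image, ImageFilter
import numpy as np
import random

def degrade(img):
    # Work in RGB so the noise array below always has three channels
    # (palette-mode or RGBA sprites would otherwise break the addition).
    img = img.convert("RGB")
    # Bicubic 4x upscaling smears the hard pixel edges.
    img = img.resize((img.width * 4, img.height * 4), Image.BICUBIC)
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.2, 1.0)))
    # Add Gaussian noise (sigma = 3) in float, then clip back to 8-bit.
    arr = np.asarray(img, dtype=np.float32) + np.random.normal(0, 3, (img.height, img.width, 3))
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))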
We can further add:
Slight sub-pixel shifts
Bilinear resampling
Palette diffusion (convert to RGB then reduce palette incorrectly)
Random gamma/exposure changes
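A rough sketch of three of these extra degradations (bilinear resampling, sub-pixel shift, gamma jitter), assuming Pillow and NumPy. The function name `extra_degrade`, the shift range of half a source pixel, and the gamma range 0.8 to 1.25 are placeholder choices, not tuned values:

```python
import random
import numpy as np
from PIL import Image

def extra_degrade(img, scale=4, rng=None):
    """Upscale with bilinear resampling, shift by a sub-pixel offset, jitter gamma."""
    rng = rng or random.Random()
    img = img.convert("RGB")
    w, h = img.width * scale, img.height * scale
    # Bilinear resampling blurs edges differently from bicubic.
    img = img.resize((w, h), Image.BILINEAR)
    # Sub-pixel shift: up to half a source pixel, expressed in output pixels.
    dx = rng.uniform(-0.5, 0.5) * scale
    dy = rng.uniform(-0.5, 0.5) * scale
    img = img.transform((w, h), Image.AFFINE, (1, 0, dx, 0, 1, dy),
                        resample=Image.BILINEAR)
    # Random gamma change applied to intensities in [0, 1].
    gamma = rng.uniform(0.8, 1.25)
    arr = (np.asarray(img, dtype=np.float32) / 255.0) ** gamma
    return Image.fromarray((arr * 255.0 + 0.5).astype(np.uint8))
```

Passing a seeded `random.Random` makes the corruption reproducible, which helps when regenerating a validation set.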
Model Architecture Options
Option A — Pixel-Aware SR Network (Custom ESRGAN)
Start from ESRGAN or Real-ESRGAN, but modify it:
Replace perceptual losses (VGG) with CLIP or LPIPS fine-tuned on pixel art.
Add a quantization loss that encourages output colors to snap to discrete palette bins.
Option B — Diffusion-based Restorer
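One possible form of the quantization loss, sketched in NumPy to show the arithmetic: the mean squared distance from each predicted pixel to its nearest palette color. This assumes a known palette per image and the name `palette_quantization_loss` is made up here. A training version would express the same operations on tensors.

```python
import numpy as np

def palette_quantization_loss(pred, palette):
    """Mean squared distance from each pixel to its nearest palette color.

    pred:    (H, W, 3) float array in [0, 1]
    palette: (K, 3) float array in [0, 1]
    """
    diff = pred[:, :, None, :] - palette[None, None, :, :]  # (H, W, K, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                     # (H, W, K)
    return float(np.mean(np.min(sq_dist, axis=-1)))

palette = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
on_palette = np.zeros((2, 2, 3))        # pure black: loss is 0
mid_gray = np.full((2, 2, 3), 0.5)      # equally far from both colors
print(palette_quantization_loss(on_palette, palette))  # 0.0
print(palette_quantization_loss(mid_gray, palette))    # 0.75
```

The loss is zero only when every pixel sits exactly on a palette color, so it penalizes the in-between shades that blur and resampling introduce.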
Train or fine-tune a diffusion model (like StableSR or Flux Fill) where conditioning image = degraded art, output = clean art.
This gives high fidelity but is heavier to train.
Option C — U-Net with quantization head
Simple U-Net (like Pix2Pix) with:
Residual blocks + pixel-shuffle upsampler
Output head applies softmax over limited palette (~256 colors)
Loss = L1 + perceptual + palette quantization penalty
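To make the palette head concrete, here is a NumPy sketch of what "softmax over a limited palette" could mean at the output. It assumes the network emits K logits per pixel and that the palette is given; `palette_head` and its two return values are illustrative names:

```python
import numpy as np

def palette_head(logits, palette):
    """Turn per-pixel palette logits into colors.

    logits:  (H, W, K) scores, one per palette entry
    palette: (K, 3) colors
    Returns (soft, hard): the probability-weighted color (usable in a
    differentiable loss) and the argmax color (used at inference).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)
    soft = probs @ palette                    # (H, W, 3) expected color
    hard = palette[probs.argmax(axis=-1)]     # (H, W, 3) exact palette color
    return soft, hard
```

The `hard` output can only contain palette colors, which is the property a plain RGB regression head does not guarantee.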
Loss Functions
Combine several:
L1/L2 loss on pixels
Perceptual loss on low-level features
Palette quantization loss
Edge alignment loss (Sobel gradient loss)
Optionally: adversarial loss for crispness (PatchGAN)
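As an illustration of the edge alignment term, a minimal NumPy sketch that compares Sobel gradient magnitudes of the prediction and the target. It works on single-channel images, and the L1 comparison of magnitudes is one choice among several; the helper names are hypothetical:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def filter3x3(gray, kernel):
    """3x3 cross-correlation with edge padding: (H, W) in, (H, W) out."""
    gray = np.asarray(gray, dtype=np.float32)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(gray):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_loss(pred_gray, target_gray):
    """Mean absolute difference between the two Sobel magnitude maps."""
    return float(np.mean(np.abs(sobel_magnitude(pred_gray)
                                - sobel_magnitude(target_gray))))
```

A blurred edge spreads its gradient over several pixels with a lower peak, so its magnitude map differs from the crisp target even where the L1 pixel loss is small.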
Training Pipeline
Build dataset of (degraded, clean) pairs.
Train with PyTorch Lightning or similar.
Validation metrics
PSNR / SSIM
Palette color error
Edge F1 (compare edge maps)
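A sketch of how these metrics might be computed with NumPy. Two assumptions: "palette color error" is interpreted here as the fraction of output pixels whose color is not in the reference palette, and the edge maps passed to `edge_f1` are boolean arrays produced elsewhere (for example by thresholding a gradient magnitude). SSIM is left out because it is better taken from an existing library:

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

def off_palette_fraction(pred, palette):
    """Fraction of pixels in pred (H, W, 3) whose color is not in palette (K, 3)."""
    matches = (pred[:, :, None, :] == palette[None, None, :, :]).all(axis=-1)
    return float(1.0 - matches.any(axis=-1).mean())

def edge_f1(pred_edges, target_edges):
    """F1 score between two boolean edge maps of the same shape."""
    tp = np.logical_and(pred_edges, target_edges).sum()
    fp = np.logical_and(pred_edges, ~target_edges).sum()
    fn = np.logical_and(~pred_edges, target_edges).sum()
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else float(2 * tp / denom)
```

PSNR and SSIM reward outputs that are close on average, while the palette and edge metrics check the two properties that matter specifically for pixel art.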
Inference / Post-Processing
At inference:
Apply model on degraded frame
Optionally re-quantize colors using K-Means or the original palette.
Snap pixels to nearest grid alignment (sub-pixel rounding)
Apply minor sharpening or edge-aware dithering if desired.
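For the re-quantization step, a minimal sketch of the "original palette" route: map every output pixel to its nearest palette color by squared RGB distance. This assumes the palette is known; with K-Means the palette would first be estimated from the output itself. `snap_to_palette` is an illustrative name:

```python
import numpy as np

def snap_to_palette(img, palette):
    """Replace each pixel with its nearest palette color.

    img:     (H, W, 3) uint8
    palette: (K, 3) uint8
    """
    # Use a wider integer type so the subtraction cannot wrap around.
    diff = img[:, :, None, :].astype(np.int32) - palette[None, None, :, :].astype(np.int32)
    idx = np.sum(diff ** 2, axis=-1).argmin(axis=-1)  # (H, W) palette indices
    return palette[idx]
```

Nearest-color matching in RGB is the simplest option. A perceptual color space may give better matches for palettes with many similar shades, but that is untested here.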
Downsampling
Degraded pixel art is often upscaled 2-5x relative to its native resolution. We want the model to downsample back to the native pixel grid as part of its prediction, rather than output at the upscaled size.
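A non-learned baseline is useful for comparison here. The sketch below assumes the integer scale factor is already known and the grid is aligned to the image origin (neither holds in general, which is part of why a model is needed). It takes the most frequent color in each block:

```python
import numpy as np

def block_mode_downsample(img, factor):
    """Downsample (H, W, 3) uint8 by an integer factor using the per-block mode."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    # Pack RGB into one integer per pixel so colors can be counted.
    packed = ((img[..., 0].astype(np.int64) << 16)
              | (img[..., 1].astype(np.int64) << 8)
              | img[..., 2].astype(np.int64))
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            block = packed[y * factor:(y + 1) * factor, x * factor:(x + 1) * factor]
            values, counts = np.unique(block, return_counts=True)
            mode = int(values[counts.argmax()])
            out[y, x] = ((mode >> 16) & 255, (mode >> 8) & 255, mode & 255)
    return out
```

This recovers the original exactly when each block is dominated by a single color, and fails on blurred inputs where no color repeats within a block.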
— Sprited Dev 🌱




