# SpriteDX - Pixel Restoration Model

Right now, SpriteDX generates pseudo pixel art that lacks perfect pixel alignment. For SpriteDX to be a production tool, we need to nail this down.

## Problem Definition

I want to learn a mapping:

$$f_{\theta}: \mathbf{X}_{\text{degraded}} \rightarrow \mathbf{X}_{\text{clean}}$$

where $\mathbf{X}_{\text{degraded}}$ are low-quality (upscaled, blurry, or noisy) versions of sprites, and $\mathbf{X}_{\text{clean}}$ are the perfectly crisp pixel-art originals.

This is a supervised image-to-image regression task, akin to super-resolution + denoising + deblurring + quantization alignment, but with the discrete color constraints typical of pixel art.

---

## Dataset Design

We’ll need paired (degraded, clean) examples.

### (a) Start from clean pixel art

Use existing pixel-art datasets:

* Lospec dataset, Pixel-Art GAN dataset, or scrape open-licensed game sprites from Itch.io / OpenGameArt.
    
* Ensure diversity in palette and style.
    

### (b) Generate synthetic degradations

Automate your dataset creation using scripted corruption pipelines:

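```python
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, scale: int = 4) -> Image.Image:
    """Turn a clean sprite into a blurry, noisy, smoothly upscaled copy."""
    # Convert first: palette ("P") or RGBA sprites would otherwise not match
    # the 3-channel noise below. Note that this drops transparency.
    img = img.convert("RGB")
    # Bicubic (not nearest-neighbor) upscaling smears the hard pixel edges.
    img = img.resize((img.width * scale, img.height * scale), Image.Resampling.BICUBIC)
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0.2, 1.0)))
    # Additive Gaussian noise with sigma = 3 intensity levels, applied in float.
    arr = np.asarray(img, dtype=np.float64)
    arr = arr + np.random.normal(0, 3, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```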

We can further add:

* Slight sub-pixel shifts
    
* Bilinear resampling
    
* Palette diffusion (convert to RGB, then re-quantize to a palette that does not match the original)
    
* Random gamma/exposure changes
    
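The additional degradations above can be sketched in NumPy, operating on HxWxC float arrays in [0, 255]. This is a minimal sketch under assumptions: the function names, the gamma range, and the use of uniform per-channel posterization as a stand-in for an "incorrect" palette reduction are illustrative choices, not a fixed part of the pipeline. Bilinear resampling needs no new code: pass `Image.Resampling.BILINEAR` to the `resize` call in `degrade`.

```python
import numpy as np


def subpixel_shift(arr: np.ndarray, dx: float, dy: float) -> np.ndarray:
    """Translate an HxWxC image by a fractional (dx, dy) with bilinear interpolation.

    Each output pixel samples the input at (y - dy, x - dx); coordinates that
    fall outside the image are clamped to the border.
    """
    h, w = arr.shape[:2]
    ys = np.clip(np.arange(h) - dy, 0, h - 1)
    xs = np.clip(np.arange(w) - dx, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    top = arr[y0][:, x0] * (1 - wx) + arr[y0][:, x1] * wx
    bottom = arr[y1][:, x0] * (1 - wx) + arr[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def random_gamma(arr: np.ndarray, rng: np.random.Generator,
                 low: float = 0.8, high: float = 1.25) -> np.ndarray:
    """Apply a random gamma curve; the [low, high] range is an assumed default."""
    gamma = rng.uniform(low, high)
    return 255.0 * (np.clip(arr, 0, 255) / 255.0) ** gamma


def wrong_palette(arr: np.ndarray, levels: int = 6) -> np.ndarray:
    """Posterize each channel to `levels` evenly spaced values.

    A stand-in for an incorrect palette reduction: the resulting colors
    generally do not match the sprite's original palette.
    """
    step = 255.0 / (levels - 1)
    return np.round(np.clip(arr, 0, 255) / step) * step
```

One option is to chain these on a random subset of samples, e.g. `wrong_palette(random_gamma(subpixel_shift(x, 0.3, -0.2), rng))`, so the model sees varied combinations rather than one fixed corruption.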

---

## Model Architecture Options

### Option A — Pixel-Aware SR Network (Custom ESRGAN)

Start from ESRGAN or Real-ESRGAN, but modify it:

* Replace the VGG perceptual loss with a CLIP- or LPIPS-based loss fine-tuned on pixel art.
    
* Add a quantization loss that encourages output colors to snap to discrete palette bins.
    
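One way to write the quantization loss is to penalize each output pixel's squared distance to its nearest palette color, so the loss is zero exactly when every pixel is a palette color. A minimal NumPy sketch, assuming the palette is known per sprite as a Kx3 array; the function names are illustrative, and a training version would apply the same math to framework tensors so gradients reach the network.

```python
import numpy as np


def palette_distances(img: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Squared RGB distance from every pixel to every palette color: shape (H, W, K)."""
    diff = img[..., None, :] - palette[None, None, :, :]  # (H, W, K, 3)
    return np.sum(diff ** 2, axis=-1)


def quantization_loss(img: np.ndarray, palette: np.ndarray) -> float:
    """Mean squared distance from each pixel to its nearest palette color.

    Zero if and only if every pixel already equals some palette color.
    """
    return float(np.mean(np.min(palette_distances(img, palette), axis=-1)))
```

Because of the `min`, only the nearest palette color receives gradient for each pixel, which pulls colors toward the closest bin rather than toward the palette average.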

### Option B — Diffusion-based Restorer

Train or fine-tune a diffusion model (e.g., StableSR or Flux Fill) conditioned on the degraded art to produce the clean art.

This gives high fidelity but is heavier to train.

### Option C — U-Net with quantization head

A simple U-Net (like the Pix2Pix generator) with:

* Residual blocks + pixel-shuffle upsampler
    
* An output head that applies a softmax over a limited palette (~256 colors)
    
* Loss = L1 + perceptual + palette quantization penalty
    
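The palette head can be sketched as a per-pixel softmax over K palette entries. During training, the expected color (probabilities times palette) keeps the output differentiable for the L1 term; at inference, taking the argmax guarantees every output pixel is an exact palette color. This NumPy sketch assumes the network emits logits of shape (H, W, K) and that the palette is a Kx3 array; the function names are illustrative.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = logits - np.max(logits, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def palette_head(logits: np.ndarray, palette: np.ndarray):
    """Map (H, W, K) logits to colors using a (K, 3) palette.

    Returns (soft_rgb, hard_rgb, indices):
      soft_rgb: probability-weighted mix of palette colors (training path)
      hard_rgb: the most likely palette color per pixel (inference path)
      indices:  (H, W) palette indices, i.e. an indexed-color sprite
    """
    probs = softmax(logits, axis=-1)
    soft_rgb = probs @ palette  # (H, W, K) @ (K, 3) -> (H, W, 3)
    indices = np.argmax(logits, axis=-1)
    hard_rgb = palette[indices]
    return soft_rgb, hard_rgb, indices
```

A side benefit of this design is that `indices` is directly an indexed-color image, which matches how many sprite formats store pixel art.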

---

## Loss Functions

Combine several:

* L1/L2 loss on pixels
    
* Perceptual loss on low-level features
    
* Palette quantization loss
    
* Edge alignment loss (Sobel gradient loss)
    
* Optionally: adversarial loss for crispness (PatchGAN)
    
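The pixel and edge-alignment terms can be sketched in NumPy for grayscale HxW images. This is illustrative rather than a training implementation: in PyTorch or similar, the Sobel filter would be a fixed-weight convolution so gradients flow back to the model. The name `restoration_loss` and the weight `w_edge` are hypothetical; the perceptual, palette, and adversarial terms would be added to the sum with their own weights.

```python
import numpy as np

# Sobel kernels applied as cross-correlation: gx responds to left-to-right changes.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation of an HxW image with edge padding (same output shape)."""
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_gradients(img: np.ndarray):
    """Horizontal and vertical Sobel responses."""
    return filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)


def restoration_loss(pred: np.ndarray, target: np.ndarray, w_edge: float = 0.5) -> float:
    """L1 pixel loss plus an L1 penalty on Sobel gradient differences."""
    l1 = np.mean(np.abs(pred - target))
    pgx, pgy = sobel_gradients(pred)
    tgx, tgy = sobel_gradients(target)
    edge = np.mean(np.abs(pgx - tgx) + np.abs(pgy - tgy))
    return float(l1 + w_edge * edge)
```

Comparing gx and gy separately (rather than only gradient magnitude) also penalizes edges that have the right strength but the wrong orientation.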

---

## Training Pipeline

1. Build dataset of (degraded, clean) pairs.
    
2. Train with PyTorch Lightning or similar.
    
3. Track validation metrics:
    
    * PSNR / SSIM
        
    * Palette color error
        
    * Edge F1 (compare edge maps)
        
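The validation metrics can be sketched in NumPy as below, assuming HxWx3 images on a 0 to 255 scale and a Kx3 palette. The definitions of "palette color error" (distance to the nearest palette color) and of the edge map (a pixel whose color differs from its right or lower neighbor, which suits palette-exact outputs) are assumptions, not settled choices; function names are illustrative. For SSIM, scikit-image provides `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(max_val ** 2 / mse))


def palette_color_error(pred: np.ndarray, palette: np.ndarray):
    """Return (mean RGB distance to the nearest palette color,
    fraction of pixels that are not exact palette colors)."""
    d = np.sqrt(((pred[..., None, :] - palette) ** 2).sum(-1)).min(-1)
    return float(d.mean()), float(np.mean(d > 0))


def edge_map(img: np.ndarray) -> np.ndarray:
    """Boolean map of pixels whose color differs from the right or lower neighbor."""
    e = np.zeros(img.shape[:2], dtype=bool)
    e[:, :-1] |= np.any(img[:, :-1] != img[:, 1:], axis=-1)
    e[:-1, :] |= np.any(img[:-1] != img[1:], axis=-1)
    return e


def edge_f1(pred_edges: np.ndarray, true_edges: np.ndarray) -> float:
    """F1 score between two boolean edge maps (exact pixel matching, no tolerance)."""
    tp = np.sum(pred_edges & true_edges)
    fp = np.sum(pred_edges & ~true_edges)
    fn = np.sum(~pred_edges & true_edges)
    if tp == 0:
        return 1.0 if fp == 0 and fn == 0 else 0.0
    return float(2 * tp / (2 * tp + fp + fn))
```

Exact-match edge F1 is strict; a one-pixel misalignment counts as both a false positive and a false negative, which is arguably the behavior wanted for grid-alignment errors.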

---

## Inference / Post-Processing

At inference:

* Run the model on the degraded frame.
    
* Optionally, re-quantize colors to the original palette or to a K-means palette.
    
* Snap pixels to the nearest grid alignment (sub-pixel rounding).
    
* Apply minor sharpening or edge-aware dithering if desired.
    
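Palette re-quantization can be sketched as nearest-color snapping, with an optional k-means step when the original palette is unavailable. This is a minimal NumPy sketch; the function names, the plain Lloyd iteration, and initializing centers from distinct image colors are illustrative assumptions. scikit-learn's `KMeans` would be a reasonable drop-in for the palette estimate.

```python
import numpy as np


def snap_to_palette(img: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Replace every pixel with its nearest palette color (squared RGB distance)."""
    d = ((img[..., None, :].astype(np.float64) - palette[None, None]) ** 2).sum(-1)
    return palette[np.argmin(d, axis=-1)]


def kmeans_palette(img: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Estimate a k-color palette with plain Lloyd's k-means on the image's pixels."""
    pixels = img.reshape(-1, img.shape[-1]).astype(np.float64)
    colors = np.unique(pixels, axis=0)
    if len(colors) <= k:
        return colors  # the image already uses at most k distinct colors
    rng = np.random.default_rng(seed)
    # Initialize from distinct colors so no two starting centers coincide.
    centers = colors[rng.choice(len(colors), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((pixels[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    return centers
```

Usage, when the source palette is unknown: `snap_to_palette(output, kmeans_palette(output, k=16))`, where `k=16` is only an example value.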

---

## Downsampling

Degraded pixel art is often upscaled 2x to 5x. We want the model to also downsample back to the native pixel resolution while making its predictions.
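
As a non-learned baseline for this step, one could estimate the upscale factor by trying each candidate, collapsing blocks to single pixels, and checking how well a nearest-neighbor re-upscale reproduces the input. This sketch assumes the factor is an integer in the 2x to 5x range and that the pixel grid starts at the top-left corner (no sub-pixel offset); the function names and the `tol` threshold are hypothetical. Preferring the largest scale within tolerance matters because any divisor of the true factor (e.g. 2 for a 4x image) also reconstructs perfectly.

```python
import numpy as np


def block_downsample(img: np.ndarray, s: int) -> np.ndarray:
    """Collapse each s x s block of an HxWxC image to one pixel (per-channel median).

    Rows and columns that do not fill a whole block are cropped.
    """
    h, w, c = img.shape
    h, w = h // s * s, w // s * s
    blocks = img[:h, :w].reshape(h // s, s, w // s, s, c)
    return np.median(blocks, axis=(1, 3))


def reconstruction_error(img: np.ndarray, s: int) -> float:
    """Mean absolute error between img and a nearest-neighbor re-upscale of its downsample."""
    small = block_downsample(img, s)
    up = np.repeat(np.repeat(small, s, axis=0), s, axis=1)
    h, w = up.shape[:2]
    return float(np.mean(np.abs(img[:h, :w].astype(np.float64) - up)))


def estimate_scale(img: np.ndarray, candidates=(2, 3, 4, 5), tol: float = 1.0) -> int:
    """Largest candidate whose reconstruction error is within `tol` of the best one."""
    errors = {s: reconstruction_error(img, s) for s in candidates}
    best = min(errors.values())
    return max(s for s, e in errors.items() if e <= best + tol)
```

A learned model could use such an estimate as an extra input or as a sanity check on its own output resolution.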

---

— Sprited Dev 🌱