SpriteDX - Pixel Alignment - Lab Note 4


In a previous post, we proposed several approaches to solving the white color corruption issue. One of the more obvious solutions was to use system magenta as the background color.
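A background color that the character is unlikely to contain can be removed by exact color matching. That is hard to do with white when the character itself wears white. The sketch below shows one way a magenta background could be turned into transparency. It is not necessarily how SpriteDX does it: `magenta_to_alpha` and its `tolerance` parameter are hypothetical names, and it assumes frames are H × W × 3 `uint8` RGB NumPy arrays.

```python
import numpy as np

MAGENTA = np.array([255, 0, 255], dtype=np.int16)  # system magenta, #FF00FF

def magenta_to_alpha(rgb: np.ndarray, tolerance: int = 0) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB frame into H x W x 4 RGBA (hypothetical helper).

    A pixel becomes fully transparent when every channel is within `tolerance`
    of #FF00FF; all other pixels stay fully opaque.
    """
    diff = np.abs(rgb.astype(np.int16) - MAGENTA)        # per-channel distance
    is_background = np.all(diff <= tolerance, axis=-1)    # H x W boolean mask
    alpha = np.where(is_background, 0, 255).astype(np.uint8)
    return np.dstack([rgb, alpha])

frame = np.array([[[255, 0, 255], [255, 255, 255]],
                  [[250, 5, 250], [200, 0, 0]]], dtype=np.uint8)
print(magenta_to_alpha(frame)[..., 3])                # only exact magenta is transparent
print(magenta_to_alpha(frame, tolerance=10)[..., 3])  # near-magenta is keyed out too
```

With `tolerance=0` only exact #FF00FF pixels become transparent. Raising it absorbs slight background color shifts, at the cost of possibly keying out purple-ish character colors.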

Background Update

First, we updated the template’s background color from white to magenta:

Sample Generation Test

Prompt: girl wearing pretty white dress

The generated character correctly uses a magenta background, which is a good initial sign.

Next, we verified that Stage 2 (Animation Generation) still works correctly after this change. To do so, I updated the background setting in the prompt from white to magenta:

<shot style="background: magenta;" … >
  <character ... />
</shot>
<shot style="background: magenta;" … >
  <character ... />
</shot>
<shot style="background: magenta;" … >
  <character ... />
</shot>

Result:

There is some background color shift across frames. This could likely be improved with additional constraints or post-processing, but for now, we'll ignore it and proceed.
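Before choosing a post-processing fix, the shift can at least be measured. A minimal sketch, assuming each frame is an H × W × 3 `uint8` array whose outer border contains only background (the character never touches the edges): take the median border color as the frame's background estimate and report its distance from #FF00FF. `border_color` and `background_drift` are hypothetical helpers, not part of SpriteDX.

```python
import numpy as np

MAGENTA_RGB = np.array([255.0, 0.0, 255.0])  # system magenta, #FF00FF

def border_color(frame: np.ndarray, width: int = 4) -> np.ndarray:
    """Median RGB over the outer `width` pixels, used as the background estimate.

    Assumes the character does not touch the frame edges.
    """
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[:width, :] = True
    mask[-width:, :] = True
    mask[:, :width] = True
    mask[:, -width:] = True
    return np.median(frame[mask].astype(np.float64), axis=0)

def background_drift(frames, width: int = 4) -> list:
    """Euclidean RGB distance of each frame's background estimate from #FF00FF."""
    return [float(np.linalg.norm(border_color(f, width) - MAGENTA_RGB)) for f in frames]

f0 = np.zeros((16, 16, 3), dtype=np.uint8)
f0[:] = (255, 0, 255)
f0[6:10, 6:10] = 255                 # a white "character" in the middle
f1 = np.zeros((16, 16, 3), dtype=np.uint8)
f1[:] = (245, 10, 250)               # a slightly shifted background
print(background_drift([f0, f1]))    # [0.0, 15.0]
```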

Anti-Corruption Model Test

We then ran the previously trained anti-corruption model on every animation frame.

Original (640×640):

Resampled to 128×128 (bilinear):

Anti-Corruption Result (128×128):

Let's zoom in 4× to inspect the result more closely:

Additional tests using a character with a more purple-toned dress yielded similarly promising results.
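Zooming pixel art for inspection is usually done with nearest-neighbor upscaling, so that each pixel becomes a crisp block and no new colors appear. The post does not say which tool produced the 4× views; this NumPy sketch, with a hypothetical `zoom_nearest` helper, only illustrates the idea by turning a 128×128 frame into 512×512.

```python
import numpy as np

def zoom_nearest(img: np.ndarray, factor: int = 4) -> np.ndarray:
    """Upscale an H x W x C image by an integer factor via pixel duplication.

    Each source pixel becomes a factor x factor block, so pixel edges stay
    sharp and the output contains exactly the same colors as the input.
    """
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

frame = np.zeros((128, 128, 4), dtype=np.uint8)  # e.g. a 128x128 RGBA frame
print(zoom_nearest(frame).shape)                 # (512, 512, 4)
```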

What Worked Well

  • Every Frame is Pixel Art: I think the main strength is that every frame convincingly looks like pixel art.

  • Correct Whites: Each frame captured the whites correctly.

  • Character is Consistent: The character's appearance stayed consistent across frames.

  • Detail Preserving: No noticeable detail was lost.

  • Outline Stroke Recovery: The outlines were properly recovered.

  • Improved Latency: Inference takes just 4 seconds, as opposed to running 3 rounds of the Flux.1 Kontext model, which tends to take 1 to 2 minutes or more.

  • Cost Reduction: Since we no longer use Flux.1 Kontext, we save about 30 cents per run (3 sprite sheets). That is a large saving, given that Stage 1 and Stage 2 together cost only around 22 cents: the 30 cents saved is 57.7% of the previous total cost.
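The latency and cost figures above reduce to simple arithmetic. This sketch just recomputes them from the numbers quoted in the bullets (per run of 3 sprite sheets), reading "1-2 minutes" as 60 to 120 seconds and assuming both timings cover the same set of frames. It uses integer cents so the totals are exact.

```python
# Numbers quoted in the bullets above, per run of 3 sprite sheets.
KONTEXT_CENTS = 30       # three Flux.1 Kontext rounds, now skipped
STAGE_1_2_CENTS = 22     # Stage 1 + Stage 2, still required

old_total = STAGE_1_2_CENTS + KONTEXT_CENTS       # 52 cents
new_total = STAGE_1_2_CENTS                       # 22 cents
savings_pct = 100 * KONTEXT_CENTS / old_total     # share of the old cost removed
print(f"{old_total}c -> {new_total}c, saving {savings_pct:.1f}%")  # 52c -> 22c, saving 57.7%

# Latency: ~4 s for the anti-corruption model vs. ~60-120 s for Kontext.
ANTI_CORRUPTION_S = 4
speedups = [kontext_s / ANTI_CORRUPTION_S for kontext_s in (60, 120)]
print(f"speedup: {speedups[0]:.0f}x to {speedups[1]:.0f}x")       # speedup: 15x to 30x
```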

Areas That Need Improvement

  • Magenta Tint: A magenta tint is visible along the contours. This is probably an area for further improvement.

  • Bobbing Shoes: If you look carefully at the waving scene, the character's feet bob slightly. This artifact is also present in the original animation sequence, but it is accentuated in the pixel art versions.

  • Sub-pixel Translations (❗️): In real sprite animations, whole regions of an image are often shifted together. For example, when a character breathes in and out, the head bobs up and down; this is often animated by moving the head a full pixel up and down rather than redrawing it with sub-pixel translations.
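The difference between a whole-pixel move and a sub-pixel move shows up even on a toy frame. In this NumPy sketch (hypothetical helpers, an 8×8 frame), a white block on magenta is shifted up by 1 pixel and by 0.5 pixel. For simplicity the whole frame is shifted with `np.roll`, which wraps at the edges; a real head bob would move only the head's region. The integer shift keeps the original two colors, while the half-pixel shift has to blend rows and introduces a new pinkish color, the kind of in-between color a full-pixel move avoids.

```python
import numpy as np

def translate_integer(img: np.ndarray, dy: int) -> np.ndarray:
    """Shift an H x W x C image down by dy whole pixels (wraps at the edges)."""
    return np.roll(img, dy, axis=0)

def translate_subpixel(img: np.ndarray, dy: float) -> np.ndarray:
    """Shift down by a fractional dy, linearly interpolating between rows."""
    whole = int(np.floor(dy))
    frac = dy - whole
    a = np.roll(img, whole, axis=0).astype(np.float64)      # a[y] = img[y - whole]
    b = np.roll(img, whole + 1, axis=0).astype(np.float64)  # b[y] = img[y - whole - 1]
    return np.rint((1 - frac) * a + frac * b).astype(img.dtype)

def palette(img: np.ndarray) -> set:
    """Distinct colors in an image, as tuples of ints."""
    return {tuple(c) for c in img.reshape(-1, img.shape[-1]).tolist()}

frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[:] = (255, 0, 255)           # magenta background
frame[2:4, 2:6] = (255, 255, 255)  # white "head"

print(len(palette(translate_integer(frame, -1))))                 # 2: same colors, moved
print(palette(translate_subpixel(frame, -0.5)) - palette(frame))  # {(255, 128, 255)}
```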

Conclusion

  • System magenta (#FF00FF) appears to produce much higher-quality final animated character sprites.

  • The anti-corruption model is effective at producing good-quality sprite animations with transparency.

  • I believe we should integrate the system-magenta approach and the anti-corruption model into the SpriteDX pipeline after more extensive testing.

  • Sub-pixel translation seems to be a missed opportunity. If we can curate an animated sprite dataset, we should be able to train a video-to-video model instead of an image-to-image model. I think working on video-to-video generation would be extremely rewarding.

Next Steps

  • I will explore video-to-video generation today.

Appendix

Bilinear vs Nearest Neighbor

Since our anti-corruption model operates at 128×128 resolution, the original 640×640 frames are downscaled with a simple bilinear downsampler.

In this section, we study what happens when using a nearest-neighbor downsampler instead.
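To make the two resamplers concrete, here is a NumPy sketch of both at the 640 → 128 (5×) factor used here. We don't know which library or settings SpriteDX uses: `downscale_bilinear` approximates antialiased bilinear downscaling with a triangle filter whose radius is stretched to the scale factor, and `downscale_nearest` copies the source pixel closest to each output pixel center. Both helpers are hypothetical and assume H × W × C `uint8` frames.

```python
import numpy as np

def tent_weights(src: int, dst: int) -> np.ndarray:
    """dst x src resampling matrix for a triangle filter stretched by src/dst."""
    scale = src / dst
    centers = (np.arange(dst) + 0.5) * scale   # output pixel centers, source coords
    src_pos = np.arange(src) + 0.5             # source pixel centers
    dist = np.abs(src_pos[None, :] - centers[:, None]) / scale
    w = np.clip(1.0 - dist, 0.0, None)         # triangle ("bilinear") kernel
    return w / w.sum(axis=1, keepdims=True)    # each output row sums to 1

def downscale_bilinear(img: np.ndarray, size: int) -> np.ndarray:
    """Separable, antialiased triangle-filter downscale to size x size."""
    wy = tent_weights(img.shape[0], size)
    wx = tent_weights(img.shape[1], size)
    tmp = np.tensordot(wy, img.astype(np.float64), axes=([1], [0]))  # (size, W, C)
    out = np.tensordot(tmp, wx, axes=([1], [1])).transpose(0, 2, 1)  # (size, size, C)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def downscale_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Copy the source pixel nearest to each output pixel center."""
    iy = ((np.arange(size) + 0.5) * img.shape[0] / size).astype(int)
    ix = ((np.arange(size) + 0.5) * img.shape[1] / size).astype(int)
    return img[iy[:, None], ix[None, :]]

# Toy 640x640 frame: magenta left half, white right half.
frame = np.zeros((640, 640, 3), dtype=np.uint8)
frame[:] = (255, 0, 255)
frame[:, 320:] = (255, 255, 255)

for name, fn in (("bilinear", downscale_bilinear), ("nearest", downscale_nearest)):
    small = fn(frame, 128)
    n_colors = len({tuple(c) for c in small.reshape(-1, 3).tolist()})
    print(name, small.shape, n_colors, "colors")
```

Nearest neighbor only ever copies existing pixels, so it cannot invent colors, but it may pick an unrepresentative pixel from each 5×5 block. The triangle filter blends up to 9 source pixels per axis, which softens edges and mixes colors where regions meet. Plain, non-antialiased bilinear sampling at exactly 5× with half-pixel centers would land on source pixel centers and behave like nearest neighbor, so any extra blur depends on the resampler averaging over a wider footprint like this one.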

Bilinear:

Nearest Neighbor:

Anti-Corruption After Bilinear Downsampling:

Anti-Corruption After Nearest-Neighbor Downsampling:

The difference is very subtle:

  • The nearest-neighbor results look sharper overall, but the contour pixel colors drift a bit.

  • The bilinear results, on the other hand, show little color drift in contour pixels but look slightly blurrier.

I think I will stick with bilinear, but there may be an opportunity here that we could exploit.

— Sprited Dev 🐛