
SpriteDX - Day 239 of Sprited


It’s day 239 of Sprited. Today’s Saturday and I’m not going to do much work, but I wanted to try out some sample generations.

Trial Run

We should really remove that magenta. Also, pixelation should be optional: the folks who want exact pixel alignment can use it, while pseudo pixel art may be a good alternative for everyone else. We could make the pixel-art feel entirely optional and user configurable.


Inscription

This week, I had a chance to talk with my friend about the project, and one thing we discussed is that generating these pictures is easy, but what really gives them life is the stories.

I want to experiment with “giving life” to these characters, and the right next step may be to give each of them a name, an identity, and a background story. They won’t be mere pictures or moving frames. I want each one of them to have an identity and a story.

One possible way is to make story generation part of the pipeline. But how do we generate a good story that is relevant to the character? That part is easy: we just use an LLM to generate a story given a prompt.

But in order for all the characters generated to live in the same universe, each story needs to be coherent with the other characters generated by the same pipeline or pipeline template.

For example, let’s say the character I generated above is Luna, and Luna is a streamer who has friends A, B, and C and lives in X town. These symbols all need to be real places, people, and things in the universe.

Instead of just generating mere characters, I want SpriteDX to generate a whole coherent universe with an intertwined story.

I want to call this process of giving a story to a character something.

  • It defines the character’s location.

  • It defines the character’s age, etc.

  • It defines personality and family history.

  • It defines …

  • So perhaps I can call it “inscription,” like how a kid puts a name to a character and gives it meaning.

  • Or perhaps we can simply call it “naming.” And once the character’s information is registered with the universe, it can be called “named.”

This idea requires that each character have a universe identifier. So when the universe spawns a character, it needs to record that character’s history. It isn’t just a soulless character but a character that is “recorded” into the universe’s net of history.
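The registry idea above could be sketched as a small in-memory structure. Everything here is a hypothetical illustration, not existing SpriteDX code — the names `Inscription`, `Universe`, and `inscribe` are assumptions:

```python
# Hypothetical sketch of an "inscription" registry: a universe that
# records each spawned character's identity so it is "named", not soulless.
import uuid
from dataclasses import dataclass, field

@dataclass
class Inscription:
    name: str             # e.g. "Luna"
    location: str         # must reference a real place in the universe
    age: int
    personality: str
    family_history: str

@dataclass
class Universe:
    universe_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    characters: dict = field(default_factory=dict)

    def inscribe(self, inscription: Inscription) -> str:
        """Register a character with the universe; once recorded, it is 'named'."""
        character_id = uuid.uuid4().hex
        self.characters[character_id] = inscription
        return character_id

universe = Universe()
luna_id = universe.inscribe(Inscription(
    name="Luna", location="X town", age=19,
    personality="upbeat streamer", family_history="unknown"))
print(universe.characters[luna_id].name)  # Luna
```

The key design point is that the character ID only exists relative to a universe, which is what lets later stories check their references (friends, towns) against the same registry.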

Let’s sleep on it.


Anti-Corruption Model — Action Items

For the Anti-Corruption model, I want to write down some TODOs.

  1. Increase the validation set.

  2. Do an ablation study on adding noise.

  3. Set up a codebase to generate more examples using SOTA models.

  4. Integrate relevant matting datasets. For those examples, lock the colors.

  5. Improve the architecture by adopting some strategies from BiRefNet.

  6. Fine-tune BiRefNet for pixel art.

  7. Add a temporal dimension. → Not a focus area, because even with a single frame, a human annotator is able to get correct answers.


Anti-Corruption Model — Guardrails

There are lots we can do, so let’s set some guardrails.

  • Single: single-frame, inference-only.

  • Cheap: less than 300 MB preferred, 128×128 only, no external dependencies. Train only up to 100 epochs. Inference for a 5-second video should take less than 10 seconds.

  • Academic: every improvement will be recorded. We will benchmark against BiRefNet. We will get help from the academic community.

  • Specific: non-pixel art is going to be out-of-distribution.
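The guardrails above could be captured as a checked config so budget violations fail loudly. This is a minimal sketch under assumed names (`Guardrails`, `check_model` are illustrative, not existing code):

```python
# Hypothetical guardrails config mirroring the bullets above:
# single-frame, <=300 MB, 128x128, <=100 epochs, <=10 s per 5 s video.
from dataclasses import dataclass

@dataclass(frozen=True)
class Guardrails:
    single_frame_only: bool = True
    max_model_mb: int = 300
    input_size: tuple = (128, 128)
    max_epochs: int = 100
    max_inference_sec_per_5s_video: int = 10

    def check_model(self, model_mb: float, epochs: int) -> bool:
        """True iff a training run stays within the size and epoch budgets."""
        return model_mb <= self.max_model_mb and epochs <= self.max_epochs

g = Guardrails()
print(g.check_model(model_mb=178, epochs=100))  # True: the lite-model budget fits
```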


Pixel Art Quantization Discussion

There is one issue I want to discuss. The model does pixelization and background matting at the same time. Should it, or should it not?

Universe 1 - Matte First, Quantize Second

Each can be optimized separately, so we will be breaking the problem down into sub-problems. This should help with focus and also open more options, since background matting can be done using open-source models, and pixelization can be solved entirely separately, focusing just on quantization. The keyword here is focus.

In Universe 1, we do the matting first with conventional SOTA models, and then quantize. The quantizer will now need to handle RGBA images instead of just RGB images.
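Extending quantization from RGB to RGBA is mostly a matter of quantizing the color channels while passing the alpha through. A minimal sketch, assuming a simple nearest-palette-color quantizer (the palette and function name are illustrative):

```python
# Sketch: palette quantization extended to RGBA. RGB snaps to the nearest
# palette color; the alpha channel from matting is kept untouched.
import numpy as np

def quantize_rgba(image: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """image: (H, W, 4) uint8; palette: (K, 3) uint8."""
    rgb = image[..., :3].astype(np.int32)   # (H, W, 3)
    alpha = image[..., 3:4]                 # (H, W, 1), passed through as-is
    # Squared distance from every pixel to every palette color: (H, W, K).
    dists = ((rgb[..., None, :] - palette[None, None].astype(np.int32)) ** 2).sum(-1)
    nearest = palette[dists.argmin(-1)]     # (H, W, 3) nearest palette colors
    return np.concatenate([nearest.astype(np.uint8), alpha], axis=-1)

palette = np.array([[0, 0, 0], [255, 0, 255], [255, 255, 255]], dtype=np.uint8)
img = np.zeros((2, 2, 4), dtype=np.uint8)
img[0, 0] = [250, 10, 240, 128]             # near-magenta, half-transparent pixel
out = quantize_rgba(img, palette)
print(out[0, 0])  # the pixel snaps to [255, 0, 255] with alpha kept at 128
```

Note this naive version would also snap matte-softened edge colors to the palette, which is exactly the edge-tint interaction discussed for Universe 3 below.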

BiRefNet is a solid open-source choice for matting, but it may need fine-tuning on the pixel art domain (Toon-out may provide a good alternative, since it is already fine-tuned on the comic art domain, which is closer to pixel art than real-life photos).

Another downside is that BiRefNet is rather chunky (444–885 MB), so I will need to yet again increase the Docker image size. Not a showstopper, but still. There is also a lite version, which seems to be around 178 MB. That may be an option.

Score: 8/10

Universe 2 - Quantize First, Matte Second

In this mode, we are still training two separate models; however, we quantize the images first and then apply the matte. Is it better than Universe 1? I don’t think so. If matting is a matter of finding the right model and fine-tuning it, why not do the easy step first? Also, matting quantized images may require quite a bit of fine-tuning of existing models, since most pixel art is out-of-distribution.

Score: 3/10

Universe 3 - Quantize and Matte Together

This is the current implementation. Why are we doing this? Because information about the silhouette (from matting) reinforces quantization along the edges. Also, by learning how to matte a pixel art, the model naturally learns where the contour needs to be. So the model can be trained in one go without much extra hassle; the two tasks reinforce each other.

One key observation is that matting and quantizing are not all that different. Matting predicts changes to R, G, B, and A; it is just that in matting, the RGB changes are mostly focused around the edges that require a color-tint change. The quantization model also predicts R, G, B, and A: the color shifts are learned, and the same goes for alpha. When quantizing, the model is not just applying a convolution filter but may also minimally shift pixel positions in some areas to make a better pixel art. So both are basically predicting the same 4 channels. Why should we train two different models?
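The shared formulation above can be made concrete: both tasks predict per-pixel shifts to the same 4 channels, so a single 4-channel output head covers both. A minimal sketch, where the zero-delta “model” is a placeholder assumption for the trained network:

```python
# Sketch: matting and quantization share one formulation -- the network
# predicts per-pixel deltas to R, G, B, and A, applied to the input.
import numpy as np

def apply_rgba_deltas(image: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """image: (H, W, 4) float in [0, 1]; deltas: same shape, predicted shifts."""
    return np.clip(image + deltas, 0.0, 1.0)

x = np.full((128, 128, 4), 0.5)       # toy input frame
deltas = np.zeros_like(x)             # stand-in for the network's prediction
deltas[..., 3] -= 0.5                 # e.g. matting pushes background alpha down
y = apply_rgba_deltas(x, deltas)
print(y[..., 3].max())  # 0.0 -- all alpha driven to transparent
```

In this view, matting concentrates its RGB deltas near edges (de-tinting) while quantization spreads color-snapping deltas everywhere, but the output space is identical.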

The downside is that the matting portion of the model is now also restricted to 128×128, so we can’t use it to matte higher-resolution images. And if we need to matte larger images, we would need to fine-tune yet another model to make that happen, which is wasted effort. The plan is to let users create a HighRes (non-pixel art) version and/or a LowRes (pixel art) version. Studios have different needs, and we should support both.

Another, more subtle issue with this solution is that training a perfect matte+quantize model requires more effort than training just the quantize model. That is, if the alpha is already known, the quantize model does not have to learn the contours; it just has to learn how to quantize based on the given alphas.

And we notice that matting is often the most error-prone step. If we can separate the concerns, we can solve matting more effectively by fine-tuning, and solve quantization separately, at higher quality, with the alpha provided.

Score: 7/10

Universe 4 - Do it All

An alternative is to keep the matte+quantize model but make it able to take in both RGB and RGBA images. If it is passed RGB only, it will do matting+quantization; if it is passed RGBA, it will mostly quantize. It sounds like a headache to implement, but it is actually just a matter of altering the data-corruption step to randomly add (or not add) a background. The model will then be able to do both matting and quantization. The benefit is that we can reuse all the parts we have in the current Anti-Corruption model, and the model can do matting+quantization at the same time, or just quantization if matting is already done.
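The “randomly add/not-add a background” corruption step could look like the sketch below. This is an assumed implementation, not the actual Anti-Corruption pipeline code (`corrupt` and the flat random background are illustrative choices):

```python
# Sketch of the Universe 4 corruption step: during training, randomly
# composite a background onto the clean RGBA sprite so the model sees both
# RGB-with-background inputs (must matte + quantize) and clean RGBA inputs
# (mostly quantize).
import numpy as np

def corrupt(sprite: np.ndarray, rng: np.random.Generator,
            bg_prob: float = 0.5) -> np.ndarray:
    """sprite: (H, W, 4) float in [0, 1]; returns a 4-channel training input."""
    if rng.random() < bg_prob:
        bg = rng.random(3)                                # random flat background
        alpha = sprite[..., 3:4]
        rgb = sprite[..., :3] * alpha + bg * (1 - alpha)  # alpha-composite onto bg
        # Background case: true alpha is hidden from the model, so feed it opaque.
        return np.concatenate([rgb, np.ones_like(alpha)], axis=-1)
    return sprite  # no background: the model only needs to quantize

rng = np.random.default_rng(0)
sprite = np.zeros((128, 128, 4))            # toy fully transparent sprite
out = corrupt(sprite, rng, bg_prob=1.0)     # force the background branch
print(out.shape, float(out[..., 3].min()))  # (128, 128, 4) 1.0
```

Keeping the input 4-channel in both branches means the network architecture is unchanged; only the corruption schedule decides which task each sample exercises.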

In addition, we will fine-tune a SOTA model for matting (with high-res support), so we can use the powerful SOTA model to reduce errors downstream in the anti-corruption model.

Score: 8.5/10

Thinking it over, we will go with Universe 4.


Monday Plan

Jotting down action items to focus on for Monday:

  1. Make the AC model into a Flow model → Train it → Evaluate it.

— Sprited Dev 🐛