
SpriteDX - Lab Note 6


Study on Frame Durations

Let’s take a break and study sprite animations and frame durations.

Example Sprite Animations

  • Example 1: Idle - Consistent - 90ms

  • Example 2: Variable

    • Slow attack 80ms

    • Normal attack 60ms

    • Fast attack 40ms

    • Slowdown 120ms → 160ms

  • Example 3: 100ms

  • Example 4 (KOF): 80ms, and 120ms in slow motion

  • Example 5 (KOF): 250ms → 200ms → 1ms (?)

  • Example 6: 60ms

  • Example 7: 100ms during fast move, 200ms during end of movement

  • Example 8: 110ms

  • Example 9 (Rag): 110ms

  • Example 10: 60ms (excited scene)

  • Example 11: 100ms

  • Example 12 (Metal Gear): Explosion 30 → 60 → 90 → 120ms (slows down)

Conversion:

  • 41ms → 24.4 FPS

  • 60ms → 16.7 FPS

  • 80ms → 12.5 FPS

  • 100ms → 10 FPS

  • 110ms → 9.09 FPS
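The conversion is just 1000 divided by the per-frame duration. A one-liner for reference:

```python
def ms_to_fps(duration_ms: float) -> float:
    """Convert a per-frame duration in milliseconds to frames per second."""
    return 1000.0 / duration_ms

# The conversions above:
for ms in (41, 60, 80, 100, 110):
    print(f"{ms}ms -> {ms_to_fps(ms):.1f} FPS")
```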

Generated Animations from SpriteDX:

  • 41ms (24.4 FPS)

Takeaway:

  • Most of the “good” sprite animations have variable durations, ranging from 40ms to 120ms. We can categorize these into buckets:

    • 200ms (5 FPS) → Very Slow

    • 100ms (10 FPS) → Regular

    • 60ms (16.7 FPS) → Fast

    • 40ms (25 FPS) → Very Fast

  • It would add realism (that “retro vibe”) if the model could generate variable frame durations. Perhaps that’s a topic for Anti-Corruption Model V3. Let’s leave it there and get back to the video-to-video anti-corruption model.
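As a rough sketch, the buckets above could be encoded as a lookup. The threshold midpoints here are my own choice for illustration, not measured values:

```python
def speed_bucket(duration_ms: float) -> str:
    """Map a frame duration to one of the speed buckets.
    Thresholds are illustrative midpoints, not tuned values."""
    if duration_ms >= 150:
        return "very slow"   # ~200ms / 5 FPS
    if duration_ms >= 80:
        return "regular"     # ~100ms / 10 FPS
    if duration_ms >= 50:
        return "fast"        # ~60ms / ~17 FPS
    return "very fast"       # ~40ms / 25 FPS

print(speed_bucket(90))  # "regular"
```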

What does it mean for training anti-corruption-model v2?

The dataset I have uses variable frame rates. So, if I want to learn frame-to-frame relationships, I will probably have to convert the animations to 24 FPS by duplicating the longer-duration frames. For example, if an animation uses 80ms frames (12.5 FPS), I will need to repeat each frame twice to make it run at roughly 24 FPS.

Another example is when the frame durations don’t divide evenly, like 60ms (~16.7 FPS) or 100ms (10 FPS). Naive nearest-neighbor upsampling will create temporal aliasing.

Is temporal aliasing even an issue?

The good thing is that our inference target is always 24 FPS. Even if we train on a temporally aliased dataset, we probably won’t notice anything.

What are some approaches to reduce temporal aliasing in training data?

Option 1 - Use 48 FPS (~21ms interval):

Instead of converting the animation to 24 FPS, we could convert it to 48 FPS or even 60 FPS. At inference time, we simply convert the input to that FPS as well. At the end, we downsample back to 24 FPS (or something lower), or figure out a way to support variable frame rates.
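A minimal nearest-neighbor sketch of that pipeline; the resampler and the three-frame example are illustrative, not the actual SpriteDX code:

```python
UP_MS = 1000.0 / 48  # ~20.8ms per slot at 48 FPS

def resample(frames, durations_ms, slot_ms):
    """Nearest-neighbor resample: for each fixed-width output slot,
    emit the source frame active at the slot's start time."""
    out, t, i, elapsed = [], 0.0, 0, durations_ms[0]
    total = sum(durations_ms)
    while t < total:
        # Advance to the source frame covering time t.
        while t >= elapsed and i + 1 < len(frames):
            i += 1
            elapsed += durations_ms[i]
        out.append(frames[i])
        t += slot_ms
    return out

frames = ["A", "B", "C"]
durs = [60, 100, 60]               # irregular durations
hi = resample(frames, durs, UP_MS)  # 48 FPS intermediate
final = hi[::2]                     # drop every other slot -> 24 FPS
```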

Option 2 - Train Variable Frame Rate Model:

This will basically train a model that is not only able to predict the sequence but also predict durations of each frame. Here is an example formulation:

  1. Input: sequence of frames and their durations

  2. Output: updated sequence of frames and their durations, where a duration ≤ 0ms would indicate that the frame is skipped.
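One way to make that input/output representation concrete (the names and types here are hypothetical, just to pin down the formulation):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TimedFrame:
    pixels: bytes       # placeholder for the actual frame data
    duration_ms: float  # predicted duration; <= 0 means "skip this frame"

def keep_frames(seq: List[TimedFrame]) -> List[TimedFrame]:
    """Drop frames the model marked as skipped."""
    return [f for f in seq if f.duration_ms > 0]
```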

However, we don’t really have a dataset to train this model. That is, we would need pairs of smooth 24 FPS video and variable-frame-rate pixel art animation.

The most likely way to train such a model is to use some type of heuristic grounded in animation fundamentals. For example: where there is less movement, the frame duration should be longer.
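A toy version of that heuristic, assuming grayscale frames in a NumPy array of shape (T, H, W). The 40ms/200ms endpoints mirror the buckets from earlier, but the linear mapping itself is made up:

```python
import numpy as np

def motion_to_duration(frames: np.ndarray,
                       fast_ms: float = 40.0,
                       slow_ms: float = 200.0) -> np.ndarray:
    """Heuristic: assign longer durations to transitions with less motion.
    Motion = mean absolute pixel difference to the previous frame.
    Returns one duration per transition (length T-1)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    # Normalize motion to [0, 1]; guard against a fully static clip.
    norm = diffs / diffs.max() if diffs.max() > 0 else np.zeros_like(diffs)
    # High motion -> short duration, low motion -> long duration.
    return slow_ms - norm * (slow_ms - fast_ms)
```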

Option 3 - Be happy with what you got!

Acknowledge that temporal aliasing exists. Don’t fret over it; just train the model and be done with it. 😀 The idea is to do minimal work: we just repeat the input frames to roughly match 24 FPS.

So, what’s next?

  • Do option 3, and fix the model’s FPS to 24.4 (41ms per frame).

  • I’ve constructed a sample dataset; next I will work on the model.

— Sprited Dev 🐛
