SpriteDX - Lab Note 6

Study on Frame Durations
Let’s take a break and study sprite animations and frame durations.
Example Sprite Animations
Example 1: Idle - Consistent - 90ms
Example 2: Variable
Slow attack 80ms
Normal attack 60ms
Fast attack 40ms
Slowdown 120ms → 160ms
Example 3: 100ms
Example 4 (KOF) uses 80ms, and 120ms in slow-motion sections
Example 5 (KOF): 250ms → 200ms → 1ms (?)
Example 6: 60ms
Example 7: 100ms during fast move, 200ms during end of movement
Example 8: 110ms
Example 9 (Rag): 110ms
Example 10: 60ms (excited scene)
Example 11: 100ms
Example 12 (Metal Gear): Explosion 30 → 60 → 90 → 120ms (slows down)
Conversion:
41ms → 24.4 FPS
60ms → 16.7 FPS
80ms → 12.5 FPS
100ms → 10 FPS
110ms → 9.09 FPS
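The conversion above is just fps = 1000 / duration_ms. A quick sketch (function name is mine):

```python
def ms_to_fps(duration_ms: float) -> float:
    """Convert a per-frame duration in milliseconds to frames per second."""
    return 1000.0 / duration_ms

for ms in (41, 60, 80, 100, 110):
    print(f"{ms}ms -> {ms_to_fps(ms):.1f} FPS")
```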
Generated Animations from SpriteDX:
- 41ms (24.4 FPS)
Takeaway:
Most of the “good” sprite animations have variable durations, ranging from 40ms to 120ms. We can categorize these in buckets:
200ms (5 FPS) → Very Slow
100ms (10 FPS) → Regular
60ms (~17 FPS) → Fast
40ms (25 FPS) → Very Fast
It would add more realism (i.e. that “retro vibe”) if the model could generate variable frame rates. Perhaps this is a topic for Anti-Corruption Model V3. Let’s leave it at that and move back to the video-to-video anti-corruption model.
What does it mean for training anti-corruption-model v2?
The dataset I have has variable frame rates. So, if I want to learn the frame-to-frame relationships, I will probably have to convert the animations to run at 24 FPS, likely by repeating the longer-duration frames. For example, if an animation uses 80ms frames (~12.5 FPS), I will need to repeat each frame twice to make it run at 24 FPS.
Another case is when the frame durations don’t map to a whole number of 24 FPS ticks, like 60ms (~16.7 FPS) or 100ms (10 FPS). Naive nearest-neighbor upsampling will create temporal aliasing.
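A minimal sketch of that nearest-neighbor conversion (function name and structure are my own): sample the variable-duration timeline at a fixed tick and take whichever source frame is on screen at that instant. The 60ms case shows the aliasing: one frame gets held twice, the next only once.

```python
def resample_to_fps(frames, durations_ms, fps=24.0):
    """Nearest-neighbor temporal resampling: for each output tick,
    emit whichever source frame is on screen at that instant."""
    # cumulative end time of each source frame on the original timeline
    ends, t = [], 0.0
    for d in durations_ms:
        t += d
        ends.append(t)
    total_ms = t

    step = 1000.0 / fps
    out, sample_t, idx = [], 0.0, 0
    while sample_t < total_ms:
        while ends[idx] <= sample_t:  # advance to the frame covering sample_t
            idx += 1
        out.append(frames[idx])
        sample_t += step
    return out

# 80ms frames land on each source frame ~twice at 24 FPS:
print(resample_to_fps(["A", "B", "C"], [80, 80, 80]))  # A A B B C C
# 60ms frames alias: uneven repeats.
print(resample_to_fps(["A", "B"], [60, 60]))           # A A B
```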
Is temporal aliasing even an issue?
The good thing is that our inference target is always 24 FPS. Even if we train on a temporally aliased dataset, we probably won’t notice anything.
What are some approaches to reduce temporal aliasing in training data?
Option 1 - Use 48 FPS (~21ms interval):
Instead of converting the animation to 24 FPS, we could convert it to 48 FPS or even 60 FPS. At inference time, we simply convert the input to that FPS as well. At the end, we downsample back to 24 FPS (or lower), or even figure out a way to support variable frame rates.
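A rough way to see why the higher rate helps (my own framing, with a tick of 1000/48 ≈ 20.8ms): measure how far off a duration lands when snapped to a whole number of ticks. At 24 FPS, 60ms snaps badly; at 48 FPS, every common duration stays within a few percent.

```python
def repetition_error(duration_ms: float, fps: float) -> float:
    """Relative error when a frame's duration is snapped to a whole
    number of ticks at the target frame rate."""
    tick = 1000.0 / fps
    snapped = max(1, round(duration_ms / tick)) * tick
    return abs(snapped - duration_ms) / duration_ms

for ms in (40, 60, 80, 100, 120):
    print(f"{ms}ms: 24 FPS error {repetition_error(ms, 24):.0%}, "
          f"48 FPS error {repetition_error(ms, 48):.0%}")
```

At 24 FPS the 60ms and 100ms durations are off by roughly 31% and 17%; at 48 FPS everything in the 40–120ms range lands within about 4%.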
Option 2 - Train Variable Frame Rate Model:
This would train a model that not only predicts the frame sequence but also predicts the duration of each frame. Here is an example formulation:
Input: sequence of frames and their durations
Output: updated sequence of frames and their durations, where a duration below 0ms indicates that the frame should be skipped.
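Typed out roughly, the formulation might look like this (all names here are mine, just a sketch of the interface, not an implementation):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TimedFrame:
    pixels: np.ndarray   # H x W x C sprite frame
    duration_ms: float   # < 0 means "skip this frame"


def predict(frames: list[TimedFrame]) -> list[TimedFrame]:
    """Model signature: timed sequence in, re-timed sequence out."""
    ...
```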
However, we don’t really have a dataset to train this model. That is, we would need pairs of smooth 24 FPS video and variable-frame-rate pixel art animation.
The most likely way to train such a model is to use some type of heuristic grounded in animation fundamentals. For example, we can say that where there is less movement, the frame duration should be longer.
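That heuristic could be as crude as thresholding frame-to-frame pixel difference (the thresholds and durations below are made-up placeholders to tune):

```python
import numpy as np


def assign_durations(frames: list[np.ndarray],
                     slow_ms: float = 120.0,
                     normal_ms: float = 80.0,
                     fast_ms: float = 40.0) -> list[float]:
    """Toy heuristic: hold low-motion frames longer.
    Motion = mean absolute pixel difference to the previous frame."""
    durations = [normal_ms]  # first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        motion = float(np.abs(cur.astype(np.int16) - prev.astype(np.int16)).mean())
        if motion < 5:
            durations.append(slow_ms)    # barely moving: hold longer
        elif motion < 30:
            durations.append(normal_ms)
        else:
            durations.append(fast_ms)    # big change: flash quickly
    return durations
```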
Option 3 - Be happy with what you got!
Acknowledge that temporal aliasing exists. Don’t fret over it; just train the model and be done with it. 😀 The idea is to do minimal work: we just repeat the input frames to roughly match 24 FPS.
So, what’s next?
Do option 3, and fix the model’s FPS to 24.4 (=41ms).
I’ve constructed a sample dataset; next, I will work on the model.
— Sprited Dev 🐛



