In the previous post, we looked at the Stage 2 error rate, which comes out to around 30%. Because the model we are using is not open-weight, prompt engineering is really our only lever, and prompt engineering only adds a 5-10% gain.

Unless we get a better model for the job, we need an automatic way to detect errors and recover from them.

In this post, we look at how to detect multi-shot animation errors and how we plan to handle them.

Options

Option 1: Pick best out of N

We can run N inferences in parallel, score each of them, and pick the best one.

Option 2: Score and Retry

Run Stage 2 repeatedly, up to N times, until the quality/confidence score is met.

Option 3: Hybrid

We run a batch of N inferences and score each of them. We pick the best one, and if even the best one does not meet our score threshold, we retry up to M times until the quality bar is met.
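The hybrid option could be sketched roughly as follows. This is only an illustration under assumptions: `run_once`, `score`, and `threshold` are hypothetical placeholders for one Stage 2 inference, whatever scoring function we end up with, and the minimum acceptable score. None of them is an existing API.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def hybrid_generate(
    run_once: Callable[[], T],    # hypothetical: performs one Stage 2 inference
    score: Callable[[T], float],  # hypothetical: higher score means better quality
    threshold: float,             # hypothetical: minimum acceptable score
    batch_size: int,              # N: inferences per batch
    max_retries: int,             # M: extra batches allowed after the first one
) -> Tuple[Optional[T], float, int]:
    """Return (best candidate seen, its score, total inferences used)."""
    best: Optional[T] = None
    best_score = float("-inf")
    runs = 0
    for _ in range(1 + max_retries):
        # A real pipeline would launch these in parallel; a loop keeps the sketch simple.
        batch = [run_once() for _ in range(batch_size)]
        runs += batch_size
        for candidate in batch:
            s = score(candidate)
            if s > best_score:
                # Keep the best candidate across all batches so far.
                best, best_score = candidate, s
        if best_score >= threshold:
            break  # good enough, stop retrying
    return best, best_score, runs
```

With `max_retries=0` this reduces to Option 1 (best of N), and with `batch_size=1` it reduces to Option 2 (score and retry). If the threshold is never met, the sketch still returns the best candidate it saw, so the caller can decide what to do with it.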

Math

How many inferences n do we have to run to get a 95% success rate?

Given:

- Per-trial success rate p = 0.7
- Per-trial failure rate q = 1 - p = 0.3
- We want the overall success probability to be ≥ 0.95 after n independent attempts.

That means:

$$1 - q^n \ge 0.95$$

Then we solve for n:

$$\begin{align} q^n &\le 0.05 \\ n &\ge \frac{\ln(0.05)}{\ln(0.3)} \\ &\approx 2.49 \end{align}$$

Since n must be a whole number, we need n = 3 runs to succeed at least 95% of the time (1 - 0.3^3 ≈ 0.973).

To succeed at least 99% of the time, we need n = 4 runs (1 - 0.3^4 ≈ 0.992).
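As a quick check of the arithmetic, here is a minimal sketch that solves the inequality for a given failure rate and target. The function name `min_attempts` is only illustrative, and the calculation assumes attempts are independent with the same failure rate.

```python
import math


def min_attempts(q: float, target: float) -> int:
    """Smallest whole number n such that 1 - q**n >= target."""
    return math.ceil(math.log(1 - target) / math.log(q))


print(min_attempts(0.3, 0.95))  # 3
print(min_attempts(0.3, 0.99))  # 4
print(round(1 - 0.3**3, 4))     # 0.973
print(round(1 - 0.3**4, 4))     # 0.9919
```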

If each run succeeds with probability p = 0.7 and we retry until the first success (independent trials), the number of runs N is geometrically distributed.

$$\mathbb{E}[N] = \frac{1}{p} \approx 1.4286$$

So even if we retry until the first successful run, the average cost is only about 1.43 times that of a single run.

We do have to cap this, however, since other issues may be contributing to failures and we don’t want to retry indefinitely.
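To see how little a cap changes the average, here is a small sketch. It relies on the identity E[min(N, cap)] = 1 + q + q^2 + ... + q^(cap-1) for a geometric N, and again assumes independent trials. The name `expected_runs_capped` is just for illustration.

```python
def expected_runs_capped(p: float, cap: int) -> float:
    """Expected number of runs when retrying until success, stopping after `cap` runs."""
    q = 1 - p
    # P(at least k+1 runs are needed) = q**k, summed over the runs we allow.
    return sum(q**k for k in range(cap))


print(round(expected_runs_capped(0.7, 4), 3))    # 1.417
print(round(expected_runs_capped(0.7, 100), 4))  # 1.4286, effectively uncapped (1/p)
```

With a cap of 4, the expected number of runs drops only slightly below the uncapped 1/p, because runs beyond the fourth are rare to begin with.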

The Plan

If we batch 2 runs, we get a 91% success rate (1 - 0.3^2 = 0.91). If neither run passes, we keep retrying up to a cap of 4 runs in total.

$$\begin{align} p(\text{success by 4}) &= 1 - q^4 = 0.9919 \\ \mathbb{E}[N] &= \sum_{k=1}^{4} k q^{k-1} p + 4 q^4 \\ &= 1.417 \end{align}$$

If we were to run it serially with N=1 and M=4, it would take on average:
- Expected number of runs: 1.4
- Duration: 1.4x
  - with a standard deviation of 0.73
- Cost: 1.4x

If we were to run it as a batch with N=2 and M=4 (2 retries), it would take on average:
- Expected number of runs: 2.1
- Duration: 1.1x
  - with a standard deviation of 0.397
- Cost: 2.1x

Cost-wise, it is better to run serially, but we want to reduce the standard deviation of the duration. This should give a better user experience, since users can expect a generation to take about the same amount of time most of the time.
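The figures in the two lists can be reproduced with a short calculation. The sketch below rests on an assumed reading of the plan: the first step runs N inferences in parallel and counts as one unit of time, each retry after that is a single run, and we stop at the first success or after M runs in total. Under that reading the numbers match the ones listed above; `plan_stats` is a hypothetical helper name.

```python
import math
from typing import Tuple


def plan_stats(p: float, batch_size: int, max_runs: int) -> Tuple[float, float, float]:
    """Return (expected runs, expected duration, std dev of duration).

    Assumption: one parallel batch of `batch_size` first, then single serial
    retries until `max_runs` total runs. Duration is in units of one run.
    """
    q = 1 - p
    stage_sizes = [batch_size] + [1] * (max_runs - batch_size)
    reach = 1.0  # probability that we get as far as the current stage
    runs = 0
    e_runs = e_dur = e_dur2 = 0.0
    for duration, size in enumerate(stage_sizes, start=1):
        runs += size
        stage_fail = q**size  # every run in this stage fails
        is_last = duration == len(stage_sizes)
        # We stop here on success, or unconditionally at the cap.
        stop = reach if is_last else reach * (1 - stage_fail)
        e_runs += stop * runs
        e_dur += stop * duration
        e_dur2 += stop * duration**2
        reach *= stage_fail
    return e_runs, e_dur, math.sqrt(e_dur2 - e_dur**2)


for name, n in (("serial", 1), ("batch", 2)):
    runs, dur, std = plan_stats(0.7, n, 4)
    print(f"{name}: runs={runs:.3f} duration={dur:.3f}x std={std:.3f}")
# serial: runs=1.417 duration=1.417x std=0.729
# batch: runs=2.117 duration=1.117x std=0.397
```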

— Sprited Dev

SpriteDX - How to Fight The Animation Error
