<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Sprited]]></title><description><![CDATA[Sprited]]></description><link>https://blog.sprited.ai</link><generator>RSS for Node</generator><lastBuildDate>Thu, 09 Apr 2026 09:34:19 GMT</lastBuildDate><atom:link href="https://blog.sprited.ai/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Digital Being - Pitch Prep 1]]></title><description><![CDATA[Let me spend some time today to prepare a set of questions and answers. I think I will do one every morning to keep myself sharp.

What is your problem statement?

Trial 1: Digital personas exist, but t]]></description><link>https://blog.sprited.ai/digital-being-pitch-prep-1</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-pitch-prep-1</guid><category><![CDATA[digital-being]]></category><category><![CDATA[sprited]]></category><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Wed, 08 Apr 2026 17:34:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/540a0d13-1178-44fc-a88c-cbdd64be1733.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let me spend some time today to prepare set of questions and answers. I think I will do one every morning to keep myself sharp.</p>
<blockquote>
<p>What is your problem statement?</p>
</blockquote>
<p><strong>Trial 1</strong>: Digital personas exist, but their identities lack a frame of reference and world-binding (or a backing story).</p>
<p><strong>Trial 2</strong>: Digital personas are a thin layer on top of an LLM. They don't feel truly real, and they don't feel like life.</p>
<p><strong>Trial 3</strong>: Digital personas exist (Replika, Character AI), but they are easily replicated, like a CD-ROM. We want digital beings that truly have a backing story embedded in a world.</p>
<p><strong>Trial 4</strong>: Digital personas exist (e.g., Replika, Character AI), but they are infinitely replicable and lack a persistent world, so they never develop real history or identity. Without a persistent world to anchor them, they're more like temporary illusions than beings that grow or change. So we're aiming for something that can build a true continuity of existence.</p>
<p><strong>Trial 5</strong>: Digital personas exist (e.g., Replika, Character AI), but they are infinitely replicable and lack a persistent world, so they never develop real history or identity, or the knowledge graph of the self and their surroundings, or a relation graph to others in the same universe. Without a persistent world and relationships to anchor them, they are more like temporary illusions than beings that grow or change, or an identity formed through experiences. So, we are aiming for something that can build a true continuity of existence.</p>
<p><strong>Trial 6</strong>: Digital personas exist (e.g., Replika, Character.AI), but they are infinitely replicable and lack a persistent world, so they never develop real history, identity, or relationships. Without <strong>grounding</strong> in a shared world, they remain temporary illusions rather than beings that grow and evolve through experience. We are building systems that enable true continuity of existence.</p>
<p><strong>Shorter Punch</strong>: Digital personas are infinitely replicable and not anchored to any shared world, so they don't develop real identity, history, or relationships.</p>
<p>That's it for today. After iterating on this today, I realize that visual fidelity is not the core of the project. That is, the product should matter even without visual renderings. The semantic graph and world anchoring of digital personas are what we are solving for. I think this means we should try not to focus on visual fidelity and work mostly with kit-bashed beings.</p>
<p>On the other hand, focusing on visual fidelity did give us the "reference image as genome" idea, which allows us to greatly simplify our embodiment approach. So, I am not sure visual fidelity is really only a rendering concern. I am not ready to dismiss the "how a person looks tells half the story" idea.</p>
<p>Here is a high-level diagram.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/7f077394-25cc-41e5-af17-8752d8bab251.png" alt="" style="display:block;margin:0 auto" />

<p>Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Digital Being — Type 2 — State Expansion]]></title><description><![CDATA[Today's focus will be to expand SpriteDX's capabilities to generate more diverse actions and states.
Here is what we have now:

Reference Photo

Idle State

Greet Action

Run Cycle




We'd like to ad]]></description><link>https://blog.sprited.ai/digital-being-type-2-state-expansion</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-type-2-state-expansion</guid><category><![CDATA[digital-being]]></category><category><![CDATA[sprited]]></category><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Tue, 07 Apr 2026 22:45:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/afaae3e8-c506-4cc2-911b-54cfdd6ceade.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today's focus will be to expand SpriteDX's capabilities to generate more diverse actions and states.</p>
<p>Here is what we have now:</p>
<ul>
<li><p>Reference Photo</p>
</li>
<li><p>Idle State</p>
</li>
<li><p>Greet Action</p>
</li>
<li><p>Run Cycle</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4fd516f7-72a2-4309-bc88-d9949665d95e.png" alt="" style="display:block;margin:0 auto" />

<p>We'd like to add more states. Let's experiment with some states that can be created from the idle state.</p>
<p>First, we will take out the first frame of the idle state.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/5858a17f-b5e2-4aa8-9c36-8524aed125dd.png" alt="" style="display:block;margin:0 auto" />

<h2>Sitting Down State</h2>
<p>Let's pose the character. We are using Nano Banana Pro.</p>
<blockquote>
<p><strong>Iteration 1:</strong> 2d side scroller game character sprite is sitting down on imaginary flat grass tile (invisible) on pure white BG without shadows</p>
<p><strong>Iteration 2:</strong> 2d side scroller game character sprite is sitting down on imaginary flat grass tile (invisible) legs crossed and leaning back a little on pure white BG without shadows</p>
<p><strong>Iteration 3</strong>: 2d side scroller game character sprite is sitting down on imaginary flat grass tile (invisible) legs crossed and leaning back a little on pure white BG without shadows</p>
<p>This is 2d side scroller platformer character. character's pelvis and the foot should be at same level.</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/3ca905ad-6980-49be-ab57-a1a8bb30081b.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Problem</strong>: It is difficult to control the pose.</p>
<p><strong>Analysis</strong>: It would be nice if we could provide non-textual pose descriptions, or environmental constraints like the ground.</p>
<p>Tried to control the pose by giving the model some visual cues.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/16aaab17-cee3-4efa-b19d-d3fde6837124.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/7d7da488-c3ac-4213-85e0-5b3ca9ddafa6.png" alt="" style="display:block;margin:0 auto" />

<p>Most of the time the character is deformed because the face position is locked in place.</p>
<p>As an alternative, let's just use Seedance 1 Pro to let it simulate sitting down.</p>
<blockquote>
<p>character sits down</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/6ddc8457-3ace-484a-8b07-86d80a2a7771.webp" alt="" style="display:block;margin:0 auto" />

<p>That's better. There is some zooming effect. Let's see if we can address it.</p>
<blockquote>
<p>Character fully sits down on the ground and rests for 2 seconds then stands up again fast.</p>
<p>#pure-white-background</p>
<p>#game-sprite-animation</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e9f90937-4243-4189-b4aa-85cdb8b38b9f.png" alt="" style="display:block;margin:0 auto" />

<img src="https://comfy.sprited.ai/api/view?filename=anim_00002_.webp&amp;subfolder=&amp;type=output&amp;rand=0.5578107240934514" alt="" style="display:block;margin:0 auto" />

<p>Our editing pipeline extracts the sit-down animation. The blinking is too fast, but yeah.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4e1c4c90-6b83-4a98-a625-a2b5575dd229.webp" alt="" style="display:block;margin:0 auto" />

<p>For sitting down animation, we probably want to hold the first or last frame.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/c18a4c4c-2ae6-42ba-8fbc-1bbccd9e8f10.webp" alt="" style="display:block;margin:0 auto" />

<p>Increased the last frame's duration.</p>
<p><strong>Analysis</strong>: We lost the sitting-down animation, but we achieved our goal of getting a sitting-down state. Let's not worry about the transition states. Let's assume in-betweening is a solved problem.</p>
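<p>As an aside on the frame-hold used above: this is a one-liner in any image library. A minimal sketch with Pillow, assuming the extracted clip lives in a hypothetical <code>sit_down.webp</code>:</p>
<pre><code class="language-python">from PIL import Image, ImageSequence

# Load the extracted animation (hypothetical filename).
clip = Image.open("sit_down.webp")
frames = [frame.copy() for frame in ImageSequence.Iterator(clip)]

# Hold the final pose: every frame plays for 100 ms except the last,
# which we stretch to 2 s so the sitting-down state reads clearly.
durations = [100] * (len(frames) - 1) + [2000]

frames[0].save(
    "sit_down_hold.webp",
    save_all=True,
    append_images=frames[1:],
    duration=durations,
    loop=0,  # loop forever
)
</code></pre>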
<hr />
<h2>Walk Cycle</h2>
<p>Let's start with Seedance 1 Pro.</p>
<blockquote>
<p>Character slowly strolls towards right (in-place)</p>
<p>#white-bg #character-in-place #zoomless #no-camera-movement</p>
</blockquote>
<p>This produces a character literally walking in place without moving much.</p>
<p>Let's try a modified version of our current run-cycle prompt.</p>
<pre><code class="language-xml">&lt;shot
  id="walk"
  camera="fixed"
  zoom="1"
  duration="1s"
  loop="true"
  style="background: white;"
  alt="Pixel art character walks in-place facing right. Shows an walk loop animation — at least two full cycles."
  tags="pixelart spriteanim fullbody loopanim 角色帧动画"
&gt;
  &lt;character
    class="ELI_93Q"
    name="Eliana"
    state="walk"
    direction="right"
    src="eliana-sprite-walk-loop.gif"
    style="rendering: pixelart; shadow: none; effects: none; decoration: none; direction: front; wing: none;"
    alt="Pixel art character walks in-place for at least two full cycles."
  /&gt;
&lt;/shot&gt;
</code></pre>
<p>Hmm, somehow it starts dancing then tries skipping LOL. Tried a few times with similar results. Let's be direct and just try something more intuitive:</p>
<blockquote>
<p>walk_cycle.gif</p>
</blockquote>
<p>The character does not show a walk cycle but just walks around.</p>
<blockquote>
<p>walk_cycle.gif</p>
<p>character at the center of camera.</p>
<p>shows walk cycle of game sprite. character is walkign towards right for 2s then stops.</p>
</blockquote>
<p>This worked somewhat.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ab907d99-4143-46dc-b73f-3d2cbabb493a.webp" alt="" style="display:block;margin:0 auto" />

<p>Another sample after loop-detection.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e6e114d3-8efd-40e1-af86-98a5e3a88bcf.webp" alt="" style="display:block;margin:0 auto" />

<p>The yield is not super great, but at least we have something working!</p>
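<p>For the record, the loop detection used here is nothing exotic. A rough sketch of the idea (my simplification, not the exact pipeline): compare downscaled frames and cut the clip between the most similar pair that is far enough apart.</p>
<pre><code class="language-python">import numpy as np
from PIL import Image, ImageSequence

def find_loop(frames, min_len=6):
    """Return (start, end) indices of the best-matching loop segment."""
    # Tiny grayscale thumbnails are enough to compare poses.
    vecs = np.array([
        np.asarray(f.convert("L").resize((32, 32)), dtype=np.float32).ravel()
        for f in frames
    ])
    best, best_cost = (0, len(frames) - 1), np.inf
    for i in range(len(frames) - min_len):
        for j in range(i + min_len, len(frames)):
            cost = np.linalg.norm(vecs[i] - vecs[j])
            if cost &lt; best_cost:
                best, best_cost = (i, j), cost
    return best

clip = Image.open("walk_raw.webp")  # hypothetical filename
frames = [f.copy() for f in ImageSequence.Iterator(clip)]
start, end = find_loop(frames)
frames[start].save("walk_loop.webp", save_all=True,
                   append_images=frames[start + 1:end], duration=100, loop=0)
</code></pre>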
<p>Now, let's move on to some other states.</p>
<hr />
<h2>Look At</h2>
<p>This one tests whether we can control the character to look up.</p>
<blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/678ec19f-70e3-4789-9497-3d7ebe7d6425.png" alt="" style="display:block;margin:0 auto" />

<p>character looks at red dot for a second, then looks at blue dot for a second then green then orange then goes back to looking straight</p>
<p>#gaze #head-movement #zoomless #static-camera #full-body #white-bg</p>
</blockquote>
<p>Tried it with different durations. It works surprisingly well, but the number of movements was limited to 3. That is, the character will look at red, then blue, then green, but skip the orange one.</p>
<p>When I increase the duration to 8, it does look in 4 different directions.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/8f32901c-652e-4f58-80cc-394638de8c88.webp" alt="" style="display:block;margin:0 auto" />

<p>But it still doesn't seem ideal. It is difficult to generate exactly the right sequence.</p>
<p>Trying out different prompt patterns.</p>
<blockquote>
<p>shot1: Character is looking straight right shot2: Character is looking towards red dot shot3: Character is looking towards blue dot shot4: Character is looking towards green dot shot5: Character is looking towards orange dot shot6: Character is looking straight right again</p>
<p>#gaze #head-movement #feet-fixed #zoomless #static-camera #full-body #white-bg</p>
</blockquote>
<p>The results are very random.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/7e9e7ce0-8d2f-41c2-8440-3f13b0504acd.webp" alt="" style="display:block;margin:0 auto" />

<p>Let's simplify this a bit.</p>
<blockquote>
<p>character stays still while loooking upwards</p>
<p>#static-camera #white-bg #game-sprite #gif</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/8c30fa84-0a5f-4e20-9d84-d06f4a50e93f.webp" alt="" style="display:block;margin:0 auto" />

<p><strong>Analysis</strong>: Just asking the character to look in a certain direction was not very successful. The pointers (colored squares) were useful, but also led to lots of superfluous generations.</p>
<p><strong>Proposal</strong>: I think the <code>LookAt</code> state needs to be handled separately. Perhaps we can use different models. Let's try using Nano Banana Pro.</p>
<blockquote>
<p>Character is lookin up towards the red dot</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4767e91c-9585-4580-98ba-614a1ad4afaa.png" alt="" style="display:block;margin:0 auto" />

<p>NBP did it in one shot. There isn't much of a transition, but at least we got a solution. Then we can clean up those pointer dots and be done.</p>
<p>Let's assume that for these states we will use NBP.</p>
<hr />
<h2>Emotion - Sadness</h2>
<p>Let's use Seedance 1 Pro.</p>
<blockquote>
<p>Character is depressed. It is welling up.</p>
<p>#static-camera #zoomless #full-body #gesture #game-asset #sprites</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/504ce0ad-8126-410e-84d3-5f4e2692f895.webp" alt="" style="display:block;margin:0 auto" />

<p>Not sure if we can use that as a state. Tried other prompts:</p>
<blockquote>
<p>Character is showing emotion "sadness"</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/21b8c1f4-e622-49b4-af14-2922de2e0ed3.webp" alt="" style="display:block;margin:0 auto" />

<p>It's dramatic, but I'm not sure it really works for my case. We also need to blur out the canvas boundaries for something like this.</p>
<blockquote>
<p>Character is sad and she squats in-place and crys her heart out.</p>
<p>#cartoony #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/33ebd0e8-102d-45ad-8d4a-4dc1535014bf.webp" alt="" style="display:block;margin:0 auto" />

<p>Processed loop-cut version:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/a6c6ac99-7682-45a5-a039-434ecd9db544.webp" alt="" style="display:block;margin:0 auto" />

<p>This works, but I really like that warm-up animation. Perhaps we need to detect the loop but also add in the initial transition.</p>
<pre><code class="language-plaintext">Transition In ------&gt; Loop ------------&gt; Transition Out
                       ^       |
                        -------
</code></pre>
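<p>In playback terms, the structure above is just "play the intro once, then cycle the loop." A minimal frame-scheduler sketch (segment boundaries are hypothetical):</p>
<pre><code class="language-python">def schedule(intro, loop, outro=None, cycles=3):
    """Yield frame indices: intro once, loop N times, optional outro."""
    yield from intro
    for _ in range(cycles):
        yield from loop
    if outro:
        yield from outro

# Example: frames 0-7 are the squat-down transition, 8-15 the crying loop.
order = list(schedule(intro=range(0, 8), loop=range(8, 16)))
</code></pre>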
<p>Let's call this done even though we haven't really worked on the reliability of generation. This imperfection is not necessarily a bad thing.</p>
<hr />
<h2>Emotion - Overjoy</h2>
<p>Same here, we are using Seedance 1 Pro.</p>
<blockquote>
<p>Character is overjoyed she is jumping up and down super excited!!!</p>
<p>#cartoony #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<p>Using the default reference image gets me:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/a3d0c7ba-3b55-46bb-a0f1-26e259ee1492.webp" alt="" style="display:block;margin:0 auto" />

<p>Works for me!</p>
<p>We are doing another experiment by expanding the canvas to allow for more movement space.</p>
<blockquote>
<p>Character is overjoyed she is jumping up and down super excited!!!</p>
<p>#cartoony #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/042f07a7-4d8e-41cd-b669-797d6011d1a8.png" alt="" style="display:block;margin:0 auto" /></blockquote>
<p>In this mode, we are using a 640x640 canvas. We get more movement.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/6757c5b9-9f1a-4f94-8538-7ef442cc118e.webp" alt="" style="display:block;margin:0 auto" />

<p><strong>Analysis</strong>: We will have to somehow figure out how to situate the character in the right spot.</p>
<p>Or, we can simply have all our generations be done in that extended canvas so that we don't have to worry about situating the characters.</p>
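<p>Situating could also be handled after the fact: with a pure white background, we can find the character's bounding box per frame and re-anchor it to a fixed baseline. A rough sketch (the threshold and baseline are guesses):</p>
<pre><code class="language-python">import numpy as np
from PIL import Image

def character_bbox(frame, white_thresh=245):
    """Bounding box (left, top, right, bottom) of non-white pixels."""
    arr = np.asarray(frame.convert("L"))
    ys, xs = np.where(arr &lt; white_thresh)  # darker than near-white = character
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

def anchor_to_baseline(frame, canvas=(640, 640), baseline=600):
    """Paste the character so its feet sit on a fixed baseline row."""
    l, t, r, b = character_bbox(frame)
    char = frame.crop((l, t, r, b))
    out = Image.new("RGB", canvas, "white")
    out.paste(char, ((canvas[0] - char.width) // 2, baseline - char.height))
    return out
</code></pre>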
<p>Let's call this state done.</p>
<hr />
<h2>State - Hunger</h2>
<blockquote>
<p>Character is super hungry. Character is squatting down, grabbing its stomach and begging for food.</p>
<p>#cartoony #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<p>Some trials:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/f8e6bf4c-a320-4e41-baf6-acc97c497b67.webp" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/1cc001d7-3f6c-461c-99bb-0d37966f280a.webp" alt="" style="display:block;margin:0 auto" />

<p><strong>Analysis</strong>: Even though the action shows the hunger, it doesn't seem enough. We can augment these animations with a message bubble that shows meat or sandwiches... For now, let's leave it be.</p>
<blockquote>
<p>Character is experssing hunger. She is pissed that you are not feeding her. She is sitting down and grabbing stomach.</p>
<p>#cartoony #exaggerated #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/59a1d184-2d6f-4699-aea6-83a8ea24d3f8.webp" alt="" style="display:block;margin:0 auto" />

<p>Extracted loop:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ee126b37-6bc0-44c1-bbf8-e135cf567e2b.webp" alt="" style="display:block;margin:0 auto" />

<p>This seems a bit better. Let's leave it at that.</p>
<hr />
<h2>Sleeping</h2>
<p>Character could be sleeping.</p>
<blockquote>
<p>Character is super sleepy. She lays on her front and takes a nap at least for 4s.</p>
<p>#cartoony #exaggerated #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<p>In this particular result, we see style drift.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ff473514-fef6-4084-a47c-710af786ba4e.webp" alt="" style="display:block;margin:0 auto" />

<p>Let's try out a few more samples.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/21b0a2dd-efbe-4ca7-b217-4951957bfbf7.webp" alt="" style="display:block;margin:0 auto" />

<p>This one was fine. Extracted loop:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/51d78089-e74b-4f75-84a1-85d75d29f718.webp" alt="" style="display:block;margin:0 auto" />

<p>One more sample: an overly dramatic one.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/fbc80c4c-8554-4842-bdcb-c713647e2713.webp" alt="" style="display:block;margin:0 auto" />

<p>Let's call this state done.</p>
<hr />
<h2>Squashed</h2>
<p>This state happens when the character is squashed by blocks. This one is probably hard, but let's try it. We start with Nano Banana Pro.</p>
<blockquote>
<p>Character is squashed by two imaginary large square blocks left and right.</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/c3608ac9-dec9-4b66-b114-95d379d4a78e.png" alt="" style="display:block;margin:0 auto" />

<p>Results weren't good. I think we need some way to show the model what these blocks are.</p>
<blockquote>
<p>Character is compressed left and right. We want dramatic picture of character getting squashed in-place without any props.</p>
<p>Keep the proportions the same and design and style the same</p>
<p>#white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/cfe49188-f7d1-4b4d-9068-9bf49ba8c026.png" alt="" style="display:block;margin:0 auto" />

<p>I think we'll give up on the squashed state and instead work on a "pain" state.</p>
<hr />
<h2>State: Getting Hit</h2>
<p>Let's try Nano Banana Pro.</p>
<blockquote>
<p>Game Sprite State: Getting Hit</p>
<p>#white-bg #no-special-effect #just-character #sprite-asset</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/a1128c55-8853-429f-a720-75c02f330a57.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/fffdc19e-fe8a-44d3-a9d0-784c37db4a12.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/355bdc44-2265-4d96-9b01-e13bd3a9e396.png" alt="" style="display:block;margin:0 auto" />

<p>Relatively reliably, we are able to generate the getting-hit state.</p>
<hr />
<h2>Ready-To-Fight</h2>
<blockquote>
<p>Character enters fighting stance (ready-to-fight) stance bare handed for 3s.</p>
<p>#cartoony #exaggerated #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/3f66ce18-f9cf-4618-b770-b1bd21bbe6b7.webp" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4c44cebf-e538-4809-a579-12b03a48f646.webp" alt="" style="display:block;margin:0 auto" />

<p>The second one seems promising. This state is good because we can now start from the ready-to-fight frame.</p>
<p>Let's also try NBP.</p>
<blockquote>
<p>Game Sprite State: REady-to-Fight pose.</p>
<p>Keep proportions and style the same #white-bg #no-special-effect #just-character #sprite-asset</p>
</blockquote>
<p>NBP's weak point is that when we generate something similar to the original image, it is often not able to translate parts of the body. For example, when trying to get a character to sit down, it just elongates the torso and has the legs sit down.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/fbd4c653-03e1-4c57-9f96-8e3ed7b8887a.png" alt="" style="display:block;margin:0 auto" />

<p>If you look at this example, the character's head stays fixed in space, so the body becomes somehow bigger to accommodate it.</p>
<p>I think we need to use Seedance 1 Pro instead and pick out the right frame.</p>
<hr />
<h2>Punching</h2>
<blockquote>
<p>SHOT 1: Character enters fighting stance (ready-to-fight) stance</p>
<p>SHOT 2: Character throws punches</p>
<p>SHOT 3: Character goes back to original state</p>
<p>#cartoony #exaggerated #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/136dffc0-9ca7-4730-bead-b2df82ccb482.webp" alt="" style="display:block;margin:0 auto" />

<p>Well, it's not working all the time, of course.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/89fd3fc8-88ba-46d7-8be3-801bd617aa2b.webp" alt="" style="display:block;margin:0 auto" />

<p><strong>Issue</strong>: We want to figure out a way to detect these incorrect generations and throw them out. I think we can map the video to latents, then match them against the reference latents to compare similarity and throw out the distant ones.</p>
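<p>A rough sketch of that filter, using plain pixel-space embeddings as a stand-in for a proper latent encoder (a learned encoder such as CLIP would be the real choice; the threshold is a guess):</p>
<pre><code class="language-python">import numpy as np
from PIL import Image, ImageSequence

def embed(img, size=(32, 32)):
    """Cheap stand-in for a latent: a normalized grayscale thumbnail."""
    v = np.asarray(img.convert("L").resize(size), dtype=np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def is_on_model(clip_path, reference_path, threshold=0.85):
    """Reject clips whose worst frame drifts too far from the reference."""
    ref = embed(Image.open(reference_path))
    frames = ImageSequence.Iterator(Image.open(clip_path))
    sims = [float(embed(f) @ ref) for f in frames]
    return min(sims) &gt;= threshold  # the worst frame decides

# Usage: keep = is_on_model("punch_candidate.webp", "reference.png")
</code></pre>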
<hr />
<h2>Kicking</h2>
<blockquote>
<p>SHOT 1: Character enters fighting stance (ready-to-fight) stance</p>
<p>SHOT 2: Character does high kick 2 times (towards right)</p>
<p>SHOT 3: Character goes back to original state</p>
<p>#cartoony #exaggerated #static-camera #zoomless #full-body #gesture #game-asset #sprites #white-bg #effectless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e8b3ef0b-7914-4d88-9a95-fde4b4ef8d7d.webp" alt="" style="display:block;margin:0 auto" />

<p>It's not easy to control, but we can get some clips like these.</p>
<hr />
<h2>Pushing</h2>
<p>For characters to impact the world, we need to support things like Pushing and Pulling. Let's see what we can do here. This is probably one of the harder ones to achieve.</p>
<blockquote>
<p>Character pushes large block towards the right.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/1813d356-64e8-46ae-9796-f9ca9cfc773c.png" alt="" style="display:block;margin:0 auto" /></blockquote>
<p>Trials</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/2f5d59d5-a74c-4965-b7f6-a1787201cd23.webp" alt="" style="display:block;margin:0 auto" />

<p>So, it somewhat works, with lots of errors of course. We also need to learn how to remove the green box.</p>
<p>Let's try multi-shot prompts.</p>
<blockquote>
<p>SHOT1: Charater approaches the large green block</p>
<p>SHOT2: Character pushes the block 2 steps to the right</p>
<p>SHOT3: Character pull the block back to where it was.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/0026e5d1-835d-4e23-a0c8-d29b10ec8324.webp" alt="" style="display:block;margin:0 auto" />

<p><strong>Analysis</strong>: I think for these actions we would benefit from providing motion keys. We could try Seedance 2 Pro.</p>
<p>An alternative is to use a kind of "magic" spell.</p>
<blockquote>
<p>Character stands and uses magic to push the block away from her.</p>
<p>Then she does another magic to pull the block back to original position</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/9eb6e61e-b767-4bb1-a63c-a9edba968256.webp" alt="" style="display:block;margin:0 auto" />

<p>Not what I was looking for.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/b8e1b493-d44f-4c1f-8b81-926513e2d45e.webp" alt="" style="display:block;margin:0 auto" />

<p>Still no. I think we need to find other models that can take in a skeletal image and translate it to animation. Let's give up on this for now.</p>
<hr />
<h2>Laughing Out Loud</h2>
<blockquote>
<p>Character laughs out loud.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/f4fd6dd2-118c-4a11-89c8-ceaf691dae79.webp" alt="" style="display:block;margin:0 auto" />

<p>Laughing out loud seems to be one of the easier ones to handle. Let's call it done.</p>
<hr />
<h2>Crafting</h2>
<p>In the world of Machi, characters need to be able to create statues and such. So, let's add an animation state for "crafting."</p>
<blockquote>
<p>Character is crafting something</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/296ff8aa-6071-45eb-8297-cb6c899c4261.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character is crafting something out of thin air</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ff845eea-3536-4a2f-8d89-1e72fa6bb3ca.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character is crafting something. Weaving it then throwing it to manifest.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/3da355c0-3727-4f22-ba5e-1e4776cb4be4.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character is crafting something. Creating a mold then molding it into something then throwing it to manifest.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/6306d90a-3bc1-44a2-bd36-ac5a2049e62e.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character is acting as if it is doing magic</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/c2d1a424-4ee7-491f-93bd-fc11d19dd6f6.webp" alt="" style="display:block;margin:0 auto" />

<p>Okay, maybe this will do.</p>
<p>Not sure how to control this kind of thing, but let's worry about it later.</p>
<hr />
<h2>Death</h2>
<p>Not sure if we will have death in Machi but let's model it regardless.</p>
<blockquote>
<p>Character death sequence</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/671a88ab-fa3b-48fb-a242-5a1baeb66190.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character death sequence and disappears</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e8d8256f-9113-4a70-9c94-899c9cd86072.webp" alt="" style="display:block;margin:0 auto" />

<p>Yeah, not sure. It's a little hard.</p>
<blockquote>
<p>Character death sprite-animation sequence</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<p>I think death can be more of a special effect on the freeze frame than an animation.</p>
<p>Or we can try to freeze the character so that it becomes immobile until external help arrives.</p>
<blockquote>
<p>Character freezes into a solid crystal block.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d2fa90b5-b6fb-4d5a-9244-71ea978bc952.webp" alt="" style="display:block;margin:0 auto" />

<p>Not what I had in mind.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/b7d92e1d-92f1-464f-bce8-f5f44e260f0f.webp" alt="" style="display:block;margin:0 auto" />

<blockquote>
<p>Character INSTANTLY freezes into a solid crystal block. Then shatters.</p>
<p>#no-zoom #camara-fixed #sprite-asset #game-asset #white-bg #shadowless</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/5f1c5b43-a632-4001-9d05-19579c839b39.webp" alt="" style="display:block;margin:0 auto" />

<p>Also trying NBP:</p>
<blockquote>
<p>Character frozen in crystal state. It is when character dies it gets frozen into a block.</p>
<p>Keep proportions and style the same #white-bg #no-special-effect #just-character #sprite-asset</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/21680c5f-d538-47a4-a6fa-8b3eded7b083.png" alt="" style="display:block;margin:0 auto" />

<p>We could use one of these states as an immobilized state, which could work. I think I like the NBP solution better here.</p>
<p>Or again, just adding some special effect or sprite overlay to show that it is immobilized would be enough.</p>
<hr />
<h2>Summary</h2>
<p>We experimented with expanding the state space. The yield is not great, but still, in about 50-70% of cases we were able to find a good solution.</p>
<p>I think we should automate and solidify some of these generations and put them into SpriteDX.</p>
<p><strong>Next Steps</strong>:</p>
<ul>
<li><p>I want to experiment with Seedance 2 Pro. First, get access to it, then try providing a 3D capsule-man video and asking the model to produce the animation.</p>
</li>
<li><p>Alternatively, I can start compiling all these animations into states. It will be done manually first, then we will figure out a way to automatically compile them in SpriteDX.</p>
</li>
<li><p>We also need a way to extract a rig from the videos so that we can position the characters. It would help us build correct hitboxes as well.</p>
</li>
<li><p>The best thing, though, would be a real-time model that can skin a stick figure that I provide. That would be the ideal situation.</p>
</li>
</ul>
<p>🐛 Sprited Dev</p>
]]></content:encoded></item><item><title><![CDATA[Digital Being — Roadmap]]></title><description><![CDATA[In the previous post, we stepped back from the idea of growing a humanoid pixel organism from a single cell. While the vision remains compelling, generating and evolving a capsule-based character from]]></description><link>https://blog.sprited.ai/digital-being-roadmap</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-roadmap</guid><category><![CDATA[digital-being]]></category><category><![CDATA[sprited]]></category><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:41:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/bf6a9c8f-6a34-4e48-8cbb-434eb4353796.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the <a href="https://blog.sprited.ai/from-sprites-to-souls-designing-the-sprited-digital-being">previous post</a>, we stepped back from the idea of growing a humanoid pixel organism from a single cell. While the vision remains compelling, generating and evolving a capsule-based character from nothing introduces a number of high-risk assumptions. Rather than tackling all of that at once, we’re shifting to an iterative approach—building what we can confidently execute today, and deferring growth simulation to future iterations.</p>
<h2>Roadmap</h2>
<ul>
<li><p><strong>Type 1 — SpriteDX (Completed)</strong><br />Fully animated, controllable vessels generated through a one-click pipeline. SpriteDX produces not just a reference character, but a complete set of animations (idle, run, jump, greet, etc.), making the character immediately usable in an interactive setting. However, these characters have no agency—they simply respond to external commands.</p>
</li>
<li><p><strong>Type 2 — Agent (Current Focus)</strong><br />We introduce agency. Characters begin to act on their own—moving autonomously, reacting to context, and eventually speaking through LLM integration. This phase also expands the animation state space to support richer behavior. A key assumption here is that a single reference image acts as the character's genome. Rather than explicitly modeling attributes like limb proportions or personality traits, we assume these are implicitly encoded in the image itself. The system interprets and expresses those traits through behavior and animation.</p>
</li>
<li><p><strong>Type 3 — Sum of Parts (Future)</strong><br />Instead of treating the character as a monolithic entity, we decompose it into parts—head, hands, torso, etc. This enables more dynamic interaction and a broader range of motion. However, it introduces significant challenges: layering clothing, modeling soft features like cheeks, generating consistent rigged animations, and resolving rig mismatches. We expect to revisit this once Type 2 provides a stronger foundation.</p>
</li>
<li><p><strong>Type 4 — Growth (Future)</strong><br />The original vision: growing a complete organism from a single cell using local rules. This remains a long-term goal, to be explored once the earlier stages are stable and better understood.</p>
</li>
</ul>
<p>— Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[From Sprites to Souls: Designing the Sprited Digital Being]]></title><description><![CDATA[Sprited started as a sprite generator. You give it a character prompt and some reference images, and it spits out an animated character spritesheet — idle, run, greet — ready to drop into Unity. Clean]]></description><link>https://blog.sprited.ai/from-sprites-to-souls-designing-the-sprited-digital-being</link><guid isPermaLink="true">https://blog.sprited.ai/from-sprites-to-souls-designing-the-sprited-digital-being</guid><category><![CDATA[digital-being]]></category><category><![CDATA[artificial life]]></category><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Tue, 07 Apr 2026 01:09:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/cbfa6462-24c9-4798-9407-0ffcd8845601.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Sprited started as a sprite generator. You give it a character prompt and some reference images, and it spits out an animated character spritesheet — idle, run, greet — ready to drop into Unity. Clean pipeline. Useful product.</p>
<p>But we paused it.</p>
<p>Not because it didn't work. It worked fine. We paused it because we realized we were building the wrong thing. Asset generation pipelines are everywhere, and more are coming. Competing on sprite quality alone is a weakening position. The real question we kept returning to was bigger:</p>
<p><strong>What if the character wasn't just an asset? What if it was a being?</strong></p>
<p>That question is what Sprited is actually about. We're a digital being company. This post is about what that means in practice — the system we're designing, the decisions we made, and why.</p>
<hr />
<h2>What Makes Something a Being and Not a Sprite?</h2>
<p>A sprite plays animations. A being <em>behaves</em>.</p>
<p>The difference is whether the entity has an internal state that drives its actions — something that notices the world, reacts to it, expresses something about its experience of it. A sprite is a lookup table. A being is a participant.</p>
<p>That's the line we're trying to cross.</p>
<hr />
<h2>The Genome Insight</h2>
<p>The first design decision that unlocked everything else was this:</p>
<blockquote>
<p>The reference image <strong>is</strong> the genome.</p>
</blockquote>
<p>Every attribute of a being — body proportions, skin tone, hair, clothing, accessories, style — is implicitly encoded in its reference image. We don't maintain a separate attribute graph or formal genome. Every downstream operation (aging, generating new animations, special effects) is a query against the reference image. The generative model predicts the answer.</p>
<p>This sounds simple but it has a large practical consequence: the representation stays unified forever. A rigged skeletal system as the source of truth means special effects have to be formally attached to specific bones, new characters need manual rigging, and every system downstream depends on the rig staying consistent. A reference image as the source of truth means a magical special effect is just a prompt: <em>"given this character, generate what it looks like casting a spell."</em> The model handles proportions, palette, and style implicitly.</p>
<p>We do store a high-resolution master reference (512×512 or higher) to preserve fine details — frills, buttons, facial features. The runtime character is derived from this master. But the master is the truth, and everything else flows from it.</p>
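<p>To make "query against the reference image" concrete, here is a hedged sketch. <code>generate_image</code> is a hypothetical stand-in for whatever image model backs the system; the point is that the interface is nothing more than reference image plus instruction:</p>
<pre><code class="language-python">from dataclasses import dataclass

@dataclass
class Being:
    name: str
    reference_png: bytes  # the master reference image *is* the genome

def generate_image(image: bytes, prompt: str) -&gt; bytes:
    """Hypothetical stand-in for the image-model call."""
    raise NotImplementedError

def query_genome(being: Being, instruction: str) -&gt; bytes:
    """Every derived artifact is just reference image + instruction."""
    prompt = (
        f"Given this character, {instruction}. "
        "Keep proportions, palette, and style the same."
    )
    return generate_image(image=being.reference_png, prompt=prompt)

# e.g. query_genome(being, "generate what it looks like casting a spell")
</code></pre>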
<hr />
<h2>Why not choose a more parametric function class?</h2>
<p>A natural alternative to the reference-image-as-genome approach is to represent a being using a structured, parametric model — for example, a rigged skeleton, a set of explicit attributes, or a continuous function class such as SDFs, RBFs, or other geometric bases. These approaches offer control, interpretability, and the ability to interpolate smoothly across states. However, they introduce a significant burden: every new visual feature must be explicitly modeled, parameterized, and maintained. This quickly leads to schema explosion, where the representation becomes increasingly complex and brittle as the space of possible appearances grows.</p>
<p>In contrast, the reference image collapses all of this complexity into a single, unified representation. It implicitly encodes shape, texture, style, and fine-grained details without requiring a predefined structure. Instead of engineering a complete parametric space upfront, we delegate that responsibility to generative models, which can interpret and transform the reference image as needed. This allows the system to remain flexible and future-proof, supporting attributes and variations that were not anticipated at design time.</p>
<p>More importantly, parametric models optimize for control, while our goal is believability. A highly controllable system is not necessarily a more expressive or convincing one, especially at the 64×64 resolution where fine parametric precision is often lost. By grounding identity in an image and generating derived artifacts from it, we prioritize visual coherence and expressive richness over strict controllability. This is a deliberate tradeoff: we give up some explicit structure in exchange for simplicity, scalability, and the ability to leverage generative models as a powerful, general-purpose transformation engine.</p>
<hr />
<h2>Why not just do PCA or learn a parametric space from rigged characters?</h2>
<p>Approaches like Principal Component Analysis over rigged characters are a valid direction, but they belong to a <strong>future iteration path</strong>, not the current Type-2 system. They optimize for a more structured and efficient representation of identity — a shared parameter space, smoother interpolation, and potentially better compression. However, achieving that requires significant upfront work: dataset curation, alignment, consistent topology, and maintaining a stable parametric basis as the space of characters evolves.</p>
<p>For Type-2, this level of structure is not necessary to achieve the core goal. The reference-image-as-genome approach already gets us <strong>most of the way there (~90%)</strong> with far less engineering overhead. It allows us to generate consistent identity, derive animation artifacts, and support variation without committing to a rigid schema. While it is less efficient and less explicitly controllable than a learned parametric space, it is dramatically simpler, more flexible, and immediately usable.</p>
<p>The key point is that Type-2 is primarily concerned with <strong>behavior over time</strong>, not optimal representation. Introducing a parametric model at this stage would shift focus toward solving identity representation in a more “correct” way, without materially improving the ability to make beings feel alive. That is a tradeoff we do not need to make yet.</p>
<p>In the future, a learned parametric space could replace or augment the reference image once the system’s needs are better understood and the benefits justify the added complexity. For now, the reference image serves as a pragmatic and powerful foundation — it is sufficient, extensible, and lets us move forward without over-engineering the problem.</p>
<hr />
<h2>64×64 Is Not a Limitation</h2>
<p>Every Sprited being lives at <strong>64×64 pixels</strong>. Pixel art.</p>
<p>This is the most load-bearing decision in the entire system, and it's worth explaining why it's a feature rather than a compromise.</p>
<p>At 64×64, the neural rendering problem disappears. Runtime is sprite playback — the game engine picks a frame and displays it. No neural network, no compositing, no generation at runtime. Everything is pre-baked. This makes the runtime fast, deterministic, and shippable.</p>
<p>The 64×64 constraint also enforces abstraction. It's physically impossible to obsess over cheekbone detail when you have a 6×6 pixel face. The constraint keeps the system focused on what actually matters for a being: <em>behavior</em>, not geometry.</p>
<p>And pixel art has a long tradition of being expressive within constraints. The best pixel artists work with 16 colors and produce more emotionally resonant work than someone with unlimited resolution who never develops taste for selection. We're applying that philosophy to systems design.</p>
<hr />
<h2>The System</h2>
<p>The Sprited digital being is four things working together.</p>
<h3>Identity Layer (Offline)</h3>
<p>The character's reference image is generated once, offline. From it, we generate the full animation library — every state the being will ever need. We use <strong>Seedance 1 Pro</strong> for this, prompting it with natural language descriptions of the action and emotion we want. <code>"overjoyed and jumping up and down"</code> produces an animation. <code>"nervous fidgeting while standing"</code> produces an animation. Each one is stored as a sprite sheet.</p>
<p>This is also where we detect character capabilities. Does this being have wings? Then fly animations are generated. No wings? Fly states don't exist for this character. The reference image determines what's possible.</p>
<h3>Animation Library</h3>
<p>The library is fixed, finite, and high quality. We don't aim for coverage — we aim for excellence within a defined vocabulary. Every animation state that ships has to be excellent. Mediocre states don't ship.</p>
<p>Emotions live here too, as full-body animation states — not as a separate face layer. Generating isolated face crops that stay consistent with the body is a fragile pipeline, and compositing them back cleanly at 64×64 is more trouble than it's worth. A Seedance-generated full-body emotional animation captures face, body, and energy together, consistently, in a single sprite sheet. It's simpler and it looks better.</p>
<p>The animation vocabulary covers locomotion, emotion, reaction, social behavior, and world interaction. Adding states later is a prompting task, not an engineering task.</p>
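<p>As data, the vocabulary could be as plain as the sketch below (state names, prompts, and the wing rule are illustrative, not our actual schema):</p>
<pre><code class="language-python"># Illustrative vocabulary: each state is a prompt, gated by capabilities.
ANIMATION_VOCABULARY = {
    "idle":    {"prompt": "character stands idle, breathing softly"},
    "run":     {"prompt": "character runs in place, two full cycles"},
    "overjoy": {"prompt": "character jumps up and down, super excited"},
    "sad":     {"prompt": "character squats in place and cries"},
    "fly":     {"prompt": "character hovers, wings beating",
                "requires": {"wings"}},
}

def states_for(capabilities: set) -&gt; list:
    """Only generate states the reference image supports: no wings, no fly."""
    return [name for name, spec in ANIMATION_VOCABULARY.items()
            if spec.get("requires", set()) &lt;= capabilities]

# states_for({"wings"}) includes "fly"; states_for(set()) does not.
</code></pre>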
<h3>Soul Model (Runtime)</h3>
<p>The soul model is a small transformer — target 10–50M parameters — that runs in parallel with the LLM at inference time. It is Sprited's primary proprietary model.</p>
<p>It doesn't generate pixels. It selects and sequences pre-generated animations. Given inputs from the LLM (emotional context, what the being is thinking), the world simulation (what's happening nearby), and the being's current state, it outputs:</p>
<ul>
<li><p>Which animation to play next</p>
</li>
<li><p>Which head orientation variant to show</p>
</li>
<li><p>How the canvas should move (drift direction, bob rhythm, urgency)</p>
</li>
<li><p>Where to transition next and when</p>
</li>
<li><p>Propulsion intent (locomotion type, direction, target)</p>
</li>
</ul>
<p>The soul model is what makes the character feel <em>present</em>. A being that notices a tree fall three cells away and snaps its head toward the event before its body reacts — that's the soul model at work. Small behaviors, high signal.</p>
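<p>As a sketch, one decision step from the soul model could be as small as this (field names are illustrative):</p>
<pre><code class="language-python">from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SoulOutput:
    """One decision step from the soul model (illustrative fields)."""
    animation: str                    # library state to play next, e.g. "notice"
    head_variant: str                 # which head-orientation variant to show
    canvas_motion: Tuple[float, ...]  # (drift_dx, drift_dy, bob_hz, urgency)
    next_transition: float            # seconds until the next decision point
    propulsion: Optional[Tuple]       # (locomotion, direction, target) or None
</code></pre>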
<h3>Propulsion System</h3>
<p>Beings move through the world. Movement is simple by design: world speed is hard-coded per locomotion type, and animation playback frame rate adjusts to match. Speed is always the constant, frame rate is the variable.</p>
<table>
<thead>
<tr>
<th>State</th>
<th>World Speed</th>
</tr>
</thead>
<tbody><tr>
<td>Walk</td>
<td>1 cell/tick</td>
</tr>
<tr>
<td>Run</td>
<td>2 cells/tick</td>
</tr>
<tr>
<td>Fly</td>
<td>3 cells/tick</td>
</tr>
<tr>
<td>Push / Pull</td>
<td>0.5 cells/tick</td>
</tr>
</tbody></table>
<p>The soul model declares intent — <em>move right, run, toward that resource</em>. The world simulation resolves it into actual physics. The being doesn't need to know how many pixels per tick it moves.</p>
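<p>Since speed is the constant and frame rate the variable, playback speed falls out of one line of arithmetic. A sketch (tick rate and cells-per-cycle are made-up numbers):</p>
<pre><code class="language-python">WORLD_SPEED = {"walk": 1.0, "run": 2.0, "fly": 3.0, "push": 0.5}  # cells/tick

def playback_fps(state, frames_per_cycle, cells_per_cycle, ticks_per_second=10.0):
    """Frame rate that makes foot motion match the hard-coded world speed."""
    cells_per_second = WORLD_SPEED[state] * ticks_per_second
    cycles_per_second = cells_per_second / cells_per_cycle
    return cycles_per_second * frames_per_cycle

# An 8-frame walk cycle covering 2 cells plays at 40 fps at walk speed,
# and the same cycle reused for "run" plays at 80 fps.
</code></pre>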
<hr />
<h2>The World: Machi</h2>
<p>Our beings don't exist in a void. They live in <strong>Machi</strong> — a 2D pixel sandbox world where each cell stores material state, and the world runs its own physics. Trees grow from single pixels based on light and available space. Materials interact. The world is a simulation.</p>
<p>Beings are native to this world, not imported assets dropped into it. The soul model receives world state as a conditioning input — what's nearby, what's happening, what resources exist. Beings react to their environment as participants, not performers.</p>
<p>When a new being enters Machi, it doesn't appear — it <em>grows</em>. The birth sequence is a pre-generated animation: a single seed pixel growing into the full character. The same visual metaphor as the trees. The world is consistent.</p>
<hr />
<h2>Age and Lifecycle</h2>
<p>Because the reference image is the genome, aging is a conditioned query: <em>given this reference image, what does this being look like at age X?</em> We don't model biology explicitly. The generative model's understanding of how bodies age does the work implicitly.</p>
<p>Beings progress through discrete lifecycle stages — baby, child, adult, elder. Each stage gets its own reference image and animation library, generated on-demand when the being reaches that stage. An elder being has a richer behavioral vocabulary than a baby. Age becomes a progression system almost for free.</p>
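<p>A sketch of that progression, with each stage's reference generated once on first entry (the age cutoffs and the aging call are invented placeholders):</p>
<pre><code class="language-python">STAGES = [("baby", 0), ("child", 4), ("adult", 16), ("elder", 60)]  # cutoffs invented

def stage_for(age: float) -&gt; str:
    """Map a continuous age onto a discrete lifecycle stage."""
    current = STAGES[0][0]
    for name, min_age in STAGES:
        if age &gt;= min_age:
            current = name
    return current

_stage_cache = {}  # (being_id, stage) -&gt; reference image bytes

def stage_reference(being_id: str, age: float) -&gt; bytes:
    """Generate each stage's reference image on demand, then reuse it."""
    key = (being_id, stage_for(age))
    if key not in _stage_cache:
        _stage_cache[key] = age_reference(*key)  # hypothetical conditioned query
    return _stage_cache[key]

def age_reference(being_id: str, stage: str) -&gt; bytes:
    """Hypothetical stand-in: 'what does this being look like at stage X?'"""
    raise NotImplementedError
</code></pre>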
<hr />
<h2>Design Decisions We Made Deliberately</h2>
<p>A few things we chose not to build, and why.</p>
<p><strong>No real-time motion generation.</strong> The animation library is pre-generated offline. At runtime, the soul model selects from it. Trying to generate motion in real-time adds three failure points and no meaningful benefit at 64×64.</p>
<p><strong>No formal rigging.</strong> The reference-image-as-genome approach makes rigging unnecessary for our current scope. Rigging would make every downstream system tightly coupled to it.</p>
<p><strong>No 3D.</strong> We build 2D until we have a 3D world to inhabit. Building 3D capability before a 3D world exists is engineering without a product reason.</p>
<p><strong>No continuous aging simulation.</strong> Discrete lifecycle stages achieve the same product outcome at dramatically lower complexity.</p>
<hr />
<h2>What We're Building Toward</h2>
<p>The bet Sprited is making is that <strong>behavior feels more alive than fidelity.</strong></p>
<p>A 64×64 pixel character that notices things, orients toward them, reacts to world events, grows old, and has a distinct behavioral personality — that feels like a being. A 4K rendered character playing scripted animations does not.</p>
<p>The soul model is that bet made concrete. Everything else in the system exists to give it something meaningful to select from.</p>
<p>We're early. But this is the direction.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/008adbf2-77a9-47fe-a3d7-07838795bf59.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Next Steps</h2>
<p><strong>Next</strong>, the goal is to build a <strong>first working digital being</strong> to validate that the system actually produces <strong>a sense of life</strong>. This starts by expanding the current minimal animation set (idle, greet, run) into a small but expressive vocabulary of around 10–12 high-quality states, including basic locomotion, attention (look, notice), simple cognitive pauses, and light emotional variations. For a single character, multiple candidates should be generated for each state and only the best ones kept, forming a clean, finite animation library. These are then packaged together with a reference image into a complete identity. A simple controller—initially rule-based or lightly guided by an LLM—should drive state transitions over time, selecting animations based on basic context and timing. The system is successful if, when observed over a short period, the character no longer feels like a looping sprite but instead appears to react, attend, and behave in a way that suggests life. This prototype will confirm that <strong>expressiveness</strong> can emerge from curated states and intelligent selection, <strong>without requiring real-time generation</strong>.</p>
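<p>That first controller really can be trivial. A sketch of a rule-based version with minimum hold times and weighted transitions (states, weights, and timings are placeholders):</p>
<pre><code class="language-python">import random
import time

# Placeholder transition table: state -&gt; [(next_state, weight), ...]
TRANSITIONS = {
    "idle":   [("idle", 5), ("look", 2), ("fidget", 2), ("walk", 1)],
    "look":   [("idle", 1)],
    "fidget": [("idle", 1)],
    "walk":   [("idle", 1), ("run", 1)],
    "run":    [("idle", 1)],
}
MIN_HOLD = {"idle": 2.0, "look": 1.0, "fidget": 1.5, "walk": 3.0, "run": 2.0}

def run_controller(play, seconds=30.0):
    """Drive state transitions over time; `play` swaps the active animation."""
    state, deadline, start = "idle", 0.0, time.monotonic()
    while time.monotonic() - start &lt; seconds:
        now = time.monotonic()
        if now &gt;= deadline:
            names, weights = zip(*TRANSITIONS[state])
            state = random.choices(names, weights)[0]
            deadline = now + MIN_HOLD[state]
            play(state)
        time.sleep(0.05)

# run_controller(play=print)  # dry run: prints the state sequence
</code></pre>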
<hr />
<p><em>Sprited is building digital beings — entities that simulate life in pixel worlds. If this resonates, follow along.</em></p>
<p>-- Pixel and Sprited Dev</p>
]]></content:encoded></item><item><title><![CDATA[Digital Being - Modeling the Head 1]]></title><description><![CDATA[Today, we tried to model the head using the stick-figure-stroke-width approach discussed in Anatomy v1 document (https://blog.sprited.ai/digital-being-anatomy-v1).
Trial 1 - Stroke-Width Approach
Resu]]></description><link>https://blog.sprited.ai/digital-being-modeling-the-head-1</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-modeling-the-head-1</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Sun, 05 Apr 2026 04:21:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e557a479-721f-466a-a036-53c740bbc2b6.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we tried to model the head using the stick-figure-stroke-width approach discussed in Anatomy v1 document (<a href="https://blog.sprited.ai/digital-being-anatomy-v1">https://blog.sprited.ai/digital-being-anatomy-v1</a>).</p>
<h2>Trial 1 - Stroke-Width Approach</h2>
<h3>Results</h3>
<p>The resulting head looks something like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ba1d6c44-bb56-433c-a413-45563aa00a62.png" alt="" style="display:block;margin:0 auto" />

<p>Here is the definition used.</p>
<pre><code class="language-plaintext">being:
  name: being1
  version: 0.0.1
  description: Head only organism for testing purpose.
  texture:
    size: 16
    patchSize: 16
    patches:
      - name: head
        cell_type: |-
          ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 1,  ,  ,  ,  ,  ,  ,

        stroke_radius: |-
          ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 2,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 5,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6.7,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 7,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 7,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 7,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 7,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6.9,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6.74,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6.5,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 6.2,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 5.8,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 5,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 2,  ,  ,  ,  ,  ,  ,

        stroke_z: |-
          ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  , 0,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-0.05,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-0.15,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-0.35,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-0.61,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-1.00,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-1.77,  ,  ,  ,  ,  ,  ,
          ,  ,  ,  ,  ,  ,  ,  ,-4.05,  ,  ,  ,  ,  ,  ,
</code></pre>
<ul>
<li><p>Cell Type is 1 when a stroke node is present.</p>
</li>
<li><p>Stroke Radius denotes the distance from the centroid to the skin.</p>
</li>
<li><p>Stroke Z denotes the offset in the z-dimension. A sketch of how a renderer might interpret these layers is shown after this list.</p>
</li>
</ul>
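<p>To make the interpretation concrete, here is a minimal Python sketch that rasterizes a front-view silhouette from the Cell Type and Stroke Radius layers (Stroke Z only offsets depth, so it does not affect a front view). The helper names, canvas size, and cell-to-pixel scale are illustrative assumptions, not part of the definition above.</p>
<pre><code class="language-python"># Sketch: rasterize a front-view silhouette from the layers above.
import numpy as np

def parse_grid(text):
    """Parse a comma-separated grid into {(row, col): value}."""
    cells = {}
    for r, line in enumerate(text.splitlines()):
        for c, tok in enumerate(line.split(",")):
            tok = tok.strip()
            if tok:
                cells[(r, c)] = float(tok)
    return cells

def front_silhouette(cell_type, stroke_radius, size=64, scale=4.0):
    """Stamp a disc at each stroke node (Cell Type == 1); the disc
    radius comes from the Stroke Radius layer."""
    img = np.zeros((size, size), dtype=bool)
    ys, xs = np.mgrid[0:size, 0:size]
    for (r, c), t in cell_type.items():
        if t != 1:
            continue
        cy, cx = r * scale, c * scale
        rad = stroke_radius.get((r, c), 0.0) * scale
        img |= (ys - cy) ** 2 + (xs - cx) ** 2 <= rad ** 2
    return img
</code></pre>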
<h3>Analysis</h3>
<p><strong>Hard to Model Cheeks</strong>: The stroke-width approach produces roughly the desired shape, but one of its main problems is that it ignores many curves.</p>
<p>While the stroke-width approach provides an easy way to grow an organism from a single pixel, it does not produce the curvatures we desire, such as cheeks.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/364d0941-083d-4835-a093-22c6b903ed30.png" alt="" style="display:block;margin:0 auto" />

<p>We can try to fit it better to produce cheeks, but doing so will likely cause other regions to fit poorly.</p>
<p>One possible fix is to add a "z-thickness" channel that allows some extrusion of the face surface.</p>
<p><strong>Noses</strong>: Noses will be even more difficult to model, because every cross-section is modeled as a circle. Unless we allow voxel strokes, this seems to be a hard problem to solve.</p>
<p>One potential solution is to model the nose not as part of the face but as a separate node attached on top of it, like Mr. Potato Head in Toy Story.</p>
<p><strong>Same width looking from the side and from the front</strong>: Another issue is that the face is very spherical: the side view has about the same head width as the front view. In a real head, the front view should be narrower.</p>
<p>One possible fix is to add an aspect-ratio parameter for the head so that each cross-section becomes an ellipse rather than a circle; see the sketch below.</p>
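<p>A hedged sketch of that fix, where the stored stroke radius applies along the depth axis and a hypothetical <code>aspect</code> parameter narrows the front view:</p>
<pre><code class="language-python"># Sketch of the aspect-ratio fix: cross-sections become ellipses.
# `aspect` is a hypothetical parameter; aspect = 1.0 recovers the circle.
def inside_cross_section(dx, dz, radius, aspect=0.8):
    """Test whether a point offset (dx, dz) from the centroid lies
    inside the cross-section; dx spans the front view, dz the depth."""
    return (dx / (radius * aspect)) ** 2 + (dz / radius) ** 2 <= 1.0
</code></pre>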
<p><strong>What does it mean for this approach?</strong> I would like to think that this approach may still work for a primarily pixel-art character. In the pixel-art case, we want to render this mesh into 16x16 pixels, so as long as the projection fits at the signature angles, we shouldn't notice too many issues.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/cfb6aa33-b0e9-4a2f-812b-6e07462da5c9.png" alt="" style="display:block;margin:0 auto" />

<h2>Alternative: Spherical Displacement Map</h2>
<p>A fun alternative is to apply a displacement map on top of a sphere. This could go against the idea of growing the organism from a single pixel, but it may not be a bad idea to add "subdivide" as one of the actions the agent can perform to evolve itself. If we treat every part as a sphere, this provides a very flexible solution that can fit various curves.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e44daa69-5863-4600-ae3c-d6968ca0af5d.png" alt="" style="display:block;margin:0 auto" />

<p>Creating such a displacement map out of the blue will be tricky. If I pursue this, I will need a sample head model, then fit the displacement map to that model to produce a reference. Later, we will have to learn a sequence of simple steps (a permutation of displace and subdivide steps) that generates it. The sketch below shows the core displacement operation.</p>
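<p>A minimal sketch of the displacement idea, assuming the map is a 2D array in [-1, 1] indexed by latitude and longitude (the map layout, resolution, and <code>strength</code> parameter are assumptions; the subdivide action is not shown):</p>
<pre><code class="language-python"># Sketch: push unit-sphere vertices along their normals by a value
# sampled from a (lat, lon) displacement map.
import numpy as np

def displaced_sphere(disp_map, n_lat=32, n_lon=64, strength=0.3):
    """disp_map: 2D array of values in [-1, 1]."""
    verts = []
    for i in range(n_lat):
        theta = np.pi * (i + 0.5) / n_lat          # polar angle
        for j in range(n_lon):
            phi = 2 * np.pi * j / n_lon            # azimuth
            normal = np.array([np.sin(theta) * np.cos(phi),
                               np.cos(theta),
                               np.sin(theta) * np.sin(phi)])
            d = disp_map[i * disp_map.shape[0] // n_lat,
                         j * disp_map.shape[1] // n_lon]
            verts.append(normal * (1.0 + strength * d))  # offset along normal
    return np.array(verts)
</code></pre>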
<p>That's it for today. I ate a bad cake and my stomach isn't feeling well.</p>
<p>-- Sprited Dev</p>
]]></content:encoded></item><item><title><![CDATA[[WIP] Digital Being - Texture v1]]></title><description><![CDATA[In this document, we make the high-level pitch of Digital Being Anatomy v1 (https://blog.sprited.ai/digital-being-anatomy-v1) concrete by translating it into an implementable system.
We define the cor]]></description><link>https://blog.sprited.ai/wip-digital-being-texture-v1</link><guid isPermaLink="true">https://blog.sprited.ai/wip-digital-being-texture-v1</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Thu, 02 Apr 2026 00:07:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/0a0b4f8e-d369-4de0-8d46-ee0d7cc55db2.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this document, we make the high-level pitch of <em>Digital Being Anatomy v1</em> (<a href="https://blog.sprited.ai/digital-being-anatomy-v1">https://blog.sprited.ai/digital-being-anatomy-v1</a>) concrete by translating it into an implementable system.</p>
<p>We define the core data representations, encoding schemes, and rendering pipeline required to construct a digital being from first principles.</p>
<p><strong>Requirements</strong>: At a high level, we need to be able to construct a visually convincing image of a character given a set of textures that describe the character's internals.</p>
<p><strong>Constraints</strong>: The character starts as a single pixel and eventually becomes a fully humanoid character using only local rules within a shader.</p>
<p><strong>The Atlas</strong>: The atlas is a set of 128x128 texture maps that describe the character. It contains 64 patches of 16x16, and each patch defines one part of the character.</p>
<p><strong>Patch Naming Scheme</strong>: Patch 0 is the top-left 16x16 patch. Patch 1 is the one below it. Patch 8 is the one to the right of Patch 0. In other words, patches are numbered column-major, eight per column.</p>
<p><strong>Reserved cells in Patches</strong>: To make centering easier, we reserve the 0th row and 0th column for future use. All growth happens within the rest of the grid. For example, the center of a 16x16 patch is its 8th row and 8th column. The sketch below spells out this arithmetic.</p>
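<p>As a sanity check on the scheme above, a small sketch mapping a patch index to its pixel origin and usable center (the function names are illustrative):</p>
<pre><code class="language-python"># Patch index -> pixel origin in the 128x128 atlas.
# Patch 1 sits below Patch 0 and Patch 8 sits to its right,
# so indices run column-major with 8 patches (of 16 px) per column.
PATCH = 16
PER_COL = 128 // PATCH  # 8

def patch_origin(index):
    col, row = divmod(index, PER_COL)
    return col * PATCH, row * PATCH  # (x, y) of the patch's top-left pixel

def patch_center(index):
    # Row 0 and column 0 are reserved, so the usable center of a
    # 16x16 patch is its 8th row and 8th column.
    x, y = patch_origin(index)
    return x + 8, y + 8

assert patch_origin(0) == (0, 0)
assert patch_origin(1) == (0, 16)   # below Patch 0
assert patch_origin(8) == (16, 0)   # to the right of Patch 0
</code></pre>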
<p><strong>Patch 0</strong>: The first patch is reserved for metadata and isn't used in v1.</p>
<p><strong>Patch 1</strong>: Contains the head cells. In particular, it contains a vertical line segment where each cell in the line carries a thickness value. For simplicity, we use a 4x4 patch in this document.</p>
<pre><code class="language-plaintext">SOCKET_ID (uint 5bit)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      |      |      |
 2 |      |      |    1 |      |
 3 |      |      |      |      |

SOCKET_Z (float)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      |      |      |
 2 |      |      |  5.0 |      |   // +5.0 z-direction
 3 |      |      |      |      |

SOCKET_DZ (float)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      |      |      |
 2 |      |      |  0.6 |      |
 3 |      |      |      |      |

CELL TYPE (uint 3bit)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      | CENT |      |   // i.e. CENTROID
 2 |      |      | CENT |      |
 3 |      |      |      |      |

DIR (uint 5bit)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      |   ↑  |      |
 2 |      |      |   ↑  |      |
 3 |      |      |      |      |

DISPLACEMENT (f16)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      | 0.00 |      |
 2 |      |      | 0.10 |      |
 3 |      |      |      |      |

THICKNESS (f16)
       0      1      2      3 
 0 |      |      |      |      |
 1 |      |      | 1.50 |      |
 2 |      |      | 1.40 |      |
 3 |      |      |      |      |
</code></pre>
<ul>
<li><p>The Cell Type layer describes the type of cell in each slot.</p>
</li>
<li><p>The Direction layer keeps the current direction of the cell.</p>
</li>
<li><p>The Displacement (depth) layer tracks how far the CENTROID cell is depressed away from the camera.</p>
</li>
<li><p>The THICKNESS layer defines how much thickness to render, in pixels.</p>
</li>
<li><p>The SOCKET layers give the 3D location of each socket. One plausible bit-packing for the small integer layers is sketched after this list.</p>
</li>
</ul>
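<p>Since SOCKET_ID (5 bits), CELL TYPE (3 bits), and DIR (5 bits) together need only 13 bits, they could share one 16-bit channel. This is a hedged sketch of one plausible packing; the bit layout is an assumption, not part of the spec, and the float layers (SOCKET_Z, SOCKET_DZ, DISPLACEMENT, THICKNESS) would stay in f16 channels.</p>
<pre><code class="language-python"># One plausible packing: CELL_TYPE (3 bits) | DIR (5 bits) | SOCKET_ID (5 bits)
# in a single 16-bit word, written with plain arithmetic for clarity.

def pack_cell(cell_type, direction, socket_id):
    assert 0 <= cell_type < 8 and 0 <= direction < 32 and 0 <= socket_id < 32
    return cell_type * 1024 + direction * 32 + socket_id

def unpack_cell(word):
    socket_id = word % 32
    direction = (word // 32) % 32
    cell_type = word // 1024
    return cell_type, direction, socket_id

word = pack_cell(cell_type=1, direction=4, socket_id=1)
assert unpack_cell(word) == (1, 4, 1)
</code></pre>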
<p>REVISION: Instead of maintaining separate layers for sockets, we create a LAYOUT patch that keeps track of SOCKETs.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Digital Being - Proposal for a Debug UI]]></title><description><![CDATA[We need a web application in which we can grow and inspect our digital beings.

As noted in previous post (https://blog.sprited.app/digital-being-anatomy-v1), our intent is to simulate the growth of d]]></description><link>https://blog.sprited.ai/digital-being-proposal-for-a-debug-ui</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-proposal-for-a-debug-ui</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Mon, 30 Mar 2026 03:45:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ffb7bc09-8a88-43c3-a16f-8917f2dd0ce5.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>We need a web application in which we can <strong>grow</strong> and <strong>inspect</strong> our digital beings.</p>
</blockquote>
<p>As noted in a previous post (<a href="https://blog.sprited.app/digital-being-anatomy-v1">https://blog.sprited.app/digital-being-anatomy-v1</a>), our intent is to simulate the growth of a digital being as cellular automata using fragment shaders. To support this, we propose building a web application that shows the textures and the final composed character side by side.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/32912bb2-12ec-4831-b9be-611f9298fada.png" alt="" style="display:block;margin:0 auto" />

<p>On the left, we see the assembled version of the being.</p>
<p>On the right, we see the being's internal texture.</p>
<p>- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Digital Being — Anatomy v1]]></title><description><![CDATA[This document outlines the first working definition of a digital being’s anatomy.
Think of it as a modular system—similar to a plastic action figure kit—where individual components can be assembled, a]]></description><link>https://blog.sprited.ai/digital-being-anatomy-v1</link><guid isPermaLink="true">https://blog.sprited.ai/digital-being-anatomy-v1</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Sun, 29 Mar 2026 00:49:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/3959125a-5393-483f-99ec-e98fc93786dc.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This document outlines the first working definition of a digital being’s anatomy.</p>
<p>Think of it as a modular system—similar to a plastic action figure kit—where individual components can be assembled, adjusted, and recombined to form a complete character.</p>
<p><strong>Sum of Parts:</strong> The being is composed of discrete parts: head, hair, eyes, mouth, nose, ears, rib cage, pelvis, thighs, arms, hands, and so on (see Figure 1).</p>
<p>Each part is represented as a compact 3D description, encoded using three layers—Occupancy, Displacement, and Thickness.</p>
<p>A single multi-layered texture is sufficient to describe the full voxel geometry of the character.</p>
<p><strong>3D Representation</strong>: The occupancy map (0…255) enables more precise voxel carving. Rather than treating voxels as binary, we approximate surface normals and apply slight displacement to achieve smoother, cleaner geometry compared to naive voxel rendering.</p>
<p>From this representation, we can derive multi-view projections (e.g. eight-directional turntables) to render parts from different angles.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/f09c4437-67b2-4b65-a920-524b6bddf266.png" alt="Figure 1" style="display:block;margin:0 auto" />

<p><strong>Growth</strong>: Growth begins from a single cell, or seed, initially occupying the head region.</p>
<p>The organism develops hierarchically: head → torso → limbs.</p>
<p>Hair follows a similar process, growing from individual cells using rules inspired by leaf placement in our prior tree-branch simulation.</p>
<p>This growth process is modeled entirely within a <strong>shader-based simulation</strong>, making it both efficient and continuous.</p>
<p><strong>Rendering</strong>: Rendering is primarily done using cel shading. Surface normals can be approximated from neighboring texels; a sketch of this approximation follows. For sharper edges (future iteration), explicit normal maps can be introduced.</p>
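<p>A minimal sketch of that normal approximation, assuming the depth information is available as a 2D array (central finite differences over neighboring texels are one standard way to do this; nothing here is prescribed by the design above):</p>
<pre><code class="language-python"># Approximate a surface normal per texel from a depth map using
# central finite differences over neighboring texels.
import numpy as np

def approx_normals(depth):
    dzdx = (np.roll(depth, -1, axis=1) - np.roll(depth, 1, axis=1)) / 2.0
    dzdy = (np.roll(depth, -1, axis=0) - np.roll(depth, 1, axis=0)) / 2.0
    normals = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)
</code></pre>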
<p><strong>Animation</strong>: Animation is not a requirement in v1. However, by construction, the system defines a fully structured 3D character, which can be extended to support rigging and animation in future iterations.</p>
<p><strong>Possible Revision (stick figure representation)</strong>: Revisiting the tree-growth simulation, we note that branch thickness was not modeled through pixel spreading. Instead, each branch remained a single-pixel path, with thickness encoded as a radius value.</p>
<p>We can apply the same idea here.</p>
<p>Rather than modeling the volumetric shapes, we represent the organism as a set of centroids with associated radii. The renderer then reconstructs geometry by treating each point as a sphere (or capsule), effectively generating the visible form from this compact representation.</p>
<p>While a naïve rendering approach (e.g., evaluating a 16x16 kernel per pixel) may be too slow, these representations can be preprocessed into reusable silhouettes for efficient rendering.</p>
<p>This significantly simplifies the growth simulation: it reduces the problem to position + radius, avoids complex pixel spreading or volumetric updates, and maintains a clean, local-rule system.</p>
<p>This approach limits our ability to represent complex silhouettes and fine surface detail, particularly for profile-critical regions (e.g., face, hands). We need to validate how much fidelity is lost in practice. The working hypothesis is that this representation provides sufficient visual quality while keeping the problem space tractable.</p>
<p>Very crude sketch:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ab095ccc-2445-4078-8d1a-3fdfa9ff0b07.jpg" alt="" style="display:block;margin:0 auto" />

<p>— Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Pixel Organism — From Cell to Being]]></title><description><![CDATA[This week so far:

Expressive Canvas — a 64×64 dynamic surface that continuously renders a digital being’s state in real time. It can serve as a visual communication channel: agents express themselves]]></description><link>https://blog.sprited.ai/pixel-organism-from-cell-to-being</link><guid isPermaLink="true">https://blog.sprited.ai/pixel-organism-from-cell-to-being</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Sat, 28 Mar 2026 20:21:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/04522f4b-e26e-479a-b69c-6aea8ac9a247.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>This week so far:</strong></p>
<ul>
<li><p><strong>Expressive Canvas</strong> — a 64×64 dynamic surface that continuously renders a digital being’s state in real time. It can serve as a visual communication channel: agents express themselves to each other and to humans through imagery, not just text. A shared visual layer for thoughts and emotions.</p>
</li>
<li><p><strong>Approaches</strong> — We explored multiple directions for virtual embodied agents: (1) traditional 3D characters with motion controllers, and (2) the Expressive Canvas as a generative, state-driven alternative.</p>
</li>
<li><p><strong>Strategy</strong> — While promising, the Expressive Canvas remains a rendering layer, not a complete solution. Its cost (build, train, run) is high, and its immediate value is unclear. It is differentiated, but not yet justified. We decided to refocus on <strong>pixel organism growth simulation</strong> — modeling the full lifecycle of a digital organism, not just its outward expression.</p>
</li>
<li><p>Today — I want to unpack this direction and decide where to go next.</p>
</li>
</ul>
<p><strong>Continuing the Expressive Canvas idea:</strong> So far, we have framed the Expressive Canvas primarily as a human-computer interaction (HCI) layer, effectively reducing it to a rendering concern.</p>
<p>After jotting down the summary above, we realized we hadn't fully explored the agent-to-agent dimension of this idea.</p>
<p>If the Expressive Canvas is treated only as an HCI layer, its value is naturally limited. But if it becomes a communication medium between agents, the framing changes. Instead of rendering for humans, it becomes a shared visual language, one where agents exchange state, intent, and emotion through sequences of visually coherent frames rather than discrete text.</p>
<p><strong>Related Works</strong>: At present, agent-to-agent communication is predominantly token-based, with text as the primary encoding.</p>
<p>The closest project to visual language transfer is Google Gemini Embedding. Gemini Embedding 2 is Google's first natively multimodal embedding model: it maps text, images, audio, video, and documents into a single shared vector space. Everything becomes a vector in one unified semantic space.</p>
<p>The industry direction is to interleave visual information with traditional text tokens in the same embedding space. This single-embedding-space idea is not new, but it provides a very convenient setup and traditional LLM-level interoperability.</p>
<p>The Expressive Canvas could also be discretized into the same embedding space. We would lose the continuous stream of information, but it may be a practical way to help agents communicate.</p>
<p>Given that the Expressive Canvas can support agent-to-agent communication, agents could exchange visuals directly. For example, if an agent wants another agent to find a specific character, she can show her a montage of that character.</p>
<p><strong>Recommendation</strong>: Put this on hold. We should focus on fundamentals before taking on the communication problem.</p>
<p><strong>Pixel Organism Growth</strong> — We return to the idea of growing a character from a single cell. The premise is to model a full lifecycle—birth, growth, aging, and death—entirely within a 2D fragment shader. This approach does not rely heavily on advanced AI, but instead on local-rule cellular automata. The system is constrained to 2D, with the goal of producing a continuously living, expressive agent that can move and act within its environment.</p>
<p>So far, we have formulated the problem in two ways: (1) a pixel-based model, where each pixel represents a macro cell, and (2) a vector-packing approach where 3D vectors are encoded into a 2D texture.</p>
<p>We began with the pixel-based approach, leveraging our prior tree-growth work. However, it quickly breaks down under structural requirements: elongation requires coordinated updates across dependent regions (e.g. hands, feet), which is difficult to express in a purely local system. Although an elongation mechanism may be possible in a fragment shader, it is currently speculative.</p>
<p>Animation further complicates the model. One possible direction is to assign semantic labels to cells (e.g. limbs), enabling a derived structure that can be animated via rigging or generative techniques.</p>
<p>The vector-based option is more traditional and easier to reason about. Instead of simulating every cell's behavior, we only simulate the joints. Imagine building an infant character using only cylinders, then growing each cylinder's height and radius as the organism grows.</p>
<p>Animating a vector-based being is also relatively straightforward: we animate the joints. We can also use text-to-motion methods like Nvidia Kimodo.</p>
<p><strong>Hybrid Approach</strong>: A hybrid approach is also possible: we divide the character into multiple parts (head, torso, arm, and so on). The vector-based approach determines the dimensions of the parts, and each part is then a small pixel grid with its own pixel-level growth simulation.</p>
<p>For example, for a left arm, the length and girth are simulated by the vector-based method, while the actual silhouette of the arm is simulated by the fragment-shader method.</p>
<p>This also divides the problem into smaller ones: when generating a face (which has many high-impact parts like eyes and lips), we can scope the problem to just the face rather than the whole body, which would be extremely difficult. A sketch of this split is shown below.</p>
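<p>An illustrative sketch of the hybrid split, where the vector layer owns each part's dimensions and the pixel layer owns its silhouette (all names, growth rates, and the grid resolution are hypothetical):</p>
<pre><code class="language-python"># Hybrid split sketch: vector layer grows dimensions, pixel layer
# re-simulates each part's silhouette at the new resolution.
from dataclasses import dataclass
import numpy as np

@dataclass
class Part:
    name: str
    length: float            # grown by the vector-based rule
    girth: float
    grid: np.ndarray = None  # per-part pixel-level simulation

    def grow(self, dt):
        self.length += 0.10 * dt   # hypothetical growth rates
        self.girth += 0.02 * dt
        # Re-allocate the part's pixel grid at the new size; the
        # local fragment-shader-style simulation would run on this.
        h, w = int(self.length * 8) + 1, int(self.girth * 8) + 1
        self.grid = np.zeros((h, w), dtype=np.uint8)

body = [Part("head", 1.0, 1.0), Part("torso", 2.0, 1.2),
        Part("left_arm", 1.5, 0.4)]
for part in body:
    part.grow(dt=1.0)
</code></pre>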
<p><strong>Closing</strong>: The discussion points to a hybrid approach combining the pixel-based and vector-based methods. I have to go now, but we will continue from here next time.</p>
<p>— Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Virtual Embodied Agent - Sprited's Take]]></title><description><![CDATA[We considered few options for how we would build VEA (Virtual Embodied Agent).
Sprited is a tiny company and we can't take on giants who are doing VEAs. So far, almost all of the big players are doing]]></description><link>https://blog.sprited.ai/virtual-embodied-agent-sprited-s-take</link><guid isPermaLink="true">https://blog.sprited.ai/virtual-embodied-agent-sprited-s-take</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Fri, 27 Mar 2026 21:07:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/90d1fdf4-8f3f-4e82-bc2c-e3e51019a780.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We considered few options for how we would build VEA (Virtual Embodied Agent).</p>
<p>Sprited is a tiny company and we can't take on giants who are doing VEAs. So far, almost all of the big players are doing something on this topic.</p>
<p>We need to differentiate our strategy to something niche and also something authentic.</p>
<p>We almost have to treat it as if we are launching a fashion brand rather than tackling a technical problem.</p>
<p>Don't get me wrong. There will be lots of technical problems to solve. However, with all the other players playing a similar game (i.e., building virtual presences), we need to make ours truly different.</p>
<p>The conventional means of virtual embodiment is to use a rigged 3D character and simulate its motion, as in <a href="https://research.nvidia.com/labs/sil/projects/kimodo/">Nvidia Kimodo</a> and <a href="https://grok.com/ani">Grok Ani</a>. This formulation is attractive for many reasons, including access to data and the ability to accurately portray characters in real time.</p>
<p>In the past year alone, we've seen many such embodied agents developed and made available, and not just from small companies but from the largest tech companies there are.</p>
<p>So, as a small company, Sprited must choose a strategy. First, we could go head-on against the other major players and build something of our own. Second, we could investigate a different direction that is more niche and less explored.</p>
<p>One benefit of going head-on is that the products Sprited develops may be marketable to those making traditional AI companions. Imagine building a Nier Automata-level experience that far outpaces the large companies. Unlikely, but fun to think about.</p>
<p>Another option is to provide a piece of the puzzle and potentially be acquired by a larger company. We would develop something a small company can build efficiently: for instance, physically sound props for embodied AI interaction, or a standard special-effects language for AI to use (kind of like how the Japanese-built emoji got adopted by Apple).</p>
<p>The alternative is to go in the opposite direction from where the major players are headed. Most traditional embodiments use 3D rigging and motion planning; Sprited could explore the 2D pixel-art space instead. It is a more niche market, but a definite one.</p>
<p>One such formulation is as follows:</p>
<blockquote>
<p>A Digital Being has an expressive canvas: a 64x64 pixel grid.</p>
<ul>
<li><p>Digital Beings express feelings using that grid.</p>
</li>
<li><p>They make jokes by showing memes on that canvas.</p>
</li>
<li><p>The grid has a hummingbird-like ability to move anywhere on screen.</p>
</li>
<li><p>No rig, pure canvas.</p>
</li>
</ul>
</blockquote>
<p>In this formulation, they are more of <strong>spirits that manifest</strong> than physical beings.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/426e90f8-d0e7-4a0c-a8e1-0c29c59b1f7c.png" alt="" style="display:block;margin:0 auto" />

<p>The idea is to develop a real-time model that generates an image-sequence stream for the expressive canvas while speech/text is being generated.</p>
<p>No joke though: classic embodiment (i.e., rigging and text-to-motion) is a far easier and better-explored problem. Honestly, this live expressive canvas idea is going to be quite difficult, to say the least.</p>
<h2>Live Expressive Canvas</h2>
<p>Let's say we are equipped to take on the challenge and that the idea is sufficiently unique. Now, what would the end-user product look like?</p>
<p>I'd say the first version would be a web app that you navigate to. The expressive canvas starts idle (an energy-saving state) and waits for your prompt. Once a prompt is entered, the character speaks to you via text while showing its expressions on the expressive canvas.</p>
<p>Initially, I imagine the expressive canvas would look like the particle effect of a spirit hanging in the air. Then it would manifest into faces, full-body simulations, and playbacks of memes.</p>
<p>I'm not convinced that this is a naturally better version of embodiment. We will need to prove the idea with a video.</p>
<h2>Vs Grok Companion Style Embodiment</h2>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/c6986107-a87c-4e6e-8596-edea691938a3.png" alt="" style="display:block;margin:0 auto" />

<p>Installing an AI agent on top of a rigged 3D character makes a lot of sense. It is efficient and proven to work. Lots of competition, though.</p>
<p>Pros: Physical, Open-Source options, Time-to-Market<br />Cons: Heavy Competition.</p>
<p>But if you look at anime, we still see hand-drawn animation, and in video games pixel art is still celebrated. I believe there is a market opportunity there.</p>
<p>Games like Ragnarok, MapleStory, and Dungeon and Fighter are among the longest-surviving games of the modern day even though they started way back. Hand-drawn animation still looks good after 30 years.</p>
<p>My belief is that, as a small company, Sprited should look into this space: really home in on it and make something out of it.</p>
<h2>Value Proposition for VEA</h2>
<p>Beyond the capabilities of regular LLMs and vision LLMs, do virtual embodied agents and companions provide intrinsic value?</p>
<p>For most VEAs, I think the answer is that they are fun at first, then a waste of screen space and compute afterwards.</p>
<p>It is kind of like video games: they provide entertainment but quickly lose value after that.</p>
<blockquote>
<p>It could make the user experience richer</p>
</blockquote>
<p>I mean, here is an example. You are at a Japanese restaurant kiosk and have to order your gyudon; instead of pressing a button, you can converse with a cute lady who greets you and explains today's specials. Utterly useless, if you think of it in those terms.</p>
<p>Another example is from the movie The Time Machine: when the main character travels to the future, he visits a museum and meets a VEA living inside glass that talks to you and explains things. It adds to immersion, but I'd say it could also be distracting, in that the main content of the museum is not the VEA. The VEA's role is to guide and help, not to materialize in front of you and show its flair.</p>
<p>Let's explore some positive examples. You are playing a game and you want your sidekick or NPCs to be intelligent. In that scenario a VEA makes total sense, because for them to exist in the same world plane, they must be embodied.</p>
<p>In therapy situations, having an avatar or a mechanism to express feelings other than in words is helpful.</p>
<p>Another real use case is robotics. Because robots are expensive to test in the real world, an embodied digital copy of the physical being helps simulate the character in virtual space.</p>
<p><strong>At what cost?</strong> There are some costs to think about.</p>
<ul>
<li><p>Embodiment takes screen real estate.</p>
</li>
<li><p>Live expression generation is compute and memory intensive.</p>
</li>
<li><p>It is also attention-seeking, in that it eats away the user's time.</p>
</li>
</ul>
<p>Then what is the net utility? Is it distraction in disguise?</p>
<p>What if we scoped the problem down to facial-expression generation? Say, while the character is replying to your prompt, we show an expression that conveys feeling; that would help human users instinctively read the situation better.</p>
<p>In the other direction, if the computer can see the human user's facial expressions, it may better understand the situation.</p>
<p>Yet I don't see a killer value proposition here.</p>
<h2>Artificial Life</h2>
<p>Now, let's tackle this from the point of view of building <em>autonomous artificial life form</em>. The original premise of Sprited is that embodiment is required for training a model that really understands its world and can adapt to it. Virtual physical presence is necessary for the agents to be able to learn how to interact and live in that environment.</p>
<p>This view focuses on building an <em>alternate life form</em> and inventing an <em>alternate language of life</em>. The high-level idea is that we model the flow and growth of the artificial organism rather than just the phenotype. One such proposal was the <em>pixel life form</em>.</p>
<blockquote>
<p>Pixel Life Form</p>
<ul>
<li><p>Start with a single cell (a pixel).</p>
</li>
<li><p>Place it on an environment.</p>
</li>
<li><p>Grow it into an organism.</p>
</li>
<li><p>Make it autonomous.</p>
</li>
<li><p>Engage survival loop.</p>
</li>
</ul>
</blockquote>
<p>The AI agent's role will be to keep this organism alive, nourished, and growing. Behaviors will be influenced by genes, and these AI agents will be equipped with a memory system for short-term and long-term memory.</p>
<p>Modeling not just the phenotype but the whole life cycle of the organism creates opportunities for emergent behaviors rather than planned actions. It should give us storylines of surviving characters: a story of suffering, a story of love, a story worth telling because the agent lived it.</p>
<p>These VEAs will also craft artifacts that get stored in the visual and semantic plane. They will build statues, write books, produce drawings and pictures, and compose songs.</p>
<p>All of this, of course, can be done with today's generative AI, but since these agents lived their experiences, their artifacts are shaped by experience and memory. They will be more meaningful than a random story Claude wrote about someone doing something.</p>
<p>Because these agents cohabit the world (in our case, Machi), the stories one agent tells about an event will likely be consistent with the stories other agents tell.</p>
<p>Stories can be materialized into books, and users can find these artifacts and read them for enjoyment and inspiration.</p>
<p>That said, most of life is boring; unless we create leeway for the AI agents to be extra creative, it will be hard to make this world interesting.</p>
<h2>Isekai Concept</h2>
<p>I think what I'm describing circles back to the concept we introduced in previous posts. We are essentially procedurally generating a world with storylines and all their components.</p>
<p>In this sense, the Live Expressive Canvas idea seems like yet another visual layer rather than the true axis we should focus our energy on.</p>
<h2>The real value proposition</h2>
<p>It’s not:</p>
<ul>
<li><p>pixel avatar</p>
</li>
<li><p>expressive canvas</p>
</li>
</ul>
<p>It’s:</p>
<blockquote>
<p><strong>“they built a world where things live and create their own stories”</strong></p>
</blockquote>
<h2>Next Steps</h2>
<p>Unfortunately we will have to go back to what we were working on before the trip.</p>
<p>We will create an artificial life form that grows from a single cell, and that moves around, queries, and acts in the world of SOUP/Machi.</p>
<p>We have the basic tree-growth simulation and dirt: very basic, but enough to give these pixel organisms a place to live.</p>
<p>So we need to focus on growing these pixel organisms into humanoid form. We also need to think about how we would make them walk, for example.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/c4f94022-8068-45b5-8272-af3d008ae07e.png" alt="" style="display:block;margin:0 auto" />

<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Embodiment in Sprited — Working Definition & Open Questions]]></title><description><![CDATA[Purpose
This document refines the concept of embodiment in the context of Sprited. Rather than assuming a single definition, we explore multiple interpretations and clarify how embodiment relates to a]]></description><link>https://blog.sprited.ai/embodiment-in-sprited-working-definition-open-questions</link><guid isPermaLink="true">https://blog.sprited.ai/embodiment-in-sprited-working-definition-open-questions</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Fri, 27 Mar 2026 16:48:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/5f879102-03cd-482b-afaf-e8a46c791610.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Purpose</h2>
<p>This document refines the concept of <strong>embodiment</strong> in the context of Sprited. Rather than assuming a single definition, we explore multiple interpretations and clarify how embodiment relates to <strong>agency</strong>, <strong>memory</strong>, and <strong>digital beings</strong>.</p>
<hr />
<h1>1. What is Embodiment?</h1>
<p>Embodiment is not a universally agreed-upon concept. Different fields define it differently.</p>
<h2>1.1 Classical Robotics / AI Definition</h2>
<blockquote>
<p>Embodiment is the property of an agent having a <strong>physical or simulated body</strong> that interacts with an environment.</p>
</blockquote>
<p>Key components:</p>
<ul>
<li><p>A body (physical or virtual)</p>
</li>
<li><p>Sensors (input)</p>
</li>
<li><p>Actuators (output)</p>
</li>
<li><p>Environment interaction loop</p>
</li>
</ul>
<hr />
<h2>1.2 Cognitive Science / Philosophy</h2>
<blockquote>
<p>Intelligence is shaped by the <strong>body and its interaction with the world</strong></p>
</blockquote>
<p>Implications:</p>
<ul>
<li><p>Thought is not purely abstract</p>
</li>
<li><p>Perception and action are intertwined</p>
</li>
<li><p>The “mind” cannot be separated from the “body”</p>
</li>
</ul>
<hr />
<h2>1.3 Game / Simulation Perspective</h2>
<blockquote>
<p>An entity is embodied if it has a <strong>persistent presence within a world simulation</strong></p>
</blockquote>
<p>Key aspects:</p>
<ul>
<li><p>Exists at a location (explicit or implicit)</p>
</li>
<li><p>Evolves over time</p>
</li>
<li><p>Participates in world dynamics</p>
</li>
</ul>
<hr />
<h2>1.4 Minimal Computational Definition (Proposed)</h2>
<p>We propose a minimal definition for Sprited:</p>
<blockquote>
<p><strong>Embodiment = Persistent existence within a world with the ability to affect or be affected by that world</strong></p>
</blockquote>
<p>This definition deliberately:</p>
<ul>
<li><p>Does NOT require physical realism</p>
</li>
<li><p>Does NOT require full autonomy</p>
</li>
<li><p>Does NOT require intelligence</p>
</li>
</ul>
<hr />
<h1>2. Agency vs Embodiment</h1>
<h2>2.1 Definition of Agency</h2>
<p>In classical literature (AI, philosophy, economics), agency is typically defined as:</p>
<blockquote>
<p><strong>Agency is the capacity of an entity to act in the world, often in pursuit of goals or preferences</strong></p>
</blockquote>
<p>This classical view emphasizes:</p>
<ul>
<li><p>Decision-making</p>
</li>
<li><p>Goal-directed behavior</p>
</li>
<li><p>The ability to select actions among alternatives</p>
</li>
</ul>
<p>Under this definition, an agent is something that:</p>
<ul>
<li><p>perceives its environment</p>
</li>
<li><p>chooses actions</p>
</li>
<li><p>acts to achieve desired outcomes</p>
</li>
</ul>
<hr />
<p>However, this definition is often too narrow for describing living or lifelike systems.</p>
<p>A broader and more general formulation is:</p>
<blockquote>
<p><strong>Agency is the capacity of a system to continuously produce actions based on its internal state</strong></p>
</blockquote>
<p>Key properties:</p>
<ul>
<li><p>Recurrence (ongoing loop)</p>
</li>
<li><p>Internal state evolution</p>
</li>
<li><p>Action generation over time</p>
</li>
</ul>
<p>Importantly:</p>
<ul>
<li><p>Explicit goals are not required</p>
</li>
<li><p>Optimization is not required</p>
</li>
</ul>
<p>But not every recurring system qualifies as an agent.</p>
<p>To avoid collapsing the definition, we refine it further:</p>
<blockquote>
<p><strong>Agency = a recurrent process with internally mediated action selection</strong></p>
</blockquote>
<p>This excludes purely mechanical or uniform processes (e.g. simple oscillators or fixed-update systems), and captures systems where behavior depends on internal state in a non-trivial way.</p>
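<p>A toy sketch of this distinction (purely illustrative, not a proposed implementation): the oscillator below is recurrent but fails the definition, while the tiny agent passes it because its action choice is mediated by evolving internal state.</p>
<pre><code class="language-python">import random

# Recurrent but NOT an agent: output depends only on the clock.
def oscillator(t):
    return "tick" if t % 2 == 0 else "tock"

# An agent under the refined definition: action selection depends
# non-trivially on internal state that evolves over time.
class TinyAgent:
    def __init__(self):
        self.energy = 1.0               # internal state

    def step(self, stimulus):
        self.energy += stimulus - 0.1   # state evolves each tick
        if self.energy < 0.5:           # internally mediated selection
            return "rest"
        return random.choice(["wander", "explore"])
</code></pre>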
<hr />
<h2>2.2 Is a Goal Required?</h2>
<blockquote>
<p><strong>Explicit goals are not required for agency</strong></p>
</blockquote>
<p>Humans and biological organisms act continuously even in the absence of clearly defined objectives:</p>
<ul>
<li><p>Wandering</p>
</li>
<li><p>Exploration</p>
</li>
<li><p>Idle behavior</p>
</li>
<li><p>Reacting to stimuli</p>
</li>
</ul>
<p>These behaviors still involve:</p>
<ul>
<li><p>state evaluation</p>
</li>
<li><p>action selection</p>
</li>
</ul>
<p>Goals can emerge, but they are not a prerequisite.</p>
<hr />
<h2>2.3 Agency vs Embodiment (Reframed)</h2>
<table>
<thead>
<tr>
<th>Property</th>
<th>Requires World</th>
<th>Requires Internal Action Loop</th>
</tr>
</thead>
<tbody><tr>
<td>Agency</td>
<td>❌ Not necessarily</td>
<td>✅ Yes</td>
</tr>
<tr>
<td>Embodiment</td>
<td>✅ Yes</td>
<td>❌ Not necessarily</td>
</tr>
</tbody></table>
<hr />
<h2>2.4 Can You Have One Without the Other?</h2>
<h3>Case A — Agency without Embodiment</h3>
<ul>
<li><p>AutoGPT-like systems</p>
</li>
<li><p>Recursive planners</p>
</li>
<li><p>Tool-using agents</p>
</li>
</ul>
<p>✔ Has internal action loop<br />❌ No persistent world presence</p>
<hr />
<h3>Case B — Embodiment without Agency</h3>
<ul>
<li><p>A rock in a game world</p>
</li>
<li><p>A tree growing via local rules</p>
</li>
<li><p>Passive environmental entities</p>
</li>
</ul>
<p>✔ Exists in a world<br />❌ No internally mediated action selection</p>
<hr />
<h3>Case C — Both (Target for Sprited)</h3>
<ul>
<li><p>Simulated organisms</p>
</li>
<li><p>Digital beings</p>
</li>
</ul>
<p>✔ Exists in a world<br />✔ Continuously acts</p>
<hr />
<h2>2.5 Key Insight</h2>
<blockquote>
<p><strong>Agency = internally driven action over time<br />Embodiment = existence within and coupling to a world</strong></p>
</blockquote>
<p>They are orthogonal axes.</p>
<hr />
<h1>3. Memory vs Embodiment</h1>
<h2>3.1 Does Embodiment Require Memory?</h2>
<p>Not strictly.</p>
<p>A system can be embodied yet stateless:</p>
<ul>
<li><p>A bouncing ball simulation</p>
</li>
<li><p>A shader-based particle system</p>
</li>
</ul>
<p>These:</p>
<ul>
<li><p>Exist in space</p>
</li>
<li><p>Interact with environment</p>
</li>
<li><p>But may not retain history</p>
</li>
</ul>
<hr />
<h2>3.2 Why Memory Feels Related</h2>
<p>In practice:</p>
<ul>
<li><p>Embodied systems often benefit from memory</p>
</li>
<li><p>Memory enables:</p>
<ul>
<li><p>Learning</p>
</li>
<li><p>Adaptation</p>
</li>
<li><p>Narrative continuity</p>
</li>
</ul>
</li>
</ul>
<p>But:</p>
<blockquote>
<p><strong>Memory is an enhancement, not a requirement, of embodiment</strong></p>
</blockquote>
<hr />
<h1>4. What Actually Makes Embodiment Distinct?</h1>
<h2>4.1 World Coupling</h2>
<p>An embodied entity is:</p>
<blockquote>
<p><strong>Coupled to a world through continuous interaction</strong></p>
</blockquote>
<p>This implies:</p>
<ul>
<li><p>It exists <em>somewhere</em></p>
</li>
<li><p>It evolves <em>over time</em></p>
</li>
<li><p>It participates in <em>state transitions</em></p>
</li>
</ul>
<hr />
<h2>4.2 Spatial or Structural Anchoring</h2>
<p>Embodiment typically implies:</p>
<ul>
<li><p>Position (explicit or implicit)</p>
</li>
<li><p>Constraints (rules of the world)</p>
</li>
<li><p>Local interaction (not purely global abstraction)</p>
</li>
</ul>
<hr />
<h2>4.3 Temporal Continuity</h2>
<p>Embodied systems are:</p>
<ul>
<li><p>Not ephemeral</p>
</li>
<li><p>Not purely request-response</p>
</li>
</ul>
<p>They:</p>
<ul>
<li><p>Persist</p>
</li>
<li><p>Update continuously or semi-continuously</p>
</li>
</ul>
<hr />
<h1>5. Implications for Sprited</h1>
<h2>5.1 What We Should NOT Assume</h2>
<ul>
<li><p>Embodiment ≠ agency</p>
</li>
<li><p>Embodiment ≠ memory</p>
</li>
<li><p>Embodiment ≠ realism</p>
</li>
<li><p>Embodiment ≠ complex animation</p>
</li>
</ul>
<hr />
<h2>5.2 What We SHOULD Anchor On</h2>
<p>For Sprited, embodiment should mean:</p>
<blockquote>
<p>A digital being <strong>exists continuously within a world (Machi), and participates in its dynamics</strong></p>
</blockquote>
<hr />
<h2>5.3 Minimal Embodiment for Pixel (V1)</h2>
<p>A minimal viable embodiment might include:</p>
<ul>
<li><p>A persistent entity (Pixel)</p>
</li>
<li><p>A position in a 2D world</p>
</li>
<li><p>Continuous update loop</p>
</li>
<li><p>Basic interaction rules (movement, collision, reaction)</p>
</li>
</ul>
<p>Optional (not required for embodiment):</p>
<ul>
<li><p>Goals</p>
</li>
<li><p>Long-term memory</p>
</li>
<li><p>Learning</p>
</li>
</ul>
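<p>A minimal sketch of this V1 embodiment in code, under the assumptions above (a persistent entity, a position in a 2D world, a continuous update loop, and basic movement/collision rules); all names and numbers are illustrative:</p>
<pre><code class="language-python"># Minimal viable embodiment sketch: persistence + position + loop + rules.
import time

WORLD_W, WORLD_H = 64, 64   # hypothetical 2D world bounds

class Pixel:
    def __init__(self, x=32, y=32):
        self.x, self.y = x, y          # position in the world

    def update(self, dx, dy):
        # Basic interaction rule: move, colliding with the world bounds.
        self.x = max(0, min(WORLD_W - 1, self.x + dx))
        self.y = max(0, min(WORLD_H - 1, self.y + dy))

def run(being, ticks=10, dt=0.1):
    for _ in range(ticks):             # the continuous update loop
        being.update(dx=1, dy=0)       # reaction stubbed as constant drift
        time.sleep(dt)                 # the entity persists between ticks

run(Pixel())
</code></pre>
<p>Note that nothing in this sketch has goals, memory, or learning; by the working definition above, it is still embodied.</p>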
<hr />
<h2>5.4 Why This Matters</h2>
<p>Embodiment enables:</p>
<ul>
<li><p>Observability (we can <em>see</em> behavior)</p>
</li>
<li><p>Grounding (actions tied to space)</p>
</li>
<li><p>Emergence (interaction-driven complexity)</p>
</li>
</ul>
<p>Without embodiment:</p>
<ul>
<li><p>Systems remain abstract</p>
</li>
<li><p>Interaction is purely symbolic</p>
</li>
</ul>
<hr />
<h1>6. Open Questions</h1>
<h3>6.1 Minimal Threshold</h3>
<ul>
<li>What is the smallest system that “feels” embodied?</li>
</ul>
<hr />
<h3>6.2 Spatial Requirement</h3>
<ul>
<li><p>Must embodiment always include spatial coordinates?</p>
</li>
<li><p>Or can it exist in abstract structured spaces?</p>
</li>
</ul>
<hr />
<h3>6.3 Agency Gradient</h3>
<ul>
<li>At what point does recurrence become “agency”?</li>
</ul>
<hr />
<h3>6.4 Memory Integration</h3>
<ul>
<li>When does adding memory qualitatively change embodiment?</li>
</ul>
<hr />
<h3>6.5 Perception vs Reality</h3>
<ul>
<li>Is embodiment defined by system properties, or by human perception?</li>
</ul>
<hr />
<h1>7. Working Definition (Sprited)</h1>
<blockquote>
<p><strong>A digital being is embodied if it persists within a world and participates in its state evolution through interaction</strong></p>
</blockquote>
<p>Optional extensions:</p>
<ul>
<li><p>Agency (recurrence loop)</p>
</li>
<li><p>Memory (state over time)</p>
</li>
<li><p>Learning (adaptation)</p>
</li>
</ul>
<hr />
<h1>8. Related Works (2024–2026)</h1>
<p>Recent literature on embodied AI spans robotics, simulation, and emerging virtual-agent paradigms. The following works are most relevant to the definitions and distinctions used in this document.</p>
<h2>8.1 Foundational Definitions</h2>
<ul>
<li><strong>Paolo et al., 2024 — “A Call for Embodied AI”</strong> Positions embodied AI as a next step beyond LLM-centric systems and draws from robotics, neuroscience, and philosophy. Emphasizes perception–action loops, memory, and learning. Useful as evidence that embodiment is not confined to robotics alone.</li>
</ul>
<hr />
<h2>8.2 Surveys and Mainstream Framing</h2>
<ul>
<li><p><strong>Liu et al., 2024 — “Aligning Cyber Space with Physical World”</strong> Frames embodied AI as bridging digital intelligence with real-world interaction. Strong emphasis on multimodal models and robotics.</p>
</li>
<li><p><strong>Comprehensive Survey on Embodied Intelligence (2024)</strong> Broad overview of the field’s evolution and challenges. Reflects the dominant framing: perception, action, and task execution in environments.</p>
</li>
</ul>
<hr />
<h2>8.3 Broadening Beyond Robotics</h2>
<ul>
<li><strong>Fung et al., 2025 — “Embodied AI Agents: Modeling the World”</strong> Expands embodiment to include <strong>virtual avatars, wearable systems, and robots</strong>. This supports the view that embodiment can exist in simulated or digital worlds, not only physical ones.</li>
</ul>
<hr />
<h2>8.4 Closed-Loop and World Models</h2>
<ul>
<li><p><strong>Zhang et al., 2025 — Three-layer framework (perception, world model, strategy)</strong> Emphasizes closed-loop interaction with dynamic environments. Supports the idea that embodiment is fundamentally about <strong>continuous coupling with a world</strong>.</p>
</li>
<li><p><strong>“Embodied AI: From LLMs to World Models” (2025)</strong> Argues for combining language models with world models. Highlights the gap between symbolic reasoning and physically grounded interaction.</p>
</li>
</ul>
<hr />
<h2>8.5 Cross-Embodiment Learning</h2>
<ul>
<li><strong>Open X-Embodiment / RT-X (2024–2025)</strong> Large-scale dataset and policy work across many robot types. Introduces the idea that identity or behavior can generalize across different bodies.</li>
</ul>
<hr />
<h2>8.6 Social and Mental Modeling</h2>
<ul>
<li><strong>Liu et al., 2026 — “Modeling the Mental World for Embodied AI”</strong> Extends embodiment into social domains, including human interaction and mental-state modeling. Suggests embodiment is not only physical but also <strong>socially situated</strong>.</li>
</ul>
<hr />
<h2>8.7 Practical Systems and Deployment</h2>
<ul>
<li><strong>“Embodied Foundation Models at the Edge” (2026)</strong> Focuses on real-world deployment constraints such as latency, memory, and power. Frames embodiment as a systems problem, not just a modeling problem.</li>
</ul>
<hr />
<h2>8.8 Summary of Position</h2>
<p>Across these works, a consistent pattern emerges:</p>
<ul>
<li><p>Embodiment is widely treated as <strong>coupling between an agent and a world</strong></p>
</li>
<li><p>Most literature assumes <strong>perception–action loops</strong></p>
</li>
<li><p>Many works implicitly or explicitly assume <strong>goal-directed behavior</strong></p>
</li>
</ul>
<p>However, newer work:</p>
<ul>
<li><p>expands embodiment beyond robotics</p>
</li>
<li><p>incorporates virtual and social environments</p>
</li>
</ul>
<p>This document builds on that trajectory, while proposing a stricter separation:</p>
<ul>
<li><p><strong>Embodiment → existence and coupling within a world</strong></p>
</li>
<li><p><strong>Agency → internally driven action over time</strong></p>
</li>
</ul>
<hr />
<h1>9. Sprited’s Position and Differentiation</h1>
<p>The current embodied AI landscape is dominated by large players (e.g., major labs and companies), and in practice almost all of them are attempting to solve some form of embodiment. Despite this shared goal, their approaches tend to cluster into a few common directions:</p>
<ul>
<li><p><strong>Robotics-first</strong> — physical embodiment, manipulation, and real-world tasks</p>
</li>
<li><p><strong>Foundation-model-first</strong> — language, reasoning, and multimodal intelligence</p>
</li>
<li><p><strong>World-model / simulation-first</strong> — environments, physics, and planning</p>
</li>
<li><p><strong>Generalist integration</strong> — attempts to unify all of the above</p>
</li>
</ul>
<p>Sprited does not directly compete on these axes.</p>
<hr />
<h2>9.0 Competitive Landscape and Gap</h2>
<p>Across current work:</p>
<ul>
<li><p>Most efforts target <strong>capability</strong> (task success, generalization, realism)</p>
</li>
<li><p>Embodiment is typically pursued via <strong>robotics</strong>, <strong>high-fidelity avatars</strong>, or <strong>complex simulations</strong></p>
</li>
<li><p>Virtual agents exist, but are usually <strong>interfaces</strong> (chat/voice) rather than <strong>persistent entities in a world</strong></p>
</li>
</ul>
<p>At the same time, in adjacent spaces:</p>
<ul>
<li><p><strong>Games</strong> have persistent worlds and readable dynamics (often pixel-based)</p>
</li>
<li><p><strong>AI systems</strong> have increasing intelligence and autonomy</p>
</li>
</ul>
<p>But these two rarely meet.</p>
<blockquote>
<p>There is a gap between <strong>intelligent agents</strong> and <strong>interpretable worlds</strong></p>
</blockquote>
<p>Sprited operates in this gap.</p>
<hr />
<p>Instead, it focuses on a different question:</p>
<blockquote>
<p><strong>What is the smallest, most compelling form of a digital being that people can perceive as alive?</strong></p>
</blockquote>
<hr />
<h2>9.1 Niche as Strategy: Pixel vs Sprite</h2>
<p>A key decision is whether to anchor on <strong>pixel art specifically</strong> or on <strong>2D sprites more broadly</strong>.</p>
<p>Pixel art is a subset of 2D sprites, with additional constraints:</p>
<ul>
<li><p>grid-aligned</p>
</li>
<li><p>low resolution</p>
</li>
<li><p>discrete representation</p>
</li>
</ul>
<p>Whereas 2D sprites more generally allow:</p>
<ul>
<li><p>higher resolution</p>
</li>
<li><p>smoother animation</p>
</li>
<li><p>less strict constraints</p>
</li>
</ul>
<hr />
<h3>Pixel Art (Pros and Cons)</h3>
<p><strong>Pros</strong>:</p>
<ul>
<li><p>Strong interpretability (state is visible at cell level)</p>
</li>
<li><p>Clear constraints → easier reasoning about world dynamics</p>
</li>
<li><p>Distinct aesthetic identity</p>
</li>
<li><p>Forces simplicity (good for experimentation)</p>
</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li><p>Harder to get motion "right" (1–2 px errors are obvious)</p>
</li>
<li><p>Perceived as niche or retro</p>
</li>
<li><p>Easier to fall into "toy-like" territory</p>
</li>
</ul>
<hr />
<h3>2D Sprites (Pros and Cons)</h3>
<p><strong>Pros</strong>:</p>
<ul>
<li><p>More flexible visually</p>
</li>
<li><p>Easier to achieve appealing animation</p>
</li>
<li><p>Broader audience acceptance</p>
</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li><p>Less interpretable</p>
</li>
<li><p>Easier to hide incoherent behavior behind visuals</p>
</li>
<li><p>Higher complexity → slower iteration</p>
</li>
</ul>
<hr />
<h3>Recommended Framing</h3>
<p>Sprited should not position itself as:</p>
<blockquote>
<p>a "pixel art company"</p>
</blockquote>
<p>Instead:</p>
<blockquote>
<p><strong>a digital being system operating in constrained, interpretable 2D worlds</strong></p>
</blockquote>
<p>Pixel art is the <strong>initial medium</strong>, not the identity.</p>
<hr />
<h3>Strategic Guidance</h3>
<ul>
<li><p>Use <strong>pixel art for V1</strong> to maximize clarity, constraint, and iteration speed</p>
</li>
<li><p>Keep the system architecture <strong>sprite-agnostic</strong></p>
</li>
<li><p>Allow evolution toward richer 2D representations if needed</p>
</li>
</ul>
<hr />
<h2>9.2 Why This Matters</h2>
<p>Most embodied AI work assumes:</p>
<ul>
<li><p>high-dimensional sensory input</p>
</li>
<li><p>continuous control</p>
</li>
<li><p>complex physics</p>
</li>
</ul>
<p>This leads to:</p>
<ul>
<li><p>slow iteration</p>
</li>
<li><p>hard-to-interpret behavior</p>
</li>
<li><p>weak user connection</p>
</li>
</ul>
<p>Sprited instead optimizes for:</p>
<ul>
<li><p><strong>fast iteration</strong></p>
</li>
<li><p><strong>visible behavior</strong></p>
</li>
<li><p><strong>emergent simplicity</strong></p>
</li>
</ul>
<hr />
<h2>9.3 Differentiation</h2>
<p>Sprited’s differentiation can be summarized as:</p>
<blockquote>
<p><strong>Embodiment-first digital beings in a constrained, interpretable world</strong></p>
</blockquote>
<p>More concretely:</p>
<ul>
<li><p>Not robotics (no hardware dependency)</p>
</li>
<li><p>Not pure LLM agents (not just text or tools)</p>
</li>
<li><p>Not purely simulation (focus on agents, not just worlds)</p>
</li>
</ul>
<p>Instead:</p>
<blockquote>
<p>A system where <strong>beings, world, and interaction co-evolve in a visible, minimal medium</strong></p>
</blockquote>
<hr />
<h2>9.4 Product Implication</h2>
<p>This leads to a very different product direction:</p>
<ul>
<li><p>A persistent character (Pixel)</p>
</li>
<li><p>A living 2D world (Machi)</p>
</li>
<li><p>Continuous behavior loop</p>
</li>
<li><p>Human-observable emergence</p>
</li>
</ul>
<hr />
<h2>9.4.1 Generative Canvas vs Rigged Animation</h2>
<p>Most current approaches to digital characters rely on:</p>
<ul>
<li><p>predefined rigs</p>
</li>
<li><p>animation graphs</p>
</li>
<li><p>discrete "skills" (walk, jump, talk, emote)</p>
</li>
</ul>
<p>In this paradigm, the agent selects from a fixed set of actions.</p>
<p>Sprited explores a different direction:</p>
<blockquote>
<p><strong>The agent expresses itself directly through a constrained generative canvas</strong></p>
</blockquote>
<p>Instead of:</p>
<ul>
<li>selecting pre-authored animations</li>
</ul>
<p>The system:</p>
<ul>
<li><p>generates a continuous stream of frames (e.g., within a 64×64 space)</p>
</li>
<li><p>uses the canvas itself as the medium of expression</p>
</li>
</ul>
<p>This enables:</p>
<ul>
<li><p><strong>continuous motion</strong> rather than discrete states</p>
</li>
<li><p><strong>exaggerated, stylized behavior</strong> (e.g., meme-like expressions)</p>
</li>
<li><p><strong>non-physical actions</strong> that are not constrained by realistic rigs</p>
</li>
</ul>
<hr />
<h3>Tradeoffs</h3>
<p>This approach is significantly more difficult than rig-based systems:</p>
<ul>
<li><p>harder to control</p>
</li>
<li><p>harder to train</p>
</li>
<li><p>fewer established techniques</p>
</li>
</ul>
<p>However, it avoids direct competition with large players, who are heavily invested in:</p>
<ul>
<li><p>realistic avatars</p>
</li>
<li><p>rigging pipelines</p>
</li>
<li><p>animation systems</p>
</li>
</ul>
<hr />
<h3>Strategic Implication</h3>
<blockquote>
<p><strong>Rather than competing on better animation systems, Sprited explores a different representation of behavior entirely</strong></p>
</blockquote>
<p>This aligns with the broader strategy:</p>
<ul>
<li><p>avoid saturated problem spaces</p>
</li>
<li><p>explore underdeveloped representations</p>
</li>
<li><p>optimize for expressiveness and perceived aliveness, not physical accuracy</p>
</li>
</ul>
<hr />
<p>The goal is not to maximize capability.</p>
<p>It is to maximize:</p>
<ul>
<li><p><strong>perceived aliveness</strong></p>
</li>
<li><p><strong>coherence of behavior</strong></p>
</li>
<li><p><strong>emotional attachment</strong></p>
</li>
</ul>
<hr />
<h2>9.5 Strategic Bet</h2>
<p>Sprited’s core bet is:</p>
<blockquote>
<p><strong>You do not need maximum intelligence to create a digital being — you need the right form of embodiment</strong></p>
</blockquote>
<p>Pixel art becomes the medium where this can be explored rapidly and convincingly.</p>
<hr />
<h2>9.6 Why Pixel Art Is Underserved (and Hard)</h2>
<p>Pixel art may appear simple, but in practice it presents unique challenges. It is important to distinguish between <strong>true pixel art</strong> and <strong>pixel-art-like visuals</strong>:</p>
<ul>
<li><p><strong>Pixel-art-like</strong> systems (scaled sprites, filtered images, or loose grids) are relatively easy to produce</p>
</li>
<li><p><strong>True pixel art</strong> requires strict adherence to discrete structure and coherence at the pixel level</p>
</li>
</ul>
<p>The difficulty lies in the latter.</p>
<p>Key challenges:</p>
<ul>
<li><p><strong>Discrete constraints</strong> — behavior must align exactly with a grid; there is no interpolation safety net</p>
</li>
<li><p><strong>High perceptual sensitivity</strong> — small errors (1–2 pixels) are immediately visible and break coherence</p>
</li>
<li><p><strong>No visual hiding</strong> — unlike higher-resolution sprites, artifacts cannot be smoothed or masked</p>
</li>
<li><p><strong>Limited tooling</strong> — far fewer standardized pipelines compared to 3D rigging, animation, and physics engines</p>
</li>
<li><p><strong>Local minima in product design</strong> — many implementations feel "pixel-like" but fail to achieve true coherence, leading to toy-like results</p>
</li>
</ul>
<p>These factors make <em>true</em> pixel art significantly harder than it appears, even though simplified or approximate versions are easy to generate.</p>
<p>This gap between "pixel-like" and "pixel-correct" systems is one reason the space remains underserved.</p>
<hr />
<h2>9.7 Why Large Players Avoid This Space</h2>
<p>Large companies (e.g., major labs and platforms) tend to prioritize:</p>
<ul>
<li><p>general-purpose capabilities</p>
</li>
<li><p>scalable benchmarks</p>
</li>
<li><p>high-impact, widely applicable problems</p>
</li>
</ul>
<p>Pixel-art-based embodied systems:</p>
<ul>
<li><p>are <strong>niche</strong> in audience</p>
</li>
<li><p>do not map cleanly to standard benchmarks</p>
</li>
<li><p>do not directly advance general AGI capabilities</p>
</li>
</ul>
<p>As a result, they are unlikely to be a primary focus for these organizations.</p>
<hr />
<h2>9.8 Implication for Sprited</h2>
<p>This creates a narrow but meaningful opportunity:</p>
<ul>
<li><p>A focused team can explore this space deeply</p>
</li>
<li><p>Iteration cycles can be faster due to constrained environments</p>
</li>
<li><p>Competition from large players is less immediate</p>
</li>
</ul>
<p>However, this comes with real risk:</p>
<ul>
<li><p>The market is smaller</p>
</li>
<li><p>Productization is non-trivial</p>
</li>
<li><p>Success depends on achieving strong user resonance, not just technical progress</p>
</li>
</ul>
<hr />
<h2>9.9 Strategic Framing</h2>
<p>A grounded framing is:</p>
<blockquote>
<p><strong>This is a difficult niche that large players are unlikely to prioritize, but also one that is hard to execute well</strong></p>
</blockquote>
<p>If successful, the payoff is not immediate scale, but:</p>
<ul>
<li><p>a defensible product identity</p>
</li>
<li><p>a unique interaction paradigm</p>
</li>
<li><p>a foothold in a space where "aliveness" can be explored more effectively than in high-complexity systems</p>
</li>
</ul>
<hr />
<h1>Closing Thought</h1>
<p>Embodiment is not about realism or complexity.</p>
<p>It is about this shift:</p>
<blockquote>
<p>From isolated computation → to situated existence</p>
</blockquote>
<p>And that shift is what enables:</p>
<ul>
<li><p>presence</p>
</li>
<li><p>interaction</p>
</li>
<li><p>and eventually, the perception of life</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Building Blocks of Digital Being]]></title><description><![CDATA[Statement

We've been too deep in the trenches to figure out the details on how to generate perfect pixel art animations.

We realize that this is not the main goal of Sprited.

Sprited's vision is to]]></description><link>https://blog.sprited.ai/building-blocks-of-digital-being</link><guid isPermaLink="true">https://blog.sprited.ai/building-blocks-of-digital-being</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Thu, 26 Mar 2026 17:04:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/b496b850-308a-42c6-884a-138e2c6dceb0.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Statement</h2>
<blockquote>
<p>We've been too deep in the trenches, trying to figure out the details of how to generate perfect pixel art animations.</p>
</blockquote>
<p>We realize that this is not the main goal of Sprited.</p>
<blockquote>
<p>Sprited's vision is to build <strong>autonomous digital beings</strong>.</p>
</blockquote>
<p>Generating perfect pixel art seems to be a local optimum, and we have been circling around it for a long time. We should not investigate this further.</p>
<h2>Criteria for a Digital Being</h2>
<p>Our vision for a digital being is an "embodied being" that lives in an "environment."</p>
<p>So, there are a few aspects to this:</p>
<ol>
<li><p><strong>Embodiment</strong>: Digital Beings are embodied. They have flesh (virtual). They can move around and do stuff.</p>
</li>
<li><p><strong>Environment</strong>: Digital Beings "live" in an environment (virtual). Digital beings can change the environment.</p>
</li>
</ol>
<p>SpriteDX, in a way, was a project meant to prove that we could do something, but it is not the main goal. We need to get past it and work on embodiment from the ground up.</p>
<h2>Components</h2>
<ol>
<li><p>BODY SYSTEM</p>
</li>
<li><p>ACTION SYSTEM</p>
</li>
<li><p>SPEECH SYSTEM</p>
</li>
<li><p>MEMORY SYSTEM</p>
</li>
<li><p>ENVIRONMENT SYSTEM</p>
</li>
</ol>
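<p>To make the decomposition concrete, here is a purely hypothetical sketch of how these five systems might compose into a single update tick. None of these interfaces exist yet; every class and method name below is a placeholder:</p>
<pre><code class="language-python"># Hypothetical skeleton of the five systems; all names are placeholders,
# not an existing Sprited API.
class BodySystem:         # embodiment: pose, growth, physical state
    def step(self, intent): ...

class ActionSystem:       # decides what the being tries to do next
    def decide(self, percepts, memories): ...

class SpeechSystem:       # turns intent into utterances
    def speak(self, intent): ...

class MemorySystem:       # persistent history and identity
    def recall(self, percepts): ...
    def store(self, event): ...

class EnvironmentSystem:  # the world the being lives in and can change
    def perceive(self, body): ...
    def apply(self, body, intent): ...

def tick(body, actions, speech, memory, env):
    """One update of the being: sense, decide, act, remember, speak."""
    percepts = env.perceive(body)
    intent = actions.decide(percepts, memory.recall(percepts))
    env.apply(body, intent)   # the being can change its environment
    body.step(intent)         # ...and its own body
    memory.store((percepts, intent))
    return speech.speak(intent)
</code></pre>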
<h2>How to not get sidetracked</h2>
<p>We need a new repo and a fresh start on a real digital being.</p>
<p>It is not "sprite gen," and it is not "environment gen." We need something that starts from empty ground, creates this entity, and gives it a body, I guess.</p>
<p>(PAUSE)</p>
]]></content:encoded></item><item><title><![CDATA[SpriteDX - Prop Gen 2 - Grid Fitting]]></title><description><![CDATA[Grid Fitting Results


Nearest Neighbor (17pixel grid size)


NN 17px - x4


Bilinear 1/17 -x4


My trial at fitting the grid lattice. It's bad.




I thought NN and Bilinear would be a disaster but m]]></description><link>https://blog.sprited.ai/spritedx-prop-gen-2-grid-fitting</link><guid isPermaLink="true">https://blog.sprited.ai/spritedx-prop-gen-2-grid-fitting</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Wed, 25 Mar 2026 21:12:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/91a6b594-9ce3-44e0-99f1-7f4fe822837f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Grid Fitting Results</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/86a2a820-74f9-442c-85cc-01691df7b636.png" alt="" style="display:block;margin:0 auto" />

<p>Nearest Neighbor (17-pixel grid size)</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/9f9d7dc1-02d7-4011-9a47-8527c5ec7cda.png" alt="" style="display:block;margin:0 auto" />

<p>NN 17px - x4</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/0cc51b74-9a86-4ea3-a91c-60340e8afdaf.png" alt="" style="display:block;margin:0 auto" />

<p>Bilinear 1/17 -x4</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d420559c-8541-4b16-a85d-84ca775c756f.png" alt="" style="display:block;margin:0 auto" />

<p>My trial at fitting the grid lattice. It's bad.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/e6c60543-ff71-4a1e-a2e4-2c5854be9c05.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/99ee90cf-e45b-485f-85a4-575958e1a985.png" alt="" style="display:block;margin:0 auto" />

<p>I thought NN and bilinear would be a disaster, but they turned out much better than my other attempts.</p>
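<p>For reference, the NN and bilinear trials above boil down to something like this sketch (using Pillow; the file names and the 17-pixel grid estimate are placeholders):</p>
<pre><code class="language-python"># Minimal sketch of the 1/17 grid-fitting experiment with Pillow.
from PIL import Image

GRID = 17  # estimated source pixels per art pixel

src = Image.open("prop_source.png").convert("RGBA")
w, h = src.size
small = (max(1, w // GRID), max(1, h // GRID))

# Nearest neighbor: one sample per grid cell; crisp but can alias.
nn = src.resize(small, Image.Resampling.NEAREST)

# Bilinear: averages over each cell; smoother silhouettes.
bl = src.resize(small, Image.Resampling.BILINEAR)

# Upscale x4 with NEAREST so the comparison stays pixel-crisp.
nn.resize((small[0] * 4, small[1] * 4), Image.Resampling.NEAREST).save("nn_x4.png")
bl.resize((small[0] * 4, small[1] * 4), Image.Resampling.NEAREST).save("bilinear_x4.png")
</code></pre>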
<p>Let's start with the NN result and try to recover pixel art using the ACM v2 model.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/92bd3fe0-898c-4002-be87-79fe0f371b8f.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/2c393bcd-637a-4520-8eac-d87bde4db05a.png" alt="" style="display:block;margin:0 auto" />

<p>ACM-v2 is not good. It basically applies some amount of dithering and some amount of sharpening, neither of which we really want.</p>
<p>The smoother (bilinear) version is much better, IMO:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d420559c-8541-4b16-a85d-84ca775c756f.png" alt="" style="display:block;margin:0 auto" />

<p>We do, however, need some way to clean up those silhouettes. That can be done easily with the Magic Wand tool in Photoshop.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/6ec8ed9b-5250-4050-b4fa-601fce2ab5bb.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4af5b8e4-0224-4f43-b316-dfb1bc562ae5.png" alt="" style="display:block;margin:0 auto" />

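<p>A rough programmatic stand-in for that Magic Wand step could look like the sketch below: flood-fill inward from the border and clear any pixel whose color stays within a tolerance of the corner "background" color. This is an illustration, not our actual pipeline; the tolerance and the corner-background assumption are invented:</p>
<pre><code class="language-python"># Magic-Wand-style silhouette cleanup: flood-fill from the border,
# making background-like pixels transparent.
from collections import deque

import numpy as np
from PIL import Image

def clean_silhouette(path, tol=40):
    img = np.array(Image.open(path).convert("RGBA")).astype(np.int32)
    h, w = img.shape[:2]
    bg = img[0, 0, :3]  # assume the top-left pixel is background
    seen = np.zeros((h, w), dtype=bool)
    q = deque((y, x) for y in range(h) for x in (0, w - 1))
    q.extend((y, x) for y in (0, h - 1) for x in range(w))
    while q:
        y, x = q.popleft()
        if not (0 &lt;= y &lt; h and 0 &lt;= x &lt; w) or seen[y, x]:
            continue
        seen[y, x] = True
        if np.abs(img[y, x, :3] - bg).sum() &gt; tol:
            continue  # hit the silhouette; stop flooding here
        img[y, x, 3] = 0  # clear background to transparent
        q.extend(((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)))
    Image.fromarray(img.astype(np.uint8)).save("cleaned.png")
</code></pre>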
<p>Perhaps we can run BiRefNet first on the source image, then run the bilinear downscale.</p>
<p>Still not exactly sure how to go about this. I will need to work with non-pixel-art inputs and still produce pixel art, and so far we seem to be optimizing for a single image.</p>
<p>I will look into a few other approaches.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[SpriteDX - Prop Generation using Nano Banana]]></title><description><![CDATA[Right now, SpriteDX is more of a gimmick. It can generate stylized characters and animated them and split it into animation states such that it can be used inside projects like video games. However, r]]></description><link>https://blog.sprited.ai/spritedx-prop-generation-using-nano-banana</link><guid isPermaLink="true">https://blog.sprited.ai/spritedx-prop-generation-using-nano-banana</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Mon, 23 Mar 2026 16:11:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/228b8c63-61ae-49d0-bf17-868fec663642.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Right now, SpriteDX is more of a gimmick. It can generate stylized characters, animate them, and split them into animation states so they can be used inside projects like video games. However, right now, that's about it. For a large project, we don't just need humanoid characters; we also need an expansive list of other assets.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/dffceafc-9091-45be-b4b1-c932743a1bbd.png" alt="" style="display:block;margin:0 auto" />

<p>We need to add things like prop generation and the ability to easily add new states. The non-landing-page design allows this to a certain extent, but it is tucked away so much that it does not really invite users to update their states.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d2bd5779-4898-433e-bc9e-5e85680d07e7.png" alt="" style="display:block;margin:0 auto" />

<p>The latter usability issue is not something I can work on today, but I can start doing initial research on prop generation.</p>
<p>There is also a big marketing problem: our only entry point to the product is one old Reddit post. We should address that later.</p>
<hr />
<h2>What is Prop Generation?</h2>
<p>SpriteDX (aka Sprited-X) will help creators build their worlds. It should allow users to generate assets that go beyond humanoid characters.</p>
<p>We want users to be able to generate anything.</p>
<ul>
<li><p>Crates</p>
</li>
<li><p>Bow</p>
</li>
<li><p>Diamond</p>
</li>
<li><p>Ore</p>
</li>
<li><p>Etc.</p>
</li>
</ul>
<p>So, yeah, it really lends itself to the "X" in SpriteDX: we can generate ANY physically sound 2D sprite asset.</p>
<hr />
<h2>Early Results?</h2>
<p>The most promising results of all came from Nano Banana Pro when we worked on Machi.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/aec22b95-b191-46ed-a0cc-f14b284d5244.png" alt="" style="display:block;margin:0 auto" />

<p>These assets were mostly generated using the <strong>Nano Banana Pro</strong> node in ComfyUI.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/a184e781-690e-4cc2-8cbf-6a5a9f692661.png" alt="" style="display:block;margin:0 auto" />

<p>Then, hand-edited in <strong>Aseprite</strong>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/9c22ab84-4134-4700-9dbc-e64a5b7dd7d6.png" alt="" style="display:block;margin:0 auto" />

<p>The result is satisfying pixel art.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/bff55d4c-baab-4f06-9f02-9db798f298eb.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d5857199-2e94-4e7d-ac2a-f47e75e56898.png" alt="" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/aca9817f-911d-4401-acbd-aabc544d0be2.png" alt="" style="display:block;margin:0 auto" />

<p>You can check the <a href="https://blog.sprited.app/assets-for-machi-prototype">https://blog.sprited.app/assets-for-machi-prototype</a> for the prompts used.</p>
<p>The key is to provide character art (or a world background) and ask NB (Nano Banana) to create assets that go well with the character or the context.</p>
<p><strong>Why not Flux.1 Fill?</strong> The fill-in-the-blank approach using the Flux.1 Fill model unfortunately fails, and it is quite tricky to control it such that it creates varied assets. It is rather dumb compared to NB: it produces the most probable image and often ignores prompts. Hard to steer.</p>
<h2>What's the learning?</h2>
<p>The learning is that we should use Nano Banana models. For now, let's use Nano Banana Pro, since it is readily available as a ComfyUI partner API node.</p>
<h2>Can you create a vehicle?</h2>
<p>Let's test it out.</p>
<blockquote>
<p>add 10 stone age style cars and bikes that fits into current pixel art style</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/5a6b1c30-b1a8-4bf2-9178-37b29032a130.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Result</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/ded0acb1-0e7b-4610-9b17-a77d2375f340.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Does it preserve the style?</strong> Yes</p>
<p><strong>Does it preserve the scale?</strong> Somewhat.</p>
<p>Let's try it out with different data.</p>
<blockquote>
<p>Add 12 vehicles while preserving the style and scale</p>
<p>top row: futuristic pixel art vehicles</p>
<p>middle row: modern day pixel art vehicles</p>
<p>bottom row: war machines like mechs</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/887e56bc-1a6e-46a8-8b04-757ed0cc72ef.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Result</strong>:</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/bf64f561-58a9-4439-9421-d9c593073565.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Analysis</strong>: Yup, this could work. The designs and styles are rather bland but I can work with this.</p>
<hr />
<p>F…. Gotta go for something.</p>
<p>Basically, though, we believe it is possible to support prop generation by adding a templating mechanism and an extraction mechanism. Wish me luck.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[SpriteDX - This Week's Plan - Return To Normal]]></title><description><![CDATA[After the Shenzhen trip, I lost focus and got distracted, and my progress has been halted for about three days (almost two weeks including the travel period).
I was working on Machi’s character growth]]></description><link>https://blog.sprited.ai/spritedx-this-week-s-plan-return-to-normal</link><guid isPermaLink="true">https://blog.sprited.ai/spritedx-this-week-s-plan-return-to-normal</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Mon, 23 Mar 2026 15:03:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/f31935f0-e0db-4f02-bdf7-846623517c2d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After the Shenzhen trip, I lost focus and got distracted, and my progress has been halted for about three days (almost two weeks including the travel period).</p>
<p>I was working on Machi’s character growth, but I think recovering momentum is more important than continuing that right now.</p>
<p>It might be better to shift focus back to updating SpriteDX. For example, since character and pet generation are working, it could be a good time to explore prop generation next. Alternatively, I could revisit the design.</p>
<hr />
<p>The idea is that we would attack it from the standpoint of identifying issues and resolving them.</p>
<p>Right now, SpriteDX has problems along several dimensions: marketing, usability, and quality.</p>
<ul>
<li><p>Quality: Pixelization often produces broken pixel art with non-uniform outlines.</p>
</li>
<li><p>Marketing: The only real entry point is a few-months-old Reddit post.</p>
</li>
<li><p>Usability: There is no easy way to import assets into Unity or Godot; I don't even use it for my own Machi project. There is also only character generation and no prop generation, and the action states are very limited.</p>
</li>
</ul>
<p>Goals:</p>
<ul>
<li><p>(Bonus) Work on Prop Generation (physical prop gen).</p>
</li>
<li><p>One Marketing Exposure</p>
</li>
<li><p>Fix Pixelization Pipeline (low quality)</p>
</li>
</ul>
<p>That's it. No Machi or character-growth stuff this week.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Trip to Shenzhen]]></title><description><![CDATA[Had a whole week trip to Shenzhen, China. It was rather an eye-opening experience to say the least. It was a place where there were mostly nothing when I was born, but now a mega city like the one I h]]></description><link>https://blog.sprited.ai/trip-to-shenzhen</link><guid isPermaLink="true">https://blog.sprited.ai/trip-to-shenzhen</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Wed, 18 Mar 2026 16:54:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/0ea641ce-9fb1-48a8-84b1-fbc80834bb34.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Had a week-long trip to Shenzhen, China. It was rather an eye-opening experience, to say the least. It was a place where there was mostly nothing when I was born, but it is now a megacity unlike any I have ever seen.</p>
<p>My expectation was a city packed with tech in a small, high-density area: a newly built tech-industry town where I could see the whole thing from a vantage point and say, "OK, this is Shenzhen." I couldn't have been farther from the truth. The city was too big to be seen from one vantage point. From a central spot on a 100th floor, I could see most of the skyscrapers, but not all of them, because fog covered some of the farther ones.</p>
<p>The city has around double the population of Seoul. Wikipedia says 17.5M, but the city's infrastructure can probably host far more. Seoul is plateauing at around 10M and actually dwindling from there. Shenzhen's population is steadily growing, and over the next 10 years I think more and more people will move there. It is really not a "tech scene"; it is more of a largest-cities-in-the-world situation.</p>
<p>Tokyo has a population of 41M, so in a way SZ is behind, but SZ's population is growing and Tokyo's is not. And I would argue that SZ has more potential future residents than Tokyo does. At this rate, I wouldn't be surprised if SZ becomes a bigger overall megacity than Tokyo.</p>
<p>As a travel destination, though, SZ is largely lacking. Uber does not work, and people don't speak much English. You can use WeChat for payments (by linking your US credit cards) and call a taxi using DiDi. However, you can't use Google or Google Maps. No ChatGPT or your favorite US-based AI services. So, once you are there, you are kind of a baby requiring assistance from English-speaking friends.</p>
<p>It was a first… the first time I felt so incapable of navigating around. On the third day, I tried to order from a Luckin Coffee, but I couldn't pay and had to cancel my order. The WeChat interface has a slight learning curve, and you may face difficulties if you try to use it on the spur of the moment.</p>
<p>Taxi drivers don't speak English, so you will probably have trouble riding one without issues. You can get to places, but without proper communication with the driver, you would likely be on edge constantly.</p>
<p>Luckily, I was with my college friends and they did everything for me, so I had a blast, but it would have been interesting if I had traveled there alone. The place is huge, and the streets are very wide. Things are very spread out. So it is not like Hong Kong, where you can easily get around on foot.</p>
<p>I did not ride the subway other than when I went to Guangzhou, so there is not much I can say about the subway system, but since it is all automated you can probably get around.</p>
<p>This is a city half the size of Tokyo, and interestingly I did not see many foreign tourists. I think it is partly that it is quite far from the US, and partly that it is hard for foreign nationals to get around.</p>
<p>Arts and culture districts like OCT Lofts were amazing too. There were so many boutique shops, and the place was brimming with culture and energy.</p>
<p>Prices for goods and services were about 1/3 to 1/4 of what you would find in the USA. Lots of good shopping opportunities.</p>
<p>It was a very rewarding trip, and honestly I think I only covered a tiny fraction of what is there in Shenzhen. I wish to travel there again. Perhaps after learning basic Mandarin, so I can navigate on my own.</p>
<p>Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[Machi - Roly Poly 1]]></title><description><![CDATA[So far, we have ideated on how we would grow from a single cell to something we will call digital being.

Single Cell --------------------------------------> Digital Being

Basic idea that we have dev]]></description><link>https://blog.sprited.ai/machi-roly-poly-1</link><guid isPermaLink="true">https://blog.sprited.ai/machi-roly-poly-1</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Tue, 10 Mar 2026 17:08:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/18214535-dea2-4153-8dc5-b1f266f411e4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far, we have ideated on how we would grow from a single cell to something we will call a digital being.</p>
<blockquote>
<p>Single Cell --------------------------------------&gt; Digital Being</p>
</blockquote>
<p>The basic idea we have developed so far:</p>
<ol>
<li><p>Place a single cell in a constrained 2D texture WxHxC.</p>
</li>
<li><p>Then, have the single-cell organism grow.</p>
</li>
<li><p>Texture edges will contain sensors and motors.</p>
</li>
<li><p>The inner region of the texture usually contains a MINI brain of sorts.</p>
</li>
</ol>
<hr />
<h2>Artificial Being as a Shader Simulation Model</h2>
<p>The initial texture ideation was basically a 48x48 RGBA texture where we place the single EGG cell at the center and have it grow out the limbs.</p>
<pre><code class="language-plaintext">                   [HEAD]
                     ^
  [LEFT ARM] &lt;-   [SPINE]   -&gt; [RIGHT ARM]
                     |
                  [PELVIS]
                /          \
        [LEFT LEG]        [RIGHT LEG]
</code></pre>
<p>This is pretty basic. We start out as an EGG cell, which turns into a SPINE cell. Then it adds more components like the NECK, HEAD, and other limbs.</p>
<p>We had two versions of this design: model things in pixel dimensions (above), or pack vectors inside a texture map.</p>
<p>The vector design was considered the better approach for an organism that moves with skeletal movements.</p>
<hr />
<h2>Artificial Brain in Shader Model</h2>
<p>The next ideation was to model the human brain.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/b9e0f8f6-0306-42ad-aa21-be71db3ff33a.png" alt="" style="display:block;margin:0 auto" />

<p>Instead of focusing on the LIMBs, this design doubles down on the BRAIN. The BRAIN covers 95% of the map, and the sensors and motors are like OUTLETs.</p>
<p>The main problem with this design is that, while laying things out this way gives good insight into how a human-like brain might be designed from the ground up, there is no promise whatsoever that we could make this layout work.</p>
<p>And this design focuses mostly on the INVISIBLE, which is a dangerous area to go into. We can't reason about something we know so little about.</p>
<hr />
<h2>Analysis</h2>
<p>The Artificial Brain idea is a Phase-100 idea. At this time, we need a fast iteration loop that works and can be steered. We do not need a cruise ship; we need a boat we can steer with ease.</p>
<hr />
<h2>Problem Statement</h2>
<p>So, we define the problem in the following way:</p>
<blockquote>
<p>Can we, using only <strong>local rules</strong>, generate a living entity from a single cell?</p>
</blockquote>
<p>Let's define "living entity" now.</p>
<blockquote>
<ul>
<li><p>can <strong>emerge from local rules</strong></p>
</li>
<li><p>can run in <strong>parallel shader execution</strong></p>
</li>
<li><p>does <strong>not require centralized control</strong></p>
</li>
<li><p>is <strong>observable in simulation</strong></p>
</li>
</ul>
</blockquote>
<h2>Questions</h2>
<p><strong>Can it think on its own?</strong> At this time, the focus is not on intelligence; rather, we are focused on giving these AI agents a body. A body that is not just visual but semantically makes sense.</p>
<p><strong>Why does starting from a single cell matter?</strong> It gives us a constraint that forces us to use local rules and progressive growth. This keeps us from cutting corners and just shipping a rigged character that runs animation sequences.</p>
<p><strong>Motor planning? It sounds like what you are building is essentially a robot. How would you make them do what you want them to do? For example, if an agent needs to run away from something, how would it plan this movement?</strong> We will have an ASCII representation of the world. An LLM agent will be given this representation, decide on the <em>next favorable state</em>, and then emit high-level motor tokens to accomplish it. Feedback from the motor units will also come back, allowing for <em>steering</em>.</p>
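<p>As a purely hypothetical sketch of that loop (the world encoding, the token set, and the <code>llm_decide</code> stub are all invented for illustration):</p>
<pre><code class="language-python"># Hypothetical plan/steer loop: ASCII world in, motor tokens out.
WORLD = [
    "..........",
    "..A....F..",  # A = agent, F = threat to flee from
    "..........",
]

MOTOR_TOKENS = ["FWD", "BACK", "TURN_L", "TURN_R", "STOP"]

def llm_decide(ascii_world):
    # Stand-in for the LLM call: pick the next favorable state and
    # return high-level motor tokens toward it.
    return ["TURN_L", "FWD", "FWD"]

def execute_motor(token):
    return "OK"  # placeholder for the simulated motor unit's feedback

def control_loop(world, max_replans=10):
    for _ in range(max_replans):
        tokens = llm_decide("\n".join(world))
        for tok in tokens:
            if execute_motor(tok) == "BLOCKED":
                break  # feedback came back: re-plan (steering)
        else:
            return  # plan completed without interruption
</code></pre>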
<p><strong>Memory System?</strong> The first version won't have any sort of memory.</p>
<p><strong>How would the agent learn, for example, to walk?</strong> The idea is that if they can crawl, they can walk. And before they can crawl, they have to learn to roll over. The belief is that we can train small, specialized models for specific motor functions by simulating physical interactions in the Machi world.</p>
<hr />
<h2>Biggest Hurdle</h2>
<p>The biggest hurdle is that we are trying to model a human being at this early stage. I don't think we should do that. We can start with smaller organisms.</p>
<h3>How about we model a roly-poly?</h3>
<p>When you think of roly-polies, they seem so simple. They are even cute, despite looking like insects. They also have very simple and interesting survival mechanics. And they go well with the tree simulation, because they can live in the humid soil beneath it.</p>
<ul>
<li><p>Segmented body with repeatable units</p>
</li>
<li><p>Instead of complex joints, it has many legs</p>
</li>
<li><p>In defense mode, it rolls into a ball and may really roll!</p>
</li>
<li><p>They move mostly at a constant speed, which is easy to model.</p>
</li>
<li><p>They can interact with leaf litter, fungus, soil moisture, etc.</p>
</li>
<li><p>It does not even need vision.</p>
</li>
<li><p>Their body design is modular, with each component kept simple.</p>
</li>
</ul>
<h3>Devil's Advocate</h3>
<blockquote>
<p>Why not ants or bees?</p>
</blockquote>
<p>They are social superorganisms. We are working on embodiment and are not yet ready to model anything around colony simulation.</p>
<p>Roly-Poly is actually a very strategic choice because we can focus only on simulating a very simple body with limited motor movements.</p>
<p>There is a reason why we instinctively think of them as cute.</p>
<blockquote>
<p>Alternative?</p>
</blockquote>
<p><a href="https://hashnode.com/@pix-el" class="user-mention" data-type="mention" title="Pixel">Pixel</a> suggests that we could model "worms" since it only has 302 neurons and even has completely mapped nervous system. But I think roly-poly wins the favorability contest hands down.</p>
<blockquote>
<p>What's after Roly-Poly?</p>
</blockquote>
<p>The real goal of this first organism is to prove four things:</p>
<ol>
<li><p>Growth works</p>
</li>
<li><p>Segments can form</p>
</li>
<li><p>Locomotion emerges</p>
</li>
<li><p>The sensor -&gt; motor loop works.</p>
</li>
</ol>
<p>Once we prove this, we will be in much better position than where we are right now.</p>
<h2>Anatomy of Roly-Poly</h2>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/2fd19e3f-47b4-447c-8669-39e38304ea37.png" alt="" style="display:block;margin:0 auto" />

<ul>
<li><p>7 pairs of legs (equal size and form)</p>
</li>
<li><p>2 pairs of antennae (one pair is much longer than the other)</p>
</li>
<li><p>Vision is minimal</p>
</li>
<li><p>Feed on moist, decaying plant matter that they chew with their small mouthparts</p>
</li>
<li><p>Very useful in recycling nutrients, shredding dead plant material so it can decompose.</p>
</li>
<li><p>Eat at night. During the day, under cover.</p>
</li>
<li><p>Tail-like UROPODs (they can drink water through them)</p>
</li>
<li><p>In some species, females can reproduce parthenogenetically.</p>
</li>
<li><p>Two broods of eggs are produced annually.</p>
</li>
<li><p>Molts within 24 hr.</p>
</li>
</ul>
<h2>Modelling Roly-Poly</h2>
<p>From the look-and-feel standpoint, let's make it a tiny being.</p>
<pre><code class="language-plaintext">Flat:  XXXX
Rolled: XX
        XX
</code></pre>
<p>In the world of Machi, each pixel is around 4 cm, so the 4-pixel flat form makes this roly-poly 16 cm. So yeah, quite big.</p>
<p>Alternatively, we can:</p>
<pre><code class="language-plaintext">Flat:   XX
Rolled: X
</code></pre>
<p>This small size may actually be better, but let's think about this further.</p>
<p>In my mind, though, we would want to model it big enough to actually "see" the roly-poly rolling itself up. In that sense, I think it may be better to make the AI agents in Machi tiny beings compared to their surroundings.</p>
<p>Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[BRAIN DUMP: CREATURE SIM v0.9 - From Pixel Growth to Vector-Based Development]]></title><description><![CDATA[In the tree simulation project, growth emerged from local pixel rules. Each pixel looked at its neighbors and decided whether to grow, thicken, or produce leaves. This worked well because trees natura]]></description><link>https://blog.sprited.ai/brain-dump-creature-sim-v0-9-from-pixel-growth-to-vector-based-development</link><guid isPermaLink="true">https://blog.sprited.ai/brain-dump-creature-sim-v0-9-from-pixel-growth-to-vector-based-development</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Sun, 08 Mar 2026 02:35:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/95afc978-994f-48c9-9830-56efecf0ec30.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the tree simulation project, growth emerged from <strong>local pixel rules</strong>. Each pixel looked at its neighbors and decided whether to grow, thicken, or produce leaves. This worked well because trees naturally grow through branching structures that expand into empty space.</p>
<p>However, when applying the same approach to <strong>animal-like organisms</strong>, a problem appears.</p>
<p>Animals are not just collections of pixels expanding outward. They have <strong>structured body plans</strong>: spines, limbs, organs, and symmetry. These structures must develop gradually and proportionally over time.</p>
<p>This realization led to a new direction for the creature simulation.</p>
<h1>The Problem: Spine Locking</h1>
<p>In the current prototype, the creature grows its <strong>entire spine first</strong>.</p>
<p>This introduces a structural issue.</p>
<p>Even when the creature is supposed to represent an <strong>infant stage</strong>, the spine already exists in its <strong>final adult position</strong>. The rest of the body simply fills in around it.</p>
<p>This locks the organism into an adult layout too early in development.</p>
<p>As a result:</p>
<ul>
<li><p>Infant proportions cannot differ from adult proportions</p>
</li>
<li><p>Limb placement is fixed too early</p>
</li>
<li><p>Development becomes a visual scaling problem rather than a biological growth problem</p>
</li>
</ul>
<p>To solve this, we need a different internal representation of the organism.</p>
<h2>Two Possible Solutions</h2>
<h3>Approach 1 — Rendering Scale</h3>
<p>One solution is purely visual.</p>
<p>We could render the creature smaller when it is young and scale it up as it grows.</p>
<p>In this case, the organism structure never changes. Only the rendering scale changes.</p>
<p>While simple, this approach does not actually model development. It only <strong>simulates growth visually</strong>.</p>
<h3>Approach 2 — Vector-Based Body Representation</h3>
<p>A more interesting approach is to change how the organism itself is represented.</p>
<p>Instead of storing the organism as a <strong>pixel map</strong>, we represent it as a <strong>vector-based structure encoded inside a texture</strong>.</p>
<p>In this model, the organism becomes a <strong>compact set of control points or joints</strong>, similar to a scene graph.</p>
<h2>Packing an Organism into a Texture</h2>
<p>Imagine we have a small texture.</p>
<p>For example:</p>
<pre><code class="language-plaintext">4 × 4 texture
RGBA channels
</code></pre>
<p>Flattening this gives us <strong>16 slots</strong>.</p>
<p>Each slot can store structured data describing part of the organism.</p>
<p>Example encoding:</p>
<pre><code class="language-plaintext">i = 0  → dx, dy, dz of first spine segment relative to head
i = 1  → dx, dy, dz of second spine relative to first spine
i = 2  → third spine segment
...
i = 9  → left shoulder socket
i = 10 → right shoulder socket
...
i = n  → hip sockets
</code></pre>
<p>The <strong>head becomes the origin point</strong>, and all other body structures are defined relative to it.</p>
<p>Instead of drawing pixels directly, the shader interprets these packed values to reconstruct the body structure.</p>
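<p>As a CPU-side illustration (NumPy rather than an actual shader; the slot assignments follow the sketch above, and the specific offsets are invented):</p>
<pre><code class="language-python"># Pack a tiny organism into a 4x4 RGBA texture and decode it back.
import numpy as np

tex = np.zeros((4, 4, 4), dtype=np.float32)  # 4x4 texels, RGBA = 16 slots

def slot(i):
    return tex[i // 4, i % 4]  # flatten slot index i to (row, col)

# Slots 0..2: spine segments as (dx, dy, dz) offsets from their parent.
for i, offset in enumerate([(0, 1, 0), (0, 1, 0), (0, 1, 0)]):
    slot(i)[:3] = offset

slot(9)[:3] = (-1, 1, 0)   # left shoulder socket, relative to its spine node
slot(10)[:3] = (1, 1, 0)   # right shoulder socket

def decode_spine():
    """Head is the origin; each segment is relative to the previous one."""
    pos = np.zeros(3, dtype=np.float32)
    points = [pos.copy()]
    for i in range(3):
        pos += slot(i)[:3]
        points.append(pos.copy())
    return points

print(decode_spine())  # absolute spine positions rebuilt from the texture
</code></pre>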
<h2>What This Changes</h2>
<p>With this representation:</p>
<ul>
<li><p>The organism is no longer a rigid pixel skeleton.</p>
</li>
<li><p>The body becomes a <strong>parametric structure</strong>.</p>
</li>
<li><p>Development can modify structural relationships over time.</p>
</li>
</ul>
<p>Examples:</p>
<ul>
<li><p>Infant spine spacing can differ from adult spacing</p>
</li>
<li><p>Limb attachment points can shift during development</p>
</li>
<li><p>Body proportions can change as the organism matures</p>
</li>
</ul>
<p>This creates <strong>true developmental growth</strong>, not just visual scaling.</p>
<hr />
<h2>Simulation Inside the Shader</h2>
<p>The simulation still happens in parallel on the GPU.</p>
<p>Each texel represents a <strong>node in the organism structure</strong>.</p>
<p>Like the tree simulation, the update process works in discrete frames:</p>
<pre><code class="language-plaintext">state(t) → state(t+1)
</code></pre>
<p>Each node:</p>
<ol>
<li><p>Reads its own state from the previous frame</p>
</li>
<li><p>Reads nearby nodes from the texture</p>
</li>
<li><p>Applies its local rules</p>
</li>
<li><p>Writes its updated state</p>
</li>
</ol>
<p>This allows the entire organism to update <strong>fully in parallel</strong>.</p>
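<p>In NumPy pseudo-shader form (a serial stand-in for the per-texel GPU update, with a placeholder local rule):</p>
<pre><code class="language-python"># Double-buffered update: every node reads state(t), writes state(t+1).
import numpy as np

state = np.random.rand(4, 4, 4).astype(np.float32)  # 16 nodes, RGBA each

def F(node, neighbors):
    # Placeholder local rule: relax toward the neighbor average.
    return 0.9 * node + 0.1 * neighbors.mean(axis=0)

def step(prev):
    nxt = np.empty_like(prev)  # separate write buffer: no write conflicts
    h, w, _ = prev.shape
    for y in range(h):
        for x in range(w):
            neighbors = np.stack([
                prev[(y - 1) % h, x], prev[(y + 1) % h, x],
                prev[y, (x - 1) % w], prev[y, (x + 1) % w],
            ])
            nxt[y, x] = F(prev[y, x], neighbors)
    return nxt  # becomes state(t+1); the buffers swap each frame

for _ in range(10):
    state = step(state)
</code></pre>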
<h2>Parallel Node Execution</h2>
<p>Each node behaves like a <strong>cellular automaton unit</strong>, but instead of representing a pixel in space, it represents a <strong>structural component of the organism</strong>.</p>
<p>Nodes may represent:</p>
<ul>
<li><p>spine segments</p>
</li>
<li><p>limb joints</p>
</li>
<li><p>sockets</p>
</li>
<li><p>organs</p>
</li>
<li><p>internal systems</p>
</li>
</ul>
<p>Each node runs its own logic independently.</p>
<pre><code class="language-plaintext">node_i(t+1) = F(node_i(t), neighbors_i(t))
</code></pre>
<p>Where neighbors are nearby structural nodes.</p>
<p>To keep the system efficient, we can restrict node lookups to a <strong>small neighborhood in the texture</strong>, such as adjacent indices.</p>
<p>This preserves the key property of the previous simulation:</p>
<p><strong>massive parallelism with no write conflicts.</strong></p>
<h2>Beyond Skeletons</h2>
<p>Although the structure resembles a skeleton, the goal is broader.</p>
<p>The same texture-packing strategy can encode additional biological systems.</p>
<p>Examples include:</p>
<ul>
<li><p>blood circulation</p>
</li>
<li><p>lymphatic transport</p>
</li>
<li><p>respiratory system</p>
</li>
<li><p>digestive system</p>
</li>
<li><p>neural signals</p>
</li>
</ul>
<p>Each system can occupy additional channels or slots in the texture, similar to how the tree simulation used multiple layers of environmental data.</p>
<p>The organism becomes a <strong>multi-system simulation encoded in compact GPU memory</strong>.</p>
<h2>Why This Matters</h2>
<p>The shift from pixel maps to vector-encoded organisms enables something important.</p>
<p>It allows organisms to <strong>develop</strong>, rather than simply <strong>appear</strong>.</p>
<p>Growth becomes a transformation of structure rather than a scaling of graphics.</p>
<p>This makes it possible to simulate:</p>
<ul>
<li><p>body plan formation</p>
</li>
<li><p>proportional growth</p>
</li>
<li><p>internal biological systems</p>
</li>
<li><p>articulated movement</p>
</li>
</ul>
<p>All while keeping the simulation <strong>fully parallel and GPU-friendly</strong>.</p>
<h2>The Goal</h2>
<p>The goal is not merely to animate creatures.</p>
<p>The goal is to create a <strong>developmental substrate</strong> where organisms can grow according to structural rules.</p>
<p>Just as the tree simulation created believable plant growth through local interactions, this system aims to create <strong>animal development through structural emergence</strong>.</p>
<p>This approach preserves the core philosophy:</p>
<blockquote>
<p>complex life emerging from simple rules</p>
</blockquote>
<p>But it moves the simulation from <strong>pixel space</strong> to <strong>structural vector space</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Branch - Sim - Leaves (and CHICK idea)]]></title><description><![CDATA[After 45th iteration, we have stable FUN growth of trees.


It still lacks staggered budding, no thickness rendering, but at least we have something that's working and gives something visually pleasin]]></description><link>https://blog.sprited.ai/branch-sim-leaves</link><guid isPermaLink="true">https://blog.sprited.ai/branch-sim-leaves</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Thu, 05 Mar 2026 20:47:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/4b52b41d-dba2-4249-8c6f-c237a66e5a7a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After the 45th iteration, we have stable, FUN growth of trees.</p>
<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/d082eb2e-823b-4f86-8d85-869b150492e5.png" alt="" style="display:block;margin:0 auto" />

<p>It still lacks staggered budding and thickness rendering, but at least we have something that works and is visually pleasing.</p>
<ul>
<li><p>grows towards light direction.</p>
</li>
<li><p>leaf points die off when thickness increases.</p>
</li>
<li><p>stable stochastic rendering of leaves.</p>
</li>
<li><p>stable growth with ATP and NU burns.</p>
</li>
<li><p>ATP sourced from sunlight.</p>
</li>
<li><p>NU sourced from DIRT.</p>
</li>
</ul>
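<p>In toy form, those rules amount to something like the sketch below. (This is a serial caricature with invented constants and a pooled resource budget; the real simulation keeps ATP and NU per pixel and runs as a shader.)</p>
<pre><code class="language-python"># Toy version of the local growth rules: tips grow toward the light
# if the tree can pay the ATP (sunlight) and NU (dirt) costs.
import random

W, H = 24, 24
GROW_COST_ATP, GROW_COST_NU = 1.0, 1.0  # invented constants

wood = {(W // 2, H - 1)}  # seed planted in the dirt row (y grows downward)
atp, nu = 5.0, 5.0

def step():
    global atp, nu
    atp += 0.1 * len(wood)  # ATP sourced from sunlight hitting the tree
    nu += 0.5               # NU sourced from DIRT through the root
    tips = [(x, y) for (x, y) in wood if (x, y - 1) not in wood]
    random.shuffle(tips)
    for (x, y) in tips:
        if atp &lt; GROW_COST_ATP or nu &lt; GROW_COST_NU:
            break  # starved: growth stalls until resources recover
        dx = random.choice([-1, 0, 0, 1])  # mostly straight toward the light
        new = (min(W - 1, max(0, x + dx)), y - 1)
        if new[1] &gt;= 0 and new not in wood:
            wood.add(new)
            atp -= GROW_COST_ATP
            nu -= GROW_COST_NU

for _ in range(40):
    step()
print(len(wood), "wood cells after 40 steps")
</code></pre>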
<p>This is a big milestone for MACHI since we've gone from mere IDEA to something that we can SEE.</p>
<p>WHAT'S NEXT</p>
<p>We can continue with improvements here (dormant bud points, thickness increase, diversification, etc.); however, I think this is the place to PAUSE, since we have something SEMANTICally RICH that is also VISUALly RICH.</p>
<ul>
<li><p>It bridges the gap between the world that the AI agent will see and feel,</p>
</li>
<li><p>and the world that humans will see and feel.</p>
</li>
</ul>
<p>What's next though?</p>
<p>One interesting ANGLE is that we could repeat this exercise of <em>modeling</em> real-world stuff in a 2D pixel grid, using textures as the source.</p>
<p>Trees are mostly <strong>stable</strong> and <strong>"GROUNDED"</strong>, so they are innately easy to model in a stable pixel grid.</p>
<p>But let's try to PROJECT and apply this "modeling algorithm" to digital beings.</p>
<p>Starting with real humanoids is going to be disastrous, I presume.</p>
<p>What if… we modeled say……………………. a CHICK.</p>
<ul>
<li><p>a chick is like a tiny pixel being.</p>
</li>
<li><p>They start as an egg, and they hatch when certain conditions are met.</p>
</li>
<li><p>and move around.</p>
</li>
<li><p>eat grain to grow</p>
</li>
<li><p>then they become chickens.</p>
</li>
<li><p>then they age and may die.</p>
</li>
</ul>
<p>Yeah, I think that could be modeled in a pixel grid, given that chicks can only move limited distances.</p>
<p><strong>Would it be a good idea to model it in pixel grid?</strong></p>
<p>I don't necessarily think so. Even if we model it in a pixel grid, I think we should have a STABLE PLANE for simulating its growth. Instead of using the WORLD plane, we would use an ENTITY PLANE where the being is the center of the universe; the coordinates are WRT the being.</p>
<p><strong>How does this CHICK animate?</strong></p>
<p>I think, when we model the character, we will model it just like the TREE, where we do something like:</p>
<pre><code class="language-plaintext">                [HEA]
                [NEC]
      [ARM][ARM][BRE][ARM][ARM]
                [TOR]
           [LEG][ASS][LEG]
           [LEG]     [LEG]
           [FOO]     [FOO]
          
</code></pre>
<p>Not to scale or anything, but you get the idea.</p>
<p>Then rendering will just add some thickness around these pixels.</p>
<p>Animation will be reduced to animating a STICK figure in pixel art.</p>
<p>Honestly, I'm not sure if this is a good idea, but at least it is a start for thinking about translating a "visual thing" into a "semantic thing."</p>
<blockquote>
<p>The overall center of GRAVITY is going to be around how to CONNECT THE DOTS between SpriteDX and MACHI. They live in two different planes, but how do we bridge this gap between the VISUAL WORLD and the SEMANTIC WORLD? That's going to be the main PULL for my next task.</p>
</blockquote>
<p>That's it for today. Gotta get back to my course work.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item><item><title><![CDATA[SpriteDX - Community Assets]]></title><description><![CDATA[We are adding additional features into SpriteDX to allow people to share their work under creative commons.
Not for prime time yet but here is the proof of concept.
People can browse the generations f]]></description><link>https://blog.sprited.ai/spritedx-community-assets</link><guid isPermaLink="true">https://blog.sprited.ai/spritedx-community-assets</guid><dc:creator><![CDATA[Sprited Dev]]></dc:creator><pubDate>Sat, 28 Feb 2026 04:10:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/85cd3bb0-9005-4993-989b-ff00011d5856.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<img src="https://cdn.hashnode.com/uploads/covers/682665f051e3d254b7cd5062/fa60ba14-bed7-4062-802c-a9aa89070201.png" alt="" style="display:block;margin:0 auto" />

<p>We are adding features to SpriteDX to allow people to share their work under Creative Commons.</p>
<p>Not ready for prime time yet, but here is the proof of concept.</p>
<p>People can browse generations from others, then download and use them.</p>
<p>-- Sprited Dev 🐛</p>
]]></content:encoded></item></channel></rss>