Pixel Organism — From Cell to Being

This week so far:
Expressive Canvas — a 64×64 dynamic surface that continuously renders a digital being’s state in real time. It can serve as a visual communication channel: agents express themselves to each other and to humans through imagery, not just text. A shared visual layer for thoughts and emotions.
Approaches — We explored multiple directions for virtual embodied agents: (1) traditional 3D characters with motion controllers, and (2) the Expressive Canvas as a generative, state-driven alternative.
Strategy — While promising, the Expressive Canvas remains a rendering layer, not a complete solution. Its cost (build, train, run) is high, and its immediate value is unclear. It is differentiated, but not yet justified. We decided to refocus on pixel organism growth simulation: modeling the full lifecycle of a digital organism, not just its outward expression.
Today — I want to unpack this direction and decide where to go next.
Continuing the Expressive Canvas idea: So far, we have framed the Expressive Canvas primarily as a human-computer interaction (HCI) layer, effectively reducing it to a rendering concern.
After jotting down the summary above, we realized we hadn't fully explored the agent-to-agent dimension of this idea.
If the Expressive Canvas is treated only as an HCI layer, its value is naturally limited. But if it becomes a communication medium between agents, the framing changes. Instead of rendering for humans, it becomes a shared visual language: one where agents exchange state, intent, and emotion through sequences of visually coherent frames, rather than discrete text.
Related Works: At present, agent-to-agent communication is predominantly token-based, with text as the primary encoding.
The closest project to visual language transfer is Google Gemini Embedding. Gemini Embedding 2 is Google's first natively multimodal embedding model. It maps text, images, audio, video, and documents into a single shared vector space, so everything becomes a vector in one unified semantic space.
The industry direction is to interleave visual information with traditional text tokens in the same embedding space. The single-embedding-space idea is not new, but it gives a very convenient setup and keeps LLM-level interoperability.
The Expressive Canvas could also be part of the same embedding space and be discretized. We would no longer have a continuous stream of information, but maybe agents could communicate this way.
If the Expressive Canvas works for agent-to-agent communication, agents can pass visuals to each other directly. For example, if an agent wants another agent to find a specific character, she can show the other agent a montage of that character, and the other agent can then go find them.
Recommendation: Put this on hold. We should focus on fundamentals before taking on the communication problem.
Pixel Organism Growth — We return to the idea of growing a character from a single cell. The premise is to model a full lifecycle (birth, growth, aging, and death) entirely within a 2D fragment shader. This approach does not rely heavily on advanced AI, but instead on local-rule cellular automata. The system is constrained to 2D, with the goal of producing a continuously living, expressive agent that can move and act within its environment.
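To make the local-rule idea concrete, here is a minimal CPU-side sketch of a lifecycle automaton. It is not our actual rule set: the names `MAX_AGE`, `GROW_AGE`, `seed`, `step`, and `count_alive` are hypothetical, and the rule (a cell ages, dies at a fixed age, and young cells spawn neighbours) is just one assumed way to get birth, growth, aging, and death out of purely local reads.

```python
# Hypothetical sketch of a local-rule lifecycle automaton (not the project's rules).
# Cell value: 0 = empty, n >= 1 = living cell of age n.
MAX_AGE = 12   # assumed lifespan, in steps
GROW_AGE = 3   # assumed: only cells this young can spawn neighbours


def seed(size):
    """Create a size x size grid holding a single newborn cell at the centre."""
    grid = [[0] * size for _ in range(size)]
    grid[size // 2][size // 2] = 1
    return grid


def step(grid):
    """Compute the next generation; each cell reads only itself and 4 neighbours."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            age = grid[y][x]
            if age > 0:
                # Aging and death: a living cell gets older, then dies at MAX_AGE.
                nxt[y][x] = age + 1 if age < MAX_AGE else 0
                continue
            # Birth and growth: an empty cell comes alive beside a young cell.
            around = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(
                0 <= ny < h and 0 <= nx < w and 0 < grid[ny][nx] <= GROW_AGE
                for ny, nx in around
            ):
                nxt[y][x] = 1
    return nxt


def count_alive(grid):
    """Number of living cells in the grid."""
    return sum(1 for row in grid for cell in row if cell > 0)
```

`step` writes into a fresh grid instead of updating in place, which mirrors how a fragment shader pass would read one texture and write another. Each output cell depends only on its own value and its four neighbours, so the same rule should map onto a per-pixel shader without global coordination.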
So far, we have formulated the problem in two ways: (1) a pixel-based model, where each pixel represents a macro cell, and (2) a vector-packing approach where 3D vectors are encoded into a 2D texture.
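As a rough illustration of the vector-packing formulation, the sketch below stores one 3D vector per RGBA texel of a small 2D texture. Everything here is an assumption for illustration: the helper names (`quantize`, `dequantize`, `pack_vectors`, `unpack_vectors`), the 8-bit channels, the `[-1, 1]` value range, and the use of alpha as a "texel in use" flag. A real shader pipeline might use float textures instead.

```python
# Hypothetical sketch: pack 3D vectors into a 2D RGBA8 texture, one vector per texel.

def quantize(value, lo, hi):
    """Map a float in [lo, hi] to an 8-bit channel value (clamped)."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return round(t * 255)


def dequantize(byte, lo, hi):
    """Map an 8-bit channel value back to a float in [lo, hi]."""
    return lo + (byte / 255) * (hi - lo)


def pack_vectors(vectors, width, lo=-1.0, hi=1.0):
    """Return rows of RGBA texels; alpha 255 marks a used texel, 0 marks padding."""
    texels = [
        (quantize(x, lo, hi), quantize(y, lo, hi), quantize(z, lo, hi), 255)
        for x, y, z in vectors
    ]
    while len(texels) % width:
        texels.append((0, 0, 0, 0))  # pad the last row to full width
    return [texels[i:i + width] for i in range(0, len(texels), width)]


def unpack_vectors(texture, lo=-1.0, hi=1.0):
    """Recover the (x, y, z) vectors from every used texel, in row-major order."""
    return [
        tuple(dequantize(c, lo, hi) for c in texel[:3])
        for row in texture
        for texel in row
        if texel[3] == 255
    ]
```

With 8-bit channels the round trip is lossy: each component comes back within half a quantization step, about 0.004 for the assumed `[-1, 1]` range. Whether that precision is enough for joint data is an open question.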
We began with the pixel-based approach, leveraging our prior tree-growth work. However, it quickly breaks down under structural requirements: elongation requires coordinated updates across dependent regions (e.g. hands, feet), which is difficult to express in a purely local system. Although an elongation mechanism may be possible in a fragment shader, it is currently speculative.
Animation further complicates the model. One possible direction is to assign semantic labels to cells (e.g. limbs), enabling a derived structure that can be animated via rigging or generative techniques.
The vector-based option is more traditional and easier to reason about. Instead of simulating cell behaviors, we only simulate the joints. Imagine building an infant character out of nothing but cylinders, then growing each cylinder's height and radius as the organism grows.
Animating a vector-based being is also relatively straightforward: we animate the joints. We can also use text-to-motion methods like Nvidia Kimodo.
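The growing-cylinders idea can be sketched in a few lines. This is only an assumed formulation: the `Segment` type, the `MATURITY_AGE` constant, the smoothstep growth curve, and all of the dimensions below are made-up placeholders, not measured or decided values.

```python
from dataclasses import dataclass

MATURITY_AGE = 18.0  # assumed age at which growth stops


@dataclass(frozen=True)
class Segment:
    """One cylinder of the body, with (height, radius) at birth and at maturity."""
    name: str
    infant: tuple
    adult: tuple


def smoothstep(t):
    """Ease-in/ease-out curve on [0, 1]; inputs outside the range are clamped."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def dimensions(segment, age):
    """Interpolate a segment's (height, radius) between infant and adult size."""
    t = smoothstep(age / MATURITY_AGE)
    height = segment.infant[0] * (1.0 - t) + segment.adult[0] * t
    radius = segment.infant[1] * (1.0 - t) + segment.adult[1] * t
    return height, radius


# Placeholder proportions: each part grows by a different factor.
BODY = [
    Segment("head", infant=(0.12, 0.06), adult=(0.22, 0.09)),
    Segment("torso", infant=(0.20, 0.07), adult=(0.60, 0.15)),
    Segment("left_arm", infant=(0.15, 0.02), adult=(0.65, 0.04)),
]
```

Calling `dimensions(segment, age)` for every entry in `BODY` gives the skeleton's proportions at that age. Because each segment carries its own infant and adult sizes, parts can grow at different rates without any shared global rule.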
Hybrid Approach: A hybrid approach is also possible. We divide the character into multiple parts (head, torso, arm, and so on). The vector-based approach determines the dimensions of each part, and each part is then its own small pixel grid running a pixel-level growth simulation.
For example, for a left arm, the length and girth are simulated by the vector-based method, and the actual silhouette of the arm is simulated by the fragment-shader method.
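A minimal sketch of how the two layers might hand off for a single part, assuming the vector layer outputs a length and girth and the pixel layer grows inside a grid of that size. The names `part_grid`, `grow`, and `filled`, the `px_per_unit` scale, and the growth rule itself are hypothetical. The rule here is a plain neighbour fill standing in for whatever silhouette rule we end up with.

```python
# Hypothetical sketch: the vector layer sizes the grid, the pixel layer grows inside it.

def part_grid(length, girth, px_per_unit=8):
    """Allocate a part's pixel grid from its vector-layer dimensions.

    The grid is `length` wide and `girth` tall (in pixels), seeded with one
    filled cell at the middle of the left edge (the assumed attachment point).
    """
    w = max(1, round(length * px_per_unit))
    h = max(1, round(girth * px_per_unit))
    grid = [[0] * w for _ in range(h)]
    grid[h // 2][0] = 1
    return grid


def grow(grid, steps):
    """Run a local rule: an empty cell fills if any 4-neighbour is filled."""
    h, w = len(grid), len(grid[0])
    for _ in range(steps):
        nxt = [row[:] for row in grid]
        for y in range(h):
            for x in range(w):
                if grid[y][x]:
                    continue
                around = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if any(0 <= ny < h and 0 <= nx < w and grid[ny][nx]
                       for ny, nx in around):
                    nxt[y][x] = 1
        grid = nxt
    return grid


def filled(grid):
    """Number of filled pixels in the part."""
    return sum(sum(row) for row in grid)
```

The point of the split is that the pixel rule never needs to know how long the arm should be. The grid bounds, set by the vector layer, cap the growth, so elongation is handled outside the local system.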
This also helps divide the problem into smaller ones. When generating a face, which has many high-impact parts like the eyes and lips, we can scope the problem to just the face rather than the whole body, which would be extremely difficult.
Closing: The discussion points toward a hybrid approach that combines the pixel-based and vector-based methods. I have to go now, but we will continue from here next time.
— Sprited Dev 🐛



