
Machi - Lab Note 1


Previously generated scenes (link 1, link 2) look aesthetically beautiful but are semantically challenging.

Let’s hand annotate some of these examples.

A and D are flat scenes.

B and C are perspective scenes.

Machi’s purpose is to provide an understandable 2–2.5D environment for AI agents. So let’s focus only on the flat scenes.

I left B in as well, because its primary axis of movement is fully horizontal, which gives it a 2.5D vibe.

Let’s define our goals:

- [ ] Agent must be able to navigate
- [ ] Agent must be able to edit the scene
- [ ] Agent must be able to enter the scene
- [ ] Agent must be able to exit the scene
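As a sketch, these four capabilities could form an agent-facing interface. All names here are hypothetical, not an existing Machi API:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Cell:
    x: int
    y: int
    z: int  # small in 2.5D: roughly 0-7


class MachiAgent(Protocol):
    """Hypothetical capability surface matching the four goals above."""

    def navigate(self, target: Cell) -> bool: ...       # move through the scene
    def edit(self, cell: Cell, tile: str) -> None: ...  # place or replace a tile
    def enter_scene(self, scene_id: str) -> None: ...   # enter a scene
    def exit_scene(self) -> None: ...                   # exit the scene
```

Writing it down this way makes the editability requirement explicit: `edit` is a first-class operation, not an afterthought.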

The editability constraint makes the search space much smaller. That’s a good thing. We like smaller search spaces.

Given this requirement, we ask:

Is it feasible to support an editable 2.5D environment?

A 2.5D space is defined as a 3D grid where the Z depth is fixed to a very small integer (say, 3–8 cells deep). Let’s try projecting the above examples into 2.5D.
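A minimal sketch of that definition in Python; the dimensions and the depth of 4 are arbitrary illustrative values:

```python
WIDTH, HEIGHT, DEPTH = 64, 32, 4  # z fixed to a small integer: the "2.5D" constraint

# 0 = empty, nonzero = a tile id
grid = [[[0] * DEPTH for _ in range(HEIGHT)] for _ in range(WIDTH)]

grid[10][5][0] = 1  # a tile on the frontmost layer
grid[10][5][3] = 2  # a tile on the backmost layer

# The cell count stays tiny compared to a full 3D world of the same footprint
cells = WIDTH * HEIGHT * DEPTH
print(cells)  # 8192
```

The point of the shallow Z is exactly this: the world stays almost as cheap as 2D while still having depth to stack into.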

2.5D allows for a limited amount of exploration depth. For example, in C:

- The background will be simulated separately.
- The train will be farthest away.
- The interior of the building comes next (as a cross section).
- Then the building itself.
- Then the staircase.
- Then the railing, which serves as an occlusion layer.
- Then the pedestrian block.
- Then the road, where cars pass by.

By separating the layers, we are able to make a non-zero depth world that is thin.
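The layering above can be written down as an explicit back-to-front z-ordering (indices are illustrative; the background is simulated separately, so it gets no layer slot):

```python
# Back-to-front z ordering for scene C
LAYERS = [
    "train",              # z = 0, farthest away
    "building_interior",  # cross section
    "building",
    "staircase",
    "railing",            # serves as the occlusion layer
    "pedestrian_block",
    "road",               # z = 6, nearest; cars pass by here
]

Z = {name: z for z, name in enumerate(LAYERS)}

# Render order: draw far layers first so near layers occlude them
for name in LAYERS:
    print(Z[name], name)
```

Because each named layer owns one z slice, occlusion falls out of the draw order for free.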

This allows for z-directional exploration, where AI agents can stack things and design interiors and exteriors.

A 2D world, on the other hand, is a classic platformer like Super Mario or MapleStory. There is no z-space to explore. Let’s try out some examples.

2D does not necessarily make things easier. Supporting structures like stairs turns into a recognition problem: the system has to recognize which tiles form the stairs.
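To make that concrete, here is a toy sketch of what “recognizing the stairs” could mean on a 2D tile grid: scan for diagonal runs of solid tiles. This is an illustrative heuristic, not Machi code:

```python
def find_stair_runs(grid, min_len=3):
    """Find start cells of diagonal runs of solid tiles (1s) rising to the
    right; a crude stand-in for "recognizing the stairs" in a 2D tile grid."""
    rows, cols = len(grid), len(grid[0])
    runs = []
    for r in range(rows):
        for c in range(cols):
            # Only start at the bottom of a diagonal (no solid tile down-left)
            if r + 1 < rows and c - 1 >= 0 and grid[r + 1][c - 1] == 1:
                continue
            length = 0
            # Walk up-right: one column right, one row up per step
            while r - length >= 0 and c + length < cols and grid[r - length][c + length] == 1:
                length += 1
            if length >= min_len:
                runs.append((r, c))
    return runs


# 1s form a staircase rising to the right, two tiles thick
scene = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
print(find_stair_runs(scene))  # [(3, 0), (3, 1)]
```

Even this toy version shows the problem: a real scene has anti-aliased pixels, occlusion, and decoration, not clean 0/1 tiles.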

Here is the current proposal, with what we know now:

  1. Given a scene, we try to convert it into 3D.

  2. Then try to squeeze the depth while keeping navigability in mind.

  3. Then build 3D voxel block world on top of it.
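Step 2 is the novel part. One way to sketch “squeezing the depth” is quantizing each reconstructed depth value into one of a handful of discrete layers. All values here are illustrative:

```python
DEPTH = 4  # target layer count, within the 3-8 range


def squeeze(z: float, z_near: float, z_far: float, depth: int = DEPTH) -> int:
    """Map a continuous depth z in [z_near, z_far] to a layer index 0..depth-1."""
    t = (z - z_near) / (z_far - z_near)       # normalize to [0, 1]
    return min(depth - 1, max(0, int(t * depth)))


# Continuous depths from a hypothetical reconstructed scene
for z in [0.0, 2.5, 7.4, 10.0]:
    print(squeeze(z, z_near=0.0, z_far=10.0))
```

The real version would also have to check that walkable paths survive the quantization, which is the “keeping navigability in mind” part.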

Exploration of 2D-to-3D AI tools

Meshy 6

This is impressive and at the same time broken. It is amazing that it can generate such detailed geometry from a simple image. At the same time it is quite limited, because it produces lots of artifacts: a wrong train location, layered stairs, etc.

The textured version is also impressive, but perhaps has too many artifacts.

Granted, this is one of the hardest scenes (a flat 2D scene with hundreds of complicated geometries), so human intervention may be required.

Let’s try a few more examples.

Similar results. Amazing but not yet production grade.

Let’s also experiment with Text-to-3D models.

rustic japanese trains crossing in suburb with beautiful green field

Not something I was expecting. But perhaps for vehicles, I could generate 3D versions first and then make them into 2D projections.
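That pipeline for vehicles boils down to an orthographic side-view projection: render the 3D model and drop the depth axis. A toy sketch with made-up vertices:

```python
def project_ortho(points):
    """Orthographic side-view projection: drop z, keep (x, y)."""
    return [(x, y) for x, y, _z in points]


# A few made-up vertices of a hypothetical 3D train model
train = [(0.0, 1.0, 0.2), (4.0, 1.0, 0.2), (4.0, 2.5, -0.2), (0.0, 2.5, -0.2)]
print(project_ortho(train))  # [(0.0, 1.0), (4.0, 1.0), (4.0, 2.5), (0.0, 2.5)]
```

The nice property is that the discarded z can be kept around as the vehicle’s layer assignment in the 2.5D grid.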

2D-to-3D is not super reliable.

Let’s figure out whether there is a way to do this differently. Let’s try a layer-separation technique.

https://huggingface.co/Qwen/Qwen-Image-Layered

This is amazing, but not necessarily going to serve our purpose of dividing the image by depth.


Let’s pause here today.

— Sprited Dev 🐛

Machi

Part 9 of 37

Follow the development of Machi, a side-scrolling simulation world built to test AI agents, tile-based emergence, and the future of embodied intelligence. Coming Soon: https://machi.sprited.app
