
Machi - Lab Note 1


Previously generated scenes (link 1, link 2) look aesthetically beautiful but are semantically challenging.

Let’s hand annotate some of these examples.

A and D are flat scenes.

B and C are perspective scenes.

Machi’s purpose is to provide an understandable 2–2.5D environment for AI agents. So let’s focus only on the flat scenes.

I left B in as well, because its primary axis of movement is fully horizontal, which gives it a 2.5D vibe.

Let’s define our goals:

- [ ] Agent must be able to navigate
- [ ] Agent must be able to edit the scene
- [ ] Agent must be able to enter the scene
- [ ] Agent must be able to exit the scene
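As a sketch, these four capabilities could form an agent-facing interface. All names here are hypothetical, not an existing Machi API:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Cell:
    x: int
    y: int
    z: int  # small in 2.5D: roughly 0-7


class MachiAgent(Protocol):
    """Hypothetical capability surface matching the four goals above."""

    def navigate(self, target: Cell) -> bool: ...       # move through the scene
    def edit(self, cell: Cell, tile: str) -> None: ...  # place or replace a tile
    def enter_scene(self, scene_id: str) -> None: ...   # enter a scene
    def exit_scene(self) -> None: ...                   # exit the scene
```

Writing it down this way makes the editability requirement explicit: `edit` is a first-class operation, not an afterthought.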

The editability constraint makes the search space much smaller. That’s a good thing. We like smaller search spaces.

Given this requirement, we ask:

Is it feasible to support an editable 2.5D environment?

A 2.5D space is defined as a 3D grid where the Z depth is fixed to a very small integer (say, 3–8 cells deep). Let’s try projecting the above examples into 2.5D.
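A minimal sketch of that definition in Python; the dimensions and the depth of 4 are arbitrary illustrative values:

```python
WIDTH, HEIGHT, DEPTH = 64, 32, 4  # z fixed to a small integer: the "2.5D" constraint

# 0 = empty, nonzero = a tile id
grid = [[[0] * DEPTH for _ in range(HEIGHT)] for _ in range(WIDTH)]

grid[10][5][0] = 1  # a tile on the frontmost layer
grid[10][5][3] = 2  # a tile on the backmost layer

# The cell count stays tiny compared to a full 3D world of the same footprint
cells = WIDTH * HEIGHT * DEPTH
print(cells)  # 8192
```

The point of the shallow Z is exactly this: the world stays almost as cheap as 2D while still having depth to stack into.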

2.5D allows for a limited amount of exploration depth. For example, in C:

- The background will be simulated separately.
- The train will be farthest away.
- The interior of the building comes next (as a cross section).
- Then the building itself.
- Then the staircase.
- Then the railing, which serves as an occlusion layer.
- Then the pedestrian block.
- Then the road, where cars pass by.

By separating the layers, we are able to make a non-zero depth world that is thin.
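The layering above can be written down as an explicit back-to-front z-ordering (indices are illustrative; the background is simulated separately, so it gets no layer slot):

```python
# Back-to-front z ordering for scene C
LAYERS = [
    "train",              # z = 0, farthest away
    "building_interior",  # cross section
    "building",
    "staircase",
    "railing",            # serves as the occlusion layer
    "pedestrian_block",
    "road",               # z = 6, nearest; cars pass by here
]

Z = {name: z for z, name in enumerate(LAYERS)}

# Render order: draw far layers first so near layers occlude them
for name in LAYERS:
    print(Z[name], name)
```

Because each named layer owns one z slice, occlusion falls out of the draw order for free.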

This allows for z-directional exploration, where AI agents can stack things and design interiors and exteriors.

A 2D world, on the other hand, is a classic platformer like Super Mario or MapleStory. There is no z-space to explore. Let’s try out some examples.

2D does not necessarily make things easier. Supporting structures like stairs turns into a recognition problem: the system has to recognize which tiles form the stairs.
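To make that concrete, here is a toy sketch of what “recognizing the stairs” could mean on a 2D tile grid: scan for diagonal runs of solid tiles. This is an illustrative heuristic, not Machi code:

```python
def find_stair_runs(grid, min_len=3):
    """Find start cells of diagonal runs of solid tiles (1s) rising to the
    right; a crude stand-in for "recognizing the stairs" in a 2D tile grid."""
    rows, cols = len(grid), len(grid[0])
    runs = []
    for r in range(rows):
        for c in range(cols):
            # Only start at the bottom of a diagonal (no solid tile down-left)
            if r + 1 < rows and c - 1 >= 0 and grid[r + 1][c - 1] == 1:
                continue
            length = 0
            # Walk up-right: one column right, one row up per step
            while r - length >= 0 and c + length < cols and grid[r - length][c + length] == 1:
                length += 1
            if length >= min_len:
                runs.append((r, c))
    return runs


# 1s form a staircase rising to the right, two tiles thick
scene = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]
print(find_stair_runs(scene))  # [(3, 0), (3, 1)]
```

Even this toy version shows the problem: a real scene has anti-aliased pixels, occlusion, and decoration, not clean 0/1 tiles.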

Here is the current proposal, with what we know now:

  1. Given a scene, we try to convert it into 3D.

  2. Then try to squeeze the depth while keeping navigability in mind.

  3. Then build 3D voxel block world on top of it.
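Step 2 is the novel part. One way to sketch “squeezing the depth” is quantizing each reconstructed depth value into one of a handful of discrete layers. All values here are illustrative:

```python
DEPTH = 4  # target layer count, within the 3-8 range


def squeeze(z: float, z_near: float, z_far: float, depth: int = DEPTH) -> int:
    """Map a continuous depth z in [z_near, z_far] to a layer index 0..depth-1."""
    t = (z - z_near) / (z_far - z_near)       # normalize to [0, 1]
    return min(depth - 1, max(0, int(t * depth)))


# Continuous depths from a hypothetical reconstructed scene
for z in [0.0, 2.5, 7.4, 10.0]:
    print(squeeze(z, z_near=0.0, z_far=10.0))
```

The real version would also have to check that walkable paths survive the quantization, which is the “keeping navigability in mind” part.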

Exploration of 2D-to-3D AI tools

Meshy 6

This is impressive and at the same time broken. It is amazing that it can generate such detailed geometry from a simple image. At the same time it is quite limited, because it produces lots of artifacts: a wrong train location, layered stairs, etc.

The textured version is also impressive, but perhaps has too many artifacts.

Granted, this is one of the hardest scenes (a flat 2D scene with hundreds of complicated geometries), so human intervention may be required.

Let’s try a few more examples.

Similar results. Amazing but not yet production grade.

Let’s also experiment with Text-to-3D models.

rustic japanese trains crossing in suburb with beautiful green field

Not something I was expecting. But perhaps for vehicles, I could generate 3D versions first and then make them into 2D projections.
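That pipeline for vehicles boils down to an orthographic side-view projection: render the 3D model and drop the depth axis. A toy sketch with made-up vertices:

```python
def project_ortho(points):
    """Orthographic side-view projection: drop z, keep (x, y)."""
    return [(x, y) for x, y, _z in points]


# A few made-up vertices of a hypothetical 3D train model
train = [(0.0, 1.0, 0.2), (4.0, 1.0, 0.2), (4.0, 2.5, -0.2), (0.0, 2.5, -0.2)]
print(project_ortho(train))  # [(0.0, 1.0), (4.0, 1.0), (4.0, 2.5), (0.0, 2.5)]
```

The nice property is that the discarded z can be kept around as the vehicle’s layer assignment in the 2.5D grid.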

2D-to-3D is not super reliable.

Let’s figure out whether there is a way to do this differently. Let’s try a layer-separation technique.

https://huggingface.co/Qwen/Qwen-Image-Layered

This is amazing, but not necessarily going to serve our purpose of dividing the image by depth.


Let’s pause here today.

— Sprited Dev 🐛

Machi

Part 9 of 37

Follow the development of Machi, a side-scrolling simulation world built to test AI agents, tile-based emergence, and the future of embodied intelligence. Coming Soon: https://machi.sprited.app
