
Machi - Lab Note 2 - Problem Scoping - Image Assisted Level Design


In the previous lab note, we explored a few of the approaches for learning a 2.5D voxel map of the world.

Let’s explore 2D alternatives.

In 2D scenes, the task simplifies to an image-to-image (I2I) task: from a picture, create a 2D tile map (bitmap).

This is in a way an ill-posed problem because there are millions of different solutions in this space. It almost needs to be modeled as a generative process rather than a regression or fitting problem.

Let’s start with an image:

Flux.2: Create Depth Map

Depth is not correct.

Flux.2: side scrolling platformer tile map

Flux.2: side scrolling platformer tile map with the vibe in the given image

Flux.2: side scrolling platformer tile map with the pixel art forest style japanese platformer

This isn’t what I’m looking for, but at least it is heading in the right direction. I was concerned that the flat world would no longer look beautiful, but now that I look at it, perhaps we don’t need to worry that much. In most of the designs, platforms are placed on top of the background such that large structures are “backgrounded.” This is perhaps a lesson: the characters do not have to live in the background space. They can live one layer above, where platforms determine the rules of the world. This does mean that there won’t be any way to “edit” the background for AI agents. AI agents will have to “live” on top of static images. That may not be the worst design, because it allows for visual freedom while still letting agents place platforms wherever they deem appropriate.

Cityscape Example

In this reference image, it is rather difficult to create reasonable platforms.

Flux.2: place 2d game platforms in this background (dim the background)

Flux.2: Add 2d side scrolling game platforms with playable character

Donald? Why are you there?

Also tried out: https://app.artificialstudio.ai/tools/image-depth-map-generator

This depth map is much better than the Flux.2 one. We can use it to separate the layers depth-wise.

The depth-wise layer decomposition can be used for various effects like parallax and what not.
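As a minimal sketch of that decomposition, a normalized depth map can be split into a few boolean layer masks by banding the depth values. `split_depth_layers` is a hypothetical helper; the number of bands is an arbitrary assumption:

```python
import numpy as np

def split_depth_layers(depth: np.ndarray, n_layers: int = 3) -> list[np.ndarray]:
    """Split a normalized depth map (values in [0, 1]) into boolean
    masks, one per depth band, nearest band first."""
    edges = np.linspace(0.0, 1.0, n_layers + 1)
    layers = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include the upper edge on the last band so depth == 1.0 is covered
        mask = (depth >= lo) & (depth < hi) if hi < 1.0 else (depth >= lo)
        layers.append(mask)
    return layers

# Toy 2x2 depth map: 0.1 and 0.3 land in the near band,
# 0.5 in the middle band, 0.9 in the far band
depth = np.array([[0.1, 0.5], [0.9, 0.3]])
near, mid, far = split_depth_layers(depth, n_layers=3)
```

Each mask can then select pixels for its own parallax layer.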

But let’s focus on platform placements. We will make a few assumptions:

  1. Every platform is solid.

  2. Every platform floats in space.

  3. The generated reference image is always part of the background.

  4. We generate the platform placements based on the reference image.


Tile Placement

Now, the problem reduces to creating a low-resolution mask predicting, for each cell, the probability of a platform.

Since we are imagining the world to be editable, AI agents can “make ways.”

The reference image serves as a SEED for the environment, and it helps generate tile placements that make reasonable sense. In the future, the reference image is also going to help guide the neural rendering, but our focus now is the placement of tiles.

Let’s see how a human would do this.

Image Assisted Level Design

Let’s design an achievable goal.

  1. We will create a “platform tileset image generator.”

  2. Then we will create a tile map editor at spritedx.com/tile-map-editor:

    1. Upload a reference image.

    2. Human specifies tile locations.

    3. We run CLIP scores at each tile location to figure out the best tile to add there.

    4. Human can force-change the properties.

    5. Human can place hazards.

    6. User can place a “start” location.

  3. Then the user can click a “play test” button to test the map.

  4. V2 ideas are documented on how to extend the reference image progressively to make an elongated level.

  5. V3 ideas are documented on how to train model to predict tile placements.

  6. V4 ideas are documented on how to train agents to navigate the world.

  7. V5 ideas are documented on how to train agents to edit the world.

  8. V6 ideas are documented on parallax BG animation and on BG animation using video generation models.
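The CLIP-score step in the editor plan (step 2.3 above) reduces to a nearest-neighbor lookup in embedding space. A minimal sketch, assuming the tile and image-patch embeddings have already been computed by some CLIP encoder; the embeddings below are random stand-ins:

```python
import numpy as np

def best_tile(patch_emb: np.ndarray, tile_embs: np.ndarray) -> int:
    """Return the index of the tile whose embedding has the highest
    cosine similarity with the image patch at the chosen location."""
    patch = patch_emb / np.linalg.norm(patch_emb)
    tiles = tile_embs / np.linalg.norm(tile_embs, axis=1, keepdims=True)
    scores = tiles @ patch          # cosine similarities, shape (n_tiles,)
    return int(np.argmax(scores))

# Stand-in embeddings (a real pipeline would use a CLIP image encoder)
rng = np.random.default_rng(42)
tile_embs = rng.standard_normal((8, 512))                   # 8 candidate tiles
patch_emb = tile_embs[3] + 0.01 * rng.standard_normal(512)  # patch near tile 3

choice = best_tile(patch_emb, tile_embs)
```

Since CLIP embeds images and text in the same space, the same lookup also works against text descriptions of tiles (“grass ledge,” “stone pillar”).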


Devil’s Advocate: Is all this the right way to go about it?

  • Wouldn’t mapping a 3D nav mesh onto a 2D scene be a more fruitful exercise?

  • That is, does an AI agent have to be able to EDIT the world? Humans rarely dig or construct nowadays. Most human beings do their work by going places, talking, making, and selling things.

  • Does good sustainable world simulation require editable world?

Pixel says

Based on everything you’ve said over months:

👉 Machi is NOT a cinematic tool

👉 Machi is a living world

That alone strongly favors tile world. ✅


The first task is to bootstrap a page in SpriteDX to carve out a space for experimentation.

— Sprited Dev 🐛

Machi

Part 8 of 37

Follow the development of Machi, a side-scrolling simulation world built to test AI agents, tile-based emergence, and the future of embodied intelligence. Coming Soon: https://machi.sprited.app
