SpriteDX - Stage 2 Prompt Entry UX

In the previous post, we created a prompt language format for Stage 2:

<shot num="1" id="greet" camera="fixed" zoom="1" loop="true"
  style="style: attributes;"
  alt="description of the shot"
  // … other attributes … 
>
  <character name="Character Name"
    state="character animation state identifier"
    src="character_animation_state_gif_file_name.gif"
    style="style: attributes;"
    alt="Description of what this object/character is doing"
    // … other attributes …
  >
  … other entities if present … 
</shot>
… other shots if present …

Today, we discuss how to represent this to our users.

Going back to our principle, we need SpriteDX to just work, tucking away the intricacies. To achieve that, it would be good to hide the XML structure and formatting.

What about Pro users who want full control? The idea is that the prompt structure is part of the pipeline definition. It is part of the recipe for stable generation, so altering it inside a pipeline is not allowed. If Pro users want to customize it, they can obtain the recipe (the pipeline definition) and fork it from there.


UI Options for XML Editor

What are some options?

Option 1 — Computed Property

We can do what we’ve been doing for the natural language prompt: receive an array of object inputs, then generate the XML structure from it using a Computed Property.

This option is the baseline, but it is rather clunky for pipeline developers. Is there something better we can do?

Option 2 — Plain XML Text Input

We can simply present users with a text input containing the full XML string.

This is the most flexible option and the easiest to implement, but it also burdens users with the conceptual overhead of looking at <s and >s. Another limitation is that showing full XML is not going to look compelling in demos.

If we go with this option, we can also add code formatting and a code editor.

Option 3 — Generic XML Node Editor

XML is basically an object with “tags.” The structure above can be written as:

[{
  "tag": "shot",
  "attributes": [ … key value pairs … ], 
  "children": [ … nested stuff … ]
}, … ]

This means we can build an XML builder UI that uses mostly the same UI components as the object editor. We would hide complexities like tags, attributes, and children, but otherwise everything should be very similar to the object editor. The XML is then generated automatically as output.

This is actually very similar to the computed properties option. It just abstracts away the extra layer of complexity that computed properties add.
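As a rough sketch of how an object editor's output could feed such a builder, the snippet below maps a plain object into the tag/attributes/children shape shown above. The helper name `objectToNode` and the mapping rule (primitive fields become attributes, nested objects become children) are assumptions made for illustration, not something SpriteDX defines.

```javascript
// Hypothetical helper: turn a plain object (as an object editor would produce)
// into the { tag, attributes, children } shape.
// Assumed rule: nested objects become child nodes; everything else
// (strings, numbers, booleans, arrays) is kept as an attribute value.
function objectToNode(tag, obj) {
  const attributes = [];
  const children = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      children.push(objectToNode(key, value));
    } else {
      attributes.push([key, value]);
    }
  }
  return { tag, attributes, children };
}

const node = objectToNode("shot", {
  id: "greet",
  loop: true,
  character: { name: "Machi", state: "greet" },
});
console.log(JSON.stringify(node, null, 2));
```

With a rule like this, the user only ever edits ordinary fields, and the tag names come from the field keys.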


Decision

At this time, we are going with Option 1. It uses the existing infrastructure, whereas Option 3 only really adds value for pipeline creators.

We do, however, want to make the object input more flexible, so that users can enter any key-value pairs rather than only the ones predefined in the pipeline input definitions.
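One way this flexibility might look on the script side is sketched below: known keys keep a fixed order, and any other primitive key the user typed in is appended as an extra attribute. The names `KNOWN_SHOT_KEYS` and `shotAttrs` are hypothetical, and the sketch ignores renames such as `shotDescription` becoming `alt`.

```javascript
// Hypothetical list of keys the pipeline already knows about.
const KNOWN_SHOT_KEYS = ["id", "camera", "zoom", "duration", "loop", "style", "tags"];

// Build an ordered list of [key, value] attribute pairs for one shot.
// Assumption: nested objects (e.g. "character") are handled elsewhere,
// so only primitive extras are passed through.
function shotAttrs(shot) {
  const known = KNOWN_SHOT_KEYS
    .filter((key) => key in shot)
    .map((key) => [key, shot[key]]);
  const extras = Object.entries(shot).filter(
    ([key, value]) => !KNOWN_SHOT_KEYS.includes(key) && typeof value !== "object"
  );
  return [...known, ...extras];
}

console.log(shotAttrs({ mood: "cheerful", zoom: 1, id: "greet", character: {} }));
```

Keeping the known keys in a stable order means user-added pairs cannot reshuffle the part of the prompt that the recipe depends on.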


Implementation

We will be using Updated Prompt 2 from before. If we rewrite that as PipelineInput:

        "shots": {
          "label": "Shots",
          "type": "array",
          "items": {
            "type": "object",
            "fields": {
              "id": { "type": "string", "label": "Shot ID" },
              "shotDescription": { "type": "string", "label": "Setup" },
              "loop": { "type": "boolean", "label": "Loop", "visibility": "advanced" },
              "camera": { "type": "string", "label": "Camera Mode", "visibility": "advanced", "defaultValue": "fixed" },
              "zoom": { "type": "number", "label": "Zoom", "defaultValue": 1 },
              "style": { "type": "string", "label": "Style", "defaultValue": "background: white;", "visibility": "advanced" },
              "tags": { "type": "string", "label": "Tags", "defaultValue": "pixelart spriteanim fullbody loopanim 角色帧动画", "visibility": "advanced" },
              "duration": { "type": "number", "label": "Duration (s)", "defaultValue": 1 },
              "character": { "type": "object", "label": "Character Details", "fields": {
                "alt": { "type": "string", "label": "Action" },
                "style": { "type": "string", "label": "Style", "defaultValue": "rendering: pixelart; shadow: none; effects: none; decoration: none; direction: front; wing: none;", "visibility": "advanced" },
                "direction": { "type": "string", "label": "Direction" }
              }}
            }
          },
          "defaultValue": [
            {
              "id": "greet",
              "shotDescription": "Pixel art 2D game sprite character for the game 'Machi'. Character waves or says 'hi.'",
              "loop": false,
              "camera": "fixed",
              "zoom": 1,
              "style": "background: white;",
              "tags": "pixelart spriteanim fullbody loopanim 角色帧动画",
              "duration": 1,
              "character": {
                "style": "rendering: pixelart; shadow: none; effects: none; decoration: none; direction: front; wing: none;",
                "alt": "pixel art character says HI.",
                "direction": "front"
              }
            },
            … 
          ],
          "enabled": "advanced"
        }

Some of these inputs are marked with "visibility": "advanced" so that they are hidden from the default pipeline view.

Next, let’s define the ComputedProperty.
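To illustrate the hiding behavior, here is a minimal sketch of how a UI might pick which fields to show, based on the `visibility` key used in the definition above. The function name `visibleFields` and the `showAdvanced` flag are assumptions; the actual SpriteDX UI logic is not shown in this post.

```javascript
// Assumption: a field without "visibility" is always shown, and a field with
// "visibility": "advanced" is shown only when the advanced view is enabled.
function visibleFields(fields, showAdvanced = false) {
  return Object.entries(fields)
    .filter(([, def]) => showAdvanced || def.visibility !== "advanced")
    .map(([name]) => name);
}

// A trimmed copy of the "fields" object from the definition above.
const fields = {
  id: { type: "string", label: "Shot ID" },
  loop: { type: "boolean", label: "Loop", visibility: "advanced" },
  zoom: { type: "number", label: "Zoom", defaultValue: 1 },
};

console.log(visibleFields(fields));       // default view
console.log(visibleFields(fields, true)); // advanced view
```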

        "animationPrompt": {
          "type": "string",
          "computed": {
            "sandbox": "quickjs",
            "with": {
              "characterName": "..generate.characterName",
              "shots": "shots"
            },
            "src": "pipelines/character/v1/animationPrompt.js",
            "script": "main(characterName, shots)"
          },
          "mapTo": "39.inputs.prompt"
        }

Here, we provide a src attribute that points to a JavaScript file. I tried writing the script as a single line, but it was too long, so I decided to support loading external JavaScript files. SpriteDX first loads and evaluates src, and then runs script if it is present.
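The load-then-run order can be sketched roughly as follows. This is not how SpriteDX implements it: the sketch uses plain `new Function` as a stand-in for the QuickJS sandbox (so it is not safe for untrusted code), the name `runComputed` is hypothetical, and it assumes the `with` paths have already been resolved to values.

```javascript
// Stand-in for the sandbox: plain `new Function`, NOT QuickJS.
// It only illustrates the order of evaluation.
function runComputed({ source, script, bindings }) {
  const names = Object.keys(bindings);
  const values = Object.values(bindings);
  // 1. Evaluate the loaded file so its functions are defined.
  // 2. If `script` is present, evaluate it and return its value.
  const body = script ? `${source}\nreturn (${script});` : source;
  return new Function(...names, body)(...values);
}

// Pretend this string is the content loaded from `src`.
const source = `function main(characterName, shots) {
  return characterName + ": " + shots.length + " shot(s)";
}`;

const result = runComputed({
  source,
  script: "main(characterName, shots)",
  bindings: { characterName: "Machi", shots: [{ id: "greet" }] },
});
console.log(result);
```

Keeping `script` as a short expression and the logic in `src` means the pipeline definition stays readable while the file can grow as needed.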

// animationPrompt.js
function esc(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

/**
 * Encode nodes into XML string.
 * Format:
 * <node
 *   attr="value"
 * >
 *   <child
 *     attr="value"
 *   />
 * </node>
 */
function encodeNodes(nodes, indent = 0) {
  // generic serializer that serializes nodes into XML
  const space = ' '.repeat(indent);
  return nodes.map(node => {
    const attrs = node.attrs.map(([k, v]) => `\n${space}  ${k}="${esc(v)}"`).join("");
    if (node.children && node.children.length > 0) {
      const children = encodeNodes(node.children, indent + 2);
      return `${space}<${node.tag}${attrs}\n${space}>\n${children}\n${space}</${node.tag}>`;
    } else {
      return `${space}<${node.tag}${attrs}\n${space}/>`;
    }
  }).join("\n");
}


/**
 * Maps shots array into XML prompt.
 *
 * @param {string} characterName
 * @param {Shot[]} shots
 */
function main(characterName, shots) {
  const filenamePrefix = characterName.slice(0, 8).toLowerCase();
  const classPrefix = characterName.slice(0, 3).toUpperCase();
  const nodes = shots.map((shot, i) => {
    const node = {
      tag: "shot",
      attrs: [
        ["num", i + 1],
        ["id", shot.id],
        ["camera", shot.camera],
        ["zoom", shot.zoom],
        ["duration", shot.duration + "s"],
        ["loop", shot.loop ? "true" : "false"],
        ["style", shot.style],
        ["alt", shot.shotDescription],
        ["tags", shot.tags]
      ],
      children: [
        {
          tag: "character",
          attrs: [
            ["class", `${classPrefix}_93Q`],
            ["name", characterName],
            ["state", shot.id],
            ["direction", shot.character.direction],
            ["src", `${filenamePrefix}-sprite-${shot.id}-loop.gif`],
            ["style", shot.character.style],
            ["alt", shot.character.alt]
          ]
        }
      ]
    };
    return node;
  });
  const xml = encodeNodes(nodes);
  return xml;
}

That’s it. We now have our pipeline running an external JavaScript file inside a QuickJS sandbox and producing the XML prompts we want.


Next Steps

Stage 2 Prompt Optimization is part of the effort to lower the error rate in the system. A high success rate at Stage 2 is very important because errors compound as we move down the pipeline.

The next step is to implement Error Discovery and Retry. Even with better prompts, we still have only around a 70-80% success rate at Stage 2, so we need to implement (1) a scoring mechanism for the Stage 2 output, and (2) a retry mechanism that retries on malformed output.
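A minimal sketch of what a score-and-retry loop could look like is below. Everything here is hypothetical: `generate` stands in for a Stage 2 call, `score` for a yet-to-be-designed scoring function returning a value from 0 to 1, and the calls are synchronous only to keep the example short (real generation would likely be asynchronous).

```javascript
// Hypothetical retry loop: call `generate` until `score` reaches `threshold`
// or `maxAttempts` is exhausted, keeping the best output seen so far.
function generateWithRetry(generate, score, { maxAttempts = 3, threshold = 0.8 } = {}) {
  let best = null;
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts++;
    const output = generate(attempts);
    const value = score(output);
    if (best === null || value > best.score) best = { output, score: value };
    if (value >= threshold) break; // good enough, stop retrying
  }
  return { ...best, attempts, ok: best !== null && best.score >= threshold };
}

// Fake generator: the first attempt is malformed, the second is fine.
const outputs = ["malformed", "<shot/>"];
const retryResult = generateWithRetry(
  (attempt) => outputs[attempt - 1],
  (output) => (output.startsWith("<") ? 1 : 0) // toy score, not a real validator
);
console.log(retryResult);
```

If attempts were independent with a 75% success rate (an assumption, since failures may be correlated with the prompt), up to three attempts would succeed with probability 1 - 0.25^3, or about 98.4%.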

That said, my personal next step is to prepare for midterms in two of my courses 😅. Wish me luck.

— Sprited Dev 🌱

SpriteDX

Part 1 of 50

Tracks development of sprite generator AI tool. https://spritedx.com