Image to scene and shot

Turn a sketch or reference image into a 3D scene and shot. The generated scene approximates the image's contents, and the shot matches its framing.

Drop a photograph in. The system reads it, builds a 3D scene that approximates the contents, and creates a Compose-mode shot that matches the photograph's framing. A faster way to riff on a real reference than building from scratch.

[Image: Image to scene and shot in progress – a reference photograph dropped into the AI Composer, the scene assembling with auto-placed objects and a shot composed to match the photograph's framing]

What it does

A common agency request: "Match this reference." Traditionally you'd open a blank 3D scene, drag in the rough props, place a camera, and try to match the framing of the reference photograph by eye. Image to scene and shot collapses that into one upload.

The system reads the input image for content and composition, instantiates a populator-style scene that approximates what's in the photograph, and creates a shot in Compose mode whose camera is positioned to roughly match the framing of the original. From there you tune: swap props, add hero assets, refine the camera, and attach image references to the items that matter.

It is not a one-shot pixel-perfect recreation. It's a fast starting state.

Available on every plan, including Free.

How to use it

  1. In the AI Composer prompt bar, click the image button next to the text input. (Same composer you'd use to type "city block at sunset" – this time you upload the reference image instead of writing the prompt.)

  2. Upload a photograph. JPEG or PNG. The clearer the contents, the better the scene match.

  3. Wait for processing. The system runs vision analysis, picks library assets that match the recognized objects, places them in a populator-style layout, and configures a Compose-mode shot with the camera roughly matched to the reference's framing.

  4. Land in the workspace. You're dropped into Build mode with the new scene populated and a shot already created. Switch to Compose to confirm the camera framing.

  5. Refine. Swap props that didn't match, add the hero asset (which the system probably won't have guessed), and attach an image reference to the hero so the rendered version honors the photograph's specific look.
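
If you're queuing up several references, it can help to confirm each file really is a JPEG or PNG before step 2, since a renamed file extension won't change the actual format. The sketch below is a local pre-check only. It is an assumption about what might be useful on your side, not a description of how the uploader validates files, and the function name is hypothetical.

```python
from typing import Optional

# Leading "magic bytes" that identify the two formats named in step 2.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "jpeg", "png", or None based on the first bytes of a file.

    Hypothetical helper for a local pre-check; it only inspects the file
    signature and does not verify that the rest of the image is valid.
    """
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None
```

To use it, read the first eight bytes of a file opened in binary mode and pass them in. A result of None means the file is probably not something step 2 will accept, whatever its extension says.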

When to reach for it

Three signals:

  • You have a reference photograph and you're trying to match its framing. Faster than eyeballing it from a blank scene.

  • You want a starting state that already approximates the brief. Tuning beats building.

  • You're working on multiple variations on a theme. Drop the source, fork the scene, modify each fork.

When not to reach for it:

  • You don't have a reference. Start from a blank project.

  • The reference is for style, not composition. A reference image of "the look I want" is a job for an image reference attached to a hero object, not a scene-construction input.

  • You need pixel-perfect recreation. This is a starting state; the result will approximate, not reproduce.

What the system can and can't infer

What it gets right most of the time:

  • Major scene contents (cars, buildings, vegetation, characters in obvious poses).

  • Rough camera angle and framing.

  • Time of day and broad lighting mood.

What it doesn't get:

  • Specific brands or models. A Bronco gets matched to "an SUV"; use Smart Import to bring in the actual product.

  • Specific characters. Faces aren't replicable from a single reference; use an image reference on a placeholder character object.

  • Complex camera moves. The generated shot is static. To author motion, build it yourself in Compose mode.

Limits and known issues

  • Vision quality varies. Cluttered photographs produce noisy scenes. A clean reference photograph (one or two hero subjects, clear background) gets a much better starting state than a busy street scene.

  • Camera framing is approximate. Expect to nudge the camera in Compose mode afterward.
