Image reference

The single biggest lever for professional fidelity. Bind a reference image to a 3D object and the visualizer locks the rendered look across every shot – the same character, vehicle, or product, consis

The most important step a user takes for brand-true, professionally controlled output.

What is an image reference?

An image reference in Intangible is a photograph attached to a specific 3D object in your scene. The visualizer locks the rendered appearance of that object to the photograph at render time, so the rendered output is the actual thing in the picture rather than the model's generic interpretation. For agency work this is non-negotiable: a 2024 Bronco needs to render as a 2024 Bronco, a Lululemon storefront as a Lululemon storefront, and an iPhone as the exact SKU you're advertising rather than a generic phone.

If you take one workflow away from these docs, take this one.

A grayboxed Adirondack chair in Build mode before any reference image is attached, sitting on a clean studio grid

Image Reference vs LoRA vs Custom Style

Three different ways to lock visual consistency in Intangible. They serve different jobs.

| | Image Reference | LoRA | Custom Style |
|---|---|---|---|
| Scope | One specific object | One object, model-level bias | Whole render, every shot |
| Controls | What this object looks like | A learned style or character bias | Project-wide brand treatment |
| Setup | Drop in a photo | Train or download .safetensors or .pt | Upload references in Visualize, save |
| Best for | Brand-true product, hero with photos | Illustrator style, internal aesthetic, characters without photos | Brand- or campaign-wide look |
| Cross-shot consistency | Yes, automatic | Yes, automatic | Yes, automatic |
| Plan requirement | All plans | All plans | All plans |
| Reach first when | Almost always start here | Can't capture the look in a single photo | The style is bigger than any one object |

For most agency briefs, Image Reference is the first reach. The other two layer on top when the job needs them. See LoRAs and Custom styles for the deeper coverage.

Why this is the headline lever

A diffusion model handed a text description of a chair generates a chair. A diffusion model handed an image of the actual rattan-and-teak chair the agency is pitching renders that chair. The first is sometimes useful; the second always is.

Reference images are how you give the model the visual truth of a specific object on a per-object basis. Three production cases this unlocks:

  • Branded assets and products with custom geometry. Logos, storefronts, uniquely shaped products, deep-tech hardware – anything the model can't be expected to know without a picture. The reference image is what closes the gap from "looks like the brief" to "is the brief".

  • Cross-shot character consistency. A hero with a reference image renders the same face, costume, and silhouette across every shot in the project. Without a reference, the model treats each generation as independent and the character drifts.

  • CAD plus hero photo. Bring the client's actual CAD as the 3D mass via Smart Import. Attach a hero photograph as the image reference. The model gets both: structural truth from the mesh, surface truth from the photo.

Two ways to attach one

1. Through the Object Details panel (manual)

Select the object in the viewport. The Details panel docks on the left side of Build mode. Scroll to the Reference images section and click Upload or Add from Media Library. JPEG and PNG accepted. There's no hard cap, but around 10 images per object is the practical recommendation for multi-view references.

Object Details panel docked on the left after a reference image has been attached. The Reference images section shows the attached image's thumbnail beside the object's name and description fields
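
Before uploading through the manual path above, it can help to sanity-check a folder of candidate photos locally. The following is a minimal sketch, not part of Intangible: `check_reference_set` is a hypothetical helper, the file-extension list is an assumption about how JPEG and PNG files are usually named, and the threshold of 10 mirrors the practical recommendation above rather than any enforced limit.

```python
from pathlib import Path

# Assumptions drawn from the text above: JPEG and PNG are accepted, and about
# 10 images per object is a practical recommendation, not a hard cap.
ACCEPTED_SUFFIXES = {".jpg", ".jpeg", ".png"}
RECOMMENDED_MAX = 10


def check_reference_set(filenames):
    """Split filenames into accepted and rejected, and flag oversized sets."""
    accepted, rejected = [], []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare case-insensitively
        (accepted if suffix in ACCEPTED_SUFFIXES else rejected).append(name)
    return {
        "accepted": accepted,
        "rejected": rejected,
        "over_recommended": len(accepted) > RECOMMENDED_MAX,
    }


report = check_reference_set(["front.JPG", "side.png", "spec.pdf", "hero.jpeg"])
print(report)
```

Anything listed under `rejected` would need converting before upload; a set flagged `over_recommended` still uploads, but is worth trimming to the views you expect to matter.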

2. Through the AI Composer (conversational)

The fastest path. Type a sentence into the AI Composer prompt input at the bottom of Build mode:

Attach this as a reference image to the chair: <image URL>

The composer finds the matching object in the scene, attaches the reference, and the viewport updates immediately to show the textured / colored variant of that object.

AI Composer Chat History on the right showing the user prompt "Attach this as a reference image to the chair: <URL>" and the AI's response acknowledging the chair entity, with the chair still grayboxed

The same scene a moment later: the chair has updated from grayboxed to a green-painted Adirondack chair, matching the reference image. The AI Composer chat confirms "The reference image has been attached to the Adirondack Chair in your scene"

The conversational path is the right reach for any object you can describe ("the red couch", "the Lamborghini") or any object you've selected before typing.

What attaching actually does

The visualizer's auto-prompt picks up references at prompt-build time. The Subjects block of the prompt mentions that the object has a reference image; the model receives both the prompt text and the image at generation. See How the visualizer thinks for the deeper flow.
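
As a mental model only, the flow described above can be sketched in a few lines. Everything here is hypothetical: `SceneObject`, `build_generation_input`, and the wording of the Subjects block are illustrative stand-ins, not Intangible's actual schema or prompt format. The point is simply that the text names each subject and notes which ones carry a reference, while the images travel alongside the text.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    # Hypothetical stand-in for an object in the scene.
    name: str
    description: str = ""
    reference_images: list = field(default_factory=list)


def build_generation_input(objects):
    """Return (prompt_text, images) for a list of scene objects."""
    lines, images = ["Subjects:"], []
    for obj in objects:
        line = f"- {obj.name}"
        if obj.description:
            line += f": {obj.description}"
        if obj.reference_images:
            # The text only mentions that a reference exists; the image
            # itself is passed to the model next to the prompt.
            line += " (has reference image)"
            images.extend(obj.reference_images)
        lines.append(line)
    return "\n".join(lines), images


prompt, images = build_generation_input([
    SceneObject("Mark", "standing at the helm", ["mark_front.png"]),
    SceneObject("crate"),
])
print(prompt)
print(images)
```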

When to reach for it

The signal is consistent: a generation has the right composition but the wrong specifics.

  • Right place, wrong car. The vehicle is positioned correctly, the angle is right, the environment is right – but it's a generic mid-sized SUV instead of the actual model in the brief.

  • Right pose, drifting face. The character is at the right scale and the right pose, but the face is changing shot-to-shot.

  • Right product silhouette, generic surface. The shape and proportion are right, but the materials, finishes, or branding are off.

Re-prompting won't fix any of these. The model isn't being given the visual truth of the object; it's being asked to invent one. Attach the reference and the next render comes back resolved.

Tips that level up the result

Vanilla object names beat descriptive ones for character consistency. "Mark" and "Jeff" hold their look across renders better than "pirate captain" and "British sailor", because descriptive names pull the model toward generic interpretation and away from the specific reference image. The webinar covers this directly around the 18-minute mark.

Multi-view references give better mesh-aware results. Front, three-quarter, and side photographs of the same object produce more consistent renders than a single photograph from one angle. Use the multi-image slot when you have them.

Build mode renders the reference live. Once a reference attaches, the grayboxed object updates in the viewport to reflect the reference's color and surface. You're seeing approximately what the visualizer will render, even before you generate.

For imported CAD, pick reference images that match the shots you plan to render. Mix close-ups with the camera angles you expect in the final video. A reference set heavy on macro detail won't help the model on a wide tracking pass; a set of only side profiles falls apart on a three-quarter render. Pick the photos you're confident will appear on screen.

Limits

  • Image reference doesn't override composition. A reference image of a car doesn't tell the model where to put the car – the 3D scene does. Reference is for what the object looks like, not where it goes.

  • Model fidelity to references varies. Nano Banana Pro and Flux Pro Kontext are tuned for reference-image conditioning. Older video models honor references less tightly. If a reference seems to be ignored, try a different image model first; if the result is still off, the reference may need to be sharper.

  • Sub-megapixel images are fine; multi-megapixel adds nothing. Every image reference is downsampled to a 720 × 1024 sheet before the visualizer hands it to the model, regardless of source size. A clean 1024 × 1024 file and a 4000 × 4000 file land at the same effective resolution. If an object has multiple reference images attached, they're consolidated onto a single 720 × 1024 sheet at the same step (front, three-quarter, and side angles composited together rather than passed in separately).
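
The resolution point in the last bullet is easy to check with arithmetic. The sketch below assumes an aspect-preserving fit that never upscales; the text above only states the 720 × 1024 sheet size, so the exact resampling and layout rules are assumptions, and `fit_to_sheet` is a hypothetical helper.

```python
SHEET_W, SHEET_H = 720, 1024  # sheet size stated in the text above


def fit_to_sheet(src_w, src_h, sheet_w=SHEET_W, sheet_h=SHEET_H):
    """Size an image would occupy on the sheet.

    Assumes an aspect-preserving fit that never upscales.
    """
    scale = min(sheet_w / src_w, sheet_h / src_h, 1.0)
    return round(src_w * scale), round(src_h * scale)


for size in [(1024, 1024), (4000, 4000), (600, 800)]:
    print(size, "->", fit_to_sheet(*size))

# Total pixel budget of the sheet, in megapixels.
print(SHEET_W * SHEET_H / 1_000_000)
```

Under these assumptions the 1024 × 1024 and 4000 × 4000 sources both land at 720 × 720, and the whole sheet holds about 0.74 megapixels. When several references share one sheet, each gets only a fraction of that budget, which is a reason to favor a few clean, well-framed views.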

Further viewing

Phil walks through image reference in the AI You Can Direct webinar (55 minutes; the pirate-scene segment around the 18-minute mark covers reference images directly).
