AI rendering has moved from experiment to everyday workflow for most architecture practices, but the vocabulary hasn't followed. Architects are using terms like "diffusion model" and "ControlNet" without a clear shared definition — and the gap between what the technology does and what it's described as doing is getting wider.
Here are twenty terms worth knowing. Each one has a plain-language definition and a sentence on why it actually matters to your workflow.
The basics
1. Diffusion model. The AI architecture behind most rendering tools. A diffusion model starts with an image of pure random noise and iteratively denoises it, guided by a text prompt and conditioning inputs, until it reaches a coherent image. Each step of the denoising is a small learned transformation. Why it matters: understanding that outputs are generated through iterative denoising explains why the same input can produce slightly different outputs each time. The starting noise is random, and the path through denoising is probabilistic.
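For readers who want to see the mechanics, here is a minimal sketch using the open-source diffusers library. The model name, step count, and prompt are illustrative assumptions, not a recommendation; commercial architecture tools wrap this kind of pipeline behind their own interfaces.

```python
# Minimal sketch: generating an image by iterative denoising with diffusers.
# Model name, prompt, and step count are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# num_inference_steps is the number of denoising iterations: the model starts
# from pure noise and refines the image a little at each step.
image = pipe(
    "a sunlit Scandinavian kitchen with white oak cabinets and concrete floors",
    num_inference_steps=30,
).images[0]
image.save("kitchen.png")
```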
2. Text prompt. The instruction you give the AI: "a sunlit Scandinavian kitchen with white oak cabinets and concrete floors." The prompt shapes the output alongside your conditioning image. Why it matters: architecture-specific tools reduce your dependence on prompt skill by handling the conditioning internally; you specify materials and lighting presets rather than writing a paragraph of instructions.
3. Conditioning image. Your input: the 3D model viewport, SketchUp screenshot, or Revit export. The conditioning image anchors the output to your geometry. This is what separates architectural AI rendering from text-to-image tools: the AI isn't imagining a space, it's rendering your space. Why it matters: the quality of your conditioning image (clean model, good camera angle, consistent materials) directly affects render quality.
4. Image-to-image (img2img). Generating a new image by transforming an existing one, rather than generating from pure noise. Most architectural AI rendering is img2img: you provide a 3D viewport and receive a photorealistic version. Why it matters: this is the mechanism that converts your model screenshot into a render, as opposed to text-to-image, which would generate an arbitrary space from a description.
5. Denoising strength. Controls how much the model deviates from your input image. High denoising strength: the model has more creative freedom and the output can diverge significantly from your input. Low denoising strength: the output stays very close to your input but may not achieve photorealism. Why it matters: finding the right balance is the core tension in AI architectural rendering: realism versus fidelity.
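A minimal img2img sketch with diffusers shows the last two terms in code: the conditioning image is a viewport screenshot, and the strength argument is the denoising strength described above. Model and file names, and the strength value, are illustrative assumptions.

```python
# Sketch: img2img with diffusers. The input image anchors the output; strength
# controls how far the model is allowed to deviate from it.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

viewport = Image.open("sketchup_viewport.png").convert("RGB")

# strength near 0.0 stays close to the input; near 1.0 it behaves almost like
# text-to-image and can drift far from your geometry.
render = pipe(
    prompt="photorealistic interior, warm afternoon light, white oak, concrete floor",
    image=viewport,
    strength=0.45,
).images[0]
render.save("render.png")
```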
Geometry and fidelity
6. ControlNet. A neural network extension that conditions a diffusion model's output on a structural input: edge lines, depth maps, or surface normals extracted from your 3D model. ControlNet is what makes AI rendering architectural rather than generative: it gives the model a spatial map of your design to work from. Why it matters: ControlNet conditioning strength is the primary control over geometry preservation. Higher weight = more faithful to your design.
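Here is a sketch of ControlNet conditioning with diffusers, assuming a pre-extracted edge map of the viewport. The model names and the conditioning scale are illustrative, not any particular tool's settings.

```python
# Sketch: ControlNet conditioning. A Canny edge map of the viewport constrains
# the geometry of the generated image.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("viewport_edges.png")  # pre-extracted edge image

# controlnet_conditioning_scale is the "ControlNet weight": higher values follow
# the edge map more strictly and preserve geometry more faithfully.
render = pipe(
    "photorealistic architectural interior, soft daylight",
    image=edge_map,
    controlnet_conditioning_scale=1.0,
).images[0]
render.save("render.png")
```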
7. Geometry preservation. The ability of a rendering tool to reproduce your input geometry exactly (walls at the correct position, openings at the correct proportion, structural elements where you placed them) without creative reinterpretation. Why it matters: for client deliverables, planning submissions, and competition entries, the render must match the design. Geometry drift is a professional problem, not just an aesthetic one.
8. Geometry drift / hallucination. When the AI alters your input geometry: moving walls, adding furniture you didn't model, changing window proportions, adjusting ceiling heights to match training data expectations. Caused by low ControlNet conditioning strength or model training that prioritised visual quality over architectural accuracy. Why it matters: a render that doesn't match your design misleads clients and sets expectations the actual project cannot meet.
9. Depth map. A greyscale representation of your 3D scene where pixel brightness indicates distance from the camera. Bright pixels are close; dark pixels are far. Used as a ControlNet input to preserve the three-dimensional spatial structure of your model. Why it matters: depth map conditioning helps the AI understand which elements are in front of which, preventing surfaces from being flattened or spatial relationships from being lost.
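If your 3D tool can export a raw depth buffer, a few lines of numpy turn it into the greyscale map a ControlNet expects. The file name, and the assumption that the buffer holds camera-space distances, are illustrative.

```python
# Sketch: converting an exported depth buffer into a ControlNet-style depth map
# (bright = close, dark = far).
import numpy as np
from PIL import Image

depth = np.load("viewport_depth.npy")  # per-pixel distance from the camera

# Normalise to 0..1, then invert so nearby surfaces become bright.
normalised = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
depth_map = (1.0 - normalised) * 255.0

Image.fromarray(depth_map.astype(np.uint8), mode="L").save("depth_map.png")
```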
10. Edge detection (Canny/HED). An algorithm that extracts edge lines from your viewport (wall boundaries, door frames, structural columns, furniture outlines) and uses these as ControlNet conditioning. Canny is a classical detector that produces thin, hard lines; HED is a neural detector that produces softer, more complete edges, so the two respond differently to the same viewport. Why it matters: edge-conditioned renders preserve the linear geometry of your architecture very closely. If wall lines in your model are clean, edge conditioning produces highly faithful geometry in the output.
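Extracting a Canny edge map takes one call in OpenCV. The thresholds below are common defaults rather than a tuned recommendation, and the file names are illustrative.

```python
# Sketch: Canny edge extraction from a viewport screenshot with OpenCV.
import cv2

viewport = cv2.imread("sketchup_viewport.png", cv2.IMREAD_GRAYSCALE)

# The two thresholds control sensitivity: lower values pick up more edges,
# higher values keep only strong boundaries like walls and frames.
edges = cv2.Canny(viewport, threshold1=100, threshold2=200)
cv2.imwrite("viewport_edges.png", edges)
```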
Lighting and materials
11. Lighting preset. A named lighting condition (golden hour, overcast, blue hour, midday) that sets sun angle, colour temperature, sky state, and shadow quality in one selection. Why it matters: replaces the need to manually configure HDRI environment images, sun position, and exposure settings. Consistent, repeatable results without lighting engineering knowledge.
12. Golden hour. The period approximately one hour after sunrise and one hour before sunset, when the sun is low and light is warm amber/orange. In photography and rendering, golden hour light creates long soft shadows, warm colour temperature (~2500–3500K), and a cinematic quality. Why it matters: the most universally flattering lighting for residential interiors and exterior architectural shots. Clients respond to warmth.
13. Blue hour. The period after sunset (or before sunrise) when the sky is deep blue and artificial interior light appears warm by contrast. Blue hour creates a dramatic colour split between cool exterior and warm interior. Why it matters: the go-to lighting condition for luxury residential, hospitality, and high-end commercial renders. The mood it creates (evening light, inhabited space) is difficult to achieve with other conditions.
14. Colour temperature. Measured in Kelvin (K). Low Kelvin (~2700K) = warm, orange/amber light, like an incandescent bulb or the setting sun. High Kelvin (~6500K) = cool, blue/white light, like overcast daylight. Why it matters: colour temperature is the single most significant variable in the emotional tone of a render. The same room feels cosy at 2700K and clinical at 6500K.
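As a crude illustration of how a warmer or cooler cast changes an image, the sketch below scales the red and blue channels of a finished render. This is a simple white-balance-style tint, not an accurate Kelvin-to-RGB conversion, and the gain values are arbitrary.

```python
# Crude illustration only: shifting a render warmer or cooler by scaling the
# red and blue channels. Not a physically accurate colour temperature model.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("render.png").convert("RGB")).astype(np.float32)

def tint(image, r_gain, b_gain):
    out = image.copy()
    out[..., 0] *= r_gain   # red channel
    out[..., 2] *= b_gain   # blue channel
    return np.clip(out, 0, 255).astype(np.uint8)

Image.fromarray(tint(img, 1.10, 0.90)).save("render_warm.png")   # reads warmer
Image.fromarray(tint(img, 0.92, 1.08)).save("render_cool.png")   # reads cooler
```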
15. HDRI (High Dynamic Range Image). A 360° panoramic image that stores a far wider range of brightness values than a standard photograph, used as a lighting source in traditional rendering. The HDRI provides realistic ambient light, sky colour, and reflections for the scene. Why it matters: AI rendering tools with HDRI-equivalent conditioning can produce physically plausible ambient light from any real-world environment without requiring you to source or configure HDRI files.
Workflow terms
16. Inpainting. Editing a specific region of an existing render while leaving the rest unchanged. You mask an area (a chair you want to replace, a window treatment that's wrong) and the model regenerates only that region, keeping it consistent with the surroundings. Why it matters: lets you fix specific problems in an otherwise good render without re-generating the whole image from scratch, a significant time saving in a production workflow.
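In code, inpainting needs only the render, a mask, and a prompt for the masked region. This diffusers sketch assumes a white-on-black mask image; the model and file names are illustrative.

```python
# Sketch: inpainting with diffusers. White pixels in the mask are regenerated,
# black pixels are kept from the original render.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

render = Image.open("approved_render.png").convert("RGB")
mask = Image.open("chair_mask.png").convert("L")  # white = regenerate, black = keep

fixed = pipe(
    prompt="mid-century lounge chair, walnut frame, natural linen upholstery",
    image=render,
    mask_image=mask,
).images[0]
fixed.save("approved_render_fixed.png")
```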
17. Upscaling. Using AI to increase the resolution of an image beyond its native size without losing apparent detail. AI upscalers (ESRGAN, Real-ESRGAN) add learned detail rather than just interpolating pixels. Why it matters: lets you render at moderate resolution for quick iteration and upscale only the approved output, rather than running every iteration at full 4K.
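A hedged sketch using the Stable Diffusion x4 upscaler available through diffusers; resolutions, file names, and the model choice are illustrative, and production tools may rely on different upscalers such as Real-ESRGAN.

```python
# Sketch: 4x AI upscaling with diffusers. The model adds learned detail rather
# than interpolating pixels; names and resolutions are illustrative.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("render_512.png").convert("RGB")   # quick iteration size
high_res = pipe(
    prompt="photorealistic architectural interior", image=low_res
).images[0]
high_res.save("render_2048.png")                         # approved output only
```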
18. Seed. The random number used to initialise the noise for a diffusion generation. The same seed with the same prompt and settings reproduces essentially the same output, which makes it a way to recreate a result you liked or to make controlled variations. Different seed = different variation, all else being equal. Why it matters: seeds are how you create coherent variation sets ("show me this room in three different lighting conditions") while keeping everything else consistent.
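In diffusers, the seed is fixed through a torch.Generator. The sketch below reuses the same seed with a slightly different prompt to produce a coherent variation; the model name and prompts are illustrative.

```python
# Sketch: fixing the seed so a diffusion result can be reproduced exactly or
# varied in a controlled way.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a sunlit Scandinavian kitchen, white oak cabinets, concrete floor"

# Same seed, same prompt, same settings -> the same image again.
gen = torch.Generator(device="cuda").manual_seed(42)
base = pipe(prompt, generator=gen).images[0]

# Same seed, prompt changed only in the lighting clause -> a coherent variation.
gen = torch.Generator(device="cuda").manual_seed(42)
evening = pipe(prompt + ", golden hour light", generator=gen).images[0]
```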
19. Latent space. The abstract mathematical space in which diffusion models operate. Your prompt and conditioning image are encoded into latent representations; the model denoises within this space; the final output is decoded back to pixel space. Why it matters: understanding latent space explains why prompts that seem similar can produce very different outputs. Two prompts that read almost the same can sit far apart in the model's internal representation, and two that read quite differently can sit close together.
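To make the idea concrete, the sketch below encodes a 512x512 image with a Stable Diffusion VAE and prints the shape of its latent representation; the model name and file name are illustrative.

```python
# Sketch: encoding an image into latent space with a Stable Diffusion VAE.
# A 512x512x3 image becomes a 4x64x64 latent tensor, the space in which the
# denoising actually happens.
import torch
import numpy as np
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

img = Image.open("viewport.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                          # to 1x3x512x512

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()

print(latents.shape)  # torch.Size([1, 4, 64, 64])
```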
20. Render farm. A network of servers processing renders in parallel, traditionally used by large studios to reduce render times from hours to minutes. Cloud AI rendering tools effectively provide every user with a render farm for every job. Why it matters: cloud rendering means consistent ~30-second output regardless of what else is running on your machine, no hardware maintenance, and costs structured as operating expense rather than capital investment.
What is a diffusion model in simple terms? A diffusion model starts with random noise and repeatedly applies small learned transformations to turn it into a coherent image, guided by a text prompt and structural inputs from your 3D model. Think of it as sculpting from noise rather than painting on a blank canvas.
What does ControlNet do in AI rendering? ControlNet conditions a diffusion model's output on a structural representation of your 3D model — edge lines, depth maps, surface normals. It's the mechanism that keeps the AI's output anchored to your geometry rather than generating an arbitrary space. Higher ControlNet weight means more geometric fidelity.
What is geometry preservation? Geometry preservation is a rendering tool's ability to reproduce your input geometry — wall positions, opening proportions, spatial relationships — exactly as modelled, without adding, removing, or shifting elements. It's the primary engineering priority for architecture-specific tools like Maquete and the main point of difference from general-purpose AI image generators.
What is the difference between a lighting preset and a prompt? A lighting preset is a pre-engineered conditioning signal that reliably produces a specific lighting condition — golden hour, overcast, blue hour. A prompt-controlled lighting instruction ("warm evening golden light from the west") depends on how well you phrase it and produces less consistent results across different scenes. Presets are more repeatable; prompts are more flexible.