AI Room Makeover: Reskinning Reality With ControlNet, Stable Diffusion & EbSynth
Artists have long wished for deeper levels of control when creating generative imagery, and ControlNet brings that control in spades. Let's make a video-to-video AI workflow with it to reskin a room.
Hey Creative Technologists! Today we’ll be covering an AI video experiment I created while learning ControlNet and preparing a deep dive on it.
Motivation: Why we need more than text2img and img2img
Let’s face it… text is a pretty fickle way to describe the intricate composition of a scene, the lighting conditions, the pose of the characters, the objects within the scene, and so on. Typically, artists need to go through hundreds of iterations, using a myriad of unspecific negative prompts (bad anatomy anyone?) to get a satisfactory final result.
Mood boarding is one thing, but what if you have a very specific creative vision in mind? Enter ControlNet, a game-changing method (by two grad students!) that allows artists to take large pre-trained diffusion models (like Stable Diffusion 1.5) and extend them to support additional input conditions.
These input conditions (e.g. depth maps, full body pose, edge maps, normal maps) give artists new ways to exert control over the otherwise chaotic diffusion process. And once you compose them together, magic happens:
As always, the best way to learn about a new capability is to use the darn thing to make something — so I gave myself a challenge to create a video animation reskinning a room.
Okay, now let's quickly break down this hacky but fun workflow!
1/ INPUT VIDEO: DEFINING THE SCENE LAYOUT & CAMERA ANIMATION
For my input, I made an animation from a photogrammetry 3D scan I did a few years ago of my parents' living room in India. After setting up points of interest in Sketchfab, I simply screen-recorded the 3D viewer while animating the camera from point to point.
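(One housekeeping step that's implied here: the screen recording needs to be split into individual frames before the depth step below. Here's a minimal sketch with OpenCV; the clip filename and folder layout are placeholders, not my exact setup.)

```python
import os
import cv2

# Split the screen-recorded clip into individual frames so each one can be
# fed through per-frame depth estimation in the next step.
os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("room_flythrough.mp4")  # placeholder filename

idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/frame_{idx:04d}.png", frame)
    idx += 1

cap.release()
print(f"Extracted {idx} frames")
```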
2/ DEPTH MAP GENERATION: DESCRIBING SCENE STRUCTURE FROM THE CAMERA'S POV
I used MiDaS to generate depth maps formatted correctly to work with ControlNet. It's a SOTA model that can infer depth from a single 2D photo, so I simply ran it on every frame of the clip. For an indoor scene like this it does great. Quick and simple.
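If you want to try this yourself, here's a rough sketch of per-frame depth estimation using MiDaS's official torch.hub entry point. This isn't my exact script; the folder layout is a placeholder, and you'll need torch, timm, and opencv-python installed.

```python
import os
import cv2
import torch

# Load MiDaS (DPT_Large) and its matching preprocessing transform from torch.hub.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

os.makedirs("depth", exist_ok=True)

for name in sorted(os.listdir("frames")):
    img = cv2.cvtColor(cv2.imread(f"frames/{name}"), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(img).to(device))
        # Resize the prediction back to the frame's resolution.
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2], mode="bicubic", align_corners=False
        ).squeeze()
    depth = pred.cpu().numpy()
    # MiDaS predicts relative inverse depth; normalize to 0-255 (near = bright),
    # which matches what the ControlNet depth model expects as conditioning.
    depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)).astype("uint8")
    cv2.imwrite(f"depth/{name}", depth)
```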
3/ CONTROLNET + SD 1.5: CONTROLLING THE CHAOS WITH DEPTH CONDITIONING
Depth conditioning is sweet, and it produced stunning results reskinning this room. The higher-res depth guidance makes it easy to nail the contemporary look I want, because a lot of that look is encapsulated in the furniture. Not only does ControlNet's depth conditioning run at a higher resolution than SD 2.0's (while being cheaper to train!), it also brings depth conditioning to SD 1.5, which many artists still prefer.
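For reference, here's roughly what a single depth-conditioned keyframe looks like with Hugging Face's diffusers library. I'm not claiming this is my exact setup (the prompt, paths, and settings are illustrative), but the checkpoints are the public SD 1.5 and ControlNet depth models:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler

# SD 1.5 + the public ControlNet depth checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()  # keeps VRAM usage modest

depth_map = Image.open("depth/frame_0000.png").convert("RGB")  # placeholder path
image = pipe(
    "a contemporary living room, scandinavian furniture, warm natural light",  # illustrative prompt
    image=depth_map,
    num_inference_steps=20,
    generator=torch.manual_seed(42),  # a fixed seed helps keep keyframes stylistically consistent
).images[0]
image.save("keys/frame_0000.png")
```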
I’ve covered Depth2Image extensively, so be sure to check out these posts if you’re interested in why it’s important and the creative use cases it enables:
4/ MAKING IT SMOOOOTH WITH EBSYNTH + AFTER EFFECTS
I generated 16 keyframes where I toggle the styles, and ran them through EbSynth to create a temporally coherent end result across the 300-frame clip. EbSynth uses "non-parametric example-based synthesis" to propagate those keyframes across the in-between frames (classical CG ftw!). But you will definitely need to bring the final result into your video editor to blend and composite the sequences, so the transitions between keyframes are less jarring.
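In case you're wondering how the keyframes get spaced: 16 keyframes over a ~300-frame clip works out to roughly one every 20 frames. Here's a tiny sketch of evenly sampling them (folder names are placeholders; keeping the original frame number in the filename makes it easy to line each stylized keyframe back up with its spot in the clip):

```python
from pathlib import Path

# ~300 frames / 16 keyframes  ->  roughly one keyframe every 20 frames.
frames = sorted(Path("frames").glob("frame_*.png"))
step = max(1, len(frames) // 16)
keyframes = frames[::step]

# Each selected frame gets stylized via the ControlNet step above and saved
# under keys/ with the same filename, so it stays matched to its frame number.
Path("keys").mkdir(exist_ok=True)
for frame in keyframes:
    print(f"stylize {frame.name} -> keys/{frame.name}")
```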
5/ TAKEAWAY: MIX METHODS FOR BEST RESULTS WITH COMPLEX SCENES
Unsurprisingly, different ControlNet methods have their pros and cons depending on your subject matter, and they will need to be tuned accordingly.
For instance, the example below compares processing the same video at 30fps with two different ControlNet methods. "Canny" does well at keeping the wall paintings consistent (thanks to their contrasty edges), but struggles with the table. Meanwhile, "Depth" crushes it with the furniture but turns the walls into a flickering mess (since they're flat, with no depth cues).
My next experiment will be to combine the strengths of various ControlNet methods into one composite, which will produce even better results for a video2video pipeline.
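For the curious, diffusers already supports stacking multiple ControlNets on a single generation, so the composite experiment could look something like the sketch below. Treat it as a starting point rather than a recipe: the paths and prompt are placeholders, and the per-model weights are guesses that would need tuning.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Two conditioning models; diffusers composes them when passed as a list.
canny_net = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
depth_net = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[canny_net, depth_net],
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# Build the Canny edge map straight from the source frame (placeholder paths).
frame = cv2.imread("frames/frame_0000.png")
edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))
depth_image = Image.open("depth/frame_0000.png").convert("RGB")

image = pipe(
    "a contemporary living room, scandinavian furniture, warm natural light",  # illustrative prompt
    image=[canny_image, depth_image],
    # Per-model weights balance how strongly edges vs. depth steer the result;
    # these values are guesses and would need tuning per scene.
    controlnet_conditioning_scale=[0.6, 0.8],
    num_inference_steps=20,
    generator=torch.manual_seed(42),
).images[0]
image.save("composite_frame_0000.png")
```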
Closing Notes:
I plan to share all my findings in a ControlNet deep dive post later this week! Subscribe to get the deep dive right to your email:
Twitter fam — clearly y’all want equal parts tech + creativity :)
Want more visual umami? Check out a highlight of my top tweets here, which I plan to keep updating periodically.
Got feedback on topics you’d like me to cover, or just want to get in touch? Connect on Twitter or drop me an email here.
Cheers,
Rudimentary footage is all that you require; with the new software, the creator can weave magic.
If you already had a 3D scan in step 1, why did you use MiDaS to estimate depth in step 2? The 3D scan essentially gives you the depth, right?