Depth2Image AI Use Case Round Up
/imagine reskinning real & virtual worlds with your imagination
Hey Creative Technologists — welcome to another AI round up!
Previously, we’ve covered the potential of depth2img to maintain structural coherence when making AI art. As a quick refresher: depth2img conditions the diffusion process on a depth map of your input image, so the generated output keeps the same spatial layout while the style and subject matter can change freely.
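For the hands-on folks, here’s a minimal sketch of what a depth2img run looks like using the Hugging Face diffusers library. The model ID, prompt, and parameters below are one plausible setup rather than the exact settings behind any of the results in this round up.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

# Depth-conditioned image-to-image: the pipeline estimates a depth map from the
# input photo and uses it to pin down the scene layout during generation.
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("my_room.jpg").convert("RGB")  # illustrative file name

result = pipe(
    prompt="a sci-fi holodeck, volumetric lighting, highly detailed",
    negative_prompt="blurry, low quality",
    image=init_image,
    strength=0.8,  # how far the output is allowed to drift from the input pixels
).images[0]
result.save("holodeck.png")
```

Even at a high strength, the depth conditioning keeps walls, furniture, and props roughly where you put them, which is the structural coherence we’re talking about.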
Now, let’s take things a step further, explore concrete use cases, and evaluate the strengths and weaknesses of this technique in action.
Reskinning Places & Objects — Both Real & Virtual
With Depth2Image, the structure depicted in your image is the guide. What matters is that you block out your scene and the elements in it, then take a few photos. Critically, it doesn’t matter how you make this image, i.e. whether it’s a real photo or a synthetic image made with simple primitives in a 3D tool like Blender.
Whether you infer a depth map or export a synthetic one, it’s all the same thing. Think of the input images in this Depth2Image workflow as a “3D” rough sketch — also called “greyboxing” in the game development world. Once you’re happy with an image reference, run it through depth2img and voila:
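If your reference is a plain photo, a monocular depth estimator like MiDaS can infer the depth map for you (the depth2img pipeline above does something similar under the hood). Here’s a rough sketch following the intel-isl/MiDaS torch.hub usage; the model variant and file name are assumptions.

```python
import cv2
import torch

# Load a MiDaS model and its matching preprocessing transform from torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

# Read the "greybox" photo and convert BGR -> RGB for the model
img = cv2.cvtColor(cv2.imread("greybox_photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the prediction back to the original image resolution
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()
```

From Blender, you can skip the estimation entirely and render out the Z pass from the compositor, which gives you an exact depth map for your greyboxed scene.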
Let’s start by making a virtual environment. Here are some fun results I explored while creating a new background for my video conferencing sessions. Of course, what I wanted was my own personal holodeck:
Now let’s consider the more complex composition below — you’re standing inside a control room, looking out the window at a massive stargate ring with a ramp leading up to it, plus all the intricate details layered on top.
Achieving this type of output consistently would be hard to nail, even with complex prompt weighting and construction. By using depth to guide the image generation process, we can easily try completely different aesthetics, from a sci-fi reactor to a Lego reprise, while retaining the essence of the scene.
Whether you’re reskinning your game level or bedroom, depth is such a powerful way to provide concrete guidance to the sometimes chaotic AI image generation process.
But this technique can be practical too. Need to remodel your room? Get a massive Pinterest board customized to your own needs. Having launched ARCore APIs in the past, I’m blown away that this workflow for building an AR shopping app doesn’t really need ARCore or ARKit.
Got scale models? Snap a photo with your phone and use depth2image to reskin them. Use random things you have lying around to “greybox” in real life — think like an architect… Because IMO this is basically a prototype for a future XR application :)
Of course, you can also apply depth2image to video to get some fun results. In this case, I used a NeRF capture of a seal I came across on the SF Embarcadero. To add some spice, I made a fun animation, dramatically playing with the field of view to get some interesting perspective shifts.
Taking things a step further, you can also exploit geometric understanding to selectively change aspects of the scene and the elements contained within it. Think semantic segmentation + depth-aware diffusion, like the following:
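One simple way to approximate this, sketched under some assumptions: segment the element you care about, reskin the whole frame with depth2img, then composite the generated pixels back only inside the mask. The class index and the reuse of the `pipe` object from the earlier sketch are assumptions, and the creators shown here may well use a more integrated approach.

```python
import numpy as np
import torch
from PIL import Image
from torchvision.models.segmentation import (
    deeplabv3_resnet50,
    DeepLabV3_ResNet50_Weights,
)

# 1. Segment the target class with an off-the-shelf model
weights = DeepLabV3_ResNet50_Weights.DEFAULT
seg_model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

source = Image.open("street_scene.jpg").convert("RGB")  # illustrative file name
with torch.no_grad():
    logits = seg_model(preprocess(source).unsqueeze(0))["out"][0]
labels = logits.argmax(0).cpu().numpy()

# Class 7 is "car" in the VOC label map used by these weights (check weights.meta)
mask = Image.fromarray((labels == 7).astype(np.uint8) * 255).resize(source.size)
mask = np.array(mask) > 127

# 2. Reskin the whole frame with depth2img (pipe from the earlier sketch),
#    then keep the generated pixels only inside the mask
reskinned = pipe(
    prompt="a futuristic hover car, concept art",
    image=source,
    strength=0.7,
).images[0].resize(source.size)

out = np.where(mask[..., None], np.array(reskinned), np.array(source))
Image.fromarray(out.astype(np.uint8)).save("selective_reskin.png")
```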
Not Just Places & Things — People Too!
If you like the composition of a shot — like this one here — you can easily reskin it and swap the entities depicted in this world for what you really want, which is obviously Elmo taking the leap to live his dreams. So you could go to any stock image site and grab something. Heck, go into your own camera roll and grab something that tickles your fancy as a reference composition.
Naturally, we have to talk about synthetically generated 3D characters as well. In this case, Blender Sushi has rendered out a short CG animation (unsurprisingly using Blender). This example beautifully illustrates how you can try many different styles when reskinning this character while keeping the coarser geometry of the character fairly consistent.
A unique use case of depth2image is making digital humans — wherein you can take live-action footage, or “low quality” synthetic 3D renders, and “up-res” them using depth-guided img2img. Check out these results:
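In practice, this “up-res” pass is just a depth2img call with a lower strength, so the identity and geometry of the subject survive while the diffusion model fills in detail. A hedged example, reusing the `pipe` object from the earlier sketch (prompt, strength, and file names are illustrative):

```python
from PIL import Image

# Low strength = gentle refinement: the depth map and most of the original pixels
# are preserved, so the render is sharpened rather than replaced.
low_res_render = Image.open("cg_face_render.png").convert("RGB")
refined = pipe(
    prompt="photorealistic portrait, detailed skin texture, studio lighting",
    image=low_res_render,
    strength=0.35,
).images[0]
refined.save("upres_face.png")
```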
How about a more cartoony result this time? Like a high-quality GTA cover. Kind of looks like a scruffier Jason Statham 😁
Now let’s flip it. Here, Jesse nicely illustrates using screenshots from GTA 5 to compose a scene, then making it photorealistic using depth2image. How crazy is that? Using AI to take a game from 2013 and make it resemble real-world imagery. Talk about breathing new life into a decade-old game.
Synthetic imagery as an input for Depth2Image is especially well suited to animation. In the example below, Blimey shows how you can use a cheap, real-time 3D tool to quickly compose your scene and animation, then add a far more eye-catching hand-drawn style using depth2img. It’s kind of like cel shading on steroids… Except you can apply it to anything you can capture with a camera or rendering engine.
Of course, since we’re using 2.5D depth maps to guide the diffusion process, we can also take the 2D output image from depth2image and project it back into 3D space — like in the following example by Stijn:
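As a rough sketch of what that projection looks like: with a pinhole camera model, you can lift every pixel of the depth2img output into a 3D point using its depth value. The focal length, depth scale, and file names below are assumptions; a real pipeline would use the intrinsics of the camera (or the Blender export) that produced the depth map.

```python
import numpy as np
from PIL import Image

# Load the reskinned output and the depth map that guided it (illustrative file names)
color = np.asarray(Image.open("reskinned.png").convert("RGB"), dtype=np.float32) / 255.0
depth = np.asarray(Image.open("depth.png").convert("F"), dtype=np.float32)

h, w = depth.shape
fx = fy = 0.8 * w            # assumed focal length in pixels
cx, cy = w / 2.0, h / 2.0    # principal point at the image center

u, v = np.meshgrid(np.arange(w), np.arange(h))
step = 4                     # subsample so the ASCII PLY stays a manageable size
u, v = u[::step, ::step], v[::step, ::step]
d, c = depth[::step, ::step], color[::step, ::step]

# Back-project each pixel along its camera ray, scaled by its depth
x = (u - cx) / fx * d
y = (v - cy) / fy * d
points = np.stack([x, y, d], axis=-1).reshape(-1, 3)
colors = (c.reshape(-1, 3) * 255).astype(np.uint8)

# Write a simple ASCII PLY point cloud you can drop into Blender or MeshLab
with open("reprojected.ply", "w") as f:
    f.write("ply\nformat ascii 1.0\n")
    f.write(f"element vertex {len(points)}\n")
    f.write("property float x\nproperty float y\nproperty float z\n")
    f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
    f.write("end_header\n")
    for (px, py, pz), (r, g, b) in zip(points, colors):
        f.write(f"{px} {py} {pz} {r} {g} {b}\n")
```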
Depth2img is fun — and I enjoyed making this round up of all the cool stuff creators are making with depth2image. You might be surprised how much structure can be conveyed with a single depth map image, but remember it’s called 2.5D for a reason: go too far outside the camera frustum of the depth map, and it won’t really work.
But let’s be honest… where all of this is really headed is… well, 3D. And that’s the perfect subject for a future round up.
Cheers,
Bilawal Sidhu

PS: Check out the creative technology podcast