Is AI the end of VFX?

The recent SAG and Writers Guild strikes have highlighted concerns about generative AI in the arts. People are scared. They are worried that machines will take their jobs. These are reasonable concerns, but probably overblown for the time being. In this post I’ll give my perspective on the issue and hopefully answer some questions you might have as a student of VFX.

By now I’m sure you’ve seen several of the tools that have hit the wild, like Stable Diffusion, Midjourney, and the many other generative AI tools currently available. You may have seen the recent announcement of Adobe Generative Fill, which moves this technology into mainstream adoption (if it wasn’t there already). After seeing the capabilities of these tools you may have asked yourself: Do we as visual artists need to worry? Is this the end of visual effects? What’s the point of learning how to do things the old-school “hard way” when so many of these tools seem to make it so easy to produce extremely high-level work with practically no effort at all? Here is my current personal opinion on the matter.

First of all, while these tools are impressive in many ways, they are also somewhat underwhelming in others. They create surprisingly rich output, but they don’t create output that is deterministic or specific enough without extensive training and input from the user. For VFX work we are expected to deliver very specific, high-quality results, and to hit those targets we typically need to use supervised ML tools. In that respect, for production work the new ML-based tools are more like an extension of our traditional tools. They may save us some time once trained, but we still need to provide a lot of input to guide their results. In many cases more traditional techniques are more efficient and controllable. At best, the ML tools save us some tedious effort on easy-to-automate tasks.

However, to get the best results, all of these tools still require a competent operator with a good understanding of composition and aesthetics. They also still presume a high level of understanding of technical considerations like bit depth, resolution, color space, etc. At present, the ML tools can’t make technical or artistic decisions for you. You are still the one in control. You need to go into the project knowing what outcomes you’re hoping to achieve, and you have to have developed your “eye” to the point that you can judge whether the quality of the output is up to professional standards. It also needs to be stressed that using the output of any of these tools in a traditional visual effects production context still requires that you ultimately interact with our traditional “old-school” workflows. For example, while it may be possible to use an ML-based approach to produce mattes, the operator still needs to know how to use those mattes effectively in subsequent compositing steps.
Likewise, while a tool like Generative Fill or Stable Diffusion can create impressive backgrounds or inpainting/outpainting, getting a pleasing final result still requires that the operator have a good sense of composition and aesthetics. In short, there is still a very long list of both artistic and technical considerations that the artist must take control of. In that respect I’ve started to think of a lot of these ML-based tools as a form of enhanced automation. While they DO provide an incredible level of capability, in the end it’s still the content creator/artist who needs to make all the creative and technical decisions required to produce the desired output. These new tools move every creator up the value chain, toward something more like an art director, by removing some of the tedious work we would otherwise have to do. Depending on your target output you might still need to be pretty technical too, on par with a traditional old-school VFX artist. Let’s call this new type of operator a Technical Art Director: a hybrid of an Art Director and a traditional Technical Director.
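To make the matte example concrete: whether an alpha matte comes from an ML matting tool or from hand roto, it plugs into the same “old-school” compositing math the operator still has to understand. Here is a minimal sketch in Python with NumPy; the function name and the toy plates are illustrative, not from any particular package:

```python
import numpy as np

def comp_over(fg, bg, matte):
    """Composite a foreground plate over a background plate using a matte.

    fg, bg : float arrays in [0, 1], shape (H, W, 3)
    matte  : float array in [0, 1], shape (H, W, 1) -- the alpha,
             whether it came from an ML tool or from hand roto.
    """
    # The classic "over" operation: fg * alpha + bg * (1 - alpha).
    # Any problems in the matte (chattering edges, holes, softness)
    # show up directly in the comp, which is why the operator still
    # has to evaluate and refine the matte, not just generate it.
    return fg * matte + bg * (1.0 - matte)

# Toy plates: a 50% matte blends the two equally.
fg = np.ones((4, 4, 3))            # solid white foreground plate
bg = np.zeros((4, 4, 3))           # solid black background plate
matte = np.full((4, 4, 1), 0.5)    # uniform half-transparent matte
result = comp_over(fg, bg, matte)  # every pixel ends up at 0.5
```

The point isn’t the one line of math; it’s that judging whether the matte’s edges, motion, and detail hold up in the final comp is still the artist’s job, no matter what produced the matte.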

While some people really enjoy the manual steps necessary to create finished work, I personally don’t see a problem with skipping the tedious steps and getting straight to the last 10% of the content creation. It does bring up questions of ethics when producing certain kinds of work, for example text-to-image generation with tools like Stable Diffusion. If the AI is generating most of the work, what was the artist’s input, really? It takes a long time to become a competent illustrator or character modeler capable of making appealing characters. The same is true of building most types of models from scratch, actually; it’s only that characters additionally require an understanding of anatomy and musculature to be convincing. I think this is one place where we need to be careful about how we take credit for images produced primarily via text-to-image generation. I’m not sure there’s even a name yet for creators who are more of a “prompt crafter” than an artist themselves.

But what about creators who use tools like ControlNet or image-to-image to heavily influence the output of a system like Stable Diffusion? What about an artist who carefully builds up a final image from a complex series of prompt-based Generative Fill substitutions? That’s definitely more input from the operator than a single simple text prompt, but less than someone who has studied and practiced figure drawing for five years. Still, like it or not, these technologies are here to stay, so it’s best to embrace and master them like all the other DCC technologies that came before them. We may need to create a new vocabulary for the operators who master these tools. In some cases “prompt crafter” doesn’t give them enough credit, whereas “C.G. artist” would be giving them too much. However, they were still the prime mover and curator of the image, even if an ML algorithm like Stable Diffusion generated the pixels.
Remember, it wasn’t too long ago that many people did not consider photographers artists, nor, later, C.G. artists! Most people have since come to acknowledge both as forms of art. We’ll probably need to do the same for generative art at some point. There will always be some creators who are better at making “good” output than others, regardless of whether they are using a camera to create images, a C.G. program, or coaxing the images out of the mind of an ML model.

This new wave of generative AI tools is in some ways the beginning of what I believe will be a Renaissance in digital content creation. How so? These tools lower the barrier to entry for creating incredibly high-quality work. They democratize image creation. They raise the bar for everyone, even novices. What would have required massive amounts of effort and expertise in Photoshop in the past can now be done with an image selection, a text prompt, and some cleanup with the clone tool. In some ways Generative Fill makes Photoshop actually work the way many laypeople believed it worked for years! The latest ML-based tools in Nuke remove a huge amount of the tedium from matchmoving and roto. Text-to-image tools like Stable Diffusion are mind-blowing in their ability to pluck images from the latent-space “brain” of the AI. While it seems like magic, it’s just the march of progress. It’s an amazing time to be a creator. The more comfortable we become with these new tools and how to integrate them into our routine workflows, the more superpowers we will have.

I’ll be covering more advanced workflows in Nuke in the coming weeks, including things like Smart Vectors (which is not ML-based) and CopyCat (which is!), so stay tuned!