GREATER AESTHETIC CONTROL SEPARATING STYLE & CONTENT PROMPTS
Art Directing GenAI… or Narrative Style Creation & Transfer with LLMs & Text-to-Image Generative AI Systems
Creating images with generative systems like Midjourney, DALL·E, and similar tools can be fun and useful, but many people struggle to control the aesthetic style of the image, especially when trying to create multiple images within a series. This happens for a couple of reasons. The most basic is that people forget these applications are just tools, and it still takes an expert user to get the most out of them. Just as anyone can pick up a pencil or brush, nearly anyone can use these tools, but having access to a tool does not immediately convey expertise.
Knowing how to describe a scene, the framing, the colors, the application of the medium, and the way that light, form, and shadow interact often comes with an arts education or significant experience in the space. I think creation should be as democratized as possible. I can’t teach you a full degree program’s worth of art history, but I can give you a shortcut to getting the look you want out of a GenAI system and help you more easily create multiple images with the same look and feel.
Separating your image style from content will give you much greater control over both, and offer greater image-to-image consistency when creating multiple stylistically similar images. Let’s learn how!
Defining the Style Prompt
You have a few ways to start, in order from least to most complex:
- You can use one or more stylistically similar existing images
- You can leverage an LLM to create a Style Prompt
- You can create an initial image with a GenAI system
- You can create a Style Prompt from scratch
All four methods have differences and similarities. I’ll go through each, but if this is your first time, I’d recommend starting at the top and working your way through them one by one as you gain confidence and experience.
Starting with an existing Image
Choose an image whose style you’d like to use as your Style Prompt. You’ll need a multi-modal LLM (such as ChatGPT) that accepts image uploads. For this example, I’ll use this image by painter and printmaker Wayne Thiebaud.
Upload the image to the system and use the following prompt with your upload.
“Describe the style of this image in exhaustive detail without any mention of the content or objects depicted in the scene. Describe it in a way that a text-to-image system could use the description to create a new image that would appear to be part of a set or created by the same artist as part of a series. Be careful not to include any description of the content of the image. Provide this description in a very high level of detail including things like perspective, vanishing point, angles, lighting direction, palette, and style of the scene as well as any distinct elements within the scene, etc. Pay special attention to framing, quantity and use of color (or lack of) as well as how the paint is applied, the brush strokes, and how form and mass are created, details like shadow angles, lengths, and colors. Make the Style Description approximately 3,000 characters long.”
While every interaction with an LLM will have slightly different output, even if the input is identical, you should get something like this:
The lighting in the scene is diffuse, suggesting an overcast or indirect light source, with shadows that are soft and not overly pronounced. This lighting does not cast harsh shadows but does create subtle gradients which help to define form and mass. The direction of the light seems to come from above and slightly to the right, judging by the soft shadows that are cast to the left of objects within the scene.
The color palette is controlled and leans towards a muted spectrum with a dominance of cool tones interspersed with occasional warm accents. There is a range of blues, grays, and off-whites, with strategic uses of yellow and green that draw the eye without overwhelming the scene. The saturation of these colors is not high, which contributes to a realistic and subdued atmosphere.
The painting technique appears to have a precision to it, with clean, defined edges and lines that suggest careful planning and execution. Brush strokes, when visible, are deliberate and contribute to the form and texture of the surfaces. There is a balance between flat areas of color and textured sections, indicating a variety of brushwork. Some areas of the painting exhibit a smoother finish, while others display a more tactile quality.
In terms of form and mass, the artist has used shading and tonal variation rather than line to define shapes. There is a judicious use of highlights and shadow to model the forms, giving them a three-dimensional quality. The shadows tend to have cool undertones, further enhancing the overall cool palette of the scene.
The framing of the image is such that it allows for a dynamic composition, with lines that lead the viewer’s eye through the picture. The composition itself is balanced, with a distribution of visual weight that keeps the eye moving but not resting too long in any one area.
Overall, the style of the scene is one that combines a precise, almost architectural rendering with a painterly touch that adds texture and depth. The approach to composition is both methodical and creative, allowing for a scene that is engaging and full of movement, despite the stillness of the content.
For generating additional images in this style, it would be important to maintain the perspective techniques, the controlled palette, the balance between detailed and smooth brushwork, and the diffuse lighting. A generation seed that could be used to guide the creation of images in this series might be: “architectural precision with painterly depth, cool-muted palette with warm accents, indirect lighting, dynamic composition.” This seed encapsulates the essence of the style and can be used as a reference point for creating images that would appear to be part of a set or created by the same artist.
This is a good start, but its structure is more narrative and long-form than will be useful for us moving forward. Organize it with the following prompt.
Organize this Style Prompt into semantic and logical sections
The output should look something like the following:
Perspective & Composition:
Utilize a dramatic angular perspective with a steep, off-center vanishing point.
Ensure the composition has dynamic lines that guide the viewer’s eye through the image, with a sense of depth and scale.
Lighting & Shadow:
Simulate diffuse and indirect lighting from above and slightly to the right.
Create soft shadows with subtle gradients to define forms without harsh contrasts.
Employ soft shadow angles with cool undertones to complement the color palette.
Color Palette:
Adopt a controlled, muted spectrum focusing on cool tones with occasional warm accents.
Integrate a range of blues, grays, off-whites, with strategic uses of yellow and green.
Maintain low color saturation to contribute to a subdued and realistic atmosphere.
Brushwork & Technique:
Apply paint with precision, defined edges, and deliberate brush strokes.
Combine flat areas of color with textured sections for variety in brushwork.
Use tonal variation and shading to define shapes, with a balance of highlights and shadow for a three-dimensional effect.
Image Framing & Balance:
Frame the scene to allow for a balanced composition with a distribution of visual weight.
Ensure the framing supports a dynamic composition that engages the viewer without overwhelming any section.
Architectural precision with painterly depth, cool-muted palette with warm accents, indirect lighting, dynamic composition.
This is much better and easier for you to edit as needed going forward. The next step is to check your work. I like to ensure that my Style Prompt is robust enough and that I’m not getting any carry-over from machine memory, so I’ll do two things. The first is to generate a new image of an unrelated subject in the current session. To do this, I use the prompt:
Using the Style Prompt above, generate a new image of X, use a 20:10 aspect ratio and share the Generation Seed of the image so that I can create a new image that is stylistically similar.
For this step I suggest you use a simple Content Prompt; I used “Desolate Americana Town Square.” At this point don’t worry about your end goals for compositions. You just want something simple enough to see whether the Style Prompt is robust enough to capture the aesthetics you are looking for. You can now make any tweaks you want: shifting the palette warmer or cooler, changing the level of detail, etc. Once done, ask the system to regenerate an updated Style Prompt and include the Generation Seed.
The second is to open a new session and repeat the previous exercise to make sure the generation isn’t being influenced by any session memory.
Now you have your Style Prompt. You should be able to copy this Style Prompt out and reuse it at any time. The way I usually start a new session is with the following prompt:
We’re going to generate a series of images with a Content Prompt and a Style Prompt, each image should be 20:10 ratio and fill the frame completely without borders. I’ll provide the Content Prompt afterwards. The Style Prompt is : (paste your style prompt)
The system will likely summarize the Style Prompt and ask you for your Content Prompt. In your next response, write:
Content Prompt : Description of the image contents you’d like to create
In your Content Prompt, be careful to limit yourself to a description of the objects, setting, and placement of elements in your image. If you describe anything that conflicts with the Style Prompt (e.g. colors, framing, materiality, etc.), the system will likely get a bit confused, and the look and feel of the scene will diverge aesthetically from your source image more than you’d like.
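If you assemble these messages in a script rather than by hand, the two-message pattern above can be sketched roughly as follows. This is only an illustration: the function names and the `STYLE_STEMS` word list are my own assumptions, the opener text is copied from the session prompt above, and the conflict check is a crude prefix match that will miss plenty and flag some legitimate words (a picture frame, for instance).

```python
import re


def session_opener(style_prompt: str, aspect_ratio: str = "20:10") -> str:
    """First message of a new session: sets the rules and pastes the Style Prompt."""
    return (
        "We're going to generate a series of images with a Content Prompt and a "
        f"Style Prompt, each image should be {aspect_ratio} ratio and fill the frame "
        "completely without borders. I'll provide the Content Prompt afterwards. "
        f"The Style Prompt is : {style_prompt.strip()}"
    )


def content_message(description: str) -> str:
    """Every later message: only the description of what is in the image."""
    return f"Content Prompt : {description.strip()}"


# Hypothetical, deliberately short list of word stems that usually belong in
# the Style Prompt rather than the Content Prompt. Extend it for your project.
STYLE_STEMS = ("color", "palette", "brush", "fram", "saturat", "pastel", "muted")


def style_conflicts(description: str, style_stems=STYLE_STEMS) -> list[str]:
    """Return the stems that match the start of any word in a Content Prompt."""
    words = re.findall(r"[a-z]+", description.lower())
    return [stem for stem in style_stems if any(w.startswith(stem) for w in words)]


print(content_message("Desolate Americana Town Square"))
print(style_conflicts("A town square in pastel colors, tightly framed"))
```

An empty list from `style_conflicts` does not guarantee a clean Content Prompt; it only means none of the listed stems appeared.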
Creating a Style Prompt with an LLM
You can create a Style Prompt by asking an LLM about the work of a famous or historic artist, an art movement, or even a tangentially related art form (e.g. “Describe Noir Film Aesthetics”). Once you have the initial response, you need to clean it up to make it into a reusable Style Prompt that won’t trigger any issues with future image generation. Let’s use Salvador Dalí for this example. First, ask the system to describe the thing you’d like to use as a reference. In this example I’ve told it to limit the scope to paintings by Dalí rather than his entire body of work, which included sculpture and media work. The image below is an example of Dalí’s work, but it was not provided to the system; for this example we’re leveraging the information the system has already been trained on.
Its response is below:
Salvador Dalí, a prominent Spanish surrealist artist, is renowned for his distinctive and imaginative paintings. His works are characterized by the following features:
Surrealism: Dalí’s paintings are quintessential examples of surrealism, a movement that sought to release the creative potential of the unconscious mind. His artworks often depict dreamlike scenes and bizarre, illogical sequences.
Vivid Imagery: His paintings are filled with striking and vivid imagery. Objects are often morphed or melted, creating a sense of fluidity and transformation.
Symbolism: Dalí’s work is rich in symbolism. Common themes include time, religion, and the subconscious. For example, melting clocks in his famous painting “The Persistence of Memory” (1931) are interpreted as a symbol of the relativity of time and space.
Attention to Detail: Despite the surreal nature of his work, Dalí’s paintings are highly detailed and meticulously crafted, often resembling high-resolution photographs in their precision.
Influence of Freudian Theory: His work was heavily influenced by the theories of Sigmund Freud, particularly those related to dreams and the subconscious. This influence is evident in the dreamlike, often bizarre nature of his imagery.
Optical Illusions and Double Images: Dalí often used optical illusions and double images in his work. A single painting could present multiple scenes or images depending on the viewer’s perspective.
Themes of Death and Decay: Many of Dalí’s works explore themes of mortality, decay, and the passage of time, often in a disturbing or unsettling manner.
Use of Religious Symbolism: Despite his unconventional approach, Dalí frequently incorporated Christian and other religious symbols into his work, reflecting his complex relationship with religion.
Landscapes: Many of Dalí’s paintings include landscapes of his native Catalonia, which are often rendered in a dreamlike, surreal manner.
Paranoia-Critical Method: Dalí developed this unique method, which involved inducing a paranoid state to create art, allowing him to transform and reinterpret reality in bizarre and fantastical ways.
Dalí’s paintings are not just visually striking; they are thought-provoking, often challenging the viewer’s perception of reality and encouraging deeper contemplation of the subconscious and dream worlds.
Telling many text-to-image systems to “Generate an image in the style of Dalí” will now frequently trigger a copyright protection or intellectual property error. You’ll need to clean up the description with a modified version of the prompt I provided above.
Remove any reference to the artist, specific works, and subject matters depicted.
Describe the style of the artist’s work in extremely exhaustive detail without any mention of objects depicted in the scene. Describe it in a way that a text-to-image system could use the description to create a new image that would appear to be part of a set or created by the same person as part of a series. Be careful not to include any description of the content of the image. Provide this description in a very high level of detail including things like perspective, vanishing point, angles, lighting direction, palette, and style of the scene as well as any distinct elements within the scene, etc. Pay special attention to framing, quantity and use of color (or lack of) as well as how the paint is applied, the brush strokes, and how form and mass are created, details like shadow angles, lengths, and colors. Make the Style Description approximately 3,000 characters long.
The system should generate a Style Prompt similar to the first example we went through. Follow the previous steps to generate a couple of trial images, update the Style Prompt, and ask for a Generation Seed for future use.
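Before reusing a cleaned-up Style Prompt, it can help to double-check that no artist name or titled work slipped through. The sketch below is one hedged way to do that with a list of terms you supply yourself; `leftover_references` and the sample `banned` list are hypothetical names for illustration, and the check only finds the exact terms you list, compared without case or accents.

```python
import re
import unicodedata


def _fold(text: str) -> str:
    """Lowercase and strip accents so 'Dalí' and 'dali' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def leftover_references(style_prompt: str, banned_terms: list[str]) -> list[str]:
    """Return the banned terms that still appear as whole words or phrases."""
    folded = _fold(style_prompt)
    hits = []
    for term in banned_terms:
        # Word boundaries keep 'dali' from matching inside a longer word.
        pattern = r"\b" + re.escape(_fold(term)) + r"\b"
        if re.search(pattern, folded):
            hits.append(term)
    return hits


# Illustrative terms only; list whatever names and titles apply to your reference.
banned = ["Dalí", "The Persistence of Memory", "melting clocks"]
draft = "Dreamlike, meticulous rendering in the manner of Dali, with long raking shadows."
print(leftover_references(draft, banned))
```

Anything the function returns is a candidate for manual removal, or for another pass with the "Remove any reference to the artist" prompt.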
One of the pitfalls of trying to copy a broad style, like an artist or art movement, is that the Style Prompt might not be as detailed as when you start with a single image or a couple of images as your initial set.
Creating a Style Prompt from a Generated Image
If you’ve already created an image that you like with a text-to-image system, you can use the prompts I’ve provided to generate a Style Prompt from your own image. If you’ve just created it, get the Generation Seed from the system. If you created it in a previous session and aren’t able to get the Generation Seed, follow the instructions above for starting with an existing image: have the multi-modal LLM analyze the image, then generate a new one from the Style Prompt it produces.
Creating a Style Prompt from Scratch
You can also create a Style Prompt completely from scratch. But why would you do that? I’ve found that separating your Style Prompt from your Content Prompt, while it takes a bit more effort, enables a much more effective recreation of a style and aesthetic from image to image. You can choose your own format or utilize one of the examples above. Every project is different, but here are some examples of the sections you might want to include when creating your own Style Prompt from scratch.
- Perspective and Composition
- Color and Palette
- Lighting and Shadow
- Brushwork and Texture
- Form and Mass
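If you prefer to keep a from-scratch Style Prompt in a small script, the five sections above could be held in a structure like the one below and rendered to text in the same "Heading:" layout as the organized example earlier. This is a minimal sketch: the `StylePrompt` class and its field names are my own, and the sample lines are placeholder values, not a recommended style.

```python
from dataclasses import dataclass, fields


@dataclass
class StylePrompt:
    """One list of plain-language instructions per section."""
    perspective_and_composition: list[str]
    color_and_palette: list[str]
    lighting_and_shadow: list[str]
    brushwork_and_texture: list[str]
    form_and_mass: list[str]

    def to_text(self) -> str:
        """Render every section as 'Heading:' followed by its lines."""
        blocks = []
        for f in fields(self):
            lines = getattr(self, f.name)
            if not lines:
                # An empty section leaves that aspect of the style undefined.
                raise ValueError(f"section {f.name!r} is empty")
            heading = f.name.replace("_", " ").title().replace(" And ", " and ")
            blocks.append(heading + ":\n" + "\n".join(lines))
        return "\n".join(blocks)


# Placeholder values for illustration only.
style = StylePrompt(
    perspective_and_composition=["Low eye level with a single central vanishing point."],
    color_and_palette=["Muted cool tones with occasional warm accents.",
                       "Low saturation throughout."],
    lighting_and_shadow=["Diffuse light from above and slightly to the right."],
    brushwork_and_texture=["Deliberate, visible strokes over flat areas of color."],
    form_and_mass=["Shapes defined by tonal variation rather than line."],
)
print(style.to_text())
```

Keeping each section as a list makes it easy to change one aspect, such as the palette, without touching the rest.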
Best Practices for a Content Prompt
Your Content Prompt should describe the scene’s components and composition, the relative scale and placement of objects in the scene, and environmental aspects such as the location, lighting situation, and light sources. I sometimes find it best to start with a very short narrative that describes the overall scene. Try this first and see what the output looks like. After that, expand on it with a more detailed description of the elements within the scene that you’d like to control more specifically.
Next Steps
I’d love to see what you make with this technique. I think it could be a great tool for serial mediums like comics, storyboards, slide decks, or concept art. I post some of the work that I create with systems like this to one of my Instagram accounts, especially when I’m trying out a new technique or stress testing a system to have it do something particular. If you have any questions, feel free to respond here with a comment or message me on Mastodon.
Troubleshooting
- Sometimes the Style Prompt still contains objects or elements from the image that you provided. You can manually remove them when you paste the Style Prompt back in, or ask the system to remove them and regenerate the Style Prompt.
- The image you generate from the Style Prompt doesn’t look close enough to the source image. There can be quite a few reasons for this, but the two most common issues I’ve found are that you’ve overspecified in your Content Prompt or that the element you’re looking for in the output images wasn’t fully captured in your Style Prompt. If you can go back to the original image, you can ask the system to be more verbose in describing a specific aspect (e.g. color palette, artistic style, framing, etc.). Finally, make sure your subsequent generation prompts include the same Generation Seed from the image you used when creating your Style Prompt.
- Within a session, the style seems to drift after you’ve generated a few images from a Style Prompt. First, ensure you’ve used the prompts I’ve shared above. For subsequent prompts, make sure you use the format “Content Prompt: Description of your image”. If that still doesn’t fix things, include the Style Prompt in every request for a new image.
- Elements from your Content Prompt are missing or ignored. DALL·E 3’s prompt memory is 4,000 characters, which means that if your Style Prompt and Content Prompt together are more than 4,000 characters, you may get errors or pieces of your prompt could get ignored.
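For the length issue in particular, a quick character count before you paste can save a wasted generation. The sketch below simply adds the two lengths and compares them with the 4,000-character figure mentioned above. I have not verified that figure against any official specification, and `prompt_budget` is a hypothetical helper name.

```python
# Characters; the figure quoted in the troubleshooting note, not a verified spec.
PROMPT_LIMIT = 4000


def prompt_budget(style_prompt: str, content_prompt: str, limit: int = PROMPT_LIMIT) -> int:
    """Characters left after both prompts; a negative result means over the limit."""
    return limit - (len(style_prompt) + len(content_prompt))


style_prompt = "x" * 3000  # stand-in for a roughly 3,000-character Style Prompt
content_prompt = "Desolate Americana Town Square"
remaining = prompt_budget(style_prompt, content_prompt)
print(remaining)  # 970
```

A Style Prompt of about 3,000 characters leaves roughly 1,000 for the Content Prompt, so if you need long scene descriptions, consider asking for a shorter Style Prompt.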
I’ll add more issues and clarifications as people try it out and have questions that I can help with.