Use JSON to get consistent details in your images

You don't need to make a science out of it: creating images with AI tools like ChatGPT is a piece of cake. Getting consistent results is much harder, because image generation is particularly prone to variation and subtle differences in detail.
Here is a typical process:
I describe my image in as much detail as possible and the AI creates it. It works so well that I save the prompt. Later I want to create the same image again, so of course I reuse my prompt. But be careful: the more often I do this, the more irregularities creep in. Details differ, colors shift, or the composition looks as if it had been decided by a roll of the dice. Intuitively, I adapt my original prompt and enrich it with more details. This quickly results in a confusing, overloaded prompt in which I lose track of what should actually be in the picture. There's a better way.
This is where the advantages of structured data formats such as JSON come into play. A strikingly simple but effective method is to translate the image into a structured JSON description, which can then be handed back to the language model to produce results that stay close to the source image. Simply put:
Prompt > Image > JSON > Image
instead of
Prompt > Image > Prompt > Image
Once the language model has created an image I am happy with, I have the same model translate that image into JSON. This JSON object should contain as much information about the image as possible: image size, color profile, image content, composition, mood, subject, camera settings, etc. Instead of just saving the original prompt, I save the JSON and use it when I want to create the same image again.
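The save-and-reuse step can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the shortened image_spec, the file name glass_book.json, and the helpers save_spec and build_prompt are hypothetical, and the actual call to an image model is left out because it depends on the provider. The "CREATE IMAGE:" prefix follows the example used in this article.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, shortened image description; a real one would hold the
# full JSON object returned by the model.
image_spec = {
    "image_metadata": {
        "resolution": {"width": 768, "height": 768, "unit": "pixels"},
        "aspect_ratio": "1:1",
        "color_profile": "sRGB",
    }
}


def save_spec(spec: dict, path: Path) -> None:
    """Store the JSON description instead of the original prose prompt."""
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")


def build_prompt(path: Path) -> str:
    """Load a stored description and wrap it in an image request."""
    spec = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: the model accepts the JSON as plain prompt text.
    return "CREATE IMAGE: " + json.dumps(spec, indent=2)


# Assumed storage location; any path under your control works.
spec_path = Path(tempfile.gettempdir()) / "glass_book.json"
save_spec(image_spec, spec_path)
prompt = build_prompt(spec_path)
print(prompt)
```

The resulting prompt string is what I would paste or send to the model each time I want the same image again.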
Here is a simple example that demonstrates the advantages of this method:
Prompt 1:
Create me a photorealistic image of a book entirely made out of glass. The book should be opened and resting on a dark wooden table. The camera should point at it from the top. The glass should shimmer in a sapphire blue. The book should be tilted slightly.
Result:

Prompt 2:
Now I want you to create a JSON object from that image that contains all the information of that image. Resolution, size, aspect ratio, color profile, elements, content, arrangements and composition, and everything else you can think of.
Result:
CREATE IMAGE: {
  "image_metadata": {
    "filename": "glass_book_on_wooden_table.png",
    "resolution": {
      "width": 768,
      "height": 768,
      "unit": "pixels"
    },
    "aspect_ratio": "1:1",
    "file_size_estimate": "approx. 500 KB - 1 MB (PNG format)",
    "color_profile": "sRGB",
    "color_palette": {
      "primary": "#4CA3DD",
      "secondary": "#1C1C1C",
      "highlight": "#A7D5F8",
      "shadows": "#101010"
    },
    "dominant_colors": [
      "sapphire blue",
      ......
Now if we take this JSON object and feed it back into the model, we get nearly the same result again. This creates consistency across generations:

For developers, the approach has another advantage. Common language model providers have for some time allowed output to be constrained with a JSON schema, so you can give a model a schema that it must always adhere to. If you derive a schema from the "JSON image", you can use it as a template for all images in your application. This also significantly simplifies communication across interfaces. And once a fixed schema is in place, changing individual values to create variations is child's play.
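As a rough sketch of that idea, a schema can be derived from an example "JSON image" and individual values can then be swapped out for variations. Everything here is an assumption made for illustration: infer_schema and with_overrides are hypothetical helpers, the schema covers only a small subset of JSON Schema, the replacement color is made up, and how a given provider accepts a schema is not shown.

```python
import copy
import json


def infer_schema(value):
    """Derive a minimal JSON-Schema-style description from an example value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": list(value),
            "additionalProperties": False,
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:  # assume all items look like the first one
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):  # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


def with_overrides(spec, overrides):
    """Return a copy of spec with nested values replaced by overrides."""
    result = copy.deepcopy(spec)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = with_overrides(result[key], value)
        else:
            result[key] = value
    return result


# Shortened example based on the JSON shown earlier in this article.
image_spec = {
    "image_metadata": {
        "resolution": {"width": 768, "height": 768, "unit": "pixels"},
        "color_palette": {"primary": "#4CA3DD", "secondary": "#1C1C1C"},
        "dominant_colors": ["sapphire blue"],
    }
}

schema = infer_schema(image_spec)
# Hypothetical variation: same image, different primary color.
variant = with_overrides(
    image_spec, {"image_metadata": {"color_palette": {"primary": "#D94C4C"}}}
)
print(json.dumps(schema, indent=2))
print(json.dumps(variant, indent=2))
```

Because with_overrides works on a copy, the stored template stays untouched and every variation starts from the same baseline.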