Text-to-Image Task
How to define a text-to-image task
The Stable Diffusion Task Framework has two components:
A generalized schema to define a Stable Diffusion task.
An execution engine that runs the task defined in the above schema.
The task definition is a set of key-value pairs that can be serialized into many formats, including a JSON string. The JSON form can be validated against a JSON schema, and validation tools exist for most popular programming languages.
The execution engine is integrated into the node of the Hydrogen Network, and the JSON string format of the task definition is used to send tasks in the Hydrogen Network.
The following is an intuitive look at a task definition:
{
"version": "2.0.0",
"base_model": {
"name": "stabilityai/sdxl-turbo"
},
"prompt": "best quality, ultra high res, photorealistic++++, 1girl, desert, full shot, dark stillsuit, stillsuit mask up, gloves, solo, highly detailed eyes, hyper-detailed, high quality visuals, dim Lighting, ultra-realistic, sharply focused, octane render, 8k UHD",
"negative_prompt": "no moon++, buried in sand, bare hands, fingerless gloves, blue stillsuit, barefoot, weapon, vegetation, clouds, glowing eyes++, helmet, bare handed, no gloves, double mask, simplified, abstract, unrealistic, impressionistic, low resolution",
"task_config": {
"num_images": 9,
"steps": 1,
"cfg": 0
},
"lora": {
"model": "https://civitai.com/api/download/models/178048"
},
"controlnet": {
"model": "diffusers/controlnet-canny-sdxl-1.0",
"image_dataurl": "...",
"preprocess": {
"method": "canny"
},
"weight": 70
},
"scheduler": {
"method": "EulerAncestralDiscreteScheduler",
"args": {
"timestep_spacing": "trailing"
}
}
}
More examples of the different Stable Diffusion tasks can be found in the GitHub repository.
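Because the task definition is plain key-value data, it can be assembled in code and serialized to JSON before it is sent. The following is a minimal Python sketch using only the standard library. The helper name build_task_json is hypothetical and not part of the framework, and the field values are taken from the example above.

```python
import json


def build_task_json(prompt_parts, negative_parts):
    """Assemble a small task definition and serialize it to a JSON string.

    Hypothetical helper for illustration; it is not part of the framework.
    """
    task = {
        "version": "2.0.0",
        "base_model": {"name": "stabilityai/sdxl-turbo"},
        # JSON has no string concatenation, so long prompts are joined in code.
        "prompt": ", ".join(prompt_parts),
        "negative_prompt": ", ".join(negative_parts),
        "task_config": {"num_images": 9, "steps": 1, "cfg": 0},
    }
    return json.dumps(task)


task_json = build_task_json(
    ["best quality", "ultra high res", "desert", "full shot"],
    ["low resolution", "abstract"],
)
# Round-trip through the parser to confirm the string is well-formed JSON.
decoded = json.loads(task_json)
print(decoded["prompt"])
```

Joining the prompt fragments in code keeps each JSON value a single string, which is what a JSON parser requires.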
Acceleration of the Image Generation
SDXL Turbo
SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. To use SDXL Turbo in your task:
1. Use the SDXL Turbo model as the base model:
"base_model": {
"name": "crynux-ai/sdxl-turbo"
},
2. Set the timestep_spacing scheduler argument:
"scheduler": {
"method": "EulerAncestralDiscreteScheduler",
"args": {
"timestep_spacing": "trailing"
}
}
3. Set cfg to zero, and set steps to 1-4:
"task_config": {
"steps": 1,
"cfg": 0
}
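The three steps above can be applied to an existing task in one place. The sketch below is a hypothetical helper (apply_sdxl_turbo is not a framework function); it only rewrites the fields shown in the snippets above and leaves everything else untouched.

```python
import copy


def apply_sdxl_turbo(task, steps=1):
    """Return a copy of `task` with the three SDXL Turbo settings applied.

    Hypothetical helper; field names follow the snippets above.
    """
    if not 1 <= steps <= 4:
        raise ValueError("SDXL Turbo expects 1 to 4 steps")
    turbo = copy.deepcopy(task)  # leave the caller's task unchanged
    turbo["base_model"] = {"name": "crynux-ai/sdxl-turbo"}
    turbo["scheduler"] = {
        "method": "EulerAncestralDiscreteScheduler",
        "args": {"timestep_spacing": "trailing"},
    }
    config = turbo.setdefault("task_config", {})
    config["steps"] = steps
    config["cfg"] = 0
    return turbo


original = {"prompt": "a desert at dusk", "task_config": {"num_images": 2, "steps": 30, "cfg": 7}}
turbo_task = apply_sdxl_turbo(original, steps=2)
```

Other task_config options, such as num_images here, are preserved in the returned copy.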
Latent Consistency Models (LCM)
Negative prompts won't work with LCM methods.
Latent Consistency Models (LCMs) enable fast high-quality image generation by directly predicting the reverse diffusion process in the latent rather than pixel space. In other words, LCMs try to predict the noiseless image from the noisy image in contrast to typical diffusion models that iteratively remove noise from the noisy image. By avoiding the iterative sampling process, LCMs are able to generate high-quality images in 2-4 steps instead of 20-30 steps.
There are two ways to use LCM in a Stable Diffusion task: LCM and LCM-LoRA. To use LCM:
1. Load the LCM model corresponding to your base model using the unet argument:
"base_model": {
"name": "stabilityai/stable-diffusion-xl-base-1.0"
},
"unet": "latent-consistency/lcm-sdxl",
2. Use the LCMScheduler:
"scheduler": {
"method": "LCMScheduler"
}
3. Set cfg to 3-13, and set steps to 4:
"task_config": {
"steps": 4,
"cfg": 5
},
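A task can be checked against the LCM steps above before it is submitted. The following sketch is a hypothetical checker (check_lcm_task is not part of the framework); its rules simply mirror this section, including the note that negative prompts do not work with LCM.

```python
def check_lcm_task(task):
    """Return a list of human-readable problems for an LCM task.

    Hypothetical checker; the rules mirror the three steps above.
    """
    problems = []
    if not task.get("unet"):
        problems.append("unet is not set to an LCM model")
    scheduler = task.get("scheduler") or {}
    if scheduler.get("method") != "LCMScheduler":
        problems.append("scheduler.method should be LCMScheduler")
    config = task.get("task_config") or {}
    if not 3 <= config.get("cfg", 0) <= 13:
        problems.append("cfg should be between 3 and 13")
    if config.get("steps") != 4:
        problems.append("steps should be 4")
    if task.get("negative_prompt"):
        problems.append("negative_prompt has no effect with LCM")
    return problems


lcm_task = {
    "base_model": {"name": "stabilityai/stable-diffusion-xl-base-1.0"},
    "unet": "latent-consistency/lcm-sdxl",
    "scheduler": {"method": "LCMScheduler"},
    "task_config": {"steps": 4, "cfg": 5},
}
print(check_lcm_task(lcm_task))
```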
Base Model
The base model can be one of the original Stable Diffusion models, such as Stable Diffusion 1.5 or Stable Diffusion XL, or a checkpoint fine-tuned from one of them.
The model can be specified in two ways: a Huggingface model ID, or a file download URL.
Huggingface Model ID
The Huggingface model IDs for the original Stable Diffusion models are listed below:
Stable Diffusion 1.5
{
"base_model": "runwayml/stable-diffusion-v1-5"
}
Stable Diffusion 2.1
{
"base_model": "stabilityai/stable-diffusion-2-1"
}
Stable Diffusion XL
{
"base_model": "stabilityai/stable-diffusion-xl-base-1.0"
}
Custom Fine-tuned Checkpoints
Other custom checkpoints fine-tuned from the original SD models can also be used, for example, the ChilloutMix model on Huggingface:
{
"base_model": "emilianJR/chilloutmix_NiPrunedFp32Fix"
}
File Download URL
A URL can also be used as the base model. The execution engine will download the file before executing the task.
For example, suppose we want to use an SDXL fine-tuned checkpoint from Civitai. The webpage of the model is https://civitai.com/models/169868/thinkdiffusionxl, and the download link of the model file can be copied from the download button on the webpage:
https://civitai.com/api/download/models/190908
We can use the model in the task as follows:
{
"base_model": "https://civitai.com/api/download/models/190908"
}
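Client code that builds tasks may want to know which of the two forms a model reference takes, for example to show a download notice. The sketch below is one hypothetical way to tell them apart with the Python standard library. How the execution engine itself distinguishes the two forms is not described here, so this check is an assumption.

```python
from urllib.parse import urlparse


def is_download_url(model_ref):
    """Return True if `model_ref` looks like an http(s) download URL.

    Hypothetical client-side check; anything else is treated as a model ID.
    """
    parsed = urlparse(model_ref)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_download_url("https://civitai.com/api/download/models/190908"))
print(is_download_url("runwayml/stable-diffusion-v1-5"))
```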
LoRA Model
LoRA models can be specified using the same format as the base model: the Huggingface model ID or the file download URL. The weight of the LoRA model can also be set in the arguments:
{
"lora": {
"model": "https://civitai.com/api/download/models/31284",
"weight": 80
}
}
The weight should be an integer between 1 and 100.
If the given LoRA model is not compatible with the base model (for example, a LoRA model fine-tuned on Stable Diffusion 1.5 used with a Stable Diffusion XL base model), the execution engine will throw an exception.
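The weight range can be enforced on the client before a task is sent. The following sketch uses a hypothetical helper, make_lora_section, which is not part of the framework. It checks only the documented 1 to 100 integer range; compatibility with the base model cannot be checked this way.

```python
def make_lora_section(model, weight):
    """Build the "lora" section, rejecting weights outside 1..100.

    Hypothetical helper; `model` is a Huggingface model ID or a download URL.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, int):
        raise TypeError("weight must be an integer")
    if not 1 <= weight <= 100:
        raise ValueError("weight must be between 1 and 100")
    return {"lora": {"model": model, "weight": weight}}


lora = make_lora_section("https://civitai.com/api/download/models/31284", 80)
```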
Controlnet
The Controlnet section has two parts: the Controlnet model, and the preprocess method.
The Controlnet model also supports the Huggingface ID and the download URL, which is exactly the same as the LoRA model.
The control image should be a PNG image encoded in the DataURL format. The DataURL string should be filled in the image_dataurl field.
{
"controlnet": {
"model": "lllyasviel/control_v11p_sd15_openpose",
"weight": 90,
"image_dataurl": "data:image/png;base64,..."
}
}
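The control image can be converted to a data URL with the Python standard library. This is a minimal sketch that assumes the standard data URL layout (data:image/png;base64, followed by the Base64 payload); the function name png_to_dataurl is hypothetical.

```python
import base64


def png_to_dataurl(png_bytes):
    """Encode raw PNG bytes as a data URL string.

    Assumes the standard layout: "data:image/png;base64,<payload>".
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# The 8-byte PNG signature stands in for a real image file here.
sample_bytes = b"\x89PNG\r\n\x1a\n"
dataurl = png_to_dataurl(sample_bytes)
```

For a real image, read the file in binary mode, for example with open(path, "rb"), and pass the bytes to the function.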
Image Preprocessing
The image preprocessing function is implemented using the controlnet_aux project. All the preprocessing methods and models in this project can be used:
{
"controlnet": {
"model": "lllyasviel/sd-controlnet-canny",
"weight": 90,
"image_dataurl": "data:image/png;base64,...",
"preprocess": {
"method": "canny",
"args": {
"high_threshold": 200,
"low_threshold": 100
}
}
}
}
Here is a list of all the available preprocess methods and their arguments (a dash means no arguments are listed):
Method | Arguments
canny | high_threshold, low_threshold
scribble_hed | -
scribble_hedsafe | -
softedge_hed | -
softedge_hedsafe | -
depth_midas | -
mlsd | thr_v, thr_d
openpose | -
openpose_face | -
openpose_faceonly | -
openpose_full | -
openpose_hand | -
dwpose | -
scribble_pidinet | apply_filter
softedge_pidinet | apply_filter
scribble_pidisafe | apply_filter
softedge_pidisafe | apply_filter
normal_bae | -
lineart_coarse | -
lineart_realistic | -
lineart_anime | -
depth_zoe | gamma_corrected
depth_leres | thr_a, thr_b
depth_leres++ | thr_a, thr_b
shuffle | h, w, f
mediapipe_face | max_faces, min_confidence
If preprocessing is not needed, just set the value of the preprocess section to null, or delete the section from the JSON.
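The list of methods and arguments above can be turned into a small lookup that catches misspelled methods or arguments before a task is sent. This is a sketch, not framework code: PREPROCESS_ARGS and make_preprocess are hypothetical names, and the table is copied from the list above rather than read from controlnet_aux.

```python
# Allowed arguments per preprocess method, copied from the list above.
PREPROCESS_ARGS = {
    "canny": {"high_threshold", "low_threshold"},
    "mlsd": {"thr_v", "thr_d"},
    "scribble_pidinet": {"apply_filter"},
    "softedge_pidinet": {"apply_filter"},
    "scribble_pidisafe": {"apply_filter"},
    "softedge_pidisafe": {"apply_filter"},
    "depth_zoe": {"gamma_corrected"},
    "depth_leres": {"thr_a", "thr_b"},
    "depth_leres++": {"thr_a", "thr_b"},
    "shuffle": {"h", "w", "f"},
    "mediapipe_face": {"max_faces", "min_confidence"},
}
# Methods that are listed without any arguments.
for _name in (
    "scribble_hed", "scribble_hedsafe", "softedge_hed", "softedge_hedsafe",
    "depth_midas", "openpose", "openpose_face", "openpose_faceonly",
    "openpose_full", "openpose_hand", "dwpose", "normal_bae",
    "lineart_coarse", "lineart_realistic", "lineart_anime",
):
    PREPROCESS_ARGS[_name] = set()


def make_preprocess(method, **args):
    """Build a "preprocess" section after checking method and argument names."""
    if method not in PREPROCESS_ARGS:
        raise ValueError(f"unknown preprocess method: {method}")
    unexpected = set(args) - PREPROCESS_ARGS[method]
    if unexpected:
        raise ValueError(f"unexpected arguments for {method}: {sorted(unexpected)}")
    section = {"method": method}
    if args:
        section["args"] = args
    return section


canny = make_preprocess("canny", high_threshold=200, low_threshold=100)
```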
Prompt
Unlike the basic SD models, the length of the prompt is not limited in this framework. The prompt and the negative prompt are specified separately:
{
"prompt": "a realistic portrait photo of a beautiful girl, blonde hair+++, smiling, facing the viewer",
"negative_prompt": "low resolution++, bad hands"
}
Prompt Weighting
Prompt weighting is supported using the Compel library. The basic idea is to append plus signs (+) to a word to give it more weight; the more plus signs, the higher the weight. More complex usages can be found in the documentation of the Compel library.
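To get a feel for how much emphasis the plus signs add, the sketch below computes the multiplier for a single term. It assumes the convention described in the Compel documentation, where each trailing + multiplies the weight by 1.1. The function plus_weight is a hypothetical illustration and does not call Compel itself.

```python
def plus_weight(term):
    """Return (text, weight) for a term with trailing plus signs.

    Assumes each trailing "+" multiplies the weight by 1.1.
    """
    text = term.rstrip("+")
    plus_count = len(term) - len(text)
    return text, 1.1 ** plus_count


text, weight = plus_weight("blonde hair+++")
print(text, round(weight, 3))
```

Under this assumption, three plus signs give a weight of about 1.33, and a term without plus signs keeps the default weight of 1.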
Textual Inversion
Textual Inversion models are also supported:
{
"textual_inversion": "sd-concepts-library/cat-toy"
}
VAE
The VAE model used in the Stable Diffusion pipeline can also be replaced with another one, specified by either a Huggingface model ID or a file download URL:
{
"vae": "stabilityai/sd-vae-ft-mse"
}
SDXL Refiner
If Stable Diffusion XL is selected as the base model in the task, the SDXL Refiner can also be used to further refine the image, as intended by the SDXL design:
{
"refiner": {
"model": "stabilityai/stable-diffusion-xl-refiner-1.0",
"denoising_cutoff": 80
}
}
The denoising_cutoff stops the base model's denoising process early, once the noise level reaches the cutoff value, and leaves the remaining denoising to the refiner model. This approach is called the ensemble of expert denoisers.
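As a rough illustration of the split, the sketch below estimates how many steps each model handles. It assumes that denoising_cutoff is a percentage of the denoising schedule handled by the base model, so that 80 hands the last 20% to the refiner. That reading, the rounding, and the function name split_steps are assumptions; the exact step boundaries used by the execution engine may differ.

```python
def split_steps(steps, denoising_cutoff):
    """Estimate (base_steps, refiner_steps) for a given cutoff.

    Assumes `denoising_cutoff` is the percentage of steps run by the base model.
    """
    if not 0 <= denoising_cutoff <= 100:
        raise ValueError("denoising_cutoff must be between 0 and 100")
    base_steps = steps * denoising_cutoff // 100  # integer math avoids float rounding
    return base_steps, steps - base_steps


base_steps, refiner_steps = split_steps(30, 80)
print(base_steps, refiner_steps)
```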
Task Config
There are also some config options that can be tuned:
{
"task_config": {
"image_width": 512, // The width of the generated image
"image_height": 512, // The height of the generated image
"steps": 30, // The number of denoising steps to run
"seed": 34736484, // The seed used to initialize the random processes
"num_images": 6, // The number of images to generate in a single task
"safety_checker": true, // Filter the unsafe images
"cfg": 5 // Classifier-Free Guidance, how close the images should be to the prompt given
}
}
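The comments in the listing above are for explanation only, since strict JSON does not allow comments. One way to keep such notes is to build the section in code and serialize it afterwards. The sketch below does this with hypothetical names (EXAMPLE_TASK_CONFIG, make_task_config); the values are the example values from above and are not documented defaults.

```python
import json

# Example values copied from the listing above; not documented defaults.
EXAMPLE_TASK_CONFIG = {
    "image_width": 512,      # width of the generated image
    "image_height": 512,     # height of the generated image
    "steps": 30,             # number of steps to run
    "seed": 34736484,        # seed for the random processes
    "num_images": 6,         # images generated in a single task
    "safety_checker": True,  # filter unsafe images
    "cfg": 5,                # classifier-free guidance scale
}


def make_task_config(**overrides):
    """Return a "task_config" section, rejecting option names not listed above."""
    unknown = set(overrides) - set(EXAMPLE_TASK_CONFIG)
    if unknown:
        raise KeyError(f"unknown task_config options: {sorted(unknown)}")
    return {"task_config": {**EXAMPLE_TASK_CONFIG, **overrides}}


config_json = json.dumps(make_task_config(steps=20, num_images=1))
```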