Question about Training #11

Open
XinYu-Andy opened this issue May 1, 2022 · 9 comments
Comments

@XinYu-Andy

Hi!
Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training.

@jychoi118
Owner

I used learning rate 2e-5 for all datasets. I assume that training at 256x256 resolution is unstable with learning rate 1e-4.
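
For anyone reproducing this, the learning rate can be overridden on the training command line; a minimal sketch, where the dataset path and the remaining flags are placeholders rather than values confirmed in this thread:

python scripts/image_train.py --data_dir path/to/ffhq --image_size 256 --num_channels 128 --num_res_blocks 1 --diffusion_steps 1000 --noise_schedule linear --lr 2e-5 --batch_size 8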

@XinYu-Andy
Author

> I used learning rate 2e-5 for all datasets. I assume that training at 256x256 resolution is unstable with learning rate 1e-4.

Thank you. Did you use any learning rate scheduler, or did you keep the learning rate constant during training?

@jychoi118
Owner

I used a constant learning rate. I did not try any learning rate scheduler.

@XinYu-Andy
Author

> I used a constant learning rate. I did not try any learning rate scheduler.

Thank you very much. BTW, can I ask how many training iterations are needed to produce "reasonable" results on FFHQ? (I understand that you trained for 1.2M iterations, as stated in the paper.) I have trained on this dataset for 400K iterations so far, but it only produces faces like these:
[attached: sample images from iterations 410k and 412k]

@jychoi118
Owner

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.
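
For illustration, one way to reweight the objective toward large t is to scale the per-timestep loss by a weight derived from the signal-to-noise ratio, in the spirit of the paper referenced above. The snippet below is only a rough sketch under my own assumptions; the exact weighting form, gamma, k, and all names are illustrative, not taken from this thread:

import numpy as np

def snr_based_weight(betas, gamma=1.0, k=1.0):
    # betas: the diffusion noise schedule, shape (T,)
    alphas_cumprod = np.cumprod(1.0 - betas)
    snr = alphas_cumprod / (1.0 - alphas_cumprod)  # large at small t, small at large (noisy) t
    return 1.0 / (k + snr) ** gamma                # relatively up-weights large t

# the per-sample training loss would then be weight[t] * mse(predicted_noise, true_noise)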

@shahdghorsi

> Hi! Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training.

Hi @jychoi118 @XinYu-Andy I trained a new model on my own dataset using image_train.py with the following hyperparameters:

python scripts/image_train.py --data_dir datasets/art --image_size 64 --num_channels 128 --num_res_blocks 1 --diffusion_steps 100 --noise_schedule linear --lr 1e-4 --batch_size 32

and the following command to run the sampler:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

But I am getting the following error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for UNetModel:

Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias",

One of the suggestions here is to increase the depth:
openai/guided-diffusion#7 (comment)
But I am not really sure I understood what is meant, or whether it will solve the problem.

Could you please help?

@jychoi118
Owner

This is probably because the hyperparameters you used for training and sampling are different. Try modifying your sampling command with --diffusion_steps 100 --resblock_updown False, and remove --num_head_channels. You can check the default hyperparameters here.
By the way, I recommend using --diffusion_steps 1000 for training.
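
As a side note, a quick way to confirm this kind of mismatch is to build the model with your sampling flags and diff its state_dict keys against the checkpoint's keys. A rough sketch, assuming the repo exposes the usual guided-diffusion helpers in guided_diffusion/script_util.py (the flag values are simply the ones from the training command above):

import torch
from guided_diffusion.script_util import create_model_and_diffusion, model_and_diffusion_defaults

# build the model with the flags used at training time
opts = model_and_diffusion_defaults()
opts.update(dict(image_size=64, num_channels=128, num_res_blocks=1, diffusion_steps=100))
model, _ = create_model_and_diffusion(**opts)

# compare its parameter keys against the checkpoint
ckpt = torch.load("models/ema_0.9999_540000.pt", map_location="cpu")
missing = set(model.state_dict()) - set(ckpt)
unexpected = set(ckpt) - set(model.state_dict())
print(len(missing), "missing keys;", len(unexpected), "unexpected keys")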

@shahdghorsi

shahdghorsi commented Aug 2, 2022

Thank you, that worked. I also found your command in one of the other issues, used it, and it worked as well. The hyperparameters were mismatched in my error.

Eventually, both of the following worked:

I used this command for training:
python scripts/image_train.py --data_dir datasets/art --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --num_head_channels 64 --resblock_updown True --use_fp16 False --use_scale_shift_norm True

And this for sampling:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output
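
One way to avoid this kind of train/sample mismatch in the first place is to keep the shared model flags in a single shell variable and reuse it in both commands, similar to the MODEL_FLAGS convention in the guided-diffusion README; a sketch assembled from the two commands above:

MODEL_FLAGS="--attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True"
python scripts/image_train.py --data_dir datasets/art $MODEL_FLAGS
python scripts/ilvr_sample.py $MODEL_FLAGS --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output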

@LinWeiJeff

> The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.
>
> At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.

@jychoi118 Thanks for your explanation of the number of images seen during training! However, I wonder whether the "iteration" in batch_size x iteration is the same as the "step" in the log output when running the training code (as shown in the attached picture).
[attached: screenshot of the training log output]
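
For what it's worth, here is a small worked example of the relationship quoted above, under the assumption (not confirmed in this thread) that the logged step counts optimizer iterations:

images_seen = batch_size x step
            = 8 x 1,000,000 = 8,000,000 images for the FFHQ setting mentioned earlier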
