Question about Training #11

Open
XinYu-Andy opened this issue May 1, 2022 · 9 comments
Comments

@XinYu-Andy

Hi!
Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training.

@jychoi118
Owner

I used learning rate 2e-5 for all datasets. I assume that training at 256x256 resolution is unstable with learning rate 1e-4.
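
For anyone reproducing this, the learning rate can be overridden on the training command line; a minimal sketch, where the dataset path and the remaining flags are placeholders rather than values confirmed in this thread:

python scripts/image_train.py --data_dir path/to/ffhq --image_size 256 --num_channels 128 --num_res_blocks 1 --diffusion_steps 1000 --noise_schedule linear --lr 2e-5 --batch_size 8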

@XinYu-Andy
Author

> I used learning rate 2e-5 for all datasets. I assume that training at 256x256 resolution is unstable with learning rate 1e-4.

Thank you. Did you use any learning rate scheduler, or did you keep the learning rate constant during training?

@jychoi118
Owner

I used a constant learning rate. I did not try any learning rate scheduler.

@XinYu-Andy
Author

> I used a constant learning rate. I did not try any learning rate scheduler.

Thank you very much. BTW, can I ask how many training iterations are needed to produce "reasonable" results on FFHQ? (I understand that you trained for 1.2M iterations, as stated in the paper.) I have trained on this dataset for 400K iterations so far, but it only produces faces like these:
[attached: sample images from iterations 410k and 412k]

@jychoi118
Owner

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.
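
For illustration, one way to reweight the objective toward large t is to scale the per-timestep loss by a weight derived from the signal-to-noise ratio, in the spirit of the paper referenced above. The snippet below is only a rough sketch under my own assumptions; the exact weighting form, gamma, k, and all names are illustrative, not taken from this thread:

import numpy as np

def snr_based_weight(betas, gamma=1.0, k=1.0):
    # betas: the diffusion noise schedule, shape (T,)
    alphas_cumprod = np.cumprod(1.0 - betas)
    snr = alphas_cumprod / (1.0 - alphas_cumprod)  # large at small t, small at large (noisy) t
    return 1.0 / (k + snr) ** gamma                # relatively up-weights large t

# the per-sample training loss would then be weight[t] * mse(predicted_noise, true_noise)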

@shahdghorsi

> Hi! Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training.

Hi @jychoi118 @XinYu-Andy I trained a new model on my own dataset using image_train.py with the following hyperparameters:

python scripts/image_train.py --data_dir datasets/art --image_size 64 --num_channels 128 --num_res_blocks 1 --diffusion_steps 100 --noise_schedule linear --lr 1e-4 --batch_size 32

and the following command to run the sampler:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

But I am getting the following error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for UNetModel:

Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias",

One of the suggestions here is to increase the depth:
openai/guided-diffusion#7 (comment)
But I am not really sure I understood what is meant, or whether it will solve the problem.

Could you please help?

@jychoi118
Owner

This is probably because the hyperparameters you used for training and sampling are different. Try modifying your sampling command with --diffusion_steps 100 --resblock_updown False, and remove --num_head_channels. You can check the default hyperparameters here.
By the way, I recommend using --diffusion_steps 1000 for training.
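
As a side note, a quick way to confirm this kind of mismatch is to build the model with your sampling flags and diff its state_dict keys against the checkpoint's keys. A rough sketch, assuming the repo exposes the usual guided-diffusion helpers in guided_diffusion/script_util.py (the flag values are simply the ones from the training command above):

import torch
from guided_diffusion.script_util import create_model_and_diffusion, model_and_diffusion_defaults

# build the model with the flags used at training time
opts = model_and_diffusion_defaults()
opts.update(dict(image_size=64, num_channels=128, num_res_blocks=1, diffusion_steps=100))
model, _ = create_model_and_diffusion(**opts)

# compare its parameter keys against the checkpoint
ckpt = torch.load("models/ema_0.9999_540000.pt", map_location="cpu")
missing = set(model.state_dict()) - set(ckpt)
unexpected = set(ckpt) - set(model.state_dict())
print(len(missing), "missing keys;", len(unexpected), "unexpected keys")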

@shahdghorsi

shahdghorsi commented Aug 2, 2022

Thank you, that worked. I also found your command in one of the other issues, used it, and it worked as well. The hyperparameters were mismatched in my error.

Eventually, both of the following worked:

I used this command for training:
python scripts/image_train.py --data_dir datasets/art --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --num_head_channels 64 --resblock_updown True --use_fp16 False --use_scale_shift_norm True

And this for sampling:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output
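
One way to avoid this kind of train/sample mismatch in the first place is to keep the shared model flags in a single shell variable and reuse it in both commands, similar to the MODEL_FLAGS convention in the guided-diffusion README; a sketch assembled from the two commands above:

MODEL_FLAGS="--attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True"
python scripts/image_train.py --data_dir datasets/art $MODEL_FLAGS
python scripts/ilvr_sample.py $MODEL_FLAGS --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output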

@LinWeiJeff

> The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.
>
> At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.

@jychoi118 Thanks for your explanation of the number of images seen during training! However, I wonder whether the "iteration" in batch_size x iteration is the same as the "step" in the log output when running the training code (as shown in the attached picture).
[attached: screenshot of the training log output]
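
For what it's worth, here is a small worked example of the relationship quoted above, under the assumption (not confirmed in this thread) that the logged step counts optimizer iterations:

images_seen = batch_size x step
            = 8 x 1,000,000 = 8,000,000 images for the FFHQ setting mentioned earlier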
