Question about Training #11
Comments
I used a learning rate of 2e-5 for all datasets. I assume that training at 256x256 resolution is unstable with a learning rate of 1e-4.
Thank you. Did you use a learning rate scheduler, or did you keep the learning rate constant during training?
I used a constant learning rate. I did not try any learning rate scheduler.
The training progress depends on the number of images seen during training, which is At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.
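To make the "images seen" figure concrete, here is a minimal sketch. It assumes that images seen equals iterations times batch size; the exact expression did not survive in the comment above, so treat that formula as an assumption. The helper names are hypothetical and not part of the repo.

```python
# Assumption: images seen = training iterations * batch size.
# The helper names below are illustrative only; they do not exist in the repo.

def images_seen(iterations: int, batch_size: int) -> int:
    """Total number of training images processed so far."""
    return iterations * batch_size


def iterations_for(target_images: int, batch_size: int) -> int:
    """Iterations needed to process at least `target_images` (ceiling division)."""
    return -(-target_images // batch_size)


# Figures quoted in the thread: 1M iterations at batch size 8.
ilvr_budget = images_seen(1_000_000, 8)
print(ilvr_budget)

# The same image budget at batch size 32 (the value used in the training command in this thread).
print(iterations_for(ilvr_budget, 32))
```

Under this assumption, a run with a larger batch size needs proportionally fewer iterations to see the same number of images, so iteration counts from different runs are only comparable after multiplying by batch size.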
Hi @jychoi118 @XinYu-Andy, I trained a new model on my own dataset using image_train.py with the following hyperparameters:
python scripts/image_train.py --data_dir datasets/art --image_size 64 --num_channels 128 --num_res_blocks 1 --diffusion_steps 100 --noise_schedule linear --lr 1e-4 --batch_size 32
and the following command to run the sampler:
python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output
But I am getting the following error:
RuntimeError: Error(s) in loading state_dict for UNetModel:
One of the suggestions here is to increase the depth: Could you please help?
This may be because the hyperparameters you used for training and sampling are different. Try modifying your sampling command by:
Thank you, that worked. I also found your command in one of the other issues, and it worked when I used it. The hyperparameters were mismatched in my case. Eventually both of these worked. I used this command for training:
And this for sampling:
python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output
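The kind of mismatch described above can be spotted before sampling by diffing the flags of the two commands. The sketch below is a rough helper, not part of the repo: the function names are hypothetical, and the set of flags treated as architecture-defining is an assumption based on the flags that appear in this thread. A flag that is absent from one command is reported as `<default>`, since the script's default may or may not match the other command's value.

```python
import shlex

# Assumption: these flags shape the UNet and therefore its state_dict.
# The list is taken from the flags seen in this thread, not from the repo's source.
ARCH_FLAGS = {
    "image_size", "num_channels", "num_res_blocks", "num_head_channels",
    "attention_resolutions", "learn_sigma", "resblock_updown",
    "use_scale_shift_norm", "class_cond",
}

# Commands copied from the first question in this thread.
TRAIN_CMD = (
    "python scripts/image_train.py --data_dir datasets/art --image_size 64 "
    "--num_channels 128 --num_res_blocks 1 --diffusion_steps 100 "
    "--noise_schedule linear --lr 1e-4 --batch_size 32"
)
SAMPLE_CMD = (
    "python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False "
    "--diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True "
    "--noise_schedule linear --num_channels 128 --num_head_channels 64 "
    "--num_res_blocks 1 --resblock_updown True --use_fp16 False "
    "--use_scale_shift_norm True --timestep_respacing 100 "
    "--model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war "
    "--down_N 32 --range_t 20 --save_dir output"
)


def parse_flags(command: str) -> dict:
    """Collect `--name value` pairs from a shell command string."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        if tok.startswith("--") and has_value:
            flags[tok[2:]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return flags


def arch_mismatches(train_cmd: str, sample_cmd: str) -> dict:
    """Map each differing architecture flag to (training value, sampling value)."""
    train, sample = parse_flags(train_cmd), parse_flags(sample_cmd)
    diffs = {}
    for name in sorted(ARCH_FLAGS):
        t = train.get(name, "<default>")
        s = sample.get(name, "<default>")
        if t != s:
            diffs[name] = (t, s)
    return diffs


for name, (t, s) in arch_mismatches(TRAIN_CMD, SAMPLE_CMD).items():
    print(f"--{name}: training={t} sampling={s}")
```

Any flag reported here is a candidate cause of the `Error(s) in loading state_dict` message; passing the same architecture flags to both scripts removes that ambiguity.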
@jychoi118 Thanks for your explanation of the number of images seen during training! However, I wonder if the
Hi!
Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch; however, the FID seems unstable during training when I use the default settings of guided diffusion. I suspect that this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decreased the learning rate during your training process.