Skip to content

Releases: huggingface/pytorch-image-models

Release v1.0.15

23 Feb 05:07
Compare
Choose a tag to compare

Feb 21, 2025

  • SigLIP 2 ViT image encoders added (https://huggingface.co/collections/timm/siglip-2-67b8e72ba08b09dd97aecaf9)
    • Variable resolution / aspect NaFlex versions are a WIP
  • Add 'SO150M2' ViT weights trained with SBB recipes, great results, better for ImageNet than previous attempt w/ less training.
    • vit_so150m2_patch16_reg1_gap_448.sbb_e200_in12k_ft_in1k - 88.1% top-1
    • vit_so150m2_patch16_reg1_gap_384.sbb_e200_in12k_ft_in1k - 87.9% top-1
    • vit_so150m2_patch16_reg1_gap_256.sbb_e200_in12k_ft_in1k - 87.3% top-1
    • vit_so150m2_patch16_reg4_gap_256.sbb_e200_in12k
  • Updated InternViT-300M '2.5' weights
  • Release 1.0.15

Feb 1, 2025

  • FYI PyTorch 2.6 & Python 3.13 are tested and working w/ current main and released version of timm

Jan 27, 2025

What's Changed

New Contributors

Full Changelog: v1.0.14...v1.0.15

Release v1.0.14

19 Jan 23:05
Compare
Choose a tag to compare

Jan 19, 2025

  • Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
  • Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
    • vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k - 86.7% top-1
    • vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k - 87.4% top-1
    • vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k
  • Misc typing, typo, etc. cleanup
  • 1.0.14 release to get above LeViT fix out

What's Changed

New Contributors

Full Changelog: v1.0.13...v1.0.14

Release v1.0.13

09 Jan 18:49
Compare
Choose a tag to compare

Jan 9, 2025

  • Add support to train and validate in pure bfloat16 or float16
  • wandb project name arg added by https://github.com/caojiaolong, use arg.experiment for name
  • Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
  • 1.0.13 release

Jan 6, 2025

  • Add torch.utils.checkpoint.checkpoint() wrapper in timm.models that defaults use_reentrant=False, unless TIMM_REENTRANT_CKPT=1 is set in env.

Dec 31, 2024

What's Changed

New Contributors

Full Changelog: v1.0.12...v1.0.13

Release v1.0.12

03 Dec 19:05
Compare
Choose a tag to compare

Nov 28, 2024

Nov 12, 2024

  • Optimizer factory refactor
    • New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
    • Add list_optimizers, get_optimizer_class, get_optimizer_info to reworked create_optimizer_v2 fn to explore optimizers, get info or class
    • deprecate optim.optim_factory, move fns to optim/_optim_factory.py and optim/_param_groups.py and encourage import via timm.optim
  • Add Adopt (https://github.com/iShohei220/adopt) optimizer
  • Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
  • Fix original Adafactor to pick better factorization dims for convolutions
  • Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
  • dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke

Oct 31, 2024

Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

Oct 19, 2024

  • Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from MengqingCao that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in Pytorch 2.5 too and it (mostly) worked.

What's Changed

New Contributors

Full Changelog: v1.0.11...v1.0.12

v1.0.11 Release

16 Oct 21:19
Compare
Choose a tag to compare

Quick turnaround from 1.0.10 to fix an error impacting 3rd party packages that still import through a deprecated path that isn't tested.

Oct 16, 2024

Oct 14, 2024

  • Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
  • Release 1.0.10

Oct 11, 2024

  • MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
model img_size top1 top5 param_count
mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k 384 87.506 98.428 101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k 288 86.912 98.236 101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k 224 86.632 98.156 101.66
mambaout_base_tall_rw.sw_e500_in1k 288 84.974 97.332 86.48
mambaout_base_wide_rw.sw_e500_in1k 288 84.962 97.208 94.45
mambaout_base_short_rw.sw_e500_in1k 288 84.832 97.27 88.83
mambaout_base.in1k 288 84.72 96.93 84.81
mambaout_small_rw.sw_e450_in1k 288 84.598 97.098 48.5
mambaout_small.in1k 288 84.5 96.974 48.49
mambaout_base_wide_rw.sw_e500_in1k 224 84.454 96.864 94.45
mambaout_base_tall_rw.sw_e500_in1k 224 84.434 96.958 86.48
mambaout_base_short_rw.sw_e500_in1k 224 84.362 96.952 88.83
mambaout_base.in1k 224 84.168 96.68 84.81
mambaout_small.in1k 224 84.086 96.63 48.49
mambaout_small_rw.sw_e450_in1k 224 84.024 96.752 48.5
mambaout_tiny.in1k 288 83.448 96.538 26.55
mambaout_tiny.in1k 224 82.736 96.1 26.55
mambaout_kobe.in1k 288 81.054 95.718 9.14
mambaout_kobe.in1k 224 79.986 94.986 9.14
mambaout_femto.in1k 288 79.848 95.14 7.3
mambaout_femto.in1k 224 78.87 94.408 7.3

Sept 2024

Release v1.0.10

15 Oct 04:44
Compare
Choose a tag to compare

Oct 14, 2024

  • Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
  • Release 1.0.10

Oct 11, 2024

  • MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.
model img_size top1 top5 param_count
mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k 384 87.506 98.428 101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k 288 86.912 98.236 101.66
mambaout_base_plus_rw.sw_e150_in12k_ft_in1k 224 86.632 98.156 101.66
mambaout_base_tall_rw.sw_e500_in1k 288 84.974 97.332 86.48
mambaout_base_wide_rw.sw_e500_in1k 288 84.962 97.208 94.45
mambaout_base_short_rw.sw_e500_in1k 288 84.832 97.27 88.83
mambaout_base.in1k 288 84.72 96.93 84.81
mambaout_small_rw.sw_e450_in1k 288 84.598 97.098 48.5
mambaout_small.in1k 288 84.5 96.974 48.49
mambaout_base_wide_rw.sw_e500_in1k 224 84.454 96.864 94.45
mambaout_base_tall_rw.sw_e500_in1k 224 84.434 96.958 86.48
mambaout_base_short_rw.sw_e500_in1k 224 84.362 96.952 88.83
mambaout_base.in1k 224 84.168 96.68 84.81
mambaout_small.in1k 224 84.086 96.63 48.49
mambaout_small_rw.sw_e450_in1k 224 84.024 96.752 48.5
mambaout_tiny.in1k 288 83.448 96.538 26.55
mambaout_tiny.in1k 224 82.736 96.1 26.55
mambaout_kobe.in1k 288 81.054 95.718 9.14
mambaout_kobe.in1k 224 79.986 94.986 9.14
mambaout_femto.in1k 288 79.848 95.14 7.3
mambaout_femto.in1k 224 78.87 94.408 7.3

Sept 2024

Release v1.0.9

23 Aug 23:42
Compare
Choose a tag to compare

Aug 21, 2024

  • Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models
model top1 top5 param_count img_size
vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k 87.438 98.256 64.11 384
vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k 86.608 97.934 64.11 256
vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k 86.594 98.02 60.4 384
vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k 85.734 97.61 60.4 256
  • MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe
model top1 top5 param_count img_size
resnet50d.ra4_e3600_r224_in1k 81.838 95.922 25.58 288
efficientnet_b1.ra4_e3600_r240_in1k 81.440 95.700 7.79 288
resnet50d.ra4_e3600_r224_in1k 80.952 95.384 25.58 224
efficientnet_b1.ra4_e3600_r240_in1k 80.406 95.152 7.79 240
mobilenetv1_125.ra4_e3600_r224_in1k 77.600 93.804 6.27 256
mobilenetv1_125.ra4_e3600_r224_in1k 76.924 93.234 6.27 224
  • Add SAM2 (HieraDet) backbone arch & weight loading support

  • Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k

model top1 top5 param_count
hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k 84.912 97.260 35.01
hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k 84.560 97.106 35.01

Aug 8, 2024

Release v1.0.8

29 Jul 05:18
Compare
Choose a tag to compare

July 28, 2024

  • Add mobilenet_edgetpu_v2_m weights w/ ra4 mnv4-small based recipe. 80.1% top-1 @ 224 and 80.7 @ 256.
  • Release 1.0.8

July 26, 2024

  • More MobileNet-v4 weights, ImageNet-12k pretrain w/ fine-tunes, and anti-aliased ConvLarge models
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.99 15.01 97.294 2.706 32.59 544
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.772 15.228 97.344 2.656 32.59 480
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.64 15.36 97.114 2.886 32.59 448
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.314 15.686 97.102 2.898 32.59 384
mobilenetv4_conv_aa_large.e600_r384_in1k 83.824 16.176 96.734 3.266 32.59 480
mobilenetv4_conv_aa_large.e600_r384_in1k 83.244 16.756 96.392 3.608 32.59 384
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.99 17.01 96.67 3.33 11.07 320
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.364 17.636 96.256 3.744 11.07 256
model top1 top1_err top5 top5_err param_count img_size
efficientnet_b0.ra4_e3600_r224_in1k 79.364 20.636 94.754 5.246 5.29 256
efficientnet_b0.ra4_e3600_r224_in1k 78.584 21.416 94.338 5.662 5.29 224
mobilenetv1_100h.ra4_e3600_r224_in1k 76.596 23.404 93.272 6.728 5.28 256
mobilenetv1_100.ra4_e3600_r224_in1k 76.094 23.906 93.004 6.996 4.23 256
mobilenetv1_100h.ra4_e3600_r224_in1k 75.662 24.338 92.504 7.496 5.28 224
mobilenetv1_100.ra4_e3600_r224_in1k 75.382 24.618 92.312 7.688 4.23 224
  • Prototype of set_input_size() added to vit and swin v1/v2 models to allow changing image size, patch size, window size after model creation.
  • Improved support in swin for different size handling, in addition to set_input_size, always_partition and strict_img_size args have been added to __init__ to allow more flexible input size constraints
  • Fix out of order indices info for intermediate 'Getter' feature wrapper, check out or range indices for same.
  • Add several tiny < .5M param models for testing that are actually trained on ImageNet-1k
model top1 top1_err top5 top5_err param_count img_size crop_pct
test_efficientnet.r160_in1k 47.156 52.844 71.726 28.274 0.36 192 1.0
test_byobnet.r160_in1k 46.698 53.302 71.674 28.326 0.46 192 1.0
test_efficientnet.r160_in1k 46.426 53.574 70.928 29.072 0.36 160 0.875
test_byobnet.r160_in1k 45.378 54.622 70.572 29.428 0.46 160 0.875
test_vit.r160_in1k 42.0 58.0 68.664 31.336 0.37 192 1.0
test_vit.r160_in1k 40.822 59.178 67.212 32.788 0.37 160 0.875
  • Fix vit reg token init, thanks Promisery
  • Other misc fixes

June 24, 2024

  • 3 more MobileNetV4 hyrid weights with different MQA weight init scheme
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_hybrid_large.ix_e600_r384_in1k 84.356 15.644 96.892 3.108 37.76 448
mobilenetv4_hybrid_large.ix_e600_r384_in1k 83.990 16.010 96.702 3.298 37.76 384
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 83.394 16.606 96.760 3.240 11.07 448
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 82.968 17.032 96.474 3.526 11.07 384
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 82.492 17.508 96.278 3.722 11.07 320
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 81.446 18.554 95.704 4.296 11.07 256
  • florence2 weight loading in DaViT model

Release v1.0.7

19 Jun 06:52
Compare
Choose a tag to compare

June 12, 2024

  • MobileNetV4 models and initial set of timm trained weights added:
model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_hybrid_large.e600_r384_in1k 84.266 15.734 96.936 3.064 37.76 448
mobilenetv4_hybrid_large.e600_r384_in1k 83.800 16.200 96.770 3.230 37.76 384
mobilenetv4_conv_large.e600_r384_in1k 83.392 16.608 96.622 3.378 32.59 448
mobilenetv4_conv_large.e600_r384_in1k 82.952 17.048 96.266 3.734 32.59 384
mobilenetv4_conv_large.e500_r256_in1k 82.674 17.326 96.31 3.69 32.59 320
mobilenetv4_conv_large.e500_r256_in1k 81.862 18.138 95.69 4.31 32.59 256
mobilenetv4_hybrid_medium.e500_r224_in1k 81.276 18.724 95.742 4.258 11.07 256
mobilenetv4_conv_medium.e500_r256_in1k 80.858 19.142 95.768 4.232 9.72 320
mobilenetv4_hybrid_medium.e500_r224_in1k 80.442 19.558 95.38 4.62 11.07 224
mobilenetv4_conv_blur_medium.e500_r224_in1k 80.142 19.858 95.298 4.702 9.72 256
mobilenetv4_conv_medium.e500_r256_in1k 79.928 20.072 95.184 4.816 9.72 256
mobilenetv4_conv_medium.e500_r224_in1k 79.808 20.192 95.186 4.814 9.72 256
mobilenetv4_conv_blur_medium.e500_r224_in1k 79.438 20.562 94.932 5.068 9.72 224
mobilenetv4_conv_medium.e500_r224_in1k 79.094 20.906 94.77 5.23 9.72 224
mobilenetv4_conv_small.e2400_r224_in1k 74.616 25.384 92.072 7.928 3.77 256
mobilenetv4_conv_small.e1200_r224_in1k 74.292 25.708 92.116 7.884 3.77 256
mobilenetv4_conv_small.e2400_r224_in1k 73.756 26.244 91.422 8.578 3.77 224
mobilenetv4_conv_small.e1200_r224_in1k 73.454 26.546 91.34 8.66 3.77 224
  • Apple MobileCLIP (https://arxiv.org/pdf/2311.17049, FastViT and ViT-B) image tower model support & weights added (part of OpenCLIP support).
  • ViTamin (https://arxiv.org/abs/2404.02132) CLIP image tower model & weights added (part of OpenCLIP support).
  • OpenAI CLIP Modified ResNet image tower modelling & weight support (via ByobNet). Refactor AttentionPool2d.
  • Refactoring & improvements, especially related to classifier_reset and num_features vs head_hidden_size for forward_features() vs pre_logits

Release v1.0.3

15 May 18:19
Compare
Choose a tag to compare

May 14, 2024

  • Support loading PaliGemma jax weights into SigLIP ViT models with average pooling.
  • Add Hiera models from Meta (https://github.com/facebookresearch/hiera).
  • Add normalize= flag for transorms, return non-normalized torch.Tensor with original dytpe (for chug)
  • Version 1.0.3 release

May 11, 2024

  • Searching for Better ViT Baselines (For the GPU Poor) weights and vit variants released. Exploring model shapes between Tiny and Base.
model top1 top5 param_count img_size
vit_mediumd_patch16_reg4_gap_256.sbb_in12k_ft_in1k 86.202 97.874 64.11 256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k_ft_in1k 85.418 97.48 60.4 256
vit_mediumd_patch16_rope_reg1_gap_256.sbb_in1k 84.322 96.812 63.95 256
vit_betwixt_patch16_rope_reg4_gap_256.sbb_in1k 83.906 96.684 60.23 256
vit_base_patch16_rope_reg1_gap_256.sbb_in1k 83.866 96.67 86.43 256
vit_medium_patch16_rope_reg1_gap_256.sbb_in1k 83.81 96.824 38.74 256
vit_betwixt_patch16_reg4_gap_256.sbb_in1k 83.706 96.616 60.4 256
vit_betwixt_patch16_reg1_gap_256.sbb_in1k 83.628 96.544 60.4 256
vit_medium_patch16_reg4_gap_256.sbb_in1k 83.47 96.622 38.88 256
vit_medium_patch16_reg1_gap_256.sbb_in1k 83.462 96.548 38.88 256
vit_little_patch16_reg4_gap_256.sbb_in1k 82.514 96.262 22.52 256
vit_wee_patch16_reg1_gap_256.sbb_in1k 80.256 95.360 13.42 256
vit_pwee_patch16_reg1_gap_256.sbb_in1k 80.072 95.136 15.25 256
vit_mediumd_patch16_reg4_gap_256.sbb_in12k N/A N/A 64.11 256
vit_betwixt_patch16_reg4_gap_256.sbb_in12k N/A N/A 60.4 256
  • AttentionExtract helper added to extract attention maps from timm models. See example in #1232 (comment)
  • forward_intermediates() API refined and added to more models including some ConvNets that have other extraction methods.
  • 1017 of 1047 model architectures support features_only=True feature extraction. Remaining 34 architectures can be supported but based on priority requests.
  • Remove torch.jit.script annotated functions including old JIT activations. Conflict with dynamo and dynamo does a much better job when used.

April 11, 2024

  • Prepping for a long overdue 1.0 release, things have been stable for a while now.
  • Significant feature that's been missing for a while, features_only=True support for ViT models with flat hidden states or non-std module layouts (so far covering 'vit_*', 'twins_*', 'deit*', 'beit*', 'mvitv2*', 'eva*', 'samvit_*', 'flexivit*')
  • Above feature support achieved through a new forward_intermediates() API that can be used with a feature wrapping module or direclty.
model = timm.create_model('vit_base_patch16_224')
final_feat, intermediates = model.forward_intermediates(input) 
output = model.forward_head(final_feat)  # pooling + classifier head

print(final_feat.shape)
torch.Size([2, 197, 768])

for f in intermediates:
    print(f.shape)
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])
torch.Size([2, 768, 14, 14])

print(output.shape)
torch.Size([2, 1000])
model = timm.create_model('eva02_base_patch16_clip_224', pretrained=True, img_size=512, features_only=True, out_indices=(-3, -2,))
output = model(torch.randn(2, 3, 512, 512))

for o in output:    
    print(o.shape)   
torch.Size([2, 768, 32, 32])
torch.Size([2, 768, 32, 32])
  • TinyCLIP vision tower weights added, thx Thien Tran
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy