Faster downloads/uploads with Xet storage · Issue #7526 · huggingface/datasets
https://github.com/huggingface/datasets/issues/7526

Open
@lhoestq

Xet is out!

Over the past few weeks, Hugging Face’s Xet Team took a major step forward by migrating the first Model and Dataset repositories off LFS and onto Xet storage.

See more information on the HF blog: https://huggingface.co/blog/xet-on-the-hub

You can already enable Xet on your Hugging Face account to benefit from faster downloads and uploads :)

We finalized an official integration with the huggingface_hub library, which means you get the benefits of Xet without any significant changes to your current workflow.

Previous versions of datasets

For older versions of datasets you might see this warning in push_to_hub():

Uploading files as bytes or binary IO objects is not supported by Xet Storage.

This means the huggingface_hub + Xet integration isn't enabled for your version of datasets.

You can fix this by updating to datasets>=3.6.0 and huggingface_hub>=0.31.0:

pip install -U datasets huggingface_hub
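
If you are not sure whether an environment already meets those minimums, a quick check along these lines may help. This is only a sketch: the helper names (`release_tuple`, `meets_minimum`, `outdated_packages`) are made up for illustration and are not part of datasets or huggingface_hub. The comparison only looks at the leading numeric part of each version, so a pre-release such as `0.31.0rc1` is treated like `0.31.0`.

```python
import re
from importlib import metadata

# Minimum versions quoted in this issue.
MINIMUMS = {"datasets": "3.6.0", "huggingface_hub": "0.31.0"}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def outdated_packages(minimums=MINIMUMS) -> dict:
    """Map each missing or too-old package to its installed version (None if missing)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        if not meets_minimum(installed, minimum):
            report[name] = installed
    return report


if __name__ == "__main__":
    problems = outdated_packages()
    if problems:
        for name, installed in problems.items():
            print(f"{name}: found {installed}, need >= {MINIMUMS[name]}")
    else:
        print("datasets and huggingface_hub meet the minimum versions")
```

Anything reported here should go away after running the pip command above.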

The future

Stay tuned for more Xet optimizations, especially on Xet-optimized Parquet.
