Quantization failed · Issue #1972 · intel/neural-compressor

Open
endomorphosis opened this issue Aug 11, 2024 · 1 comment

@endomorphosis

https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only

bash run_quant.sh --input_model=./Meta-Llama-3.1-8B --output_model=./Meta-Llama-3.1-8B_AWQ --batch_size=1 --dataset=NeelNanda/pile-10k --tokenizer=meta-llama/Meta-Llama-3.1-8B --algorithm=AWQ

/usr/local/lib/python3.10/dist-packages/diffusers/models/vq_model.py:20: FutureWarning: `VQEncoderOutput` is deprecated and will be removed in version 0.31. Importing `VQEncoderOutput` from `diffusers.models.vq_model` is deprecated and this will be removed in a future version. Please use `from diffusers.models.autoencoders.vq_model import VQEncoderOutput`, instead.
 deprecate("VQEncoderOutput", "0.31", deprecation_message)
/usr/local/lib/python3.10/dist-packages/diffusers/models/vq_model.py:25: FutureWarning: `VQModel` is deprecated and will be removed in version 0.31. Importing `VQModel` from `diffusers.models.vq_model` is deprecated and this will be removed in a future version. Please use `from diffusers.models.autoencoders.vq_model import VQModel`, instead.
 deprecate("VQModel", "0.31", deprecation_message)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'LlamaTokenizer'.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message
Traceback (most recent call last):
 File "/root/neural-compressor/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only/main.py", line 118, in <module>
   tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer)
 File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
   return cls._from_pretrained(
 File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
   tokenizer = cls(*init_inputs, **init_kwargs)
 File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 171, in __init__
   self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
 File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
   tokenizer.Load(self.vocab_file)
 File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
   return self.LoadFromFile(model_file)
 File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
   return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
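
One possible reading of the traceback above, not confirmed in the thread: `LlamaTokenizer` is the slow, SentencePiece-based class, and the last frames show `self.vocab_file` being handed to SentencePiece's `Load`, which rejects a value that is not a string. The earlier warning reports the checkpoint's tokenizer class as `PreTrainedTokenizerFast`, so the checkpoint may not provide the SentencePiece `tokenizer.model` file that `LlamaTokenizer` expects. The sketch below checks a local checkpoint directory for both signs. The helper names `checkpoint_tokenizer_class` and `has_sentencepiece_vocab` are hypothetical and are not part of the example script or of transformers.

```python
import json
from pathlib import Path
from typing import Optional


def checkpoint_tokenizer_class(model_dir: str) -> Optional[str]:
    """Return the `tokenizer_class` recorded in tokenizer_config.json, or None.

    Hypothetical helper: it only reads the JSON file that Hugging Face
    checkpoints usually ship next to the weights.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("tokenizer_class")


def has_sentencepiece_vocab(model_dir: str) -> bool:
    """Return True if the directory contains a `tokenizer.model` file.

    The slow LlamaTokenizer loads its vocabulary from a file with this name.
    """
    return (Path(model_dir) / "tokenizer.model").is_file()


if __name__ == "__main__":
    # Assumption: the checkpoint was downloaded to the path used in the command above.
    model_dir = "./Meta-Llama-3.1-8B"
    print("tokenizer_class:", checkpoint_tokenizer_class(model_dir))
    print("has tokenizer.model:", has_sentencepiece_vocab(model_dir))
```

If the first value is `PreTrainedTokenizerFast` and the second is `False`, loading the tokenizer with `AutoTokenizer.from_pretrained(args.tokenizer)` in place of `LlamaTokenizer.from_pretrained(args.tokenizer)` may avoid the error. That substitution is a suggestion here, not a fix reported by the maintainers.
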
@mengniwang95
Contributor

Hi @endomorphosis, which version of transformers are you using? Could you please try the latest version of transformers, 4.44.0?
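
To answer the version question, a minimal sketch like the following can report the installed transformers version and compare it with 4.44.0. It uses only the standard library. The helper names (`parse_version`, `installed_version`, `is_at_least`) are hypothetical, and the comparison is deliberately simple: it looks only at the leading numeric components and treats pre-release tags loosely.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.44.0' into (4, 44, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(installed: str, required: str) -> bool:
    """Compare two dotted version strings by their numeric components."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    current = installed_version("transformers")
    if current is None:
        print("transformers is not installed")
    elif is_at_least(current, "4.44.0"):
        print(f"transformers {current} is at least 4.44.0")
    else:
        print(f"transformers {current} is older than 4.44.0")
```

If the installed version turns out to be older, `pip install --upgrade transformers` is the usual way to move to a newer release.
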
