[XPU User Empathy Day][whisper][Arc770][Win]XPU performance is worse than CPU

### 🐛 Describe the bug

I tried [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny/tree/main)
on a desktop machine with an Arc 770 running Windows 11
and found that XPU performance is worse than CPU, which is not expected.

Steps to reproduce:
1. Install Windows 11 on the target machine.
2. Install the XPU driver: https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html (the version used is 32.0.101.6734, WHQL Certified, 4/8/2025).
   After installing and restarting, check in Task Manager that the XPU is working.
3. Install Python from https://www.python.org/downloads/release/python-31210/ (Python 3.12.10).
4. Set up the PyTorch environment:
Open CMD and run C:\Users\gta\AppData\Local\Programs\Python.exe
  python -m venv venv_py27_xpu
venv_py27_xpu\Scripts\activate
(venv_py27_xpu) C:\Users\gta>pip3 install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu

python -c "import torch; print(torch.xpu.is_available())"

5. Install the Whisper dependencies:
pip install transformers
pip install datasets
pip install librosa

6. Run `python test_whisper` with the script below.
The XPU run is slower than the CPU run.
```
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

## For a CPU run, uncomment the next line and comment out the line after it
#device = "cpu"
device = torch.device("xpu" if torch.xpu.is_available() else "cpu")
print(f"Using device: {device}")

import time
start_time = time.time()
print(f"Starting voice conversion at {time.strftime('%Y-%m-%d %H:%M:%S')}")

from torch.profiler import profile, ProfilerActivity

# load model and processor
print("\nLoading Whisper Model...")
whisper_start_time = time.time()

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny").to(device)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")

whisper_end_time = time.time()
print(f"Loading Whisper Model time taken: {whisper_end_time - whisper_start_time:.2f} seconds") 

# load dummy dataset and read audio files
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[2]["audio"]
input_features = processor(
    sample["array"],
    sampling_rate=sample["sampling_rate"],
    return_tensors="pt",
    return_attention_mask=True,  # Critical for reliable results
    padding=True,  # Required if batching multiple audios
).input_features.to(device)

# generate token ids
print("\nRunning Whisper Model...")
whisper_start_time = time.time()
#with profile(activities=[ProfilerActivity.CPU,
#                       ProfilerActivity.XPU]) as prof:
predicted_ids = model.generate(input_features,forced_decoder_ids=forced_decoder_ids)
#print(prof.key_averages().table(sort_by="xpu_time_total"))

# decode token ids to text
#transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
whisper_end_time = time.time()
print(f"Running Whisper Model time taken: {whisper_end_time - whisper_start_time:.2f} seconds")

print(transcription) 

end_time = time.time()
print(f"Voice conversion completed at {time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Total time taken: {end_time - start_time:.2f} seconds") 
```
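
The script above times a single `generate` call. On an accelerator, a first call often includes one-time costs (kernel compilation, memory allocation), and device work may be queued asynchronously, so a single wall-clock reading can overstate steady-state time. The sketch below is one possible way to separate warm-up from measured runs. `time_call` and its parameters are hypothetical names introduced here for illustration and are not part of the original report.

```python
import statistics
import time


def time_call(fn, warmup=2, iters=5, sync=None):
    """Return per-iteration wall times (seconds) for fn(), after warm-up runs.

    sync is an optional zero-argument callable that blocks until queued
    device work has finished; it is called before each clock read.
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        fn()
    _sync()

    times = []
    for _ in range(iters):
        _sync()
        start = time.perf_counter()
        fn()
        _sync()
        times.append(time.perf_counter() - start)
    return times


# Stand-in CPU workload so the sketch runs without a model or an XPU.
samples = time_call(lambda: sum(range(100_000)))
print(f"median {statistics.median(samples):.4f}s over {len(samples)} runs")
```

For the Whisper script, the same helper could presumably be called with `fn=lambda: model.generate(input_features, forced_decoder_ids=forced_decoder_ids)` and `sync=torch.xpu.synchronize` on the XPU run (leave `sync=None` on CPU). Whether warm-up explains the gap reported here is not established by this issue.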

openai/whisper-tiny | CPU (s) | XPU (s) | Torch.compile (s) | 12 s mp3 input
-- | -- | -- | -- | --
Load | 1.57 | 2.34 | 2.67 | hf-internal-testing/librispeech_asr_dummy (Hugging Face dataset)
Run Generate | 0.57 | 3.58 | 3.58 |  
Total | 7.33 | 10.63 | 11.12 |  
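
To make the size of the gap explicit, the arithmetic on the table can be sketched as follows. The numbers are copied from the table; the helper names `slowdown` and `unaccounted` are introduced here for illustration only.

```python
# Timings in seconds, copied from the table above.
timings = {
    "Load": {"cpu": 1.57, "xpu": 2.34, "compile": 2.67},
    "Run Generate": {"cpu": 0.57, "xpu": 3.58, "compile": 3.58},
    "Total": {"cpu": 7.33, "xpu": 10.63, "compile": 11.12},
}


def slowdown(row, column="xpu"):
    """Ratio of the given column's time to the CPU time for one table row."""
    return timings[row][column] / timings[row]["cpu"]


def unaccounted(column):
    """Part of Total not covered by the Load and Run Generate rows."""
    t = timings
    return t["Total"][column] - t["Load"][column] - t["Run Generate"][column]


for row in timings:
    print(f"{row}: XPU takes {slowdown(row):.2f}x the CPU time")

for column in ("cpu", "xpu", "compile"):
    print(f"{column}: {unaccounted(column):.2f}s outside Load and Run Generate")
```

With these figures, `generate` on XPU takes about 6.28x the CPU time, while load and total are about 1.49x and 1.45x. Roughly 5 s of each total falls outside the two timed phases; in the script that span covers the imports, dataset loading, and feature extraction.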

### Versions

pip3 install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu
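
The install command alone does not record which builds were actually resolved. PyTorch ships `python -m torch.utils.collect_env` for this purpose; as a lighter alternative, the sketch below prints a few fields. `collect_versions` is a hypothetical helper, and it assumes `torch.xpu.get_device_name` is available in the installed build.

```python
import platform
import sys


def collect_versions():
    """Gather Python, OS, and (if installed) PyTorch/XPU details into a dict."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    xpu = getattr(torch, "xpu", None)  # older builds may lack torch.xpu
    info["xpu_available"] = bool(xpu is not None and xpu.is_available())
    if info["xpu_available"]:
        info["xpu_name"] = xpu.get_device_name(0)
    return info


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```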

cc @msaroufim @jerryzh168 @gujinghui @EikanWang @fengyuan14 @guangyey
