Closed
Labels: module: pt2 accuracy, oncall: pt2, pt2-pass-rate-regression (Track regression of PT2 dashboard pass rate), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
I could repro this accuracy failure locally:
time python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only DebertaForQuestionAnswering --training
The error message:
E0329 13:56:13.126000 139908396307456 torch/_dynamo/utils.py:1405] RMSE (res-fp64): 0.01772, (ref-fp64): 0.00543 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
indicates that increasing the tolerance to 0.02 would make the test pass.
On the other hand, the dashboard run uses a different CUDA version (12.1; my local server uses 12.0), which causes different numerical results. According to the error message from the dashboard run
2024-03-28T19:23:00.3165713Z E0328 19:23:00.315000 140426862847168 torch/_dynamo/utils.py:1405] RMSE (res-fp64): 0.01778, (ref-fp64): 0.00506 and shape=torch.Size([2]). res.dtype: torch.float32, multiplier: 3.000000, tol: 0.010000
we need a tolerance of 0.03 to pass the test there.
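To make the tolerance numbers above concrete, here is a small sketch of how the accuracy check behaves. This is a reconstruction, not the exact code from torch/_dynamo/utils.py: it assumes the check passes when the compiled result's RMSE against the fp64 reference is at most multiplier * (eager RMSE) + tol / 10, which is consistent with the logged values and the claimed tolerances in both runs.

```python
# Reconstructed sketch of the RMSE accuracy check (not the exact upstream
# code): res_error is RMSE(compiled, fp64 ref), ref_error is
# RMSE(eager, fp64 ref). The multiplier and tol come from the log lines.
def passes_accuracy(res_error, ref_error, multiplier=3.0, tol=0.01):
    return res_error <= multiplier * ref_error + tol / 10.0

# Local run (CUDA 12.0): res=0.01772, ref=0.00543.
# Fails at the default tol=0.01, passes at tol=0.02.
print(passes_accuracy(0.01772, 0.00543, tol=0.01))  # False
print(passes_accuracy(0.01772, 0.00543, tol=0.02))  # True

# Dashboard run (CUDA 12.1): res=0.01778, ref=0.00506.
# Still fails at tol=0.02; needs tol=0.03.
print(passes_accuracy(0.01778, 0.00506, tol=0.02))  # False
print(passes_accuracy(0.01778, 0.00506, tol=0.03))  # True
```

Under this model, the slightly smaller eager-vs-fp64 error on the dashboard (0.00506 vs. 0.00543 locally) is why the dashboard run needs the larger tolerance.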
I've also done an ablation test. Starting from commit 57a9a64 (clean trunk) and reverting #122848, #122841, and #121692 in that order, the test passes again locally.
nanogpt fails for the same reason.
Update: DebertaForQuestionAnswering started passing the accuracy test on June 14, 2024.
Error logs
.
Minified repro
No response
Versions
.
cc @ezyang @chauhang @penguinwu @msaroufim @bdhirsh @anijain2305 @zou3519 @Chillee