
[torch.compile]: Enhanced Error Reporting and Performance Canary Mode #126644

@bhack

🚀 The feature, motivation and pitch

Background

Diagnosing torch.compile issues and reproducing them in minimal, isolated code is currently quite labor-intensive. This affects both:

  • Users and developers trying to isolate and reproduce errors.
  • Triagers or compiler team members working with third-party compiled code, especially for public OSS models.

The complexity increases significantly when compiling full models, or high-level def functions that call one another in a chain. A single root-cause error is often hidden behind a chain of downstream errors, which complicates both reporting and resolution.

Proposal

  1. Enhanced Error Isolation and Reporting:

    • Isolate Failed Function:
      Implement a mechanism that isolates exactly the function in which compilation failed, so that users can report that specific function without additional effort.
    • Record Fake Inputs:
      Automatically record fake inputs (input metadata such as shapes and dtypes, without the real data) so that errors can be reproduced without users recreating their full dataset setup. Developers and triagers can then recreate the issue reliably with minimal setup.
  2. Performance Canary Mode:

    • Store Baseline Info:
      Introduce a mode where running an uncompiled model stores baseline performance data (e.g., memory usage, speed) on disk.
    • Automatic Regression Detection:
      When running the compiled model, automatically compare its performance against the stored baseline and warn the user about any regression in memory usage or speed.
    • Simplified Reporting:
      In case of performance regressions, provide an easy and straightforward way for users to report these issues.
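The isolation and fake-input ideas in item 1 can be illustrated with a minimal, framework-agnostic sketch. It assumes a hypothetical `isolate_failures` decorator applied to each function in a call chain: when an exception propagates, only the innermost decorated function that raised it writes a report, so the root cause is not masked by the outer frames. Instead of real data, the report stores only per-argument metadata (type, plus `shape`/`dtype` when an object has them), loosely analogous to fake inputs. None of these names are part of torch.compile; the report path and JSON format are assumptions.

```python
import functools
import json
import tempfile
from pathlib import Path

# Hypothetical report location (an assumption, not a PyTorch convention).
REPORT_PATH = Path(tempfile.gettempdir()) / "failure_report.json"


def _describe(value):
    """Record metadata only (type, shape, dtype), never the actual data."""
    info = {"type": type(value).__name__}
    shape = getattr(value, "shape", None)
    if shape is not None:
        info["shape"] = list(shape)
    dtype = getattr(value, "dtype", None)
    if dtype is not None:
        info["dtype"] = str(dtype)
    return info


def isolate_failures(fn):
    """Hypothetical decorator: report the innermost wrapped function that raised."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Only the first (innermost) wrapper to see this exception records it;
            # outer functions in the chain re-raise without overwriting the report.
            if not getattr(exc, "_failure_recorded", False):
                exc._failure_recorded = True
                report = {
                    "function": fn.__qualname__,
                    "error": f"{type(exc).__name__}: {exc}",
                    "args": [_describe(a) for a in args],
                    "kwargs": {k: _describe(v) for k, v in kwargs.items()},
                }
                REPORT_PATH.write_text(json.dumps(report, indent=2))
            raise
    return wrapper


# Usage: a two-step chain where the inner step fails.
@isolate_failures
def normalize(x, scale):
    return x / scale


@isolate_failures
def pipeline(x):
    return normalize(x, 0)  # deliberate bug: division by zero


try:
    pipeline(1.0)
except ZeroDivisionError:
    print(json.loads(REPORT_PATH.read_text())["function"])  # prints "normalize"
```

Although the exception surfaces from `pipeline`, the report names `normalize` and lists only the argument types, which is the kind of self-contained artifact a user could attach to an issue.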
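Item 2 can likewise be sketched in plain Python, assuming hypothetical helpers `measure`, `record_baseline`, and `check_against_baseline`, a JSON baseline file, and a 10% regression tolerance. Timing uses `time.perf_counter` (best of several runs) and memory uses `tracemalloc`, which only tracks Python-heap allocations; a real PyTorch implementation would need device-level counters such as `torch.cuda.max_memory_allocated()` instead.

```python
import json
import time
import tracemalloc
import warnings
from pathlib import Path


def measure(fn, *args, repeats=5):
    """Best wall time (s) over several runs, then peak Python-heap bytes of one run."""
    best = float("inf")
    for _ in range(repeats):  # timed runs also act as warm-up
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    # Memory is measured separately so tracemalloc overhead does not skew timing.
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"seconds": best, "peak_bytes": peak}


def record_baseline(fn, *args, path):
    """Measure the uncompiled function and store the result on disk as JSON."""
    baseline = measure(fn, *args)
    Path(path).write_text(json.dumps(baseline))
    return baseline


def check_against_baseline(fn, *args, path, tolerance=0.10):
    """Compare fn against the stored baseline; warn and return any regressions."""
    baseline = json.loads(Path(path).read_text())
    current = measure(fn, *args)
    regressions = []
    for key in ("seconds", "peak_bytes"):
        if current[key] > baseline[key] * (1 + tolerance):
            regressions.append(
                f"{key}: baseline={baseline[key]:.4g}, current={current[key]:.4g}"
            )
    for msg in regressions:
        warnings.warn(f"Performance regression ({msg})")
    return regressions
```

In the proposed workflow, `record_baseline` would run once on the uncompiled model and `check_against_baseline` on the compiled one. Taking the best of several timed runs keeps one-off compilation cost on the first call out of the speed comparison, and the memory pass only happens after those warm-up runs. The returned list of regression strings is also a natural starting point for the "simplified reporting" step.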

Benefits

  • For Users/Developers:
    • Simplifies the process of isolating and reporting compile errors.
    • Enhances reproducibility by automatically recording necessary inputs.
  • For Triagers/Compiler Team:
    • Provides clearer insights into the specific functions causing issues.
    • Facilitates quicker diagnosis and resolution of performance regressions.

/cc @chauhang @penguinwu @ezyang @msaroufim @bdhirsh @anijain2305

Alternatives

No response

Additional context

No response

Metadata


Assignees

No one assigned

    Labels

    feature, no-scrub, oncall: pt2, triaged

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
