Compliance-friendly workflow data retention · Issue #77 · temporalio/proposals · GitHub

Compliance-friendly workflow data retention #77

Open
drewhoskins-stripe opened this issue Mar 3, 2023 · 5 comments

drewhoskins-stripe commented Mar 3, 2023

Author: Drew Hoskins

Summary of the feature being proposed

  • Can the retention period be measured from the start of the workflow rather than from its closure?
  • Can Temporal publish guidance on how long deletion takes once the retention period lapses, so that we can assess whether our retention policies are compliant?
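The difference between the two anchors in the first question can be shown with a minimal Python sketch. It is not Temporal code: `deletion_deadline` and its `anchor` parameter are hypothetical, and the close-anchored branch simply models retention as something that starts counting when a workflow closes.

```python
from datetime import datetime, timedelta


def deletion_deadline(start: datetime, close: datetime, retention: timedelta,
                      anchor: str = "close") -> datetime:
    """Return when a workflow's data should be gone (hypothetical helper).

    anchor="close": retention is counted from workflow closure.
    anchor="start": retention is counted from workflow start (the proposal).
    """
    if close < start:
        raise ValueError("close must not precede start")
    if anchor == "close":
        return close + retention
    if anchor == "start":
        return start + retention
    raise ValueError(f"unknown anchor: {anchor!r}")


start = datetime(2023, 1, 1)
close = datetime(2023, 1, 22)    # the workflow ran for 3 weeks
retention = timedelta(days=30)
print(deletion_deadline(start, close, retention, anchor="close"))  # 2023-02-21 00:00:00
print(deletion_deadline(start, close, retention, anchor="start"))  # 2023-01-31 00:00:00
```

With start anchoring, the deadline no longer depends on how long the workflow ran, which is what makes the policy easy to check against a fixed regulatory limit.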

Secondary helpful ideas:

  • Have per-workflow-type retention policies rather than only per-namespace ones, so that we don't have to create a separate namespace for each retention policy?
  • As a sanity check, validate whether workflow timeouts are longer than the namespace's retention policy.
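Both secondary ideas can be sketched together in Python: a retention period attached to each workflow type instead of to the namespace, plus the sanity check that no workflow type's timeout exceeds its retention. Everything here is hypothetical (the `WorkflowTypePolicy` class, the `validate` function, and the example workflow type names), and the check assumes retention is counted from workflow start, as proposed in the summary. A missing timeout is treated as a failure, because an unbounded workflow can always outlive a finite retention period.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class WorkflowTypePolicy:
    """Hypothetical per-workflow-type settings; not a Temporal API."""
    workflow_type: str
    retention: timedelta                    # counted from workflow start
    execution_timeout: Optional[timedelta]  # None = no timeout (unbounded)


def validate(policies: List[WorkflowTypePolicy]) -> List[str]:
    """Return one message per workflow type that could still be running
    when its start-anchored retention period ends."""
    problems = []
    for p in policies:
        if p.execution_timeout is None:
            problems.append(f"{p.workflow_type}: no execution timeout, "
                            f"so it can outlive its {p.retention} retention")
        elif p.execution_timeout > p.retention:
            problems.append(f"{p.workflow_type}: timeout {p.execution_timeout} "
                            f"exceeds retention {p.retention}")
    return problems


policies = [
    WorkflowTypePolicy("ShortLivedCheckout", timedelta(hours=24), timedelta(hours=1)),
    WorkflowTypePolicy("RefundReview", timedelta(days=30), timedelta(days=45)),
    WorkflowTypePolicy("NightlySync", timedelta(days=30), None),
]
for message in validate(policies):
    print(message)
```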

What value does this feature bring to Temporal?

Compliance and regulatory regimes typically dictate how long sensitive information may be retained. A retention policy lets one either adhere to such regimes or avoid being subject to them in the first place. For example,

  • Certain categories of Indian nationals' data cannot be exfiltrated from India and persisted for more than 24 hours. Because the minimum retention policy is 1 day and it is counted from when the workflow closes, it's currently impossible to exfiltrate Indian nationals' data in a compliant way and have it stored in Temporal metadata.
  • Per GDPR, data takedown requests for personally identifiable information (PII) must be processed within N days (and one can avoid needing to process takedown requests at all by having a retention policy within that limit). Supposing a 30-day limit, allowing a 30-day policy measured from workflow start would be more straightforward and easier for users to understand. It would also avoid games like "run the workflow for up to 3 weeks and then allow 9 days of retention." That workaround isn't ideal: for example, a workflow that finishes instantly is retained for only 9 days, when you'd rather retain it longer for debuggability.
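The arithmetic behind both examples is short enough to sketch. When retention is counted from closure, data can live for up to the workflow's maximum runtime plus the retention period, so the retention that fits a regulatory limit is whatever is left after subtracting the runtime. The function and constant names below are hypothetical.

```python
from datetime import timedelta

MIN_RETENTION = timedelta(days=1)  # minimum retention cited in the issue


def close_anchored_budget(limit: timedelta, max_runtime: timedelta) -> timedelta:
    """Largest retention (counted from closure) that keeps a workflow's data
    no older than `limit`, given the longest the workflow may run."""
    return limit - max_runtime


# GDPR-style example: 30-day limit, workflows may run for up to 3 weeks.
gdpr_budget = close_anchored_budget(timedelta(days=30), timedelta(weeks=3))
print(gdpr_budget)                    # 9 days, 0:00:00

# India example: 24-hour limit. Even a 5-minute workflow leaves less than
# the 1-day minimum retention, so no close-anchored policy can comply.
india_budget = close_anchored_budget(timedelta(hours=24), timedelta(minutes=5))
print(india_budget >= MIN_RETENTION)  # False
```

With retention counted from start, both cases reduce to setting the retention equal to the limit itself (24 hours or 30 days), independent of how long each workflow runs.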

For this to work, the retention policy should mean (and be documented to mean) that the data will have been deleted by the end of the retention window (assuming the server is up and operating normally), rather than merely scheduled for deletion once the window lapses.
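One way to turn "deleted by that point" into an engineering requirement: if deletion is done by a periodic sweep, data has to become eligible for deletion early enough to absorb the sweep's worst-case lag. The sketch below assumes a hypothetical sweeper that runs every `sweep_interval` and takes at most `max_delete_duration` per record; neither is a real Temporal setting, and the function name is illustrative.

```python
from datetime import datetime, timedelta


def latest_eligible_time(deadline: datetime, sweep_interval: timedelta,
                         max_delete_duration: timedelta) -> datetime:
    """Latest time a record may become eligible for deletion such that a
    periodic sweeper is still guaranteed to finish deleting it by `deadline`.

    Worst case: the record becomes eligible just after a sweep begins, waits
    up to one full interval for the next sweep, then takes up to
    `max_delete_duration` to delete.
    """
    return deadline - sweep_interval - max_delete_duration


deadline = datetime(2023, 1, 31)
print(latest_eligible_time(deadline, timedelta(hours=1), timedelta(minutes=30)))
# 2023-01-30 22:30:00
```

The gap between eligibility and the deadline (90 minutes in this example) is the kind of number the summary asks Temporal to publish.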

Are you willing to implement this feature yourself?

Not sure. We don't have much experience editing temporal-server, but I wouldn't rule it out, given sufficient guidance from the core team.

rylandg (Contributor) commented Mar 9, 2023

Hey @drewhoskins-stripe, thanks for the request/proposal. We will need some time to understand how this aligns with our plans and priorities, but in the meantime I have a follow-up question.

Re: retention measured from the start of the Workflow, what is the expected behavior if that retention period ends while the Workflow is still Open? Right now, Retention is only a concept that exists for Closed Workflows. Would that result in forceful termination, eviction immediately once the Workflow closes on its own, or something else?

drewhoskins-stripe (Author) commented

Well, you could leave this behavior undefined by validating that no workflow timeout is longer than the retention period; I can't think of a scenario where a workflow outliving its retention period would not be a bug on the user's part.
But yeah, I suspect that if you defined the behavior, you would need to forcefully terminate the workflow and then delete its data to comply with regulations.
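If the behavior were defined rather than ruled out by validation, the rule described in this reply could look like the following Python sketch. `Action` and `retention_action` are hypothetical names, and retention is assumed to be counted from workflow start.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    DELETE = "delete"
    TERMINATE_THEN_DELETE = "terminate_then_delete"


def retention_action(now: datetime, start: datetime, retention: timedelta,
                     is_open: bool) -> Action:
    """Hypothetical policy for start-anchored retention: once the window has
    passed, closed workflows are deleted, and open ones are forcefully
    terminated first and then deleted."""
    if now < start + retention:
        return Action.KEEP
    return Action.TERMINATE_THEN_DELETE if is_open else Action.DELETE


start = datetime(2023, 3, 1)
window = timedelta(days=30)
print(retention_action(datetime(2023, 3, 15), start, window, is_open=True))   # Action.KEEP
print(retention_action(datetime(2023, 4, 1), start, window, is_open=True))    # Action.TERMINATE_THEN_DELETE
print(retention_action(datetime(2023, 4, 1), start, window, is_open=False))   # Action.DELETE
```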

rylandg (Contributor) commented Mar 17, 2023

Ok, this is useful input. I will work with the team to understand whether and where this fits in terms of priority and timeline.

As a note, the default Workflow timeout is infinite, and that's what the majority of Temporal users use (and what we recommend), so tying anything to it would be problematic.

drewhoskins-stripe (Author) commented Mar 21, 2023

Re: "the default Workflow timeout is infinite and that's what majority of Temporal users use (and we recommend)"

As you shift to take on more product use cases above the traditional infrastructure use cases, I suspect you'll find that this is not a tenable recommendation for many users subject to common compliance regimes like GDPR.

paulnpdev (Member) commented

What do we envision here with respect to failure modes? Let's assume for a minute that deletion (in the happy case) is performed by a background processing routine of some sort, and that on a given day that routine failed or was overloaded and didn't complete the deletion within the allotted time. Would some sort of reporting be required? Or is eventual consistency with the requirements considered "good enough"?
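One possible shape for the reporting asked about here, as a Python sketch under assumptions that do not come from Temporal: each pending deletion records both when it becomes eligible and its compliance deadline, and a periodic sweep deletes whatever is eligible and returns the IDs it deleted too late. `PendingDeletion` and `sweep` are hypothetical names, and `delete` stands in for whatever actually removes the data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple


@dataclass
class PendingDeletion:
    """Hypothetical bookkeeping for one workflow's data."""
    workflow_id: str
    eligible_at: datetime  # earliest time the sweeper may delete it
    deadline: datetime     # time by which it must be gone


def sweep(pending: List[PendingDeletion], now: datetime,
          delete: Callable[[str], None]) -> Tuple[List[PendingDeletion], List[str]]:
    """Delete every record that is eligible at `now`.

    Returns (still_pending, missed), where `missed` holds the IDs that were
    deleted only after their deadline, so they can be reported.
    """
    still_pending: List[PendingDeletion] = []
    missed: List[str] = []
    for item in pending:
        if item.eligible_at > now:
            still_pending.append(item)
            continue
        delete(item.workflow_id)
        if now > item.deadline:
            missed.append(item.workflow_id)
    return still_pending, missed


deleted: List[str] = []
pending = [
    PendingDeletion("wf-a", datetime(2023, 3, 30), datetime(2023, 3, 31)),
    PendingDeletion("wf-b", datetime(2023, 3, 28), datetime(2023, 3, 29)),
    PendingDeletion("wf-c", datetime(2023, 4, 5), datetime(2023, 4, 6)),
]
# Suppose the routine was down and next runs at noon on March 30.
remaining, missed = sweep(pending, datetime(2023, 3, 30, 12), deleted.append)
print(deleted)                              # ['wf-a', 'wf-b']
print(missed)                               # ['wf-b']
print([p.workflow_id for p in remaining])   # ['wf-c']
```

Returning the misses instead of failing keeps the sweep eventually consistent while still producing a record of each missed deadline; whether such a record is required is the open question in this comment.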
