feat(akkaPekko): Retry mechanism for ParquetPartitioningFlow #351
Hi,
In the akkaPekko module, I noticed that ParquetPartitioningFlow has no handling mechanism around calls to write on Hadoop's ParquetWriter, which may throw an IOException. I thought it would also be neat to have a retry mechanism in place for failed records, so I created #350. I think this is something that would be nice to have; let me know if you agree.
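For context, a minimal sketch of the call in question. `guardedWrite` is a hypothetical helper written for illustration; it is not parquet4s code, only a demonstration of how the checked exception surfaces:

```scala
import java.io.IOException
import org.apache.parquet.hadoop.ParquetWriter

object WriteSketch {
  // Hadoop's ParquetWriter#write is declared to throw IOException, so an
  // unguarded call inside a stream stage fails the whole stream.
  // This wrapper is a hypothetical example, not parquet4s's internals.
  def guardedWrite[T](writer: ParquetWriter[T], record: T): Either[IOException, Unit] =
    try Right(writer.write(record))
    catch { case e: IOException => Left(e) }
}
```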
Hi! I am sorry, I meant the RestartFlow, which restarts the flow on error (or RestartSource/RestartSink). Then, on the restart, the flow is rebuilt, and all connections to Hadoop are reopened. You might have more experience with such IO errors. What do you say?
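A minimal Pekko sketch of that idea. `Record`, `parquetFlow`, and the backoff values are illustrative stand-ins, not parquet4s code; the point is that `RestartFlow.onFailuresWithBackoff` rebuilds the wrapped flow from its factory on every failure:

```scala
import scala.concurrent.duration._
import org.apache.pekko.NotUsed
import org.apache.pekko.stream.RestartSettings
import org.apache.pekko.stream.scaladsl.{Flow, RestartFlow}

object RestartExample {
  final case class Record(data: String) // illustrative element type

  // Stand-in for the flow that writes records to Parquet; rebuilding it
  // would re-open the underlying Hadoop connections.
  def parquetFlow(): Flow[Record, Record, NotUsed] =
    Flow[Record].map { r =>
      if (r.data.isEmpty) throw new java.io.IOException("write failed") // simulated IO error
      r
    }

  val settings: RestartSettings =
    RestartSettings(minBackoff = 1.second, maxBackoff = 30.seconds, randomFactor = 0.2)
      .withMaxRestarts(5, 1.minute) // give up after 5 restarts within a minute

  // Rebuilds the inner flow only on failure, with exponential backoff.
  val restartingFlow: Flow[Record, Record, NotUsed] =
    RestartFlow.onFailuresWithBackoff(settings)(() => parquetFlow())
}
```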
Firstly, I think if something can throw an exception, it's always better to assume it definitely will 😄. For example, we can never be sure that the bucket we are writing to won't just decide to shut itself down, only to come back online after a minute or two. So I had to do some digging on RestartFlow, and found that using it can cause elements to be discarded, which defeats the purpose of having this flow in the first place. I believe RestartFlow would be useful when, for example, an exception that is impossible to handle is thrown in the Flow, so the Flow has to be restarted even though that means compromising completeness. Thus, I think it's still best if failed writes and exceptions are handled within ParquetPartitioningFlow. What you said about re-opening the writer does make sense, so I'll refactor the PR to accommodate that and think a bit more about how to handle this case. Let me know what you think :)
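A rough sketch of that direction, assuming a hypothetical `Writer` interface and `openWriter` factory in place of the real ParquetWriter plumbing; the actual handling inside ParquetPartitioningFlow would look different:

```scala
import java.io.IOException
import scala.annotation.tailrec

object RetrySketch {
  // Hypothetical minimal writer interface standing in for Hadoop's
  // ParquetWriter; parquet4s's real internals differ.
  trait Writer[T] {
    def write(record: T): Unit
    def close(): Unit
  }

  // Retries a failed write by closing and re-opening the writer, so no
  // record is silently dropped; if all attempts fail, the last
  // IOException propagates to the caller.
  final class RetryingWriter[T](openWriter: () => Writer[T], maxAttempts: Int) {
    private var writer: Writer[T] = openWriter()

    @tailrec
    private def attempt(record: T, attemptsLeft: Int): Unit =
      try writer.write(record)
      catch {
        case _: IOException if attemptsLeft > 1 =>
          // Drop the possibly broken writer and rebuild it, which
          // re-establishes the underlying Hadoop connections.
          try writer.close() catch { case _: IOException => () }
          writer = openWriter()
          attempt(record, attemptsLeft - 1)
      }

    def write(record: T): Unit = attempt(record, maxAttempts)
    def close(): Unit = writer.close()
  }
}
```

The same close-reopen-retry step could live wherever ParquetPartitioningFlow currently calls write, keeping the stream itself running instead of restarting the whole flow.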