[24.2] Introduce Weighted Fair Share ThreadPool IO engine #1078

Open
wants to merge 1 commit into
base: stable/24.2
from

Conversation

Krock21
Member

@Krock21 Krock21 commented Feb 14, 2025

First PR to 24.1: #877
Second PR to 23.2: #995

Introduce Weighted Fair Share ThreadPool IO engine

This PR introduces a new IOEngine: WeightedFairShareThreadPool. This engine improves the fairness of disk usage on our clusters.

This PR consists of 3 main parts:

  1. Load coloring
    • IOEngine methods now accept workloadDescriptor and sessionId, which are used for fair IO
    • The codebase was adjusted to pass these parameters. They are mandatory, so no call site was missed
  2. Weighted fair queue and threadpool over it
    • A time-based WFQ already exists in yt/yt/core/concurrency/new_fair_share_thread_pool.cpp
    • It was copied to yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp with small adjustments
      • Fairness is now based on expected bytes rather than on time
      • Buckets can now have weights
  3. WeightedFairShareThreadPool IOEngine
    • Uses the new fair_share_weighted_thread_pool to execute read and write operations and balances them by expected disk load, measured in bytes
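
To make the bytes-based fairness concrete, here is a minimal single-threaded sketch, assuming each bucket is charged its expected bytes divided by its weight and the next action always comes from the non-empty bucket with the smallest charge. The names (WeightedFairQueue, AddBucket, Enqueue, RunOne) are hypothetical and are not taken from fair_share_weighted_thread_pool.cpp; the real pool also has to handle threads, pools, and buckets that join late.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>

// Hypothetical, simplified model of a weighted fair queue over expected bytes.
// It is not the YTsaurus implementation.
class WeightedFairQueue {
public:
    using Action = std::function<void()>;

    // Registers a bucket; a larger weight earns a larger share of bytes.
    void AddBucket(const std::string& tag, double weight) {
        Buckets_[tag].Weight = weight;
    }

    // Enqueues an action together with the bytes it is expected to move.
    void Enqueue(const std::string& tag, int64_t expectedBytes, Action action) {
        Buckets_.at(tag).Pending.push(Item{expectedBytes, std::move(action)});
    }

    // Runs one action from the non-empty bucket with the smallest
    // weight-normalized usage. Returns the chosen tag, or "" when idle.
    std::string RunOne() {
        auto best = Buckets_.end();
        for (auto it = Buckets_.begin(); it != Buckets_.end(); ++it) {
            if (it->second.Pending.empty()) {
                continue;
            }
            if (best == Buckets_.end() || it->second.Usage < best->second.Usage) {
                best = it;
            }
        }
        if (best == Buckets_.end()) {
            return "";
        }
        Item item = std::move(best->second.Pending.front());
        best->second.Pending.pop();
        // Charge the bucket up front, using the expected size, not the time spent.
        best->second.Usage += static_cast<double>(item.ExpectedBytes) / best->second.Weight;
        item.Run();
        return best->first;
    }

private:
    struct Item {
        int64_t ExpectedBytes;
        Action Run;
    };
    struct Bucket {
        double Weight = 1.0;
        double Usage = 0.0;  // Charged bytes divided by weight.
        std::queue<Item> Pending;
    };
    std::map<std::string, Bucket> Buckets_;
};
```

With two buckets of weights 2 and 1 and equally sized requests, this sketch serves the heavier bucket about twice as often, which is the behaviour the description above asks of the engine.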

This PR targets stable/24.2 for testing purposes. After getting an LGTM it will be rebased onto main.


  • Changelog entry
    Type: feature
    Component: map-reduce

Introduce Weighted Fair Share ThreadPool IO engine

@Krock21 Krock21 added the mapreduce MapReduce related label Feb 14, 2025
@Krock21 Krock21 requested a review from don-dron February 14, 2025 18:45
, public TBucketBase
{
public:
TBucket(TString bucketName, TString poolName, TBucketMappingPtr parent, double bucketWeight)
Contributor

I see a lot of copy paste from another class here. Why can't the current class be patched? It seems the current class will be able to exist with equal weights by default. What problems do you have when trying to change the current class?

Member Author

@Krock21 Krock21 Mar 5, 2025

Here is diff between yt/yt/core/concurrency/new_fair_share_thread_pool.h and yt/yt/core/concurrency/fair_share_weighted_thread_pool.h: https://www.diffchecker.com/0O4WjFgS/
Here is diff between yt/yt/core/concurrency/new_fair_share_thread_pool.cpp and yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp: https://www.diffchecker.com/O4n2QGrz/

> It seems the current class will be able to exist with equal weights by default.

No.
new_fair_share_thread_pool uses CPU time as the "size" of an operation. My version uses expected bytes as the "size" of an operation.

> What problems do you have when trying to change the current class?

  1. Changing new_fair_share_thread_pool would produce a class with 2 modes of operation, chosen at creation. This makes the code hard to understand and modify
    • balance based on CPU time spent
    • balance based on the passed operation size
    • When using the class, a developer has to remember which mode was chosen, because each mode comes with its own set of methods

I think it is possible to change new_fair_share_thread_pool to work in 2 modes. The public interface would have GetInvoker and GetInvokerWithExpectedBytes. The private part would have 2 variants of each method where needed: one for CPU balancing and one for size balancing.

Do you want me to do it?

I also can't find it in main. Seems like it was renamed or removed

Contributor

Is it possible to put the weighting mechanism behind a separate interface and pass it in at construction? Should the different accounting methods (CPU time or weight) be encapsulated in implementations of that interface?

I'm a bit lost here: it seemed to me that weight is a renormalization of CPU time in some way. I just don't understand why the weight of the request and the CPU spent on it are fundamentally different "resources".

Contributor

Perhaps this scheduler is simply not designed for such changes? Perhaps this does not fit the action scheduler model at all?

Member Author

> Is it possible to put the weighting mechanism behind a separate interface and pass it in at construction? Should the different accounting methods (CPU time or weight) be encapsulated in implementations of that interface?

Maybe. Such an interface would need to update ExcessTime at several points in time:

  1. When an action is created
  2. When an action starts executing, knowing when it was last stopped
  3. Every time an action stops executing, knowing when it was started
  4. When an action finishes, knowing everything
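
A rough sketch of what such a pluggable accounting interface could look like, with one hook per point in the list above. Everything here (ICostPolicy, ActionState, the two policies) is hypothetical and only illustrates the idea; a wall-clock steady_clock stands in for CPU time, and each hook returns the amount to add to the bucket's excess.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical per-action accounting state.
struct ActionState {
    int64_t ExpectedSize = 0;       // Supplied by the caller at creation.
    Clock::duration TimeSpent{};    // Accumulated across execution slices.
    Clock::time_point StartedAt{};  // Start of the current slice.
};

// Hypothetical policy interface. Each hook returns the cost that should be
// added to the bucket's excess at that event.
class ICostPolicy {
public:
    virtual ~ICostPolicy() = default;
    virtual double OnCreated(ActionState&) { return 0.0; }
    virtual double OnStarted(ActionState& state, Clock::time_point now) {
        state.StartedAt = now;
        return 0.0;
    }
    virtual double OnStopped(ActionState& state, Clock::time_point now) {
        state.TimeSpent += now - state.StartedAt;
        return 0.0;
    }
    virtual double OnFinished(ActionState&) { return 0.0; }
};

// Time-based balancing: charge every execution slice when it ends.
class TimePolicy : public ICostPolicy {
public:
    double OnStopped(ActionState& state, Clock::time_point now) override {
        const Clock::duration slice = now - state.StartedAt;
        state.TimeSpent += slice;
        return std::chrono::duration<double>(slice).count();  // Seconds.
    }
};

// Size-based balancing: charge the whole expected size at creation.
class ExpectedSizePolicy : public ICostPolicy {
public:
    double OnCreated(ActionState& state) override {
        return static_cast<double>(state.ExpectedSize);
    }
};
```

The two policies charge at different events (slice end versus creation), which is the reason the existing time-based code cannot be reused by only swapping a weight.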

> I'm a bit lost here: it seemed to me that weight is a renormalization of CPU time in some way. I just don't understand why the weight of the request and the CPU spent on it are fundamentally different "resources".

That is how the current new_fair_share_thread_pool is written: it assumes that the size of an action is the CPU time spent executing it.
Adjusting it would require the user to supply a class that tells when and how the "size" of an action is computed, as I described above.

> Perhaps this scheduler is simply not designed for such changes? Perhaps this does not fit the action scheduler model at all?

I don't know what you mean. Its interface is simply GetInvoker(), which returns an invoker that functions can be submitted to. How it balances them is an implementation detail. In this case it is based on CPU time.

It may be possible to rewrite it to have 2 modes of execution (CPU time and user-provided size), but the current interface does not allow passing the size of an action. It would look like this:

IFairShareWeightedThreadPoolPtr CreateFairShareWeightedThreadPool(
    int threadCount,
    const TString& threadNamePrefix,
    EBalancingMode mode, // cpuTime or size
    const TFairShareWeightedThreadPoolOptions& options = {});

// Works only when mode = cpuTime, throws otherwise.
IInvoker GetInvoker(pool, tag, tagWeight);
// Works only when mode = size, throws otherwise.
IInvokerWithExpectedSize GetInvokerWithExpectedSize(pool, tag, tagWeight);

But this is strange, and it seems that these should be 2 different classes, to avoid bugs and misunderstanding and to simplify development.

TSessionId sessionId,
bool useDedicatedAllocations) override
{
std::vector<TFuture<void>> futures;
Contributor

This class has a lot of copy-paste. Why couldn't the current class be inherited or extended?

Member Author

It is inherited already

class TFairIOEngine
    : public TIOEngineBase

Contributor

5cd5eb5ba2cae9ed490ea601e19c92f79ce7e294#diff-bcbd2c7eb3279fdba98b81410a2aa19801b2ed8d903c1ca88d7edb1df784a1f0R394

This line clearly contains code that was written elsewhere, namely in TThreadPoolIOEngine.

Member Author

Yes. To change the method's behaviour I need to rewrite it completely.
I copy-pasted the whole method and then added my small changes on lines 396 and 405:

i64 expectedBytes = Config_.Acquire()->AdditiveCostOfOperationInBytes + slice.Request.Size;
...
.AsyncVia(New<TWrappedInvokerWithExpectedBytes>(invoker, expectedBytes))

The same applies to the other methods.
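
The wrapping idea in the two lines quoted above can be sketched in standard C++ as follows. This is an illustration under assumptions, not the code from the PR: IInvoker, ISizedInvoker, RecordingInvoker and the ExpectedBytes helper are hypothetical stand-ins, and only the cost formula (additive per-operation cost plus request size) and the wrapper's role come from the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using Callback = std::function<void()>;

// Hypothetical plain invoker interface, as seen by unchanged call sites.
class IInvoker {
public:
    virtual ~IInvoker() = default;
    virtual void Invoke(Callback callback) = 0;
};

// Hypothetical size-aware invoker: every callback arrives with its cost.
class ISizedInvoker {
public:
    virtual ~ISizedInvoker() = default;
    virtual void Invoke(Callback callback, int64_t expectedBytes) = 0;
};

// Binds a fixed expected size to a size-aware invoker, so code written
// against the plain interface still reports a cost to the fair queue.
class WrappedInvokerWithExpectedBytes : public IInvoker {
public:
    WrappedInvokerWithExpectedBytes(std::shared_ptr<ISizedInvoker> underlying, int64_t expectedBytes)
        : Underlying_(std::move(underlying))
        , ExpectedBytes_(expectedBytes)
    { }

    void Invoke(Callback callback) override {
        Underlying_->Invoke(std::move(callback), ExpectedBytes_);
    }

private:
    std::shared_ptr<ISizedInvoker> Underlying_;
    int64_t ExpectedBytes_;
};

// Cost model from the comment: fixed per-operation overhead plus payload size.
inline int64_t ExpectedBytes(int64_t additiveCostOfOperationInBytes, int64_t requestSize) {
    return additiveCostOfOperationInBytes + requestSize;
}

// Test double: runs callbacks inline and records what was charged.
class RecordingInvoker : public ISizedInvoker {
public:
    void Invoke(Callback callback, int64_t expectedBytes) override {
        Charged.push_back(expectedBytes);
        callback();
    }

    std::vector<int64_t> Charged;
};
```

The additive term keeps many tiny requests from looking free to the scheduler, since each of them still costs a disk operation.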

Contributor

@don-dron don-dron Mar 5, 2025

I'm sorry, but this still doesn't answer the question of why you had to copy the whole class, especially including the contents from pwrite/pread.

Member Author

> I'm sorry, but this still doesn't answer the question of why you had to copy the whole class, especially including the contents from pwrite/pread.

I didn't copy the whole class, only the methods I need to change.

For DoWrite I changed the flushing logic a bit, but I see that we have DoWriteImpl, which I can use as well. I will use it.

DoRead is actually the same as in io_engine.cpp. This is probably a leftover from earlier versions/experiments. I will remove it.

I will check the other methods to see if I can remove them in favor of the inherited ones.

Contributor

It seems to me that the problem is not limited to this place; buffer allocation is another example.
[Image: Screenshot 2025-03-05 at 18 21 42]

https://www.diffchecker.com/vA1mfaGS/

Member Author

I realised why I copied them.

My class inherits from TIOEngineBase, which does not have read and write implementations.

And TThreadPoolIOEngine (in io_engine.cpp) is not really written to be extended. Its private methods use a private Config_ variable.

I will try to make it extendable by moving the methods to protected and making them use GetConfig instead of Config_, but I don't like this chain; it is more of a hack to avoid copy-pasting.

Member Author

Currently it is implemented like this:
[Image: Untitled Diagram drawio]
In this model the Fair IO Engine has to reimplement the read and write methods.

Other options:

  1. Adjust TThreadPoolIOEngine to work with both regular and specific thread pools, based on some flag.
    • Seems like a god class that does 2 separate things
  2. Extract the Read/Write implementations into a 2nd base class, TIOEngineBaseForReadAndWrite. Inherit from it in both TThreadPoolIOEngine and TFairIOEngine. Submit these actions to regular thread pools in TThreadPoolIOEngine and to specific thread pools in TFairIOEngine

Option 2 seems good, but requires refactoring. I am not sure it is good to do this in this PR (which is already big and does many things). I suggest doing it separately.
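
As a rough illustration of option 2, the shared base can own the read logic and leave only the submission step to subclasses. The sketch below is hypothetical: ReadWriteEngineBase, ThreadPoolEngine, FairEngine and Submit are stand-in names, and actions run inline instead of on real thread pools.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct ReadRequest {
    int64_t Offset;
    int64_t Size;
};

// Hypothetical shared base: owns the read logic once and delegates only
// the scheduling decision to the derived engine.
class ReadWriteEngineBase {
public:
    virtual ~ReadWriteEngineBase() = default;

    // Common implementation shared by both engines. Returns bytes "read".
    int64_t Read(const std::vector<ReadRequest>& requests) {
        int64_t total = 0;
        for (const auto& request : requests) {
            Submit(request.Size, [&total, request] { total += request.Size; });
        }
        return total;
    }

protected:
    // The only part that differs between engines.
    virtual void Submit(int64_t expectedSize, std::function<void()> action) = 0;
};

// Stand-in for the regular engine: the expected size is ignored.
class ThreadPoolEngine : public ReadWriteEngineBase {
protected:
    void Submit(int64_t /*expectedSize*/, std::function<void()> action) override {
        action();
    }
};

// Stand-in for the fair engine: the expected size is charged before running.
class FairEngine : public ReadWriteEngineBase {
public:
    int64_t ChargedBytes = 0;

protected:
    void Submit(int64_t expectedSize, std::function<void()> action) override {
        ChargedBytes += expectedSize;
        action();
    }
};
```

With this split, the read path exists once and the fair engine adds only its accounting, which is the duplication the reviewers objected to.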

Member Author

Here is a PR for the 2nd option: #1117

We can either merge it first, which makes it possible to drop the copy-paste when rebasing, or we can live with the copy-paste.

@@ -10,6 +10,8 @@ message TWorkloadDescriptor
required int32 band = 2;
optional int64 instant = 3;
repeated string annotations = 4;
optional string disk_fair_share_bucket_tag = 5;
Contributor

Do you think that the tag-weight pair is a fairly general description of a consumer? Perhaps it's worth expanding the interface and making at least a set of tags?

Member Author

> Do you think that the tag-weight pair is a fairly general description of a consumer?

Yes. We need to group load somehow and ensure fairness between the groups. Every request can belong to one group only.

> Perhaps it's worth expanding the interface and making at least a set of tags?

It is not clear how this would work with a set of tags, or whether it would work as expected at all.
If this division should happen, a load should have a set of WorkloadDescriptors, not tags, because different tags will have separate weights, categories, etc.

@Krock21 Krock21 force-pushed the weighted-fair-share-threadpool-io-engine-1-stable-24-2 branch from 5cd5eb5 to c8717a5 Compare March 6, 2025 13:54
@Krock21
Member Author

Krock21 commented Mar 6, 2025

Changed ExpectedBytes to ExpectedSize and added .With[Field] methods to TWorkloadDescriptor
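
For readers unfamiliar with the pattern, .With[Field] methods usually return a modified copy and leave the original untouched. A minimal sketch, assuming a struct that mirrors a few of the proto fields; this is not the real TWorkloadDescriptor, and the weight field in particular is an assumption based on the "tag-weight pair" mentioned in the review.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for TWorkloadDescriptor with copy-and-modify helpers.
struct WorkloadDescriptor {
    int Band = 0;
    std::string DiskFairShareBucketTag;
    double DiskFairShareBucketWeight = 1.0;  // Assumed field, not shown in the diff.

    // Each With* method returns a modified copy; *this is not changed.
    WorkloadDescriptor WithBand(int band) const {
        WorkloadDescriptor copy = *this;
        copy.Band = band;
        return copy;
    }

    WorkloadDescriptor WithDiskFairShareBucketTag(std::string tag) const {
        WorkloadDescriptor copy = *this;
        copy.DiskFairShareBucketTag = std::move(tag);
        return copy;
    }

    WorkloadDescriptor WithDiskFairShareBucketWeight(double weight) const {
        WorkloadDescriptor copy = *this;
        copy.DiskFairShareBucketWeight = weight;
        return copy;
    }
};
```

Calls can be chained, for example base.WithBand(3).WithDiskFairShareBucketTag("batch"), so a call site can derive a per-request descriptor without mutating a shared one.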

@@ -0,0 +1,1432 @@
/*
This is a copy of new_fair_share_thread_pool.cpp with small changes.
Contributor

It seems to me that we won't be able to accept this PR without significant refactoring.

There would need to be very strong reasons to add a new class to the core library with 1500+ lines of implementation that is, as you admit, merely a copy-paste with minor changes. And there should be a clear plan for eliminating this duplicated code.

And frankly speaking, I don't understand why this new functionality is not just an evolution and generalization of the old class, with customized weights and costs.

It seems to me that such a generalization could have been a separate PR of its own, extending and improving the core library, with a proper unit test. On top of that you could then implement the new IO engine.

Given all that, I'm afraid this PR is doomed :/

Labels
mapreduce MapReduce related
3 participants