[24.2] Introduce Weighted Fair Share ThreadPool IO engine #1078

Krock21 · 2025-02-14T18:45:44Z

First PR to 24.1: #877
Second PR to 23.2: #995

Introduce Weighted Fair Share ThreadPool IO engine

This PR introduces a new IOEngine: WeightedFairShareThreadPool. Using this engine improves the fairness of disk usage on our clusters.

This PR consists of 3 main parts:

Load coloring
- IOEngine methods now accept workloadDescriptor and sessionId that are used for fair io
- Codebase was adjusted to pass these parameters. They are mandatory, so nothing was missed
Weighted fair queue and threadpool over it
- Time-based WFQ already exists in yt/yt/core/concurrency/new_fair_share_thread_pool.cpp
- It was copied to yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp with small adjustments
  - Fairness is now based on expected bytes rather than on time
  - Now buckets can have weights
WeightedFairShareThreadPool IOEngine
- Uses new fair_share_weighted_thread_pool to execute read and write operations and balances them on expected disk load measured in bytes
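
To illustrate the idea of balancing on expected bytes with weighted buckets, here is a minimal sketch. It is not the code from this PR: `TBucketState`, `TWeightedFairQueue` and the `VirtualBytes` counter are hypothetical names, and the real pool is concurrent and considerably more involved. Each bucket is charged `expectedBytes / weight` when one of its operations is picked, and the bucket with the smallest charge goes next.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Hypothetical, single-threaded model of a weighted fair queue.
struct TBucketState
{
    double Weight = 1.0;
    double VirtualBytes = 0.0;    // Bytes served so far, divided by Weight.
    std::deque<int64_t> Pending;  // Expected sizes of queued operations.
};

class TWeightedFairQueue
{
public:
    void Enqueue(const std::string& tag, double weight, int64_t expectedBytes)
    {
        assert(weight > 0 && expectedBytes >= 0);
        TBucketState& bucket = Buckets_[tag];
        bucket.Weight = weight;
        bucket.Pending.push_back(expectedBytes);
    }

    // Returns the tag of the bucket served next, or an empty string
    // if nothing is queued. Ties go to the lexicographically first tag.
    std::string Dequeue()
    {
        auto best = Buckets_.end();
        for (auto it = Buckets_.begin(); it != Buckets_.end(); ++it) {
            if (it->second.Pending.empty()) {
                continue;
            }
            if (best == Buckets_.end() ||
                it->second.VirtualBytes < best->second.VirtualBytes)
            {
                best = it;
            }
        }
        if (best == Buckets_.end()) {
            return std::string();
        }
        TBucketState& bucket = best->second;
        // Charge the bucket for the operation, scaled by its weight.
        bucket.VirtualBytes +=
            static_cast<double>(bucket.Pending.front()) / bucket.Weight;
        bucket.Pending.pop_front();
        return best->first;
    }

private:
    std::map<std::string, TBucketState> Buckets_;
};
```

With two buckets that queue equally sized operations, a bucket of weight 2 is served about twice as often as a bucket of weight 1.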

This PR is on stable/24.2 for testing purposes. After getting LGTM it will be rebased to main

Changelog entry
Type: feature
Component: map-reduce

Introduce Weighted Fair Share ThreadPool IO engine

don-dron · 2025-03-05T11:35:02Z

yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp

+    , public TBucketBase
+{
+public:
+    TBucket(TString bucketName, TString poolName, TBucketMappingPtr parent, double bucketWeight)


I see a lot of copy paste from another class here. Why can't the current class be patched? It seems the current class will be able to exist with equal weights by default. What problems do you have when trying to change the current class?

Here is diff between yt/yt/core/concurrency/new_fair_share_thread_pool.h and yt/yt/core/concurrency/fair_share_weighted_thread_pool.h: https://www.diffchecker.com/0O4WjFgS/
Here is diff between yt/yt/core/concurrency/new_fair_share_thread_pool.cpp and yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp: https://www.diffchecker.com/O4n2QGrz/

It seems the current class will be able to exist with equal weights by default.

No
new_fair_share_thread_pool uses cpu time as a "size" of the operation. My version uses expected bytes as a "size" of the operation

What problems do you have when trying to change the current class?

Changing new_fair_share_thread_pool will lead to a class that has 2 modes of work, and the mode has to be specified at creation:

- balance based on cpu time spent
- balance based on passed operation size

This will lead to code that is hard to understand and modify

When using the class, a developer will need to remember what mode was chosen, as they will face 2 different methods for 2 different modes of work

I think it is possible to change new_fair_share_thread_pool to work with 2 modes. Public interface will have GetInvoker and GetInvokerWithExpectedBytes. Private part will have 2 types of methods when needed: for cpu and size balancing

Do you want me to do it?

I also can't find it in main. Seems like it was renamed or removed

Is it possible to put the weighing mechanism behind a separate interface and pass it in when creating the pool? Should the different methods of weight propagation (either cpu or weight) be encapsulated in implementations of this mechanism?

I stopped understanding a bit, it seemed to me that weight is a renormalization of cpu time in some way. I just don't understand why the weight of the request and the CPU spent on it are fundamentally different "resources".

Perhaps, in general, this scheduler does not imply such changes? Perhaps this generally does not fit into the action scheduler model?

Is it possible to put the weighing mechanism behind a separate interface and pass it in when creating the pool? Should the different methods of weight propagation (either cpu or weight) be encapsulated in implementations of this mechanism?

Maybe. It should be able to update ExcessTime at different points in time:

- When an action is created
- When an action starts executing, knowing when it was stopped
- Every time an action stops executing, knowing when it was started
- When an action finishes, knowing everything
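
One possible shape for such an interface, sketched under assumptions: `ICostPolicy`, `TCpuTimePolicy`, `TExpectedSizePolicy` and `TBucketAccount` are hypothetical names that do not appear in this PR, and the exact moments at which the existing pool charges cpu time are assumed rather than taken from its source. Each hook returns the cost to add to the bucket's excess at that point in the action's lifetime.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using TDuration = std::chrono::nanoseconds;

// Hypothetical interface: one hook per point at which excess may change.
struct ICostPolicy
{
    virtual ~ICostPolicy() = default;
    virtual double OnCreated(int64_t expectedSize) = 0;
    virtual double OnStarted(TDuration sinceLastStop) = 0;
    virtual double OnStopped(TDuration ranFor) = 0;
    virtual double OnFinished(TDuration totalRan, int64_t expectedSize) = 0;
};

// Charges cpu time as it is consumed (assumed behaviour, for illustration).
struct TCpuTimePolicy : ICostPolicy
{
    double OnCreated(int64_t) override { return 0.0; }
    double OnStarted(TDuration) override { return 0.0; }
    double OnStopped(TDuration ranFor) override
    {
        return static_cast<double>(ranFor.count());
    }
    double OnFinished(TDuration, int64_t) override { return 0.0; }
};

// Charges the declared size once, when the action is created.
struct TExpectedSizePolicy : ICostPolicy
{
    double OnCreated(int64_t expectedSize) override
    {
        return static_cast<double>(expectedSize);
    }
    double OnStarted(TDuration) override { return 0.0; }
    double OnStopped(TDuration) override { return 0.0; }
    double OnFinished(TDuration, int64_t) override { return 0.0; }
};

// Per-bucket accounting: the cost is scaled by the bucket weight.
struct TBucketAccount
{
    double Weight = 1.0;
    double Excess = 0.0;

    void Charge(double cost)
    {
        assert(Weight > 0);
        Excess += cost / Weight;
    }
};
```

The scheduler would call the hooks and pass the result to `Charge`; which policy is used would be fixed when the pool is created.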

I stopped understanding a bit, it seemed to me that weight is a renormalization of cpu time in some way. I just don't understand why the weight of the request and the CPU spent on it are fundamentally different "resources".

The current new_fair_share_thread_pool is written like this. It assumes that the size of an action is the cpu time spent executing it
Adjusting it will require some user-provided class that can tell when and how the "size" of an action should be computed, like I wrote above

Perhaps, in general, this scheduler does not imply such changes? Perhaps this generally does not fit into the action scheduler model?

I don't know what you mean. Its interface is simply GetInvoker(), which returns an invoker that functions can be submitted to for execution. How it balances them is an implementation detail. In this case it is based on cpu time.

It may be possible to rewrite it to have 2 modes of execution (cpu time and user-provided size), but the current interface does not allow passing size of an action. Like this:

    IFairShareWeightedThreadPoolPtr CreateFairShareWeightedThreadPool(
        int threadCount,
        const TString& threadNamePrefix,
        EBalancingMode mode, // cpuTime, size
        const TFairShareWeightedThreadPoolOptions& options = {})

    IInvoker GetInvoker(pool, tag, tagWeight) // works only when mode = cpuTime, throws otherwise
    IInvokerWithExpectedSize GetInvokerWithExpectedSize(pool, tag, tagWeight) // works only when mode = size, throws otherwise

But this is strange, and it seems that these should be 2 different classes, to avoid bugs and misunderstanding and to simplify development

yt/yt/core/concurrency/fair_share_weighted_thread_pool.h

yt/yt/server/lib/hydra/file_changelog_index.cpp

yt/yt/server/lib/io/io_engine.h

don-dron · 2025-03-05T11:43:10Z

yt/yt/server/lib/io/io_engine_fair.cpp

+        TSessionId sessionId,
+        bool useDedicatedAllocations) override
+    {
+        std::vector<TFuture<void>> futures;


This class has a lot of copy paste, why couldn't the current class be inherited or extended?

It is inherited already

class TFairIOEngine : public TIOEngineBase

5cd5eb5ba2cae9ed490ea601e19c92f79ce7e294#diff-bcbd2c7eb3279fdba98b81410a2aa19801b2ed8d903c1ca88d7edb1df784a1f0R394

this line clearly contains code that was written elsewhere - here is TThreadPoolIOEngine

Yes. To change a method's behaviour I need to rewrite it completely
I copy-paste the whole method and then add my small piece of logic on lines 396 and 405

    i64 expectedBytes = Config_.Acquire()->AdditiveCostOfOperationInBytes + slice.Request.Size;
    ...
    .AsyncVia(New<TWrappedInvokerWithExpectedBytes>(invoker, expectedBytes))

Same for other methods
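
A self-contained sketch of what this added logic amounts to, without any YT types. The names `TWrappedInvokerWithExpectedBytes` and `AdditiveCostOfOperationInBytes` come from the snippet above; everything else (`ISizedInvoker`, `ComputeExpectedBytes`, `TRecordingInvoker`) is hypothetical, and the body of the wrapper is a guess at its behaviour, not the PR's implementation. The additive term presumably accounts for a fixed per-operation overhead, so that many tiny requests are not treated as free.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an invoker that accepts a declared size.
struct ISizedInvoker
{
    virtual ~ISizedInvoker() = default;
    virtual void Invoke(std::function<void()> callback, int64_t expectedBytes) = 0;
};

// Looks like a plain invoker to the caller, but attaches a fixed
// expected size to every callback it forwards.
class TWrappedInvokerWithExpectedBytes
{
public:
    TWrappedInvokerWithExpectedBytes(ISizedInvoker* underlying, int64_t expectedBytes)
        : Underlying_(underlying)
        , ExpectedBytes_(expectedBytes)
    { }

    void Invoke(std::function<void()> callback)
    {
        Underlying_->Invoke(std::move(callback), ExpectedBytes_);
    }

private:
    ISizedInvoker* const Underlying_;
    const int64_t ExpectedBytes_;
};

// Expected cost of one request: fixed per-operation cost plus payload size.
int64_t ComputeExpectedBytes(int64_t additiveCostOfOperationInBytes, int64_t requestSize)
{
    assert(additiveCostOfOperationInBytes >= 0 && requestSize >= 0);
    return additiveCostOfOperationInBytes + requestSize;
}

// Test double: records declared sizes and runs callbacks inline.
struct TRecordingInvoker : ISizedInvoker
{
    std::vector<int64_t> Sizes;

    void Invoke(std::function<void()> callback, int64_t expectedBytes) override
    {
        Sizes.push_back(expectedBytes);
        callback();
    }
};
```

The rest of the read or write path stays unaware of sizes; only the invoker it is scheduled through changes.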

I'm sorry, but this still doesn't answer the question of why you had to copy the whole class, especially including the contents from pwrite/pread.

I'm sorry, but this still doesn't answer the question of why you had to copy the whole class, especially including the contents from pwrite/pread.

I don't copy the whole class, I copy methods I need to change

For DoWrite I change flushing logic a bit, but I see that we have DoWriteImpl that I can use too. I will use it

DoRead is actually the same as in io_engine.cpp. This is probably a leftover from earlier versions/experiments. I will remove it

I will check other methods to see if I can remove them in favor of inherited ones

It seems to me that the problem is not only in this place; buffer allocation is another example.

https://www.diffchecker.com/vA1mfaGS/

I realised why I copied them

My class inherits TIOEngineBase, which does not have read and write implementations

And TThreadPoolIOEngine (in io_engine.cpp) is not really written to be extended. Its private methods use a private Config_ variable

I will try to make it extendable by moving the methods to protected and making them use GetConfig instead of Config_, but I don't like this chain; it is more of a hack to avoid copy-pasting

Currently it is implemented like this

In this model the Fair IO Engine has to reimplement the read and write methods

Other options:

- Option 1: Adjust TThreadPoolIOEngine to work with both regular and specific threadpools, based on some flag.
  - Seems like a god class which does 2 separate things
- Option 2: Extract Read/Write implementations into a 2nd base class, TIOEngineBaseForReadAndWrite, that implements them. Inherit it in both TThreadPoolIOEngine and TFairIOEngine. Submit these actions to regular threadpools in TThreadPoolIOEngine and to specific threadpools in TFairIOEngine

Option 2 seems good, but it requires refactoring. I am not sure it is good to do this in this PR (which is already big and does many things). I suggest doing it separately

Here is a PR for 2nd option: #1117

We can either merge it and then it is possible to avoid copypaste on rebase, or we can live with copypaste
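
A toy sketch of the shape of option 2, for illustration only. The class names are the ones used in this thread, but the bodies are hypothetical: `Submit`, `DoRead` and `GetChargedBytes` are invented here, callbacks run inline instead of on a thread pool, and the actual refactoring is the one in #1117.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using TCallback = std::function<void()>;

// Shared base: owns the read logic once and defers only the scheduling.
class TIOEngineBaseForReadAndWrite
{
public:
    virtual ~TIOEngineBaseForReadAndWrite() = default;

    int64_t Read(int64_t size)
    {
        assert(size >= 0);
        int64_t bytesRead = 0;
        // Safe to capture by reference only because Submit runs inline here.
        Submit([&] { bytesRead = DoRead(size); }, size);
        return bytesRead;
    }

protected:
    // The only part that differs between the engines.
    virtual void Submit(TCallback callback, int64_t expectedBytes) = 0;

private:
    // Placeholder for the real pread-based logic.
    int64_t DoRead(int64_t size)
    {
        return size;
    }
};

// Regular engine: ignores the expected size.
class TThreadPoolIOEngine
    : public TIOEngineBaseForReadAndWrite
{
protected:
    void Submit(TCallback callback, int64_t /*expectedBytes*/) override
    {
        callback();
    }
};

// Fair engine: accounts for the expected size before running the action.
class TFairIOEngine
    : public TIOEngineBaseForReadAndWrite
{
public:
    int64_t GetChargedBytes() const
    {
        return ChargedBytes_;
    }

protected:
    void Submit(TCallback callback, int64_t expectedBytes) override
    {
        ChargedBytes_ += expectedBytes;
        callback();
    }

private:
    int64_t ChargedBytes_ = 0;
};
```

With this split, neither engine carries a copy of the read or write body, and neither needs a flag to select its scheduling behaviour.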

don-dron · 2025-03-05T11:47:20Z

yt/yt_proto/yt/client/misc/proto/workload.proto

@@ -10,6 +10,8 @@ message TWorkloadDescriptor
    required int32 band = 2;
    optional int64 instant = 3;
    repeated string annotations = 4;
+    optional string disk_fair_share_bucket_tag = 5;


Do you think that the tag-weight pair is a fairly general description of a consumer? Perhaps it's worth expanding the interface and making at least a set of tags?

Do you think that the tag-weight pair is a fairly general description of a consumer?

Yes. We need to group load somehow and make fairness between groups of loads. Every request can be in one group only

Perhaps it's worth expanding the interface and making at least a set of tags?

It is not clear how it would work with a set of tags, or whether it would work as expected at all
If this division should happen, a load should have a set of WorkloadDescriptors, not tags, because different tags will have separate weights, categories, etc.

Krock21 · 2025-03-06T13:56:02Z

Changed ExpectedBytes to ExpectedSize and added .With[Field] methods to TWorkloadDescriptor
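
For readers unfamiliar with the pattern, a minimal sketch of what `.With[Field]` methods usually look like. This is a simplified stand-in, not the real TWorkloadDescriptor: the method names and the `DiskFairShareBucketWeight` field name are assumptions, and only `band` and `disk_fair_share_bucket_tag` are visible in the proto diff in this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for the real descriptor.
struct TWorkloadDescriptor
{
    int Band = 0;
    std::string DiskFairShareBucketTag;
    double DiskFairShareBucketWeight = 1.0; // Hypothetical field name.

    // Each With* method returns a modified copy and leaves *this untouched.
    TWorkloadDescriptor WithBand(int band) const
    {
        TWorkloadDescriptor result = *this;
        result.Band = band;
        return result;
    }

    TWorkloadDescriptor WithDiskFairShareBucketTag(std::string tag) const
    {
        TWorkloadDescriptor result = *this;
        result.DiskFairShareBucketTag = std::move(tag);
        return result;
    }

    TWorkloadDescriptor WithDiskFairShareBucketWeight(double weight) const
    {
        assert(weight > 0);
        TWorkloadDescriptor result = *this;
        result.DiskFairShareBucketWeight = weight;
        return result;
    }
};
```

Call sites can then derive a tagged descriptor from an existing one in a single expression, for example `descriptor.WithBand(3).WithDiskFairShareBucketTag("replication")`.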

psushin · 2025-03-07T15:40:03Z

yt/yt/core/concurrency/fair_share_weighted_thread_pool.cpp

@@ -0,0 +1,1432 @@
+/*
+This is a copy of new_fair_share_thread_pool.cpp with small changes.


It seems to me that we won't be able to accept this PR without significant refactoring.

There should be very strong reasons to add a new class to the core library, with 1500+ lines of implementation, which is merely a copy-paste with minor changes, as you admit. And there should be a clear plan for eliminating this duplicated code.

And frankly speaking, I don't understand why this new functionality is not just an evolution and generalization of the old class, with customized weights and costs.

It seems to me that such a generalization could have been a separate PR on its own, extending and improving the core library, with a proper unit test. And on top of that you could implement the new IO engine.

Given all that, I'm afraid this PR is doomed :/

Krock21 added the mapreduce MapReduce related label Feb 14, 2025

Krock21 requested a review from don-dron February 14, 2025 18:45

don-dron requested changes Mar 5, 2025


Introduce Weighted Fair Share ThreadPool IO engine

c8717a5

Krock21 force-pushed the weighted-fair-share-threadpool-io-engine-1-stable-24-2 branch from 5cd5eb5 to c8717a5 Compare March 6, 2025 13:54

Krock21 mentioned this pull request Mar 6, 2025

Extract implementation of read, write and flush methods to TIOEngineBaseCommon #1117

Closed

psushin reviewed Mar 7, 2025
