Commit d377229

Translate Kuhn's Maximum Bipartite Matching algorithm article (#646)
1 parent 7bf681a commit d377229

2 files changed

+219
-0
lines changed

Lines changed: 218 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,218 @@
1+
<!--?title Kuhn's Algorithm for Maximum Bipartite Matching-->
2+
3+
# Kuhn's Algorithm for Maximum Bipartite Matching
4+
5+
## Problem
6+
You are given a bipartite graph $G$ containing $n$ vertices and $m$ edges. Find the maximum matching, i.e., select as many edges as possible so
7+
that no selected edge shares a vertex with any other selected edge.
8+
9+
## Algorithm Description
10+
11+
### Required Definitions
12+
13+
* A **matching** $M$ is a set of pairwise non-adjacent edges of a graph (in other words, no more than one edge from the set should be incident to any vertex of the graph).
14+
The **cardinality** of a matching is the number of edges in it. The maximum (or largest) matching is a matching whose cardinality is maximum among all possible matchings
15+
in a given graph. All those vertices that have an incident edge from the matching (i.e., which have degree exactly one in the subgraph formed by $M$) are called **saturated**
16+
by this matching.
17+
18+
* A **path** of length $k$ here means a *simple* path (i.e. not containing repeated vertices or edges) containing $k$ edges, unless specified otherwise.
19+
20+
* An **alternating path** (in a bipartite graph, with respect to some matching) is a path in which the edges alternately belong / do not belong to the matching.
21+
22+
* An **augmenting path** (in a bipartite graph, with respect to some matching) is an alternating path whose initial and final vertices are unsaturated, i.e., they are not incident to any edge of the matching.
24+
25+
* The **symmetric difference** (also known as the **disjunctive union**) of sets $A$ and $B$, represented by $A \oplus B$, is the set of all elements that belong to exactly one of $A$ or $B$, but not to both.
26+
That is, $A \oplus B = (A - B) \cup (B - A) = (A \cup B) - (A \cap B)$.
27+
28+
### Berge's lemma
29+
30+
This lemma was proven by the French mathematician **Claude Berge** in 1957, although it had already been observed by the Danish mathematician **Julius Petersen** in 1891 and the Hungarian mathematician **Dénes Kőnig** in 1931.
32+
33+
#### Formulation
34+
A matching $M$ is maximum $\Leftrightarrow$ there is no augmenting path relative to the matching $M$.
35+
36+
#### Proof
37+
38+
Both directions of the equivalence are proven by contradiction.
39+
40+
1. A matching $M$ is maximum $\Rightarrow$ there is no augmenting path relative to the matching $M$.
41+
42+
Let there be an augmenting path $P$ relative to the given maximum matching $M$. This augmenting path $P$ will necessarily be of odd length, having one more edge not in $M$ than the number of edges it has that are also in $M$.
43+
We create a new matching $M'$ by including all edges in the original matching $M$ except those also in $P$, and the edges in $P$ that are not in $M$.
This is a valid matching because the initial and final vertices of $P$ are unsaturated by $M$, and every other vertex of $P$ is saturated in $M$ only by an edge of $P \cap M$, which is replaced by exactly one edge of $P \setminus M$.
45+
This new matching $M'$ will have one more edge than $M$, and so $M$ could not have been maximum.
46+
47+
Formally, given an augmenting path $P$ w.r.t. some maximum matching $M$, the matching $M' = P \oplus M$ is such that $|M'| = |M| + 1$, a contradiction.
48+
49+
2. A matching $M$ is maximum $\Leftarrow$ there is no augmenting path relative to the matching $M$.
50+
51+
Let there be a matching $M'$ of greater cardinality than $M$. We consider the symmetric difference $Q = M \oplus M'$. The subgraph $Q$ is no longer necessarily a matching.
52+
Any vertex in $Q$ has a maximum degree of $2$, which means that every connected component in it is one of three kinds:
53+
* an isolated vertex
54+
* a (simple) path whose edges are alternately from $M$ and $M'$
55+
* a cycle of even length whose edges are alternately from $M$ and $M'$
56+
57+
Since $M'$ has a cardinality greater than $M$, $Q$ has more edges from $M'$ than from $M$. An alternating cycle contains equally many edges from each matching, so by the pigeonhole principle at least one connected component is a path having more edges from $M'$ than from $M$. Because any such path is alternating, it starts and ends with an edge of $M'$, and its initial and final vertices are unsaturated by $M$ (an edge of $M$ at an endpoint would belong to $Q$ and extend the path). This makes it an augmenting path for $M$, which contradicts the premise. $\blacksquare$
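
The first direction of the proof rests on the identity $|P \oplus M| = |M| + 1$. The following is a minimal sketch of that step, assuming edges are stored as (first-part vertex, second-part vertex) pairs in sorted sets. The names `Edge`, `EdgeSet` and `augment` are illustrative and do not come from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>

// Illustrative representation: an edge is (vertex of first part, vertex of second part).
using Edge = std::pair<int, int>;
using EdgeSet = std::set<Edge>;

// Returns the symmetric difference of a matching and an augmenting path,
// i.e. the edges that belong to exactly one of the two sets.
EdgeSet augment(const EdgeSet& matching, const EdgeSet& path) {
    EdgeSet result;
    std::set_symmetric_difference(matching.begin(), matching.end(),
                                  path.begin(), path.end(),
                                  std::inserter(result, result.end()));
    // Holds only if `path` really is an augmenting path for `matching`.
    assert(result.size() == matching.size() + 1);
    return result;
}
```

For instance, with $M = \{(0,0)\}$ and the augmenting path consisting of the edges $(1,0), (0,0), (0,1)$, the result is $\{(0,1), (1,0)\}$, a matching with one more edge.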
60+
61+
### Kuhn's algorithm
62+
63+
Kuhn's algorithm is a direct application of Berge's lemma. It is essentially described as follows:
64+
> First, we take an empty matching. Then, while the algorithm is able to find an augmenting path, we update the matching by alternating it along this path and
65+
> repeat the process of finding the augmenting path. As soon as it is not possible to find such a path, we stop the process - the current matching is the maximum.
66+
67+
It remains to detail the way to find augmenting paths. Kuhn's algorithm simply searches for any of these paths using [depth-first](./graph/depth-first-search.html) or [breadth-first](./graph/breadth-first-search.html) traversal. The algorithm looks through all the vertices of the graph in turn and starts a traversal from each of them, trying to find an augmenting path starting at this vertex.
69+
70+
The algorithm is more convenient to describe if we assume that the input graph is already split into two parts (although, in fact, the algorithm can be implemented in such a way
71+
that the input graph is not explicitly split into two parts).
72+
73+
The algorithm looks at all the vertices $v$ of the first part of the graph: $v = 1 \ldots n_1$. If the current vertex $v$ is already saturated with the current matching
74+
(i.e., some edge adjacent to it has already been selected), then skip this vertex. Otherwise, the algorithm tries to saturate this vertex, for which it starts
75+
a search for an augmenting path starting from this vertex.
76+
77+
The search for an augmenting path is carried out using a special depth-first or breadth-first traversal (usually depth-first traversal is used for ease of implementation).
78+
Initially, the depth-first traversal is at the current unsaturated vertex $v$ of the first part. Let's look through all edges from this vertex. Let the current edge be an edge
79+
$(v, to)$. If the vertex $to$ is not yet saturated with matching, then we have succeeded in finding an augmenting path: it consists of a single edge $(v, to)$;
80+
in this case, we simply include this edge in the matching and stop searching for the augmenting path from the vertex $v$. Otherwise, if $to$ is already saturated with some edge
81+
$(to, p)$,
82+
then we go along this edge: thus we try to find an augmenting path passing through the edges $(v, to),(to, p), \ldots$.
83+
To do this, simply go to the vertex $p$ in our traversal - now we try to find an augmenting path from this vertex.
84+
85+
So, this traversal, launched from the vertex $v$, will either find an augmenting path, and thereby saturate the vertex $v$, or it will not find such an augmenting path (and, therefore, this vertex $v$ cannot be saturated).
86+
87+
After all the vertices $v = 1 \ldots n_1$ have been scanned, the current matching will be maximum.
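
The description above allows either traversal order for the search. As a rough sketch of the breadth-first option, the code below assumes 0-based vertex indices and adjacency lists `g` from the first part into the second. The helper names `augment_bfs` and `max_matching_bfs` and the arrays `match_left` and `parent` are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Breadth-first search for an augmenting path from the unsaturated first-part
// vertex `root`. On success the matching is alternated along the path.
bool augment_bfs(int root, const std::vector<std::vector<int>>& g,
                 std::vector<int>& mt, std::vector<int>& match_left) {
    std::vector<int> parent(mt.size(), -1);    // first-part vertex each second-part vertex was reached from
    std::vector<bool> seen(mt.size(), false);  // visited marks for second-part vertices
    std::queue<int> q;
    q.push(root);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int to : g[u]) {
            if (seen[to]) continue;
            seen[to] = true;
            parent[to] = u;
            if (mt[to] != -1) {
                q.push(mt[to]);  // follow the matching edge back into the first part
                continue;
            }
            // `to` is unsaturated: walk back to `root`, flipping edges on the way.
            int cur = to;
            while (cur != -1) {
                int left = parent[cur];
                int prev = match_left[left];  // previous partner of `left`, -1 for the root
                mt[cur] = left;
                match_left[left] = cur;
                cur = prev;
            }
            return true;
        }
    }
    return false;
}

// Returns the size of the matching; mt[i] is the first-part partner of i, or -1.
int max_matching_bfs(int n, int k, const std::vector<std::vector<int>>& g,
                     std::vector<int>& mt) {
    assert(static_cast<int>(g.size()) == n);
    mt.assign(k, -1);
    std::vector<int> match_left(n, -1);
    int size = 0;
    for (int v = 0; v < n; ++v)
        if (augment_bfs(v, g, mt, match_left)) ++size;
    return size;
}
```

Unlike the recursive variant, this sketch also keeps the partner of every first-part vertex (`match_left`), because the path has to be reconstructed backwards once an unsaturated vertex is reached.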
88+
89+
### Running time
90+
91+
Kuhn's algorithm can be thought of as a series of $n$ depth/breadth-first traversal runs on the entire graph. Therefore, the whole algorithm is executed in time $O(nm)$, which
92+
in the worst case is $O(n^3)$.
93+
94+
However, this estimate can be improved slightly. It turns out that for Kuhn's algorithm, it is important which part of the graph is chosen as the first and which as the second.
95+
Indeed, in the implementation described above, the depth/breadth-first traversal starts only from the vertices of the first part, so the entire algorithm is executed in
96+
time $O(n_1m)$, where $n_1$ is the number of vertices of the first part. In the worst case, this is $O(n_1 ^ 2 n_2)$ (where $n_2$ is the number of vertices of the second part).
97+
This shows that it is more profitable when the first part contains fewer vertices than the second. On very unbalanced graphs (when $n_1$ and $n_2$ are very different),
98+
this translates into a significant difference in runtimes.
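
One way to act on this observation is to swap the two parts before running the algorithm whenever the first part is the larger one. The sketch below assumes 0-based indices and adjacency lists stored only for the first part; `transpose` and `put_smaller_part_first` are hypothetical helper names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Builds the adjacency lists of the second part from those of the first part.
std::vector<std::vector<int>> transpose(int n, int k,
                                        const std::vector<std::vector<int>>& g) {
    assert(static_cast<int>(g.size()) == n);
    std::vector<std::vector<int>> rev(k);
    for (int v = 0; v < n; ++v)
        for (int to : g[v])
            rev[to].push_back(v);
    return rev;
}

// Makes the first part the smaller one. Returns true if the parts were swapped,
// in which case n, k and g describe the graph seen from the other side.
bool put_smaller_part_first(int& n, int& k, std::vector<std::vector<int>>& g) {
    if (n <= k) return false;
    g = transpose(n, k, g);
    std::swap(n, k);
    return true;
}
```

If the parts were swapped, the resulting matching array is indexed by vertices of the original first part, so the pairs have to be reported in the opposite order.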
99+
100+
## Implementation
101+
102+
### Standard implementation
103+
Let us present here an implementation of the above algorithm based on depth-first traversal and accepting a bipartite graph in the form of a graph explicitly split into two parts.
104+
This implementation is very concise, and perhaps it should be remembered in this form.
105+
106+
Here $n$ is the number of vertices in the first part, $k$ is the number of vertices in the second part, and $g[v]$ is the list of edges from the vertex $v$ of the first part (i.e. the list of numbers of the vertices to which these edges lead from $v$). The vertices in both parts are numbered independently, i.e. vertices in the first part are numbered $1 \ldots n$, and those in the second are numbered $1 \ldots k$ (the code stores them with $0$-based indices and adds $1$ only when printing).
109+
110+
Then there are two auxiliary arrays: $\rm mt$ and $\rm used$. The first, $\rm mt$, contains information about the current matching. For convenience of programming, this information is stored only for the vertices of the second part: $\textrm{mt[} i \rm]$ is the number of the vertex of the first part connected by a matching edge with the vertex $i$ of the second part (or $-1$, if no matching edge is incident to $i$). The second array is $\rm used$: the usual array of "visits" to the vertices in the depth-first traversal (it is needed just so that the depth-first traversal does not enter the same vertex twice).
114+
115+
The function $\textrm{try\_kuhn}$ is a depth-first traversal. It returns $\rm true$ if it was able to find an augmenting path from the vertex $v$; in that case the function has already alternated the matching along the found path.
117+
118+
Inside the function, all the edges outgoing from the vertex $v$ of the first part are scanned, and then the following is checked: if this edge leads to an unsaturated vertex $to$, or if this vertex $to$ is saturated, but it is possible to find an augmenting path by recursively starting from $\textrm{mt[}to \rm ]$, then we say that we have found an augmenting path, and before returning from the function with the result $\rm true$, we alternate the current edge: we redirect the edge adjacent to $to$ to the vertex $v$.
121+
122+
The main program first indicates that the current matching is empty (the list $\rm mt$ is filled with numbers $-1$). Then it iterates over the vertices $v$ of the first part; for each of them it resets the array $\rm used$ and starts a depth-first traversal $\textrm{try\_kuhn}$ from $v$.
124+
125+
It is worth noting that the size of the matching is easy to get as the number of calls to $\textrm{try\_kuhn}$ in the main program that returned the result $\rm true$. The desired
126+
maximum matching itself is contained in the array $\rm mt$.
127+
128+
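```cpp
#include <cstdio>
#include <vector>
using namespace std;

int n, k;               // sizes of the first and the second part
vector<vector<int>> g;  // g[v]: second-part neighbours of first-part vertex v
vector<int> mt;         // mt[i]: first-part vertex matched to second-part vertex i, or -1
vector<bool> used;      // visited marks for the current depth-first traversal

// Tries to find an augmenting path starting at vertex v of the first part.
// On success the matching has already been alternated along that path.
bool try_kuhn(int v) {
    if (used[v])
        return false;
    used[v] = true;
    for (int to : g[v]) {
        if (mt[to] == -1 || try_kuhn(mt[to])) {
            mt[to] = v;
            return true;
        }
    }
    return false;
}

int main() {
    //... reading the graph ...

    mt.assign(k, -1);
    for (int v = 0; v < n; ++v) {
        used.assign(n, false);
        try_kuhn(v);
    }

    // Print the matched pairs using 1-based vertex numbers.
    for (int i = 0; i < k; ++i)
        if (mt[i] != -1)
            printf("%d %d\n", mt[i] + 1, i + 1);
}
```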
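
To illustrate the remark about obtaining the size of the matching, here is a sketch that wraps the same depth-first search in a single function and counts the successful calls. The function name `kuhn_size` and the use of a recursive `std::function` instead of global variables are choices made for this example only.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Returns the size of a maximum matching; fills mt as in the implementation above.
int kuhn_size(int n, int k, const std::vector<std::vector<int>>& g,
              std::vector<int>& mt) {
    assert(static_cast<int>(g.size()) == n);
    mt.assign(k, -1);
    std::vector<bool> used;
    std::function<bool(int)> try_kuhn = [&](int v) -> bool {
        if (used[v]) return false;
        used[v] = true;
        for (int to : g[v]) {
            if (mt[to] == -1 || try_kuhn(mt[to])) {
                mt[to] = v;
                return true;
            }
        }
        return false;
    };
    int size = 0;
    for (int v = 0; v < n; ++v) {
        used.assign(n, false);
        if (try_kuhn(v)) ++size;  // every successful call adds exactly one edge
    }
    return size;
}
```

On the graph with edges $(0,0), (0,1), (1,0), (2,1), (2,2)$ (0-based), all three calls succeed, and the second one reroutes vertex $0$ from its first neighbour to its second.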
161+
162+
We repeat once again that Kuhn's algorithm is easy to implement in such a way that it works on graphs that are known to be bipartite, but whose explicit splitting into two parts has not been given. In this case, it will be necessary to abandon the convenient division into two parts, and store all the information for all vertices of the graph. For this, an array of lists $g$ is now specified not only for the vertices of the first part, but for all the vertices of the graph (of course, now the vertices of both parts are numbered in a common numbering, from $1$ to $n$). Arrays $\rm mt$ and $\rm used$ are now also defined for the vertices of both parts, and, accordingly, they need to be kept in this state.
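
A possible shape of such an implementation is sketched below. It assumes that the graph is bipartite, that vertices use a common 0-based numbering, and that `g` holds symmetric adjacency lists for every vertex. The function name `kuhn_unsplit` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Kuhn's algorithm on a bipartite graph whose parts are not given explicitly.
// mt[v] is the partner of v or -1. Returns the number of matched edges.
int kuhn_unsplit(const std::vector<std::vector<int>>& g, std::vector<int>& mt) {
    int n = static_cast<int>(g.size());
    mt.assign(n, -1);
    std::vector<bool> used;
    std::function<bool(int)> try_kuhn = [&](int v) -> bool {
        if (used[v]) return false;
        used[v] = true;
        for (int to : g[v]) {
            assert(to >= 0 && to < n);
            // mt[to] lies in the same part as v, so the search never changes sides.
            if (mt[to] == -1 || try_kuhn(mt[to])) {
                mt[to] = v;
                mt[v] = to;  // keep both directions of the matching in sync
                return true;
            }
        }
        return false;
    };
    int size = 0;
    for (int v = 0; v < n; ++v) {
        if (mt[v] != -1) continue;  // already saturated from the other side
        used.assign(n, false);
        if (try_kuhn(v)) ++size;
    }
    return size;
}
```

The only differences from the split version are that `mt` is maintained for both endpoints of every matching edge and that already saturated vertices are skipped in the main loop.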
166+
167+
### Improved implementation
168+
169+
Let us modify the algorithm as follows. Before the main loop of the algorithm, we will find an **arbitrary matching** by some simple algorithm (a simple **heuristic algorithm**),
170+
and only then we will execute a loop with calls to the $\textrm{try\_kuhn}()$ function, which will improve this matching. As a result, the algorithm will work noticeably faster on
171+
random graphs - because in most graphs, you can easily find a matching of a sufficiently large size using heuristics, and then improve the found matching to the maximum using
172+
the usual Kuhn's algorithm. Thus, we will save on launching a depth-first traversal from those vertices that we have already included using the heuristic into the current matching.
173+
174+
For example, you can simply iterate over all the vertices of the first part, and for each of them, find an arbitrary edge that can be added to the matching, and add it.
175+
Even such a simple heuristic can speed up Kuhn's algorithm several times.
176+
177+
Please note that the main loop will have to be slightly modified. Since when calling the function $\textrm{try\_kuhn}$ in the main loop, it is assumed that the current vertex is
178+
not yet included in the matching, you need to add an appropriate check.
179+
180+
In the implementation, only the code in the $\textrm{main}()$ function will change:
181+
182+
```cpp
int main() {
    // ... reading the graph ...

    mt.assign(k, -1);
    vector<bool> used1(n, false);  // used1[v]: v was matched by the heuristic
    // Greedy heuristic: give every first-part vertex its first free neighbour.
    for (int v = 0; v < n; ++v) {
        for (int to : g[v]) {
            if (mt[to] == -1) {
                mt[to] = v;
                used1[v] = true;
                break;
            }
        }
    }
    // Run the usual search only from vertices the heuristic left unsaturated.
    for (int v = 0; v < n; ++v) {
        if (used1[v])
            continue;
        used.assign(n, false);
        try_kuhn(v);
    }

    for (int i = 0; i < k; ++i)
        if (mt[i] != -1)
            printf("%d %d\n", mt[i] + 1, i + 1);
}
```
209+
210+
**Another good heuristic** is as follows. At each step, it searches for a vertex of the smallest degree (but not isolated), selects any edge from it and adds it to the matching, then removes both endpoints of this edge with all their incident edges from the graph. This greedy approach works very well on random graphs; in many cases it even builds the maximum matching (although there is a test case against it, on which it will find a matching that is much smaller than the maximum).
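
A straightforward, unoptimized sketch of this heuristic might look as follows. It assumes 0-based indices and adjacency lists given for the first part only, and it rescans all vertices at every step, which is quadratic in the number of vertices. The name `min_degree_greedy` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Builds an initial matching by repeatedly matching a vertex of minimum positive
// degree with any remaining neighbour. Returns mt for the second part.
std::vector<int> min_degree_greedy(int n, int k,
                                   const std::vector<std::vector<int>>& g) {
    assert(static_cast<int>(g.size()) == n);
    int total = n + k;  // 0..n-1: first part, n..n+k-1: second part
    std::vector<std::vector<int>> adj(total);
    for (int v = 0; v < n; ++v)
        for (int to : g[v]) {
            adj[v].push_back(n + to);
            adj[n + to].push_back(v);
        }
    std::vector<int> deg(total);
    for (int v = 0; v < total; ++v)
        deg[v] = static_cast<int>(adj[v].size());
    std::vector<bool> removed(total, false);
    std::vector<int> mt(k, -1);
    while (true) {
        int best = -1;  // remaining vertex of the smallest positive degree
        for (int v = 0; v < total; ++v)
            if (!removed[v] && deg[v] > 0 && (best == -1 || deg[v] < deg[best]))
                best = v;
        if (best == -1) break;  // only isolated vertices are left
        int partner = -1;
        for (int u : adj[best])
            if (!removed[u]) { partner = u; break; }
        int left = best < n ? best : partner;
        int right = best < n ? partner : best;
        mt[right - n] = left;
        // Remove both endpoints and update the degrees of their neighbours.
        for (int x : {best, partner}) {
            removed[x] = true;
            for (int u : adj[x])
                if (!removed[u]) --deg[u];
        }
    }
    return mt;
}
```

On the graph with edges $(0,0), (0,1), (1,0)$ this sketch matches vertex $1$ of the first part first and ends with a matching of size $2$, whereas taking the first free edge for each vertex in order would stop at size $1$. The returned array can be used as the starting `mt` before running the main loop on the vertices that remain unsaturated.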
213+
214+
## Notes
215+
216+
* Kuhn's algorithm is a subroutine in the **Hungarian algorithm**, also known as the **Kuhn-Munkres algorithm**.
217+
* Kuhn's algorithm runs in $O(nm)$ time. It is generally simple to implement; however, more efficient algorithms exist for the maximum bipartite matching problem, such as the **Hopcroft-Karp-Karzanov algorithm**, which runs in $O(\sqrt{n}m)$ time.

src/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -198,6 +198,7 @@ and adding new articles to the collection.*
198198
- [Assignment problem. Solution using min-cost-flow in O (N^5)](./graph/Assignment-problem-min-flow.html)
199199
- **Matchings and related problems**
200200
- [Bipartite Graph Check](./graph/bipartite-check.html)
201+
- [Kuhn's Algorithm - Maximum Bipartite Matching](./graph/kuhn_maximum_bipartite_matching.html)
201202
- **Miscellaneous**
202203
- [Topological Sorting](./graph/topological-sort.html)
203204
- [Edge connectivity / Vertex connectivity](./graph/edge_vertex_connectivity.html)
