Content-Length: 963602 | pFad | http://github.com/tonybelloni/postgres/commit/f49842d1ee31b976c681322f76025d7732e860f3

53 Basic partition-wise join functionality. · tonybelloni/postgres@f49842d · GitHub
Skip to content

Commit f49842d

Browse files
committed
Basic partition-wise join functionality.
Instead of joining two partitioned tables in their entirety we can, if it is an equi-join on the partition keys, join the matching partitions individually. This involves teaching the planner about "other join" rels, which are related to regular join rels in the same way that other member rels are related to baserels. This can use significantly more CPU time and memory than regular join planning, because there may now be a set of "other" rels not only for every base relation but also for every join relation. In most practical cases, this probably shouldn't be a problem, because (1) it's probably unusual to join many tables each with many partitions using the partition keys for all joins and (2) if you do that scenario then you probably have a big enough machine to handle the increased memory cost of planning and (3) the resulting plan is highly likely to be better, so what you spend in planning you'll make up on the execution side. All the same, for now, turn this feature off by default. Currently, we can only perform joins between two tables whose partitioning schemes are absolutely identical. It would be nice to cope with other scenarios, such as extra partitions on one side or the other with no match on the other side, but that will have to wait for a future patch. Ashutosh Bapat, reviewed and tested by Rajkumar Raghuwanshi, Amit Langote, Rafia Sabih, Thomas Munro, Dilip Kumar, Antonin Houska, Amit Khandekar, and by me. A few final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com Discussion: http://postgr.es/m/CAFjFpRcitjfrULr5jfuKWRPsGUX0LQ0k8-yG0Qw2+1LBGNpMdw@mail.gmail.com
1 parent fe9ba28 commit f49842d

34 files changed

+4089
-140
lines changed

contrib/postgres_fdw/expected/postgres_fdw.out

+120
Original file line numberDiff line numberDiff line change
@@ -7346,3 +7346,123 @@ AND ftoptions @> array['fetch_size=60000'];
73467346
(1 row)
73477347

73487348
ROLLBACK;
7349+
-- ===================================================================
7350+
-- test partition-wise-joins
7351+
-- ===================================================================
7352+
SET enable_partition_wise_join=on;
7353+
CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
7354+
CREATE TABLE fprt1_p1 (LIKE fprt1);
7355+
CREATE TABLE fprt1_p2 (LIKE fprt1);
7356+
INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
7357+
INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
7358+
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
7359+
SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
7360+
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
7361+
SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
7362+
ANALYZE fprt1;
7363+
ANALYZE fprt1_p1;
7364+
ANALYZE fprt1_p2;
7365+
CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
7366+
CREATE TABLE fprt2_p1 (LIKE fprt2);
7367+
CREATE TABLE fprt2_p2 (LIKE fprt2);
7368+
INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
7369+
INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
7370+
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
7371+
SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
7372+
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
7373+
SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
7374+
ANALYZE fprt2;
7375+
ANALYZE fprt2_p1;
7376+
ANALYZE fprt2_p2;
7377+
-- inner join three tables
7378+
EXPLAIN (COSTS OFF)
7379+
SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
7380+
QUERY PLAN
7381+
--------------------------------------------------------------------------------------------------------------------
7382+
Sort
7383+
Sort Key: t1.a, t3.c
7384+
-> Append
7385+
-> Foreign Scan
7386+
Relations: ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)) INNER JOIN (public.ftprt1_p1 t3)
7387+
-> Foreign Scan
7388+
Relations: ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)) INNER JOIN (public.ftprt1_p2 t3)
7389+
(7 rows)
7390+
7391+
SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
7392+
a | b | c
7393+
-----+-----+------
7394+
0 | 0 | 0000
7395+
150 | 150 | 0003
7396+
250 | 250 | 0005
7397+
400 | 400 | 0008
7398+
(4 rows)
7399+
7400+
-- left outer join + nullable clasue
7401+
EXPLAIN (COSTS OFF)
7402+
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
7403+
QUERY PLAN
7404+
-----------------------------------------------------------------------------------
7405+
Sort
7406+
Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c
7407+
-> Append
7408+
-> Foreign Scan
7409+
Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
7410+
(5 rows)
7411+
7412+
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
7413+
a | b | c
7414+
---+---+------
7415+
0 | 0 | 0000
7416+
2 | |
7417+
4 | |
7418+
6 | 6 | 0000
7419+
8 | |
7420+
(5 rows)
7421+
7422+
-- with whole-row reference
7423+
EXPLAIN (COSTS OFF)
7424+
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
7425+
QUERY PLAN
7426+
---------------------------------------------------------------------------------
7427+
Sort
7428+
Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
7429+
-> Append
7430+
-> Foreign Scan
7431+
Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
7432+
-> Foreign Scan
7433+
Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
7434+
(7 rows)
7435+
7436+
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
7437+
t1 | t2
7438+
----------------+----------------
7439+
(0,0,0000) | (0,0,0000)
7440+
(150,150,0003) | (150,150,0003)
7441+
(250,250,0005) | (250,250,0005)
7442+
(400,400,0008) | (400,400,0008)
7443+
(4 rows)
7444+
7445+
-- join with lateral reference
7446+
EXPLAIN (COSTS OFF)
7447+
SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
7448+
QUERY PLAN
7449+
---------------------------------------------------------------------------------
7450+
Sort
7451+
Sort Key: t1.a, t1.b
7452+
-> Append
7453+
-> Foreign Scan
7454+
Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
7455+
-> Foreign Scan
7456+
Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
7457+
(7 rows)
7458+
7459+
SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
7460+
a | b
7461+
-----+-----
7462+
0 | 0
7463+
150 | 150
7464+
250 | 250
7465+
400 | 400
7466+
(4 rows)
7467+
7468+
RESET enable_partition_wise_join;

contrib/postgres_fdw/sql/postgres_fdw.sql

+53
Original file line numberDiff line numberDiff line change
@@ -1764,3 +1764,56 @@ WHERE ftrelid = 'table30000'::regclass
17641764
AND ftoptions @> array['fetch_size=60000'];
17651765

17661766
ROLLBACK;
1767+
1768+
-- ===================================================================
1769+
-- test partition-wise-joins
1770+
-- ===================================================================
1771+
SET enable_partition_wise_join=on;
1772+
1773+
CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
1774+
CREATE TABLE fprt1_p1 (LIKE fprt1);
1775+
CREATE TABLE fprt1_p2 (LIKE fprt1);
1776+
INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
1777+
INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
1778+
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
1779+
SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
1780+
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
1781+
SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
1782+
ANALYZE fprt1;
1783+
ANALYZE fprt1_p1;
1784+
ANALYZE fprt1_p2;
1785+
1786+
CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
1787+
CREATE TABLE fprt2_p1 (LIKE fprt2);
1788+
CREATE TABLE fprt2_p2 (LIKE fprt2);
1789+
INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
1790+
INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
1791+
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
1792+
SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
1793+
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
1794+
SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
1795+
ANALYZE fprt2;
1796+
ANALYZE fprt2_p1;
1797+
ANALYZE fprt2_p2;
1798+
1799+
-- inner join three tables
1800+
EXPLAIN (COSTS OFF)
1801+
SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
1802+
SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
1803+
1804+
-- left outer join + nullable clasue
1805+
EXPLAIN (COSTS OFF)
1806+
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
1807+
SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
1808+
1809+
-- with whole-row reference
1810+
EXPLAIN (COSTS OFF)
1811+
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
1812+
SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
1813+
1814+
-- join with lateral reference
1815+
EXPLAIN (COSTS OFF)
1816+
SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
1817+
SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
1818+
1819+
RESET enable_partition_wise_join;

doc/src/sgml/config.sgml

+20
Original file line numberDiff line numberDiff line change
@@ -3632,6 +3632,26 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
36323632
</listitem>
36333633
</varlistentry>
36343634

3635+
<varlistentry id="guc-enable-partition-wise-join" xreflabel="enable_partition_wise_join">
3636+
<term><varname>enable_partition_wise_join</varname> (<type>boolean</type>)
3637+
<indexterm>
3638+
<primary><varname>enable_partition_wise_join</> configuration parameter</primary>
3639+
</indexterm>
3640+
</term>
3641+
<listitem>
3642+
<para>
3643+
Enables or disables the query planner's use of partition-wise join,
3644+
which allows a join between partitioned tables to be performed by
3645+
joining the matching partitions. Partition-wise join currently applies
3646+
only when the join conditions include all the partition keys, which
3647+
must be of the same data type and have exactly matching sets of child
3648+
partitions. Because partition-wise join planning can use significantly
3649+
more CPU time and memory during planning, the default is
3650+
<literal>off</>.
3651+
</para>
3652+
</listitem>
3653+
</varlistentry>
3654+
36353655
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
36363656
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
36373657
<indexterm>

doc/src/sgml/fdwhandler.sgml

+20
Original file line numberDiff line numberDiff line change
@@ -1292,6 +1292,26 @@ ShutdownForeignScan(ForeignScanState *node);
12921292
</para>
12931293
</sect2>
12941294

1295+
<sect2 id="fdw-callbacks-reparameterize-paths">
1296+
<title>FDW Routines For reparameterization of paths</title>
1297+
1298+
<para>
1299+
<programlisting>
1300+
List *
1301+
ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
1302+
RelOptInfo *child_rel);
1303+
</programlisting>
1304+
This function is called while converting a path parameterized by the
1305+
top-most parent of the given child relation <literal>child_rel</> to be
1306+
parameterized by the child relation. The function is used to reparameterize
1307+
any paths or translate any expression nodes saved in the given
1308+
<literal>fdw_private</> member of a <structname>ForeignPath</>. The
1309+
callback may use <literal>reparameterize_path_by_child</>,
1310+
<literal>adjust_appendrel_attrs</> or
1311+
<literal>adjust_appendrel_attrs_multilevel</> as required.
1312+
</para>
1313+
</sect2>
1314+
12951315
</sect1>
12961316

12971317
<sect1 id="fdw-helpers">

src/backend/optimizer/README

+26
Original file line numberDiff line numberDiff line change
@@ -1075,3 +1075,29 @@ be desirable to postpone the Gather stage until as near to the top of the
10751075
plan as possible. Expanding the range of cases in which more work can be
10761076
pushed below the Gather (and costing them accurately) is likely to keep us
10771077
busy for a long time to come.
1078+
1079+
Partition-wise joins
1080+
--------------------
1081+
A join between two similarly partitioned tables can be broken down into joins
1082+
between their matching partitions if there exists an equi-join condition
1083+
between the partition keys of the joining tables. The equi-join between
1084+
partition keys implies that all join partners for a given row in one
1085+
partitioned table must be in the corresponding partition of the other
1086+
partitioned table. Because of this the join between partitioned tables to be
1087+
broken into joins between the matching partitions. The resultant join is
1088+
partitioned in the same way as the joining relations, thus allowing an N-way
1089+
join between similarly partitioned tables having equi-join condition between
1090+
their partition keys to be broken down into N-way joins between their matching
1091+
partitions. This technique of breaking down a join between partition tables
1092+
into join between their partitions is called partition-wise join. We will use
1093+
term "partitioned relation" for either a partitioned table or a join between
1094+
compatibly partitioned tables.
1095+
1096+
The partitioning properties of a partitioned relation are stored in its
1097+
RelOptInfo. The information about data types of partition keys are stored in
1098+
PartitionSchemeData structure. The planner maintains a list of canonical
1099+
partition schemes (distinct PartitionSchemeData objects) so that RelOptInfo of
1100+
any two partitioned relations with same partitioning scheme point to the same
1101+
PartitionSchemeData object. This reduces memory consumed by
1102+
PartitionSchemeData objects and makes it easy to compare the partition schemes
1103+
of joining relations.

src/backend/optimizer/geqo/geqo_eval.c

+3
Original file line numberDiff line numberDiff line change
@@ -264,6 +264,9 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
264264
/* Keep searching if join order is not valid */
265265
if (joinrel)
266266
{
267+
/* Create paths for partition-wise joins. */
268+
generate_partition_wise_join_paths(root, joinrel);
269+
267270
/* Create GatherPaths for any useful partial paths for rel */
268271
generate_gather_paths(root, joinrel);
269272

0 commit comments

Comments
 (0)








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/tonybelloni/postgres/commit/f49842d1ee31b976c681322f76025d7732e860f3

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy