0%

Block Nested Loop Join

引言

在数据库中,join 是经常使用的操作,但其工作量也可能是很大的。那么如何高效地完成 join 操作呢?

这篇 blog 将介绍:

  • 常见的 join 实现方式
  • 如何在 Postgresql 源代码中实现 block nested loop join

常见 join 算法

Simple Nested-Loop Join

简单来说,Simple Nested-Loop Join 就是一个双层 for 循环 ,通过循环外层表的行数据,逐个与内层表的所有行数据进行比较来获取结果。

用伪代码描述大致如下:

1
2
3
4
5
6
7
8
9
for (each tuple i in outer relation) {
for (each tuple j in inner relation) {
if (join_condition(tuple i, tuple j) is true)
emit (tuple i, tuple j)
else
continue
}
}

特点:

Nested-Loop Join 简单粗暴容易理解,就是通过双层循环比较数据来获得结果,但是这种算法显然太过于粗鲁,如果每个表有1万条数据,那么对数据比较的次数=1万 * 1万 = 1亿次,很显然这种查询效率会非常慢。

当然,数据库肯定不会这么粗暴地 join,所以就出现了后面的两种对Nested-Loop Join 优化算法。在执行 join 查询时,数据库会根据情况选择后面的两种 join 优化算法的进行join查询。


Index Nested-Loop Join

Index Nested-Loop Join 的优化思路主要是:减少内层表数据的匹配次数。

简单来说,Index Nested-Loop Join 就是通过外层表匹配条件,直接与内层表索引进行匹配,避免和内层表的每条记录去进行比较,这样极大的减少了对内层表的匹配次数,从原来的 “匹配次数=外层表行数 * 内层表行数”,变成了 “外层表的行数 * 内层表索引的高度”,极大的提升了 join 的性能。

Index Nested-Loop Join

但是,Index Nested-Loop Join 算法的前提是:匹配的字段必须建立了索引。


Block Nested-Loop join

Block Nested-Loop Join 其优化思路是:减少内层表的扫表次数。

Block Nested-Loop Join 算法意在通过一次性缓存外层表的多条数据,以此来减少内层表的扫表次数,从而达到提升性能的目的。

Block Nested-Loop Join

如果无法使用 Index Nested-Loop Join 时候,一般都会使用 Block Nested-Loop Join 算法。

PostgreSQL 实验

准备工作

  1. 下载 postgresql 源代码:

    https://www.postgresql.org/ftp/source/v12.0/

  2. 在本地编译并安装 postgresql:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    # 切换到源代码目录
    cd postgresql-12.0

    # 初始化 make 配置
    /configure --enable-depend --enable-cassert --enable-debug CFLAGS="-O0" --prefix=$HOME/pgsql

    # 编译
    make

    # 安装(一般会安装到$HOME/pgsql)
    make install
  3. 初始化本地 postgresql 服务器:

    若你不幸把 pgsql 装到别的位置,把 $HOME/pgsql 替换成你安装的路径。

    1
    $HOME/pgsql/bin/initdb -D $HOME/pgsql/data --locale=C
  4. 启动 postgresql 服务器:

    1
    /home/[guanz]/pgsql/bin/pg_ctl -D /home/[guanz]/pgsql/data -l logfile start
  5. 连接 postgresql 服务器:

    1
    $HOME/pgsql/bin/psql postgres
  6. 关闭 postgresql 服务器:

    “Ctrl + D” 和 \q 均可以退出 postgresql 命令行。

    1
    2
    # This command depends on your installation location, please modify it
    /home/[guanz]/pgsql/bin/pg_ctl -D /home/[guanz]/pgsql/data stop
  7. 加载测试数据:

    1
    2
    3
    postgres=# CREATE DATABASE similarity;
    postgres-# \c similarity
    postgres-# \i /home/[guanz/data/]similarity_data.sql

    select_example


源代码解读

虽然源代码很多,但我们只关注以下几个文件:

  • src/backend/executor/nodeNestloop.c
  • src/include/executor/nodeNestloop.h
  • src/include/nodes/execnodes.h

Nested-Loop 主要算法都在 nodeNestloop.c 中实现:

其中 ExecNestLoop() 是执行嵌套循环连接,其余两个 ExecInitNestLoop() 和 ExecEndNestLoop() 则分别完成初始化和终止。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
/*
* xht03:
* ExecNestLoop 是迭代器函数,不是一次执行完所有循环,
* 而是每次循环时执行一次的函数
*/
static TupleTableSlot *
ExecNestLoop(PlanState *pstate)
{
/*
* xht03:
* node 指的是查询计划树中的一个节点。
* 在这段代码中,NestLoopState *node 是
* 一个指向 NestLoopState 结构体的指针,
* 这个结构体通常包含了执行Nested Loop Join 所需要的状态信息。
*/
NestLoopState *node = castNode(NestLoopState, pstate);
NestLoop *nl;

/*
* xht03:
* Plan 指的是查询计划,
* 它是数据库查询优化器生成的一种数据结构,
* 用于描述如何执行一个SQL查询。
* 查询计划是由一系列的操作步骤(或称为操作符)组成的树状结构,
* 每一个操作步骤都对应查询计划树中的一个节点。
*/
PlanState *innerPlan;
PlanState *outerPlan;

/*
* xht03:
* slot 意为“槽”
* outerTupleSlot和innerTupleSlot是用来:
* 存储从外部/内部关系(或称为左/右表)获取的当前元组
*/
TupleTableSlot *outerTupleSlot;
TupleTableSlot *innerTupleSlot;

// 连接条件
// 其他条件(比如:where后面跟的表达式)
ExprState *joinqual;
ExprState *otherqual;

// 存储执行过程中的上下文
ExprContext *econtext;
ListCell *lc;

// 检查中断(不用管)
CHECK_FOR_INTERRUPTS();

/*
* get information from the node
*/
ENL1_printf("getting info from node");

nl = (NestLoop *) node->js.ps.plan;
joinqual = node->js.joinqual;
otherqual = node->js.ps.qual;
outerPlan = outerPlanState(node); // 记录外表所在的tuple
innerPlan = innerPlanState(node); // 记录内表所在的tuple
econtext = node->js.ps.ps_ExprContext;

/*
* Reset per-tuple memory context to free any expression evaluation
* storage allocated in the previous tuple cycle.
*/
ResetExprContext(econtext);

/*
* Ok, everything is setup for the join so now loop until we return a
* qualifying join tuple.
*/
ENL1_printf("entering main loop");

for (;;)
{
/*
* xht03:
* 当需要新的外部元组时,获取下一个的外部元组,
* 并让内表从头开始扫描(重置)
*/
if (node->nl_NeedNewOuter)
{
ENL1_printf("getting new outer tuple");
outerTupleSlot = ExecProcNode(outerPlan);

/*
* xht03:
* 如果下一个外部元组为空,则 join 完成了
*/
if (TupIsNull(outerTupleSlot))
{
ENL1_printf("no outer tuple, ending join");
return NULL;
}

ENL1_printf("saving new outer tuple information");
econtext->ecxt_outertuple = outerTupleSlot;
node->nl_NeedNewOuter = false;
node->nl_MatchedOuter = false;

/*
* fetch the values of any outer Vars that must be passed to the
* inner scan, and store them in the appropriate PARAM_EXEC slots.
*
* xht03:
* 传递一些参数,不用管,也不要改
*/
foreach(lc, nl->nestParams)
{
NestLoopParam *nlp = (NestLoopParam *) lfirst(lc);
int paramno = nlp->paramno;
ParamExecData *prm;

prm = &(econtext->ecxt_param_exec_vals[paramno]);
/* Param value should be an OUTER_VAR var */
Assert(IsA(nlp->paramval, Var));
Assert(nlp->paramval->varno == OUTER_VAR);
Assert(nlp->paramval->varattno > 0);
prm->value = slot_getattr(outerTupleSlot,
nlp->paramval->varattno,
&(prm->isnull));
/* Flag parameter value as changed */
innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
paramno);
}

/*
* now rescan the inner plan
*/
ENL1_printf("rescanning inner plan");
ExecReScan(innerPlan);
}

/*
* we have an outerTuple, try to get the next inner tuple.
*/
ENL1_printf("getting new inner tuple");

innerTupleSlot = ExecProcNode(innerPlan);
econtext->ecxt_innertuple = innerTupleSlot;

if (TupIsNull(innerTupleSlot))
{
ENL1_printf("no inner tuple, need new outer tuple");

node->nl_NeedNewOuter = true;

if (!node->nl_MatchedOuter &&
(node->js.jointype == JOIN_LEFT ||
node->js.jointype == JOIN_ANTI))
{
/*
* We are doing an outer join and there were no join matches
* for this outer tuple. Generate a fake join tuple with
* nulls for the inner tuple, and return it if it passes the
* non-join quals.
*
* xht03:
* 你可能也发现了:外部元组和内部元组的值,除了会存进 outerTupleSlot
* 和 innerTupleSlot,还要存进上下文 econtext 里。
* 这是因为:ExecQual() 判断是否满足等式条件的函数,是使用上下文中
* 存储的内外部元组信息进行判断的。
* 这样的函数还有很多,所以一定要记得更新上下文
*/
econtext->ecxt_innertuple = node->nl_NullInnerTupleSlot;

ENL1_printf("testing qualification for outer-join tuple");

if (otherqual == NULL || ExecQual(otherqual, econtext))
{
/*
* qualification was satisfied so we project and return
* the slot containing the result tuple using
* ExecProject().
*/
ENL1_printf("qualification succeeded, projecting tuple");
return ExecProject(node->js.ps.ps_ProjInfo);
}
else
InstrCountFiltered2(node, 1);
}

/*
* Otherwise just return to top of loop for a new outer tuple.
*/
continue;
}

/*
* at this point we have a new pair of inner and outer tuples so we
* test the inner and outer tuples to see if they satisfy the node's
* qualification.
*
* Only the joinquals determine MatchedOuter status, but all quals
* must pass to actually return the tuple.
*/
ENL1_printf("testing qualification");

if (ExecQual(joinqual, econtext))
{
node->nl_MatchedOuter = true;

/* In an antijoin, we never return a matched tuple */
if (node->js.jointype == JOIN_ANTI)
{
node->nl_NeedNewOuter = true;
continue; /* return to top of loop */
}

/*
* If we only need to join to the first matching inner tuple, then
* consider returning this one, but after that continue with next
* outer tuple.
*/
if (node->js.single_match)
node->nl_NeedNewOuter = true;

if (otherqual == NULL || ExecQual(otherqual, econtext))
{
/*
* qualification was satisfied so we project and return the
* slot containing the result tuple using ExecProject().
*/
ENL1_printf("qualification succeeded, projecting tuple");

return ExecProject(node->js.ps.ps_ProjInfo);
}
else
InstrCountFiltered2(node, 1);
}
else
InstrCountFiltered1(node, 1);

/*
* Tuple fails qual, so free per-tuple memory and try again.
*/
ResetExprContext(econtext);

ENL1_printf("qualification failed, looping");
}
}

/* ----------------------------------------------------------------
* ExecInitNestLoop
* ----------------------------------------------------------------
*/
NestLoopState *
ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
{
NestLoopState *nlstate;

/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));

NL1_printf("ExecInitNestLoop: %s\n",
"initializing node");

/*
* create state structure
*/
nlstate = makeNode(NestLoopState);
nlstate->js.ps.plan = (Plan *) node;
nlstate->js.ps.state = estate;
nlstate->js.ps.ExecProcNode = ExecNestLoop;

/*
* Miscellaneous initialization
*
* create expression context for node
*/
ExecAssignExprContext(estate, &nlstate->js.ps);

/*
* initialize child nodes
*
* If we have no parameters to pass into the inner rel from the outer,
* tell the inner child that cheap rescans would be good. If we do have
* such parameters, then there is no point in REWIND support at all in the
* inner child, because it will always be rescanned with fresh parameter
* values.
*/
outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
if (node->nestParams == NIL)
eflags |= EXEC_FLAG_REWIND;
else
eflags &= ~EXEC_FLAG_REWIND;
innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);

/*
* Initialize result slot, type and projection.
*/
ExecInitResultTupleSlotTL(&nlstate->js.ps, &TTSOpsVirtual);
ExecAssignProjectionInfo(&nlstate->js.ps, NULL);

/*
* initialize child expressions
*/
nlstate->js.ps.qual =
ExecInitQual(node->join.plan.qual, (PlanState *) nlstate);
nlstate->js.jointype = node->join.jointype;
nlstate->js.joinqual =
ExecInitQual(node->join.joinqual, (PlanState *) nlstate);

/*
* detect whether we need only consider the first matching inner tuple
*/
nlstate->js.single_match = (node->join.inner_unique ||
node->join.jointype == JOIN_SEMI);

/* set up null tuples for outer joins, if needed */
switch (node->join.jointype)
{
case JOIN_INNER:
case JOIN_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
nlstate->nl_NullInnerTupleSlot =
ExecInitNullTupleSlot(estate,
ExecGetResultType(innerPlanState(nlstate)),
&TTSOpsVirtual);
break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) node->join.jointype);
}

/*
* finally, wipe the current outer tuple clean.
*/
nlstate->nl_NeedNewOuter = true;
nlstate->nl_MatchedOuter = false;

NL1_printf("ExecInitNestLoop: %s\n",
"node initialized");

return nlstate;
}

/* ----------------------------------------------------------------
* ExecEndNestLoop
*
* closes down scans and frees allocated storage
* ----------------------------------------------------------------
*/
void
ExecEndNestLoop(NestLoopState *node)
{
NL1_printf("ExecEndNestLoop: %s\n",
"ending node processing");

/*
* Free the exprcontext
*/
ExecFreeExprContext(&node->js.ps);

/*
* clean out the tuple table
*/
ExecClearTuple(node->js.ps.ps_ResultTupleSlot);

/*
* close down subplans
*/
ExecEndNode(outerPlanState(node));
ExecEndNode(innerPlanState(node));

NL1_printf("ExecEndNestLoop: %s\n",
"node processing ended");
}

/* ----------------------------------------------------------------
* ExecReScanNestLoop
* ----------------------------------------------------------------
*/
void
ExecReScanNestLoop(NestLoopState *node)
{
PlanState *outerPlan = outerPlanState(node);

/*
* If outerPlan->chgParam is not null then plan will be automatically
* re-scanned by first ExecProcNode.
*/
if (outerPlan->chgParam == NULL)
ExecReScan(outerPlan);

/*
* innerPlan is re-scanned for each new outer tuple and MUST NOT be
* re-scanned from here or you'll get troubles from inner index scans when
* outer Vars are used as run-time keys...
*/

node->nl_NeedNewOuter = true;
node->nl_MatchedOuter = false;
}

源代码修改

我们主要做了一下修改:

  • 新增了结构体类型 NestedBlock ,用于存放中间缓存的外部元组。

与之相应地:

  • 增加了 GetNestedBlock() 函数,用于获取新的中间元组。
  • 在上下文类型 ExprContext 中增加 NestedBlock 变量,将中间缓存元组记录在上下文中。
  • 修改 ExecNestLoop() 函数:外部元组一次取出至多 NESTED_BLOCK_SIZE 个元组放入缓存 NestedBlock 中,并照旧扫描内部元组。但每次需要将缓存中的所有 outer tuple 一一与 inner tuple 进行比较,且外部元组的指针需要往后移动 NESTED_BLOCK_SIZE 个元组。

修改细节如下(修改之处都加有 xht03 注释):

execnodes.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
// xht03
#define NESTED_BLOCK_SIZE 64

typedef struct NestedBlock{
TupleTableSlot *tuple[NESTED_BLOCK_SIZE]; // 存放外部元组
bool isMatched[NESTED_BLOCK_SIZE]; // 是否在内部关系中匹配到过
int size; // 缓存中的元组个数
} NestedBlock;


typedef struct ExprContext
{
NodeTag type;

/* Tuples that Var nodes in expression may refer to */
#define FIELDNO_EXPRCONTEXT_SCANTUPLE 1
TupleTableSlot *ecxt_scantuple;
#define FIELDNO_EXPRCONTEXT_INNERTUPLE 2
TupleTableSlot *ecxt_innertuple;
#define FIELDNO_EXPRCONTEXT_OUTERTUPLE 3
TupleTableSlot *ecxt_outertuple;

// xht03
NestedBlock *ecxt_block;

/* Memory contexts for expression evaluation --- see notes above */
MemoryContext ecxt_per_query_memory;
MemoryContext ecxt_per_tuple_memory;

/* Values to substitute for Param nodes in expression */
ParamExecData *ecxt_param_exec_vals; /* for PARAM_EXEC params */
ParamListInfo ecxt_param_list_info; /* for other param types */

/*
* Values to substitute for Aggref nodes in the expressions of an Agg
* node, or for WindowFunc nodes within a WindowAgg node.
*/
#define FIELDNO_EXPRCONTEXT_AGGVALUES 8
Datum *ecxt_aggvalues; /* precomputed values for aggs/windowfuncs */
#define FIELDNO_EXPRCONTEXT_AGGNULLS 9
bool *ecxt_aggnulls; /* null flags for aggs/windowfuncs */

/* Value to substitute for CaseTestExpr nodes in expression */
#define FIELDNO_EXPRCONTEXT_CASEDATUM 10
Datum caseValue_datum;
#define FIELDNO_EXPRCONTEXT_CASENULL 11
bool caseValue_isNull;

/* Value to substitute for CoerceToDomainValue nodes in expression */
#define FIELDNO_EXPRCONTEXT_DOMAINDATUM 12
Datum domainValue_datum;
#define FIELDNO_EXPRCONTEXT_DOMAINNULL 13
bool domainValue_isNull;

/* Link to containing EState (NULL if a standalone ExprContext) */
struct EState *ecxt_estate;

/* Functions to call back when ExprContext is shut down or rescanned */
ExprContext_CB *ecxt_callbacks;
} ExprContext;

nodeNestloop.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
#ifndef NODENESTLOOP_H
#define NODENESTLOOP_H

#include "nodes/execnodes.h"


// xht03
extern void *GetNestedBlock(NestedBlock *block, PlanState *outerPlan);


extern NestLoopState *ExecInitNestLoop(NestLoop *node, EState *estate, int eflags);
extern void ExecEndNestLoop(NestLoopState *node);
extern void ExecReScanNestLoop(NestLoopState *node);

#endif /* NODENESTLOOP_H */

nodeNestloop.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
// xht03
void *GetNestedBlock(NestedBlock *block, PlanState *outerPlan)
{
for(int i = 0; i < NESTED_BLOCK_SIZE; i++)
{
block->tuple[i] = ExecProcNode(outerPlan);
block->isMatched[i] = false;
if(TupIsNull(block->tuple[i]))
{
block->size = i;
break;
}
}
}



static TupleTableSlot *
ExecNestLoop(PlanState *pstate)
{
NestLoopState *node = castNode(NestLoopState, pstate);
NestLoop *nl;
PlanState *innerPlan;
PlanState *outerPlan;
TupleTableSlot *outerTupleSlot;
TupleTableSlot *innerTupleSlot;
ExprState *joinqual;
ExprState *otherqual;
ExprContext *econtext;
ListCell *lc;

// xht03
NestedBlock block;


CHECK_FOR_INTERRUPTS();


/*
* get information from the node
*/
ENL1_printf("getting info from node");

nl = (NestLoop *) node->js.ps.plan;
joinqual = node->js.joinqual;
otherqual = node->js.ps.qual;
outerPlan = outerPlanState(node);
innerPlan = innerPlanState(node);
econtext = node->js.ps.ps_ExprContext;

/*
* Reset per-tuple memory context to free any expression evaluation
* storage allocated in the previous tuple cycle.
*/
ResetExprContext(econtext);

/*
* Ok, everything is setup for the join so now loop until we return a
* qualifying join tuple.
*/
ENL1_printf("entering main loop");

for (;;)
{
/*
* If we don't have an outer tuple, get the next one and reset the
* inner scan.
*/
if (node->nl_NeedNewOuter)
{
ENL1_printf("getting new outer tuple");
// xht03
GetNestedBlock(&block, outerPlan);


/*
* if there are no more outer tuples, then the join is complete..
*/
// xht03
if (block.size == 0)
{
ENL1_printf("no outer tuple, ending join");
return NULL;
}

ENL1_printf("saving new outer tuple information");
//econtext->ecxt_outertuple = outerTupleSlot; // 记录外表所在的tuple,更新上下文

// xht03
outerTupleSlot = block.tuple[block.size - 1];
econtext->ecxt_outertuple = outerTupleSlot;
econtext->ecxt_block = &block;
node->nl_NeedNewOuter = false;
node->nl_MatchedOuter = false;

/*
* fetch the values of any outer Vars that must be passed to the
* inner scan, and store them in the appropriate PARAM_EXEC slots.
*/
// 将外表的一些参数传递给内表
foreach(lc, nl->nestParams)
{
NestLoopParam *nlp = (NestLoopParam *) lfirst(lc);
int paramno = nlp->paramno;
ParamExecData *prm;
prm = &(econtext->ecxt_param_exec_vals[paramno]);
/* Param value should be an OUTER_VAR var */
Assert(IsA(nlp->paramval, Var));
Assert(nlp->paramval->varno == OUTER_VAR);
Assert(nlp->paramval->varattno > 0);
prm->value = slot_getattr(outerTupleSlot,
nlp->paramval->varattno,
&(prm->isnull));
/* Flag parameter value as changed */
innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
paramno);
}
/*
* now rescan the inner plan
*/
// 重新确定内表从哪里开始扫描
ENL1_printf("rescanning inner plan");
ExecReScan(innerPlan);
}


/*
* we have an outerTuple, try to get the next inner tuple.
*/
ENL1_printf("getting new inner tuple");
innerTupleSlot = ExecProcNode(innerPlan); // 获取新的内部节点(tuple)
econtext->ecxt_innertuple = innerTupleSlot; // 记录内表所在的tuple,更新上下文

if (TupIsNull(innerTupleSlot))
{
ENL1_printf("no inner tuple, need new outer tuple");
node->nl_NeedNewOuter = true;
/*
* xht03:
* If not all the tuples in the block have been matched,
* or the join type is left-join or anti-join
*/
if (!node->nl_MatchedOuter &&
(node->js.jointype == JOIN_LEFT ||
node->js.jointype == JOIN_ANTI))
{
/*
* We are doing an outer join and there were no join matches
* for this outer tuple. Generate a fake join tuple with
* nulls for the inner tuple, and return it if it passes the
* non-join quals.
*/

for(int i = 0; i < block.size; i++){
econtext->ecxt_outertuple = block.tuple[i];
if(!block.isMatched[i]){
// set to a empty tuple
econtext->ecxt_innertuple = node->nl_NullInnerTupleSlot;

// 检查非连接条件(non-join quals)的其他条件是否满足
ENL1_printf("testing qualification for outer-join tuple");
if (otherqual == NULL || ExecQual(otherqual, econtext))
{
/*
* qualification was satisfied so we project and return
* the slot containing the result tuple using
* ExecProject().
*/
ENL1_printf("qualification succeeded, projecting tuple");
return ExecProject(node->js.ps.ps_ProjInfo);
}
else
InstrCountFiltered2(node, 1);
}
}
}
/*
* Otherwise just return to top of loop for a new outer tuple.
*/
continue;
}
/*
* at this point we have a new pair of inner and outer tuples so we
* test the inner and outer tuples to see if they satisfy the node's
* qualification.
*
* Only the joinquals determine MatchedOuter status, but all quals
* must pass to actually return the tuple.
*/
ENL1_printf("testing qualification");

// 测试是否满足 join 条件
// xht03
int matched_num = 0;

for(int i = 0; i < block.size; i++){
econtext->ecxt_outertuple = block.tuple[i];
if (ExecQual(joinqual, econtext))
{
//xht03
if(!block.isMatched[i]){
matched_num++;
block.isMatched[i] = true;
}

/*
* If we only need to join to the first matching inner tuple, then
* consider returning this one, but after that continue with next
* outer tuple.
*/

if (otherqual == NULL || ExecQual(otherqual, econtext))
{
/*
* qualification was satisfied so we project and return the
* slot containing the result tuple using ExecProject().
*/
ENL1_printf("qualification succeeded, projecting tuple");
return ExecProject(node->js.ps.ps_ProjInfo);
}
else
InstrCountFiltered2(node, 1);
}
else
InstrCountFiltered1(node, 1);
}
/*
* xht03:
* If all tuples in the block have been matched
*/
if(matched_num == block.size)
{
node->nl_MatchedOuter = true;
/* In an antijoin, we never return a matched tuple */
if (node->js.jointype == JOIN_ANTI || node->js.single_match)
{
node->nl_NeedNewOuter = true;
}
}


/*
* Tuple fails qual, so free per-tuple memory and try again.
*/
ResetExprContext(econtext);
ENL1_printf("qualification failed, looping");
}
}

源代码测试

重新编译并安装 postgresql 后,设置 NESTED_BLOCK_SIZE = 1、2、8、64、128、1024,运行下面语句两次,取第二次运行时间。

在数据库命令行中输入:\timing,打开测时间功能。

1
SELECT count(*) FROM restaurantaddress ra, restaurantphone rp WHERE ra.name = rp.name;

Timing

NESTED_BLOCK_SIZE 时间1/ms 时间2/ms 时间3/ms
1 2.710 2.559 2.565
2 2.579 2.753 2.636
8 2.337 2.483 2.716
64 2.277 2.441 1.896
128 2.657 2.543 2.464
1024 2.871 2.743 2.778

这样的实验结果是符合预期的:

  • NESTED_BLOCK_SIZE 太小时,与 Simple Nested-Loop 相近;当 NESTED_BLOCK_SIZE 太大时,就相当于内外表互换后的 Simple Nested-Loop。所以当 NESTED_BLOCK_SIZE 取一个大小适中的值时,Block Nested-Loop 才能性能最大化。