PostgreSQL Database Statistics: Computing the Statistics

Published: 2022-09-12 22:10:20
Category: Database article sharing

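The article "PostgreSQL Database Statistics: examine_attribute Single-Column Pre-Analysis" introduced vacattrstats, the array that holds one VacAttrStats statistics structure for each column to be analyzed. The step that computes the statistics uses the same structure.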

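Computing the statistics starts by looping over the columns of the table that need to be analyzed. For each column, the code takes the corresponding VacAttrStats structure, stores the array of sampled rows in stats->rows, sets stats->tupDesc to onerel->rd_att, and calls the compute_stats function that was chosen for the column's type. If the n_distinct option has been specified for the column, its value then replaces the computed VacAttrStats->stadistinct.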

  /*
   * Compute the statistics.  Temporary results during the calculations for
   * each column are stored in a child context.  The calc routines are
   * responsible to make sure that whatever they store into the VacAttrStats
   * structure is allocated in anl_context.
   */
  if (numrows > 0) {
    MemoryContext col_context, old_context;

    col_context = AllocSetContextCreate(anl_context, "Analyze Column", ALLOCSET_DEFAULT_SIZES);
    old_context = MemoryContextSwitchTo(col_context);
    for (i = 0; i < attr_cnt; i++) { /* loop over the columns to be analyzed */
      VacAttrStats *stats = vacattrstats[i];
      AttributeOpts *aopt;

      stats->rows = rows;
      stats->tupDesc = onerel->rd_att;
      stats->compute_stats(stats, std_fetch_func, numrows, totalrows);

      /* If the appropriate flavor of the n_distinct option is specified, override with the corresponding value. */
      aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);
      if (aopt != NULL) {
        float8 n_distinct;

        /* inh set (inheritance tree being analyzed): use n_distinct_inherited; otherwise use n_distinct */
        n_distinct = inh ? aopt->n_distinct_inherited : aopt->n_distinct;
        if (n_distinct != 0.0) /* nonzero means the option was set, so override directly */
          stats->stadistinct = n_distinct;
      }
      MemoryContextResetAndDeleteChildren(col_context);
    }
    if (hasindex)
      compute_index_stats(onerel, totalrows, indexdata, nindexes, rows, numrows, col_context);
    MemoryContextSwitchTo(old_context);
    MemoryContextDelete(col_context);
    /* ... the rest of the if block is not shown in this excerpt ... */
  }
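The n_distinct override in the loop can be isolated as a small stand-alone function. The following is only a minimal sketch of that rule: effective_ndistinct and its parameter names are hypothetical and do not exist in PostgreSQL, and plain double stands in for float8.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of PostgreSQL: returns the value that would
 * end up in stadistinct after the override step shown in the excerpt.
 *
 *   computed              - value produced by the compute_stats routine
 *   inh                   - true when the inheritance tree is being analyzed
 *   n_distinct            - per-column option for the table itself (0 = unset)
 *   n_distinct_inherited  - per-column option for the inheritance case (0 = unset)
 */
double effective_ndistinct(double computed, bool inh,
                           double n_distinct, double n_distinct_inherited)
{
    /* Pick the flavor of the option that matches the kind of analysis. */
    double opt = inh ? n_distinct_inherited : n_distinct;

    /* Zero means "option not set": keep the computed estimate. */
    return (opt != 0.0) ? opt : computed;
}
```

For example, with a computed estimate of 42 and n_distinct set to -0.5, a non-inherited analysis would yield -0.5, while an inherited analysis with n_distinct_inherited left at 0 would keep 42.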

AnalyzeAttrFetchFunc

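AnalyzeAttrFetchFunc is the callback type that compute_stats uses to fetch the value of the relevant column from a sampled row. For rows sampled from a table, std_fetch_func can be used; for data sampled for an index, ind_fetch_func can be used.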

/* Standard fetch function for use by compute_stats subroutines. This exists to provide some insulation between compute_stats routines and the actual storage of the sample data. */
static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull){
  int      attnum = stats->tupattnum;
  HeapTuple  tuple = stats->rows[rownum];
  TupleDesc  tupDesc = stats->tupDesc;
  return heap_getattr(tuple, attnum, tupDesc, isNull);
}
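The insulation that the fetch callback provides can be illustrated outside PostgreSQL. The sketch below is a simplified analogy, not PostgreSQL code: SampleRow, ColumnStats, FetchFunc, row_fetch and count_nonnull are all hypothetical names, and plain int values stand in for Datum.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a sampled row with two integer columns. */
typedef struct SampleRow {
    int  values[2];
    bool isnull[2];
} SampleRow;

/* Hypothetical stand-in for VacAttrStats: which column, and which sample. */
typedef struct ColumnStats {
    int              colno;  /* 0-based column to read */
    const SampleRow *rows;   /* the sampled rows */
} ColumnStats;

/* Callback type playing the role of AnalyzeAttrFetchFunc. */
typedef int (*FetchFunc)(const ColumnStats *stats, int rownum, bool *isnull);

/* Plays the role of std_fetch_func: the only code that knows the row layout. */
int row_fetch(const ColumnStats *stats, int rownum, bool *isnull)
{
    const SampleRow *row = &stats->rows[rownum];

    *isnull = row->isnull[stats->colno];
    return row->values[stats->colno];
}

/* Plays the role of a compute_stats routine: it only sees the callback. */
int count_nonnull(const ColumnStats *stats, FetchFunc fetch, int samplerows)
{
    int nonnull = 0;

    for (int i = 0; i < samplerows; i++) {
        bool isnull;

        (void) fetch(stats, i, &isnull);  /* value unused in this sketch */
        if (!isnull)
            nonnull++;
    }
    return nonnull;
}
```

Because count_nonnull never touches SampleRow directly, a different fetch function (for example one that reads index expressions) could be passed in without changing the counting routine.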

compute_stats(VacAttrStatsP stats, AnalyzeAttrFetchFunc fetchfunc, int samplerows, double totalrows)

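When std_typanalyze is the typanalyze function for the column's type, it chooses compute_stats as follows:

If the data type has both a default equality operator (eqopr) and a default less-than operator (ltopr), the column is treated as a scalar type and can be analyzed with compute_scalar_stats.
If the data type only has an equality operator, compute_distinct_stats can be used to analyze its distinct values.
If the data type supports neither operator, only compute_trivial_stats can be used.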
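The three cases above amount to a short decision chain. The sketch below only illustrates that chain: StatsKind and choose_stats_kind are hypothetical names, and the two booleans stand in for whether a valid eqopr and ltopr were found for the type.

```c
#include <stdbool.h>

/* Hypothetical labels for the three compute_stats choices. */
typedef enum StatsKind {
    TRIVIAL_STATS,   /* stands for compute_trivial_stats  */
    DISTINCT_STATS,  /* stands for compute_distinct_stats */
    SCALAR_STATS     /* stands for compute_scalar_stats   */
} StatsKind;

/* Hypothetical helper mirroring the order in which the cases are tested. */
StatsKind choose_stats_kind(bool has_eq, bool has_lt)
{
    if (has_eq && has_lt)
        return SCALAR_STATS;    /* equality and ordering both available */
    if (has_eq)
        return DISTINCT_STATS;  /* equality only */
    return TRIVIAL_STATS;       /* no usable equality operator */
}
```

Note that in this chain a less-than operator without an equality operator still falls through to the trivial case, since both later branches require equality.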

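When the typanalyze function for the column's type is array_typanalyze (the typanalyze function for array columns), compute_array_stats is used for the analysis. The information gathered is stored in an ArrayAnalyzeExtraData structure, which is set as the extra_data member of VacAttrStats for compute_array_stats to use.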

When the typanalyze function for the column's type is ts_typanalyze (a custom typanalyze function for tsvector columns), compute_tsvector_stats is used for the analysis, and the minimum number of sample rows is set to 300 * VacAttrStats->attr->attstattarget.

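When the typanalyze function for the column's type is range_typanalyze (the typanalyze function for range columns), compute_range_stats is used for the analysis, the minimum number of sample rows is set to 300 * VacAttrStats->attr->attstattarget, and the result of range_get_typcache is set as the extra_data member of VacAttrStats for compute_range_stats to use.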

When the typanalyze function for the column's type is multirange_typanalyze (the typanalyze function for multirange columns), compute_range_stats is used for the analysis, the minimum number of sample rows is set to 300 * VacAttrStats->attr->attstattarget, and the result of multirange_get_typcache is set as the extra_data member of VacAttrStats for compute_range_stats to use.
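The minimum sample size of 300 * attstattarget is easy to work through numerically. The sketch below is an illustration under two assumptions that the text above does not state: a negative target is replaced by the default statistics target, and that default is 100. min_sample_rows and DEFAULT_STATISTICS_TARGET are hypothetical names.

```c
/* Assumed default statistics target; not taken from the excerpts above. */
#define DEFAULT_STATISTICS_TARGET 100

/*
 * Hypothetical helper: minimum number of sample rows for a column,
 * following the 300 * attstattarget rule described above.
 */
int min_sample_rows(int attstattarget)
{
    /* Assumption: a negative target means "use the default". */
    if (attstattarget < 0)
        attstattarget = DEFAULT_STATISTICS_TARGET;

    return 300 * attstattarget;
}
```

Under these assumptions a target of 100 asks for at least 30000 sample rows, and a target of 10 for at least 3000.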

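As the above shows, the statistics are computed by an analysis function matched to the characteristics of the column's type. That function fetches the relevant column values from the sampled rows through an AnalyzeAttrFetchFunc callback and applies its own algorithm to compute and analyze the statistics.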
