Database Performance Tuning Tutorial – DB Statistics
OptimizDBA database statistic

Database performance tuning entails a plethora of tasks and processes that ensure the optimal functioning of the DB. Identifying design flaws, improving query performance, monitoring hardware usage is some of the tasks DBAs and DB developers undertake during the tuning process. This tutorial is part of the series where I cover the breadth of database optimization techniques in detail. DB Statistics is one of the vital parameters that help us get an overview of database health. Today, we will look at how we can extract the statistics from various databases and use them. 

Statistics and Database Performance Tuning

You might have a question here:

Does DB Statistics help in improving database performance?

The answer to that is it does help with improving the database performance but just not directly. When we are optimizing the database, we need to collect as much of its data as possible. This data has the capability to some critical questions we pose during the optimization process such as: 

  1. How many rows are present in a table?
  2. How are the indexes created, and what are the relationships between the indexes in the DB? 
  3. How many distinct values do a column hold, and more? 

Thus, ensuring that these statistics are being collected is imperative. 

Composition of DB Statistics across Databases 

The statistics collected by databases are very similar to each other. They mostly contain information about the tables like the indexes, column details, table metadata, and more. Here let’s look at the contents of some DB Statistics. 

1. Statistics for MS SQL Server

Example of a Statistics Object in a SQL Server database. Source: Database Journal

SQL Server creates and maintains a collection of statistics objects. Each of these objects is a histogram that provides information about column value distribution, distinct column values, etc. These objects can represent both single or multi-column information. However, for multi-column values, it also stores correlation information among columns. 

2. Stats stored in MySQL DB

The MySQL Database stores all the table related statistics in the STATISTICS table. Similar to the SQL Server, the Statistics table holds information such as: 

  1. Cardinality – An estimated number of unique values within a given index.
  2. Collation – How the values in an index are stored. For example, are they in ascending or descending order?
  3. Sub Part – It holds the number of indexed characters in the case of partially indexed columns. For columns that are completely indexed, the value will be NULL. 

The MySQL database also stores a few other statistics regarding the table such as query latency times and query fetch times, as well as the total number of rows fetched. 

3. Oracle DB statistics

The Usage of Statistics in Oracle DB. Source: Oracle DB Documentation

The Oracle DB stores multiple types of statistics within the database. These statistics are used mainly for optimizing query executions. Following are the types of statistics gathered: 

  1. Table Statistics – We can extract the table metadata from these stats like the average row length, number of rows and blocks, etc. 
  2. Column Statistics – The values in each column are analyzed to build stats like the total number of distinct values in a given column, or the number of null values. It also provides a histogram of the values that are present in the column. 
  3. Index Statistics – We all know the significance of database indexes for fast query processing. Oracle DB stores stats such as the Index clustering factor, the number of index levels, and more.
  4. System Statistics – The system stats such as CPU utilization, I/O performance, and their utilization are gathered and stored as well. System statistics are important to understand the overall pressure on the database server and ensure that it the load doesn’t increase beyond its operating capacity.

Where are these stats used? 

We talked about how these DB statistics are used for database performance tuning processes. But who and when are these stats used? 

There are many different entities that use these statistics. Some entities like the database optimizer use these for finding the best possible way to execute the query or even the fastest way to execute them. Optimizers rely on these statistics and they must be gathered and present. Other entities like DBAs use the statistics for Database Administration tasks.  For example, the hardware resource usage statistics are used by DBAs to check if the database is under bottlenecks and adjust them accordingly. 

On the other hand, general statistics captured by the databases are used by database administrators, DB developers, and data analysts for maintenance and performance tracking purposes. General statistics include server information like concurrency, throughput, and even response times. These stats are often used while taking decisions like increasing resources, tuning queries, and tracking performance metrics over time. 

Fetching DB Statistics

When it comes to statistics what is stored and when depends on the database itself. Each database also provides its own set of interfaces through queries or tools to fetch these statistics. In this section, let us look at how we can fetch statistics for various databases.

1. MySQL Database

For the MySQL database, we can use queries to fetch the statistics. 

Global statuses are statistics about the database and the interactions with it. We can use the “ SHOW GLOBAL STATUS” statement to fetch various types of information. For example, if we were to fetch all the connection error statistics, we can use the following command: 

SHOW GLOBAL STATUS LIKE ‘%Connection_errors%’

And the image below shows a sample response: 

Image source: Collect MySQL Statistics

In order to fetch optimizer statistics, we can query the “INFORMATION_SCHEMA.COLLATIONS”  table. An example query to fetch histogram on column statistics: 

SELECT TABLE_NAME, COLUMN_NAME,

          HISTOGRAM->>’$.”data-type”‘ AS ‘data-type’,

         JSON_LENGTH(HISTOGRAM->>’$.”buckets”‘) AS ‘bucket-count’

         FROM INFORMATION_SCHEMA.COLUMN_STATISTICS;

Here we use the HISTOGRAM keyword to fetch the data we are looking for. The image below shows a sample output for the query above. 

Image Source: MySQL Reference Manual

2. Oracle Database

The Oracle database provides a package called “dbms_stats” which allows us to fetch and view the various stats it gathers. 

We can write simple procedures to gather the stats we need. For example, here is a simple procedure to fetch all stats from a table:

BEGIN

dbms_stats.gather_table_stats(‘SH’,”TREASURY’);

END;

For the Oracle database, you start and stop the gathering workloads using appropriate procedures. Here’s an example on that: 

dbms_stats.gather_system_stats(‘start’) 

dbms_stats.gather_system_stats(‘stop’)

Image Source: Oracle Blogs

The image above shows a query being executed after a procedure to start gathering data has been executed. 

Conclusion

Databases collect many different types of statistics from the time they are set up. From information about tables, the columns, values in each column, to even hardware usage on which the database server is running are captured. While some data is used regularly by the databases like the index  statistics by the query optimizer, other types of collected data are used by DBAs during database performance tuning operations. 

Some databases like MySQL provide in-built tools where you can directly query for DB or index statistics while databases like Oracle DB required packages to do the same. Overall these statistics are very vital to monitor database performance and are also used as diagnostic tools when things go wrong.