23 Aug

CH-BenCHmark for MS SQL Server – HTAP benchmarking

There aren’t many benchmarks which allow you to test your systems against a hybrid OLTP and OLAP workloads. CH-BenCHmark fills that gap by combining TPC-C and TPC-H. You can download the source from the linked site or you can use something like OLTPBench (a collection of benchmarks). However, the TPC-H modified queries are not written for SQL server. In this article, I will add the modified CH-BenCHmark OLAP queries for SQL Server.

Read More
18 Jul

Index Physical Structure Example; Multi-column Non-Clustered Index with Includes

Structure of a non-clustered multi-column index with include columns.

This article demonstrates the physical design of a multi-column non-clustered index with include-columns. Many examples on the internet only demonstrate the most simple version of an index with a single column. This article gives a proper view of an index with multiple columns through a simple example. Furthermore, you can see how the include-columns are stored, only at the leaf level of the tree.

Here we use a simple table ‘People’ with 6 columns (ID, First Name, Last Name, Age, Sex, Address). We assume we already have a clustered index created on the ID column (it will be almost no difference if there is no clustered index as well, explained at the end). Now we are going to create the non-clustered index as defined below.

CREATE NONCLUSTERED INDEX IX_NAME ON People
(FirstName, LastName)
INCLUDE (Age, Sex)
GO

Below diagram shows the structure of this non-clustered index.

Structure of a non-clustered multi-column index with include columns.
Read More
31 Mar

Microsoft SQL Server 2016 Database with IMDB 2013 Dataset

Microsoft SQL Server 2016 Database with IMDB 2013 Dataset

Recently I wanted to run the JOB benchmark for an experiment. This benchmark uses an IMDB dataset, published in 2013. Initially, I had some trouble running the benchmark as it was designed for a PostgreSQL database. And the dataset was created in a UNIX system which can create issues when used in a Windows system. So I decided to share the exact steps you need to take to take in order to create a Microsoft SQL Server database with IMDB dataset. All the scripts used in the project can be found in this Git repo.

Read More
27 Jan

What-if Analysis with SQL server (Hypothetical Indexes) – Using python

What-if Analysis with SQL server (Hypothetical Indexes) – Using python

If you are a Database administrator or a developer working with a transaction database, you might have come across this problem

“Is it worthy to build that index?”

Exact answer for that question is only known once you build it. However, luckily SQL server provides you with functionality to check the workload performance under hypothetical indexes (without actually creating them)

You can find more information about hypothetical indexes here.

I will just provide you with a simple python code that will help you with the hypothetical index creation. Example code will compose of 3 parts

  1. Index creation
  2. Enabling the index (unlike the normal indexes you need to enable them before using)
  3. Executing the query under the hypothetical index

Index creation

def hyp_create_index_v2(connection, schema_name, tbl_name, col_names, idx_name, include_cols=()):
    """
    Create an hypothetical index on the given table

    :param connection: sql_connection
    :param schema_name: name of the database schema
    :param tbl_name: name of the database table
    :param col_names: string list of column names
    :param idx_name: name of the index
    :param include_cols: columns that needed to be added as includes
    """
    query = f"CREATE NONCLUSTERED INDEX {idx_name} ON {schema_name}.{tbl_name} ({', '.join(col_names)}) " \
            f"INCLUDE ({', '.join(include_cols)}) WITH STATISTICS_ONLY = -1"
    cursor = connection.cursor()
    cursor.execute(query)
    connection.commit()
    logging.info(f"Added HYP: {idx_name}")

Enabling the indexes

def hyp_enable_index(connection):
    """
    This enables the hypothetical indexes for the given connection. This will be enabled for a given connection and all
    hypothetical queries must be executed via the same connection
    :param connection: connection for which hypothetical indexes will be enabled
    """
    query = f'''SELECT dbid = Db_id(),
                    objectid = object_id,
                    indid = index_id
                FROM   sys.indexes
                WHERE  is_hypothetical = 1;'''
    cursor = connection.cursor()
    cursor.execute(query)
    result_rows = cursor.fetchall()
    for result_row in result_rows:
        query_2 = f"DBCC AUTOPILOT(0, {result_row[0]}, {result_row[1]}, {result_row[2]})"
        cursor.execute(query_2)

Executing the query

def hyp_execute_query(connection, query):
    """
    This hypothetically executes the given query and return the estimated sub tree cost. If required we can add the
    operation cost as well. However, most of the cases operation cost at the top level is 0.

    :param connection: sql_connection
    :param query: query that need to be executed
    :return: estimated sub tree cost
    """
    hyp_enable_index(connection)
    cursor = connection.cursor()
    cursor.execute("SET AUTOPILOT ON")
    cursor.execute(query)
    stat_xml = cursor.fetchone()[0]
    cursor.execute("SET AUTOPILOT OFF")
    query_plan = QueryPlan(stat_xml)
    return query_plan.estimated_sub_tree_cost, query_plan.index_seeks

 

 

Last updated by .