11 Feb

VS Code Won’t Open After Unplanned Restart (Failed to deserialize the V8 snapshot blob)

Error

Fatal error in , line 0
Failed to deserialize the V8 snapshot blob. This can mean that the snapshot blob file is corrupted or missing.
FailureMessage Object: 00000071D3DFF2C0
1: 00007FF60A57E91F node::Buffer::New+130911
2: 00007FF60A3F7CDA IsSandboxedProcess+1850986
3: 00007FF608E1D798 v8::Isolate::Initialize+744
4: 00007FF60A3FD1A0 uv_mutex_unlock+21184
5: 00007FF607A28793 std::__1::__vector_base >::__end_cap+102515
6: 00007FF607AE56C8 v8::internal::JSMemberBase::JSMemberBase+54872
7: 00007FF6079513A0 Ordinal0+5024
8: 00007FF60D6FDB02 uv_random+18066594
9: 00007FFB77EF4034 BaseThreadInitThunk+20
10: 00007FFB781F3691 RtlUserThreadStart+33

Solution:

I reinstalled Visual Studio Code without uninstalling it first, which fixed the issue. It also started up without losing any of my previous extensions or open projects.

20 Jan

PostgreSQL – BULK INSERTING from a delimited file, Most common errors in Windows

PostgreSQL’s alternative to MS SQL Server’s BULK INSERT is the equally simple COPY command. In this article, we take a step-by-step look at how to use it and at the most common errors on Windows. I started with the command below, which is syntactically correct, but I ran into a couple of issues; once they were fixed, it worked fine.

COPY part FROM '...Desktop\TPCH_001\pg_part.tbl' WITH (DELIMITER  '|')

Errors

ERROR: could not open file "file.tbl" for reading: Permission denied.
HINT:  COPY FROM instructs the PostgreSQL server process to read a file.  You may want a client-side facility such as psql's \copy. 
SQL state: 42501

To resolve this error, you need to update the permissions of the file so that the PostgreSQL server process can read it. Navigate to the folder/file, right-click it and open Properties, then go to the Security tab and add “Everyone” to the list. More info
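
Alternatively, the HINT in the error message points to the client-side route: psql’s \copy reads the file with your own user’s permissions and streams it to the server, so no permission change is needed. A minimal sketch using the same file as above (run inside psql):

\copy part FROM '...Desktop\TPCH_001\pg_part.tbl' WITH (DELIMITER '|')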

ERROR:  extra data after last expected column (data ending with a delimiter)

This means the file has more columns than expected. For example, you might have 4 columns in the CSV file but only 3 columns in the table. (With TPC-H .tbl files, a common cause is the trailing delimiter at the end of every line, which COPY reads as one extra empty column.) If it is the other way around (the table having more columns than the file), we can name the needed columns in the query, as shown below.
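
For example, if the file supplied only the first three columns of part, the COPY can name just those columns. A sketch using column names from the TPC-H schema purely for illustration:

COPY part (p_partkey, p_name, p_mfgr)
FROM '...Desktop\TPCH_001\pg_part.tbl'
WITH (DELIMITER '|');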

Read More
15 Jan

PostgreSQL – How to get the total index size used by each table in a database

As per the documentation, to get the total size of all indexes attached to a table, you use the pg_indexes_size() function. It accepts the OID or table name as the argument and returns the total disk space used by all indexes attached to that table.

We will use this function to get the index size of each table in the database.

select relname as table_name,
       pg_size_pretty(pg_indexes_size(relid)) as index_size
from pg_catalog.pg_statio_user_tables;
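
If you want the heaviest consumers listed first, the same query can be ordered on the raw byte count (ordering on the pretty-printed text would sort alphabetically rather than by size):

select relname as table_name,
       pg_size_pretty(pg_indexes_size(relid)) as index_size
from pg_catalog.pg_statio_user_tables
order by pg_indexes_size(relid) desc;
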
11 Nov

UVA 893 – Y3K Problem: Handling Python years above 9999

I was working on a problem on the UVA online judge where I needed to do a simple date addition. However, the catch was that the year can go beyond 9999, which is the upper limit of Python’s datetime module. The code below is my Python solution. I simply divided the date delta by 1200 years (438291 days) and added that part back separately after the computation. 1200 years works because the Gregorian calendar repeats exactly every 400 years (146097 days), so 1200 years is three full cycles: 3 × 146097 = 438291 days. If anything is unclear, let me know in the comments.

Read More
26 Oct

DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees

Abstract:

“Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today’s commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Unfortunately, the latest advancements like query stores provide only limited support for dynamic environments. This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser’s cost misestimates (stemming from unrealistic assumptions such as attribute value independence and uniformity of data distribution). We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to a fixed policy that is optimal with perfect hindsight. Our comprehensive empirical results demonstrate up to 75% speed-up on shifting and ad-hoc workloads and 28% speed-up on static workloads compared against a state-of-the-art commercial tuning tool.” [1]

[1] Full Paper: https://arxiv.org/abs/2010.09208

23 Aug

CH-BenCHmark for MS SQL Server – HTAP benchmarking

There aren’t many benchmarks that let you test your system against hybrid OLTP and OLAP workloads. CH-BenCHmark fills that gap by combining TPC-C and TPC-H. You can download the source from the linked site, or you can use something like OLTPBench (a collection of benchmarks). However, the modified TPC-H queries are not written for SQL Server. In this article, I will add the modified CH-BenCHmark OLAP queries for SQL Server.

Read More
19 Jul

NetworkX visualization with Graphviz (Example)

If you are trying to visualize a nice graph with NetworkX, you are probably exhausted by now. After all, NetworkX only provides basic functionality for graph visualization; its main goal is to enable graph analysis. For anything beyond basic visualization, it’s advisable to use a separate, specialized library. In my case, I chose Graphviz. It’s simple to get an attractive visualization of a NetworkX graph with Graphviz. I take a gradual start, but you may skip to “NetworkX with Graphviz” directly.

Read More
18 Jul

Index Physical Structure Example; Multi-column Non-Clustered Index with Includes

This article demonstrates the physical design of a multi-column non-clustered index with include columns. Many examples on the internet only demonstrate the simplest version of an index, with a single column. This article gives a proper view of an index with multiple columns through a simple example. Furthermore, you can see how the include columns are stored only at the leaf level of the tree.

Here we use a simple table ‘People’ with 6 columns (ID, FirstName, LastName, Age, Sex, Address). We assume a clustered index already exists on the ID column (there is almost no difference if there is no clustered index, as explained at the end). Now we create the non-clustered index defined below.

CREATE NONCLUSTERED INDEX IX_NAME ON People
(FirstName, LastName)
INCLUDE (Age, Sex)
GO
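
To see what the include columns buy us, consider a query that filters on the key columns and selects only key or included columns. This example query is mine, not from the original article: because Age and Sex are stored at the leaf level of IX_NAME, SQL Server can answer it entirely from the index, without any lookup into the base table.

SELECT FirstName, LastName, Age, Sex
FROM People
WHERE FirstName = 'John' AND LastName = 'Doe'
GO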

The diagram below shows the structure of this non-clustered index.

Structure of a non-clustered multi-column index with include columns.
Read More
27 May

PRIVATE: A Privacy-preserving Data Analysis Language.

This is a new project I have been working on since early last year. The motivation behind this project is to build a programming language that allows users to analyze private data without exposing sensitive information. Many data analysis languages currently on the market (R, Python, MATLAB, etc.) assume direct access to the data. PRIVATE, on the other hand, performs a privacy calculation that makes sure only non-sensitive information is released to the user.

More Information:

This is the tutorial series by Simon Dennis, Founder of PRIVATE

Contribute to PRIVATE: GitHub

31 Mar

Microsoft SQL Server 2016 Database with IMDB 2013 Dataset

Recently I wanted to run the JOB benchmark for an experiment. This benchmark uses an IMDB dataset published in 2013. Initially, I had some trouble running the benchmark, as it was designed for a PostgreSQL database, and the dataset was created on a UNIX system, which can cause issues when used on a Windows system. So I decided to share the exact steps you need to take in order to create a Microsoft SQL Server database with the IMDB dataset. All the scripts used in the project can be found in this Git repo.
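
One UNIX-vs-Windows issue worth flagging up front is line endings: files generated on UNIX end lines with LF rather than CRLF, so a default BULK INSERT can fail to find the row terminator. Below is a sketch of the kind of load statement involved; the table and file names are hypothetical placeholders, not the exact scripts from the repo:

BULK INSERT dbo.title
FROM 'C:\imdb\title.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a',  -- LF-only line endings from the UNIX-generated files
    TABLOCK
);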

Read More
