15 Jan

Jaro–Winkler Similarity – How to correctly count the number of transpositions

Jaro–Winkler similarity is a widely used measure of how alike two strings are. Because it is a similarity measure (not a distance measure), a higher value means the strings are more similar.
The basics and how the measure works are covered on Wikipedia and in many other places, so I am not going into that here. However, none of these sources explains how to correctly count the number of transpositions in complex situations.

A transposition is defined as “matches which are not in the same position”. For a simple example like ‘cart’ vs ‘cratec’, the count is obvious: there are 4 matches and 2 transpositions (‘r’ and ‘a’ are not in the same position). For 'xabcdxxxxxx' vs 'yaybycydyyyyyy', however, all letters seem to be out of position at first glance, yet there are 4 matches and no transpositions. For the very similar pair 'xabcdxxxxxx' vs 'ydyaybycyyyyyy', there are 4 matches and 4 transpositions. As these examples show, counting the number of transpositions is not always trivial.
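The three examples can be checked with a short Python sketch. It assumes the usual Jaro matching window of floor(max(len) / 2) - 1 and greedy left-to-right matching, and it reports the number of matched characters that are out of order, which is the count used in this post (the standard Jaro formula then halves this number). The function name is my own and is only illustrative.

```python
def jaro_matches_and_transpositions(s1, s2):
    """Return (matches, out_of_order_matches) for two strings.

    Assumption: standard Jaro window, greedy left-to-right matching.
    The second value is NOT halved; the Jaro formula uses half of it.
    """
    if not s1 or not s2:
        return 0, 0

    # Characters match only if they are equal and within this distance.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)

    s1_matched = [False] * len(s1)
    s2_matched = [False] * len(s2)

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            # Each character of s2 can be matched at most once.
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = True
                s2_matched[j] = True
                break

    # Read the matched characters of each string in their own order,
    # then compare the two sequences position by position.
    m1 = [c for c, hit in zip(s1, s1_matched) if hit]
    m2 = [c for c, hit in zip(s2, s2_matched) if hit]
    out_of_order = sum(a != b for a, b in zip(m1, m2))
    return len(m1), out_of_order


print(jaro_matches_and_transpositions("cart", "cratec"))
print(jaro_matches_and_transpositions("xabcdxxxxxx", "yaybycydyyyyyy"))
print(jaro_matches_and_transpositions("xabcdxxxxxx", "ydyaybycyyyyyy"))
```

The key point is that transpositions are counted by comparing the two sequences of matched characters, not by comparing the original positions in the strings.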

11 Nov

Sri Lanka Holidays Calendar 2020 for office365, Google etc. in ics format

Honouring everyone’s request, I have added the Sri Lankan holiday calendar for 2020. This allows you to add the Sri Lanka Holidays Calendar 2020 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download 2020 Sri Lankan Holiday Calendar (for Outlook, get this: Download 2020 Sri Lankan Holiday Calendar – Outlook)

Adding to office365

  1. Add calendar
  2. From file (don’t go to the holiday calendar, Sri Lanka is not available there)
  3. Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use a red colour)
  4. Save
27 Jan

What-if Analysis with SQL Server (Hypothetical Indexes) – Using Python

If you are a database administrator or a developer working with a transactional database, you might have come across this problem:

“Is it worth building that index?”

The exact answer to that question is known only once you build it. Luckily, however, SQL Server provides functionality to check workload performance under hypothetical indexes (without actually creating them).

You can find more information about hypothetical indexes here.

I will just provide some simple Python code that will help you with hypothetical index creation. The example code consists of 3 parts:

  1. Index creation
  2. Enabling the index (unlike normal indexes, hypothetical indexes must be enabled before use)
  3. Executing the query under the hypothetical index

Index creation

Enabling the indexes

Executing the query
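Taken together, the three parts can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the undocumented SQL Server commands commonly used for hypothetical indexes (`WITH STATISTICS_ONLY = -1`, `DBCC AUTOPILOT` and `SET AUTOPILOT ON`), whose behaviour may differ between versions, and it assumes a DB-API style cursor (for example from pyodbc). The function names, and the table, index and id values in the usage lines, are hypothetical.

```python
def create_hypothetical_index_sql(schema, table, index_name, columns, include=()):
    """Part 1: build the statement that creates a hypothetical index.

    Assumption: STATISTICS_ONLY = -1 (undocumented) creates only the
    index metadata and statistics, not the index itself.
    """
    cols = ", ".join(f"[{c}]" for c in columns)
    sql = f"CREATE NONCLUSTERED INDEX [{index_name}] ON [{schema}].[{table}] ({cols})"
    if include:
        sql += " INCLUDE (" + ", ".join(f"[{c}]" for c in include) + ")"
    return sql + " WITH STATISTICS_ONLY = -1"


def enable_hypothetical_index_sql(db_id, object_id, index_id):
    """Part 2: build the statement that enables the hypothetical index.

    Assumption: DBCC AUTOPILOT (undocumented) with type id 0 enables a
    nonclustered hypothetical index identified by database id, object id
    and index id (these can be looked up in sys.indexes, where
    is_hypothetical = 1).
    """
    return f"DBCC AUTOPILOT(0, {int(db_id)}, {int(object_id)}, {int(index_id)})"


def run_under_autopilot(cursor, query):
    """Part 3: run a query while the enabled hypothetical indexes are visible.

    Assumption: with SET AUTOPILOT ON the query is not executed; the
    server returns an estimated plan instead of the query results.
    """
    cursor.execute("SET AUTOPILOT ON")
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always switch the session back to normal execution.
        cursor.execute("SET AUTOPILOT OFF")


# Hypothetical names and ids, shown only to illustrate the generated SQL.
print(create_hypothetical_index_sql("dbo", "orders", "ix_hyp_customer", ["customer_id"]))
print(enable_hypothetical_index_sql(5, 1234, 7))
```

Keeping the SQL generation separate from the execution makes it easy to inspect the statements before running them against a real server.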

16 Dec

Sri Lanka Holidays Calendar 2019 for office365, Google etc. in ics format

Honouring everyone’s request, I have added the Sri Lankan holiday calendar for 2019. This allows you to add the Sri Lanka Holidays Calendar 2019 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download 2019 Sri Lankan Holiday Calendar (for Outlook, get this: Download 2019 Sri Lankan Holiday Calendar – Outlook)

Adding to office365

  1. Add calendar
  2. From file (don’t go to the holiday calendar, Sri Lanka is not available there)
  3. Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use a red colour)
  4. Save

26 Dec

Sri Lanka Holidays Calendar 2018 for office365, Google etc. in ics format

As everyone requested, I have added the Sri Lankan holiday calendar for 2018. This allows you to add the Sri Lanka Holidays Calendar 2018 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download ICS format 2018 Sri Lankan Holiday Calendar

Adding to office365

  1. Add calendar
  2. From file (don’t go to the holiday calendar, Sri Lanka is not available there)
  3. Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use a red colour)
  4. Save

07 Oct

Evolution of a Data Platform

Being a startup feels “great”. Startup culture is filled with positive energy to get things done. In the process of getting things done, one thing we miss is proper design of the data platform. It is understandable that people start with a simple data platform and evolve it over time. Starting with the perfect data platform is rarely practical, given the cost involved and the lack of domain knowledge in the initial stages. We should all admit that a proper data platform costs a lot, which is sometimes not efficient for a startup. My personal opinion is to start small and evolve with time. Here we will talk about common problems that we faced in a startup data platform.

Lacking Scalability

Scalability issues have an impact on several fronts. Startup systems are not meant to scale until the end of time. Sometimes they become impossible to scale; sometimes scaling requires so much additional effort that a separate team is needed just to scale the data platform. Sometimes scaling comes with a large and rapidly increasing cost. Sometimes scaling increases the overall system complexity and reduces maintainability. To summarize, the main impact areas of scalability are as follows:

  • Being impossible to scale
  • High Cost of scaling
  • Increasing manual tasks of Scaling
  • Increase in system complexity while scaling
  • Reduction of system maintainability

A proper data platform design should address the above concerns. It should be scalable beyond the foreseeable future. While scaling, it should minimize added cost, avoid added complexity, and involve minimal or no manual effort.

17 Aug

Negombo Toastmasters Club 8th Installation Ceremony for year 2017/2018

Negombo Toastmasters held its eighth Executive Committee Installation Ceremony for 2017/2018 on 12 August 2017 at Paradise Beach Hotel, Negombo. The Chief Guest was DTM Arjuna Jayadarshana and the Guest of Honor was TM Sudath Fernando, who is a charter member of Negombo Toastmasters Club. The Area Director for District 89 Area H4, Sudath Ranaweera, was also present at the occasion. Another two of our charter members, TM Mohammed Marzook and Division Director TM Anura Perera, also joined us on this important day.

The following officers for 2017/2018 were installed at the ceremony.

  • President – TM Tony Ukwattage
  • Vice President/Education – TM Bavanitha Rajagugan
  • Vice President/Membership – TM Gihan Wijayatilake
  • Vice President/Public Relations – TM Romesh Malinga Perera
  • Secretary – TM Tissera Maduranga
  • Treasurer – TM Buddika Liyanage
  • Sergeant-at-Arms – TM Shehan Gunasinghe

10 Aug

Performance evaluation between different Druid roll-up levels

Introduction

In most datasets with a large number of events, going through individual events is less important. Most data use cases revolve around summarized data. Druid summarizes raw data at ingestion time using a process referred to as “roll-up”. The roll-up level is the finest granularity that is retained, so you can query the data only down to the roll-up granularity. There are some scenarios where it is important to have more granular data, but keeping more granular data comes at a cost. We did a small experiment to identify how different roll-up levels affect performance.

Rolling up data can dramatically reduce the size of data that needs to be stored (up to a factor of 100). Druid will roll up data as it is ingested to minimize the amount of raw data that needs to be stored. This storage reduction does come at a cost; as we roll up data, we lose the ability to query individual events. Phrased another way, the rollup granularity is the minimum granularity you will be able to explore data at and events are floored to this granularity. Hence, Druid ingestion specs define this granularity as the queryGranularity of the data. The lowest supported queryGranularity is millisecond. -http://druid.io
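To make the idea concrete, here is a small Python sketch of what rolling up means conceptually. It is only an illustration under my own assumptions, not how Druid implements it: events are tuples of (epoch seconds, one dimension, one metric), timestamps are floored to the chosen granularity, and rows that share the same floored timestamp and dimension are merged into one.

```python
from collections import defaultdict


def roll_up(events, granularity_seconds):
    """Floor each timestamp to the granularity and aggregate the metric.

    events: iterable of (epoch_seconds, product, revenue) tuples
            (an illustrative layout, not a Druid schema).
    Returns a dict keyed by (floored_timestamp, product).
    """
    totals = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for ts, product, revenue in events:
        # Flooring discards everything finer than the granularity.
        bucket = ts - (ts % granularity_seconds)
        row = totals[(bucket, product)]
        row["count"] += 1
        row["revenue"] += revenue
    return dict(totals)


events = [
    (3605, "tea", 2.0),
    (5400, "tea", 3.0),
    (3610, "coffee", 1.5),
    (7201, "tea", 4.0),
]

hourly = roll_up(events, 3600)
for key in sorted(hourly):
    print(key, hourly[key])
```

With an hourly granularity the four raw events collapse into three stored rows, and the two "tea" events in the first hour can no longer be told apart. A coarser granularity merges more rows, which is the trade-off between storage and detail that the experiment explores.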

Dataset and Setup

We chose a CSV data set with more than 150 million records containing sales data spanning 2 years. The CSV file was around 6 GB on disk. It is a narrow data set with 3 dimensions and 2 metrics. We had 2 servers on which all the components were deployed.

m4.large – Coordinator, Broker and Overlord nodes
r3.large – MiddleManager and Historical nodes
