Following the recent tragic Easter attacks and the insurgency in our home country, Sri Lanka, the Sri Lankan Graduates’ Society, The University of Melbourne, organized a silent candlelight vigil to pay respect to the lives lost and to affirm the much-needed unity among us as we walk through these difficult times together.
Recently I wanted to run the JOB benchmark for an experiment. This benchmark uses an IMDB dataset published in 2013. Initially, I had some trouble running the benchmark, as it was designed for a PostgreSQL database. The dataset was also created on a UNIX system, which can cause issues when it is used on a Windows system. So I decided to share the exact steps you need to take in order to create a Microsoft SQL Server database with the IMDB dataset. All the scripts used in the project can be found in this Git repo.
Jaro–Winkler Similarity is a widely used measure for checking the similarity between two strings. Being a similarity measure (not a distance measure), a higher value means more similar strings. You can read about the basics and how it works on Wikipedia. That material is available in many places, so I’m not going into it here. However, none of these sites talks about how to correctly count the number of transpositions in complex situations.
Transposition is defined as “matches which are not in the same position”. For a simple example like ‘cart’ vs ‘cratec’, it is obvious: there are 4 matches and 2 transpositions (‘r’ and ‘a’ are not in the same position). But for 'xabcdxxxxxx' vs 'yaybycydyyyyyy', at first glance all letters seem to be out of position, yet there are no transpositions (4 matches). For the very similar 'xabcdxxxxxx' vs 'ydyaybycyyyyyy', there are 4 transpositions (4 matches). As these examples show, counting the number of transpositions is not always trivial.
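The counting in the examples above can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual Jaro matching window of max(len1, len2) // 2 - 1 and a greedy left-to-right match; the function name is my own and not taken from any library. It returns the number of matches and the number of matched characters that are out of order, which is the count used in the examples (the standard Jaro formula then halves this number).

```python
def jaro_matches_and_transpositions(s1, s2):
    """Return (matches, out_of_order) for two strings.

    out_of_order counts matched characters that differ when the two
    matched sequences are compared position by position.
    """
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0, 0

    # Assumption: the usual Jaro search window around each position.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2

    # Greedily match each character of s1 to the first unused equal
    # character of s2 that lies inside the window.
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                break

    # Matched characters, each kept in the order of its own string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]

    # Compare the two sequences position by position.
    out_of_order = sum(a != b for a, b in zip(seq1, seq2))
    return len(seq1), out_of_order


print(jaro_matches_and_transpositions("cart", "cratec"))                 # (4, 2)
print(jaro_matches_and_transpositions("xabcdxxxxxx", "yaybycydyyyyyy"))  # (4, 0)
print(jaro_matches_and_transpositions("xabcdxxxxxx", "ydyaybycyyyyyy"))  # (4, 4)
```

The key point is that transpositions are not counted by comparing original positions. Only the matched characters are kept, in the order they appear in each string, and those two short sequences are compared. That is why 'a', 'b', 'c', 'd' count as in order in the second example even though their positions differ in the two strings.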
Honouring everyone’s request, I added the Sri Lankan Holiday calendar for 2020. This will allow you to add the Sri Lanka Holidays Calendar 2020 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download 2020 Sri Lankan Holiday Calendar (For Outlook get this: Download 2020 Sri Lankan Holiday Calendar – Outlook)
Adding to Office 365
Add calendar
From file (don’t go to the holiday calendar; Sri Lanka is not available there)
Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use the RED colour)
If you are a database administrator or a developer working with a transactional database, you might have come across this question:
“Is it worth building that index?”
The exact answer to that question is known only once you build it. Luckily, SQL Server provides functionality to check workload performance under hypothetical indexes (without actually creating them).
You can find more information about hypothetical indexes here.
I will just provide some simple Python code that will help you with hypothetical index creation. The example code consists of 3 parts:
Index creation
Enabling the index (unlike normal indexes, hypothetical indexes must be enabled before they can be used)
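The two steps listed above can be sketched as small Python helpers that only build the T-SQL text. This is a sketch under assumptions, not the original code from the post: it relies on the undocumented WITH STATISTICS_ONLY = -1 option and the undocumented DBCC AUTOPILOT command, so the exact syntax should be verified against your SQL Server version. The helper names and the table, column, and index names are hypothetical.

```python
def bracket(name):
    """Quote an identifier the T-SQL way, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_hypothetical_index_sql(schema, table, columns, index_name):
    """Build the statement that creates a hypothetical nonclustered index.

    Assumption: WITH STATISTICS_ONLY = -1 (undocumented) creates only the
    index metadata and statistics, not the index structure itself.
    """
    cols = ", ".join(bracket(c) for c in columns)
    return (
        f"CREATE NONCLUSTERED INDEX {bracket(index_name)} "
        f"ON {bracket(schema)}.{bracket(table)} ({cols}) "
        "WITH STATISTICS_ONLY = -1"
    )


def enable_hypothetical_index_sql(db_id, object_id, index_id):
    """Build the statement that enables a hypothetical index.

    Assumption: DBCC AUTOPILOT (undocumented) with type id 0 registers a
    nonclustered hypothetical index for the current session.
    """
    return f"DBCC AUTOPILOT(0, {int(db_id)}, {int(object_id)}, {int(index_id)})"


# Hypothetical example values, for illustration only.
print(create_hypothetical_index_sql("dbo", "title", ["production_year"], "hyp_ix_title_year"))
print(enable_hypothetical_index_sql(5, 1234567, 3))
```

The generated strings can be executed through whichever database driver you already use. The three ids needed by the second helper can be looked up with DB_ID(), OBJECT_ID() and the sys.indexes catalog view, which flags hypothetical indexes in its is_hypothetical column.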
Honouring everyone’s request, I added the Sri Lankan Holiday calendar for 2019. This will allow you to add the Sri Lanka Holidays Calendar 2019 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download 2019 Sri Lankan Holiday Calendar (For Outlook get this: Download 2019 Sri Lankan Holiday Calendar – Outlook)
Adding to Office 365
Add calendar
From file (don’t go to the holiday calendar; Sri Lanka is not available there)
Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use the RED colour)
As everyone requested, I added the Sri Lankan Holiday calendar for 2018. This will allow you to add the Sri Lanka Holidays Calendar 2018 to your digital calendar (most of the online digital calendars that we use these days don’t have a built-in option to add the holiday calendar for Sri Lanka). Download ICS format 2018 Sri Lankan Holiday Calendar
Adding to Office 365
Add calendar
From file (don’t go to the holiday calendar; Sri Lanka is not available there)
Select the file to upload and the calendar to which the holidays will be added (creating a separate new calendar is recommended; use the RED color)
Being a startup is “great” as a feeling. Startup culture is filled with positive energy to get things done. In this process of getting things done, one thing we miss is the proper design of the data platform. It is understandable that people start with a simple data platform and evolve it over time. Starting with the perfect data platform is rarely practical when we consider the cost involved and the lack of domain knowledge in the initial stages. We should all admit that a proper data platform costs a lot, which is sometimes not cost-effective for a startup. My personal opinion is to start small and evolve with time. Here we will talk about common problems that we faced in a startup data platform.
Lacking Scalability
Scalability issues have an impact on several fronts. Startup systems are not built to scale indefinitely. Sometimes they become impossible to scale; sometimes scaling requires so much additional effort that a separate team is needed to work on scaling the data platform. Sometimes scaling comes with a large and rapidly increasing cost. Sometimes scaling increases the overall system complexity and reduces maintainability. To summarize, the main impact areas of scalability are as follows:
Being impossible to scale
High cost of scaling
Increasing manual tasks when scaling
Increase in system complexity while scaling
Reduction of system maintainability
A proper data platform design should address the above concerns. It should be scalable beyond the foreseeable future. While scaling, it should minimize added cost, avoid added complexity, and involve minimal or no manual effort.