
Sunday, July 1, 2018

Performance Optimization of Relational Databases

A case study based on my experience enhancing the performance of an application built at Qburst Technologies Pvt Ltd (http://www.qburst.com/).

MySQL-Based Applications - Improving Performance with Application-Level Optimization


MySQL is one of the database solutions used by most small and medium-scale organizations for their web applications. MySQL delivers good performance with well-designed table structures and relations. Most web applications run into MySQL performance issues when the volume of data grows beyond expectations.
Usually, when we design web applications with MySQL in the backend, we follow a normalized design pattern with look-up tables and foreign key relations. This kind of normalized design ensures good performance and low read/write times as long as the data volume stays within expected limits. (In practice, there is an upper limit on the volume of data that can be handled with relational databases, and it is defined by the type of data and the nature of data access.)
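As a minimal sketch (the table and column names are hypothetical, not taken from the actual application), such a normalized design might look like this, with a lookup table referenced through a foreign key:

import mysql.connector

# Connection details are placeholders; adjust to your environment.
conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="marketing"
)
cur = conn.cursor()

# Lookup table holding the possible campaign types.
cur.execute(
    "CREATE TABLE IF NOT EXISTS campaign_type ("
    "  id INT AUTO_INCREMENT PRIMARY KEY,"
    "  name VARCHAR(100) NOT NULL)"
)

# Main table referencing the lookup table through a foreign key.
cur.execute(
    "CREATE TABLE IF NOT EXISTS campaign_event ("
    "  id BIGINT AUTO_INCREMENT PRIMARY KEY,"
    "  campaign_type_id INT NOT NULL,"
    "  occurred_at DATETIME NOT NULL,"
    "  amount DECIMAL(10, 2),"
    "  FOREIGN KEY (campaign_type_id) REFERENCES campaign_type(id))"
)
conn.commit()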
Data volume within limit:
Large volumes of data are usually generated by real-world scenarios such as daily activities, stock market data, and social media data, all of which depend on human interactions. We can predict user behaviour only up to a point, and so we can predict the volume of data generated in such scenarios only up to a point.

When data volume grows:
With a normalized design, we keep similar kinds of data in the same place, so searching for mutually related data may force us into self joins on heavily populated tables. A self join of a heavily populated table is a big issue from a performance point of view. When the data volume grows, the normalized structure may not deliver the expected performance, for several reasons.
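To make the issue concrete, a query of the following shape (again using the hypothetical tables from the sketch above) forces MySQL to join a large table with itself, and its cost grows quickly as the table grows:

# Hypothetical self join: find pairs of events of the same campaign type
# that happened on the same day. On a table with tens of millions of rows
# this query becomes very expensive.
SELF_JOIN_SQL = (
    "SELECT a.id, b.id "
    "FROM campaign_event a "
    "JOIN campaign_event b "
    "  ON a.campaign_type_id = b.campaign_type_id "
    " AND DATE(a.occurred_at) = DATE(b.occurred_at) "
    " AND a.id < b.id"
)

def find_related_events(cursor):
    # cursor is assumed to be a mysql.connector cursor on the same database.
    cursor.execute(SELF_JOIN_SQL)
    return cursor.fetchall()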
Barriers:
One of them is foreign key relations; another is the normalized structure itself. We use foreign keys to keep track of relational data held in normalized tables, but they can increase insert/update times because of the overhead the database incurs creating and maintaining the associated indexes.
Alternatives:
There is a cap on the volume of data that MySQL can handle effectively with a normalized structure, which forces us to think about alternatives such as non-relational databases like MongoDB. One issue we face with non-relational databases is handling the relations between mutually dependent real-world data. There are ways to tackle that situation.
Budget:
Even though non-relational databases promise faster reads, they have issues such as poor write speed. There are, of course, techniques to tackle these issues, but most of them are heavy in terms of cost and/or effort for small and medium-scale organizations.
Economic alternatives:
The main question, then, is: "Is it possible to extend the practical upper limit on the volume of data that can be handled effectively?" The answer is YES.
We can raise the upper limit on the volume of data that MySQL can handle by applying techniques such as de-normalization and archiving.
MySQL can respond to plain select queries much better than to select queries involving joins with the same or other tables.
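For example, the same listing screen could be served either by a join over the normalized tables or by a single select on a purpose-built table (the table and column names below are hypothetical, carried over from the earlier sketch):

# Read path over the normalized schema: every page load pays for a join.
NORMALIZED_READ = (
    "SELECT e.occurred_at, e.amount, t.name "
    "FROM campaign_event e "
    "JOIN campaign_type t ON t.id = e.campaign_type_id "
    "ORDER BY e.occurred_at DESC LIMIT 50"
)

# Read path over a hypothetical de-normalized display table: a plain select,
# which MySQL can usually serve faster once the data volume is large.
DENORMALIZED_READ = (
    "SELECT occurred_at, amount, campaign_type_name "
    "FROM campaign_event_display "
    "ORDER BY occurred_at DESC LIMIT 50"
)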
De-normalization can improve read times in several ways. One of them is keeping data in a purpose-oriented manner: duplicating data into tables that exist purely for display, thereby avoiding joins.
For example, to make the display of data faster, we can keep a display-optimized copy of the data in a separate table that is used only for display, accepting the duplication. One possible issue is managing the consistency of the data. In a normalized, properly designed database, consistency is assured by the database to a large extent. But once we keep duplicate copies of the data to speed up reads, the responsibility for keeping those copies consistent moves from the database to your application. This increases the coding effort needed to handle updates across the multiple copies of the data.
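A rough sketch of what that application-level responsibility can look like (again with hypothetical table names): every write goes to the normalized table and to the display copy inside one transaction, so the two copies cannot drift apart.

def record_event(conn, campaign_type_id, campaign_type_name, occurred_at, amount):
    # Write to the normalized table and to the display copy in one
    # transaction; if either insert fails, the rollback keeps both
    # tables consistent. conn is assumed to be a mysql.connector connection.
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO campaign_event (campaign_type_id, occurred_at, amount) "
            "VALUES (%s, %s, %s)",
            (campaign_type_id, occurred_at, amount),
        )
        cur.execute(
            "INSERT INTO campaign_event_display "
            "(event_id, campaign_type_name, occurred_at, amount) "
            "VALUES (%s, %s, %s, %s)",
            (cur.lastrowid, campaign_type_name, occurred_at, amount),
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise

Because both inserts commit or roll back together, readers never see a half-updated display copy.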
How:
De-normalization reduces read times when we add tables that hold collective, pre-combined data. We can do this by defining the rules for data display in all possible scenarios, handling the combined display rules together with less overhead, and duplicating data into multiple tables so that each one is optimized for its purpose. This improves the performance of data population at the front end, at the cost of extra processing overhead in the code.
Archiving:
Real-world data usually becomes less relevant or less frequently accessed as time passes. For example, if an application manages marketing statistics, the data of the last 6 months or 1 year may be relevant, while older data is rarely touched. The less relevant or less accessed data can be moved from the main database to an archiving system, which can be another database or even tapes or disks, depending on the demand for old data. If a user needs older data, it can be provided on demand.
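A simplified sketch of such an archiving job (the cutoff, batch size, and table names are assumptions, not from the original application): rows older than the cutoff are copied into an archive table and then deleted from the main table in small batches, so the job does not block normal traffic.

from datetime import datetime, timedelta

def archive_old_events(conn, months=12, batch_size=10000):
    # Move rows older than the cutoff from campaign_event into
    # campaign_event_archive (assumed to have the same columns).
    # conn is assumed to be an open mysql.connector connection.
    cutoff = datetime.now() - timedelta(days=30 * months)
    cur = conn.cursor()
    while True:
        # Pick one batch of old rows by primary key so the copy and the
        # delete below operate on exactly the same rows.
        cur.execute(
            "SELECT id FROM campaign_event "
            "WHERE occurred_at < %s ORDER BY id LIMIT %s",
            (cutoff, batch_size),
        )
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            break
        placeholders = ", ".join(["%s"] * len(ids))
        cur.execute(
            "INSERT INTO campaign_event_archive "
            "SELECT * FROM campaign_event WHERE id IN (" + placeholders + ")",
            ids,
        )
        cur.execute(
            "DELETE FROM campaign_event WHERE id IN (" + placeholders + ")",
            ids,
        )
        conn.commit()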
Observe your data:
There are techniques that can make MySQL perform better, but the right one depends on the structure and behaviour of your data. Structured monitoring can reveal the root cause of the problem, and a proper solution can be formed based on it. Log your data access, build metrics from those logs, and find the access peaks and performance drops. These peaks and drops explain the real problem, and the solution should be built around that root cause.
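One lightweight way to build such metrics at the application level (a sketch only; the threshold and log format are assumptions) is to time every query and log the slow ones, then aggregate the log to find the peaks:

import logging
import time

logger = logging.getLogger("db.access")

def timed_query(cursor, sql, params=None, slow_ms=200):
    # Execute a query, record how long it took, and flag anything slower
    # than slow_ms so access peaks show up clearly in the logs.
    start = time.perf_counter()
    cursor.execute(sql, params or ())
    rows = cursor.fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > slow_ms:
        logger.warning("slow query (%.0f ms): %s", elapsed_ms, sql)
    else:
        logger.info("query (%.0f ms): %s", elapsed_ms, sql)
    return rows

MySQL's own slow query log serves a similar purpose on the server side; the advantage of an application-level log is that each slow query can be tied back to a specific screen or feature.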

This post is based on optimization tasks I worked on in a project for Qburst Technologies Pvt Ltd.
