HighendLogic Tech Blog: Areas of the Database

The backend of websites can be very dynamic and complex. This is just the nature of the beast. Not pushing all your logic and data to someones client instead giving them chunks at a time. And managing the access of this data is about 1/2 of the struggle of managing your application. In this article im going to layout a high level view of the different tiers of database tools you can apply to make sure that this area of your web application runs fast. There are three sections; Schema, Query, and Infrastructure.

Schema

The schema of your database lends itself to certain features that your are trying implement. For instance storing totals for an accounting application. Then there are ways to structure your database for certain management , if you have a list of categories and a list of books you can have a many to many relationship implement a table that bridges this relationship. If you are not familiar already you should get to know the concepts of key-value storage, normalization, and indexing(cardinality).

You also want to make sure you pay close attention to what types you are setting for your data. You can improve query performance and eliminate possible bugs by setting the right type and defaults.

NOTE: I dont think you should do cascade logic in between tables, although this syncs data I think this should be handled out of the storage relm not tied to the structure itself

Query

When accessing the data there are a couple of tricks / rules of thumb you can use to make this pare efficient. Sometimes it is your schema that has to lend itself to the feature , if so see part 1.

a. Only select columns that you are going to use from the data set. Select * is ambiguous and may cause unnecessary work for the database.
b. Be attentive to your joins. Including(left) and excluding(inner)joins should be use properly and you should be joining on indices*. When you have more than 2 joins take a step back and look at the usage of the data. Does all this data have to be retrieved at one time? You might be able to offset some of this data in cache or to the front-end instead of slowing down multiple queries retrieving reusable meta data.
c. If you are constructing an "IS IN" clause it might be faster to construct the "IS NOT IN" clause , a smaller item number means less checks.
d. Self excluding joins are a simple little trick to filter a select section of a table and perform better than sub queries. Here are some examples: http://adam-bernier.appspot.com/post/38001/the-self-exclusion-join

Infrastructure

Now that you have your data setup correctly and are accessing it sensibly we can still do some optimization base on usage and a high-level knowledge of the data's meaning.

The Master-Slave

concept in database design separates your data management into to parts a write database that you update and insert rows to and a read database where you select your data from. The databases are synced in some interval and priority is given to select statements in the "read" database. And they is further enhanced by having multiple read databases.

Horizontal partitioning

is a method of separating data by a range and storing those ranges in separate tables that have the same schema. One example could be a table for each month in a finance application.

Vertical partitioning

you might have already done in part 1 but if not this could be a big performance improvement to your table access if you have heavy variance in the access of groups of data that live in the same table. Vertical portioning is the practice of splitting a table up into multiple tables each one contains columns that have some relationship. This is not just creating tables based on separate features but more focused on efficient access. One table might be accessed by multiple features but only sections are updated or accessed by any one feature, never the entire row. Vertical partitioning breaks the table up and in return splits up the performance requirement for the single table allowing the features to not have to share resources.

These concepts can fix a good portion of your data access woes. Each of these concepts can be explored in more detail and i suggest you do just that. You may be better with sys admin stuff so you dive deep into setting up your Master-Slave cluster, or you my be a good programmer and decide to find all the logic tricks to make queries efficient and use the underlying db engine correctly. You should be at least familiar with all these concepts and be able to understand when they might need to be put to use.

HighendLogic
Tech Blog

Tuesday, June 17, 2014

Areas of the Database

No comments:

Post a Comment