Sunday, December 6, 2015

7 out of 10 in MYSQL

I recently went on an interview where the topic of my proficiency in database handling came up. I gave myself a 7 out of 10. To demonstrate this I brought up the idea of "self excluding joins." Here I will demonstrate what a self excluding join is and other topics that I think will give you a solid foundation in using database technologies (sql databases more specifically). At the end I will discuss what I think it takes to be above a 7. The interviewer stated that he thought self excluding joins would account for a "solid six." Of course these are subjective but that doesnt mean their is not merit to the rating. After you read this article if you are familiar or not with the topics ask yourself if you think you are a 7, are you better, or what do you think this set of information should be rated and what is it missing?

1) Basic structure, rules
  • use fixed small types: when defining the tables and columns you need to store your data. try to make the data types as small and strictly defined as possible while still making sure your restrictions dont hurt future data storing. This will ensure that the storage will stay small and easy to reason about. Also use Not Null when possible.
  • join columns of the same type: if you try to compare columns of different types in a join only one index will be usable. This will slow your queries. You will also avoid any comparison conversion that needs to go on for the engine to determine the results.
  • muticolumn indexes: there is an overhead with adding an index to a table on any data inserted into the table because the index must also be updated. In certain cases you may need to filter on multiple columns in a single query. In this case your select query can be sped up with a multicolumn index. If this is the case the first column in this index will also act as a substitute for a single column index.
  • use "LIMIT 1" when only expecting a single row return for your query;
2) Basic queries tricks:
  • speeding up pagination: when speeding up pagination try to use a data value pointer for your engine to start keeping responses at. For example to get the second list of articles with 50 per page and the last page ended with an id of 80:
    SELECT id, title , date , body FROM articles WHERE id > 80 ORDER BY id ASC LIMIT 50 ;
  • self excluding joins, are one way to avoid performing multiple queries or using inefficient inner queries. Self excluding joins use the ability knowledge of how joins work to infer comparisons of rows within a table. For instance if we have a list of classes and we want the class in each subject that has the most students. We can use the following query instead of having multiple or inner queries. Note in this query the table aliases and the understanding the relationship between the where clause and the join clause.
    SELECT c1.id, c1.name, c1.student_count, c1.subject FROM classes AS c1 LEFT JOIN classes AS c2 ON c1.subject = c2.subject AND c1.student_count < c2.student_count WHERE c2.id IS NULL
  • Speeding up group bys: Often in any logic language you get the idea that the same answer can be achieved multiple ways. Simply put f(g(c)) = g(f(c)). But sometimes one way is more efficient than the other. Referring to the article https://www.percona.com/blog/2015/06/15/speed-up-group-by-queries-with-subselects-in-mysql/ this can be the case with group bys. When performing a group by and join in the same query you may be able to speed up your query by reording the group by before the join. This can make the table comparison much smaller by reducing the gathering and removal of data that will not end up in the result.
3) DB Responsibility:
  • Use prepared statements, or understand data cleaning and sql injection attacks. This has nothing to do with the performance of queries but with making sure your data is secure.
  • Know when it is good to use transactions be comfortable with the syntax
  • Be familiar with using the EXPLAIN and ANALYZE commands

Of course besides the above knowledge the database user should also be familiar with basic knowledge of syntax and simple database literacy. If all of that is a 7 then to go above you must be better. For an 8 you should be better at query optimization being able to predict what the output of the EXPLAIN will be and also understanding what a better query solution will be and why. A 9 will have a solid grasp of creating a database from scratch. Given the business logic a skill level of 9 will setup a table structure that will take into account select and insert efficiency. This developer should also be able to foreshadow tradeoffs of other database schemas. A rating of 10 is reserved for a developer that can do all of the above but also understands differences in engine choices as well as configuration options of any engine. 10's can optimize a database from engine choice to table layout and finally to query creation.

No comments:

Post a Comment