There was a question on the JoS forum about the use and viability of SQL Hints. SQL hints are pretty much what the name implies. They are guides inserted by the developer that provide additional direction to the database on how to execute a particular SQL query. One of the big fallacies of SQL that is pushed across to students is that SQL code is portable between DB, so that what you write for one system can be easily ported to another. Whilst theoretically true in the white glove teachings of academia, in practice the reality is far different.
In the real world, each DBMS has its own quirks, limitations, features and best practices. The situation is analogous to what occurs with browser support of web technologies, only on a scale a million times removed but I’ll talk about this another time.
The other fallacy closely connected to this is that SQL allows us to concentrate on the high level logic and not worry about how a particular DBMS is operating under the hood. A quick chat to any experienced DB practitioner will quickly divest the innocent inquisitor of their naivety.
This brings us on to SQL hints. The general consensus amongst the respondents to the JoS question is that, though hints should not be used at all in a production environment, the DB does at times seem to make some whacky decisions.
Let’s think about what is going on and why. Firstly, SQL hints are similar to training whips. They say to the DBMS “You better do exactly as I say!”. And so the DB does, and for the early lifetime of the application, all is well. However, later on, the situation has changed. The amount of data to search has changed and the DB is thinking, with a bit of independence, “Hmmm, maybe this isn’t the best way as it’s going to take more time”. However, it realises that it’s been given an order and like a slave to an inflexible master, it does as it was told.
The problem with SQL hints is that they operate independently of local knowledge. As long as everything is as it was when the hint was added, things are rosy but as soon as any thing changes, they can force the DB into working along an inefficient execution plan.
Most DB in production operate using some sort of analysis based on the data set and tools available to dictate the course of action. Unfortunately, they can’t analyse the whole data set at execution time (as if it has done that, then it might as well just return the results) and so they analyse “statistics” data that provide a reflection of the full dataset.
This is usually the cause of the problems which lead developers to use hints in the first place. If the statistics are stale, and no longer reflect what is actually in the system, then any analysis based on these statistics will be flawed.
However, with that said, there are often situations in which the developer can be fairly confident that they can beat the DB because they have knowledge of the requirements behind the query.
For example, in Oracle, if you want to get rows N to M of a particular result set, you would generally do something like
Select * from (
select a.*, ROWNUM rnum
From ([result set]) a
Where rownum <= :M
)
And rnum >= :N
In this scenario, the following query would perhaps be better:
Select * from (
select /*+ FIRST_ROWS */ a.*, ROWNUM rnum
From ([result set]) a
Where rownum <= :M
) And rnum >= :N
The reason we can be assured that this is better is that the situation in which something like this would likely be deployed is for paginated reports. We are want the information for the pages as early as possible so that we can pass them to the user as soon as possible.
However, my thoughts are that it is better to use hints sparingly. Often, it’ll be better to re-examine what and why you are doing a particular piece of logic before trying to get the DB to run that logic as you think is suitable.