Question
For the purpose of tuning your database that has begun to slow down for some queries, you are looking for the best plan to execute the following queries on the relation described below:
The relation you are dealing with is Employee with attributes ename, title, dname, and address; all are string fields of the same length.
The ename attribute is a candidate key.
The relation contains 10,000 pages.
There are 10 buffer pages.
(When answering the questions, make sure to describe the plan you have in mind.)
The second query is:
SELECT E.ename FROM Employee E WHERE E.title = 'Administrator' AND E.dname = 'Finance'
Assume that only 10% of Employee tuples meet the condition E.title = 'Administrator', only 10% meet E.dname = 'Finance', and only 5% meet both conditions.
Suppose that a clustered B+ tree index on dname is the only index available. What is the cost of the best plan?
Suppose that a clustered B+ tree index on <dname, title, ename> is the only index available. What is the cost of the best plan?
Explanation / Answer
In this case, we first need to choose our indexes carefully and keep only the ones the queries actually need. To do that, the database administrator should know the statistics of the database (in our case, of the Employee table).
By statistics, I mean information about the indexed columns and how their values are distributed. (Indexes are built on columns/attributes such as ename, title, and dname.) We should prefer the index whose condition matches the fewest rows. For example, ename = 'ALICE SAMUEL' returns far fewer rows than title = 'ADMINISTRATOR'.
This is what query optimization does: it tries to pick the least expensive access path, which is usually the one that narrows the result to the fewest rows as early as possible. The order in which the conditions of a WHERE clause are evaluated is therefore also important.
For example, consider the following queries:
1) SELECT * FROM Employee E
WHERE E.title = 'Administrator' AND E.ename = 'Max Williams';
2) SELECT * FROM Employee E
WHERE E.ename = 'Max Williams' AND E.title = 'Administrator';
These two queries can perform differently if the system evaluates the conditions in the order written and uses only one index at a time, as a simple rule-based optimizer might. In the first query, the title condition is applied first: all rows with title = 'Administrator' are selected, and then the rows with ename = 'Max Williams' are picked from those. In the second query, the reverse happens. (A cost-based optimizer normally chooses the same plan for both, whatever order the conditions are written in.)
Obviously, far fewer rows have a particular name than a particular title, so evaluating the ename condition first, as in query 2, does less work.
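To make the effect of evaluation order concrete, here is a minimal Python sketch. The table contents (1,000 employees with unique names and 10 titles) and the helper filter_in_order are hypothetical, made up only for illustration; it counts how many rows each condition has to examine when the conditions are applied one after the other.

```python
# Hypothetical toy data: 1,000 employees, unique names, 10 different titles.
employees = [
    {"ename": f"Employee {i}", "title": f"Title {i % 10}"}
    for i in range(1000)
]


def filter_in_order(rows, predicates):
    """Apply the predicates one after another.

    Returns the surviving rows and, for each predicate, the number of
    rows it had to examine.
    """
    examined = []
    for pred in predicates:
        examined.append(len(rows))
        rows = [r for r in rows if pred(r)]
    return rows, examined


def by_title(row):
    return row["title"] == "Title 7"


def by_name(row):
    return row["ename"] == "Employee 7"


title_first, cost_title_first = filter_in_order(employees, [by_title, by_name])
name_first, cost_name_first = filter_in_order(employees, [by_name, by_title])

print(cost_title_first)  # [1000, 100]
print(cost_name_first)   # [1000, 1]
```

Both orders return the same single row; only the amount of intermediate work differs. A real DBMS decides this order itself from its statistics, so the sketch illustrates the idea rather than any particular optimizer.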
Also, we should avoid using * (asterisk) in SQL queries when not all columns are needed, because fetching unnecessary columns increases the response time.
The best way to execute such queries is to create well-chosen indexes. When creating an index, estimate the number of distinct values the column(s) will have. For example, the title column in our sample database is not a good candidate for an index, because a single title can match thousands of rows, which then have to be searched sequentially. Such indexes seldom speed up SELECT queries, and they increase the response time of DML statements, because every index has to be maintained on each insert, update, and delete.
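As a rough illustration of estimating distinct values, the sketch below computes the fraction of rows an equality condition would return if values were spread uniformly (1 / number of distinct values). The sample columns and the function name selectivity_of_equality are assumptions for the example, not real data from the Employee table.

```python
def selectivity_of_equality(values):
    """Estimated fraction of rows matched by `column = constant`,
    assuming every distinct value is equally frequent."""
    distinct = len(set(values))
    return 1.0 / distinct


# Hypothetical samples: 1,000 rows each.
titles = ["Administrator", "Engineer", "Clerk", "Manager"] * 250  # 4 distinct
enames = [f"Employee {i}" for i in range(1000)]                   # all unique

title_sel = selectivity_of_equality(titles)
ename_sel = selectivity_of_equality(enames)

print(title_sel)  # 0.25
print(ename_sel)  # 0.001
```

The smaller the fraction, the better the column is as an index candidate for equality lookups; a column like title, with few distinct values, scores poorly.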
Two types of indexes are generally used: composite and clustered.
Composite: an index on more than one field is called a composite index.
Clustered: the data is physically sorted according to the fields in the index.
ANSWER TO YOUR SECOND QUERY:
The cost of an equality search in a clustered B+ tree is about S = log_F(1.5B) page I/Os to reach the first matching entry, where B is the number of data pages and F is the fan-out of the B+ tree. (I use the equality search cost because the query in the question uses equality conditions.)
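To get a feel for the size of this term, the sketch below plugs in B = 10,000 from the question. The fan-out F = 100 is purely an assumption, since the question does not give one.

```python
import math


def equality_search_descent(num_data_pages, fan_out):
    """Tree descent cost log_F(1.5 * B), in page I/Os."""
    return math.log(1.5 * num_data_pages, fan_out)


B = 10_000  # data pages, from the question
F = 100     # fan-out: assumed, not given in the question

levels = equality_search_descent(B, F)
print(round(levels, 2))   # 2.09
print(math.ceil(levels))  # 3
```

Under this assumption the descent costs only 2 to 3 I/Os, which is negligible next to the number of data pages the query then has to read.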
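With only a clustered B+ tree index on dname, the best plan is to use the index to find the first tuple with dname = 'Finance' and then read the matching data pages in order, checking title = 'Administrator' on each tuple as it is read. Because the index is clustered, the 10% of tuples with dname = 'Finance' sit on about 10% of the pages, so the plan reads about 0.10 × 10,000 = 1,000 data pages plus the few I/Os of the tree descent, roughly 1,000 I/Os in total. A full file scan would cost 10,000 I/Os.
With a clustered B+ tree index on <dname, title, ename>, dname and title form a prefix of the search key, so the index matches both conditions and the scan touches only the 5% of entries that satisfy both. Because ename is also part of the key, the query can be answered from the index alone (an index-only scan). If the leaf level is about as large as the relation, that is at most about 0.05 × 10,000 = 500 pages plus the tree descent; if the leaf entries hold only the three key fields (about 75% of a tuple, ignoring the record id), it is about 375 pages. Either way, this plan costs less than the plan for the dname-only index, not more.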
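The selectivities given in the question can be turned into rough page counts. The sketch below is a back-of-the-envelope estimate, not a definitive cost model: the 2 I/Os for the tree descent and the assumption that an index entry on <dname, title, ename> takes 3/4 of a tuple (three of four equal-length fields, record id ignored) are my assumptions, and all names are made up for the example.

```python
RELATION_PAGES = 10_000  # from the question
SEL_DNAME = 0.10         # fraction with dname = 'Finance'
SEL_BOTH = 0.05          # fraction meeting both conditions
DESCENT_IOS = 2          # assumed cost of walking down the B+ tree
KEY_FRACTION = 0.75      # assumed entry size relative to a full tuple


def cost_file_scan():
    """Read every page of Employee."""
    return RELATION_PAGES


def cost_clustered_dname():
    """Descend the tree, then read the clustered 'Finance' pages,
    filtering on title while reading."""
    return DESCENT_IOS + round(SEL_DNAME * RELATION_PAGES)


def cost_index_only_composite():
    """Descend the tree, then read only the leaf pages whose entries
    match both conditions; ename comes from the index entry itself."""
    return DESCENT_IOS + round(SEL_BOTH * KEY_FRACTION * RELATION_PAGES)


print(cost_file_scan())             # 10000
print(cost_clustered_dname())       # 1002
print(cost_index_only_composite())  # 377
```

Under these assumptions the composite index gives the cheapest plan, because it matches both conditions and never has to touch the data records.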