Q: Explain inner and outer SQL joins?
A: Joins allow database users to combine data from one table with data from one or more other tables (or views, or synonyms). Tables are joined two at a time making a new table containing all possible combinations of rows from the original two tables.
Inner joins: Chooses the join criteria using any column names that happen to match between the two tables. The example below displays only the employees who are executives as well.
SELECT emp.firstname, exec.surname
FROM employees emp, executives exec
WHERE emp.id = exec.id;
Left Outer joins: A problem with the inner join is that only rows that match between tables are returned. The example below will show all the employees and fill the null data for the executives.
SELECT emp.firstname, exec.surname
FROM employees emp left join executives exec ON emp.id = exec.id;
Right Outer join: A problem with the inner join is that only rows that match between tables are returned. The example below will show all the executives and fill the null data for the employees.
SELECT emp.firstname, exec.surname
FROM employees emp right join executives exec ON emp.id = exec.id;
Full outer join: To cause SQL to create both sides of the join
SELECT emp.firstname, exec.surname
FROM employees emp full join executives exec ON emp.id = exec.id;
Self join: A self-join is a join of a table to itself. If you want to find out all the employees who live in the same city as employees whose first name starts with “Peter”, then one way is to use a sub-query as shown below:
SELECT emp.firstname, emp.surname
FROM employees emp
WHERE city IN (SELECT city FROM employees where firstname like ‘Peter’)
Q. Explain a sub-query? How does a sub-query impact on performance?
A. It is possible to embed a SQL statement within another. When this is done on the WHERE or the HAVING statements, we have a subquery construct.
Q. What is subquery useful for?
A. It is used to join tables and there are cases where the only way to correlate two tables is through a subquery.
SELECT emp.firstname, emp.surname
FROM employees emp
WHERE emp.id NOT IN (SELECT id FROM executives);
There are performance problems with sub-queries, which may return NULL values. The above sub-query can be re-written as shown below by invoking a correlated sub-query.
SELECT emp.firstname, emp.surname
FROM employees emp
WHERE emp.id NOT EXISTS (SELECT id FROM executives);
The above query can be re-written as an outer join for a faster performance as shown below:
SELECT emp.firstname, exec.surname
FROM employees emp left join executives exec on emp.id = exec.id AND exec.id IS NULL;
The above execution plan will be faster by eliminating the need for a sub-query.
Q. Can you give SQL examples for the following scenarios?
A.
Scenario 1: Retrieve first name and sum of order qty for order sum greater than 25, and group the order sum by first name.?
SELECT FIRSTNAME,SUM(QTY)
FROM orders
GROUP BY FIRSTNAME
HAVING SUM(QTY)>25;
Scenario 2: Retrieve all employees whose name has a String "au"?
SELECT *
FROM employees emp
WHERE emp.firstname LIKE ‘%au%’;
Scenario 3: select account number and adviser code for a given adviser code, but restrict the returned values to supplied min and max limit. For example, record 1 to record 10, record 11 to record 20, etc.
The SQL for the above scenario needs to use some custom SQL parameters and functions. The example below uses the ROWNUM variable that keeps track of the row numbers in Oracle. The nested query shown below can limit the returned results based on a lower and upper limit.
select * from
(select a.ACCOUNT_NO, a.ADVISER_CODE, ROWNUM rnum from
( Select * from accounts where ADVISER_CODE=:advCode order by advCode) a
where ROWNUM <= :max_row)
where rnum >= :min_row
Q: In your experience, what are some of the common mistakes developers make?
A:
1. Cartesian joins
SQL Joins are used to relate information in different tables. A Join condition is a part of the sql query that retrieves rows from two or more tables. A SQL Join condition is used in the SQL WHERE Clause of select, update, delete statements.
The Syntax for joining two tables is:
SELECT col1, col2, col3
FROM table_name1, table_name2
WHERE table_name1.col2 = table_name2.col1;
If a sql join condition is omitted as shown below
SELECT col1, col2, col3
FROM table_name1, table_name2
or if the condition is invalid, then the join operation will result in a Cartesian product. The Cartesian product returns a number of rows equal to the product of all rows in all the tables being joined. For example, if the first table has 20 rows and the second table has 10 rows, the result will be 20 * 10, or 200 rows. This query will take a long time to execute.
2. Use of SELECT *
For example, a common misuse of SELECT * is to extract a set of all employees and to insert them into another table called Contractors with the same structure
INSERT INTO Contractors
SELECT * FROM Employees WHERE emp_type = 'C';
The above query does the job, however, one day business requirements change and two new columns are added to the Employees table:
ALTER TABLE Products
ADD effective_start_date DATETIME, effective_end_date DATETIME;
All of sudden the query that extracts from the Employees table and insert records into the Contractor table results in error.
"Insert Error: Column name or number of supplied values does not match table definition."
The fix is to explicitly list the column names in the query:
INSERT INTO Contractors (emp_id, emp_name)
SELECT emp_id, emp_name FROM Employees WHERE emp_type = 'C';
3. Embedding User Interface (UI) layer logic into Data layer via SQL. For example
SELECT '<a href="http://www.blogger.com/...">' + name ' </a>'
The above code is a bad practice because it tightly couples your UI Layer with the Data Layer.
4. Not using Prepared statements. Prepared statements are more secured and efficient than the ordinary statements. Prepared statements prevent SQL injection attacks.
5. Using the predicate "LIKE" in indexed columns. The "LIKE" predicate typically performs a search without the normal performance benefit of indexes. Using '=', '<>', etc instead of "LIKE" will increase performance. Also should be aware of that case sensitivity (e.g., 'A' versus 'a') may be different based upon database Server or configuration.
6. Over use of cursors in stored procedures. If possible, avoid using SQL stored proc cursors. They generally use a lot of Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task.
Here are some alternatives to using a cursor:
- Use WHILE LOOPS
- Use temp tables
- Use materialized views (allowing you to pre-join complex views and pre-compute summaries for super-fast response time. )
- Use derived tables
- Perform multiple queries
- Use correlated sub-queries
- Use the CASE statement
Q. Can you give some database performance tuning tips based on your experience?
A.
1. Materialized views are one of the important SQL tuning tools in Oracle. Instead of the entire company accessing a single database server, user load can be distributed across multiple database servers with the help of materialized views in Oracle. Through the use of multi tier materialized views, you can create materialized views based on other materialized views, which enables you to distribute user load to an even greater extent because clients can access materialized view sites instead of master sites. To decrease the amount of data that is replicated, a materialized view can be a subset of a master table or master materialized view.
- Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data.
- It allows you to pre-join complex views and pre-compute summaries for super-fast response times. Unlike an ordinary view, which does not take up any storage space or contain any data, a materialized view provides indirect access to table data by storing the results of a query in a separate schema object.
- You can define a materialized view on a base table, partitioned table or view and you can define indexes on a materialized view.
- A materialized view can be stored in the same database as its base tables (improves query performance through query rewrite) or in a different database.
It is also worth noting that this capability may not suit best for too frequent activities as in online transaction processing (i.e. OLTP) environments. In other databases equivalent functionalities can be achieved through triggers on base tables that would update/insert aggregate or dimension tables and queries can be executed against these aggregated or dimension tables as oppose to the base tables.
2. As a rule of thumb, every table should have a clustered index. Generally, but not always, the clustered index should be on a column that increases in one direction (i.e. monotonic) such as an identity column, or some other column where the value is increasing and is unique. In many cases, the primary key is the ideal column for a clustered index. Create an index on any column that is a foreign key. If you know it will be unique, set the flag to force the index to be unique.
3. Avoid temp tables as much as you can, but if you need a temp table, create it explicitly using Create Table #temp.
4. Only return the columns and the rows you need.
5. Avoid full table scan where possible. The full table scan can be caused by
- No WHERE condition.
- No index on any type of field in the WHERE clause.
- NOT IN predicate that is easier to write (replace NOT IN with a left outer join).
- WHERE clauses like column_name is not null, condition 1 or condition 2, column_name between ... and ..., not equality comparisons
- Use of SQL “LIKE clause” operator to find a string within a large table column (e.g. VARCHAR(2000), CLOB, BLOB).
- DISTINCT, ANY, and ALL.
Other relevant questions and answers
- Database interview Questions and answers
- How will you store a tree data structure in a relational database?