SQL In Simple English - Part II

2005-06-11

**I read SQL In Simple English Part I. I shall tell my friends that I know a lot of SQL !!
** Wait.. You still have a long way to go before you know a lot of SQL. But to know even the basics of SQL you need to be familiar with one topic.. called Joins. In this article I shall explain Joins and some other interesting aspects of SQL which all beginners must know.
**
But why? I read Part I and I was able to implement a simple database program very easily. Why would I need to learn anything more?
** You might have implemented some database program.. but I am sure it would be a very simple one. You may even be able to implement a good database program without Joins. But you wouldn't be using the features of SQL that let you do the same work much easily. Without Joins, you may manage SQL.. but with Joins you could do things very easily.

So what are Joins?
Suppose you have a database which has 2 tables in it. And you want some information which is spread over both the tables. Typically you would have to run a query for each table to get the data from each separate table. But with Joins you could get data from both the tables with just one query. There are quite a few types of Joins. I shall begin by explaining the simplest one first.

Give me an example..
To explain this topic, I shall use the tables that I have shown below. It's just some sample values. It should be sufficient for you to grasp the concept.

Table Name : authors

|

author_id

|

** lastname **

|

** firstname **

|

** company **

---|---|---|---

1

|

Hunter

|

Jason

|

O' Reilly

2

|

Kanetkar

|

Yashwant

|

BPB Publications

Table Name : books

** book_id **

|

** title **

| author_id |

pages

---|---|---|---

1

|

Let Us C

|

2

|

100

2

|

Let Us C++

|

2

|

200

3

|

Java Servlet Programming

|

1

|

300

4

|

C Projects

|

2

|

400

Initially I shall show you the way to get the work done with the simplest form of Join using the WHERE clause. You would be using only those features that you have already learnt.

Example : Suppose you want the names of all the authors as well as the books they have written, what would you do? Instead of using 2 separate queries to get the work done you could use one query as follows

SELECT firstname,lastname,title FROM authors,books WHERE authors.author_id = books.author_id

This query selects 3 columns (firstname, lastname and title) from the joined table of 2 different tables (authors and books) depending on the criteria mentioned. In this case it selects those rows from the joined table where the value of the authors.author_id field is equal to books.author_id in the joined table. So both the author_id values should be equal. This query would return 4 rows of data. (Try out these examples so that you can see the results yourself)

What is this joined table? Where did 2 author_id values come from? I didn't understand that example at all !!!
Hold on!! You need some detailed explanation to understand all of that. First and foremost you should understand that the basic difference between the query discussed above and all previous queries you have learnt till now - and that is that now you are asking the query to SELECT from _**more than 1 table . **

In our case SELECT .... from authors,books .... The moment you do so, the way queries work might surprise you. To understand it clearly, you can assume the following. The moment you ask a query to select from more than one table, a temporary table is created which has all possible combinations of the contents of the 2 tables. In our case this could be represented with the following table (Do not bother about the order of the columns.. just look at the number of rows in this temporary table)

This table below could be a result of a SELECT * FROM authors,books _

author.author_id

|

lastname

|

firstname

|

company

|

** book_id **

|

** title **

|

books.author_id

|

pages

---|---|---|---|---|---|---|---

1

|

Hunter

|

Jason

|

O Reilly

|

1

|

Let Us C

|

2

|

100

2

|

Kanetkar

|

Yashwant

|

BPB Publications

|

1

|

Let Us C

|

2

|

100

1

|

Hunter

|

Jason

|

O Reilly

|

2

|

Let Us C++

|

2

|

200

2

|

Kanetkar

|

Yashwant

|

BPB Publications

|

2

|

Let Us C++

|

2

|

200

1

|

Hunter

|

Jason

|

O Reilly

|

3

|

Java Servlet Programming

|

1

|

300

2

|

Kanetkar

|

Yashwant

|

BPB Publications

|

3

|

Java Servlet Programming

|

1

|

300

1

|

Hunter

|

Jason

|

O Reilly

|

4

|

C Projects

|

2

|

400

2

|

Kanetkar

|

Yashwant

|

BPB Publications

|

4

|

C Projects

|

2

|

400

If you have seen the values in this table carefully, you must have observed that many of the rows indicate wrong information. Not what the database really intended to hold. Yes it might be wrong but that's what is generated when you create a Join between 2 tables. Now its upto you to extract the correct rows from this table using a proper condition. The rows marked in red are the correct rows and also they happen to be the ones where the first column (authors.author_id) is equal to the seventh column (books.author_id) . These are the rows that hold the correct information as intended.

Thus when you SELECT data from more than one table, in order to extract only the correct rows of data from the joined table, you must always use a condition for checking the equality of the common column in the WHERE clause of your query. For example in our case as explained before we should use the following query

SELECT firstname,lastname,title FROM authors,books WHERE authors.author_id = books.author_id

Thus you get the name of all the authors as well as a list of all the books that they has written.

Is there anything that I have to take care of when making such queries?
Yeah. Remember that when you write the names of the columns in the query you should take care that you specify the exact column that you are referring to. Thus in case you have a column named 'topic' in the above 2 tables, and you want the value of the topic field in the authors table, you should refer to it as authors.topic as in the following way..

SELECT authors.topic .... from authors, .... WHERE ....

Important Note : SQL can sometimes be very very confusing in case you don't have an easy way to see what is happening when the query is executedIt . Specially when you are joining 2 tables to select values from both of them you need to remember about the joined table . Generally, in case you have 2 tables with say 3 and 5 records (rows) each, the joined table would have 3x5 = 15 records (rows).

So what's the most obvious use of Joins? And what is this concept of a common column in 2 tables?
The most simple and obvious use of Joins is to get data that exists in 2 tables which have some kind of relation. Actually there need not be a relation. Joins basically means you are kind of joining 2 tables and then selecting data from them. But in case you have no relation between the 2 tables (no common column) you will see that the query you make is senseless. It would have no practical value. In our example above the common author_id column allows us to make our query sensible. By sensible I mean something useful as information. Something that you would want to extract from a database as a response to a users question.

Note : An important thing to know is that while creating tables you generally create a primary key in every table. A primary key (for beginners) basically means that in that particular column there can be no duplicate values and that column would be sort of the most unique thing representing the data in that table. So for example in the authors table the author_id is the primary key and there can be no 2 same author_id values. Every author must have a unique id. You would generally have this value (of the primary key present in some other table) whenever you want to create a link between the 2 tables. In our case author_id is present even in the books table so that a link is created between the 2 tables. Such links between 2 tables allow us to use Joins effectively and get a lot of data with just single queries.

**** What do I do if I wanted a list of books written by an author with author_id =1 as well as his full name?
With the type of Join I just explained above, it would not be too tough to get this done. A query shown below could be used

SELECT firstname,lastname,title FROM authors,books WHERE (authors.author_id = books.author_id AND authors.author_id=1)

This would first create a joined table as discussed previously having 8 rows of data. Then it would further refine this table and select only 4 rows from them as a result of the authors.author_id = books.author_id criteria in the WHERE clause. Once down to 4 rows, it would further refine this table to 1 final row because of the second criteria which is authors.author_id=1 . Thus you get the exact data that you wanted - The firstname, lastname and the titles of the books written by Jason Hunter, since his author_id is equal to 1. This kind of Join that you have been using till now is called a Cross Join or a Cartesian Join .

So what are the other types of Joins?
As I told you initially, there are many other types of Joins. Inner Joins and Outer Joins are 2 of them.

What are Inner Joins?
Inner Joins are the most commonly used type of Joins are are almost the same thing that you just learnt above. Check the following query

SELECT * FROM (authors INNER JOIN books ON authors.author_id = books.author_id)

This would return 4 rows of data having all the information about the 4 books. This gives exactly the same result as the following query

SELECT * FROM authors,books WHERE authors.author_id = books.author_id

Basically Inner Joins combines all the records in the first table with all the records in the second table and then selects those rows depending on the criteria that is present after the the ON keyword in the query.

The most important thing to remember in Inner Joins is that only those records from both the tables are combined where there is a corresponding value in both the tables. You will understand this point clearly when you read about Outer Joins.

What are Outer Joins?
Outer Joins are almost similar to Inner Joins, but in Outer Joins all the records in one table are combined with records in the other table even if there is no corresponding common value. To get this point assume that our authors table is now the one shown below instead of the original one. The books table remains the same.

author_id

|

** lastname **

|

** firstname **

|

** company **

---|---|---|---

1

|

Hunter

|

Jason

|

O' Reilly

2

|

Kanetkar

|

Yashwant

|

BPB Publications

3

|

Pai

|

Kiran

|

Pai Ltd.

Note that there is a new row added.

There are 2 types of Outer Joins - Left Join and Right Join . Consider an example of Left Join

SELECT * FROM (authors LEFT JOIN books ON authors.author_id = books.author_id)

This query would now (considering the new authors table) return 5 rows of data. The last row would correspond to the record of the author named Kiran Pai but there would be NULL values for the fields such as title,pages,book_id... since there are no books written by Kiran Pai in our books table.

This is exactly what Left Joins do - they combine all the records in the first table with those in the second table irrespective of whether there are corresponding values in 2 tables. By corresponding I mean like in our case since there were no books by Kiran Pai in our books table there are no corresponding values for Kiran Pai in our books table. But inspite of there being no corresponding values all the values were joined and thus you got the NULL values in your final query result. This point is very important to understand Joins.. so please do read it again in case you have not understood it clearly.

Could you explain the concept of Inner Joins once again with respect to this Left Joins?
Yeah.. now you will understand better. As I mentioned in Left Joins all the records in the first table are combined with data from the second table. And if there is no corresponding data in the second table then a NULL value is inserted into the results. Whereas in Inner Joins, records from the 2 tables are combined only if there are corresponding values in both the tables. For example consider the following Inner Join on the books table and our new authors table -

SELECT * FROM (authors INNER JOIN books ON authors.author_id = books.author_id)

This query would return only 4 rows of data and not 5 inspite of the new Kiran Pai entry being present. This is because it could find no corresponding entry in the books table for the author named Kiran Pai (there were no books written by Kiran Pai)

There are are 2 types of Outer Joins ..right??
Yeah.. there are 2 types. One is Left Join and the other obviously is Right Join. Consider the same example as shown previously for Left Join, but this time use a Right Join instead of a Left Join

SELECT * FROM (authors RIGHT JOIN books ON authors.author_id = books.author_id)

This time you would NOT GET 5 rows of data, instead you would get only 4 rows of data. Why so??

Since while joining the data from the books table with the authors table, the books table data is considered first.. since it is a Right Join. Thus for every row in the books table a check for corresponding value in the authoirs table would be made. Thus while adding all the rows of the books table to the joined table, since there were no books written by Kiran Pai, so that particular record from the authors table was not added at all to the joined table. Remember that the joined table is the kind of temporary table that is created while using Joins. That table is refined depending on the criteria present in the query.

Inner Joins and Outer Joins aren't really clear in my mind right now. Why don't you explain it once again in short ?
Actually, all that is required is a re-reading of the previous 4-5 questions. In case you still haven't got it, there is only way out. Create actual tables using some software and then run some sample queries on those tables. Remember not to put corresponding entries for all your data in both of your tables.

By that I mean in case you are making 2 tables - one for artists and one for songs. See to it that in case you have 10 artists names in the artists table, you do not have songs sung by all of those 10 artists in your songs table. Add songs by only 6-7 artists. When you have such a kind of database where there isn't corresponding information for all the records in one table in the other table, Inner Joins and Outer Joins show their real working. In case you have songs sung by all 10 artists in your Songs table, most of your query results would be the same irrespective of what kind of Join you use. Only the order of the rows in the resultset might be altered. So you would be confused..you wouldn't understand the difference between the working of different Joins.

Are there other ways in which I can use Joins?
Yeah you can combine 2 Joins to create more complex Joins. I shall not go into the details of those kind of Joins, but I shall outline the structure of the query which you have to make. This kind of thing is required when you are combining data from say 3 tables. Your query would look something like the following

SELECT ... , ... FROM ( .... INNER JOIN .... ON .... = .... ) INNER JOIN .... ON .... = .... WHERE .... = .....

This is not really for beginners, but if you can get it.. you are not a beginner.. you are already on your way to be an expert.

So basically there are 3 Joins right?? Cross Join, Inner Join and Outer Join?
There is also a Self Join, which I shall not be explaining in this article. Just remember that it exists. It is a fairly complex topic which you may not need right away.

Give me more examples of Joins with new tables
I could do that.. but that would basically amount to waste of time. On the contrary I shall explain how to make effective use of Joins along with another SQL feature called Sub Query.

What's Sub Query?
Use the same tables that I have used before. In case you want to find the titles of the books written by the Jason. What would you do?? Give it a thought and try to figure some way to get it done.. then read on.

**Yes I can do it.. I shall first make a query and find his author_id from the authors table and then I shall make another query and get a list of books having that id associated with it from the books table.
** Ok.. good. but aren't you using 2 queries? Can you get it in one query?

That where Sub Queries come in use. You could find the author_id for that particular author and then also find the books written by him in the same query.

This sounds exciting.. How do I do it?
Ok here is what I am trying to do now. In English I could state it as follows - "I want a list of the books that have been written by author whose firstname is Jason" . In SQL I could possibly use the following

SELECT title FROM authors,books WHERE ( books.author_id = (SELECT author_id from authors where firstname = 'Jason') AND books.author_id = authors.author_id)

Seems complex.. but its actually simple.. (This is after all SQL In Simple English :)
The first part is simple.. you select the title of the books whenever the criteria is met. Lets analyze the criteria..

The inner most part SELECT author_id from authors where firstname = 'Jason' evaluates to the 1 (its just a simple query which you learnt right at the beginning). Thus the entire query now becomes

SELECT title FROM authors,books WHERE (books.author_id = 1 AND books.author_id = authors.author_id)

Here we are using the 2 original tables the authors table with 2 rows and the books table with 4 rows. Now when the joined table is created it would be consisting of 8 rows (2x4 as explained earlier). From these 8 rows only 4 would be containing correct information and the rest of them would be just present as a result of the various combinations that were possible. Now the books.author_id = authors.author_id part of the WHERE clause separates these 4 correct rows and discards the remaining rows. And finally the other part of the WHERE clause books.author_id = 1 would selects the final single row from the 4 correct rows.

Thus you would get the names of the books written by the author named Jason Hunter.

Now do I know the basics of SQL?
Yeah .. now you are familiar with almost all of the basics of SQL. You could go ahead and implement some really interesting database programs now.

That all for now!! I shall be explaining more complex features of SQL in SQL In Simple English - Part III

Published At
Categories with 数据库类
Tagged with
comments powered by Disqus