Database Design - A Complete Study
Part 3 - Structured Query Language
Written by Stéphane Richard (MystikShadows)
And here we are, the third and final part of this series. As promised, this document will cover everything you need to know in order to create, maintain, and query a database properly using the SQL (Structured Query Language) as it is implemented in the MySQL Database System. As such, this document will have a lot of theory to cover. I'll try to keep it as simple and as basic as I can while giving you all the syntaxes you'll need to cover all bases of Database Creation and administration. Remember that the main reason I chose MySQL is because of it's resemblance to most other database systems out there. So what you learn here will help you alot if you have things to do in other database systems.
First we'll cover the theory of database creation constructs in SQL and why they are the way the are. Then we'll create scripts to create all the tables that we defined in the second part of this series including primary keys and indexes. And we'll continue with the explanation of the SQL syntax to insert, update, delete records from tables and well also see how to query the database to retreive the information you want and need. With all this, you should be equipped to create your own MySQL database projects. So then let's get right to it.
CREATING AND SELECTING A DATABASE:
This is among the simplest statements you'll need to remember. For the sake of this series we'll call our database Business. Now, the statement to create our database is defined as follows:
CREATE DATABASE <DatabaseName>;
There you go, pretty simple no? This statement alone will go about creating a database for you. Now once the database is created, you won't be able to do anything with it unless you are using the database first. It's important to remember the semicolon ";" at the end of the SQL statement. Most statements in MySQL (and other SQL implementation) use this ; to indicate the end of a complete SQL construct. So put that in your brain right off the start. Next thing you need to know of course is how to use the database you just created. Here's the statement that will do just that:
There you have it, it's that simple. This will make sure that the database is in use and ready to be used. You might notice that I didn't put a semicolon here. the USE statement is one of the rare statement that doesn't need a ; to complete it's construct. You could put one of you wanted to but it is not mandatory. Now for the purpose of our series. Here is the code you'll need to create and the use the Business database:
CREATE DATABASE Business; USE Business
With these two statements, Business is now created, used, and ready for action. It's really as simple as that. The good thing about SQL, as you'll discover here is that it's pretty close to the english language and as such, pretty easy to understand.
MYSQL DATA TYPE REVIEW:
Before we go ahead and create our tables, I think it's important to review the data types we have available to us when defining our table structures. There are quite a few of them to allow for a greater flexibility and accuracy in our field definitions. So let's review them here:
- CHAR(Length) and VARCHAR(Length):
These are the text data types. CHAR(Length) is there for the text fields that will always need the same amount of characters. For example, Social Insurance Numbers always have 9 digits, and so on. VARCHAR(Length) are provided for those fields which may vary in length for whichever reason. For example, Names, descriptions and so on.
- BOOLEAN data types:
This data type serves to indicate fields that can be represented as TRUE (1) or FALSE (0) (these constant are available in MySQL and most other database system) so you can use either the number or the TRUE or FALSE constant when comparing these fields.
- BIT(Length) data type:
This data type offers the ability to specify it's values at the bit level. It is a binary datatype that can be of any length specified. Therefore if you define a field a BIT(4) you can set it's value to anything from "0000" to "1111" in binary (0s and 1s only). This is useful for fields that will play the role of flags to setup a given module of a program for example.
- INTEGER, FLOAT and CURRENCY data types:
These are the basic numeric values that you can put in a database. INTEGERs are whole numbers that do not have a decimal value. FLOATs have the ability to hold a decimal point and fractional values. Depending on the role of your numeric data, using INTEGER when you can will make the table shorter in size. The CURRENCY datatype is a number that typically will only have 2 digits after the decimal point and work best for all monetary values.
- DATE and TIME data types:
These types pretty much speak for themselves. They are designed to hold specific date and time values and are also made to be interrogated specifically based on a date and/or tume value.
NOTE: There are many more data types that you are likely to find in a MySQL table definition. Those mentionned here represent the datatypes that you will most likely need in the course of your database related projects.
CREATING TABLES, PRIMARY KEYS AND INDEXES:
Ok, now that we have a database, it is pretty useless unless we have tables in there to store our information. Of course SQL provides a series of CREATE statements to help us create tables, primary keys and indexes. Tables are the actual table that will hold the information we have to save. Here's the definition of the CREATE TABLE construct which can be used to create a table for our database: (note that this is a simplified definition, for a complete syntax of the CREATE TABLE statement take a look here.
CREATE TABLE <TableName> ( <FieldName> <DataType> [UNIQUE|AUTO_INCREMENT|NOT NULL], <FieldNameN> <DataType>, ... ) ;
With this statement you can build all of your table definitions. UNIQUE means it doesn't allow DUPLICATE values for that field in a given table. The AUTO_INCREMENT and the NOT NULL attributes are ways to specify certain behaviors for a given field (see the complete CREATE TABLE statement definition mentionned above). AUTO_INCREMENT basically tells the table that the field will be incremented in value everytime a record is added to the table. NOT NULL states that the field cannot allow itself to receive a NULL value (A key field for example should never allow a NULL value to be saved or it defeats the purpose of the key).
Primary Keys are indexes that are specifically created to make sure we don't have duplicates in our table or to make sure that each record we'll save in our table can be uniquely identified (or found). Make a special note that any table you create should have a means of being able to find a record and only one record in some way or another (A big rule of thumb so to speak). They are created in the CREATE TABLE statement as part of the field definitions (inside the parenthesis) as follows:
CREATE TABLE <TableName> ( <FieldName> <DataType> [UNIQUE|AUTO_INCREMENT|NOT NULL], <FieldNameN> <DataType>, ..., PRIMARY KEY (<FieldName>) ) ;
Indexes are preset orders in which the table records will be presented in. Like the primary key, indexes can be specified in the CREATE TABLE as well. Quite simply, it follows the same rules as PRIMARY KEY except that PRIMARY is not used. For example:
CREATE TABLE <TableName> ( <FieldName> <DataType> [UNIQUE|AUTO_INCREMENT|NOT NULL], <FieldNameN> <DataType>, ..., KEY (<FieldName>) ) ;
That simple. Note that you can of course combine KEY statements for as many fields as you want to create an index on. Just seperate them by commas as you would use to seperate the fields in the TABLE definition. For the sake of example, let's create the complete CREATE TABLE statement to create the first table we defined in the 2nd part of this series, the Customers table. It would look like:
CREATE TABLE Customers (CustomerID INTEGER AUTO_INCREMENT, CustomerName VARCHAR(50) NOT NULL, IsCompany BOOLEAN, CustomerSince DATE NOT NULL, Address1 VARCHAR(60), Address2 VARCHAR(60), City VARCHAR(40), State VARCHAR(25), ZipCode VARCHAR(8), PRIMARY KEY (CustomerID), KEY (CustomerSince) );
This is one of the many tables we have. Using the same syntax as the Customers table defined here, we could create a series of create table statements to create the whole database. I'm not going to define all the create tables at least not in the context of this series. But this table definition should help you tremendously when creating the definitions for the other tables. Here are a few useful Statements you might find yourself using often when creating databases and tables.
-- To Show the databases that are already created: SHOW DATABASES -- To show the tables that are created: SHOW TABLES [FROM <DatabaseName>] -- To display the structure of a table definition: DESCRIBE <TableName> -- To rename an existing table to another name: (very useful) RENAME TABLE <ExistingName> TO <NewName> -- To remove an existing table from the database: DROP TABLE <ExistingName>
INSERTING, UPDATING AND DELETING RECORDS:
So we now have a table in our Business database called Customers. The next logical step of course is to be able to add and manipulate data into the table. Here too, SQL provides all the constructs you'll need to add, update and delete records from your table(s). Let's start with Adding records. To do so, you need the INSERT statement which is defined as follows:
INSERT [INTO] <TableName> [(FieldName, FieldNameN, ...)] VALUES (Value, Value, ...);
One thing to notice here is that the FieldNames are optional. There is a condition though. You don't have to supply the field names if your VALUES list covers all the fields of your table. If you are only saving data for some fields, you'll need to specify those fieldnames. That's the only condition. If we take this to our example, and insert a customer into our Customers table. our SQL statement will look like this:
INSERT INTO Customers VALUES ('MystikShadows', FALSE, '09/11/2005', '99 NoName street', '', 'Nowhere', 'NoCountry', '00000' );
Make sure that the type of data you are inserting corresponds to the data type of the field you are inserting your value into. That's always important as you can probably imagine.
You have records and you notice you made a mistake in the values you just inserted. What do you do? Simple you use an UPDATE statement to change just the fields you need to change. This is one situation. Another situation where you might need the UPDATE statement is when a customer changes address on you and calls you to make the change. The UPDATE statement is defined as follows:
UPDATE <Tablename> SET <FieldName> = <Value>, <FieldNameN> = <Value>, ... WHERE <FieldName> = <SearchValue> [AND/OR <FieldName> = <SearchValue>];
This is where the PRIMARY KEY comes in handy. Say you have 2 customer records, one is identified as 1 and the other has 2 for it's CustomerID field. if you want to change the 2nd record's City and State fields, here is how it could be done in SQL:
UPDATE Customers SET City = 'NoPlace', State = 'New York' WHERE CustomerID = 2;
Once again you can see how simple things get with a language that is as close to english as SQL is. It basically says Update the Customers Table, Set City to 'NoPlace' and State to 'New York' where the CustomerID is 2. SQL will first seek the record that has CustomerID equal to 2 and then proceed to change the two fields we specified with the specified values. Note that in these examples I separated the constructs in multiple lines because they look clearer this way. but all these could be in one extra long line and would still work as they should. This is for formatting purposes only. You can use more than one field in your WHERE clause to further narrow down the resulting records that will be affected.
Now that we have records and we can change their values, what happens if you ever want to get rid of a customer alltogether as if he/she never existed? Once again SQL has a statement for the job. All you need to do is use the DELETE statement, which is defined as:
DELETE FROM <TableName> WHERE <FieldName> = <SearchValue> [AND/OR <FieldName> = <SearchValue>];
That's it, this simple statement can either delete one specific record from the table or delete a whole set of records (depending on what you specify in the WHERE Clause and the contents of your table). In our Customers table, let's say we'd want to delete the customer that has a CustomerID field of 2, we would do it as follows:
DELETE FROM Customers WHERE CustomerID = 2;
And there you have it. You now have all the knowledge you need to do some simple database maintenance on your own. You can now create databases, create tables in your databases, insert records, update them, delete them, you can do pretty much all you need as far databases are concerned. There's only one thing we haven't done yet as far as simple database operations are concerned. And we'll see that in the next section right now.
SELECTING FROM A TABLE:
What use would a database have to it's users if there was no means of interrogating it's contents? Absolutely none is the right answer. Indeed if you can't find specific data that you stored in a database, the database isn't of much use. Well SQL provides all you need for doing just that. It is called the SELECT statement. Logical name to get records from a database no? The SELECT statement is defined as follows: (note again this is a simplified version, look here for a complete definition of the SELECT syntax).
SELECT *|<FieldName>, FieldNameN>, ... FROM <TableName> WHERE <FieldName> = <SearchValue> [AND/OR <FieldName> = <SearchValue>]; [ORDER BY <FieldName> ASC|DESC];
This indicates to select all the fields (*) or the list of field we specify from the TableName where the conditions are met to return the record or group of records. If we don't specify a where clause, all the records in the table will be returned. Basically the SELECT statement can be used in more than one way. The ORDER BY clause simply tells SQL to display the returned records in the order of the FieldName(s) specific either ASCending OR DESCending order. Here is a typical select statement from the Customers table:
SELECT * FROM Customers ORDER BY CustomerName;
This is the shortest form of the SELECT statement, it will return every single field there is in the Customers table and all the records that have been added to the table.
SELECT CustomerName, Address1, Address2 FROM Customers WHERE CustomerID = 2;
This one will select only the 3 specified fields from the database and will return only the customers that have a CustomerID of 2 (in this case, only one customer will be returned). Now, let's say we have an Invoices table and we wanted to select specific fields from both Customers and Invoices table, a very straightforward way to do so would be to simply combine them in a SELECT statement, like so:
SELECT Customers.CustomerName, Customers.Address1, Customers.Address2, Invoices.InvoiceNumber, Invoices.InvoiceDate, Invoices.TotalInvoice FROM Customers, Invoices WHERE Invoices.CustomerID = 2;
What this particular SELECT statement will do is return the Customer's Name, Address1 and Address2 fields as well as the Invoices Invoice number, Date and Total all in one row for each invoices that has a CustomerID field set to two. Depending on your specific needs, this could be enough for alot of your database querying. Just combine your Fields (by specifying which table the belong to like I did) and list the tables in the FROM clause seperated by a comma. You can do this for more than 2 tables too.
Now things can get much more complicated than this. These SELECT statements are good when you only one a specific set of records (in this example, only the invoices that belong to the Customer that has a CustomerID of 2). Likewise, this is just oen set of tables. If you want to get more specific data from the database, then you just might need to learn all about the JOIN statement. For example, if you only want the details of the invoice, you could add it in the las select statement as a 3rd table, but the result will get messy, especially if you need 4 or 5 tables.
THE JOIN STATEMENT:
The JOIN statement plays the role of connecting two or more tables with conditions. This helps keep all the tables in sync while you are doing your querying. In this document, I will only be covering standard left joins (where the parent table has a one to many relationship to the joined table) as this is what you'll need to know for atleast 95% of your JOIN experience. The other joins work the same, it really all depends which table you are starting to join from. Here is how the JOIN statement is defined (it is usually part of a SELECT statement so we'll take our previous example and JOIN it instead of a standard combined SELECT query. Again this is a simplified Definition. You can look here for the complete JOIN syntax.
SELECT *|<FieldName>, FieldNameN>, ... FROM <ParentName> LEFT JOIN (<JoinedName>) ON (<ParentTable>.<FieldName> = <JoinedTable>.<FieldName>, ...) WHERE <FieldName> = <SearchValue> [AND/OR <FieldName> = <SearchValue>];
RIGHT JOIN statements are for when the Parent Table has a Many To One relationship with the joined table. Quite simply put, you could get "relevant" data from 2 or more Tables using this type of join statements. As an example, let's say we have Customers, Invoices and InvoiceDetails for tables and we would like all the invoices and their details (what they purchased) for a given customer and we'd like to know the customer's name as well. Using the LEFT JOIN statement, we could do it this way:
SELECT Customers.CustomerName, Invoices.InvoiceNumber, Invoices.InvoiceDate, Invoices.TotalInvoice, InvoiceDetails.PartNumber, InvoiceDetails.Description, InvoiceDetails.UnitPrice, InvoiceDetails.Quantity, InvoiceDetails.LineTotal FROM Invoices LEFT JOIN (Customers) ON (Invoices.CustomerID = Customers.CustomerID) LEFT JOIN (InvoiceDetails) ON (InvoiceDetails.InvoiceNumber = Invoices.InvoiceNumber) WHERE Invoices.CustomerID = 2 ORDER BY Invoices.InvoiceDate;
Now, this still seems understandable doesn't it? For all that matters it's a SELECT statement. The things to watch out for when creating a JOIN statement are. First, be sure that when you are joining two tables, the field you are using in the ON clause is relevant to the two tables. As you can see here I joined the Customers.CustomerID to the Invoices.CustomerID, both of these represent a field that Identifies the customer in both tables. I did the same thing on the 2nd LEFT JOIN Because both fields there represent an InvoiceNumber in both tables. If you can inderstand that small detail, JOINs will quickly become quite trivial to you.
AND THE FINAL WORD OF THIS 3RD INSTALLEMENT:
This concludes our 3rd installement in this database study. The best thing to do at this point is to fire up MySQL and practice what you've learned so far, insert some relevant data into your tables work on your INSERT, UPDATE and DELETE and get familiar with them as much as you can. Then once you have a couple of tables that can be JOINed together in some way, go right ahead and practice your JOIN statements.
In the next (and final) installement, I'll teach you another side of SQL. You see, SQL isn't only used to interrogate the database or insert and delete records. SQL has another very useful purpose. That purpose is reporting. Indeed you can create some very sophisticated reports in SQL that will really help your users see the real information they want to (read need to) see. We'll cover everything that MySQL's SQL implementation has to offer to help you create some very detailed and explicitally intricate reports on data that can help the user understand what they are looking at and also quickly see totals on given intervals of time or other factors. Reporting in SQL is just as natural as it is what we've read about in this document. So get yourself ready for the next and final installement. You'll seen then exactly how useful SQL can be. Until then, happy coding. As always, you can email me with comments or questions, I love to get emails.MystikShadows