Database Design - A Complete Study
Part 4 - Database Reporting

Written by Stéphane Richard (MystikShadows)

INTRODUCTION:

Welcome to this 4th and final part of the complete database study series. As I mentionned in the 3rd part of this series, we will be covering the part of Structured Query Language that can help you draw useful information and reports from your database. We will be using the tables we defined in Database Design - Part 2 for the examples in this final part. These techniques should help you in any database related project you might have. It doesn't matter if your database are for a business management, a list of contacts, a table is a table and as such you can create reports from them just by using Structured Query Language.

The best way to learn is to follow examples and that's just what I'll give you. We'll start with some very basic, but still useful, examples and work our way to more elaborate reports. This will make the SQL statements more and more complex and should show you, gradually more of the things to think about when undertaking the more complex reports. It will also show you more SQL statement clauses that you haven't seen yet in the previous part of this series. With all this you should be pretty well equipped to do what you need with your database projects.

DATABASE REPORTING BASICS:

When you think about it, a report wouldn't give any useful information if the tables you're reporting from didn't have any data to work with. By this I mean that finding out if there's any data is one of the most useful report you can have. Of course, there are many criteria you need to know about when you're trying to find out if there's anything to report on. In many cases, a report will be based on dates. You'll want to know, for example, if there's any records in the database that fall on a specific date, or a specific range of date. If you're working from he inventory table you might want to know how each inventory category that you defined are doing. This is useful, in this case, when you want to know if you're in overstock (if some of the products you bought just aren't selling enough so that you don't buy any of those the next time around. Likewise, there would be a great interest in knowing which products do sell so you can attempt to avoid having to backorder the best sellers. As a business manager you might want to know how much money you made this month, or this year so far. The list of reports you might need can be close to endless.

Alot of books you'd read on SQL would do just what I did and use business examples for reporting. You might also notice that SQL statement and syntax is probably as close to the english language as it can get. The main reason for that is that SQL was originally invented for business executives. It was supposed to be a means of giving these individuals and easy tool to use to get useful information about their business. Once it was created though, many people simply didn't have the time to learn it or learn it well enough to do anything useful with it. So when you think about it, SQL is popular today because it was a failure at what it was supposed to originally do. Today SQL is used by programmers and database managers, but there isn't alot of business executives that actually know SQL. Ironic isn't it? thank you moneo for reporting that fact to me.

Whether it's for a business, or not, when you create a table or a set of tables, you need to ask yourself what can this table give me for information. Typically, the answer to that question can and would end up as a report (a simple or a complex one). Now enough with the theories and reasons, let's get down to the actual reporting. No matter what answer you come up with when you ask yourself what a table can give back for information, since you are retrieving information from the database, you guessed it, all reporting clauses are added to the standard SELECT statement. So let's start shall we?

THE COUNT AND SUM CLAUSES:

I think these two names pretty much speak for themselves, and COUNTing and SUMmarizing are the best aspects of reporting. Very often you'll find yourself asking questions like How many things did I sell? how much money did I make so far? In business, these are essential questions. COUNT and SUM are of course the best way to do it. Unless a simple select which would give you back all the records you're asking for in a list, COUNT will only give you how many of what you asked for. this saves you the task of counting all the records you'd get if you didn't use the COUNT statement. Here's a simple example.

SELECT COUNT(*) FROM InvoiceHeader;

This will tell you how many invoices you've created so far. In this example, I used COUNT(*) because it is acceptable, however, when your tables build up data, this won't be the best way to use the COUNT clause. It will be highly recommended to specific one field instead of the *. Consider a table with 100 fields and 3 million records in it. COUNT(*) is quicker to type but longer to execute than COUNTing a specific field since even if it doesn't do anything with the fields they are still all part of the query. So, to use just one field you need to specify it in a COUNT(FieldName) clause. For example:

SELECT COUNT(InvoiceDate) FROM InvoiceHeader;

Which field you use doesn't really matter as it's contents have no meaning for a COUNT since it only counts records in a table, not values of a field. If you want to know how much money that represent, you would use the SUM(ColumnName) clause. Here's an example for the SUM clause.

SELECT SUM(GrandTotal) FROM InvoiceHeader;

This will total all of the GrandTotal fields from the InvoiceHeader table and give you the total amount that all the invoices have. Now unlike the COUNT statement, SUM does consider field values and contents, So the Field you choose to put in a SUM clause must be of a numeric nature or it will return an invalid type error. All these short examples had to do with all the records of a table. Typically you can shorten the range of needed records. You might, for example, want to know many invoices you created so far this month or this year. You can do this by simply adding a WHERE clause to your select statement. Our Invoiceheader table has an InvoiceDate field and we'll use just that to determine how many records we need to consider for this report. Here's an example:

SELECT SUM(GrandTotal) FROM InvoiceHeader WHERE InvoiceDate >= '2005-01-01';

Simple as that really, this will give you back every record from January 1st to today. Say you wanted to know about the invoices for the month of september only, you would only add a condition to the where clause. Like so:

SELECT SUM(GrandTotal) FROM InvoiceHeader WHERE InvoiceDate >= '2005-09-01' AND InvoiceDate <= '2005-09-31';

As you can see, filtering out unwanted records is easy to do. It doesn't have to be a date field either. From the inventory table, you could want to know all the products in a given category. In this case, it's the same type of SELECT query. For example:

SELECT SUM(GrandTotal) FROM Products WHERE CategoryID = 10;

By adding AND and more conditions to this example, you could find out about more than one category at a time. Again it all depends on the answers to the question "what can this table or set of tables give me back for information?". Now you know how to filter the unwanted records. Next comes how to break the records that you do want into signicant parts. This next section will show you a new SQL statement that is perfect for breaking down these records into any kind of workable groups.

THE GROUP BY AND ORDER BY CLAUSES:

A perfect example of this kind of GROUP BY use is our inventory table. Some of the many questions any inventory manager wants to be able to answer are: What do I need to order? How much did I sell each month? What are my hottest selling categories of products? and so on. Let's the the last question for example. As you can see in our table definitions, each product has a CategoryID field. By grouping the report on this field we can quickly find out how many types of products we have in our inventory. The SQL statement for this example would be:

SELECT CategoryID, COUNT(ProductID) FROM Products GROUP BY CategoryID;

Assuming we have 5 categories of products and typically more than one product for each of these categories, this example would return:

CategoryID ---------- -------- 1 121 5 100 3 32 2 58 4 101

You probably noticed that the 2nd column didn't have any name. When you are using a function like COUNT() and SUM() it doesn't assign a name to that column. You can use the AS statement to give it a meaningful name. This is good practice especially when you have more than one of these calculated columns in a report. Likewise, if you were using a programming language like FreeBasic and wanted to get the value of that column, you couldn't do it if it didn't have a name. Here's the same example that will name that column Products:

SELECT CategoryID, COUNT(ProductID) AS Products FROM Products GROUP BY CategoryID;

Which would give you:

CategoryID Products ---------- -------- 1 121 5 100 3 32 2 58 4 101

The same technique can be used to give better names to column headers if you feel they are not clear enough for the common user of this statement but if you took the time to name your field names properly, when you defined your table in the first place, you'll hardly need to rename them in a report. Note also that these categories are not in order, and many times your results won't be in the order the should be. The SELECT statement can be added an ORDER BY <FieldName> clause to allow to sort the result in any order you want them in. This Statement will order the categories for you.

SELECT CategoryID, COUNT(ProductID) AS Products FROM Products GROUP BY CategoryID ORDER BY CategoryID;

As simple as that, adding that ORDER BY Clause essentially sorts the results in the field order you specified.

CategoryID Products ---------- -------- 1 121 2 58 3 32 4 101 5 100

When using the GROUP BY clause, you can, add WHERE conditions just like any other SELECT statements. Let's say, for example, that we just wanted the first 3 categories for whichever purpose. We create our SQL statement like this:

SELECT CategoryID, COUNT(ProductID) AS Products FROM Products WHERE CategoryID <= 3 GROUP BY CategoryID ORDER BY CategoryID;

The where clause needs to be before the GROUP BY clause as shown in this example. You could use more than one condition too like any other SELECT statements. This particular example would return:

CategoryID Products ---------- -------- 1 121 2 58 3 32

As a rule of thumb, it's very important that you remember one thing. If you're going to need to groub by a certain field, that field must appear in your SELECT statement's list of selected fields. If you can remember that alot of unnecessary confusion can be avoided. Now So far it seems pretty simple doesn't it? By applying these techniques to any of the tables (maybe with different fields and querying condition) you can manage to get some useful information out of them. However, these example all concern one table at a time. The real power of SQL reporting is when you need information, presented in an intelligent and workable way, from more than one table. The next section will cover this aspect of reporting. It will combine what you've seen so far and integrate JOIN statements to mix tables together in a way that can't be done by simple SQL statements alone. But with these additions you'll quickly see how useful SQL was meant to be other than to maintain a table and it's data.

REAL WORLD REPORTING TECHNIQUES:

Keep in mind that the simple examples I've shown so far will come in handy because they can present you with information you need to know. However, when you can related 2 or more tables to create a report, things can begin to become even more useful. Once again examples are the best way to teach you reporting so that is just what I'll do. Let's examine a couple situations that are related to our table definitions and see how we can get the information we want out of them. Remember that if a select involves more than one table, in reporting, it will more than likely mean that you'll be using the JOIN statement.

As a first example, let's pretend that we want to know how many products were sold so far. However, to make our report a bit clearer to read, we will want to display the ProductName, it's price, how many of them were sold and the total amount of money it represents. Believe it or not, this type of question is asked very often in business. Anything that can give back how much money anything made is a question you'll hear often in your career as a developer. So then, taking this example, we know we'll need the Products table. since we want to know how many of them were sold we'll also need the InvoiceDetails table where each item sold is listed here along with the quantities sold. We'll want enough information for the report to be clear and easy to read so let's see what the statement can look like:

SELECT Products.ProductID, Products.ProductName, Products.CurrentPriceSold, SUM(InvoiceDetails.QuantitySold) AS Quantity, SUM(InvoiceDetails.LineTotal) AS AmountTotal FROM Products LEFT JOIN (InvoiceDetails) ON (Products.ProductID=InvoiceDetails.ProductID) GROUP BY Products.ProductID ORDER BY Products.ProductID;

This example only uses two tables to get all the information it needs. The LEFT JOIN statements makes sure that the data gathered is synchronized on the ProductID from the Products table and not the InventoryDetail table (if it was from the InventoryDetail, you could get more than one record for the same product and we don't need that in this particular case). This SELECT statement would return something like this:

ProductID ProductName CurrentPriceSold Quantity AmountTotal --------- ----------------------- ---------------- -------- ----------- 1 Panasonic KXP-1180 100.00 4 400.00 2 HP 5P Lazer Printer 300.00 2 600.00 3 NEC Accusync 70 17" 250.00 3 750.00 ... and so on for all the products in the Products table.

Can you see how much more readable this is compared to manually selecting all kinds of things independantly from these two table to try to get the same results but in a more complicated and less readable way? that's one of the advantages of the reporting part of SQL. As a second example, let's take this from a different perspective in the business. Let's assume that the boss wants to know how much money he made so far this year and that he wants those results on a monthly basis. Basically we'll need the same kind of information to give him those results, but we'll swap the Products table with the InvoiceHeader table since that table has the invoice dates that we can work with to give him the monthly results. Let's see how we can give him just that. The SQL Statement will look like this:

SELECT MONTH(InvoiceHeader.InvoiceDate) AS MonthNumber, COUNT(InvoiceNumber) AS InvoiceCount, SUM(InvoiceHeader.GrandTotal) AS InvoiceAmounts FROM InvoiceHeader LEFT JOIN (InvoiceDetails) ON (Invoiceheader.InvoiceNumber =InvoiceDetails.InvoiceNumber) WHERE InvoiceHeader.InvoiceDate >= '2005-01-01' AND InvoiceHeader.InvoiceDate <= '2005-12-31' GROUP BY MonthNumber ORDER BY MonthNumber;

This particular query return no physical records from any of the tables it uses, it just brings back results. Typically, a boss doesn't care if he sold a NEC 17" monitor, he'll want to know how much money it gave him instead. This example would return something like:

MonthNumber InvoiceCount InvoiceAmounts ----------- ------------ -------------- 1 59 2439.43 2 34 1834.22 3 58 2308.43 4 78 4898.32 5 120 6421.15 6 211 11243.24 7 377 15234.42 8 105 5734.52 9 77 4968.76 10 56 3452.42

And there you have it, this is exactly what the boss asked for so to speak. Now this might not seem very useful just looking at numbers. However, to those that want this, one of the crucial information you can extract from this report is that the highest selling months are June (6) and July (7). This can help them make sure they buy enough products to supply the higher demand they can expect this coming year in those months. Now how useful is this report I ask you? And basically that's what reporting is all about. If a report you are asked to create doesn't seem to return any intelligent information to you, or to the one that asked you for this report, it's not worth doing, period.

This brings us to the other question to ask yourself, that question is: "How useful is this report to the one that asked for it". A business manager, an accountant, an inventory manager and a sales department could all want somekind of information based on the inventory and invoices. However, they will typically want that information presented to them in a very different way that is useful to them alone. The first report we did in this section on the products would typically be useful for the Inventory manager. In this last example, well take that first query, and modify it a bit so that it will return, for example, all the sales by category. This is another often asked for report. And as you'll see here, we will need to include a 3rd table in the query to get the name of the category and he Products table will just return how many different products they have in the given category. Here's the SQL Statement:

SELECT Categories.CategoryID AS Category, Category.CategoryDescription AS CategoryName, COUNT(Products.ProductID) AS ProductCount, SUM(InvoiceDetails.QuantitySold) AS Quantity, SUM(InvoiceDetails.LineTotal) AS AmountTotal FROM Products LEFT JOIN (Products) ON (Categories.CategoryID=Products.CategoryID) LEFT JOIN (InvoiceDetails) ON (Products.ProductID=InvoiceDetails.ProductID) GROUP BY Categories.CategoryID ORDER BY Categories.CategoryID;

It's as simple as that, we selected the fields and functions we needed from the different tables (Categories, Products and InvoiceDetails) created the LEFT JOINs we needed and the report is ready. Here's a typical output from this statement:

Category CategoryName ProductCount Quantities AmountTotal -------- ------------------ ------------ ---------- ------------- 1 Monitors 121 59 2439.43 2 Printers 100 45 1834.22 3 Scanners 32 132 2308.43 4 Mices 58 564 4898.32 5 Hard Drives 101 254 6421.15

As you can see, it's all about taking the time to sit down and ask yourself what you or the one that needs the report really wants to know and how you need to present it so that you/he/she gets the exact answer they are looking for. This is in the case of a business. Alot of you might have your own forums for example and you could play with the tables of your forums to get specific information from them too like the average number of posts per month you get and other useful information you might need to know when managing a forum.

Reporting queries do have their limits however. In some very complex cases You might not be able to get your answer from one single SQL statement alone. Therefore, in these situations you'll need a tool that helps you get temporary results that you can work with to finish the reporting task completely.

ENTER THE TEMPORARY TABLES:

SQL does provide a means of temporarily saving a subset of records when needed in order to be able to use these temporary results to complete a reporting task. MySQL differs from other databases in that it doesn't support the SELECT INTO TABLE construct. Instead you can use an INSERT INTO Table construct to get the job done. The idea is that you would create a temporary table (designed to be used then dropped) from which you can execute another SELECT statement that will finish the job you started. As such, this technique will create the table, put the information that your first SELECT query will get into it, and then, in a second SELECT statement you would get information from that temporary table, JOIN it into other tables and get the final results you were hoping to get in the first place.

The example I'll use here could have probably been solved by a single SELECT statement. However, for the sake of clarity and your understanding, it will do the job quite well. Let say that you wanted to know how many new customers you had each month of this year. Let's also assume you wanted to know how many invoices and their totals that they purchased as well as the number of items they purchased for that month. Finally let's assume you want that report broken down by month. This typically involves two steps. The first step is to collect the data will need to work with. We'll put that information into a temporary table called CustomerData Once we have that, we'll use a 2nd query to report on the temporary data and break it down by month. Here's the query to gather the data into the temporary table.

INSERT INTO CustomerData SELECT Customers.CustomerID AS Customer, Customers.CustomerSince AS CustomerDate, MONTH(Customers.CustomerSince) AS CustomerMonth, InvoiceHeader.InvoiceDate AS DateOfInvoice, MONTH(InvoiceHeader.InvoiceDate) AS InvoiceMonth, SUM(InvoiceDetails.QuantitySold) AS Quantity, SUM(InvoiceDetails.LineTotal) AS AmountTotal FROM Products LEFT JOIN (InvoiceHeader) ON (Costumers.CustomerID=InvoiceHeader.CustomerID) LEFT JOIN (InvoiceDetails) ON (InvoiceHeader.InvoiceNumber=InvoiceDetails.InvoiceNumber) WHERE Customers.CustomerSince >= '2005-01-01' AND Customers.CustomerID = InvoiceHeader.CustomerID AND InvoiceHeader.InvoiceDate >= '2005-01-01' AND InvoiceMonth = CustomerMonth GROUP BY Customers.CustomerID ORDER BY Customers.CustomerID;

This gives a table with 7 columns that have the temporary results in them. Once we have that in the temporary table, we simply issue a 2nd SQL query based on those temporary results. here's what that might look like:

SELECT CustomerMonth, COUNT(CustomerMonth) AS NewCustomers, COUNT(DateOfInvoice) AS InvoiceCount, SUM(Quantity) AS ItemCount, SUM(AmountTotal) AS TotalAmount FROM CustomerData GROUP BY CustomerMonth ORDER BY CustomerMonth;

As you can see, in very complex situations, this use of temporary table really help keep things into perspective and keep the goal of the report clear. This example would give back a report that could look like this:

CustomerMonth NewCustomers InvoiceCount ItemCount InvoiceAmounts ------------- ------------ ------------ --------- -------------- 1 23 59 121 2439.43 2 34 56 97 1834.22 3 58 87 134 2308.43 4 78 128 231 4898.32 5 120 185 335 6421.15 6 211 465 684 11243.24 7 377 624 734 15234.42 8 105 160 342 5734.52 9 77 149 522 4968.76 10 56 134 316 3452.42

This report would show again that June and July are he hot sellers, however, with the added information, we can now deduct that since the ItemCount is closer to the InvoiceCount in July, that they have been buying more of the same thing during those months and bought a more diversified list of items in the remainder of the year. After knowing this they could compare that against the inventory history to make sure they are right in their assumption (which, in most cases, would prove to be right). It's important to remember that the more complex a query gets, the harder it gets to follow when it does not return the required information. Therefore, extra care needs to be invested when building the more complex query as debugging it later will prove a bigger headache than needed. Note that if things get more complex, there is nothing to stop you from breaking the reports into more than one temporary table as you need it. The important thing is to keep the reporting queries as clear as they have to be to be maintainable. This way, when situations occur, you'll be better prepared for the process of following what's going on and knowing where the situation is more likely to occur.

THE OFFICIAL FINAL WORD:

And there you have it. We now conclude the 4th and final part of the database series. In these 4 parts, I believe we covered alot of ground as far as database design and management is concerned. I definitaly hope it is useful to your database projects whatever they may be. As I mentionned before, this series covered the MySQL implementation of the SQL syntax. When you go to other databases, such as Oracle, SQL Server and PostGres You'll notice that alot of what you learned here will definitaly serve you well in these other databases too.

As always, I've tried my best to be clear, concise, to the point as much as could. However, some of you might have questions about what you've learned in this series as perhaps I could have written certain parts differenly and make them even easier to understand. Fell absolutely free to send me comments, ideas and suggestions about this series, or any other tutorial I've written in QB Express so far. If you have specific questions, feel free to email me those questions so I can answer them for you to make sure you understand everything that I tried teaching in this (and other) series. Until I write again, happy learning, designing and coding.


MystikShadows
Stéphane Richard
srichard@adaworld.com