THE ART OF DATA VALIDATION
Written by Stéphane Richard (Mystikshadows)
INTRODUCTION:
When you think of data validation, what's the first thing that comes to mind? Is it, by any chance, entering numbers
when an application expects numbers from you, the user? If so, then you're already well on your way to grasping
just what data validation is all about. In essence, data validation is about having valid data for a given
type of variable, and whether you are creating a business application or a game, good data validation will add a certain level
of professionalism to your programming endeavours. There's more to data validation than simply validating the data, though. A lot more:
as you'll see, there are many things that can be done to help in the data validation process.
In this document, we will cover what you need to know about data validation: what it really means, how to use it effectively
and ultimately how to minimize the use of data validation while still ensuring that the data is indeed valid. So get ready, the
journey begins here.
DATA VALIDATION DEFINITIONS AND CONCEPTS:
Data validation, as explained above, is making sure that all data (whether entered by the user, read from a file or read from a database) is valid for its intended data type and stays valid throughout the application that is driving this data. What this means is that data validation, in order to be as successful as it can be, must be implemented in every part that receives the data, processes it, and saves or prints the results. Let's take the time to list those parts here and explain why they should be considered.
- User Interactivity Screens And Forms:
Any part of an application that requires the user to enter data (a data entry screen, for example) should most definitely be considered a prime candidate for data validation. Human error always ends up as the prime suspect for invalid data, no matter how well intentioned the users are. In practice, you'll find that a large majority of invalid data situations are caused by human error.
- File Manipulation Routines:
This entails all file-related operations, such as reading from a file or writing to it. Because the data saved is usually the result of user-supplied information, it is good practice to install a data validation procedure at this level. This includes database-driven applications, or even just the settings screen of a game. When reading a file, double-checking what you are reading is also good practice, if for nothing else than to confirm that the data file is valid and not corrupted in any way.
- Import and Export Routines:
When you plan to have your application save the same data in different file formats for other applications, you need to make sure that those applications will be able to open the file, as expected, without a problem. The reason I distinguish Import and Export from regular File Manipulation is that they are, in most cases, independent of one another. This is especially true when the import/export is to or from another database system. And let's face it, today Import and Export are very sought-after features.
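As a minimal sketch of the file-level checking described above, the Python snippet below re-validates a small settings file while reading it. The "key=value" file format, the EXPECTED table and the load_settings name are all assumptions made up for this illustration; a real application would substitute its own file layout and rules.

```python
# Hypothetical settings schema: key name -> (converter, minimum, maximum).
EXPECTED = {
    "volume": (int, 0, 100),
    "difficulty": (int, 1, 5),
}

def load_settings(lines):
    """Parse 'key=value' lines and return (settings, errors)."""
    settings, errors, seen = {}, [], set()
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are harmless
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in EXPECTED:
            errors.append(f"line {number}: unrecognized entry {line!r}")
            continue
        seen.add(key)
        convert, low, high = EXPECTED[key]
        try:
            parsed = convert(value.strip())
        except ValueError:
            errors.append(f"line {number}: {key} is not a valid {convert.__name__}")
            continue
        if not low <= parsed <= high:
            errors.append(f"line {number}: {key} must be between {low} and {high}")
            continue
        settings[key] = parsed
    # A key that never appears suggests a truncated or corrupted file.
    for key in EXPECTED:
        if key not in seen:
            errors.append(f"missing setting: {key}")
    return settings, errors
```

Returning the list of problems, rather than stopping at the first one, lets the caller decide whether to fall back to defaults or to refuse the file altogether.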
These are the 3 major areas where you'd want to be sure that data validation is properly applied. This will only help
with the rest of the application, because you'll be able to rule out the data as a cause of the errors
that might slip into the application. That is, of course, once the data validation itself is debugged. Making sure that these
checks are implemented during the development process, rather than after the program is done, will help minimize the burden of integrating
data validation into your application. Let's take a look at the types of validation that exist and are commonly used, to give
you an idea of where they can be used as well as how.
THE MANY TYPES OF DATA VALIDATION:
Why different types of data validation? The answer is simple. Essentially, everything depends on the reason why you'd want to do
data validation, its degree of importance (as far as the data available to the application is concerned) and the actual type of the data as
well. Here's a quick example just to put things into focus. Let's say you're making a mortgage calculation program (one that will be used
by professionals all over the continent). Typically, this kind of application would get initial data for the calculation of the
mortgage itself: a principal amount, a length of the loan, an interest rate, a date of first payment and the number of compounding
periods per year. This would be aside from the client's information itself, such as address, telephone number and the like. Once the
loan-related data is entered, the software would offer to generate an amortization table stating, for each payment, the payment amount,
the amount given to the payment of the interest, the amount given towards the payment of the capital itself and some running totals at
each payment period. If the initial data ends up being wrong, can you imagine just how bad the amortization table would end up being
at the end of the calculation? Don't you think that the entry screen should be 100% sure of its data before it goes ahead and performs
the generation of the amortization table? The answer to this question ultimately must be yes, for the sake of the customer and of the
financial institution that will be loaning the money. With this in mind, let's see what we can do to help the software make sure
that the data is valid all the way.
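To make the point concrete, here is a minimal Python sketch of an amortization routine that refuses to run on invalid input. The function name, the parameter names and the specific checks are assumptions made for this illustration, and the sketch simplifies matters by assuming one payment per compounding period and whole numbers for the years and periods.

```python
def amortization_table(principal, annual_rate, years, periods_per_year):
    """Return a list of (period, payment, interest, capital, balance) rows.

    Assumes one payment per compounding period (a simplification
    made for this sketch only).
    """
    # Refuse to calculate anything until the inputs make sense.
    if principal <= 0:
        raise ValueError("principal must be greater than zero")
    if annual_rate < 0:
        raise ValueError("interest rate cannot be negative")
    if years <= 0 or periods_per_year <= 0:
        raise ValueError("loan length and periods per year must be positive")

    count = years * periods_per_year
    rate = annual_rate / periods_per_year  # interest rate per period
    if rate == 0:
        payment = principal / count
    else:
        payment = principal * rate / (1 - (1 + rate) ** -count)

    rows, balance = [], principal
    for period in range(1, count + 1):
        interest = balance * rate      # part of the payment going to interest
        capital = payment - interest   # part going towards the capital
        balance -= capital
        rows.append((period, payment, interest, capital, balance))
    return rows
```

Without the checks at the top, a negative principal or a zero loan length would still produce numbers (or a division error), and every row of the table would inherit the mistake.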
We will be using the example of a mortgage calculation application because the users of this and other directly or indirectly related types of
application, such as long-term rental agencies, banks and real estate businesses, will love the fact that the data must be accurate,
for the reasons mentioned above and a couple more that we'll see later in this document. So let's get right to it. The first thing
we will see is the different types of data validation that can be performed, with a description of each.
- Field Level Validation:
The typical scenario you have here is that you present a screen with a list of values that you need the user to enter. Usually this entails that once the data is entered, it will be saved to a data file or a database. You'll see later that there's more than one thing you can do at the field level (for each value to be entered) to avoid errors caused by human interaction. Some forms will need certain values to be entered before others, so that certain factors can be identified before giving the user field-related help. Not all forms need field level validation, however.
- Form Level Validation:
Same scenario as above, but no field needs to be entered before any other field on the form. This usually means that the user will enter the information and validation is done once, for the whole form, usually just before the information is saved. Most of the time, client information screens and other general information inquiry screens can do fine with a simple form level validation as described here. They just need to make sure that mandatory fields are provided by the user and that the rest of the information is generally of the expected type, or empty (if a field can be considered optional).
- Data Saving Validation:
This type of validation is performed in the routine that does the actual saving of the information to the file or database record. It is usually used in option screens or multiple data entry forms that all need to be filled in before the record is physically saved. For example, an option screen typically has tabs offering more than one page of information that can be set by the user. The user can go to any and all of these tabs and change the values that need to be changed, and the data gets saved once the user presses a "Save" or "Ok" button of some sort.
- Search Criteria Validation:
Some may feel that this type of form (a search form) can do without data validation. But when you think about it, consider the time saved if you could help make sure that results are actually returned and that those results are, to a certain degree, relevant to what the user is looking for. In many cases this might not be important per se, but I like to believe that in other cases this type of validation would be sought after and most definitely appreciated by the users of your application.
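The form level and data saving validations described above could be sketched as follows in Python. The field names ("name", "address", "age"), the function names and the use of a plain list as storage are hypothetical choices for this illustration only.

```python
def validate_client_form(form):
    """Form level validation: check every field once, return all error messages."""
    errors = []
    # Mandatory fields must be present and not blank.
    for field in ("name", "address"):
        if not form.get(field, "").strip():
            errors.append(f"{field} is required")
    # Optional fields may be empty, but if filled in they must be of the expected type.
    age = form.get("age", "").strip()
    if age and not (age.isascii() and age.isdigit()):
        errors.append("age must be a whole number")
    return errors

def save_client(form, storage):
    """Data saving validation: refuse to store a record that fails the checks."""
    errors = validate_client_form(form)
    if errors:
        raise ValueError("; ".join(errors))
    storage.append(dict(form))
```

Collecting all the messages before reporting lets the user fix the whole form in one pass instead of discovering the problems one save attempt at a time.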
Many times people just don't bother implementing data validation at all while they are creating their application, and this usually results in
very bad things happening that really could have been avoided with proper data validation. In a big project, the right data validation should be
established at the screen design phase of development, as this is where you'd usually have time to talk about the role of each form and how important
data validation would be for the given forms. Let's see now what is available in terms of tips and tricks to help eliminate, or at least minimize,
the risk of error in your application.
REMOVING/MINIMIZING HUMAN ERRORS:
As mentioned, there are ways around data validation. Ever heard the expression "prevention is the best policy"? Well, as far as data validation is concerned, this saying definitely applies. The best place to prevent human error is of course at the data entry screen level. From there, giving users the right means of entering their data can, in most cases, eliminate the need for further data validation. Indeed, a good data entry screen will be equipped with all that's needed to validate the data long before it can become dangerous to other parts of the application. Depending on the data type at hand, different types of field-specific validation can be made available for different purposes. Let's see the different techniques, and I'll explain what kind of data they apply to and how to use them effectively.
- Range Validation:
As you might have guessed, this usually applies to numeric values or even dates. It performs a test to make sure that a value entered is within a range of specific values. Note that this could apply to characters as well. For example, say you're making a questionnaire application that offers multiple choice questions: it wouldn't be of much use to accept Z or any other letter if the only choices are A, B, C, D or E. If there is a reason to have a minimum or maximum value, for any data type, then a range validation routine becomes mandatory. As another example, say you have a field that expects a numeric entry. You could code it to accept only the numeric keys, the decimal point and the minus sign, and reject the rest of the keys.
- Lookup Validation:
Typically, this type of validation is done when a value entered needs to be compared to a list of possible values. A good example that relates to our mortgage calculation program is the number of compounding periods per year. Most financial institutions would need something like 1, 2, 3, 4, 6, 12, 24, 26, 52 and 365 (or 366 for leap years). In this case, a lookup validation would take the value entered and compare it against that list of values to report whether it's a valid entry or not. Of course, the best thing you can do is not let the user enter invalid data to begin with. Perhaps a drop-down list offering the choices would be good; this way the user can only select a valid value and therefore no validation per se is needed for this field. Any time you can integrate something that forces the user (so to speak) to enter valid data only, I would encourage you to do so, because you can then be 100% sure about those fields and not have to worry about them in any other part of the application.
- Masked Input Validation:
As a list of examples, a telephone number, a zip code, a social insurance number and a UPC code all have one thing in common: they have a specific input pattern that should be respected when entered and when read, to ensure that the right information is read from the user and written to file. Masked or filtered input is the art of allowing only valid characters to be entered in an otherwise wide open data type, in this case a string. Typically, masked input is present to give users an indication of the type of information required by guiding them through the process of entering the value in a field. Visual aids could also be used. For example, if a date is expected in a given field, perhaps a little popup calendar that lets the user pick the date visually would be good, since it would then be impossible to select an invalid date from a calendar. This is but one example; there are other situations as well. Taking the time to consider these little aids for entering the expected data properly will help your application work the way it should.
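The three techniques above can each be sketched in a few lines of Python. The function names are made up for this illustration, and the telephone mask assumes a North American style number written as 555-123-4567; other formats would need their own pattern.

```python
import re

# The compounding periods listed above; 366 is included for leap years.
COMPOUNDING_PERIODS = {1, 2, 3, 4, 6, 12, 24, 26, 52, 365, 366}

# Hypothetical mask for a telephone number written as 555-123-4567.
PHONE_MASK = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")

def in_range(value, low, high):
    """Range validation: works for numbers, dates or single characters."""
    return low <= value <= high

def is_valid_compounding(periods):
    """Lookup validation: the value must appear in the set of allowed values."""
    return periods in COMPOUNDING_PERIODS

def matches_phone_mask(text):
    """Masked input validation: the whole string must fit the pattern."""
    return PHONE_MASK.fullmatch(text) is not None
```

For instance, in_range("C", "A", "E") accepts a multiple choice answer while in_range("Z", "A", "E") rejects it, and matches_phone_mask("5551234567") fails because the separators required by the mask are missing.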
The best thing I can say at this point is to use common sense when evaluating the need for data validation. Typically it's not worth implementing data validation on small projects that you make
for yourself: since they're your creations, chances are you'll know what data you need to enter and you can deal with the consequences of wrong data. For a game (depending on how widespread the
game is) you might want to consider it for the sake of not having to answer the repetitive questions that a lack of data validation might bring you. For business applications and commercial applications of
any type, you'd usually want to be as sure as possible about data validation. There's no room to mess around in the commercial application industry, and if you can't be sure of what you are acquiring
from the user, how can you guarantee the results for the businesses that will be using it? Inserting data validation where appropriate lifts your application's overall quality to a higher level, and
in today's computer world any level higher is an excellent thing. It gives your application the edge it needs to make itself worthy of consideration by those who will need an application like yours.
Many developers today already have libraries, in many languages, to take care of at least part of the data validation process. Some even offer them for use in your own applications, so you can find
them and download them for your own needs if you don't want to code them yourself. As for me, I like to use data validation in any project, even those I do for myself.
I try to put myself in the user's shoes, think of how I'd like to see things done and then make it happen. Sure, when you're creating the application (even one for yourself) you might think you'll always
know what goes where, but what if life takes you away from that application for a couple of years and you come back to it then? Would you remember everything that goes everywhere? The moral of the story
is that data validation can even protect you from yourself.
THE FINAL WORD:
And there you have it. Another aspect of the development process explained clearly, I hope. As a final word, I can say that data validation, in many cases, has helped me even in ways I didn't think it could.
I think we've all heard of the KISS development method (Keep It Simple, Stupid), and data validation is one of the best tools you have to help keep things simple for the potential users of your application.
Of course, the simpler you make it for your users, the harder (and consequently the longer) it usually is for you to create the solution. That goes without saying, but the rewards of such an effort are greater than
the effort you put into the data validation process. Users will like your application, for one thing. But they will also tend to use your application correctly, which results in fewer support phone calls and
emails, and that should always be your ultimate goal: to make something that won't cause you to have to stay up all night answering questions about your application.
Like everything else I write, I can only hope I was clear enough. If I wasn't I would like to know, that's how one betters himself after all. So if there's a part of this article that just doesn't seem clear to you,
email me, I'll see what I can do to make it as clear as it can be and that will be added/altered in this document to help make it as clear as possible. Happy coding.
Stéphane Richard
srichard@adaworld.com