After having a good discussion on “Table Relationships” in previous chapter, we are going to discuss Data Integrity in this chapter. First, we would discuss the term Data Integrity, followed by a discussion on data integrity types, and finally, we would go through an image which summarizes the data integrity concepts in one sight view. Lets start the discussion with the definition first.
Data Integrity
In relational database management systems, data integrity ensures the accuracy, reliability and consistency of the data during any operation like as data manipulation, data retrieval, data storage, backup or restore operation etc. It also guarantees that the recording of data is accurate and as intended. Having any bad or unintended data in the database, is a failure of data integrity. For example, having sales detail in Sales table of a product not available in Product master is a failure of referential integrity.
Below are few scenarios where we might require data integrity to accept and maintain valid, accurate and intended data in the database:
- We don’t have a NULL value as a product in Product master table. (Entity integrity)
- To ensure that we don’t have duplicate products in Product Master table. (Entity integrity)
- We don’t have any alphanumeric value in Amount column of Sales table. (Domain integrity)
- To ensure that Sales table does not have any product which is not available in Product master table. (Referential integrity)
- Deletion of any product from Product master table is not allowed if it has a reference in other tables, such as, Sales table. (Referential integrity)
- Sale of a product is not allowed if its not available in stock. (User defined integrity or custom integrity)
In above examples, we can see that data integrity ensures the accuracy and consistency of data during data storage and manipulation operations. It defines, what valid values and changes on these values, are allowed in a column of a table. It also maintains the accurate and consistent related data throughout multiple related tables.
Lets discuss each type of data integrity in brief below:
1. Entity Integrity
Entity integrity talks about the primary key of a table. It states that a table must have a defined primary key to ensure unique and not null values in primary key column(s) of the table. Entity integrity rule on a column does not allow duplicate and null values in the column(s). The primary key column must identify each row uniquely in the table.
For example, in a student table, student identification number must be unique and not null to uniquely identify each student and his / her details.
2. Domain Integrity
Domain integrity defines the type, range and format of data allowed in a column. Domain integrity states that all values in a column must be of same type; i.e., an integer type column can only store integer values and not character data. It also defines that the range of the accepted values; e.g., a four byte integer type column can accept values between -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647) including boundary values (SQL Server 2014). It also defines the nullability of the column; i.e., does null values allowed in the column or not.
We define domain integrity on a table in column definitions. Apart from column definitions, we can also use constraints, stored procedures, functions and triggers to implement domain integrity on tables.
3. Referential Integrity
Referential integrity talks about the foreign key concepts and ensures that the child tables do not have any orphaned record. Referential integrity ensures that data in related tables remains accurate and consistent before and after changes. In detail, if we have an Employee table with Employee_ID column having a reference in Salary table, to delete an employee from Employee table, we must have to delete all the records of that employee from Salary table and all other related tables first. By enabling referential integrity on Salary table, we ensure that we don’t have any orphaned record in Salary table which would mislead the paid salary amount; i.e., we will not be able to trace that who got the sum.
In case we implement referential integrity on a table, we cannot perform few operations such as:
- Cannot use TRUNCATE command on a table being referenced by other table(s).
- Cannot insert a record in child table when there is no related record in parent table.
- Cannot delete a record from parent table when there is a reference in any other table.
- Cannot update a record in child table which results orphaned records in child table.
- Cannot update records in parent table which causes orphaned records in child table.
4. User Defined Integrity
Finally, if there is some custom business requirement which does not fit in any of the above discussed integrity types, we can define our own integrity constraint for a column with column definition or using functions or triggers. Creating such kind of custom constraints is known as User Defined Integrity or Custom integrity.
Below image summarizes the data integrity concepts in one view:
Data integrity is used to ensure accurate, reliable and consistent data throughout the database. To avoid invalid data, we must apply appropriate data integrity om columns.
In next chapter, we are going to understand the “Types of keys” in SQL Server.