Working with Surrogate Keys

Introduction

A lot of developers still confuse keys with the uniqueness of data, because they were taught at school in a confusing way: they use natural keys and, of course, use those same keys to ensure uniqueness at the same time.

When you use surrogate keys, you ensure that every row of data will be uniquely identified and can have a correspondence in another related table.

The uniqueness of the real data will be ensured in another way. We are going to see how in this article.
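
To illustrate that correspondence, here is a minimal sketch (the table names are just examples, unrelated to the table we will build below): a child table simply stores the surrogate key of its parent row and protects it with a foreign key.

-- Minimal sketch: the child table references the surrogate key of the parent table
CREATE TABLE dbo.departments (
    IDdepartments int IDENTITY(1,1) NOT NULL,
    DepartmentName varchar(50) NULL,
    CONSTRAINT PK_departments PRIMARY KEY CLUSTERED (IDdepartments)
)

CREATE TABLE dbo.staff (
    IDstaff int IDENTITY(1,1) NOT NULL,
    IDdepartments int NOT NULL,   -- surrogate key of the related parent row
    CONSTRAINT PK_staff PRIMARY KEY CLUSTERED (IDstaff),
    CONSTRAINT FK_staff_departments FOREIGN KEY (IDdepartments)
        REFERENCES dbo.departments (IDdepartments)
)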

In a relational database, the best way to manage this uniqueness for a key is to generate a number that is automatically incremented from the most recent number already generated.

In some databases you used to have to manage this yourself (with triggers, for example), but SQL Server provides a built-in mechanism for it: the IDENTITY property.

Creation of the Parent table

CREATE TABLE [dbo].[employees](
    [IDemployees] [int] IDENTITY(1,1) NOT NULL,
    [LastName] [char](30) NULL,
    [FirstName] [char](30) NULL,
    CONSTRAINT [PK_employees] PRIMARY KEY CLUSTERED
    (
        [IDemployees] ASC
    )
)

Now that the table is created, you can start inserting some data:

INSERT INTO dbo.employees
(
    LastName
    , FirstName
)
VALUES
(
    'Patulacci',
    'Marc'
)


You can now have a look at the inserted data:

SELECT * FROM dbo.employees


Here is the result:

IDemployees        LastName              FirstName
1                  Patulacci             Marc

We can observe that an int value of 1 was generated, as specified by IDENTITY(1,1).

Let’s insert another row into our new table:

INSERT INTO dbo.employees
(
    LastName
    , FirstName
)
VALUES
(
    'Robichais',
    'Bob'
)

Here is the result:

IDemployees        LastName              FirstName
1                  Patulacci             Marc
2                  Robichais             Bob

We can observe that the primary key (IDemployees) has been incremented by 1, again as specified by IDENTITY(1,1).

We can now be sure that each row in the table is identified by a unique value.

We can also be sure that nothing can change this value, because we let the database manage its own counter (initialization and incrementing).
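
As a side note, if your application needs the value the database has just generated, one possible way is to read it back right after the insert with SCOPE_IDENTITY(). A minimal sketch (the employee name is just an example):

-- Insert a row and read back the surrogate key generated by IDENTITY
INSERT INTO dbo.employees (LastName, FirstName)
VALUES ('Dupont', 'Alice')

SELECT SCOPE_IDENTITY() AS NewIDemployees   -- last identity value created in this scope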


A big mistake is trying to manage this uniqueness in application code instead, by generating the counter values on the code side and forcing the insertion of those values into the database (I have seen that in numerous big projects…).

How to implement uniqueness in data

Now that we have defined a primary key for our table, we want to be sure that a given combination of column values is unique.

For example, here we do not want an employee’s name to be inserted twice.

Furthermore, we would like the database to manage this part of the work (I don’t want to run a SELECT on my table just to detect whether the information I am about to insert already exists, and I want to avoid writing the code to handle the consequences).

If the database can do that for us, we can:

  • Save a lot of time by writing less code
  • Save a lot of time at execution, because we avoid at least one SELECT before each insertion (see the sketch after this list)
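
For comparison, here is the kind of check-then-insert code the constraint lets us avoid (a sketch only; the values are just examples, and a database-enforced constraint is also safer when several sessions insert at the same time):

-- The pattern the UNIQUE constraint makes unnecessary: an extra SELECT before every insert
IF NOT EXISTS (
    SELECT 1
    FROM dbo.employees
    WHERE LastName = 'Robichais'
      AND FirstName = 'Bob'
)
BEGIN
    INSERT INTO dbo.employees (LastName, FirstName)
    VALUES ('Robichais', 'Bob')
END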


Here is the script that can help us:


ALTER TABLE dbo.employees
ADD CONSTRAINT [unique_names]
UNIQUE NONCLUSTERED
(
    [LastName]
    ,[FirstName]
)

Note that, behind the scenes, we have created an index with the unique property (that index is what enforces the constraint).
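
You can verify this by querying the system catalog (a quick check, assuming you are connected to the same database):

-- The UNIQUE constraint is backed by a nonclustered, unique index
SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.employees')
  AND name = 'unique_names'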

Now we should be able to insert more rows into our employees table, except if the combination of values covered by the uniqueness constraint already exists (two employees with the same name).


INSERT INTO dbo.employees
(
    LastName
    , FirstName
)
VALUES
(
    'Robichais',
    'Bob'
)

now returns the following error message:

Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'unique_names'. Cannot insert duplicate key in object 'dbo.employees'.
The statement has been terminated.

On the code side, we just have to trap the error and display an appropriate error message. You can even do that on the server and send back the complete message, already built, via an OUTPUT parameter.
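
Here is a rough sketch of that server-side idea (the procedure name and message texts are just examples; without TRY / CATCH the error is still reported to the caller, and @@ERROR only lets the procedure react to it):

-- Sketch: the server builds the complete message and returns it via an OUTPUT parameter
CREATE PROCEDURE dbo.usp_insert_employee
    @LastName  char(30),
    @FirstName char(30),
    @Message   varchar(200) OUTPUT
AS
BEGIN
    INSERT INTO dbo.employees (LastName, FirstName)
    VALUES (@LastName, @FirstName)

    IF @@ERROR = 2627   -- the UNIQUE constraint 'unique_names' was violated
        SET @Message = 'This employee already exists.'
    ELSE
        SET @Message = 'Employee ' + RTRIM(@FirstName) + ' ' + RTRIM(@LastName) + ' inserted.'
END

The caller then retrieves the message like this:

DECLARE @msg varchar(200)
EXEC dbo.usp_insert_employee @LastName = 'Robichais', @FirstName = 'Bob', @Message = @msg OUTPUT
PRINT @msg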

The other technique consists in wrapping the statements in TRY / CATCH blocks in order to handle exceptions and errors easily.
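
A minimal sketch of that approach (the messages are just examples):

-- Wrap the insert in TRY / CATCH so a duplicate is handled as a normal code path
BEGIN TRY
    INSERT INTO dbo.employees (LastName, FirstName)
    VALUES ('Robichais', 'Bob')
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 2627          -- violation of the UNIQUE constraint
        PRINT 'This employee already exists.'
    ELSE
        PRINT 'Unexpected error: ' + ERROR_MESSAGE()
END CATCH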