Monday, December 8, 2008

What is a Primary Key?

If I have a small company, any individual employee can be distinguished by a combination of First Name, Middle Initial, and Last Name. This combination uniquely identifies each employee.

If I have a larger company, the chances increase that I could have two employees that have the same name. So the name can no longer be used to uniquely identify a record. So instead, I could use a number like Social Security Number.

A primary key is a special kind of index (see What is an Index?) that is composed of a field (SSN) or combination of fields (First/Middle/Lastname), which uniquely identify a record.
A primary key has a number of useful properties.

First, the value in the primary key cannot be duplicated. If it is a single field, that value cannot be repeated. If it is composed of multiple fields, that combination of values cannot be duplicated. So in the samples above, having a primary key would mean that I could not put two "Roger J Carlson"s in my database.

Secondly, the primary key cannot be NULL and no portion of the primary key (in the case of a multiple field key) can be NULL. (See What does NULL mean?). The NULL means the value of the field is unknown. Obviously, if we don't know the value of the fields, we can't guarantee the value is unique. So disallowing NULLS guarantees we have a valid value in the key fields.

Thirdly, in order to create relationships, there must be a unique index on the field. Since by definition, a primary key is a unique index, a primary key makes an ideal join field. I will discuss this in a later post.

There are two basic types of primary keys: Natural Keys and Surrogate Keys.

Natural keys

A natural key is one composed of a field or fields that already exist in the table. In my examples above, both Social Security Number and Firstname/Middle/Lastname are natural keys. Natural keys can be composed of a single field or multiple fields.

It is important to note that a table can only have a *single* primary key. It is incorrect to say that a table has multiple primary keys. In the case of a multi-field primary key, it has a single primary key composed of multiple fields.

Surrogate keys

A surrogate key is an artificially created number. It has no real-world meaning, and is used mostly in relationships with other tables. In Access, you use the Autonumber datatype to create a surrogate key. This number is system-created and is guaranteed to be unique.

The disadvantage of a surrogate key is that it does not have real-world uniqueness. It would be possible to enter two records for Roger J Carlson, each with a different system-created number. To protect against that, you should also create a unique index on those fields that would otherwise create a natural key.

The advantage of a surrogate key is that it will never be affected by real-world changes to the database. It is also much more efficient to join tables on a single number than on multiple text fields.

Opinions differ, but I prefer a surrogate primary key with a separate unique index. This separates the functions of the primary key: the surrogate key for relationships and the unique index to control real-world uniqueness. In this way, if the conditions that effect the real-world uniqueness (as in the case of moving from a small business to a large business mentioned above), the table relationships will not be disturbed.

Creating a primary key in the Access User Interface is easy. Just open the table in Design View. For a surrogate or single field primary key, select a single record and click the Primary Key button.

For a multi-field primary key, just hold the Control [Crtl] key as you select the fields and then click the primary key button.


Josh Croft said...

"To protect against that, you should also create a unique index on those fields that would otherwise create a natural key."

Hi Roger! I'm loving going through your very helpful 'What is Normalization' document and I arrived here.

Can you explain that quote a bit more? (I took it from the surrogate key text)

Thanks a ton!

PS: I found my way to your document from the sidebar of a subreddit.. it's highly regarded there (or so I gather)

Roger Carlson said...

Okay. The purpose of a primary key is to create a unique record. But uniqueness from a database engine perspective is not necessarily the same as uniqueness from a real-world perspecive.

Natural keys perform both these functions.

However, surrogate keys only perform one of them, i.e. identifying records for the database engine to join. It does not, however, help the database user very much to identify unique entities.

For instance, suppose I a customer (Roger J Carlson) to a customer table and the surogate key gets assigned a value of 233. Now, another data entry person also enters the same customer. This copy of Roger J Carlson is assigned a value of 245.

Both Roger J Carlson's refer to the same real-world customer. Which one is right? Impossible to know.

So, while the record is unique from the database engine perspective, it's not from a real-world perspective.

To guard against that, when you use a surrogate key, you also need to create a unique index on a field or combination of fields that will uniquely identify the real-world Roger J Carlson.

Once Roger J Carlson has been entered, it will there after be impossible to enter another Roger J Carlson.

Obviously, in a large database, such a unique key wouldn't work. However, I could use Name, Gender, Birthdate (or some other unique combination of real-world attributes).

Hope this helps.

riazameen said...

A separate field can be added in the table if no existing field produces unique values for every record. MS Access asks the user to create this type of field if the user does not specify primary key while saving the table. MS Access uses the field name ID and data type AutoNumber for this field.