SQL Server Surrogate Key Generation


Apr 20, 2006 Every RDBMS of which I'm aware offers a feature that makes surrogate keys easier by automatically generating the next larger value upon insert. In SQL Server, it's called an IDENTITY column. In MySQL, it's called AUTO_INCREMENT. It's possible to generate the value yourself in SQL, but it's easier and generally safer to let the RDBMS do it instead.

Posted Feb 28, 2011

By Gregory A. Larsen

In my last article I talked about the difference between surrogate keys and natural keys. In that article I discussed how surrogate keys are made-up keys, meaning they do not appear naturally in the data. In this article I will be showing you how to generate those surrogate keys using an identity column. I will be exploring what an identity column is, how to define an identity column, and the different methods of populating an identity column.

What is an identity column?

An identity column is a single column in a table that has its identity column property set. A table doesn't need to have an identity column. When a table has an identity column, that column is automatically populated with an integer value every time a new row is added to the table; more on this in a minute. The value of an identity column is based on a seed and increment value that is associated with the identity column; more detail on this further down in this article.

An identity column property can only be set on columns that are declared as decimal, int, numeric, smallint, bigint, or tinyint. If the identity property is associated with a numeric or decimal column, the scale must be set to 0. When you set the identity property, there are two components of that property: seed and increment. Additionally, the column must be defined to not allow NULL values to be inserted into it. You can set up an identity column when you declare a table, or you can set up an identity column on an existing table column by altering the column properties.

Creating a table with an identity column

When you create a table you can define the identity column. You can also add an identity column to a pre-existing table; more on that later. To define an identity column when you create a table you just need to set the IDENTITY property on the CREATE TABLE statement. Here is an example:
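The statement below is a minimal sketch of such a CREATE TABLE, not a verbatim listing. The table and column names (dbo.MyTable, ID, MyShortDesc, MyLongDesc) come from this article; the data types and lengths of the two description columns are assumptions made for illustration.

```sql
-- Sketch only: the description column types and lengths are assumed.
CREATE TABLE dbo.MyTable
(
    ID          int IDENTITY(1,1) NOT NULL,  -- seed = 1, increment = 1
    MyShortDesc varchar(20)       NOT NULL,
    MyLongDesc  varchar(200)      NOT NULL
);
```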

Above I created a column called 'ID' that is my IDENTITY column. Note that I specified 'IDENTITY(1,1)'. The '1,1' notation specifies the 'seed' and 'increment' values for the identity column. The 'seed' value is assigned to the ID column of the first row inserted into the table. The 'increment' value is used to populate the identity column for each additional row added to the table, by adding it to the identity column value of the previously inserted row. The 'seed' and 'increment' values must be integers; both positive and negative values are allowed. In my example above I said I wanted my first row inserted to have an identity column value of 1. The second inserted row would have an identity column value of 2, and so on.

You can also create an identity column when creating a table using a SELECT statement with an INTO table clause. To do this you use the IDENTITY function. The IDENTITY function has the following syntax:
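For reference, the general form of the function in T-SQL is shown below. This is syntax notation rather than a runnable statement; the square brackets mark the optional seed and increment arguments.

```sql
IDENTITY ( data_type [ , seed , increment ] ) AS column_name
```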

Here data_type is one of the valid identity column data types listed above, seed is the identity column value for the first row added, increment is an integer value that is added to the identity column value of the prior inserted row, and column_name is the name of the IDENTITY column to be created.

Here is an example of how to create a new table that has an identity column using a SELECT ... INTO method:
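A minimal sketch of such a statement follows. The target table name MyTableNew comes from this article; using dbo.MyTable as the source and naming the new identity column ID are assumptions. The source table's own ID column is deliberately left out of the select list, because a selected identity column would carry its identity property into the new table and conflict with the IDENTITY function.

```sql
-- Sketch only: the source table and the new column name are assumptions.
SELECT IDENTITY(int, 1, 1) AS ID,   -- integer identity, seed 1, increment 1
       MyShortDesc,
       MyLongDesc
INTO   dbo.MyTableNew
FROM   dbo.MyTable;
```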

Here I am using the SELECT...INTO syntax to create the table MyTableNew. To define my identity column I used the IDENTITY function to define an integer column where the identity properties have a seed value of 1 and an increment value of 1.

Altering an existing table to have an identity column

Occasionally you might find you need to add the identity property to an existing column in an existing table, or add a new identity column to an existing table. Let me explore how to do this, and the issues you might run into.

First let's talk about altering a table to add an identity column to an existing table. By adding an identity column, I mean adding a brand new column to a table. To do that you need to alter the table definitions. Let's assume I have the following table definition:
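A table definition matching the description in this section might look like the sketch below. The column names CountyCode, ReferenceID, and CountyName come from the article; the table name dbo.County is inferred from the later reference to dbo.CountyOld, and the data types are assumptions.

```sql
-- Sketch only: data types and lengths are assumed.
CREATE TABLE dbo.County
(
    CountyCode  char(3)     NOT NULL,  -- abbreviation of the county name
    ReferenceID int         NOT NULL,  -- manually maintained row number
    CountyName  varchar(50) NOT NULL   -- spelled-out county name
);
```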

For this example, assume that this table already has 39 rows, where CountyCode contains an abbreviation of the county name that uniquely identifies each row, ReferenceID is basically a row number that is manually populated, and CountyName is the spelled-out name of the county. Say I decided I wanted to put a surrogate key column on this table that is an INT and populate it using the IDENTITY property. To do that I would just need to run the following ALTER TABLE statement:
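A minimal sketch of that ALTER TABLE statement, assuming the table is named dbo.County and that a seed and increment of 1 are wanted (the article names only the CountyID column and its INT type):

```sql
-- Adds a brand new column; SQL Server numbers the existing rows itself.
ALTER TABLE dbo.County
    ADD CountyID int IDENTITY(1,1) NOT NULL;
```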

Upon executing this ALTER statement, SQL Server will first alter the table adding the CountyID column. Then once the column is added SQL Server will number all the existing rows automatically based on the identity property.

Assume I want to set the identity property of my existing ReferenceID which has already been populated manually with a row number. There is no simple one statement method to accomplish this. Instead I have to jump through a number of hoops to do this.

Assume my original table above looked like this:

Here I have 39 existing records in this table, and each row has a unique reference number that has been set manually. Assume for now there are no constraints on this table. In order to make ReferenceID my identity column, I would first need to rename the table to something like dbo.CountyOld. Then I could create my new County table using the following code, which sets ReferenceID as an identity column:
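The two steps could be sketched as follows. The rename uses the sp_rename system stored procedure; the column data types and the seed and increment values are assumptions, as in the earlier description of this table.

```sql
-- Step 1: move the existing table out of the way.
EXEC sp_rename 'dbo.County', 'CountyOld';

-- Step 2: recreate the table with ReferenceID as the identity column.
-- Sketch only: data types, seed, and increment are assumed.
CREATE TABLE dbo.County
(
    CountyCode  char(3)           NOT NULL,
    ReferenceID int IDENTITY(1,1) NOT NULL,
    CountyName  varchar(50)       NOT NULL
);
```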

After this I would set IDENTITY_INSERT ON (more on this in the next section) for this table. Then run the following code:
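A minimal sketch of the copy step, assuming the table and column names used so far in this section. The explicit column list is not optional here: SQL Server requires it whenever a value is supplied for an identity column.

```sql
-- Allow explicit values in the identity column of the new table.
SET IDENTITY_INSERT dbo.County ON;

-- Copy every row, keeping the manually assigned ReferenceID values.
INSERT INTO dbo.County (CountyCode, ReferenceID, CountyName)
SELECT CountyCode, ReferenceID, CountyName
FROM   dbo.CountyOld;
```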

After the INSERT statement was done running, I would turn the IDENTITY_INSERT OFF for this table, and then drop the dbo.CountyOld table. If I had constraints on my table I would have to take the necessary actions to drop and recreate those constraints.

Alternatively, I can use the 'Design' feature of a table in SQL Server Management Studio to set the identity properties on an existing table. Behind the scenes, SQL Server Management Studio performs steps similar to those I described above.

Inserting records into a table that has an identity column

When you have a table with an identity column there are things you need to think about when inserting records into these tables. Let me go through a couple of INSERT statements to describe how inserting records is done.

First, let me talk about how to insert records where the identity column is populated automatically using the identity properties. Remember the table dbo.MyTable that I created above. It had three columns (ID, MyShortDesc, and MyLongDesc), and the identity property was set on the ID column. This is the table I will be using for my example, and here is an INSERT statement that adds a new row to this table:
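Such an INSERT could be sketched as follows; the inserted values are placeholders chosen for illustration.

```sql
-- The identity column ID is omitted; SQL Server generates its value.
INSERT INTO dbo.MyTable (MyShortDesc, MyLongDesc)
VALUES ('Short', 'A longer description of the first row');
```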

In this example I specified the column names I was populating with values in dbo.MyTable by placing those columns inside parentheses immediately following dbo.MyTable. Note how I didn't specify the identity column ID. I didn't have to include this column in my INSERT statement because it will automatically be populated using the identity property setting associated with this column. Another way to write this insert statement is like this:
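A sketch of the shorter form, again with placeholder values. The values are matched in order to the non-identity columns of the table.

```sql
-- No column list: values map to MyShortDesc and MyLongDesc in order.
INSERT INTO dbo.MyTable
VALUES ('Short 2', 'A longer description of the second row');
```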

Here I left off the column names following the table name dbo.MyTable. I was able to do that because SQL Server knows the only other column on this table is the identity column, and it knows how to populate the value for that column.


What if I wanted to set the identity column value myself on the INSERT statement? How is this done? As it turns out this isn't as simple as one might think. I CANNOT just execute this code:
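A sketch of the kind of statement that fails. The value 12 matches the one used later in this article; the description values are placeholders.

```sql
-- Fails while IDENTITY_INSERT is OFF for dbo.MyTable (the default).
INSERT INTO dbo.MyTable (ID, MyShortDesc, MyLongDesc)
VALUES (12, 'Short 12', 'A row with a hand-picked identity value');
```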

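If I try to run an INSERT statement similar to this, where I try to specify a value for the identity column, I would get an error like this one (SQL Server error 544): Cannot insert explicit value for identity column in table 'MyTable' when IDENTITY_INSERT is set to OFF.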

This error message tells me I need to set the IDENTITY_INSERT value to ON if I want to explicitly set the identity value. Let's try this again and set the IDENTITY_INSERT value to ON by using this code:
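A minimal sketch of the working version, with placeholder description values:

```sql
-- Permit explicit identity values for this one table in this session.
SET IDENTITY_INSERT dbo.MyTable ON;

-- The column list, including ID, must be given explicitly.
INSERT INTO dbo.MyTable (ID, MyShortDesc, MyLongDesc)
VALUES (12, 'Short 12', 'A row with a hand-picked identity value');
```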

By using the SET statement to set the IDENTITY_INSERT option to ON, I am allowed to set the identity column ID to the value 12. Keep in mind that you can have IDENTITY_INSERT turned on for only one table at a time in a session. Also, when you have IDENTITY_INSERT on you are able to insert multiple rows with the same identity column value, provided you don't have a constraint that restricts duplicate values in your identity column. You can also insert rows that have an identity column value greater than the last identity column value created. This will leave holes in your identity column values and will also update the current identity value that SQL Server keeps to determine the next identity value. Once you are done inserting rows where you set the identity column value yourself, you should turn off the IDENTITY_INSERT option by running the following command:
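For the table used in this article, that command would be:

```sql
SET IDENTITY_INSERT dbo.MyTable OFF;
```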

Deleting records from a table that has an identity column

You might be wondering what happens with identity column values when you delete a record in a table that has an identity column. When rows are deleted, the identity values are not reused. Therefore, over time you will have gaps in your identity column values based on the records that have been deleted. If this is a problem for your situation, you might consider using a trigger to populate a sequential number column instead of using an identity column.


Value of using an identity column

Identity columns make it easy to have surrogate key columns that are automatically populated. Having a column be populated by the identity property also makes it easy to create unique identity column values for each row. Next time you want a surrogate key when you design a table, consider creating the key as an identity column.





Primary Key Generation Using Oracle's Sequence

Oracle provides the sequence utility to automatically generate unique primary keys. To use this utility to auto-generate primary keys for a CMP entity bean, you must create a sequence table and use the @AutomaticKeyGeneration annotation to point to this table.

In your Oracle database, you must create a sequence table that will create the primary keys, as shown in the following example:
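A minimal sketch of such a statement in Oracle SQL. The sequence name myOracleSequence is hypothetical, and NOCACHE is an assumption; pick the caching option that suits your deployment.

```sql
-- Sketch only: the sequence name is hypothetical.
CREATE SEQUENCE myOracleSequence
    START WITH 1
    NOCACHE;
```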

This creates a sequence of primary key values, starting with 1, followed by 2, 3, and so forth. The sequence table in the example uses the default increment of 1, but you can change this by specifying the increment keyword, such as increment by 3. When you do the latter, you must specify the exact same value in the cacheSize attribute of the @AutomaticKeyGeneration annotation:

If you have specified automatic table creation in the CMP bean's project settings, the sequence table will be created automatically when the entity bean is deployed. For more information, see @JarSettings Annotation. For more information on the definition of a CMP entity bean, see below.

Primary Key Generation Using SQL Server's IDENTITY

In SQL Server you can use the IDENTITY keyword to indicate that a primary-key needs to be auto-generated. The following example shows a common scenario where the first primary key value is 1, and the increment is 1:
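A sketch of that scenario follows. The table and column names are hypothetical and chosen only for illustration.

```sql
-- Sketch only: dbo.Customer and its columns are hypothetical.
CREATE TABLE dbo.Customer
(
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,  -- first key 1, increment 1
    CustomerName varchar(100) NOT NULL
);
```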

In the CMP entity bean definition you need to specify SQLServer(2000) as the type of automatic key generator you are using. You can also provide a cache size:

If you have specified automatic table creation in the CMP bean's project settings, the sequence table will be created automatically when the entity bean is deployed. For more information, see @JarSettings Annotation. For more information on the definition of a CMP entity bean, see below.

Primary Key Generation Using a Named Sequence Table

A named sequence table is similar to the Oracle sequence functionality in that a dedicated table is used to generate primary keys. However, the named sequence table approach is vendor-neutral. To auto-generate primary keys this way, create a named sequence table using the two SQL statements shown in the example:
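The two statements might look like the sketch below. The table name MY_SEQUENCE_TABLE is hypothetical, and both the single integer column named SEQUENCE and the starting value of 0 are assumptions about what the EJB container expects; check them against your container's documentation before relying on them.

```sql
-- Sketch only: table name, column name, and starting value are assumptions.
CREATE TABLE MY_SEQUENCE_TABLE (SEQUENCE int);

-- Seed the table with the value from which key generation starts.
INSERT INTO MY_SEQUENCE_TABLE VALUES (0);
```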

In the CMP entity bean definition you need to specify the named sequence table as the type of automatic key generator you are using. You can also provide a cache size:

If you have specified automatic table creation in the CMP bean's project settings, the sequence table will be created automatically when the entity bean is deployed. For more information, see @JarSettings Annotation. For more information on the definition of a CMP entity bean, see the next section.

Note. When you specify a cacheSize value for a named sequence table, a series of unique values is reserved for entity bean creation. When a new cache is necessary, a second series of unique values is reserved, under the assumption that the first series of unique values was entirely used. This guarantees that primary key values are always unique, although it leaves open the possibility that primary key values are not necessarily sequential. For instance, when the first series of values is 10-20, the second series of values is 21-30, even if not all values in the first series were actually used to create entity beans.

Defining the CMP Entity Bean

When defining a CMP entity bean that uses one of the primary key generators, you use the @AutomaticKeyGeneration annotation to point to the name of the primary key generator table to obtain primary keys. Also, you must define a primary key field of type Integer or Long to set and get the auto-generated primary key. However, the ejbCreate method does not take a primary key value as an argument. Instead the EJB container adds the correct primary key to the entity bean record.

The following example shows what the entity bean might look like. Notice that the bean uses the named sequence option described above, and that the ejbCreate method does not take a primary key:
