UPDATE/ADD/ALTER Column and Table in SQL


The following SQL statement updates the first customer (CustomerID = 1) with a new contact person and a new city.


UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;

UPDATE Multiple Records

The WHERE clause determines how many records will be updated.

The following SQL statement updates ContactName to “Juan” for all records where Country is “Mexico”:


UPDATE Customers
SET ContactName='Juan'
WHERE Country='Mexico';

Update Warning!

Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!


UPDATE Customers
SET ContactName='Juan';
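To see the difference the WHERE clause makes, here is a minimal sketch that runs both statements against a throwaway in-memory SQLite database from Python. The three sample rows are made up for illustration, and SQLite stands in for whichever database you actually use.

```python
import sqlite3

# In-memory SQLite database standing in for the Customers table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (ContactName, Country) VALUES (?, ?)",
    [("Maria Anders", "Germany"), ("Ana Trujillo", "Mexico"), ("Antonio Moreno", "Mexico")],
)

# With a WHERE clause, only the matching rows are updated.
with_where = conn.execute(
    "UPDATE Customers SET ContactName = 'Juan' WHERE Country = 'Mexico'"
).rowcount

# Without a WHERE clause, every row in the table is updated.
without_where = conn.execute("UPDATE Customers SET ContactName = 'Juan'").rowcount

print(with_where, without_where)
```

A cautious habit is to run the same WHERE condition in a SELECT COUNT(*) first, so you know how many rows the UPDATE is about to touch.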


To add a column in a table, use the following syntax:

ALTER TABLE table_name
ADD column_name datatype;


To change the data type of a column in a table, use the following syntax:

SQL Server / MS Access:

ALTER TABLE table_name
ALTER COLUMN column_name datatype;


DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste';

DELETE FROM table_name
WHERE condition;

Delete All Records

DELETE FROM table_name;


To delete a column in a table, use the following syntax (notice that some database systems don’t allow deleting a column):

ALTER TABLE table_name
DROP COLUMN column_name;

SQL Interview Questions

DECLARE @first INT = 1, @step INT = 1, @last INT = 1000;

WHILE (@first <= @last)
BEGIN
    INSERT INTO TEST_NUMBER VALUES (@first);
    SET @first += @step;
END


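SELECT TOP (1000) [IncrNum]
FROM TEST_NUMBER;

-- Numbers from 1 to 1000
SELECT [IncrNum] FROM TEST_NUMBER
WHERE [IncrNum] <= 1000;

-- Odd numbers from 1 to 1000
SELECT [IncrNum] FROM TEST_NUMBER
WHERE [IncrNum] <= 1000
AND ([IncrNum] % 2 <> 0);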

SSIS to SQL Server Data Type Translations

It can be extremely confusing when you first encounter SSIS data types.  At first glance, they seem to be nothing like SQL Server data types you love and know.  That’s why I’ve provided below a conversion chart of SSIS data types to SQL Server data types.  This information is readily available on MSDN but it always seems difficult to find.  Hope this helps!



SQL – Queries Tuning and Optimization Techniques

In SQL, it is very difficult to write complex SQL queries involving joins across many (at least 3-4) tables and involving several nested conditions because a SQL statement, once it reaches a certain level of complexity, is basically a little program in and of itself.

A database index is a data structure that improves the speed of operations on a database table. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. Indexing is incredibly important when working with large tables; occasionally, smaller tables should also be indexed if they are expected to grow.
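As a rough illustration of what an index changes, the sketch below asks SQLite (via Python) for its query plan before and after an index is created. The table contents and the index name idx_customers_country are made up for the example, and SQL Server's execution plans look different, but the scan-versus-seek idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (CustomerName, Country) VALUES (?, ?)",
    [(f"Customer {i}", "Mexico" if i % 10 == 0 else "Germany") for i in range(1000)],
)

def plan(sql, params=()):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

query = "SELECT CustomerName FROM Customers WHERE Country = ?"

plan_before = plan(query, ("Mexico",))  # no index yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_customers_country ON Customers (Country)")
plan_after = plan(query, ("Mexico",))   # the optimizer can now search via the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the useful signal is simply whether the index name appears in it.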

Try to indent consistently, and don’t be afraid to use multiple lines. You don’t have to write it all at once. Complex queries can sometimes just be a collection of simple queries. Follow some basic guidelines and take the time to think them through, such as:

  1. List all of the columns that are to be returned
  2. List all of the columns that are used in the WHERE clause
  3. List all of the columns used in the JOINs (if applicable)
  4. List all the tables used in JOINs (if applicable)
  5. Get the correct records selected first
  6. Save the complex calculations for last
  7. If you do use a Common Table Expression (CTE), be aware that it only exists for the single statement that follows it, so in cases where you need the same result in multiple queries, it might be better for performance to use a temp table.


Once you have the above information organized into this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when executed.

It makes a big difference to really understand how the data is combined, selected, filtered, and output. This is where query optimization techniques come into the picture to improve the performance of the program or software. There are many guidelines for tuning a query that can boost its performance. They are listed below:

  1. SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should be included in every stored procedure, trigger, etc. that you write.
  2. The SQL query becomes faster if you use the actual column names in the SELECT statement instead of ‘*’.
  3. The HAVING clause filters rows after all the rows have been selected, when you are using aggregate functions. It is just like a filter. Do not use the HAVING clause for any other purpose.
  4. It is best practice to avoid subqueries in your SQL statement, or at least to minimize the number of subquery blocks in your query where possible.
  5. Use the EXISTS and IN operators and table joins appropriately in your query, because IN usually has the slowest performance.
  6. IN is efficient when most of the filter criteria are in the sub-query.
  7. EXISTS is efficient when most of the filter criteria are in the main query.
  8. Use EXISTS instead of DISTINCT when using joins that involve tables with a one-to-many relationship.
  9. Be careful when using conditions in the WHERE clause.
  10. To write queries that perform efficiently, follow the general SQL standard rules:
  11. Use single case for all SQL verbs
  12. Begin all SQL verbs on a new line
  13. Separate all words with a single space
  14. Right- or left-align verbs within the initial SQL verb
  15. Indexes have advantages as well as disadvantages, as described below:
  16. Do not automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.
  17. Indexes should be considered on all columns that are frequently used in WHERE, ORDER BY, GROUP BY, TOP and DISTINCT clauses.
  18. Do not add more indexes than necessary on your OLTP tables, to minimize the overhead that indexes add during data modifications.
  19. In general, drop all indexes that are not used by the Query Optimizer.
  20. If possible, try to create indexes on columns that have integer values instead of characters. Integer values use less overhead than character values.
  21. The query optimizer needs up-to-date statistics to make smart query optimization decisions. You will generally want to leave the “Auto Update Statistics” database option on. This helps to ensure that the optimizer statistics are valid, so that queries are properly optimized when they are run.
  22. If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:
  23. Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
  24. If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
  25. If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
  26. Queries that include either the DISTINCT or the GROUP BY clause can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:
  27. Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or the GROUP BY clauses.
  28. Include a clustered index on the columns in the GROUP BY clause.
  29. Include a clustered index on the columns found in the SELECT clause.
  30. Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often.
  31. When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement.
  32. When calling a stored procedure from your application, it is important that you call it using its qualified name.
  33. Use stored procedures instead of views because they offer better performance, and don’t include code, variables or parameters that don’t do anything.
  34. If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications.
  35. Instead of using temporary tables, consider using a derived table. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application’s performance.
  36. Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting space in your buffer cache.
  37. If you use the CONVERT function to convert a value to a variable length data type such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30.
  38. If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data.
  39. Don’t use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, it may be more efficient to sort the data at the client than at the server.
  40. Don’t return more data (both rows and columns) from SQL Server to the client or middle tier than you need, only to reduce it further there to the data you really want. This wastes SQL Server resources and network bandwidth.

To tune SQL queries, understanding your database plays the most important role. In SQL, each table column typically has an associated data type; Text, Integer, Varchar, Date, and more are typically available for developers to choose from. When writing SQL statements, make sure you choose the proper data type for each column. Sometimes it’s easier to break up subgroups into their own SELECT statements. To write a query well, you also need to know what the query is actually needed for and what its scope is.
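
As one concrete case from the list above, the tip about preferring EXISTS over DISTINCT on one-to-many joins can be sketched like this. The example uses Python's built-in SQLite with a hypothetical Customers/Orders pair of tables, and it only shows that the two forms return the same rows. Whether EXISTS is actually faster depends on your engine, data, and indexes, so measure before relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customers VALUES (1, 'Alfreds'), (2, 'Ana'), (3, 'Antonio');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Join + DISTINCT: the join yields one row per order, then duplicates are removed.
with_distinct = conn.execute("""
    SELECT DISTINCT c.CustomerName
    FROM Customers c
    JOIN Orders o ON o.CustomerID = c.CustomerID
    ORDER BY c.CustomerName
""").fetchall()

# EXISTS: each customer is tested once, so no duplicates are produced to begin with.
with_exists = conn.execute("""
    SELECT c.CustomerName
    FROM Customers c
    WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID)
    ORDER BY c.CustomerName
""").fetchall()

print(with_distinct == with_exists, with_exists)
```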





Quick Tricks: Column and block text selection using SSMS

There are several ways to select text as shown below, including the ability to select and edit columns.

Using SHIFT to Select Text

It is well known that using the SHIFT key you can perform normal text selection in SSMS.

If you put your cursor to the left of “dbo.DimEmployee” and hold the SHIFT key and then put your cursor at the end of “dbo.DimReseller” it will select the first three lines of code as shown below.

[Screenshot: selecting text with SHIFT in SSMS]

Using SHIFT+ALT to Select Columns

If you would like to select columns or blocks then Microsoft SQL Server offers a solution for you. You can use the key shortcut SHIFT+ALT as described in the following steps. Please note that this feature works using SSMS for SQL Server 2008 and up.

Place your cursor to the left of “dbo.DimEmployee”, press SHIFT+ALT then click at the end of “dbo” in “dbo.DimProductCategory”. This will select columns or blocks in SQL Server Management Studio as shown below.

[Screenshot: selecting a column block with SHIFT+ALT in SSMS]

Using SHIFT+ALT to Select Columns and Insert Text

In SSMS for SQL Server 2012 and up, you can also use SHIFT+ALT to insert text in this block mode.

First place the cursor in the first row where you would like to insert the text (to the left of dbo.DimEmployee in our example). Press SHIFT+ALT and click in the last line where you would like to insert this text (to the left of dbo.DimProductCategory). Now type “SELECT * FROM ” and this text will be inserted on each line as shown below.

[Screenshot: inserting text in block mode with SHIFT+ALT in SSMS]

Using CTRL+SHIFT+END to Select Text

If you want to select all text from a starting point to the end you can use CTRL+SHIFT+END.

Put your cursor at the beginning point and press CTRL+SHIFT+END to select all text from that point to the end of the text as shown below.

[Screenshot: selecting to the end of the text with CTRL+SHIFT+END in SSMS]

Using CTRL+SHIFT+HOME to Select Text

If you want to select all text from a starting point to the beginning you can use CTRL+SHIFT+HOME.

Put your cursor at the beginning point and press CTRL+SHIFT+HOME to select all text from that point to the beginning of the text as shown below.

[Screenshot: selecting to the beginning of the text with CTRL+SHIFT+HOME in SSMS]

Using CTRL+A to Select All Text

If you want to select all text you can use CTRL+A.

Just press CTRL+A anywhere in the query editor and this will select all text as shown below.

[Screenshot: selecting all text with CTRL+A in SSMS]




SQL Server Data Types you Must Know

Why data types are important

  1. The data is stored in the database in a consistent and known format.
  2. Knowing the data type allows you to know which calculations and formulations you can use on the column.
  3. Data types affect storage. Some values take up more space when stored in one data type versus another.  Take our age tables above for example.
  4. Data types affect performance. The less time the database has to infer values or convert them the better.  “Is December, 32, 2015 a date?”

Commonly used SQL Server Data Types

There are over thirty different data types you can choose from when defining columns.  Many of these are set up for very specific jobs such as storing images, and others more suitable to general use.

Here are the data types you’ll most frequently encounter in your everyday use of SQL:

  • INT
  • VARCHAR, NVARCHAR
  • DATETIME
  • DECIMAL, FLOAT
  • BIT

INT – Integer Data Type

The integer data type is used to store whole numbers.  Examples include -23, 0, 5, and 10045.  Whole numbers don’t include decimal places.  Since SQL Server uses a fixed number of bytes to represent an integer, there are maximum and minimum values it can represent.  An INT datatype can store a value from -2,147,483,648 to 2,147,483,647.

Practical uses of the INT data type include using it to count values, store a person’s age, or use as an ID key to a table.

But INT wouldn’t be so good to keep track of a terabyte hard drive address space, as the INT data type only goes to 2 billion and we would need to track into the trillions.  For this you could use BIGINT.
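To make the limits concrete, here is a small Python sketch (not SQL Server code) that checks which of the two integer types a value needs. The helper name smallest_integer_type is made up for illustration. The ranges are the standard 4-byte and 8-byte signed integer ranges.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1          # 4-byte signed integer range (INT)
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1    # 8-byte signed integer range (BIGINT)

def smallest_integer_type(value):
    """Return 'INT' or 'BIGINT', whichever is the narrowest type that holds value."""
    if INT_MIN <= value <= INT_MAX:
        return "INT"
    if BIGINT_MIN <= value <= BIGINT_MAX:
        return "BIGINT"
    raise OverflowError(f"{value} does not fit in BIGINT")

one_terabyte = 1024**4  # number of bytes in a (binary) terabyte

print(smallest_integer_type(10045))
print(smallest_integer_type(one_terabyte))
```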

The INT data type can be used in calculations.  Since DaysToManufacture is defined as INT we can easily calculate hours by multiplying it by 24:

SELECT DaysToManufacture,
       DaysToManufacture * 24 AS HoursToManufacture
FROM   Production.Product

Here you can see the results

Use of INT to perform calculations.

There are many operations and functions you can use with integers which we’ll cover once we dig into functions.

VARCHAR and NVARCHAR – Text Values

Both VARCHAR and NVARCHAR are used to store variable length text values.  “VARCHAR” stands for variable length character.

The number of characters to store in a VARCHAR or NVARCHAR is defined within the column.   For instance, as you can see in the following column definition from the object explorer, the product name is defined to hold fifty characters.

VARCHAR definition shown in SQL Server Management Studio

What makes VARCHAR popular is that values less than fifty characters take less space.  Only enough space to hold the value is allocated.  This differs from the CHAR data type which always allocates the specified length, regardless of the length of the actual data stored.

The VARCHAR datatype can typically store a maximum of 8,000 characters.  The NVARCHAR datatype is used to store Unicode text.  Since UNICODE characters occupy twice the space, NVARCHAR columns can store a maximum of 4,000 characters.

The advantage NVARCHAR has over VARCHAR is that it can store Unicode characters.  This makes it handy for storing extended character sets, such as the Kanji used in Japanese.

If your database was designed prior to SQL 2008 you’ll most likely encounter VARCHAR; however, more modern databases or those global in nature tend to use NVARCHAR.

DATETIME – Date and Time

The DATETIME data type is used to store the date and time.  An example of a DATETIME value is

1968-10-23 1:45:37.123

This is the value for October 23rd, 1968 at 1:45 AM.  Actually the time is more precise than that: it is 1 hour, 45 minutes, and 37.123 seconds past midnight.

In many cases you just need to store the date.  In these cases, the time component is zeroed out.  Thus, November 5th, 1972 is

1972-11-05 00:00:00.000

A DATETIME can store dates from January 1, 1753, through December 31, 9999.  This makes the DATETIME good for recording dates in today’s world, but not so much in William Shakespeare’s.

As you get more familiar with the various SQL built-in functions you’ll be able to manipulate the data.  To give you a glimpse, we’ll use the YEAR function to count employees hired each year.  When given a DATETIME value, the YEAR function returns the year.

The query we’ll use is

SELECT   YEAR(HireDate) AS HireYear,
         COUNT(*) AS EmployeesHired
FROM     HumanResources.Employee
GROUP BY YEAR(HireDate)

And here are the results

Use YEAR on DATETIME data type

The benefit is the DATETIME type ensures the values are valid dates.  Once this is assured, we’re able to use a slew of functions to calculate the number of days between dates, the month of a date and so on.

We’ll explore these various functions in detail in another blog article.

DECIMAL and FLOAT – Decimal Points

As you may have guessed DECIMAL and FLOAT datatypes are used to work with decimal values such as 10.3.

I lumped DECIMAL and FLOAT into the same category, since they both can handle values with decimal points; however, they do so differently:

If you need precise values, such as when working with financial or accounting data, then use DECIMAL.  The reason is that the DECIMAL datatype allows you to define the number of decimal places to maintain.


DECIMAL data types are defined by precision and scale.  The precision determines the total number of digits to store, whereas the scale determines the number of digits to the right of the decimal point.

A DECIMAL datatype is specified as DECIMAL(precision,scale).

A DECIMAL datatype can have no more than 38 digits.  The precision and scale must adhere to the following relation

0 <= scale <= precision <= 38 digits
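
To get a feel for how precision and scale interact, here is a small Python sketch using the standard decimal module. The helper fits_decimal is hypothetical and only approximates the rule described above: it reports whether a value would fit DECIMAL(precision, scale) without losing digits. SQL Server itself may round excess fractional digits rather than reject them.

```python
from decimal import Decimal

def fits_decimal(value, precision, scale):
    """Check whether value fits DECIMAL(precision, scale) without losing digits."""
    if not 0 <= scale <= precision <= 38:
        raise ValueError("need 0 <= scale <= precision <= 38")
    _sign, digits, exponent = Decimal(str(value)).as_tuple()
    fractional_digits = max(0, -exponent)            # digits right of the decimal point
    integer_digits = max(0, len(digits) + exponent)  # digits left of the decimal point
    return fractional_digits <= scale and integer_digits <= precision - scale

print(fits_decimal("189.00", 8, 2))      # 3 integer digits, 2 fractional digits
print(fits_decimal("1234567.50", 8, 2))  # 7 integer digits, but only 6 are available
print(fits_decimal("29.125", 8, 2))      # 3 fractional digits, but the scale is 2
```

Passing values as strings keeps Python from introducing binary floating-point noise before the check runs.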

In the Production.Product table, the weight column’s datatype is defined as DECIMAL(8,2).  The first number is the precision, the second the scale.

Weight is defined to have eight total digits, two of them to the right of the decimal place.  We’ll use the following sample query to illustrate how this data type works.

SELECT   Weight
FROM     Production.Product
WHERE    Weight BETWEEN 29.00 AND 189.00

The results follow:

Using DECIMAL data type to display results


Where DECIMAL datatypes are great for exact numbers, FLOATs are really good for long numeric values.  Though a DECIMAL value can have 38 digits total, in many engineering and scientific applications this is inadequate.  For scientific applications where extreme numeric values are encountered, FLOAT rises to the top!

FLOATs have a range from -1.79E+308 to 1.79E+308.  That means the largest value can be 179 followed by 306 zeros (large indeed!).

Because of the way float data is stored in the computer (see the IEEE 754 floating point specification), the number stored is an extremely close approximation.  For many applications this is good enough.

Because of this approximate behavior, avoid using the <> and = operators on FLOAT columns in the WHERE clause.  Many a DBA has been burned by a statement such as

WHERE mass = 2.5

Their expectations are dashed when mass is supposed to equal 2.5 but, in the computer, it is really stored as 2.499999999999999 and is therefore not equal to 2.500000000000000!

That is the nature of floating points and computers.  You and I see 2.499999999999999 and think for practical purposes it is 2.5, but to the computer, we’re off just a bit.
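
The same effect is easy to reproduce outside the database. The Python sketch below uses the classic 0.1 + 0.2 case (a different number from the mass example, chosen because it reliably shows the problem) and then compares within a tolerance instead. The tolerance of 1e-9 is an arbitrary choice for illustration. In SQL, the analogous approach would be a range test such as WHERE ABS(mass - 2.5) < 0.000001.

```python
import math

computed = 0.1 + 0.2  # a computed value that "should" equal 0.3

exact_match = (computed == 0.3)                              # exact comparison
tolerant_match = math.isclose(computed, 0.3, abs_tol=1e-9)   # comparison within a tolerance

print(exact_match, tolerant_match)
```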

BIT – Boolean or Yes/No values

There are times when you just need to store whether something “is” or “is not.”  For instance, whether an employee is active.  It is in these cases that the BIT datatype comes into its own.  This data type can be in one of three states: 1, 0, or NULL.

The value of 1 signifies TRUE and 0 FALSE.

In this query we’re listing all salaried position job titles

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 1

Here are the results

Using the BIT data type in Searches

We could have also used ‘True’ instead of 1.  Here is the same example using ‘True’

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 'True'

And the opposite using ‘False’

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 'False'

I tend to stick with 1 and 0, since it is easier to type, but if you’re going for readability, then ‘True’ and ‘False’ are good options.


Using SSIS to Generate Surrogate Keys

By Richard Hiers, 2017

In the following post, I’ll be explaining how to use ETL and SSIS to generate surrogate keys for dimension tables in a data warehouse. Although most dimensional data possesses a natural key or a business key in the source tables, Kimball and everyone else I’ve read strongly recommend using a surrogate key for dimensional tables. Our prospects, for example, all have a unique ProspectID, but it is recommended that the ETL system generate a unique identifying key for each record. This surrogate key then becomes the foreign key in the Fact tables.

You can read more about the advantages of using a surrogate key in Kimball’s The Data Warehouse Toolkit, 3rd Ed. starting on page 98, Corr’s Agile Data Warehouse Design starting on page 141, and Adamson’s Star Schema on page 30 and following. But in brief, some of the key benefits of using a surrogate key are:

  1. Isolates the data warehouse from any instability in the source OLTP system keys where business keys may possibly be deleted, recycled, and modified in ways that could be detrimental to the functioning of the data warehouse.
  2. Facilitates pulling dimensional attributes from disparate source systems in which each has its own independent business keys.
  3. Supports slowly changing dimensions (SCD) where there can be multiple versions of any given entity in the dimension.

So the question becomes, how do you create and maintain these surrogate keys?

The Need: An SSIS Surrogate Key Generator

It would be possible to design the dimension table with an auto-incrementing identity key, but at least for performance reasons, that doesn’t seem to be the preferred mechanism. From Kimball starting on page 469: “For improved efficiency, consider having the ETL tool generate and maintain the surrogate keys. Avoid the temptation of concatenating the operational key of the source system and a date/time stamp.”  OK, so how do I accomplish that?

The Challenge: An SSIS Surrogate Key Generator

Given this and similar recommendations found on the web, and given the maturity of Microsoft’s SSIS ETL tools, I would expect this to be pretty straightforward. I would expect that I could just drag the Surrogate Key Creator transformation into my Data Flow and go on my merry way. But it’s not that straightforward. The tool doesn’t exist, and even in Knight’s Professional Microsoft SQL Server 2014 Integration Services the topic is not addressed that I can find, though I haven’t yet read it from stem to stern.

Given the complexity of some of the solutions I found on the web, it is really surprising to me that Microsoft hasn’t made this simpler to accomplish. But that seems to be where we are. With the help of some instructions on the web, I was eventually able to craft an ETL surrogate key generator independent of my data warehouse RDBMS. In the examples below I am creating a generator for my Expected Entrance Term dimension table.

Using a Script Component

I was able to create a working surrogate key (SK) generator first referring to the concepts and script found in a 2011 post from Joost van Rossum’s SSIS blog. I’ll briefly outline the solution since I found his instructions to be incomplete (He seems to have assumed a higher level of familiarity with the various components and thus omitted some details). After that, I’ll outline a simpler method which does not require the use of a script.

  1. Start by querying the destination dimension table for the last SK used. On the Control Flow tab in Microsoft Visual Studio, drag an Execute SQL Task to the canvas. Double-click to edit.
  2. Set ResultSet to “Single row”
  3. Define your OLE DB Connection
  4. Insert your SQL Statement (modify as needed):
FROM SemesterTerms

[Screenshot: Execute SQL Task in the SSIS Control Flow]

  5. On the Result Set tab, click the Add button to add a Variable. Leave the Variable Name as “User::Counter” and rename the Result Name to whatever you prefer.

[Screenshot: Result Set variable]

  6. Click OK to save your changes
  7. Add a Data Flow Task, connect the Execute SQL Task to it, and double-click to edit
  8. In the Data Flow, add an OLE DB Source to the canvas and configure it to select the records you need to populate your dimension (from the Term table in my case)
  9. Connect that to a Script Component and click to edit it (choose “Transformation” for the type when you drop it on the canvas)
  10. Go to the Inputs and Outputs tab, click Add Column, and give it a Name and DataType (I used TERMKEY and four-byte signed integer [DT_I4]). Note: if you name it something other than “RowId”, you will need to modify the code below.
  11. Make sure the ScriptLanguage property is set to C# and click the Edit Script… button to open the script editor
  12. Replace the placeholder code with the C# code from Rossum’s blog post (step 7) referenced above (make sure you select the C# code and not the VB.NET code). NOTE: If you did not name your new Column above “RowId”, you will need to modify row 59 in the sample code to reflect your Column name.
  13. Save the script and close that window, and then click OK to save and close the Script Component editor
  14. Add an OLE DB Destination and configure it to map the source columns plus the new key column to the appropriate destination columns. The data flow will look like this:

[Screenshot: surrogate key generator data flow with C# script]

  15. Once you have saved this project, click the Start button to take it for a spin. On the first run you should see all of your dimension records added to the data warehouse with sequential keys starting at 1.
  16. Start the project a second time, and you will see you now have twice the records, but the surrogate keys for the second load will start where the previous load ended. In my case the first load ended with an SK of 280 and the second load started with 281.

Nice! Of course, in a production ETL system, the second load would only include new rows (which would be assigned the next higher keys) or changes, which would be handled differently based on the type of attribute that changed (are we tracking historical values, or just the current value?).

A Simpler Surrogate Key Generator

While attempting to fill in the gaps of the instructions I found for the script method above, I came across a simpler method that doesn’t require a script on Sam Vanga’s blog (scroll down to his “With ROW_NUMBER()” method). It uses the same Execute SQL Task, but utilizes the results in the initial OLE DB Source transformation tool rather than in a Script Component. Here are the steps I took:

First, perform steps 1-7 from the previous method.

  8. In the Data Flow, add an OLE DB Source to the canvas and configure it to select the records you need to populate your dimension (from the Term table in my case). However, in addition to the query you used in the previous method, you will add another column to the SELECT to make use of the variable returned from the Execute SQL Task.
SELECT ROW_NUMBER() OVER(ORDER BY <column_name>) + ? AS TERM_KEY
,TermCalendarID AS TERM_ID
FROM SemesterTerms
  9. As in the query above, add the statement “ROW_NUMBER() OVER(ORDER BY <column_name>) + ? AS TERM_KEY”. Without the “+ ?”, this query orders the rows by the column indicated and then assigns sequential numbers, starting with 1, to each row. In my case, I had 280 rows in the result, numbered sequentially from 1-280.
  10. The “?” (question mark) is the parameter placeholder in the OLE DB Source for SSIS. After adding the question mark, click on the Parameters… button to the right of the query window. NOTE: if you don’t have a “?” in your query, you will get the message: “The SQL statement does not contain any parameters.”
  11. Select the variable you defined in step 5 (User::Counter in my case) and click OK to save.

[Screenshot: OLE DB Source Editor parameters in SSIS]

What this does, then (“ROW_NUMBER() OVER(ORDER BY <column_name>) + ? AS TERM_KEY”), is give each row a sequential row number and then add to each the variable, which holds the MAX surrogate key value of the destination dimension table. On the initial load, the TERM dimension table is empty, so “0” is added to each row number. The destination ends up with 280 rows, with TERM_KEYs numbered sequentially 1-280. On the next load, each new row number will have 280 added to it. So if 5 new terms had been added to the source table by the next load, the first would be ROW_NUMBER 1 plus 280, which would result in a new surrogate key of 281, and so on.

  12. Add an OLE DB Destination and configure it to map the source columns plus the new key column to the appropriate destination columns. Click OK to save.

This simpler surrogate key generator populates the destination dimension just as the previous one did. I haven’t compared the performance of these two methods, but Sam Vanga claims that the ROW_NUMBER() method is twice as fast as the script option (though he is using a different script than I am).
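
The ROW_NUMBER()-plus-offset idea can be tried outside SSIS as well. Below is a minimal sketch using Python’s built-in SQLite (version 3.25 or later is assumed, for window function support). The table names SourceTerms and DimTerm, the helper load_new_terms, and the sample term IDs are all made up for illustration. The “?” plays the same role as the SSIS parameter that carries the current maximum key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTerms (TermCalendarID TEXT)")         # hypothetical source
conn.execute("CREATE TABLE DimTerm (TERM_KEY INTEGER, TERM_ID TEXT)")  # hypothetical dimension

def load_new_terms(term_ids):
    """Append term_ids to DimTerm, numbering them after the current maximum key."""
    conn.execute("DELETE FROM SourceTerms")
    conn.executemany("INSERT INTO SourceTerms VALUES (?)", [(t,) for t in term_ids])
    # Counterpart of the Execute SQL Task: last surrogate key used, 0 for an empty table.
    (max_key,) = conn.execute("SELECT COALESCE(MAX(TERM_KEY), 0) FROM DimTerm").fetchone()
    # Counterpart of the OLE DB Source query: row number plus the offset parameter.
    conn.execute(
        "INSERT INTO DimTerm (TERM_KEY, TERM_ID) "
        "SELECT ROW_NUMBER() OVER (ORDER BY TermCalendarID) + ?, TermCalendarID "
        "FROM SourceTerms",
        (max_key,),
    )

load_new_terms(["2016FA", "2017SP", "2017SU"])  # initial load: keys start at 1
load_new_terms(["2017FA", "2018SP"])            # next load: keys continue after the maximum
keys = [row[0] for row in conn.execute("SELECT TERM_KEY FROM DimTerm ORDER BY TERM_KEY")]
print(keys)
```

Note that this sketch, like the SSIS package, assumes only one load runs at a time. Two concurrent loads could read the same maximum key and hand out duplicate values.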


Kimball specified that the surrogate key generator should “independently generate surrogate keys for every dimension; it should be independent of database instance and able to serve distributed clients… consider having the ETL tool generate and maintain the surrogate keys” (pp. 469-470). Do these two methods meet that specification? They certainly do not rely on database triggers or on an identity column. I’m not sure what he is thinking of with “distributed clients” (Can anyone provide an insight? Please leave a comment below.) Are these methods examples of the ETL tool “maintaining” the keys? Kimball seems to be saying the generator is a thing. When I started looking around for an SSIS key generator, I was surprised to find only methods. I’ll continue to look for other options, but for now, this seems to be the best option. Let me know your thoughts and experience.



Install AdventureWorks2014 and AdventureWorksDW2014 Step by Step

Before installation, download the two sample database backups, AdventureWorks2014 and AdventureWorksDW2014.



Open SQL Server Management Studio, right-click “Databases”, and select “Restore Database”.

Select “Device” and click the button on the right.

Click the “Add” button.

Select AdventureWorks2014.bak and click the “OK” button.

Click the “OK” button.

Click the “OK” button again.

AdventureWorks2014 has now been installed.

To install AdventureWorksDW2014, follow exactly the same steps.




Installing SQL Server 2016 Developer Edition and SQL Server Management Studio Step by Step

Installing SQL Server 2016 Developer Edition

First, you need SQL Server 2016 Developer Edition. Here is how to get it for free: SQL Server 2016 Developer Edition is Free.

Once you have SQL Server 2016 Developer Edition, start the installation by executing setup.exe. SQL Server Installation Center starts:


Click the Installation page:


Click the link labelled “New SQL Server stand-alone installation or add features to an existing installation”:


Accept the defaults on the Product Key page – Developer Edition and no product key, then click the Next button:


On the License Terms page check the “I accept the license terms” checkbox and click the Next button:


If the Install Rules page tests pass, click the Next button to proceed:


On the Feature Selection page, select the features you want to install and a location for the instance root files. I installed the Database Engine Services, Analysis Services, Reporting Services – Native, Integration Services, and Client Tools Connectivity, Client Tools Backwards Compatibility, Master Data Services. Click the Next button to proceed:


On the Instance Configuration page, select Default instance if this is your first SQL Server installation; otherwise select Named instance and enter an instance name. Then click the Next button:


On the Server Configuration page, I accepted the defaults for services and startup types:


On the Database Engine Configuration page, I opted to use Mixed Mode, supplied a strong password, and added the current user to the SQL Server Administrators group.


On the Analysis Services Configuration page, select Multidimensional and Data Mining Mode (Tabular Mode and PowerPivot Mode are also offered here), click Add Current User, then click the Next button:


On the Reporting Services Configuration page, select Install only, then click the Next button:

The Ready to Install page displayed; I clicked the Install button to begin the installation.


Installing SQL Server Management Studio

Once the installation completed, I clicked the “Install SQL Server Management Tools” link on the SQL Server Installation Center’s Installation page:


The link took me to a page titled Download SQL Server Management Studio.


Once the download is complete, click the Run button to install SSMS 2016:


The SQL Server Management Studio (SSMS) installation starts.
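
Once SSMS is installed and connected to the new instance, a quick query can confirm what was installed. This is only a small sanity check using built-in functions; the column aliases are my own.

```sql
-- Run in a new query window connected to the freshly installed instance.
SELECT  @@VERSION                                   AS VersionString
,       SERVERPROPERTY('Edition')                   AS Edition          -- Developer Edition is expected here
,       SERVERPROPERTY('ProductVersion')            AS ProductVersion   -- SQL Server 2016 reports 13.x
,       SERVERPROPERTY('IsIntegratedSecurityOnly')  AS WindowsAuthOnly; -- 0 means Mixed Mode is enabled
```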



3 Effective Ways of Generating Surrogate Keys With SSIS

A surrogate key is an auto-generated value, usually an integer, in the dimension table. It is made the primary key of the table and is used to join a dimension to a fact table. Among other benefits, surrogate keys allow you to maintain history in a dimension table. Despite their popularity, SSIS doesn’t have a built-in solution for generating surrogate keys. Let’s take a look at a few alternatives in this post.

First, create the following table. We will import data from the Person.Person table (AdventureWorks sample) into this table. Note: PersonSK is the surrogate key.
CREATE TABLE [dbo].[Person] (
  [PersonSK] INT IDENTITY(1,1) NOT NULL PRIMARY KEY
, [FirstName] NVARCHAR(50) NULL
, [LastName] NVARCHAR(50) NULL
) ;

1. With IDENTITY()

Drag a data flow task onto the control flow and configure the OLE DB source.

Next, drag an OLE DB destination onto the data flow and connect it to the source. Specify the connection manager, and choose “Table or view – fast load” as the data access mode. This performs a bulk insert and is the fastest of the data access modes.

The destination table has 3 columns, but the source has only 2. On the Mappings page, map the input and output columns for FirstName and LastName, and leave PersonSK unmapped.

When you run the package, because PersonSK is an identity column in the table, SQL Server automatically generates its values for you. This solution is easy and fast, but depending on your ETL methodology, you can’t always rely on IDENTITY().
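
To see the same behaviour outside SSIS, here is a minimal sketch in plain T-SQL. It assumes dbo.Person exists with PersonSK declared as an IDENTITY column; the names inserted are sample values for illustration only.

```sql
-- Assumes dbo.Person exists with PersonSK declared as an IDENTITY column.
-- PersonSK is left out of the column list, so SQL Server assigns it.
INSERT INTO dbo.Person (FirstName, LastName)
VALUES (N'Ken', N'Sanchez')
,      (N'Terri', N'Duffy');   -- sample values for illustration only

-- Inspect the generated keys.
SELECT PersonSK, FirstName, LastName
FROM   dbo.Person
ORDER BY PersonSK;

-- Last identity value generated for the table, from any session.
SELECT IDENT_CURRENT('dbo.Person') AS LastPersonSK;
```

Leaving PersonSK unmapped in the OLE DB destination has the same effect as omitting it from the INSERT column list here.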

2. With Script Component

I frequently use the Script Transformation. The steps are nicely written up in a post by Phil Brammer. It is simply a script used as a transformation: the script generates a row number for each row that passes through the data flow.

3. With ROW_NUMBER()

You can use ROW_NUMBER() when working with a SQL Server data source to let the database engine do the work for you. This can be faster.

If you’re doing an incremental load, first find the maximum key value from the destination. I’ll use the following query. It’ll return zero if there were no rows, else it returns the maximum value.

SELECT  ISNULL(MAX(PersonSK), 0) AS MaxSK
FROM    dbo.Person

Add an Execute SQL Task to the control flow and set its ResultSet property to Single row. Then, add a variable to hold the return value.

Next, connect a data flow task to the Execute SQL Task. We will use the following SQL statement in the OLE DB source editor. In addition to the LastName and FirstName columns, we are using the ROW_NUMBER() function to generate a unique number for every row.

SELECT  ROW_NUMBER() OVER (ORDER BY [LastName]) AS [RowNum]
, [LastName]
, [FirstName]
FROM    Person.Person

The query will generate numbers starting from 1 for each row, but while loading to the destination we don’t want to start from 1. We want to find the maximum value in the destination and start from the next highest value. So, I’m adding the max value to every row number using parameters in the OLE DB source.
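
The offset logic can be tried directly in SSMS before wiring it into the package. This is a sketch under assumptions: the @MaxSK variable stands in for the SSIS variable filled by the Execute SQL Task, the script is run in the destination database, and the source is referenced by the three-part name AdventureWorks2014.Person.Person. In the OLE DB source itself, @MaxSK would be replaced by a ? parameter marker mapped to the SSIS variable; if the provider cannot derive that parameter, building the statement from an expression variable is a common fallback.

```sql
-- Stand-in for the SSIS variable filled by the Execute SQL Task:
-- 0 when the destination is empty, otherwise the current maximum key.
DECLARE @MaxSK INT = (SELECT ISNULL(MAX(PersonSK), 0) FROM dbo.Person);

-- Row numbers start at 1, so adding @MaxSK makes the new keys
-- continue from the next value after the existing maximum.
SELECT  ROW_NUMBER() OVER (ORDER BY [LastName], [FirstName]) + @MaxSK AS [PersonSK]
, [LastName]
, [FirstName]
FROM    AdventureWorks2014.Person.Person;
```

Ordering by LastName and FirstName is my choice for illustration; any ORDER BY works, since the keys only need to be unique.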

In the OLE DB destination, check the box that says Keep Identity. By doing this we are asking SSIS to keep the identity values that are generated by the source. On the Mappings page, you’ll see a new input that was created in the OLE DB source. Map it to the PersonSK surrogate key.

Go ahead and run the package. If all goes well, you will see green check marks on the tasks.
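
In plain T-SQL, the rough analogue of the Keep Identity option is SET IDENTITY_INSERT, which lets an INSERT supply explicit values for an identity column. The sketch below assumes dbo.Person has PersonSK as its identity column; the key value and names are hypothetical.

```sql
-- Allow explicit values for the identity column of dbo.Person.
SET IDENTITY_INSERT dbo.Person ON;

-- The column list is required while IDENTITY_INSERT is ON.
INSERT INTO dbo.Person (PersonSK, FirstName, LastName)
VALUES (1001, N'Sample', N'Person');   -- hypothetical key and names

-- Switch it back off so later inserts get generated keys again.
SET IDENTITY_INSERT dbo.Person OFF;
```

Without this setting (or Keep Identity in the destination), SQL Server rejects explicit values for PersonSK.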


In this post, we looked at different options for generating surrogate keys while loading dimensions with SSIS. We used the IDENTITY property in SQL Server, talked about the Script Component, and finally made use of the ROW_NUMBER() function. The last option was twice as fast as the Script Component with around 20,000 rows and an index on the LastName column.