INDEX MATCH in Excel

The INDEX MATCH MATCH formula is the combination of two functions in Excel: INDEX and MATCH.

=INDEX() returns the value of a cell in a table based on the column and row number.

=MATCH() returns the position of a cell in a row or column.

Combined, the two formulas can look up and return the value of a cell in a table based on vertical and horizontal criteria.  For short, this is referred to as just the Index Match function.

#1 How to use the INDEX formula

Below is a table showing people’s name, height and weight. We want to use the INDEX formula to look up Kevin’s height… here is an example of how to do it.

Follow these steps:

  1. Type “=INDEX(” and select the area of the table then add a comma
  2. Type the row number for Kevin, which is “4” and add a comma
  3. Type the column number for Height, which is “2” and close the bracket
  4. The result is “5.8”

Index formula Excel

 

#2 How to use the MATCH formula

Sticking with the same example as above, let’s use MATCH to figure out what row Kevin is in.

Follow these steps:

  1. Type “=MATCH(” and link to the cell containing “Kevin”… the name we want to look up
  2. Select the all the cells in the Name column (including the “Name” header)
  3. Type zero “0” for an exact match
  4. The result is that Kevin is in row “4”

Match formula Excel

Use MATCH again to figure out what column Height is in.

Follow these steps:

  1. Type “=MATCH(” and link to the cell containing “Height”… the criteria we want to look up
  2. Select the all the cells across the top row of the table
  3. Type zero “0” for an exact match
  4. The result is that Height is in column “2”

Match function

 

#3 How to combine INDEX and MATCH

Now we can take the two MATCH formulas and use them to replace the “4” and the “2” in the original INDEX formula. The result is an INDEX MATCH MATCH formula.

Follow these steps:

  1. Cut the MATCH formula for Kevin and replace the “4” with it
  2. Cut the MATCH formula for Height and replace the “2” with it
  3. The result is Kevin’s Height is “5.8”
  4. Congratulations, you now have a dynamic INDEX MATCH formula!

Index Match Match in Excel

 

https://corporatefinanceinstitute.com/resources/excel/study/index-match-formula-excel/

UPDATE/ADD/ALTER Column and Table in SQL

UPDATE Table

The following SQL statement updates the first customer (CustomerID = 1) with a new contact person and a new city.

Example

UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;

UPDATE Multiple Records

It is the WHERE clause that determines how many records that will be updated.

The following SQL statement will update the contactname to “Juan” for all records where country is “Mexico”:

Example

UPDATE Customers
SET ContactName='Juan'
WHERE Country='Mexico';

Update Warning!

Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!

Example

UPDATE Customers
SET ContactName='Juan';

ALTER TABLE – ADD Column

To add a column in a table, use the following syntax:

ALTER TABLE table_name
ADD column_name datatype;

ALTER TABLE – ALTER/MODIFY COLUMN

To change the data type of a column in a table, use the following syntax:

SQL Server / MS Access:

ALTER TABLE table_name
ALTER COLUMN column_name datatype;

DELETE COLUMN/TABLE TABLE

DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste';

DELETE FROM table_name
WHERE condition;

Delete All Records

DELETE * FROM table_name;

ALTER TABLE – DROP COLUMN

To delete a column in a table, use the following syntax (notice that some database systems don’t allow deleting a column):

ALTER TABLE table_name
DROP COLUMN column_name;

SQL Inteview Questions

--CREATE a TABLE FOR INCREMENTAL NUMBBER FROM 1 to 1000
CREATE TABLE dbo.TEST_NUMBER (IncrNum int NOT NULL)
;
--INSERT VALUES USING LOOP
DECLARE @first AS INT
SET @first = 1
DECLARE @step AS INT
SET @step = 1
DECLARE @last AS INT
SET @last = 1000

BEGIN TRANSACTION
WHILE(@first <= @last) BEGIN INSERT INTO TEST_NUMBER VALUES(@first) SET @first += @step

END COMMIT TRANSACTION

-- SELECT TOP 1000 ROWS 
SELECT TOP (1000) [IncrNum]
 FROM [dbo].[TEST_NUMBER]

-- SELECT [IncrNum] FROM 1 to 1000
SELECT *
FROM [dbo].[TEST_NUMBER]
WHERE [IncrNum] <= 1000

-- SELECT ODD NUMBERS
SELECT *
FROM [dbo].[TEST_NUMBER]
WHERE [IncrNum] <= 1000
AND ([IncrNum] % 2 <> 0)

SISS to SQL Server Data Type Translations

It can be extremely confusing when you first encounter SSIS data types.  At first glance, they seem to be nothing like SQL Server data types you love and know.  That’s why I’ve provided below a conversion chart of SSIS data types to SQL Server data types.  This information is readily available on MSDN but it always seems difficult to find.  Hope this helps!

SISS to SQL Server Data Type Translations

http://www.sqlservercentral.com/blogs/dknight/2010/12/22/ssis-to-sql-server-data-type-translations/

SQL – Queries Tuning and Optimization Techniques

In SQL, it is very difficult to write complex SQL queries involving joins across many (at least 3-4) tables and involving several nested conditions because a SQL statement, once it reaches a certain level of complexity, is basically a little program in and of itself.

A database index is a data structure that improves the speed of operations on a database table. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. Indexing is incredibly important when working with large tables, however, occasionally smaller tables should be indexed, if they are expected to grow.

Try to consistently indent and don’t be afraid to use multiple lines. You don’t have to write it all at once. Complex queries can sometimes just be a collection of simple queries. You need to follow some basic guidelines and Take the time to think these through such as-

List all of the columns that are to be returned

  1. List all of the columns that are used in the WHERE clause
  2. List all of the columns used in the JOINs (if applicable)
  3. List all the tables used in JOINs (if applicable)
  4. Get the correct records selected first
  5. Save the complex calculations for last
  6. If you do use a Common Table Expression (CTE), be aware that the query only persists until the next query is run, so in some cases where you are using the CTE in multiple queries, it might be better for performance to use a temp table.

 

Once you have the above information organized into this easy-to-comprehend form, it is much easier to identify those columns that could potentially make use of indexes when executed.

It makes a big difference to really understand how the data is combined, selected, filtered, and output. Here, query Optimization tricks come into the picture to increase the performance of the program or software. There are a lot of guideline points to tune your query which do work as the boost of the query performance.  These guideline points are mentioned below:

  1. SET NOCOUNT ON at the beginning of each stored procedure you write. This statement should be included in every stored procedure, trigger, etc. that you write.
  2. The SQL query becomes faster if you use the actual columns names in SELECT statement instead of than ‘*’.
  3. HAVING clause is used to filter the rows after all the rows are selected if you are using aggregation functions. It is just like a filter. Do not use HAVING clause for any other purposes.
  4. It is the best practice to avoid sub queries in your SQL statement and try to minimize the number of subquery block in your query if possible.
  5. Use operator EXISTS, IN and table joins appropriately in your query. The reason is- Usually IN has the slowest performance
  6. IN is efficient when most of the filter criteria are in the sub-query.
  7. EXISTS is efficient when most of the filter criteria is in the main query.
  8. Use EXISTS instead of DISTINCT when using joins which involves tables having one-to-many relationship.
  9. Be careful while using conditions in WHERE clause.
  10. To write queries which provide efficient performance follow the general SQL standard rules.
  11. Use single case for all SQL verbs
  12. Begin all SQL verbs on a new line
  13. Separate all words with a single space
  14. Right or left aligning verbs within the initial SQL verb
  15. Indexes have the advantages as well as disadvantages as given below-
  16. Do not automatically add indexes on a table because it seems like the right thing to do. Only add indexes if you know that they will be used by the queries run against the table.
  17. Indexes should be measured on all columns that are frequently used in WHERE, ORDER BY, GROUP BY, TOP and DISTINCT clauses.
  18. Do not add more indexes on your OLTP tables to minimize the overhead that occurs with indexes during data modifications.
  19. Drop all those indexes that are not used by the Query Optimizer, generally.
  20. If possible, try to create indexes on columns that have integer values instead of characters. Integer values use less overhead than character values.
  21. To provide up-to-date statistics, the query optimizer needs to make smart query optimization decisions. You will generally want to leave the “Auto Update Statistics” database option on. This helps to ensure that the optimizer statistics are valid, ensuring that queries are properly optimized when they are run.
  22. If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:
  23. Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.
  24. If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.
  25. If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.
  26. queries that include either the DISTINCT or the GROUP BY clauses can be optimized by including appropriate indexes. Any of the following indexing strategies can be used:
  27. Include a covering, non-clustered index (covering the appropriate columns) of the DISTINCT or the GROUP BY clauses.
  28. Include a clustered index on the columns in the GROUP BY clause.
  29. Include a clustered index on the columns found in the SELECT clause.
  30. Adding appropriate indexes to queries that include DISTINCT or GROUP BY is most important for those queries that run often.
  31. When you need to execute a string of Transact-SQL, you should use the sp_executesql stored procedure instead of the EXECUTE statement.
  32. When calling a stored procedure from your application, it is important that you call it using its qualified name.
  33. Use stored procedures instead of views because they offer better performance and don’t include code, variable or parameters that don’t do anything.
  34. If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications.
  35. Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application’s performance.
  36. Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.
  37. If you use the CONVERT function to convert a value to a variable length data type such as VARCHAR, always specify the length of the variable data type. If you do not, SQL Server assumes a default length of 30.
  38. If you are creating a column that you know will be subject to many sorts, consider making the column integer-based and not character-based. This is because SQL Server can sort integer data much faster than character data.
  39. Don’t use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, perhaps it may be more efficient to sort the data at the client than at the server.
  40. Don’t return more data (both rows and columns) from SQL Server than you need to the client or middle-tier and then further reduce the data to the data you really need at the client or middle-tier. This wastes SQL Server resources and network bandwidth.
Conclusion
To tune our SQL queries, understanding our database does play the most important role. In SQL, typically each table column has an associated data type. Text, Integer, Varchar, Date, and more, are typically available types for developers to choose from. When writing SQL statements, make sure you choose the proper data type for the column. Sometimes it’s easier to break up subgroups into their own select statement. To write a query, we need to know about the actual need for the query and scope of the query also.

 

 

 

read more:
http://www.sql-datatools.com/2016/05/sql-queries-tuning-and-optimization.html

Quick Tricls: Column and block text selection using SSMS

There are several ways to select text as shown below, including the ability to select and edit columns.

Using SHIFT to Select Text

It is well known that using the SHIFT key you can perform normal text selection in SSMS.

If you put your cursor to the left of “dbo.DimEmployee” and hold the SHIFT key and then put your cursor at the end of “dbo.DimReseller” it will select the first three lines of code as shown below.

ssms shift select

Using SHIFT+ALT to Select Columns

If you would like to select columns or blocks then Microsoft SQL Server offers a solution for you. You can use the key shortcut SHIFT+ALT as described in the following steps. Please note that this feature works using SSMS for SQL Server 2008 and up.

Place your cursor to the left of “dbo.DimEmployee”, press SHIFT+ALT then click at the end of “dbo” in “dbo.DimProductCategory”. This will select columns or blocks in SQL Server Management Studio as shown below.

ssms shift alt select

Using SHIFT+ALT to Select Columns and Insert Text

In SSMS for SQL Server 2012 and up, you can also use SHIFT+ALT to insert text in this block mode.

First place the cursor in the first row where you would like to insert the text (to the left dbo.DimEmployee in our example). Press SHIFT+ALT and click in the last line where you would like to append this text (left of dbo.DimProductCategory). Now type “SELECT * FROM ” and this text will be inserted for each line as shown below.

If you would like to select columns or blocks then Microsoft SQL Server offers a solution for you

Using CTRL+SHIFT+END to Select Text

If you want to select all text from a starting point to the end you can use CTRL+SHIFT+END.

Put your cursor at the beginning point and press CTRL+SHIFT+END to select all text from that point to the end of the text as shown below.

ssms ctrl shift end to select

Using CTRL+SHIFT+HOME to Select Text

If you want to select all text from a starting point to the beginning you can use CTRL+SHIFT+HOME.

Put your cursor at the beginning point and press CTRL+SHIFT+HOME to select all text from that point to the beginning of the text as shown below.

ssms ctrl shift home select

Using CTRL+A to Select All Text

If you want to select all text you can use CTRL+A.

Just press CTRL+A anywhere in the query editor and this will select all text as shown below.

ssms ctrl a to select all text

 

Read more:
https://www.mssqltips.com/sql-server-tip-category/52/sql-server-management-studio/

 

SQL Server Data Types you Must Know

Why data types are important

  1. The data is stored in the database in a consistent and known format.
  2. Knowing the data type allows you to know which calculations and formulations you can use on the column.
  3. Data types affect storage. Some values take up more space when stored in one data type versus another.  Take our age tables above for example.
  4. Data types affect performance. The less time the database has to infer values or convert them the better.  “Is December, 32, 2015 a date?”

Commonly used SQL Server Data Types

There are over thirty different data types you can choose from when defining columns.  Many of these are set up for very specific jobs such as storing images, and others more suitable to general use.

Here is the data types you’ll most frequently encounter in your everyday use of SQL.  These are:

  • INT
  • VARCHAR, NVARCHAR
  • DATETIME
  • DECIMAL, FLOAT
  • BIT

INT – Integer Data Type

The integer data type is used to store whole numbers.  Examples include -23, 0, 5, and 10045.  Whole numbers don’t include decimal places.  Since SQL server uses a number of computer words to represent an integer there are maximum and minimum values which it can represent.  An INT datatype can store a value from -2,147,483,648 to 2,147,483,647.

Practical uses of the INT data type include using it to count values, store a person’s age, or use as an ID key to a table.

But INT wouldn’t be so good to keep track of a terabyte hard drive address space, as the INT data type only goes to 2 billion and we would need to track into the trillions.  For this you could use BIGINT.

The INT data type can be used in calculations.  Since DaysToManufacture is defined as INT we can easily calculate hours by multiplying it by 24:

SELECT Name,
       DaysToManufacture,
       DaysToManufacture * 24 as HoursToManufacture
FROM   Production.Product

Here you can see the results

Use of INT to perform calculations.

There are many operations and functions you can use with integers which we’ll cover once we dig into functions.

VARCHAR and NVARCHAR – Text Values

Both VARCHAR and NVARCHAR are used to store variable length text values.  “VARCHAR” stands for variable length character.

The number of characters to store in a VARCHAR or NVARCHAR are defined within the column.   For instance as you can see in the following column definition from the object explorer, the product name is defined to hold fifty characters.

VARCHAR definition shown in SQL Server Management Studio

What makes VARCHAR popular is that values less than fifty characters take less space.  Only enough space to hold the value is allocated.  This differs from the CHAR data type which always allocates the specified length, regardless of the length of the actual data stored.

The VARCHAR datatype can typically store a maximum of 8,000 characters.  The NVARCHAR datatype is used to store Unicode text.  Since UNICODE characters occupy twice the space, NVARCHAR columns can store a maximum of 4,000 characters.

The advantage NVARCHAR has over VARCHAR is it can store Unicode characters.  This makes it handy to store extended character sets like those used for languages such as Kanji.

If your database was designed prior to SQL 2008 you’ll most likely encounter VARCHAR; however, more modern databases or those global in nature tend to use NVARCHAR.

DATETIME – Date and Time

The DATETIME data type is used to store the date and time.  An example of a DATATIME value is

1968-10-23 1:45:37.123

This is the value for October 23rd, 1968 at 1:45 AM.  Actually the time is more precise than that.  The time is really 45 minutes, 37.123 seconds.

In many cases you just need to store the date.  In these cases, the time component is zeroed out.  Thus, November 5th, 1972 is

1972-11-05 00:00:00.000

A DATETIME can store dates from January 1, 1753, through December 31, 9999.  This makes the DATETIME good for recording dates in today’s world, but not so much in William Shakespeare’s.

As you get more familiar with the various SQL built-in functions you’ll be able to manipulate the data.  To give you a glimpse, we’ll use the YEAR function to count employees hired each year.  When given a DATETIME value, the YEAR function return the year.

The query we’ll use is

SELECT   YEAR(HireDate),
         Count(*)
FROM     HumanResources.Employee
GROUP BY YEAR(HireDate)
ORDER BY YEAR(HireDate)

And here are the results

Use YEAR on DATETIME data type

The benefit is the DATETIME type ensures the values are valid dates.  Once this is assured, we’re able to use a slew of functions to calculate the number of days between dates, the month of a date and so on.

We’ll explore these various functions in detail in another blog article.

DECIMAL and FLOAT – Decimal Points

As you may have guessed DECIMAL and FLOAT datatypes are used to work with decimal values such as 10.3.

I lumped DECIMAL and FLOAT into the same category, since they both can handle values with decimal points; however, they both do so differently:

If you need precise values, such as when working with financial or accounting data, then use DECIMAL.  The reason is the DECIMAL datatype allows you to define the number of decimal points to maintain.

DECIMAL

DECIMAL data types are defined by precision and scale.  The precision determine the number of total digits to store; whereas, scale determine the number of digits to the right of the decimal point.

A DECIMAL datatype is specified as DECIMAL(precision,scale).

A DECIMAL datatype can be no more than 38 digits.  The precision and scale must adhere to the following relation

0 <= scale <= precision <= 38 digits

In the Production.Product table, the weight column’s datatype is defined as DECIMAL(8,2).  The first digit is the precision, the second the scale.

Weight is defined to have eight total digits, two of them to the right of the decimal place.  We’ll the following sample query to illustrate how this data type.

SELECT   DISTINCT Weight
FROM     Production.Product
WHERE    Weight BETWEEN 29.00 and 189.00
ORDER BY Weight DESC

The results follow:

Using DECIMAL data type to display results

FLOAT

Where DECIMAL datatypes are great for exact numbers, FLOATs are really good for long numeric values.  Though a DECIMAL value can have 38 digits total, in many engineering and scientific application this is inadequate.  For scientific applications where extreme numeric values are encountered, FLOAT rises to the top!

FLOATS have a range from – 1.79E+308 to 1.79E+308.  That means the largest value can be 179 followed by 306 zeros (large indeed!).

Because of the way float data is stored in the computer (see IEEE 754 floating point specification) the number stored is an extremely close approximation.  For many application this is good enough.

Because of the approximate behavior, avoid using <> and = operators in the WHERE clause.  Many a DBA has been burned by the statement.

WHERE mass = 2.5

Their expectation are dashed when mass is supposed to equal 2.5, but really, in the computer it is stored as 2.499999999999999; therefore, not equal to 2.500000000000000!

That is the nature of floating points and computers.  You and I see 2.499999999999999 and think for practical purposes it is 2.5, but to the computer, were off just a bit.  J

BIT – Boolean or Yes/No values

There’s times when you just need to store whether something “is” or “is not.”  For instance, whether an employee is active.  It is in these cases that the BIT datatype comes to its own.  This data type be one of three states: 1, 0, or NULL.

The value of 1 signifies TRUE and 0 FALSE.

In this query we’re listing all salaried position job titles

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 1

Here are the results

Using the BIT data type in Searches

We could have also use ‘True’ instead of 1.  Here is the same example using ‘True’

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 'True'

And the opposite using ‘False’

SELECT DISTINCT JobTitle
FROM   HumanResources.Employee
WHERE  SalariedFlag = 'False'

I tend to stick with 1 and 0, since it is easier to type, but if you’re going for readability, then ‘True’ and ‘False’ are good options.

Read more:
https://www.essentialsql.com/commonly-used-sql-server-data-types/

Data Modelling Examples

Exercise 1

An art researcher has asked you to design a database to record details of ARTISTS and the MUSEUM in which their PAINTINGS are displayed. For each painting, the researcher wants to know the size of the canvas, year painted, title, and style. The nationality, date of birth and death of each artist must be recorded. For each museum, record details of its location and speciality, if it has one.

Picture2

Exercise 2

The president of a BOOK wholesaler has told you that she wants information about the PUBLISHERS, AUTHORS and books that she carries. She knows that publishers publish many books and books can be written by one or more authors (obviously authors can also write more than one book). In order to track accurate information about the publisher, the wholesaler wants to know the publishers’ name, their phone number and their address. They also want to know who to call when they need to contact the publisher. The wholesaler also wants to know the book titles, the ISBN number, the date it was published, the genre, and the author of the book. She also wants some basic information about the authors.

Picture3

 

Using SSIS to Generate Surrogate Keys

By Richard Hiers,2017

In the following post, I’ll be explaining how to use ETL and SSIS to generate surrogate keys for dimension tables in a data warehouse. Although most dimensional data possesses a natural key or a business key in the source tables, Kimball and everyone else I’ve read strongly recommend using a surrogate key for dimensional tables. Our prospects, for example, all have a unique ProspectID, but it is recommended that the ETL system generate a unique identifying key be for each record. This surrogate key then becomes the foreign key in the Fact tables.

You can read more about the advantages of using a surrogate key in Kimball’s The Data Warehouse Toolkit, 3rd Ed. starting on page 98, Corr’s Agile Data Warehouse Design starting on page 141, and Adamson’s Star Schema on page 30 and following. But in brief, some of the key benefits of using a surrogate key are:

  1. Isolates the data warehouse from any instability in the source OLTP system keys where business keys may possibly be deleted, recycled, and modified in ways that could be detrimental to the functioning of the data warehouse.
  2. Facilitates pulling dimensional attributes from disparate source systems in which each has its own independent business keys.
  3. Supports slowly changing dimensions (SCD) where there can be multiple versions of any given entity in the dimension.

So the question becomes, how do you create and maintain these surrogate keys?

The Need: An SSIS Surrogate Key Generator

It would be possible to design the dimension table with an auto-incrementing identity key, but at least for performance reasons, that doesn’t seem to be the preferred mechanism. From Kimball starting on page 469: “For improved efficiency, consider having the ETL tool generate and maintain the surrogate keys. Avoid the temptation of concatenating the operational key of the source system and a date/time stamp.”  OK, so how do I accomplish that?

The Challenge: An SSIS Surrogate Key Generator

Given this and similar recommendations found on the web, and given the maturity of Microsoft’s SSIS ETL tools, I would expect this to be pretty straight forward. I would expect that I could just drag the Surrogate Key Creator transformation into my Data Flow and go on my merry way. But its not that straight forward. The tool doesn’t exist and even in Knight’s Professional Microsoft SQL Server 2014 Integration Services, the topic is not addressed that I can find, though I haven’t yet read it from stem to stern.

Given the complexity of some of the solutions I found on the web, it is really surprising to me that Microsoft hasn’t made this simpler to accomplish. But that seems to be where we are. With the help of some instructions on the web, I was eventually able to craft an ETL surrogate key generator independent of my data warehouse RDBMS. In the examples below I am creating a generator for my Expected Entrance Term dimension table.

Using a Script Component

I was able to create a working surrogate key (SK) generator first referring to the concepts and script found in a 2011 post from Joost van Rossum’s SSIS blog. I’ll briefly outline the solution since I found his instructions to be incomplete (He seems to have assumed a higher level of familiarity with the various components and thus omitted some details). After that, I’ll outline a simpler method which does not require the use of a script.

  1. Start by querying the destination dimension table for the last SK used. On the Control Flow tab in Microsoft Visual Studio, drag an Execute SQL Task to the canvas. Double click to edit.
  2. ResultSet set to “Single row”
  3. Define your OLE DB Connection
  4. Insert your SQL Statement (modify as needed):
1
2
SELECT MAX([TERM_KEY]) AS MaxSK
FROM SemesterTerms

SSIS sql task control flow

  1. On the Result Set tab, Click the Add button to add a Variable. Leave the Variable Name as “User::Counter” and rename the Result Name to whatever you prefer.

result set variable

  1. Click OK to save your changes
  2. Add a Data Flow Task, connect the Execute SQL Task to it, and double-click to edit
  3. In the Data Flow, add an OLE DB Source to the canvas and configure it to select the records you need to populate your dimension (from the Term table in my case)
  4. Connect that to a Script Component and click to edit it (choose “Transformation” for the type when you drop it on the canvas)
  5. Go to the Inputs and Outputs tab and click Add Column and give it a Name and DataType (I used TERMKEY and four-byte signed integer [DT_I4]). Note, name it something other than “RowId”, you will need to modify the code below)
  6. Make sure the ScriptLanguage property is set to C# and click the Edit Script… button to open the script editor
  7. Replace the place-holder code with the C# code from Rossum’s blog post (step 7) referenced above. (make sure you select the C# code and not the VB.NET code). NOTE: If you did not name your new Column above “RowId” you will need to change the modify row 59 in the sample code to reflect your Column name.
  8. Save the script and close that window and then click OK to save and close the Script Component editor
  9. Add an OLE DB Destination and configure it to map the source columns plus the new key column to the appropriate destination columns. The data flow will look like this:

surrogate key generator with c# script

  1. Once you have saved this project, click the Start button to take it for a spin. On the first run you should see all of your dimension records added to the data warehouse with a sequential keys starting at 1.
  2. Start the project a second time, and you will see you will now have twice the records, but the surrogate keys for the second load will start where the previous load ended. In my case the first load ended with and SK of 280 and the second load started with 281.

Nice! Of course in a production ETL system, the second load would be only include any new rows (and assigned the next higher keys) or changes which would be handled differently based on the type of attribute that changed (are we tracking historical values, or just the current value?).

A Simpler Surrogate Key Generator

While attempting to fill in the gaps of the instructions I found for the script method above, I came across a simpler method that doesn’t require a script on Sam Vanga’s blog (scroll down to his “With ROW_NUMBER()” method). It utilizes the same Execute SQL Task, but makes utilizes the results in the initial OLE DB Source transformation tool rather than in a Script Component. Here are the steps I took:

First, perform steps 1-7 from the previous method.

  1. In the Data Flow, add an OLE DB Source to the canvas and configure it to select the records you need to populate your dimension (from the Term table in my case). However, in addition to the query you used in the previous method, you will add another column to the SELECT to make use of the variable returned from the Execute SQL Task.
1
2
3
4
5
6
7
SELECT ROW_NUMBER() OVER(ORDER BY TermCalendarID) + ? AS TERM_KEY
,TextTerm AS TERM_DESCRIPTION
,Term AS TERM_CODE
,TermCalendarID AS TERM_ID
,TermStartDate AS TERM_START_DATE
,TermEndDate AS TERM_END_DATE
FROM SemesterTerms
  1. As in the query above, add the statement “ROW_NUMBER() OVER(ORDER BY <column_name>) + ? AS TERM_KEY”. Without the “+ ?” , this query orders the rows by the column indicated and then assigns sequential numbers, starting with 1, to each row. In my case, I had 280 rows in the result, numbered sequentially from 1-280.
  2. The “?” (question mark) is the parameter placeholder in the OLE DB Source for SSIS. After adding the question mark, click on the Parameters… button to the right of the query window. NOTE, if you don’t have a “?” in your query, you will get the message: “The SQL statement does not contain any parameters.”
  3. Select the variable you defined in step 5 (User::Counter in my case) and click OK to save.

SSIS OLE DB source editor parameters

What this does then (“ROW_NUMBER() OVER(ORDER BY <column_name>) + ? AS TERM_KEY”), is give each row a sequential row number, and then add to each the variable number which is the MAX surrogate key value of the destination dimension table. On initial load, the TERM dimension table is empty, so “0” is added to each row number. The destination ends up with 280 rows, with TERM-KEYs numbered sequentially 1-280. On the next subsequent load, each new row number will have 280 added to it. So if on the next load 5 new terms had been added to the source table, the first would be ROW_NUMBER 1, plus 280 which would result in a new surrogate key of 281, and so on.

  1. Add an OLE DB Destination and configure it to map the source columns plus the new key column to the appropriate destination columns. OK to save.

This simpler surrogate key generator populates the destination dimension just as the previous one did. I haven’t compared the performance of these two methods, but Sam Vanga claims that the ROW_NUMBER() method is twice as fast as the script option (he is using a different script than I though).

Conclusion

Kimball specified that the surrogate key generator should “independently generate surrogate keys for every dimension; it should be independent of database instance and able to serve distributed clients… consider having the ETL tool generate and maintain the surrogate keys” (pp. 469-470). Do these two methods meet that specification? They certainly do not rely on database triggers or on an identity column. I’m not sure what he is thinking of with “distributed clients” (Can anyone provide an insight? Please leave a comment below.) Are these methods examples of the ETL tool “maintaining” the keys? Kimball seems to be saying the generator is a thing. When I started looking around for an SSIS key generator, I was surprised to find only methods. I’ll continue to look for other options, but for now, this seems to be the best option. Let me know your thoughts and experience.

 

read more:
https://diybidwguy.com/2017/10/20/using-ssis-generate-surrogate-keys/

How to Create a Stored Procedures with Dynamic Search & Paging (Pagination)

I used this article as a reference for my very first stored procedure for my internship project so I would like to share it with you if you are looking for some ideas to start a  stored procedure with search and pagination.


How can we apply paging (or pagination) on the returned recordsets from a Stored Procedures?

Pagination is required when a lot of records are returned from an SP and the webpage becomes very heavy to load it, like more than 1k records. And it also becomes difficult for a user to handle that much records and view on a single screen.

Many webpages displaying records has First, Previous, Next & Last buttons on top and bottom of the data grid. So to make the buttons function we’ve to implement the paging functionality on SPs itself.

Let’s check with a sample code using [Person].[Contact] Table of [AdventureWorks] Database:

CREATE PROCEDURE USP_GET_Contacts_DynSearch_Paging
(
    -- Pagination
    @PageNbr            INT = 1,
    @PageSize           INT = 10,
    -- Optional Filters for Dynamic Search
    @ContactID          INT = NULL,
    @FirstName          NVARCHAR(50) = NULL,
    @LastName           NVARCHAR(50) = NULL,
    @EmailAddress       NVARCHAR(50) = NULL,
    @EmailPromotion     INT = NULL,
    @Phone              NVARCHAR(25) = NULL
)
AS
BEGIN
    DECLARE
        @lContactID         INT,
        @lFirstName         NVARCHAR(50),
        @lLastName          NVARCHAR(50),
        @lEmailAddress      NVARCHAR(50),
        @lEmailPromotion    INT,
        @lPhone             NVARCHAR(25)
    
    DECLARE
        @lPageNbr   INT,
        @lPageSize  INT,
        @lFirstRec  INT,
        @lLastRec   INT,
        @lTotalRows INT
 
    SET @lContactID         = @ContactID
    SET @lFirstName         = LTRIM(RTRIM(@FirstName))
    SET @lLastName          = LTRIM(RTRIM(@LastName))
    SET @lEmailAddress      = LTRIM(RTRIM(@EmailAddress))
    SET @lEmailPromotion    = @EmailPromotion
    SET @lPhone             = LTRIM(RTRIM(@Phone))
 
    SET @lPageNbr   = @PageNbr
    SET @lPageSize  = @PageSize
    
    SET @lFirstRec  = ( @lPageNbr - 1 ) * @lPageSize
    SET @lLastRec   = ( @lPageNbr * @lPageSize + 1 )
    SET @lTotalRows = @lFirstRec - @lLastRec + 1
 
    ; WITH CTE_Results
    AS (
        SELECT ROW_NUMBER() OVER (ORDER BY ContactID) AS ROWNUM,
            ContactID,
            Title,
            FirstName,
            MiddleName,
            LastName,
            Suffix,
            EmailAddress,
            EmailPromotion,
            Phone
        FROM Person.Contact
        WHERE
            (@lContactID IS NULL OR ContactID = @lContactID)
        AND (@lFirstName IS NULL OR FirstName LIKE '%' + @lFirstName + '%')
        AND (@lLastName IS NULL OR LastName LIKE '%' + @lLastName + '%')
        AND (@lEmailAddress IS NULL OR EmailAddress LIKE '%' + @lEmailAddress + '%')
        AND (@lEmailPromotion IS NULL OR EmailPromotion = @lEmailPromotion)
        AND (@lPhone IS NULL OR Phone = @lPhone)
    )
    SELECT
        ContactID,
        Title,
        FirstName,
        MiddleName,
        LastName,
        Suffix,
        EmailAddress,
        EmailPromotion,
        Phone
    FROM CTE_Results AS CPC
    WHERE
        ROWNUM > @lFirstRec
    AND ROWNUM < @lLastRec
    ORDER BY ROWNUM ASC
 
END
GO

let’s test this SP:

-- No parameters provided, fetch first 10 default records:
EXEC USP_GET_Contacts_DynSearch_Paging
 
-- On providing @PageSize=20, will fetch 20 records:
EXEC USP_GET_Contacts_DynSearch_Paging @PageSize=20
 
-- On providing @PageNbr=2, @PageSize=10, will display second page, ContactID starting from 11 to 20:
EXEC USP_GET_Contacts_DynSearch_Paging @PageNbr=2, @PageSize=10
 
-- On providing @PageNbr=1, @PageSize=50, @FirstName = 'Sam', it will search FurstName like Sam and will fetch first 50 records:
EXEC USP_GET_Contacts_DynSearch_Paging @PageNbr=1, @PageSize=50, @FirstName = 'Sam'
Retrived from :

https://sqlwithmanoj.com/2011/12/30/creating-stored-procedures-with-dynamic-search-paging-pagination/