Web Services ETL Process in Integration Services 2012
Stock up for future use
Sometimes in the process of work, data from web services are needed, especially SOAP connections are almost standard today.
The ETL process (Extract - Transform - Load) is a term from Business Intelligence and describes the process of collecting and transforming data to create an analytical database (for example, a data warehouse).
The SOAP data exchange protocol and web services described by WSDL are common windows into the world of almost all ERP systems, many online portals and financial organizations.
I will try to describe the ETL process step by step using one of the most powerful tools in the class - MS Integration Services.
So, we will consider a test problem.
It is necessary to collect data on exchange rates against the ruble at each date of last year and load them into a table for subsequent analysis. The Central Bank of Russia provides historical data - in the form of web services with a good description.
This seems to be the solution.
Create a project
First, create a project in Business Intelligence Development Studio (And, more recently, SQL Server Data Tools).
There is a Web-service-task element in the toolbox, and we will transfer it to the workspace.
In the HttpConnection line, create a new connection - select NewConnection
and set the URL http://www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx
By the way, you can go there - do not be shy.
WSDL - the file is available there, here it is www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx?WSDL
Download and select it in the settings dialog.
That's it, go to the Input tab.
Here you can choose among the services and their methods - provided by the provider.
We need GetCourseOnDate - issuing a DataSet for the requested date.
We put the date value manually - in the Value field, a little later we’ll go back and automate this.
Output, we simply create the file by creating a new connection output - into the file output.xml.
Run the process - it went. The file is created - exchange rates look like exchange rates.
The format is strange: diffgr attributes, inline diagram. This is the XML output of the Dataset object.
In BI-studio there is an XML-Source connector - it gives a zero result when you try to set it on such a file. (Reads the circuit).
We put after the Web Service Task - Skript Task.
object rawConnection = Dts.Connections["output"].AcquireConnection(Dts.Transaction); DataSet CoursesFile = new DataSet(); string filePath = (string)rawConnection; object rawConnection2 = Dts.Connections["output2"].AcquireConnection(Dts.Transaction); string filePath_Out = (string)rawConnection2; object rawConnection3 = Dts.Connections["output.xsd"].AcquireConnection(Dts.Transaction); string filePath_Schema = (string)rawConnection3; CoursesFile.ReadXml(@filePath); CoursesFile.AcceptChanges(); CoursesFile.WriteXmlSchema(@filePath_Schema); CoursesFile.WriteXml(@filePath2);
The script in C #, you can use all the tools .net platform and 2012 VisualStudio.
I added to the beginning using System.Xml;
We turn to the connection collection - I created 2 more to save the circuit and for output.
The output is pure XML - thanks to acceptChanges ();
Reconsidering the beginning
It would be nice to take a range of dates - and get exchange rates for each, and then record, supplementing the courses with the date field, in the database.
First, create a temporary dimension, it will come in handy more than once - in every data warehouse.
I create with my variation a script spied on the network, here it is.
-- Delete time dimension if it already exists. IF Exists(Select Name from sysobjects where name = 'Dim_Time') BEGIN Drop Table Dim_Time END GO -- Standard options for creating tables SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO -- Create your dimension table -- Adjust to your own needs Create Table dbo.Dim_Time ( Dateid int IDENTITY (1,1) PRIMARY KEY CLUSTERED, Date date, DateString varchar(10), Day int, DayofYear int, DayofWeek int, DayofWeekName varchar(10), Week int, Month int, MonthName varchar(10), Quarter int, Year int, IsWeekend bit, IsLeapYear bit ) -- Declare and set variables for loop Declare @StartDate datetime, @EndDate datetime, @Date datetime Set @StartDate = '2000/01/01' Set @EndDate = '2020/12/31' Set @Date = @StartDate -- Loop through dates WHILE @Date <=@EndDate BEGIN -- Check for leap year DECLARE @IsLeapYear BIT IF ((Year(@Date) % 4 = 0) AND (Year(@Date) % 100 != 0 OR Year(@Date) % 400 = 0)) BEGIN SELECT @IsLeapYear = 1 END ELSE BEGIN SELECT @IsLeapYear = 0 END -- Check for weekend DECLARE @IsWeekend BIT IF (DATEPART(dw, @Date) = 1 OR DATEPART(dw, @Date) = 7) BEGIN SELECT @IsWeekend = 1 END ELSE BEGIN SELECT @IsWeekend = 0 END -- Insert record in dimension table INSERT Into Dim_Time ( [Date], [DateString], [Day], [DayofYear], [DayofWeek], [Dayofweekname], [Week], [Month], [MonthName], [Quarter], [Year], [IsWeekend], [IsLeapYear] ) Values ( @Date, CONVERT(varchar(10), @Date, 105), -- See links for 105 explanation Day(@Date), DATEPART(dy, @Date), DATEPART(dw, @Date), DATENAME(dw, @Date), DATEPART(wk, @Date), DATEPART(mm, @Date), DATENAME(mm, @Date), DATENAME(qq, @Date), Year(@Date), @IsWeekend, @IsLeapYear ) -- Goto next day Set @Date = @Date + 1 END GO
We will transfer the new element
TEST Base and OLE DB to the workspace using the localhost.TEST connection along the way.
How to make the process take every date from the result and pass it to WebServiceTask.
Right click on the workspace - Variables. Or the View menu - Other Windows - Variables. This is a very powerful tool, but we will mention it only in passing, without it SSIS can not even half declared.
Create a UDate variable of type System.Object, because we will output the result of the request.
Connect it to our SQL Task
Now we will add sequentially to the SQL-Task Foreach Loop Container.
This is a container (you can stick any sequence of actions there), it performs a foreach routine.
In our case, it will connect to web services and record exchange rates for each date from the UDate set.
That's how it is set up
The Date variable created by us, of type Date, will be available inside the container, so we will pass the current date for the record.
That's what we put in the container.
Let's see what DataFlow follows our script - everything is simple there. We read XML using XML - Source,
add the Date column - write to the table.
And so for each date.
In the Derived Column element - insert our Date variable - for writing to the database.
Here is the result