Web Services ETL Process in Integration Services 2012

  • Tutorial

Stock up for future use


Sometimes in the process of work, data from web services are needed, especially SOAP connections are almost standard today.

The ETL process (Extract - Transform - Load) is a term from Business Intelligence and describes the process of collecting and transforming data to create an analytical database (for example, a data warehouse).

The SOAP data exchange protocol and web services described by WSDL are common windows into the world of almost all ERP systems, many online portals and financial organizations.

I will try to describe the ETL process step by step using one of the most powerful tools in the class - MS Integration Services.

So, we will consider a test problem.

Task



It is necessary to collect data on exchange rates against the ruble at each date of last year and load them into a table for subsequent analysis. The Central Bank of Russia provides historical data - in the form of web services with a good description.
This seems to be the solution.

Create a project



First, create a project in Business Intelligence Development Studio (And, more recently, SQL Server Data Tools).

There is a Web-service-task element in the toolbox, and we will transfer it to the workspace.


In the HttpConnection line, create a new connection - select NewConnection
and set the URL http://www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx
By the way, you can go there - do not be shy.

WSDL - the file is available there, here it is www.cbr.ru/DailyInfoWebServ/DailyInfo.asmx?WSDL
Download and select it in the settings dialog.

That's it, go to the Input tab.

image

Here you can choose among the services and their methods - provided by the provider.
We need GetCourseOnDate - issuing a DataSet for the requested date.
We put the date value manually - in the Value field, a little later we’ll go back and automate this.
Output, we simply create the file by creating a new connection output - into the file output.xml.

image

Run the process - it went. The file is created - exchange rates look like exchange rates.
The format is strange: diffgr attributes, inline diagram. This is the XML output of the Dataset object.

In BI-studio there is an XML-Source connector - it gives a zero result when you try to set it on such a file. (Reads the circuit).

We put after the Web Service Task - Skript Task.

Workaround


Hidden text
   object rawConnection = Dts.Connections["output"].AcquireConnection(Dts.Transaction);
            DataSet CoursesFile = new DataSet();
           string filePath = (string)rawConnection;
           object rawConnection2 = Dts.Connections["output2"].AcquireConnection(Dts.Transaction);
           string filePath_Out = (string)rawConnection2;
            object rawConnection3 = Dts.Connections["output.xsd"].AcquireConnection(Dts.Transaction);
           string filePath_Schema = (string)rawConnection3;
            CoursesFile.ReadXml(@filePath);
            CoursesFile.AcceptChanges();
            CoursesFile.WriteXmlSchema(@filePath_Schema);
            CoursesFile.WriteXml(@filePath2);

The script in C #, you can use all the tools .net platform and 2012 VisualStudio.
I added to the beginning using System.Xml;
We turn to the connection collection - I created 2 more to save the circuit and for output.

The output is pure XML - thanks to acceptChanges ();

Reconsidering the beginning



It would be nice to take a range of dates - and get exchange rates for each, and then record, supplementing the courses with the date field, in the database.

First, create a temporary dimension, it will come in handy more than once - in every data warehouse.

I create with my variation a script spied on the network, here it is.

Hidden text
-- Delete time dimension if it already exists.
IF Exists(Select Name from sysobjects where name = 'Dim_Time')
BEGIN
    Drop Table Dim_Time
END
GO
-- Standard options for creating tables
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- Create your dimension table
-- Adjust to your own needs
Create Table dbo.Dim_Time
(
    Dateid int IDENTITY (1,1) PRIMARY KEY CLUSTERED,
    Date date,
    DateString varchar(10),
    Day int,
    DayofYear int,
    DayofWeek int,
    DayofWeekName varchar(10),
    Week int,
    Month int,
    MonthName varchar(10),
    Quarter int,
    Year int,
    IsWeekend bit,
    IsLeapYear bit
)
-- Declare and set variables for loop
Declare
@StartDate datetime,
@EndDate datetime,
@Date datetime
Set @StartDate = '2000/01/01'
Set @EndDate = '2020/12/31'
Set @Date = @StartDate
-- Loop through dates
WHILE @Date <=@EndDate
BEGIN
    -- Check for leap year
    DECLARE @IsLeapYear BIT
    IF ((Year(@Date) % 4 = 0) AND (Year(@Date) % 100 != 0 OR Year(@Date) % 400 = 0))
    BEGIN
        SELECT @IsLeapYear = 1
    END
    ELSE
    BEGIN
        SELECT @IsLeapYear = 0
    END
    -- Check for weekend
    DECLARE @IsWeekend BIT
    IF (DATEPART(dw, @Date) = 1 OR DATEPART(dw, @Date) = 7)
    BEGIN
        SELECT @IsWeekend = 1
    END
    ELSE
    BEGIN
        SELECT @IsWeekend = 0
    END
    -- Insert record in dimension table
    INSERT Into Dim_Time
    (
    [Date],
    [DateString],
    [Day],
    [DayofYear],
    [DayofWeek],
    [Dayofweekname],
    [Week],
    [Month],
    [MonthName],
    [Quarter],
    [Year],
    [IsWeekend],
    [IsLeapYear]
    )
    Values
    (
    @Date,
    CONVERT(varchar(10), @Date, 105), -- See links for 105 explanation
    Day(@Date),
    DATEPART(dy, @Date),
    DATEPART(dw, @Date),
    DATENAME(dw, @Date),
    DATEPART(wk, @Date),
    DATEPART(mm, @Date),
    DATENAME(mm, @Date),
    DATENAME(qq, @Date),
    Year(@Date),
    @IsWeekend,
    @IsLeapYear
    )
    -- Goto next day
    Set @Date = @Date + 1
END
GO


We will transfer the new element

image

image

TEST Base and OLE DB to the workspace using the localhost.TEST connection along the way.

How to make the process take every date from the result and pass it to WebServiceTask.

Using variables.

Variables


Right click on the workspace - Variables. Or the View menu - Other Windows - Variables. This is a very powerful tool, but we will mention it only in passing, without it SSIS can not even half declared.
Create a UDate variable of type System.Object, because we will output the result of the request.
Connect it to our SQL Task

image

Now we will add sequentially to the SQL-Task Foreach Loop Container.
This is a container (you can stick any sequence of actions there), it performs a foreach routine.
In our case, it will connect to web services and record exchange rates for each date from the UDate set.
That's how it is set up
image
image

The Date variable created by us, of type Date, will be available inside the container, so we will pass the current date for the record.

That's what we put in the container.

image

Let's see what DataFlow follows our script - everything is simple there. We read XML using XML - Source,
add the Date column - write to the table.
And so for each date.

image

In the Derived Column element - insert our Date variable - for writing to the database.

Here is the result
image

Also popular now: