Cubing IIS Logs

Using the data warehouse built in the <a title=”Data Warehousing IIS Logs” target=”_blank” href=””>previous post and SQL Server Analysis Services (SSAS) we can cube our log data.  The most compelling reason to create a data cube for the log data was to gain access to MDX (MultiDimensional eXpression). MDX comes with a great deal more built in functions than T-SQL and offers a clean way of looking at data without an unreadable mess of complex joins and unions.  I will get into MDX in other posts, for now I just want to demonstrate how easy it can be to quickly cube data in four steps:

  1. Create Analysis Services Project
  2. Create a Data Source
  3. Define a Data Source View
  4. Create Data Cube and Dimensions

Set-up

First things first, open up BIDS by going to start menu, Microsoft SQL Server 2008 and clicking on SQL Server Business Intelligence Development Studio (BIDS).  With BIDS now open create a new project by selecting from the FILE menu a new project (File -> New -> Project…) to get the following dialogue.

Select “Analysis Services Project” from the Templates window then move on to name the project and select a location for the solution directory to be created.  In my case, I have named the project IISLogCube and left the default path in the Location field – which defaults to the Visual Studio 2008\Projects folder.  Clicking OK will create your solution and project directory and deliver you into the new project.  Pin the Solution Explorer window to the right and look at project structure created for you.

The folders we are interested in for now are the first four. The good news is creating the items for these folders is largely automated and/or wizard driven. Let’s get started by creating a Data Source.

Create a Data Source

Right click on “Data Sources” and select “New Data Source” to being the Data Source Wizard (below).

Click “New…” to discover and select the server and database. I will be using the Logs database we created for the IIS Logs Data Warehouse.  Select the name of your server from the drop-down list under “Server name:”.  If you’re running local either type in (localhost) or click Refresh to discover local server names.  Configure whatever server credentials are necessary under “Log on to the server”.  Once the server configurations are complete move down to “Connect to a database” and select the database from the drop-down list.  Optionally, you can click “Test Connection” to make sure all the settings are good to go before clicking OK.

Once you’re connection is configured you’ll notice the new Data Connection option back on the Wizard window. Select the Data Connection you just configured and select “Next”.

The Impersonation Information is needed to connect to the data source.  If you are unsure what to select here, choose Inherit (named Default in BIDS 2005).  You will know if you chose correctly when you process the cube, which we’ll do later.  We can come back and change this option at any time, so don’t fret too much about this option for now.

From here, click “Next”, name your Data Source then select “Finish”.  You’ll now see your Data Source in the Solution Explorer under the “Data Sources” folder. Now we’ll need a Data Source View.

Define a Data Source View

Right click on Data Source Views and select “New Data Source View…”.  Select Data Source from the list of “Relational Data Sources” and select “Next”.

Name matching is meant to assist with how relationships are discovered in your data source.  It is helpful if the data source matches one of the conventions here but it is not required. The first option applies to this sample data source.  Select the appropriate option for your data source then click “Next”.

Now we’re getting to some of the more interesting configurations.  Here you’ll choose the tables and views from the database in your data source that you want available to your data cube.  If you select more tables than what you think you need, that’s OK as you don’t have to use them all in the Cube. That being said, its not a good idea to choose ALL available objects (unless you really need to) as that may be a considerable amount of tables and views and may slow down processing of your cube.  At a minimum, you’ll need to bring in a Fact table and a few tables that will act as dimensions. Click “Next”, name your view and click “Finish”.

Now that you have created your view, you may need to clean it up.  As you will notice, the relationships were not detected for Companies and Clock table as related to the LogFact table.  We can manually clean up these relationships now – and we’ll want to do so to get the most of our Cube creation step.

This is where some of the name matching we configured in the Data Source View wizard came into play.  You’ll see the relationship was detected and set-up for the UrlID between the LogFact and Urls table.  Other relationships were not detected, however, because the primary keys were not properly identified in our data source (intentionally, for this demonstration purpose).

We can set the record straight, so to speak, by right clicking on the primary key in each table here and selecting “Set Logical Primary Key”.  Take Users for example.  Right click on “UserID” on the Users table and choose “Set Logical Primary Key”.  You’ll see the key icon appear next to the “UserID” field.  Next, we’ll want to draw the relationship between the LogFact table and the Users table.  To do so let’s click on the “UserID” field on the LogFact table and choose “New Relationship…”.  The left shows the Foreign Key and will automatically be populated since we initiated this new relationship by clicking on the “UserID” field on the LogFact table. Set the Primary Key (the right side) by selecting the primary key table “Users” from the drop-down list then the field from the drop-down list below.

After defining logical primary keys and setting up appropriate relationships we get the following corrected View.  Some of the relationships may not make total sense just yet, but we’ll see why we have them when we review our relationships later.

Create Data Cube and Dimensions

Creating the Cube and Dimensions is automated for us by completing the Cube Wizard.  Right click on “Cubes” folder in the Solution Explorer and select “New Cube”.  As you’ve likely noticed by now the steps we have completed up until now have made things pretty simple and automated.  By selecting “Use existing tables” in the Cube Wizard we are able to create a cube from the tables we selected and set-up in the Data Source View step.

After selecting “Next” we are prompted to identify the table that defines the measurements, most commonly known as the fact table.  If you hadn’t followed the traditional “Fact Table” approach, then you can select “Suggest” and the wizard will attempt to determine which table(s) are potentially a fact table.  More than one Fact table may be selected, which is appropriate for Fact Constellation Schemas.

After identifying the fact table(s) select “Next”.  On the following screen you’ll identify (or confirm) the fields that are checked are measurements you want for the cube.  Selecting “Next” again completes the measures group portion of the Cube Wizard.

Following the measure groups portion, you’ll need to select the Dimensions you want included in the cube.  The wizard will provide a list of tables that are already selected, assuming that you want them as dimensions.  Selecting “Next” through the remainder of the wizard confirms the selection of Dimensions.  Lastly, provide a name for the Cube and click “Finish”.

Completing the Cube Wizard automatically creates the cube and dimensions.  The resulting Start Schema is produced for our IIS Logs.

Also, note the solution explorer now has Cubes and Dimensions represented in their respective folders.

Now that our cube is complete, we want to build and deploy.  To do so, right click on the Analysis Services Database “IISLogCube” and select “Process…”.  You’ll be prompted for confirmation that you want to build and deploy the project.  Select “Yes”. If you are not deploying to server (localhost) then you will likely receive an error since the default Server for the Analysis Services Database is localhost.

Right click on “IISLogsCube” and selection “Properties”  On the resulting window select “Deployment” and under section “Target” change the “Server” from localhost to the appropriate server name that you will be deploying this database to.

If there are no errors during deployment then you’ll be prompted with the pre-process information.  Clicking “Run…” will begin the processing of the cube. Clicking “Run…” shows the process progress window.  Once the processing is complete you can close the progress window and the “Process Database” window.

At this point, the cube has been successfully built and deployed.  While this has been a somewhat long description of the above three steps its largely due to a thorough look at the Wizards that help this process along.  It doesn’t take long before cubing data is a fast and fluid process – the Wizards in BIDS certainly does streamline the process.

There is a great deal of configuration and tweaking that may be applied to the cube.  I’ll go over thee in future posts, including creating Hierarchies.

 

Advertisements

One Response to “Cubing IIS Logs”

  1. I really like this article. I am following if for my current project. Excellent. I only wish the database DDL scripts were included

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: