Channel: Pentaho Community Forums

Transform multiple, independent tables into a star schema

Hello all!

I have a problem that I will describe with an example. Let's assume we have four source data tables like this:
sourcedata.jpg

We want to use BI on this data to answer the following questions:
  • Which car model was driven for the most hours?
  • Which error code appeared most often in 2012?
  • Which error code was the most common for car xy?
  • ...


So my idea was to transform the tables into a star schema (that should be the most adequate and easiest way to get things going, right?), which should look like this:
starmodel.jpg
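
To make the target concrete, here is a minimal sketch of such a star schema and of one of the questions above asked against it. All table and column names (`dim_car`, `dim_error`, `dim_date`, `fact_error`) and the sample rows are assumptions for illustration only; the actual layout is the one in the image. SQLite is used only so the sketch runs on its own.

```python
import sqlite3

# Hypothetical star schema: names are assumptions, not taken from the diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_car   (car_key   INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE dim_error (error_key INTEGER PRIMARY KEY, error_code TEXT);
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY,
                        year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE fact_error (
    car_key   INTEGER REFERENCES dim_car(car_key),
    error_key INTEGER REFERENCES dim_error(error_key),
    date_key  INTEGER REFERENCES dim_date(date_key)
);
""")

# Made-up sample rows, only there to make the query runnable.
conn.executemany("INSERT INTO dim_car VALUES (?, ?)",
                 [(1, "model A"), (2, "model B")])
conn.executemany("INSERT INTO dim_error VALUES (?, ?)",
                 [(1, "E100"), (2, "E200")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(1, 2011, 12, 31), (2, 2012, 1, 15), (3, 2012, 6, 1)])
conn.executemany("INSERT INTO fact_error VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 2), (2, 2, 3), (2, 1, 3), (1, 2, 3)])


def most_common_error(conn, year):
    """Return (error_code, count) for the most frequent error code in a year."""
    return conn.execute("""
        SELECT e.error_code, COUNT(*) AS n
        FROM fact_error f
        JOIN dim_error e ON e.error_key = f.error_key
        JOIN dim_date  d ON d.date_key  = f.date_key
        WHERE d.year = ?
        GROUP BY e.error_code
        ORDER BY n DESC, e.error_code
        LIMIT 1
    """, (year,)).fetchone()
```

Each question then becomes a join from the fact table to one or two dimensions plus a GROUP BY, which is the main reason for choosing this layout.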


To transform the data, I'm using Kettle. Since I'm new to the whole process, I need some help. This is what I have tried so far:

  1. I used an Excel Input step to load the data. I also normalized some things, such as date formats, in that step.
  2. For each loaded Excel file I used several "Combination lookup/update" steps to put the fields into the dimension tables.
  3. What should I do now? How do I merge the dimensions and the facts? I tried a Merge Join on the streams by the dimension keys, but gave up because it was much too slow. I don't think this is the proper solution. Here you can see what I have built so far:
    etl.jpg
  4. The lookup/update process is very slow and takes several minutes for the four documents. The biggest one has 800,000 records. Would the process be faster if the records came from a database?
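
One way to think about step 3, sketched outside Kettle: if every source row passes through one get-or-create lookup per dimension, the row leaves the chain already carrying all of its dimension keys, so the fact row is complete without a separate join. The sketch below assumes hypothetical field names (`model`, `error_code`, `date`) and keeps the key maps in memory; it illustrates the idea and is not a description of how Kettle implements its steps.

```python
class Dimension:
    """Get-or-create surrogate keys for one dimension, cached in memory."""

    def __init__(self, name):
        self.name = name
        self._keys = {}  # natural key tuple -> surrogate key

    def key_for(self, *natural_key):
        key = self._keys.get(natural_key)
        if key is None:
            # First time this value is seen: assign the next surrogate key.
            key = len(self._keys) + 1
            self._keys[natural_key] = key
        return key


def build_facts(rows, car_dim, error_dim, date_dim):
    """Replace natural values with surrogate keys, one pass over the rows."""
    facts = []
    for row in rows:
        facts.append({
            "car_key": car_dim.key_for(row["model"]),
            "error_key": error_dim.key_for(row["error_code"]),
            "date_key": date_dim.key_for(row["date"]),
        })
    return facts


# Hypothetical source rows, standing in for one of the Excel files.
rows = [
    {"model": "model A", "error_code": "E100", "date": "2012-01-15"},
    {"model": "model B", "error_code": "E100", "date": "2012-01-15"},
    {"model": "model A", "error_code": "E200", "date": "2012-06-01"},
]
car_dim, error_dim, date_dim = Dimension("car"), Dimension("error"), Dimension("date")
facts = build_facts(rows, car_dim, error_dim, date_dim)
```

Because each lookup is a dictionary hit after the first occurrence of a value, the cost per row stays roughly constant, whereas a lookup that goes to the database for every row pays a round trip each time.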


Thanks for your help in advance!
