Hello,
I have built a "load registry" in our data warehouse, so that every table is loaded only if all of its prerequisite tables have been loaded successfully. The result is that the tables fall into levels:
1. tables with no prerequisites
2. tables with prerequisites in level 1
etc.
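To illustrate what I mean by levels, here is a minimal sketch in Python. The table names and the `prereqs` mapping are made up for this example and are not our real registry; `assign_levels` is just a hypothetical helper name.

```python
def assign_levels(prereqs):
    """Map each table to its load level.

    prereqs: dict of table name -> list of prerequisite table names
    (hypothetical structure, for illustration only).
    Level 1 = no prerequisites; otherwise 1 + highest prerequisite level.
    """
    levels = {}
    remaining = dict(prereqs)
    level = 1
    while remaining:
        # A table is ready once all of its prerequisites already have a level.
        ready = [t for t, deps in remaining.items()
                 if all(d in levels for d in deps)]
        if not ready:
            raise ValueError("cyclic or unknown prerequisites: "
                             + ", ".join(sorted(remaining)))
        for t in ready:
            levels[t] = level
            del remaining[t]
        level += 1
    return levels


example = {
    "customers": [],
    "products": [],
    "orders": ["customers", "products"],
    "order_summary": ["orders"],
}
levels = assign_levels(example)
```

All tables that end up with the same level number have no dependencies on each other, which is why they can be loaded in parallel.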
So in every level I have a lot of tables that I can load in parallel.
The PDI job has a Start entry, then the DB connections are read from a properties file, and after that I have 10 parallel branches. Each branch first runs a select against the database to get the maximum parallelism and takes the MOD of each load's ordinal position (usually we order by the last load execution time), so that branch 1 executes the loads where ordinal_position MOD maximum parallelism = 1, branch 2 those where the result is 2, etc. If the maximum allowed parallelism is, for example, 4, then only the first 4 branches actually do something.
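Outside of PDI, the split that each branch performs amounts to roughly the following Python sketch. The names `split_into_branches`, `loads` and `max_parallelism` are hypothetical, and the branch index is 0-based here, unlike the branch numbering in my job.

```python
def split_into_branches(loads, max_parallelism):
    """Distribute an ordered list of loads round-robin over the branches.

    The load at position i goes to branch i % max_parallelism, so one
    ordered result set is split in memory instead of being queried once
    per branch.
    """
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")
    branches = [[] for _ in range(max_parallelism)]
    for position, load in enumerate(loads):
        branches[position % max_parallelism].append(load)
    return branches


# Six loads, already ordered (for example by last execution time).
loads = ["t1", "t2", "t3", "t4", "t5", "t6"]
branches = split_into_branches(loads, 4)
```

With six loads and a maximum parallelism of 4, the first two branches get two loads each and the other two get one each.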
This solution works, but is not beautiful:
1. Is there a nicer way to run the loads in parallel without having to draw 10 dummy parallel branches, even if only, for example, 5 of them are actually used?
2. Is there a way to split one result set in a job, or in any other way? I do not want to make 10 queries to the database just to discover that the maximum parallelism allowed for this level is, for example, 5. Even if I fetch the maximum first, I would still like to make only one query and split its result set.
Br,
pxr