Channel: Pentaho Community Forums

Dynamically set the degree of parallelism inside a job?

Hello,

I have built a "load registry" in our data warehouse, so that every table is loaded only if all of its prerequisite tables have been loaded successfully. The result is that the tables fall into levels:
1. tables with no prerequisites
2. tables with prerequisites in level 1
etc.

So in every level I have a lot of tables that I can load in parallel.
The PDI job has a Start entry, then the DB connections are read from a properties file, and after that I have 10 parallel branches. Each branch first runs a select against the database to get the maximum parallelism, and then takes the loads whose ordinal position (we usually order by last load execution time) MOD the maximum parallelism matches the branch number: branch 1 executes the loads where ordinal_position MOD maximum parallelism = 1, branch 2 where it = 2, etc. If the maximum allowed parallelism is, for example, 4, then only the first 4 branches actually do anything.
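To make the branch assignment concrete, here is a minimal sketch in plain Python (not PDI). The function names and the sample table names are made up for illustration. It also assumes positions are 1-based and shifts the remainder into the range 1..N, since a plain remainder runs from 0 to N-1; the real job may map remainders to branches differently.

```python
def assign_branch(ordinal_position, max_parallelism):
    """Map a 1-based ordinal position to a 1-based branch number.

    Assumption: the remainder is shifted from 0..N-1 into 1..N so that
    branch numbers start at 1.
    """
    return (ordinal_position - 1) % max_parallelism + 1


def loads_for_branch(loads, branch, max_parallelism):
    """Return the loads (already ordered) that a given branch would execute."""
    return [
        table
        for position, table in enumerate(loads, start=1)
        if assign_branch(position, max_parallelism) == branch
    ]


# Hypothetical level with six tables and a maximum parallelism of 4.
level_loads = ["t_a", "t_b", "t_c", "t_d", "t_e", "t_f"]
per_branch = {b: loads_for_branch(level_loads, b, 4) for b in range(1, 11)}
```

With these inputs, branches 5 to 10 get an empty list, which is the "dummy branch" situation described above.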

This solution works, but is not beautiful:
1. Is there a nicer way to run the loads in parallel without having to draw 10 dummy parallel branches, even if, for example, only 5 of them are actually used?
2. Is there a way to split one result set in a job, or in any other way? Then I would not have to make 10 queries to the database just to discover that the maximum parallelism allowed for this level is, for example, 5. Even if I fetch the maximum first, I would still like to make only one query and split the result set.
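To illustrate the behaviour I am after, here is a rough sketch in plain Python rather than PDI. It assumes the list of tables for a level and the maximum parallelism have already been fetched once; `run_level` and the stand-in load function are hypothetical names, not anything that exists in Kettle.

```python
from concurrent.futures import ThreadPoolExecutor


def run_level(tables, max_parallelism, load_table):
    """Run load_table for every table, at most max_parallelism at a time.

    The table list is fetched once by the caller; the pool size is set at
    run time, so no fixed number of branches has to be drawn in advance.
    """
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        # pool.map returns results in the same order as the input list.
        return list(pool.map(load_table, tables))


def fake_load(table):
    """Stand-in for the real load of one table (assumption for the sketch)."""
    return f"loaded {table}"


results = run_level(["t_a", "t_b", "t_c"], 2, fake_load)
```

Here the work is handed out from one result set and the number of workers is just a parameter, which is what I would like to reproduce inside a job.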

Br,
pxr
