I have a question that is more about how Pentaho works, and I'd like to understand it before I set out to build new jobs and processes to solve my problem. Any help is appreciated.
I have a job that grabs data, sends that data to results, and then passes those rows as parameters/variables via "execute for each row" to another job. That second job executes a REST API query for each row, retrieves the results, and writes them out. Everything works fine: no errors, and the data is accurate. The way my job works right now, I send one result "line" at a time to the second job, it writes out the results, and then the next "line" is executed, so each loop takes roughly as long as the query itself. I have measured the REST API query at an average of 200 to 300 milliseconds, and since the API is not under my control, I cannot change or improve its performance. The destination of this API can handle lots of queries per second (in fact it handles 1400-3600 per second right now), but the results of each individual query always come back in 200 to 300 milliseconds.
So my 2k-line source took 27 minutes. That's not good, since I really need to process about 1 million lines a day. This is a question of Pentaho design: how do I deal with this problem and speed up my process?
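To put rough numbers on the gap, here is a back-of-the-envelope calculation as a small Python sketch. Only the 2,000 lines, the 27 minutes, and the 1 million lines per day come from my measurements above; the variable names are just for illustration, and the "workers needed" figure assumes each parallel worker would keep the same per-line time as my current loop.

```python
import math

# Figures taken from the measurements described above.
rows_measured = 2_000
minutes_measured = 27
rows_per_day = 1_000_000

SECONDS_PER_DAY = 86_400

# Effective wall-clock time per line in the current serial loop.
seconds_per_row = minutes_measured * 60 / rows_measured

# How long 1 million lines would take at that serial rate.
serial_days = rows_per_day * seconds_per_row / SECONDS_PER_DAY

# Throughput needed to finish 1 million lines within one day.
required_rows_per_second = rows_per_day / SECONDS_PER_DAY

# Minimum number of concurrent workers (throughput x time per line),
# assuming each worker is as slow per line as the current loop.
workers_needed = math.ceil(required_rows_per_second * seconds_per_row)

print(f"{seconds_per_row:.2f} s per line")
print(f"{serial_days:.2f} days to process {rows_per_day:,} lines serially")
print(f"{required_rows_per_second:.1f} lines/s needed, about {workers_needed} workers")
```

One thing this makes visible: 27 minutes over 2,000 lines is about 0.81 seconds per line, noticeably more than the 200 to 300 milliseconds of the query itself, so part of each loop seems to be spent somewhere other than waiting on the API.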
The first thing I was thinking is that I could build more jobs with the same capabilities and split up the source, so that I can run more queries per second. If I build two jobs, split the source, and execute across both, I'd cut my time in half, and so on.
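The splitting idea can be sketched outside of Pentaho like this. It is only an illustration in Python: `split_round_robin` is a hypothetical helper name, and the list of integers stands in for my 2k source lines. The point is that every line ends up in exactly one chunk, so two copies of the job would never process the same line.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_round_robin(rows: Sequence[T], n_parts: int) -> List[List[T]]:
    """Deal rows out to n_parts lists, one row at a time, like dealing cards."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[T]] = [[] for _ in range(n_parts)]
    for index, row in enumerate(rows):
        parts[index % n_parts].append(row)
    return parts


# Stand-in for the 2k-line source; each integer represents one line.
source = list(range(2_000))
chunks = split_round_robin(source, 2)

for number, chunk in enumerate(chunks, start=1):
    print(f"job copy {number} would get {len(chunk)} lines")
```

With two chunks of 1,000 lines each running side by side, the wall-clock time would roughly halve, provided the two job copies do not slow each other down.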
What I really want is to keep the one job that works fine, but execute that exact same job as fast as the source can feed it, to get more efficiency. I don't know whether I can call the same job from multiple sources at the same time and still have data continuity.
All of this to ask: can I execute the same job with passed-through parameters/variables multiple times at the same time, from multiple sources, without a data integrity or data loss issue? If I can't, how have others dealt with similar problems?
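To make clear what behavior I'm hoping for, here is a minimal sketch in Python rather than Pentaho. `fake_rest_call` is a made-up stand-in for my REST query (it just sleeps briefly and returns a value derived from its input), and the worker count of 10 is an arbitrary choice. Each call receives its own argument, so nothing is shared between concurrent calls; whether Pentaho's parameters/variables are isolated in the same way when one job runs several times at once is exactly what I'm asking.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fake_rest_call(row_id: int) -> Dict[str, int]:
    """Hypothetical stand-in for the REST query: waits, then echoes its input."""
    time.sleep(0.01)  # simulated API latency, much shorter than the real 200-300 ms
    return {"row_id": row_id, "result": row_id * 2}


def run_parallel(row_ids: List[int], workers: int) -> List[Dict[str, int]]:
    """Run the same call for every row on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each row to a free worker and returns results in input order.
        return list(pool.map(fake_rest_call, row_ids))


row_ids = list(range(200))
start = time.perf_counter()
results = run_parallel(row_ids, workers=10)
elapsed = time.perf_counter() - start

# Every input row should come back exactly once, paired with its own result.
lost = set(row_ids) - {r["row_id"] for r in results}
print(f"{len(results)} results in {elapsed:.2f} s, {len(lost)} rows lost")
```

In this sketch the results stay tied to the row that produced them because the row ID travels with each call and comes back in the result, instead of living in a variable that other calls could overwrite.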