Hi there--
Yes, I mentioned this before. I thought it was something in my job entries/transformations that had recursion, but now I'm thinking it's native to Pentaho itself.
I've seen several bug reports in Pentaho --- one even showed a 'dummy' loop, literally: a job with a Start step, then Dummy 1, then Dummy 2, then back to the Start step. It did nothing but bounce back and forth, and it still crashed very quickly (we're talking minutes).
The same thing happens to me --- after about 80 loops of whatever steps, the job crashes with a heap space overflow. I even had a spare dummy step in my transformation as a visual aid for the loop; I took that out, and now it gets to about 120 loops before crashing. That much memory overhead, never cleared out, because of a dummy step?? What gives here?
Oddly enough, I have a similar transformation that has never crashed (it's only ever needed 250 loops, though), so it's possible the architecture of the job itself slows down this "memory bloat" as it accumulates. I'm going to run a couple of memory tools against it soon to see what I can figure out.
Has anyone else experienced this? The massive number of loops is only needed for the initial data load: the job checks whether a page is the "last page", and if not, it loads the next page.
After the initial load, it would only need to loop 4-6 times per run.
The only workarounds I can see are: A). Cap the job at a maximum of 50 loops. The initial data load (and any other big loads) would then have to be run 6-7 times manually.
B). Move the looping out to the operating system --- instead of Pentaho looping, the job could call a command line to run itself again if a certain condition is met at the end (see the sketch after this list). Not sure if this is a good idea.
C). Somehow optimize the job to consume less memory, or give Pentaho more memory to work with. This is fine for day-to-day runs, but any time an initial load, a reload, or any other large looping load is needed, it's a problem waiting to happen.
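For option B, here's a minimal sketch of what OS-level looping could look like, assuming the job is launched with kitchen.sh and is modified to drop a marker file when it detects the last page instead of looping back on itself. The paths, the job file name (load_pages.kjb), the RUN_NUMBER parameter, and the marker file are all made up for illustration --- adjust to your setup. Something like this (Python):

#!/usr/bin/env python3
# Sketch of option B: let the OS do the looping instead of the Pentaho job.
# Assumptions (all hypothetical -- adjust for your install):
#   - kitchen.sh is at KITCHEN and the job file is load_pages.kjb
#   - the job writes a marker file (last_page.flag) when it hits the last page,
#     instead of hopping back to an earlier job entry
import os
import subprocess
import sys

KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"  # assumed install path
JOB_FILE = "/home/etl/jobs/load_pages.kjb"            # hypothetical job file
DONE_FLAG = "/tmp/last_page.flag"                     # hypothetical marker written by the job
MAX_RUNS = 500                                        # hard safety cap

def main():
    for run in range(1, MAX_RUNS + 1):
        # Each pass is a fresh JVM, so whatever the job leaks per loop
        # is thrown away when the process exits.
        result = subprocess.run(
            [KITCHEN, "-file=" + JOB_FILE, "-param:RUN_NUMBER=" + str(run)],
            check=False,
        )
        if result.returncode != 0:
            sys.exit("Run %d failed with exit code %d" % (run, result.returncode))
        if os.path.exists(DONE_FLAG):
            print("Last page reached after %d run(s)" % run)
            os.remove(DONE_FLAG)
            return
    sys.exit("Gave up after %d runs without seeing %s" % (MAX_RUNS, DONE_FLAG))

if __name__ == "__main__":
    main()

This also feeds into option C: because each run is a new JVM, the heap only ever has to hold one pass worth of work, and if you still need more headroom you can raise -Xmx wherever your PDI version sets it (the PENTAHO_DI_JAVA_OPTIONS environment variable in newer releases, or directly in kitchen.sh/spoon.sh in older ones).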