I have a transformation that loads data in the range of 60-80 million rows. Run individually, it achieves a throughput of more than 7,000 rows/sec. The transformation has to be run against about 70 source tables, each holding 60-80 million rows. However, when 6 transformations are run in parallel, throughput drops to less than 1,000 rows/sec.
Below are the settings currently in use (a sketch of where each one is applied follows the list):
- In the database connection, defaultRowPrefetch = 200
- In the transformation properties, 'Nr of rows in rowset' = 50000
- In pan.sh, JAVAMAXMEM="2048"
- In server/data-integration-server/start-pentaho.sh, CATALINA_OPTS="-Xmx2048m"
- The JDBC driver used is ojdbc6.jar
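
For reference, this is roughly where those settings live in my setup (paths follow the standard PDI layout; values are the ones listed above):

```sh
# Oracle connection: defaultRowPrefetch is set on the connection's Options tab,
# which PDI passes through to ojdbc6.jar as a driver property:
#   defaultRowPrefetch = 200

# Transformation properties > Miscellaneous:
#   Nr of rows in rowset = 50000

# data-integration/pan.sh -- heap for command-line transformation runs:
JAVAMAXMEM="2048"

# server/data-integration-server/start-pentaho.sh -- heap for the DI server:
CATALINA_OPTS="-Xmx2048m"
```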
There are no indexes on the target table, the database is Oracle, and the server has 4 cores.
Please advise on what else can be done to increase performance.