I need to transfer millions of rows from Teradata to MS SQL Server.
I have written custom Java code for the data transfer that embeds the Pentaho Kettle transformation APIs.
What I did:
I used TableInputMeta to specify the SELECT query that reads the data from Teradata.
I created a TableOutputMeta with the details of the target database.
I connected the two steps with a TransHopMeta.
I generated the XML for the transformation.
I followed: http://wiki.pentaho.com/display/EAI/...a+API+Examples
My problems:
1. While running the transformation, I noticed that I don't have to create the target table myself. When I print the SQL string (sql = transMeta.getSQLStatementsString()), I can see that the Pentaho API generates the DDL statements dynamically: if the table doesn't exist it generates CREATE TABLE DDL, if it already exists it generates ALTER TABLE DDL, and so on.
However, if I run the transformation from the XML file, the target tables are not created; it only works if the tables already exist in the target database.
For the transformation to work, I have to add the following code, which connects to the target database and executes that DDL:
+++++++++++++
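Database targetDatabase = new Database((LoggingObjectInterface) transMeta, transMeta.findDatabase(targetDatabaseName));
try {
    targetDatabase.connect();
    // sql holds the CREATE/ALTER TABLE DDL from transMeta.getSQLStatementsString()
    targetDatabase.execStatements(sql);
} finally {
    targetDatabase.disconnect(); // release the connection even if the DDL fails
}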
+++++++++++
Since I have to run many queries simultaneously, opening and closing a database connection for every transformation is a headache.
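A minimal sketch of the "connect once, run many" pattern, assuming all transformations write to the same target database. TargetSession and TargetSessionFactory are hypothetical interfaces standing in for a connected PDI Database object (or a plain JDBC Connection used with Statement.execute); they are not part of the PDI API. The idea is to collect the DDL generated for each transformation and run all of it over one connection instead of reconnecting per transformation.

```java
import java.util.List;

/** Hypothetical stand-in for one open connection to the target database. */
interface TargetSession extends AutoCloseable {
    void execute(String ddl) throws Exception;
}

/** Hypothetical factory that opens a TargetSession, i.e. connects once. */
interface TargetSessionFactory {
    TargetSession open() throws Exception;
}

final class SharedConnectionDdl {
    private SharedConnectionDdl() {}

    /**
     * Connects once, runs the DDL generated for every transformation over that single
     * session, and disconnects once (try-with-resources closes it even on failure).
     * Returns how many non-empty DDL scripts were executed.
     */
    static int executeAll(TargetSessionFactory factory, List<String> ddlPerTransformation) throws Exception {
        int executed = 0;
        try (TargetSession session = factory.open()) {
            for (String ddl : ddlPerTransformation) {
                if (ddl == null || ddl.isBlank()) {
                    continue; // nothing to run for this transformation
                }
                session.execute(ddl);
                executed++;
            }
        }
        return executed;
    }
}
```

In PDI terms this would mean creating and connecting one Database object up front, calling execStatements for each transformation's generated SQL, and disconnecting once at the end. Whether a single Database object can safely be shared by transformations running in parallel is an assumption worth checking.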
2. Currently, my target table contains exactly the columns I specified in the SELECT query of the TableInputMeta. I need some extra custom columns in the target table, such as a date and an auto-incrementing primary key column. How do I do that?
After some research I found some hints related to the Set Variables step meta, but I couldn't figure out how to use it programmatically together with my existing TableInputMeta and TableOutputMeta.
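One possible approach, sketched under assumptions: let SQL Server produce the extra values by creating the target table with an IDENTITY primary key and a column whose DEFAULT is the load time, then insert only the columns that come from the Teradata query. TargetTableDdl, createTable, and the column names load_id and load_date are hypothetical, and DATETIME2/SYSDATETIME() assume SQL Server 2008 or later. (For the date alone, another option is to add it to the Teradata SELECT, e.g. CURRENT_DATE AS load_date, so it simply becomes another field in the stream.)

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper that builds SQL Server DDL for a target table with two extra columns. */
final class TargetTableDdl {
    private TargetTableDdl() {}

    /**
     * Returns a CREATE TABLE statement whose first two columns are filled by SQL Server itself:
     *   load_id   - auto-incrementing primary key (IDENTITY)
     *   load_date - defaults to the insert time (SYSDATETIME())
     * followed by the column definitions that mirror the Teradata SELECT.
     */
    static String createTable(String table, List<String> sourceColumnDefs) {
        StringJoiner columns = new StringJoiner(",\n  ", "(\n  ", "\n)");
        columns.add("load_id BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY");
        columns.add("load_date DATETIME2 NOT NULL DEFAULT SYSDATETIME()");
        for (String def : sourceColumnDefs) {
            columns.add(def);
        }
        return "CREATE TABLE " + table + " " + columns;
    }
}
```

For example, createTable("dbo.customer_copy", List.of("customer_id INT", "name NVARCHAR(100)")) yields a statement that starts with "CREATE TABLE dbo.customer_copy (" and lists load_id, load_date, customer_id and name in that order. Because the ALTER TABLE DDL that PDI generates is derived from the fields in the stream, it may treat load_id and load_date as surplus columns, so review that generated SQL before executing it against a table created this way.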
3. Since I need to run multiple queries, I need a restart point, i.e. when a transformation fails or gets stuck, I need to restart it from where it stopped.
I saw that PDI 5 has checkpoints for this purpose, but all the forum posts describe how to add checkpoints using the PDI GUI.
Should I create my own database table for logging checkpoints and use a Set Variables step to implement restartability programmatically?
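A minimal sketch of query-level restartability, assuming each query is loaded by its own transformation and that a query counts as done only once its transformation finishes without error. The finished query IDs are appended to a plain text file here only so the sketch runs without a database; the same logic maps onto a checkpoint table (SELECT the finished IDs at start-up, INSERT one row after each success). CheckpointRunner, Task, and the query IDs are hypothetical names, not PDI API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical checkpoint-based runner: skips queries that finished in an earlier run. */
final class CheckpointRunner {

    /** Hypothetical unit of work, e.g. building and running the transformation for one query. */
    interface Task {
        void run(String queryId) throws Exception;
    }

    private final Path checkpointFile;

    CheckpointRunner(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    /** Returns the IDs recorded as finished by previous runs (empty if there is no checkpoint file yet). */
    Set<String> completed() throws IOException {
        if (!Files.exists(checkpointFile)) {
            return new HashSet<>();
        }
        return new HashSet<>(Files.readAllLines(checkpointFile, StandardCharsets.UTF_8));
    }

    /**
     * Runs the queries in order, skipping completed ones. Each success is appended to the
     * checkpoint file immediately; the first failure propagates, so the next call resumes there.
     * Returns the IDs that were actually run by this call.
     */
    List<String> runAll(List<String> queryIds, Task task) throws Exception {
        Set<String> done = completed();
        List<String> ranNow = new ArrayList<>();
        for (String id : queryIds) {
            if (done.contains(id)) {
                continue; // finished in an earlier run
            }
            task.run(id); // throws on failure, so no checkpoint is written for this id
            Files.writeString(checkpointFile, id + System.lineSeparator(), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            ranNow.add(id);
        }
        return ranNow;
    }
}
```

The granularity is a whole query: if a transformation dies halfway, its target table may already hold some rows, so that load should be repeatable (for example, truncate the table or delete that query's rows before rerunning it).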
Everywhere I look, I find posts on how to do this with the Pentaho Kettle GUI.
But I need help doing it in raw code using the PDI Java APIs.
Any help will be appreciated :)