I've researched these forums extensively and tried seemingly every option mentioned, but I am unable to find a way to handle this use case. Any help would be greatly appreciated. Imagine a file with two data rows:
WidgetName,Color,Weight
-------------------------
WidgetA,Blue,<empty>
WidgetA,<empty>,10
I want to process this data and put it in a Widget table in an Oracle database, with schema:
WidgetName,Color,Weight
When processing the above file, what I want is one row in the database at the end that looks like this:
WidgetA,Blue,10
In my transform, for each row I look up whether a row for that WidgetName (using just WidgetName as the key) already exists. If it does, I do an update; if it doesn't, I do an insert (see the JDBC sketch after this paragraph for the per-row logic). Since Pentaho doesn't process rows sequentially by default, when I run the basic transform, two rows are inserted, presumably because the insert of the first row has not finished before the lookup for the second row happens. I know the overall logic in the transform is correct because if I split the two rows in my file into two separate files and process them one at a time, I get one row, as desired. This led me down the path of the Single Threader step.
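For clarity, here is the per-row logic I'm describing, sketched as plain JDBC (this is only an illustration of what my lookup and insert/update steps do, not what Pentaho generates; the WIDGET table and column names mirror my schema above, and NVL expresses "keep the existing value when the incoming field is empty"):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

public class WidgetUpsert {
    // Look up by WidgetName; update if found, insert if not.
    // NVL(?, COL) keeps the existing column value when the incoming field is null.
    public static void upsert(Connection conn, String name, String color, Integer weight)
            throws SQLException {
        boolean exists;
        try (PreparedStatement lookup = conn.prepareStatement(
                "SELECT 1 FROM WIDGET WHERE WIDGETNAME = ?")) {
            lookup.setString(1, name);
            try (ResultSet rs = lookup.executeQuery()) {
                exists = rs.next();
            }
        }
        String sql = exists
                ? "UPDATE WIDGET SET COLOR = NVL(?, COLOR), WEIGHT = NVL(?, WEIGHT) WHERE WIDGETNAME = ?"
                : "INSERT INTO WIDGET (COLOR, WEIGHT, WIDGETNAME) VALUES (?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, color);
            if (weight == null) {
                stmt.setNull(2, Types.INTEGER);
            } else {
                stmt.setInt(2, weight);
            }
            stmt.setString(3, name);
            stmt.executeUpdate();
        }
    }
}
```

The failure mode is that this lookup-then-write sequence only works if each row's write is visible before the next row's lookup runs.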
In my Single Threader attempt, I have a Single Threader step that reads from the file and sends each row to the initial job one at a time. I know the one-row-at-a-time part is working because I put in a Delay step and the delay is reflected in the run time. However, the Single Threader still does not solve my problem; the exact same behavior occurs. Watching the database, I can see that no commit is executed until the entire transform is done, so even though the rows are processed one at a time, the second row still does not find the data inserted by the first row.
I have tried decreasing the commit size of all my insert steps to 1, forcing autocommit to true, manually inserting a SQL step with an explicit commit call after each insert, changing the cache values, and changing the initial transformation's "Make the transformation database transactional" setting, all to no avail.
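In JDBC terms, what I expected commit size 1 to give me is roughly this (just a sketch of the behavior I'm after, reusing the upsert sketch above; not actual Pentaho internals):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

public class CommitPerRow {
    // Commit after every row so the next row's lookup can see the
    // previous insert. (Sketch only; Pentaho manages its own connections.)
    static void process(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        for (String[] row : rows) {
            // row[0]=WidgetName, row[1]=Color, row[2]=Weight ("" means empty)
            WidgetUpsert.upsert(conn, row[0],
                    row[1].isEmpty() ? null : row[1],
                    row[2].isEmpty() ? null : Integer.valueOf(row[2]));
            conn.commit(); // make this row visible to the next row's lookup
        }
    }
}
```

As far as I can tell from watching the database, the commit in that loop is effectively never happening until the whole transform finishes.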
Does anyone have any idea how I might accomplish this use case?
Thanks!