Pentaho 5.3 EE
Hi All,
I am trying to figure out a solution but really struggling to resolve...
I have a stream of data originally read from an Excel spreadsheet. There are multiple spreadsheets, and I have the column metadata stored in a database, so for each file I know which fields I am expecting to receive.
I am able to validate and read the spreadsheet, which now needs converting to a flat (delimited) file.
Some cells in the spreadsheet contain NULL values.
As part of the conversion process, I need to deal with NULL values.
Some of these NULL values will be converted to blank values; others, as defined in the database metadata tables, will be populated using values from the previous row.
Example input stream:
row | my_ref | mycol1 | mycol2 | mycol3 |
1   | 0001   | A      | B      | C      |
2   | NULL   | D      | E      | F      |
3   | 0005   | G      | NULL   | H      |
4   | 0007   | I      | J      | NULL   |
Example after processing:
row | my_ref | mycol1 | mycol2 | mycol3 |
1   | 0001   | A      | B      | C      |
2   | 0001   | D      | E      | F      |
3   | 0005   | G      |        | H      |
4   | 0007   | I      | J      | 0      |
The my_ref column is defined to be copied forward (populated from the previous row), so my_ref on row 2 is set to the value from the previous row (0001).
There could be any number of rows in this state.
There could be any number of columns in a row that need to be copied forward.
NULL values in mycol1 / mycol2 / mycol3 are not defined to be copied forward and so will be converted to blank values (depending on data type).
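To illustrate what "depending on data type" could mean, here is a minimal sketch in plain JavaScript. The type names ('Integer', 'Number', 'String') and the helper name blankFor are assumptions for illustration; the real type names would come from the metadata tables.

```javascript
// Hypothetical helper: choose the replacement for a NULL that is NOT copied forward.
// The type names below are assumed, not taken from any real metadata table.
function blankFor(type) {
  switch (type) {
    case 'Integer':
    case 'Number':
      return 0;   // numeric fields become 0
    case 'String':
      return '';  // string fields become an empty string
    default:
      return '';  // assumption: any other type falls back to an empty string
  }
}
```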
I have found examples where all NULL values are populated from previous row values, but it is the field-specific nature of my problem that is causing the complication.
My thoughts on a solution transformation:
- For each row, look at each field and retrieve its field name (normalise each row to get field name / type / value?).
- If the value is NULL, use the field name to look up the metadata to see if this field should be populated from the previous row.
- If the field is to be populated from the previous row, determine the previous row's value.
- If not, check data type and set to 0 / blank string etc depending on data type.
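The steps above can be sketched in plain JavaScript as follows. This is only an illustration of the logic, not Kettle step code: the function name processRows, the shape of the meta object, and the type names are all assumptions. mycol3 is typed as 'Integer' here purely so that the sketch reproduces the 0 shown in the example output. A copy-forward field that is NULL before any value has been seen falls back to the blank value, which is also an assumption.

```javascript
// meta: field name -> { type, copyForward } (hypothetical shape).
function processRows(rows, meta) {
  // Last non-NULL value seen so far for each copy-forward field.
  const previous = new Map();
  return rows.map((row) => {
    const out = {};
    // Field names are discovered from the row itself, so they need not be known up front.
    for (const name of Object.keys(row)) {
      const value = row[name];
      // Assumption: fields missing from the metadata are treated as plain strings.
      const def = meta[name] || { type: 'String', copyForward: false };
      if (value !== null && value !== undefined) {
        out[name] = value;
        if (def.copyForward) previous.set(name, value);
      } else if (def.copyForward && previous.has(name)) {
        out[name] = previous.get(name); // copy forward from the last non-NULL value
      } else {
        out[name] = def.type === 'Integer' ? 0 : ''; // blank value by data type
      }
    }
    return out;
  });
}

const meta = {
  my_ref: { type: 'String', copyForward: true },
  mycol1: { type: 'String', copyForward: false },
  mycol2: { type: 'String', copyForward: false },
  mycol3: { type: 'Integer', copyForward: false }, // assumed type, see note above
};

const input = [
  { my_ref: '0001', mycol1: 'A', mycol2: 'B', mycol3: 'C' },
  { my_ref: null, mycol1: 'D', mycol2: 'E', mycol3: 'F' },
  { my_ref: '0005', mycol1: 'G', mycol2: null, mycol3: 'H' },
  { my_ref: '0007', mycol1: 'I', mycol2: 'J', mycol3: null },
];

const output = processRows(input, meta);
console.log(output);
```

Because the carried value is only overwritten when a non-NULL value arrives, any number of consecutive NULL rows pick up the same value.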
I am not sure I can use the Analytic Query step, as I do not know until run time which columns I will receive and so cannot set the step's metadata.
Also, as the key identifier (my_ref) may be NULL, it is difficult to group the records.
I don't think the Analytic Query step supports metadata injection?
I am not sure I can use a JavaScript step, as I need to do the lookup.
I can retrieve the column metadata beforehand, so I know which fields I will receive and which should be populated from the previous row, but I need to find a way to relate it to the incoming data in order to do the comparison.
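One way to relate the metadata to the incoming data is to index it by field name before the rows are processed. Below is a minimal sketch in plain JavaScript, not Kettle step code. The column names (file_name, field_name, data_type, copy_forward), the 'Y'/'N' flag, and the file names are all hypothetical stand-ins for whatever the metadata tables actually hold.

```javascript
// Hypothetical metadata rows, shaped like the result of a database query.
const metaRows = [
  { file_name: 'file_a.xls', field_name: 'my_ref', data_type: 'String', copy_forward: 'Y' },
  { file_name: 'file_a.xls', field_name: 'mycol1', data_type: 'String', copy_forward: 'N' },
  { file_name: 'file_b.xls', field_name: 'other_ref', data_type: 'String', copy_forward: 'Y' },
];

// Build a field-name lookup for a single file.
function buildMetaIndex(rows, fileName) {
  const index = new Map();
  for (const r of rows) {
    if (r.file_name === fileName) {
      index.set(r.field_name, {
        type: r.data_type,
        copyForward: r.copy_forward === 'Y',
      });
    }
  }
  return index;
}

const index = buildMetaIndex(metaRows, 'file_a.xls');

// Field names as they arrive at run time; anything not in the index can be flagged.
const incomingFields = ['my_ref', 'mycol1', 'unexpected_col'];
const missing = incomingFields.filter((f) => !index.has(f));
console.log(missing);
```

With the index in place, each incoming field name becomes a single lookup, so the comparison does not depend on knowing the columns at design time.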
Hopefully the above makes sense...
Thanks
Jason