Hi there.
I'm trying to build a pipeline that will compare two source files and identify discrepancies, that is, records that have different values in ANY position within the record. I want to output one row per discrepancy, something like:
Record 1, Variable7, Source A="12.1", Source B="12.11"
I've got a transformation that reads the two sources, sorts by id, and then does an inner join, so I get an output record that consists of the record from source B appended to the one from source A. Then I want to use JavaScript to compare the two halves and output the rows. The code below is a simple test, based on an example I found written by Slawomir Chodnicki. It works on a simple test file with 7 columns in each source, so the merged record is 14 columns wide. That size is hard-coded right now.
The code loops through the row asking whether row[i] !== row[i+7]. If they are not equal, it outputs a row with the source A and B values. I still need to add the variable name, but my immediate problem is that the equality test fails when the field is a date, time, or datetime. I've looked up various posts about Pentaho and about JavaScript in general, and I see that two distinct date objects never compare as equal, even when they hold the same value. There are a number of suggestions about how to compare two dates, but all of them presuppose that I know I've got two dates. The real sources I want to compare have 181 columns each, consisting of a mixture of numbers, dates and text. How can I make a comparison when I don't know the variable type? I've looked around and I keep getting directed to posts about what to do when the variable type = unknown. That's not the situation here. Every variable will be of a known type, just not known by me.
Many thanks for any help.
Nick
// loop through the input record, comparing the first half to the second
for (var i = 0; i < 7; i++) {
  // create a new row object that fits our output size
  var outrowrow = createRowCopy(getOutputRowMeta().size());
  // find the index of the first field appended by this step
  var idx = getInputRowMeta().size();
  // fill each field (notice how the index is increased)
  outrowrow[idx++] = row[i];
  outrowrow[idx++] = row[i + 7];
  if (row[i] !== row[i + 7]) {
    // output the row
    putRow(outrowrow);
  }
}
// do not output the trigger row
trans_Status = SKIP_TRANSFORMATION;
//trans_Status = ERROR_TRANSFORMATION;
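To make the question concrete, here is a rough sketch of the kind of type-agnostic comparison I'm after, written as plain JavaScript rather than as a Pentaho step. The helper names comparableKey, valuesDiffer and findDiscrepancies are made up for illustration. It assumes that date fields arrive as JavaScript Date objects and everything else as numbers, strings or booleans; inside Pentaho the values may be Java objects instead, in which case this would need adapting.

```javascript
// Hypothetical helper: reduce a value to a string that can be compared with ===.
function comparableKey(value) {
  if (value === null || value === undefined) {
    return "null";
  }
  if (value instanceof Date) {
    // Two distinct Date objects are never === each other,
    // so compare their epoch milliseconds instead.
    return "date:" + value.getTime();
  }
  // Numbers, strings and booleans: prefix the type so 12 and "12" stay different.
  return typeof value + ":" + String(value);
}

// True when the two values differ, whatever their type.
function valuesDiffer(a, b) {
  return comparableKey(a) !== comparableKey(b);
}

// Compare the first half of a merged row with the second half and
// return one entry per differing field. fieldNames is an assumed
// array holding the names of the columns in one source.
function findDiscrepancies(mergedRow, fieldNames) {
  var width = mergedRow.length / 2;
  var discrepancies = [];
  for (var i = 0; i < width; i++) {
    var a = mergedRow[i];
    var b = mergedRow[i + width];
    if (valuesDiffer(a, b)) {
      discrepancies.push({ field: fieldNames[i], sourceA: a, sourceB: b });
    }
  }
  return discrepancies;
}
```

The width is derived from the row length, so the same loop would cover 7 or 181 columns without a hard-coded size. Objects other than dates would all reduce to the same key here, so this sketch only covers the number, date and text mix described above.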