Hi, I'm designing an address table and want to ensure each unique address exists in the table only once. As I process an address, if it already exists in the table I want to update a 'last seen' field; if it doesn't exist, I want to insert it. The address is made up of several fields, for example street number, street name, and street direction. Not every street has a direction, so this field is often blank. Pentaho inserts NULL instead of a blank during the insert. Because of this, when the same address is seen again, the database lookup doesn't detect that it matches the existing row (because null != null).
I have set KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL to Y.
Going into the Table Output step, the street direction is an empty string according to the preview data: it appears empty and does not say <null>. The database, however, receives it as null because, in its infinite wisdom, Oracle treats an empty string and a null as the same thing.
The issue is that when that address comes up again, I do a Database Lookup on all the address fields, including street direction. Since the stored value is null in Oracle, the comparison on that field never evaluates to true, so the row isn't matched. There is no mechanism in Database Lookup to use NVL(FIELD,'-') or something similar to let me match against nulls. I can't return values from an Execute SQL step, and the table is too large to read with a Table Input step and do the conversion / matching in Kettle.
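To show the comparison behaviour I mean outside of Kettle, here is a minimal sketch. It is only an illustration under assumptions: an in-memory SQLite database stands in for Oracle, the table and column names are made up, and COALESCE is used in place of NVL because both databases support it. SQLite does not turn '' into NULL the way Oracle does, so the NULL is inserted explicitly.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (assumption, illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address (street_no TEXT, street_name TEXT, street_dir TEXT)"
)
# Oracle stores '' as NULL; SQLite does not, so None is inserted explicitly.
conn.execute("INSERT INTO address VALUES (?, ?, ?)", ("12", "Main", None))


def count_matches(direction_predicate, direction):
    """Count rows for 12 Main using the given predicate on street_dir."""
    sql = (
        "SELECT COUNT(*) FROM address "
        "WHERE street_no = ? AND street_name = ? AND " + direction_predicate
    )
    return conn.execute(sql, ("12", "Main", direction)).fetchone()[0]


# Plain equality: NULL = NULL is not true, so the existing row is missed.
plain = count_matches("street_dir = ?", None)

# Wrapping both sides maps NULL to '-' so the row is found.
wrapped = count_matches("COALESCE(street_dir, '-') = COALESCE(?, '-')", None)

print(plain, wrapped)  # prints: 0 1
```

The second predicate is the kind of match I would like the lookup to perform, but I don't see a way to express it in the Database Lookup step.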
So how can I do this? I guess I could replace the empty string in the transformation with a placeholder character on both the insert and the subsequent match? Is there a quick way to do that, like the 'if null' step, or do I need to check each field with JavaScript? I have to think someone has encountered this before with Oracle and Table Output / Database Lookup.
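The placeholder idea, sketched in Python rather than as Kettle steps, would look roughly like this. Everything here is hypothetical: the "-" placeholder, the function names, and the dict standing in for the address table. The only point is that the same normalization has to run before both the insert and the lookup.

```python
PLACEHOLDER = "-"  # hypothetical marker; must never occur as a real value


def normalize(value):
    """Map None and empty strings to the placeholder, leave others alone."""
    if value is None or value == "":
        return PLACEHOLDER
    return value


def address_key(street_no, street_name, street_dir):
    """Build the match key from normalized fields."""
    return tuple(normalize(v) for v in (street_no, street_name, street_dir))


def upsert(table, address, seen):
    """Insert or update 'last seen'; return True if the address was new."""
    key = address_key(*address)
    is_new = key not in table
    table[key] = seen
    return is_new


table = {}  # stands in for the address table
first = upsert(table, ("12", "Main", ""), "2024-01-01")     # blank direction
second = upsert(table, ("12", "Main", None), "2024-02-01")  # null direction
print(first, second, len(table))  # prints: True False 1
```

With this approach the database would store the placeholder instead of null, so anything reading the table would need to know that "-" means "no direction".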
Thanks!
-Aaron