I have a common date dimension that is used in many of my transformations.
The date dimension is populated using the Combination lookup/update step, and I also use the "hashcode" field that is part of that step.
When I run my first transformation, the entries added to the date dimension are unique.
pcwh=# select * from date_dimension
pcwh-# ;
date_key | year | week_number | month_number | day_in_month | quarter_number | day_end_date | hashcode
----------+------+-------------+--------------+--------------+----------------+--------------+-----------
1 | 2013 | 16 | 4 | 20 | 2 | 2013-04-20 | 641661091
2 | | | | | | | 34
3 | 2013 | 16 | 4 | 22 | 2 | 2013-04-22 | 814362725
4 | 2013 | 16 | 4 | 21 | 2 | 2013-04-21 | 728011844
(4 rows)
Now, when I run my second transformation, it adds a duplicate row.
pcwh=# select * from date_dimension
pcwh-# ;
date_key | year | week_number | month_number | day_in_month | quarter_number | day_end_date | hashcode
----------+------+-------------+--------------+--------------+----------------+--------------+------------
1 | 2013 | 16 | 4 | 20 | 2 | 2013-04-20 | 641661091
2 | | | | | | | 34
3 | 2013 | 16 | 4 | 22 | 2 | 2013-04-22 | 814362725
4 | 2013 | 16 | 4 | 21 | 2 | 2013-04-21 | 728011844
5 | 2013 | 16 | 4 | 20 | 2 | 2013-04-20 | 1723169982
(5 rows)
As you can see, rows 1 and 5 refer to the same date, but the second transformation added a new entry.
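To list every date that has been assigned more than one date_key, a GROUP BY ... HAVING query on the natural-key column should be enough. The sketch below runs it from Python against an in-memory SQLite copy of the five rows shown above, only so that it is self-contained; the same SELECT should also run in psql against date_dimension. Grouping on day_end_date alone is an assumption that this column identifies one day.

```python
import sqlite3

# In-memory copy of the rows shown above (illustration only; the real table is in PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE date_dimension (
    date_key INTEGER PRIMARY KEY, year INTEGER, week_number INTEGER,
    month_number INTEGER, day_in_month INTEGER, quarter_number INTEGER,
    day_end_date TEXT, hashcode INTEGER)""")
rows = [
    (1, 2013, 16, 4, 20, 2, "2013-04-20", 641661091),
    (2, None, None, None, None, None, None, 34),
    (3, 2013, 16, 4, 22, 2, "2013-04-22", 814362725),
    (4, 2013, 16, 4, 21, 2, "2013-04-21", 728011844),
    (5, 2013, 16, 4, 20, 2, "2013-04-20", 1723169982),
]
conn.executemany("INSERT INTO date_dimension VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Dates that appear under more than one technical key, with the lowest and highest key.
dupes = conn.execute("""
    SELECT day_end_date, COUNT(*), MIN(date_key), MAX(date_key)
    FROM date_dimension
    WHERE day_end_date IS NOT NULL
    GROUP BY day_end_date
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('2013-04-20', 2, 1, 5)]
```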
How can we avoid this duplication? The second transformation adds the row again because the hashcode it computes for the same date is different (1723169982 instead of 641661091).
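A new row is inserted whenever the step finds no existing row matching both the incoming key values and their hashcode. One likely reason, worth checking in both transformations, is that the key fields reach the step with different data types or formats (for example, day_end_date as a Date in one stream and a String in the other), or that the two steps are configured with different key field lists. The Python sketch below is a toy model, not PDI code: row_hash, ToyDimension, and normalize are hypothetical names, and the CRC32-based hash is an assumption chosen only to show how a type difference yields a different hashcode and therefore a new technical key.

```python
import zlib
from datetime import date


def row_hash(key_values):
    # Hypothetical hash: CRC32 over each key field's type name and value, in order.
    # This is NOT how PDI computes its hashcode; it only models a hash that
    # depends on data types as well as on what the values mean.
    text = "|".join(f"{type(v).__name__}:{v!r}" for v in key_values)
    return zlib.crc32(text.encode("utf-8"))


class ToyDimension:
    """Toy lookup/update: match on hash plus key fields, else insert a new key."""

    def __init__(self):
        self.rows = {}  # date_key -> (hashcode, key tuple)
        self.next_key = 1

    def lookup_or_insert(self, key_values):
        key_values = tuple(key_values)
        h = row_hash(key_values)
        for date_key, (stored_hash, stored_keys) in self.rows.items():
            if stored_hash == h and stored_keys == key_values:
                return date_key
        date_key = self.next_key
        self.next_key += 1
        self.rows[date_key] = (h, key_values)
        return date_key


def normalize(key_values):
    # Hypothetical fix: coerce every key field to one agreed type before the lookup.
    year, week, month, day, quarter, end_date = key_values
    if isinstance(end_date, str):
        end_date = date.fromisoformat(end_date)
    return (int(year), int(week), int(month), int(day), int(quarter), end_date)


# First transformation sends day_end_date as a date; the second sends a string.
first = (2013, 16, 4, 20, 2, date(2013, 4, 20))
second = (2013, 16, 4, 20, 2, "2013-04-20")

dim = ToyDimension()
k1 = dim.lookup_or_insert(first)
k2 = dim.lookup_or_insert(second)
print(k1, k2)  # prints: 1 2  (same date, two keys)

fixed = ToyDimension()
k3 = fixed.lookup_or_insert(normalize(first))
k4 = fixed.lookup_or_insert(normalize(second))
print(k3, k4)  # prints: 1 1  (existing key reused)
```

In PDI terms, the counterpart of normalize would be making both transformations deliver identical key fields (same names, same order, same data types and formats) to the Combination lookup/update step, for example by adjusting metadata in a Select values step beforehand. This is a direction to verify against your transformations, not a confirmed fix.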