Channel: Pentaho Community Forums
Viewing all articles
Browse latest Browse all 16689

Joining large files via MapReduce with Kettle

I have two very large files that are tab-delimited table exports from an RDBMS, and both are stored in HDFS. I need to join these two files on one of their columns.

If you are familiar with Pig, you know that it is possible to use MapReduce to join files within HDFS. I was wondering if there is a way from within Kettle to join files using MapReduce. I know you can use a Join Rows step, but I don't think that uses the power of the Hadoop cluster to do the work. I also know that you could layer the files under HBase, but since I'm using this for ETL, the files won't be static.
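To make concrete the kind of join I mean, here is a minimal sketch of a reduce-side join written in plain Python. It only simulates the map, shuffle, and reduce phases in memory; it is not Kettle or Hadoop API code, and the function names (`map_side`, `shuffle`, `reduce_join`, `join_files`) and the "L"/"R" tags are made up for illustration.

```python
from collections import defaultdict
from itertools import product


def map_side(lines, key_index, tag):
    """Map phase: emit (join_key, (tag, fields)) for each tab-delimited line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[key_index], (tag, fields)


def shuffle(pairs):
    """Simulated shuffle: group all mapper output by join key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(values):
    """Reduce phase: pair every left record with every right record for one key."""
    left = [fields for tag, fields in values if tag == "L"]
    right = [fields for tag, fields in values if tag == "R"]
    for l_fields, r_fields in product(left, right):
        yield l_fields + r_fields


def join_files(left_lines, left_key, right_lines, right_key):
    """Inner join of two tab-delimited inputs on the given column indexes."""
    pairs = list(map_side(left_lines, left_key, "L"))
    pairs += list(map_side(right_lines, right_key, "R"))
    joined = []
    # Sorting the keys only makes the output order deterministic here.
    for _key, values in sorted(shuffle(pairs).items()):
        joined.extend("\t".join(row) for row in reduce_join(values))
    return joined
```

For example, `join_files(["1\talice", "2\tbob"], 0, ["10\t1\tbook", "11\t1\tpen"], 1)` pairs the `alice` row with both rows that carry key `1` and drops `bob`, since this is an inner join. On a real cluster the grouping by key would be done by the framework's shuffle rather than an in-memory dictionary.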

Any pointers would be appreciated.

-Barry
