I am writing here to expose my doubts regarding the integration between Hadoop and Kettle.
I read in the official documentation (http://wiki.pentaho.com/display/BAD/...ro+and+Version) how to set up and configure Kettle for a specific Hadoop distribution. The way to communicate with Hadoop, and with the other components of the ecosystem, is to download a “shim” (a small library that intercepts API calls and either redirects them, handles them, or changes the calling parameters) for a specific Hadoop distro and version.
Such a shim exists for all the commercial distributions, even for their most recent versions. However, for vanilla Apache Hadoop there is no downloadable shim that allows developing big data solutions without a commercial version of the platform (only a pre-configured shim for version 0.20.2 exists).
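For context, the shim in use is selected through the big data plugin's configuration file. A minimal sketch, assuming the standard PDI directory layout and the `hadoop-20` shim folder name (both the path and the folder name may differ in your installation):

```
# plugins/pentaho-big-data-plugin/plugin.properties
# (path and shim folder name are assumptions; check your PDI installation)
# The value must match a folder under hadoop-configurations/
active.hadoop.configuration=hadoop-20
```

Presumably, supporting a new distribution would mean adding a matching folder under `hadoop-configurations/` and pointing this property at it.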
I would like to ask the following:
1) The shims for vanilla Hadoop distributions are not kept up to date: is there a commercial or technical reason for this? Since up-to-date support exists only for the commercial distributions, I suppose the reason could be commercial rather than technical.
2) If there is indeed no technical reason, what is the best way to develop a shim for Hadoop 2.2.0?
P.S. The versions of the components installed on my cluster are: HBase 0.98, Hive 0.11, and ZooKeeper 3.4.6.
Thanks in advance.
Regards
Pietro and Gaetano.