Quantcast
Channel: Pentaho Community Forums
Viewing all articles
Browse latest Browse all 16689

Hive JDBC Query too slow: too many fetches after query execution: Kettle Xform

$
0
0
Here is how the balances are different between the reports.

I have setup a kettle tranform with only one step "Table Input" that fires a query on a Hive table.Attach hive job screenshot.
KettleJob.jpg

The connection sets up fine (except that it only connects to default database. Known issue, there is a jira on it). Attached connection screenshot.
KettleHiveConnection.jpg

It works fine but is very slow. I checked the hiveserver2 logs and looks like the query executes really fast but after that there are a lot fetchresults executed and it takes forever for the data to make it back. Its fine if the resultset is small but for ~300000 rows it takes like 10 mins. We are using Cloudera4.5 (hence the bigdata plugin for kettle cdh42), out network is top of the line. Went I execute the same query on Hue(CDH query portal) on my desktop, it is very fast to return the same resultset, no fetchresults are seen in the logs.

I checked the Hive-JDBC code for cdh42. Looks like the fetchsize is hardcoded to 50. And I can't figure out how and if I can send it down as a parameter.
https://github.com/pentaho/hive/blob...Statement.java

I am conjecturing that fetchsize is the issue but cannot say for sure since I could not figure if I have the right piece of code above and how I can build it. It is not setup like the other pentaho projects in github.
Can someone please help me out? Everything about the setup works fine and I am getting killed in performance. Thanks!!!

Hiverserver2 log for reference below:
5:46:24.270 PM INFO org.apache.hadoop.hive.ql.exec.Task
2014-02-20 17:46:24,265 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 85.06 sec
5:46:25.285 PM INFO org.apache.hadoop.hive.ql.exec.Task
2014-02-20 17:46:25,281 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 85.06 sec
5:46:26.301 PM INFO org.apache.hadoop.hive.ql.exec.Task
2014-02-20 17:46:26,296 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 85.06 sec
5:46:26.303 PM INFO org.apache.hadoop.hive.ql.exec.Task
MapReduce Total cumulative CPU time: 1 minutes 25 seconds 60 msec
5:46:26.322 PM INFO org.apache.hadoop.hive.ql.exec.Task
Ended Job = job_201402072147_0180
5:46:26.330 PM INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator
Moving tmp dir: hdfs://nameservice1/tmp/hive-hive/hive_2014-02-20_17-45-52_736_1233364556268491863-2/_tmp.-ext-10001 to: hdfs://nameservice1/tmp/hive-hive/hive_2014-02-20_17-45-52_736_1233364556268491863-2/_tmp.-ext-10001.intermediate
5:46:26.338 PM INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator
Moving tmp dir: hdfs://nameservice1/tmp/hive-hive/hive_2014-02-20_17-45-52_736_1233364556268491863-2/_tmp.-ext-10001.intermediate to: hdfs://nameservice1/tmp/hive-hive/hive_2014-02-20_17-45-52_736_1233364556268491863-2/-ext-10001
5:46:26.346 PM INFO org.apache.hadoop.hive.ql.Driver
</PERFLOG method=Driver.execute start=1392936353985 end=1392936386346 duration=32361>
5:46:26.346 PM INFO org.apache.hadoop.hive.ql.Driver
MapReduce Jobs Launched:
5:46:26.347 PM INFO org.apache.hadoop.hive.ql.Driver
Job 0: Map: 5 Cumulative CPU: 85.06 sec HDFS Read: 979418229 HDFS Write: 664486 SUCCESS
5:46:26.347 PM INFO org.apache.hadoop.hive.ql.Driver
Total MapReduce CPU Time Spent: 1 minutes 25 seconds 60 msec
5:46:26.347 PM INFO org.apache.hadoop.hive.ql.Driver
OK
5:46:26.348 PM INFO org.apache.hadoop.hive.ql.Driver
<PERFLOG method=releaseLocks>
5:46:26.365 PM INFO org.apache.hadoop.hive.ql.Driver
</PERFLOG method=releaseLocks start=1392936386348 end=1392936386365 duration=17>
5:46:26.365 PM INFO org.apache.hadoop.hive.ql.Driver
</PERFLOG method=Driver.run start=1392936353955 end=1392936386365 duration=32410>
5:46:26.365 PM INFO org.apache.hive.service.cli.CLIService
SessionHandle [2a370498-ee64-43c2-9add-284f2e61b53b]: executeStatement()
5:46:26.365 PM INFO org.apache.hive.service.cli.CLIService
OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=1be1c7de-1521-4c10-b393-f39457fb2603]: getOperationStatus()
5:46:26.418 PM WARN org.apache.hadoop.hive.conf.HiveConf
DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
5:46:26.420 PM INFO org.apache.hive.service.cli.CLIService
OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=1be1c7de-1521-4c10-b393-f39457fb2603]: getResultSetMetadata()
5:46:26.498 PM WARN org.apache.hadoop.hive.conf.HiveConf
DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
5:46:26.556 PM WARN org.apache.hadoop.hive.conf.HiveConf
DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
5:46:26.565 PM INFO org.apache.hadoop.mapred.FileInputFormat
Total input paths to process : 5
5:46:26.713 PM WARN org.apache.hadoop.hive.conf.HiveConf
DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
5:46:26.780 PM INFO org.apache.hive.service.cli.CLIService
OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=1be1c7de-1521-4c10-b393-f39457fb2603]: fetchResults()
5:46:26.920 PM WARN org.apache.hadoop.hive.conf.HiveConf
DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
5:46:26.987 PM INFO org.apache.hive.service.cli.CLIService
OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=1be1c7de-1521-4c10-b393-f39457fb2603]: fetchResults()
...
(Lot more fetchresults)
...
...
Attached Images

Viewing all articles
Browse latest Browse all 16689

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>