Channel: Pentaho Community Forums

Where can I download the Pentaho shim to connect to Hadoop and Hive?

Hi,

I have downloaded Pentaho Data Integration and Pentaho Reporting.


Now I am trying to download the community edition Pentaho shims to connect to Hadoop and Hive, but every time the download turns out to be the data-integration zip (pdi-ce-6.1.0.1-196.zip).


Where can I download the Pentaho shims separately, in a version compatible with Pentaho 6?

Please help.


Thanks.

Exception when connecting to Hive: using class org.apache.hive.jdbc.HiveDriver

Hi,

I am using Apache Hadoop 2.7.0, Apache Hive 2.0.0, and Pentaho Report Designer 6.1.0.1-196 on my CentOS system.

I want to create a Pentaho Report Designer connection to Hive.

I have put hive-jdbc-2.0.0.jar in the report-designer/lib/jdbc folder. Now I am trying to create the data source.
I select:
Hadoop Hive 2
Host name: localhost
Database name: default
Port number: 10000

I get the following error when I test the connection:
Error connecting to database: (using class org.apache.hive.jdbc.HiveDriver)
org/apache/thrift/transport/TTransportException
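
For what it's worth, that missing class belongs to Thrift, which usually means the driver's dependency libthrift-*.jar is not in report-designer/lib/jdbc alongside the Hive JDBC jar (the plain hive-jdbc jar does not bundle its dependencies; the hive-jdbc-*-standalone.jar does). A minimal standalone connection test, as a sketch with the host, port, and database assumed from the settings above:

Code:

import java.sql.Connection;
import java.sql.DriverManager;

public class HiveConnectTest {
    public static void main(String[] args) throws Exception {
        // Fails with NoClassDefFoundError: org/apache/thrift/transport/TTransportException
        // when libthrift is missing, even though the Hive driver itself is present.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}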

Kindly help me solve the problem.

Thanks.

Help Regarding Pentaho job

Hi,

I am new to Pentaho. I have developed many transformations in Spoon, executed them successfully on Windows via Spoon, then FTPed them to Linux (Ubuntu) and executed them via:

./pan.sh -file "./transformations/OneToOne.ktr"
and so on.

Now I have created a job in Windows Spoon and FTPed it to Linux:

./kitchen.sh -file "./transformations/job1.kjb"

I want to run it on Linux, but job1 refers to OneToOne.ktr by its Windows location, "c:\mytransforms", and errors out!

How can I make job1 refer to "/transformations/OneToOne.ktr", its Linux location, instead of "c:\mytransforms"?
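
One common fix, sketched here on the assumption that Kettle's built-in Internal.Job.Filename.Directory variable is available (it is in PDI 6.x): in the job's Transformation entry, reference the .ktr relative to the job file itself instead of hard-coding a Windows path:

Code:

${Internal.Job.Filename.Directory}/OneToOne.ktr

Since job1.kjb and OneToOne.ktr both sit in ./transformations, the same job should then resolve the path correctly on both Windows and Linux.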

Thanks in advance!

Is there a way to dynamically add a field?

I'm trying to schedule a job to run regularly. It involves transformations that pull data from multiple files structured with each date as a column, so each time the files are updated with new monthly data, a new field needs to be added in the file input step. I would also need this field added later in the transformation, in the Row Normaliser step.

Is there a way to do this other than manually using the "Get Fields" button or typing in the new field each time?

Thanks!

Pentaho Kettle - Delete Records

Hi, I'm trying to delete records in my target table based on whether the record exists in the source table. I tried using the 'Delete' step but then realized that this step is based on a conditional clause. My condition is quite simple: "if the record/row doesn't exist in table A [source], delete the record/row from table B [target]".


I also read about using the 'Merge Rows (diff)' step, but that seems to scan/compare both tables in their entirety for differences.


The table has several million records and many hundreds of columns on a MySQL server, so I need to do this in the most efficient manner possible.
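
For reference, the condition itself maps directly onto a multi-table delete in MySQL; a sketch, assuming the two tables share a key column named id (table and column names here are placeholders):

Code:

DELETE b
FROM tableB b
LEFT JOIN tableA a ON a.id = b.id
WHERE a.id IS NULL;

Run from an Execute SQL script step (or directly on the server), this lets MySQL do the anti-join instead of streaming both tables through PDI.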


Any help would be appreciated.

Pentaho Report Designer - MongoDB SSL/TLS

Hello,

I'm working on updating a report in RD 6.1 and need to enable TLS for a MongoDB connection. However, there doesn't appear to be a way to make this work: there is no option for it, and the UI doesn't seem to accept a URI-formatted string with the TLS parameter at the end. Any ideas?

TIA

Fuzzy Matching

Hello Folks,
I am using the Fuzzy Match step to compare two data sets. I am able to configure the step and compare the names on the lookup stream and the main stream. I was wondering whether I could compare on more than one field. For example, in my data I have names and addresses in both sets; can I compare on both name and address and produce a cumulative similarity score? Thanks for your help.

Java SDK - getting a report definition from outside the class path

I am trying to get the report definition from a folder that is outside the class path, but have so far been unsuccessful.

The following successfully loads the definition from a folder on the class path; however, I am unable to figure out how to load it from an absolute path outside the class path:

Code:

// Parse the report file
final URL reportDefinitionURL = classloader.getResource("some/path/inclass/Sample1.prpt");
final ResourceManager resourceManager = new ResourceManager();
final Resource directly = resourceManager.createDirectly(reportDefinitionURL, MasterReport.class);   
return (MasterReport) directly.getResource();

But how can I load the report definition from an absolute (Linux) path that is not on the class path, such as "/usr/share/pentaho/Sample1.prpt"?
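
One approach that should work, sketched under the assumption that ResourceManager.createDirectly also accepts a java.io.File key (it takes a generic Object key, so a File or a file:// URL are both candidates):

Code:

import java.io.File;
import org.pentaho.reporting.engine.classic.core.MasterReport;
import org.pentaho.reporting.libraries.resourceloader.Resource;
import org.pentaho.reporting.libraries.resourceloader.ResourceManager;

// Parse the report file from an absolute path outside the class path
final File reportFile = new File("/usr/share/pentaho/Sample1.prpt");
final ResourceManager resourceManager = new ResourceManager();
final Resource resource = resourceManager.createDirectly(reportFile, MasterReport.class);
return (MasterReport) resource.getResource();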

Any help greatly appreciated!

I'm new to Weka - how does SMO work?

Hello Everyone.

I'm NettoJM. I'm Brazilian and new to data mining, and I don't have much math or programming knowledge, so I'm having a bad time trying to understand how it all works.

I'm running some tests on a small set of educational data, trying to clarify how, and how much, the attributes influence the class attribute (IDEB).

Here is a pic of the data (I already converted to arff files):

sample.jpg

This is just simulated data; we are still collecting the real data, where the number of attributes is three or four times bigger, with a lot more instances too. We are planning on using only numeric attributes. Which algorithm should we try?

I got good accuracy with SMO (SVM) in some tests, but I can't understand the SMO output. Can you help me with that too? I just want to understand how SMO works and what its output means. If I run a test with one of the datasets provided with Weka, like weather.numeric.arff, I get results like this:

sample 2.jpg
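
For context, here is a minimal sketch of reproducing that kind of run with the Weka Java API (the file path is an assumption; the Explorer GUI does the same thing):

Code:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SmoDemo {
    public static void main(String[] args) throws Exception {
        // Load the sample dataset that ships with Weka (path is an assumption).
        Instances data = DataSource.read("data/weather.numeric.arff");
        data.setClassIndex(data.numAttributes() - 1); // class = last attribute

        SMO smo = new SMO();     // an SVM trained with Sequential Minimal Optimization
        smo.buildClassifier(data);
        System.out.println(smo); // prints the learned model (the weights SMO reports)

        // 10-fold cross-validation, as the Explorer does by default.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new SMO(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}

Roughly speaking, SMO trains a support vector machine: the numbers in the model output are the weights of the separating hyperplane, with one binary machine per pair of class values.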

Can someone explain to me in detail what this output means? Pardon any grammatical errors; I am Brazilian, so my English is not perfect.


Thank you.

Table of contents with subreports

I want to create a table of contents in PRD. I have a report with 4 subreports, and I want to get the title (label) of each subreport and put it into the table of contents. Does anyone have an idea how to do this, please?

Thank you

Mondrian and Teiid VDB

Hi everyone,
Currently I'm using Teiid, a data virtualization tool. I've succeeded in making a virtual database from Excel data sources. I saw in the Mondrian documentation (http://mondrian.pentaho.com/document...stallation.php) that Mondrian can access it if we already have a JDBC data source.
Is it possible for Mondrian to access the virtual database running on the Teiid server through JDBC? Any resources on how to do that?
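
In principle, yes: Mondrian only needs a JDBC driver on its classpath, so pointing its connect string at Teiid should work. A sketch, assuming Teiid's standard driver class and URL format (the VDB name, host, port, and catalog path are placeholders):

Code:

Provider=mondrian;
Jdbc=jdbc:teiid:myVDB@mm://localhost:31000;
JdbcDrivers=org.teiid.jdbc.TeiidDriver;
Catalog=/path/to/MySchema.xml;

The Teiid client jar would need to be on Mondrian's classpath, and the Mondrian schema's tables would map to the views exposed by the VDB.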

Thank you very much. Any help is really appreciated.

Create New Analyzer in Pentaho User Console takes a long time to load data sources

Hi everyone,

I am using Pentaho 5.3 Enterprise Edition and recently started facing problems with creating new Analyzer reports. It takes 10-15 minutes to show the data sources.

I deleted the temp files in tomcat and in the pentaho-solutions folder as well, and it is still slow.

Can anyone help me with this situation?

Thanks.

CDE OLAP Wizard doesn't show dimension members for selection

Hi, I'm new to CDE. I have successfully created an OLAP cube for Saiku Analytics. The problem is that it doesn't show up properly in the CDE OLAP Wizard. See the attached JPG.

OLAP Wizard Dimensions Problem.jpg

Can somebody point me to the problem/solution? I don't understand what is wrong.

Here is a screenshot of the OLAP cube in Saiku Analytics.

Saiku Analytics Cube OK.jpg

And here is the Schema Workbench cube creation parameters screenshot.

Schema Workbench Cube Creation.jpg

By the way, I had created a different OLAP cube for another project in the same way, and the CDE OLAP Wizard showed its dimension members properly.

Thanks.

How can I set a field's value in a Kettle flow?

Hello,


Currently I have a transformation that gets some data from table1, does some transformations, and loads it into table2. The transformation flow is like this (SQL Server tables):


1) Table Input with select field1, null as field2 from table1;
2) Lookup on table2 returning id (an integer value);
3) Table Output to table2 with two fields: field3, field4.


My question is how I can modify/transform field1 (initially NULL) into this value: 'report-number/' + convert(varchar, id), i.e. concatenate a constant string with the id returned by the lookup step. I will load this transformed field2 into table2.field4.
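
One way to do this, as a sketch: insert a Modified Java Script Value step between the lookup and the table output and build the value there (field names follow the post; field2 must also be declared in the step's output-fields grid):

Code:

// Modified Java Script Value step: prefix the looked-up id with a constant.
var field2 = 'report-number/' + id;

A Formula step would be a scripting-free alternative for the concatenation.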

Thanks!

Enterprise Repository

Hello

Is it possible to use a non-Postgres database for the PDI 6.1 enterprise repository?

Thanks,
kumar

Issue with 'field splitter' step in Spoon v6.1.0.3-223 dropping input field

My issue is that after this step executes, the input field no longer exists. Just because I decide to split a field into one or more new output fields does not mean (IMHO) that I also want to get rid of the input field.

So I'm requesting one of the following:
1) a BUG (to be confirmed and raised) and a FIX so that the specified input field is also output, OR
2) a FEATURE REQUEST for an option on this step specifying whether the input field should be dropped on output (with the default set to true to maintain compatibility, or with the wording and default value reversed as appropriate).

Thank you,
Steven.

Operationalizing Kettle

I have a couple of issues running Kettle (PDI 6.0.1) on Windows Server 2012 R2 that make me hesitate to consider it for production use:

- Kitchen always exits with error code 1 (Error), even on apparently successful runs. This essentially prevents me from monitoring scheduled runs.

- Execution hangs or terminates with no log entry written to the table. This prevents the script monitoring the job log table from detecting this condition.

- cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp. This seems to be a known problem, but it is still present in 6.1. As a result, I can't run my transforms without warnings.

- Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0. Come on, JRE 8 was released two and a half years ago.

It seems like the attention to detail, let alone fit and finish, just isn't there. I can't run a product in a business environment that I can't monitor.

NewMapComponent

Hello,

My goal: change the marker image from a query.

I realized that I must have a column named "marker" containing the URL of the image. That's OK!

If I put a test URL in the "Marker image" parameter and add the code "this.markerImageGetter = 'urlMarker';" to the Pre-execution function, I can see my image on all markers,

but it does not work dynamically...


Could you help me, please?

Clecle

How to stop a RandomForest object from saving bagging information

Hi everyone. I'm working with an old Java program which uses weka.jar (version 3.7.1). I'm training a random forest classifier on a dataset with approximately 2500 rows. I need to be able to save the classifier to an external file so that I can load it later, but with 750 trees the file is around 500 MB. Because of this size, I am running into Java memory problems.

Looking at the object, I can see a field called m_inBag which holds an array the same size as the number of trees; each element holds an array the size of the dataset with a 'true' or 'false' (for whether that particular piece of training data is in the bag or out of the bag for that tree). I don't think I need this information when I classify new data, so I want to clear this field so that my classifier is smaller. Unfortunately, I can't access the field directly. The RandomForest.listOptions method gives me the following: -I (num trees), -K (num features), -S (rand num seed), -depth (max depth of trees), and -D (debug mode).
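
For reference, since listOptions exposes no switch for this, one workaround sketch is to null the private field via reflection just before serializing. The field name m_inBag comes from the inspection above, and depending on the Weka version it may live in a wrapped Bagging object rather than on the RandomForest itself, so treat the lookup target as an assumption:

Code:

import java.lang.reflect.Field;

// Null a private field anywhere in the object's class hierarchy so the
// in-bag bookkeeping is not written out when the classifier is serialized.
static void clearField(Object target, String fieldName) throws Exception {
    for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
        try {
            Field f = c.getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, null);
            return;
        } catch (NoSuchFieldException e) {
            // not declared here; keep climbing the hierarchy
        }
    }
    throw new NoSuchFieldException(fieldName);
}

// Usage sketch: clearField(forest, "m_inBag"); then save the classifier as before.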

Could anyone tell me a way to stop the classifier from storing the bagging information? Or how to clear the field before I save the classifier? Thanks so much for the help.

Thesis Help - Data Transformation

Hey all,

I'm new to PDI/Kettle; a coworker recommended it as a solution to a problem I am having with the data I have acquired for my thesis.

The general gist of my problem is that I have ~180 .csv files, each of which corresponds to a past oil futures contract. All of these .csv files are formatted identically, with 4 columns: {Index, Date, Contract Price, Numeric Date} (obviously one of those is redundant, but that isn't as important). In addition, I have another .csv file which is independent of these 180 files and formatted differently, with only two columns: {Date, Oil Price}. This spreadsheet contains the price for 25+ years.

My challenge is that I want to put all of these files together while assigning the oil price from the main spreadsheet to each row in each file. Since oil futures overlap, there are many duplicate dates, which still need to be retained as they are unique to that specific contract. The other important thing to note is that the name of each file contains all of the information about which contract it is, so I need to retain that and include it in the final table.

Example of what I want to do....

.CSV1
Date / Future Price
1/1 10
1/2 11
1/3 12
1/4 13
1/5 14


.CSV2
Date / Future Price
1/3 11
1/4 12
1/5 13
1/6 14
1/7 15


Main.CSV
Date / Future Price
1/1 13
1/2 14
1/3 13
1/4 14
1/5 15
1/6 13
1/7 14

Final.CSV
.CSV1
Date / Future Price / Actual Price
1/1 10 13
1/2 11 14
1/3 12 13
1/4 13 14
1/5 14 13
.CSV2
Date / Future Price / Actual Price
1/3 11 13
1/4 12 14
1/5 13 15
1/6 14 13
1/7 15 14


I hope that illustrates what I am trying to do.

I tried to input a .csv file, but I'm not sure how to load all of the many .csv files I have, or how to transform them the way I need, as I am very new to this program. If there is anything else I can clarify, I'll try to in the morning!
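
For what it's worth, the join logic itself is small. In PDI this would roughly be a Text file input step reading all 180 files with a wildcard (with the filename included as an additional output field), followed by a lookup of the oil price against the main file. The sketch below shows the same logic in plain Java; the paths, the contracts/ folder, and the header-less CSV layouts are all assumptions:

Code:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class FuturesMerge {
    public static void main(String[] args) throws IOException {
        // Load the main price file into a date -> price map ("Date,Oil Price" rows assumed).
        Map<String, String> oilPrice = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("Main.csv"))) {
            String[] p = line.split(",");
            oilPrice.put(p[0], p[1]);
        }

        // Append every contract file, carrying the file name as the contract id
        // and looking up the actual oil price for each date.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("contracts"), "*.csv")) {
            for (Path f : files) {
                String contract = f.getFileName().toString().replace(".csv", "");
                for (String line : Files.readAllLines(f)) {
                    String[] p = line.split(","); // {Index, Date, Contract Price, Numeric Date}
                    System.out.println(contract + "," + p[1] + "," + p[2] + "," + oilPrice.get(p[1]));
                }
            }
        }
    }
}

Each output row then carries the contract (from the file name), the date, the future price, and the looked-up actual price, matching the Final.CSV example above.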

Thanks in advance!