JRIP on unbalanced data result interpretation

March 5, 2014, 6:15 am

Hello all,

I have an unbalanced data set, so I down-sample the majority class and train JRIP on the balanced set. As a result I get rules, each with a number of covered instances and a number of misclassified instances. But the number of misclassified instances comes from the balanced data set, so it is not realistic: the balanced training set contains far fewer majority-class instances than the original data. What should I do? Can I use the rules to describe the data set and interpret their "accuracy" only from a balanced test set?
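
One way to make the rule counts more realistic is to scale the majority-class counts back up by the down-sampling factor. The sketch below is only an illustration under stated assumptions: a two-class problem, purely random down-sampling of the majority class, and hypothetical names (`adjusted_rule_precision`, `majority_keep_rate`) that are not part of Weka.

```python
def adjusted_rule_precision(covered, misclassified, majority_keep_rate,
                            rule_predicts_minority=True):
    """Estimate a rule's precision on the original (unbalanced) distribution.

    covered / misclassified are the counts reported for the rule on the
    balanced set; majority_keep_rate is the fraction of majority-class
    instances kept by random down-sampling (hypothetical parameter).
    """
    if not 0 < majority_keep_rate <= 1:
        raise ValueError("majority_keep_rate must be in (0, 1]")
    correct = covered - misclassified
    # Each majority instance that was kept stands for this many originals.
    scale = 1.0 / majority_keep_rate
    if rule_predicts_minority:
        # The errors are majority instances, so they are under-counted.
        correct_est, wrong_est = correct, misclassified * scale
    else:
        # The correct hits are majority instances, so they are under-counted.
        correct_est, wrong_est = correct * scale, misclassified
    total = correct_est + wrong_est
    return correct_est / total if total else 0.0


# Made-up example: a minority-class rule covering 50 instances with 5 errors,
# learned after keeping 10% of the majority class.
balanced = (50 - 5) / 50
original = adjusted_rule_precision(50, 5, 0.10)
print(f"precision on balanced set: {balanced:.2f}")
print(f"estimated precision on original data: {original:.2f}")
```

With these made-up numbers the rule looks 90% precise on the balanced set but only about 47% precise once the majority class is scaled back up. Evaluating the rules on a test set that keeps the original class distribution would give the same kind of estimate without relying on the scaling assumption.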

↧

Problem with filters with nameexpression

March 5, 2014, 7:08 am

Hi,
I have a big problem concerning the NameExpression tag.
I tried to change the current formatting of the Month column as follows:
<Level caption="Discrepancy arrival date Month" column="MONTH" hideMemberIf="Never" levelType="TimeMonths" name="Month" ordinalColumn="MONTH" type="Numeric" uniqueMembers="false">
<Annotations>
<Annotation name="AnalyzerDateFormat">[yyyy].[M]</Annotation>
</Annotations>
<NameExpression>
<SQL dialect="oracle">TIME.YEARMONTH</SQL>
</NameExpression>
</Level>
While browsing the cube in Analyzer everything seems to work fine, but it doesn't: if I now define a filter on this column with the option "Current Discrepancy arrival date Month" checked, I get no data. If I delete the NameExpression:
<Level caption="Discrepancy arrival date Month" column="MONTH" hideMemberIf="Never" levelType="TimeMonths" name="Month" ordinalColumn="MONTH" type="Numeric" uniqueMembers="false">
<Annotations>
<Annotation name="AnalyzerDateFormat">[yyyy].[M]</Annotation>
</Annotations>
</Level>
and again add a filter with the aforementioned option checked, I get some results. Something seems to go wrong: in the Pentaho logs that store the executed queries, the WHERE condition contains 1=0 instead of the value for a particular month.

Please help ASAP.
P.S: I'm using Pentaho version: 4.8

↧

XML error

March 5, 2014, 8:02 am

error_1.jpg
Hello all, I'm getting the following error. I closed spoon.bat normally last night, but I had a few Data-input and Data-output windows that I had started and then x-ed out towards the end of the day.
Today I am stuck with this error. So far I have:
1) Deleted the data-integration directory.
2) Re-downloaded and extracted it.
3) Added the required lib directory files.
4) Restarted Spoon and got the above error (error_1) again.
error_2.jpg
When I get to my repository and try to access files from it, I get error_2 above.

Please advise.
Team OP

Attached Images

error_1.jpg (12.9 KB)
error_2.jpg (15.6 KB)

↧

kettle.properties mixed up after spoon touched it

March 5, 2014, 8:25 am

Hi,

I added some additional variables at the end of the kettle.properties file and saved my changes. After starting Spoon, opening the file via Edit -> 'Edit the kettle.properties file' and closing the window again, all my variables seem to have been randomly scattered throughout the file when I look at it with a text editor.

Is there a reason for this behaviour, or even some kind of rule that determines how manually added variables are integrated into kettle.properties?
Thanks for any advice!

Bobse

↧

Stacking Filters Weka Explorer

March 5, 2014, 9:11 am

Hi, I'm new to Weka and am using the Explorer to try to do some text classification.
I have a training set which I have tested using the "StringToWordVector" filter and an "AttributeSelection" filter. However, I want to be able to test the classifier on unseen data, so I have tried using the "Supplied test set" option. After reading around I realise that the StringToWordVector filter has to be applied to both sets at the same time, so I have used the "FilteredClassifier" option to do this. However, I cannot seem to apply the AttributeSelection filter as well.
If I am going about this the wrong way, please let me know. Or if there is an option to apply or stack multiple filters when classifying, that'd be great. Cheers

↧

Metadata files in file-type repository

March 5, 2014, 12:12 pm

I'm wondering if there is any documentation on the metadata files that Spoon might create and use within a file-type Kettle repository.

I'm asking because I believe there is a metadata file corruption, probably connected with a file-type repository, in another thread here (http://forums.pentaho.com/showthread...9055-XML-error)

↧

CDE: Where to find how these charts accept data?

March 5, 2014, 1:59 pm

All of these CCC2 charts look awesome, but implementing them is extremely difficult, partly because I don't know how these charts expect to receive data. I've been able to implement a bar chart only because of this tutorial: http://type-exit.org/adventures-with...ards-with-cde/

I see that in a simple bar chart you need a category, series, and a value. And if you want to do a paired bar chart, you have to group the category together.

Now as for pie charts, you just need a category and a measure.

I guessed the data input of the pie chart. But if I wanted a different variation of a pie chart, like a paired bar chart, where do I go to find out how these charts accept data?

I've looked here (http://www.webdetails.pt/ctools/char....PieChart.html) and I still can't see where it says how the API accepts data.

↧

Copying one table to another that has more columns

March 5, 2014, 2:31 pm

This is a basic question.
I want to copy input table A to table B.
Table B has more columns than table A.
I used a Table input step and a Table output step and connected them, but there is no option to map the new columns, and the SQL that is generated drops the additional columns!

Do I have to use an intermediate step? Insert/update was not clear to me.

Basically, I have 4 tables that have different columns and I need to consolidate them into one fact/table.
So, I created a large super table with all the columns combined and now I need to insert all the 4 tables into that super table.
So, I will be repeating that transformation 4 times to copy all the data.

I am new to PDI and doing a POC for my employer.

We are using SQL Server 2005 on Windows 7.

Thanks in advance.
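
At the SQL level, copying into a wider table usually comes down to an `INSERT` with an explicit column list, so that the extra target columns are simply left NULL. The sketch below illustrates that idea with SQLite, only to keep it self-contained; the tables, columns and the helper `copy_common_columns` are made up for the example and are not PDI or SQL Server specifics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT, region TEXT, amount REAL);
    INSERT INTO table_a VALUES (1, 'alpha'), (2, 'beta');
""")


def copy_common_columns(conn, source, target):
    """Insert all rows of source into target, matching columns by name."""
    def columns(table):
        # PRAGMA table_info returns one row per column; index 1 is the name.
        return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    target_cols = set(columns(target))
    shared = [c for c in columns(source) if c in target_cols]
    col_list = ", ".join(shared)
    # Columns of target that are not listed are filled with NULL.
    conn.execute(
        f"INSERT INTO {target} ({col_list}) SELECT {col_list} FROM {source}"
    )
    return shared


copy_common_columns(conn, "table_a", "table_b")
rows = conn.execute(
    "SELECT id, name, region, amount FROM table_b ORDER BY id"
).fetchall()
print(rows)
```

Repeating the call once per source table would fill the super table from all four inputs. In PDI the corresponding idea would be to list the target fields explicitly in the Table output step rather than letting it create or alter the table, assuming the step exposes such a field mapping in the version in use.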

↧

Spoon Step to read a file's custom properties

March 5, 2014, 4:08 pm

Hi All,

I am using the "Get File Names" step to extract the filenames from a certain folder/subfolder. I would like to further "filter" the existing files based upon each file's "Custom Properties".

I am using/reading files from folders on a Windows machine. I am using PDI 4.0.

I can't recall an existing Input transformation step that can do this.

Has anyone had a similar need and if so, can you advise how to proceed?

Many thanks and kind regards.

DMurray3

↧

Browser detection

March 5, 2014, 4:27 pm

Hi - I'm looking into using Pentaho Reporting to develop several business sites at my new job. I have some reasonably current experience building open mapping web sites for the US Government. We had significant problems with browser differences, because different browsers had different default font sizes and users could change those font sizes on the fly using Control +/-, which made the problem even worse. Also, we had very specific headers/footers that we had to conform to. Unfortunately, we resorted to browser detection and applying different CSS. I have searched "Pentaho 5.0 Reporting by Example: Beginner's Guide", which I purchased today, and also this forum, and haven't seen a discussion of this. Is there a good cookbook approach to doing browser detection with Pentaho Reporting, and is it described anywhere? Thanks D

↧

search

March 5, 2014, 7:50 pm

How do I search for the first letter of a word and replace it with a blank? If the first letter is "a", it has to be replaced with an empty string.

My example input is

i/ptext
====
abc
def
a123
b123
ba123
abc123

o/ptext has to come in the below format
======
bc
def
123
b123
ba123
bc123
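
The rule described above can be written as a regular expression anchored at the start of the string. A minimal sketch in Python, using the sample values from this post (the function name `strip_leading_a` is only illustrative):

```python
import re


def strip_leading_a(text):
    # "^a" matches an "a" only at the very start of the string,
    # so an "a" later in the value (as in "ba123") is left alone.
    return re.sub(r"^a", "", text)


inputs = ["abc", "def", "a123", "b123", "ba123", "abc123"]
outputs = [strip_leading_a(s) for s in inputs]
print(outputs)  # ['bc', 'def', '123', 'b123', 'ba123', 'bc123']
```

The same pattern, `^a`, should carry over to any step or function that supports regular-expression replacement, provided the regex option is enabled there.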

↧

Pass data from one table input step to other table input step without using job

March 5, 2014, 8:17 pm

Hi
I am using a PDI data source in my PRD report, so I can't use a job; I have to use a transformation.
Scenario:
There are two databases, A and B. A has one table, employee(empid,empname), and B has one table, department(empid,deptid).
I have to use one Table input step to find the empid values in the department table of database B, and then pass those empid values to another Table input step to find the corresponding names. Please help me with how I can achieve this.

I can't use Stream lookup because the table is too large, and I can't use Database lookup because there may be 1000 empid values to look up, which would make the report too slow.

Are there any other thoughts?

Thanks
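
One general way to avoid both loading the whole employee table and issuing one query per empid is to send the ids to the second database in batches inside an `IN (...)` list. The sketch below only illustrates that idea: SQLite stands in for the two databases, the sample rows are invented, and the `deptid` filter, the function `names_for_department` and the `batch_size` parameter are assumptions, not anything PDI provides.

```python
import sqlite3

# Two separate in-memory databases stand in for database A and database B.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")
db_a.executescript("""
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, empname TEXT);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
""")
db_b.executescript("""
    CREATE TABLE department (empid INTEGER, deptid INTEGER);
    INSERT INTO department VALUES (1, 10), (3, 10), (4, 20);
""")


def names_for_department(deptid, batch_size=500):
    """Fetch empids from B, then look up their names in A in batches."""
    ids = [row[0] for row in db_b.execute(
        "SELECT empid FROM department WHERE deptid = ?", (deptid,))]
    names = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # One "?" placeholder per id keeps the query parameterised.
        placeholders = ", ".join("?" * len(batch))
        query = (
            "SELECT empid, empname FROM employee "
            f"WHERE empid IN ({placeholders})"
        )
        names.update(db_a.execute(query, batch))
    return names


print(names_for_department(10))
```

With 1000 ids and a batch size of 500 this issues two queries against database A instead of 1000. Whether a transformation can build such a batched query depends on the steps available, so this is a sketch of the query pattern rather than a PDI recipe.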

↧

Hierarchical Clustering

March 5, 2014, 10:12 pm

Hi,

Is there a way to perform hierarchical clustering in Weka without specifying the value of K? How can I programmatically (using the API) retrieve the clusters at different hierarchy levels?

Thanks,
Saurabh
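
For background, agglomerative clustering builds the whole hierarchy regardless of K; K only chooses where the hierarchy is cut. The sketch below shows this in plain Python with single linkage on one-dimensional points. It is not the Weka API, and the function names `agglomerate` and `clusters_at` are made up for the illustration.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns a list of partitions, one per hierarchy level, starting with
    every point in its own cluster and ending with a single cluster.
    """
    clusters = [[p] for p in sorted(points)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append([sorted(c) for c in clusters])
    return levels


def clusters_at(levels, k):
    """Return the partition that has exactly k clusters."""
    return levels[len(levels[0]) - k]


levels = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0])
for k in range(len(levels), 0, -1):
    print(k, sorted(clusters_at(levels, k)))
```

Each merge step produces one level, so the partition for any K from the number of points down to 1 can be read off afterwards without re-running the clustering.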

↧

RTF Output

March 5, 2014, 10:16 pm

Dear all,

We have a problem with RTF output.
When we select RTF as the output format for a report, we get a file without a file extension, so we have to save it with an .rtf extension before opening it.
Is there any workaround for this issue?

By the way, is it possible to rename the generated file from generatedContent to something else (e.g. the name of the report)?

Thanks,
Katja.

↧

DB Pooling + Not-Closed DB sessions

March 5, 2014, 11:03 pm

Dear all,

I have two DB-related questions:

1) How should we deal with open database sessions? (Our DB administrator has noticed a few that were left open when reports were not properly closed or were terminated.)
2) Database pooling: how can we test whether it really works? We currently have the settings min 10, max 100. The DB administrator noticed database pooling only twice.
(The DB requests are sent from the report or from the transformation used in the reports.)

Thanks in advance for your feedback.
Best regards,
Katja.

↧

steps too fast!

March 5, 2014, 11:57 pm

Hi,

I have a transformation with two cases.
Case 1) there is an insert into a table
Case 2) there is an update on the same table

Case 1 runs slower than case 2, so when the update runs there is an error because the insert hasn't happened yet.

I cannot add a blocking step because case 1 sometimes does not have to run (and then case 2 doesn't either).

What can I do?

Thanks in advance

↧

Exporting BI Metadata to external tools

March 6, 2014, 12:35 am

Hi,

I am a newbie with an interesting idea.

Is it possible to export the Pentaho metadata in order to understand the schema of the underlying database, and then fire queries at the database directly or via Pentaho?

I would like to know whether this is feasible.

Regards,
Vijay Raajaa G S

↧

Excel Output / Writer Append Option

March 6, 2014, 2:03 am

Hi,

My question is extremely simple, but I can't get it to work:
I have a transformation with an Excel output, and I want the output to be appended to a file, so that if I run the transformation several times, all the results are appended.
I tick the append option, but it doesn't work; I always see only the results of the last run.
I tried using both Microsoft Excel Output and Excel Writer.

I have attached a simple example transformation.

Thanks a lot

Attached Files

example.ktr (24.7 KB)

↧

excel can access mondrian cube

March 6, 2014, 2:12 am

I found this video on the internet ( http://www.youtube.com/watch?v=8eq_dE7_O3s ) which shows how to connect Excel to a Mondrian cube via XML/A.
A comment from Jason Chu says that the Mondrian code has been modified to implement this functionality and is available to be merged into the Mondrian trunk.
If anyone knows whether this patch will be accepted, or how to modify the Mondrian code, I would be very grateful for any information.
Thanks for any answer

↧

How to access hive table in pentaho kettle model

March 6, 2014, 2:33 am

Hi,
I want to create a Hive analyzer. For this I created a Hive connection using Hadoop Hive, and the connection test is successful, but when I click on a Hive table to define measures and dimensions it shows the following error:

Code:

org.pentaho.agilebi.modeler.ModelerException: org.pentaho.pms.core.exception.PentahoMetadataException: org.pentaho.di.core.exception.KettleDatabaseException: Couldn't get field info from [SELECT * FROM weblogs]
Method not supported
    at org.pentaho.agilebi.modeler.util.ModelerSourceUtil.generateDomain(ModelerSourceUtil.java:118)
    at org.pentaho.agilebi.modeler.util.TableModelerSource.generateDomain(TableModelerSource.java:50)
    at org.pentaho.agilebi.modeler.util.ModelerWorkspaceUtil.populateModelFromSource(ModelerWorkspaceUtil.java:26)
    at org.pentaho.agilebi.spoon.modeler.SpoonModelerController.editDataSource(SpoonModelerController.java:208)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:329)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:139)
    at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:123)
    at org.pentaho.ui.xul.swt.tags.SwtButton.access$500(SwtButton.java:26)
    at org.pentaho.ui.xul.swt.tags.SwtButton$4.widgetSelected(SwtButton.java:119)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1221)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7044)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:8304)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:580)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
Caused by: org.pentaho.pms.core.exception.PentahoMetadataException: org.pentaho.di.core.exception.KettleDatabaseException:
Couldn't get field info from [SELECT * FROM weblogs]
Method not supported
    at org.pentaho.metadata.automodel.AutoModeler.generateDomain(AutoModeler.java:127)
    at org.pentaho.agilebi.modeler.util.ModelerSourceUtil.generateDomain(ModelerSourceUtil.java:81)
    ... 26 more
Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
Couldn't get field info from [SELECT * FROM weblogs]
Method not supported
    at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2330)
    at org.pentaho.di.core.database.Database.getQueryFields(Database.java:2242)
    at org.pentaho.di.core.database.Database.getQueryFields(Database.java:1939)
    at org.pentaho.di.core.database.Database.getTableFields(Database.java:1934)
    at org.pentaho.metadata.automodel.PhysicalTableImporter.importTableDefinition(PhysicalTableImporter.java:69)
    at org.pentaho.metadata.automodel.AutoModeler.generateDomain(AutoModeler.java:114)
    ... 27 more
Caused by: java.sql.SQLException: Method not supported
    at org.apache.hadoop.hive.jdbc.HivePreparedStatement.getMetaData(HivePreparedStatement.java:250)
    at org.pentaho.di.core.database.Database.getQueryFieldsFallback(Database.java:2323)
    ... 32 more

For this I used Kettle - Spoon Stable Release - 4.4.0 CE.

↧