Channel: Pentaho Community Forums

J48 Decision Tree Understanding

Hi Guys,

I am extremely new to WEKA and statistical analysis. My background is in accounting, so I don't understand statistics at a very deep level and am trying to learn. I have been watching the WEKA courses (on the WEKA website) related to supervised learning and creating some J48 trees using their data sets. I was starting to get the hang of creating the trees; however, I felt that I needed to create my own data set and experiment with it in order to understand further. So I made up a small set of data and placed it into a .csv file:
Number 1,Number 2,Buy
1,2,yes
2,1,no
1,3,yes
5,6,yes
6,2,no
3,4,yes
4,5,yes
6,1,no

Then I made a J48 tree based on this. The rule I got was: if Number 2 <= 2 then no, otherwise yes. I wanted the J48 to do more than that. For example, I specifically built my data set so that if Number 1 was less than Number 2, the answer would be yes, otherwise no.

How does J48 work, and what is it doing with my data? Can anyone explain this? I apologize if I am asking a very basic question; I am new to statistics, and the Wikipedia pages are extremely complicated to understand. Also, the WEKA videos assume you already understand what J48 is doing to your data set, which is why I built my own small data set to experiment with. However, I don't think I truly get it.

If someone can explain J48 on the data set above, I would be very thankful. I appreciate all the help provided in my quest to learn data analysis. Thanks :).
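
For reference, J48 is Weka's implementation of the C4.5 decision tree learner: at each node it splits on a single attribute, and for numeric attributes it compares that one attribute against a constant threshold. That is why it can find a rule like "Number 2 <= 2" but can never directly express a relation between two columns such as "Number 1 < Number 2"; a derived attribute (e.g. the difference of the two numbers) would be needed for that. Below is a minimal sketch of building the same tree programmatically with Weka's Java API, assuming the CSV above is saved as buy.csv:

Code:

import java.io.File;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class J48Demo {
    public static void main(String[] args) throws Exception {
        // Load the toy data set from the CSV file shown above.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("buy.csv"));
        Instances data = loader.getDataSet();
        // The last column ("Buy") is the class to predict.
        data.setClassIndex(data.numAttributes() - 1);

        // Build the C4.5/J48 tree with default settings, as the Explorer does.
        J48 tree = new J48();
        tree.buildClassifier(data);

        // Prints the same tree/rules the Explorer shows.
        System.out.println(tree);
    }
}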

Red Hat patching, Pentaho won't start

Patching our Linux servers usually isn't an issue. This month, however, after the Pentaho servers were patched, Pentaho wouldn't restart and was throwing errors referring to Python. Has anyone else seen this lately? If not, I will log a case with support.

Combining Rows

Hello,

I have created a primary key to indicate which rows belong to a specific group. I would now like to merge those rows into one, but I am running into difficulty. Some of the fields contain the same values while others differ. Here is a sample of the data I currently have; the primary key is RecordGr. I would like all rows sharing the same RecordGr value to be combined into one long row.

RecordGr HAR Field_000 Field_001 Field_002 Field_003 Field_004 Field_005 Field_006
101 7901 CAS PR 2 372.36 3 150
101 7901 CAS PR 2 17.44
101 7901 CAS PR 2 2.6
101 7901 CAS PR 2 34.6
101 7901 CAS PR 2 11.23
101 7901 CAS PR 2 7.53
101 7901 CAS PR 2 21.46
101 7901 CAS PR 2 0.67
101 7901 CAS PR 2 13.31
101 7901 CAS PR 2 115.16
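
For context, here is a minimal Java sketch of the grouping logic I'm after (in PDI itself this would presumably be done with a Group by or Row denormaliser step); the key and values are taken from the sample above:

Code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineRows {
    public static void main(String[] args) {
        // Toy input: RecordGr key plus the varying amount from each row above.
        List<String[]> rows = Arrays.asList(
                new String[] {"101", "372.36"},
                new String[] {"101", "17.44"},
                new String[] {"101", "2.6"});

        // Collect every value belonging to the same RecordGr into one long row.
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (String[] row : rows) {
            merged.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }

        // Prints: 101 372.36 17.44 2.6
        merged.forEach((key, vals) ->
                System.out.println(key + " " + String.join(" ", vals)));
    }
}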

Pentaho World 2015 - October 14-15

Hello everyone

PentahoWorld 2015 is almost here! October 14-15 we'll be in Orlando, Florida, learning best practices, use cases, and innovations from Pentaho experts, users, advocates, and partners. You can explore the keynotes, session agenda, and interactive trainings, as well as check out the highlights from PentahoWorld 2014 here.

I'd also highlight that you can get all your technical "how-to" questions answered by the team that developed the products and chat with key members of Pentaho’s development team about customization, embedding, development best practices and pro-tips, the latest techniques and sources for blending, next-generation plug-in development, and much more!


If you're interested, be sure to register before August 31st so you can take advantage of early bird pricing and save up to 40% on registration and training. If you're thinking about sending a few people, ask us about group discounts!

And btw, I'll follow up very soon with a blog post about the equally amazing PCM15, happening in London on November 6-7... :)




Reading data from a webservice URL API in Pentaho

Dear all,

I have a task where I am supposed to use a REST Client step with a webservice URL to get the data of interest.

This is the URL link https://reporting.linkshare.com/en/r...=xyxyxyxyxyxyx

Because of confidentiality issues, I replaced the token with something else.

I have sample CSV report data that is extracted when I run it manually: https://www.dropbox.com/s/i9ojz0mfkb...t.csv.txt?dl=0

I kindly need some hints on automating this since I haven't used a Rest Client step before.
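
For illustration, the REST Client step essentially performs a plain HTTP GET against that URL; here is a minimal Java sketch of the same request (the URL and token below are placeholders, not the real ones):

Code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReportFetch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; the real one carries the report path and token.
        URL url = new URL("https://reporting.linkshare.com/en/...?token=xyz");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // The response body is the same CSV the manual run produces.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}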

Any suggestions or ideas are highly appreciated.

Thanks,

Ron

Execute command in Remote windows machine

Hi,

I have a requirement to invoke an app (through a Windows command) residing on a remote Windows machine. I have tried the following.

1. Wrapped the "Shell" step in a job and specified the remote machine as the slave server. But the remote machine does not have Pentaho installed, so I get an error that the slave server is not valid.

2. In the SSH step I set the "Working Directory" to the network directory on the remote machine where I have read and write permission (e.g. \\hostname\dir1\subdir\). But I am getting the error "\hostname\dir1\subdir\" invalid directory. I also tried creating a variable for the working directory, but that did not work.

3. Used psexec, but got an error that the command is not recognized as an internal or external command (see the sketch below).
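
For reference, a minimal Java sketch of invoking psexec directly (all paths below are hypothetical); the "not recognized as an internal or external command" error usually means psexec is not on the PATH, so referencing it by its full path is the first thing to try:

Code:

public class RemoteExec {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths: full path to PsExec.exe, target host, and command.
        ProcessBuilder pb = new ProcessBuilder(
                "C:\\tools\\PsExec.exe", "\\\\remotehost",
                "cmd", "/c", "C:\\apps\\myapp.exe");
        pb.inheritIO(); // show psexec output in this console
        int exit = pb.start().waitFor();
        System.out.println("psexec exit code: " + exit);
    }
}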

Can somebody please guide me on this?

Parameterized Dashboard Slow Query Performance Postgres 9.2.8

I'm having issues with a parameterized dashboard using sql over sqlJdbc as the data source. When the values are hard-coded into the data source query, the results are comparable to running the SQL in PSQL/PGAdmin (1-2 seconds). However, when I substitute in parameters provided by various selectors on the dashboard, query performance is greatly degraded (1+ minutes). Based on my research, Postgres used to have problems with parameterized queries in pre-9.2 releases, because it used a generic execution plan rather than creating a new execution plan for each new query. This looks like a very similar issue, but I'm using PostgreSQL 9.2.8, which should have this issue resolved. Any suggestions on how I might get a parameterized query running as efficiently as the hard-coded query would be greatly appreciated.
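
For reference, here is the distinction in question sketched with plain JDBC (table, column, and connection details below are hypothetical): a literal lets the planner choose a value-specific plan, while a bound parameter under a server-side prepare may get the generic plan described above.

Code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class ParamQueryDemo {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/dw", "user", "pass")) {
            // Hard-coded literal: the planner sees the value and can pick
            // the best plan for it (the fast case on the dashboard).
            try (Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery(
                         "SELECT count(*) FROM fact_sales WHERE region = 'EMEA'")) {
                rs.next();
            }
            // Bound parameter: equivalent query, but the plan may be generic,
            // which matches the slow behavior described above.
            try (PreparedStatement ps = c.prepareStatement(
                    "SELECT count(*) FROM fact_sales WHERE region = ?")) {
                ps.setString(1, "EMEA");
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                }
            }
        }
    }
}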

A navigation menu for dashboards

The Bootstrap framework used by Community Dashboards contains a "Navbar" component. It can be used to implement an elegant, centrally maintained menu system for dashboards (all of them or a subset) by defining the menu in one single JavaScript file and including that in the dashboards.

Here is my blog posting about implementing it:
http://datascientist.at/2015/08/navi...oards/#english

Error writing to log table in Hadoop

I created a log table for a transformation (using the Step selection in the logging panel) on a Hadoop system, and used the SQL generated in PDI; the CREATE TABLE statement worked fine, but it defined the LOG_DATE column as a string.

When I run the transformation, I get an error because apparently Kettle was trying to specify the LOG_DATE as a date field, not a string. Here is the error message.

2015/08/25 13:48:58 - Hive2_on_Data_Cluster - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Unable to write log record to log table [tmp_log_error]
2015/08/25 13:48:58 - Hive2_on_Data_Cluster - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
2015/08/25 13:48:58 - Hive2_on_Data_Cluster - offending row : [ID_BATCH Integer(8)], [CHANNEL_ID String(255)], [LOG_DATE Date], [TRANSNAME String(255)], [STEPNAME String(255)], [STEP_COPY Integer(3)], [LINES_READ Integer(18)], [LINES_WRITTEN Integer(18)], [LINES_UPDATED Integer(18)], [LINES_INPUT Integer(18)], [LINES_OUTPUT Integer(18)], [LINES_REJECTED Integer(18)], [ERRORS Integer(18)], [LOG_FIELD String(9999999)]

How can I resolve this error?

XBase input cyrillic

Hello.
When using the XBase input step with Cyrillic data, the text does not come out correctly. How do I change the encoding?

Opening a transformation task in Pentaho (Error reading object from XML)

Hi, all,

I constructed a transformation in Pentaho Spoon and saved it as a .ktr file, but an error occurs when I reopen it; the error messages are shown below.

To be more specific, this error happens when the transformation contains a "Hadoop file output" step, and it occurs on Ubuntu only; the same transformations work fine on Mac.

Did anyone encounter this problem and solve it before?


  • My Pentaho spoon edition: General Availability Release - 5.4.0.1-130
  • My OS: Ubuntu 14.04.3 LTS


Error Messages:

Error reading object from XML file

Unable to load step info from XML step node
org.pentaho.di.core.exception.KettleXMLException:
Unable to load step info from XML
at org.pentaho.commons.launcher.Launcher.main (Launcher.java:92)
at java.lang.reflect.Method.invoke (Method.java:606)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
at org.pentaho.di.ui.spoon.Spoon.main (Spoon.java:654)
at org.pentaho.di.ui.spoon.Spoon.start (Spoon.java:9190)
at org.pentaho.di.ui.spoon.Spoon.waitForDispose (Spoon.java:7939)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch (Spoon.java:1319)
at org.eclipse.swt.widgets.Display.readAndDispatch (null:-1)
at org.eclipse.swt.widgets.Display.runDeferredEvents (null:-1)
at org.eclipse.swt.widgets.Widget.sendEvent (null:-1)
at org.eclipse.swt.widgets.EventTable.sendEvent (null:-1)
at org.eclipse.jface.action.ActionContributionItem$5.handleEvent (ActionContributionItem.java:402)
at org.eclipse.jface.action.ActionContributionItem.access$2 (ActionContributionItem.java:490)
at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection (ActionContributionItem.java:545)
at org.eclipse.jface.action.Action.runWithEvent (Action.java:498)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem$1.run (JfaceMenuitem.java:106)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem.access$100 (JfaceMenuitem.java:43)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke (AbstractXulComponent.java:141)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke (AbstractXulComponent.java:157)
at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke (AbstractXulDomContainer.java:313)
at java.lang.reflect.Method.invoke (Method.java:606)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4159)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4222)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4550)
at org.pentaho.di.ui.spoon.TransFileListener.open (TransFileListener.java:51)
at org.pentaho.di.trans.TransMeta.loadXML (TransMeta.java:2977)
at org.pentaho.di.trans.step.StepMeta.<init> (StepMeta.java:307)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutputMeta.loadXML (TextFileOutputMeta.java:628)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutputMeta.readData (TextFileOutputMeta.java:693)
at org.pentaho.di.trans.steps.hadoopfileoutput.HadoopFileOutputMeta.loadSource (HadoopFileOutputMeta.java:97)


Unable to load step info from XML
at org.pentaho.commons.launcher.Launcher.main (Launcher.java:92)
at java.lang.reflect.Method.invoke (Method.java:606)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
at org.pentaho.di.ui.spoon.Spoon.main (Spoon.java:654)
at org.pentaho.di.ui.spoon.Spoon.start (Spoon.java:9190)
at org.pentaho.di.ui.spoon.Spoon.waitForDispose (Spoon.java:7939)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch (Spoon.java:1319)
at org.eclipse.swt.widgets.Display.readAndDispatch (null:-1)
at org.eclipse.swt.widgets.Display.runDeferredEvents (null:-1)
at org.eclipse.swt.widgets.Widget.sendEvent (null:-1)
at org.eclipse.swt.widgets.EventTable.sendEvent (null:-1)
at org.eclipse.jface.action.ActionContributionItem$5.handleEvent (ActionContributionItem.java:402)
at org.eclipse.jface.action.ActionContributionItem.access$2 (ActionContributionItem.java:490)
at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection (ActionContributionItem.java:545)
at org.eclipse.jface.action.Action.runWithEvent (Action.java:498)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem$1.run (JfaceMenuitem.java:106)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem.access$100 (JfaceMenuitem.java:43)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke (AbstractXulComponent.java:141)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke (AbstractXulComponent.java:157)
at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke (AbstractXulDomContainer.java:313)
at java.lang.reflect.Method.invoke (Method.java:606)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4159)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4222)
at org.pentaho.di.ui.spoon.Spoon.openFile (Spoon.java:4550)
at org.pentaho.di.ui.spoon.TransFileListener.open (TransFileListener.java:51)
at org.pentaho.di.trans.TransMeta.loadXML (TransMeta.java:2977)
at org.pentaho.di.trans.step.StepMeta.<init> (StepMeta.java:307)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutputMeta.loadXML (TextFileOutputMeta.java:628)
at org.pentaho.di.trans.steps.textfileoutput.TextFileOutputMeta.readData (TextFileOutputMeta.java:693)
at org.pentaho.di.trans.steps.hadoopfileoutput.HadoopFileOutputMeta.loadSource (HadoopFileOutputMeta.java:97)

How to create a dimension member with name and key?

I couldn't find a way to create a member with a key, so I used reflection.


Code:

public static Member createHierarchyMember(Hierarchy hierarchy, String memberName, Long memberKey) {
    // The public API only accepts a name, so create the member first...
    Member member = hierarchy.createMember(null, hierarchy.getLevels()[0], memberName, null);
    try {
        // ...then set the private final "key" field on the underlying
        // RolapMemberBase via reflection.
        Field fieldKey = RolapMemberBase.class.getDeclaredField("key");
        fieldKey.setAccessible(true);
        // Clear the FINAL modifier so the field becomes writable.
        Field modifierField = Field.class.getDeclaredField("modifiers");
        modifierField.setAccessible(true);
        modifierField.setInt(fieldKey, fieldKey.getModifiers() & ~Modifier.FINAL);
        fieldKey.set(((DelegatingRolapMember) member).member, memberKey.intValue());
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return member;
}




... cacheControl.createAddCommand(newMember)


How can I create a member with both a name and a key using Mondrian's API?

Mondrian version 3.8.0.0-209

[Custom parameter]

Hello,

I am using BI Server 5.4 and I am having some trouble with a custom parameter.

I use this JavaScript code:

Code:

function f(){
    alert('Hello');
    console.log("test");
    return "test";
}

But nothing appears when I run the dashboard.

Forecasting with time series analysis environment (WEKA)

Hello

I am using WEKA for sales forecasting. I saw how to forecast for a class, e.g. Fortified (a type of wine in the example); it forecasts the amount of wine that will be sold in the next year based on various past data.

I want to forecast sales for several products with specific characteristics (numerical and nominal). My training data has several other products with similar characteristics, and I will base my forecast on those similar products.

I would like to ask how to input the data set containing the characteristics of the products to be forecasted.
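
In case it helps frame the question, here is a minimal sketch based on the published Weka time series forecasting API example (assuming the wine data with a Date timestamp and a Fortified target; the file name is an assumption):

Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.List;
import weka.classifiers.evaluation.NumericPrediction;
import weka.classifiers.functions.GaussianProcesses;
import weka.classifiers.timeseries.WekaForecaster;
import weka.core.Instances;

public class ForecastDemo {
    public static void main(String[] args) throws Exception {
        // Assumed path to the wine data from the Weka tutorial.
        Instances wine = new Instances(
                new BufferedReader(new FileReader("wine.arff")));

        WekaForecaster forecaster = new WekaForecaster();
        forecaster.setFieldsToForecast("Fortified");          // target column
        forecaster.setBaseForecaster(new GaussianProcesses());
        forecaster.getTSLagMaker().setTimeStampField("Date"); // time column
        forecaster.getTSLagMaker().setMinLag(1);
        forecaster.getTSLagMaker().setMaxLag(12);

        forecaster.buildForecaster(wine, System.out);
        forecaster.primeForecaster(wine); // history to forecast from

        // Forecast the next 12 steps for the target field.
        List<List<NumericPrediction>> forecast =
                forecaster.forecast(12, System.out);
        for (List<NumericPrediction> step : forecast) {
            System.out.println(step.get(0).predicted());
        }
    }
}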

Thanks

Forecasting with WEKA

Hello,

Is there any difference between forecasting with the time series analysis environment and the WEKA forecasting plugin?

Thanks !

http://jira.pentaho.com/browse/PDI-13232

[DASHBOARD] responsive

Slow transformation initialization (5 minutes??)

When running one of my transformations, I happen to get a VERY slow initialization (~5 minutes) before the transformation actually begins to process rows.

This is only one of my transformations, all the rest run perfectly fine and initialize in a completely reasonable timeframe. The database connections used in this problematic transformation are the same ones used in my non-problematic transformations.


Code:

....
2015/08/26 12:05:58 - populate_fact_employee_status_change - Step [Add constants _true.0] initialized flawlessly.
2015/08/26 12:05:58 - populate_fact_employee_status_change - Step [Database lookup: dim_employee 2.0] initialized flawlessly.
2015/08/26 12:05:58 - populate_fact_employee_status_change - Step [Table output: fact_employee_summary.0] initialized flawlessly.
2015/08/26 12:11:46 - Generate Rows.0 - Starting to run...
2015/08/26 12:11:46 - Bypass Step Check.0 - Starting to run...
2015/08/26 12:11:46 - Dummy (do nothing).0 - Starting to run...
2015/08/26 12:11:46 - Generate Rows.0 - Signaling 'output done' to 1 output rowsets.
.....

Any help in resolving this would be much appreciated!

Thank you

Execute for every input row -> Not execute for every input row

Hello.

I have some dynamic SQL that I needed to parameterize, so I created a job which I "Execute for every input row". I am trying to take the rows output by each iteration of that job and merge them back into one stream. Could someone point me in the right direction?

Thanks.

Copy Rows To Result for 2 levels

Hi,

I have a Job that follows the structure below.

- Job1
--- Job2
------ Transformation1 (Copy Rows to Result)
- Job3 (For Every Row, execute Transformation2)
--- Transformation2 (Get Rows from Result)

This does not seem to work.

However, if instead I do the following, it works just fine.
- Job1
--- Transformation1 (Copy Rows to Result)
- Job2 (For Every Row, execute Transformation2)
--- Transformation2 (Get Rows from Result)

I think that in the former structure, the results that are copied are "consumed" by Job2, which is why they never get passed to Job3. Essentially, what I need to do is put a "Get Rows from Result" and a "Copy Rows To Result" step in Job2 so that the rows can then be consumed by Job3.

Any suggestions on how I might be able to achieve this?