Channel: Pentaho Community Forums

"RemoveType" Filter when testing model in commandLine

Hi,

I'm using this command line to train a model:

Code:

# train.arff: 4 attributes in total; attribute 1 is a string, which is
# printed in the results but ignored during training.
java weka.classifiers.meta.FilteredClassifier \
  -t inputs/train.arff \
  -x 3 -s ${i} -p 1 -distribution \
  -d results/file.model \
  -F "weka.filters.MultiFilter \
        -F \"weka.filters.unsupervised.attribute.RemoveType -T string\" \
        -F \"weka.filters.supervised.instance.SMOTE -C 0 -K 5 -P $2 -S 1\"" \
  -W weka.classifiers.functions.MultilayerPerceptron -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 0 > results/output.txt

Now I want to use this model to test new datasets (which also have 4 attributes, the first one being a string):


Code:

java weka.classifiers.meta.FilteredClassifier \
      -l results/file.model \
      -T inputs/test.arff \
      -p 1 -distribution \
      -F "weka.filters.unsupervised.attribute.RemoveType -T string" \
      -W weka.classifiers.functions.MultilayerPerceptron -- -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 0 > results/outtest.txt

But I get an "illegal options" error. What should this second command line look like, so that I can keep my string attribute in the ARFF?
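
A minimal sketch of what is probably intended here (an assumption based on how Weka's command line handles serialized models, not something confirmed in this thread): a model loaded with -l already carries its filter and classifier configuration, so repeating -F and -W is what triggers the illegal-options error. Dropping them leaves:

Code:

java weka.classifiers.meta.FilteredClassifier \
      -l results/file.model \
      -T inputs/test.arff \
      -p 1 -distribution > results/outtest.txt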

Thanks in advance.

Testing a model that has been trained after PCA

Hi,

I have a question regarding dimensionality reduction using PCA.

If I have a training set (train.arff, 10 attributes), I perform a PCA and save my data with respect to the new transformed variables (say I choose the first two attributes, combinations of the original ones, that capture most of the variance), and call this transformed training set "trainset-afterPCA.arff".
Now I train a model using this file (which only has 2 attributes), and save it.

Now suppose I have a new dataset, constructed with the original 10 attributes, and I want to use the model I built before to classify this new data. How should I proceed?
1. If I just try to test on this new dataset, train and test aren't compatible, right?
2. If I run PCA on the test set, the resulting new attributes won't be the same as the ones obtained on the training set.

What should I do?
:(
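
One common way out (a sketch using Weka's standard batch-filtering flags; the filter class and file names are assumptions): apply the PCA filter in batch mode, so the projection fitted on the training set is reused unchanged on the test set:

Code:

java weka.filters.unsupervised.attribute.PrincipalComponents -b \
      -i train.arff -o trainset-afterPCA.arff \
      -r test.arff -s testset-afterPCA.arff

Alternatively, wrapping PrincipalComponents inside weka.classifiers.meta.FilteredClassifier lets the saved model apply the same projection to any incoming data automatically.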

Thank you a lot for this useful forum

Changing sub-report type (inline or banded)

Hi,

I'm just starting to get into using the Report Designer a bit more, and I have a query regarding changing sub-report types, as I have created a few reports where I may have chosen the wrong type when inserting some of the sub-reports.

Is it possible to change the type from inline to banded and vice versa? If so how?

Also, is there any way to get to the source of the report without having to unzip the prpt and zip it back up afterwards?

Thankful for any help

Gaz

Help on MDX query

Hi, I have the following query:

Code:

select NON EMPTY {[Measures].[count_X]} ON 0,
NON EMPTY {[DIM_DAYS].Children} ON 1
from [Cube]

with the result set:

Code:

Axis #0:
{[DIM_STATO_RECORD].[A], [DIM_LOCALITA.LOCALITA].[PI], [DIM_ANNO_PRATICA].[2013]}
Axis #1:
{[Measures].[COUNT_PRATICHE]}
Axis #2:
{[DIM_TEMPO_RILASCIO].[#null]}
{[DIM_TEMPO_RILASCIO].[0]}
{[DIM_TEMPO_RILASCIO].[1]}
{[DIM_TEMPO_RILASCIO].[2]}
{[DIM_TEMPO_RILASCIO].[3]}
{[DIM_TEMPO_RILASCIO].[4]}
{[DIM_TEMPO_RILASCIO].[5]}
{[DIM_TEMPO_RILASCIO].[6]}
{[DIM_TEMPO_RILASCIO].[7]}
{[DIM_TEMPO_RILASCIO].[8]}
{[DIM_TEMPO_RILASCIO].[9]}
{[DIM_TEMPO_RILASCIO].[10]}
{[DIM_TEMPO_RILASCIO].[11]}
{[DIM_TEMPO_RILASCIO].[12]}
{[DIM_TEMPO_RILASCIO].[13]}
{[DIM_TEMPO_RILASCIO].[14]}
{[DIM_TEMPO_RILASCIO].[15]}
{[DIM_TEMPO_RILASCIO].[16]}
{[DIM_TEMPO_RILASCIO].[17]}
{[DIM_TEMPO_RILASCIO].[18]}
{[DIM_TEMPO_RILASCIO].[19]}
{[DIM_TEMPO_RILASCIO].[20]}
{[DIM_TEMPO_RILASCIO].[21]}
{[DIM_TEMPO_RILASCIO].[22]}
{[DIM_TEMPO_RILASCIO].[23]}
{[DIM_TEMPO_RILASCIO].[24]}
{[DIM_TEMPO_RILASCIO].[25]}
{[DIM_TEMPO_RILASCIO].[26]}
{[DIM_TEMPO_RILASCIO].[27]}
{[DIM_TEMPO_RILASCIO].[28]}
{[DIM_TEMPO_RILASCIO].[29]}
{[DIM_TEMPO_RILASCIO].[30]}
{[DIM_TEMPO_RILASCIO].[31]}
{[DIM_TEMPO_RILASCIO].[32]}
{[DIM_TEMPO_RILASCIO].[33]}
{[DIM_TEMPO_RILASCIO].[34]}
{[DIM_TEMPO_RILASCIO].[35]}
{[DIM_TEMPO_RILASCIO].[36]}
{[DIM_TEMPO_RILASCIO].[37]}
{[DIM_TEMPO_RILASCIO].[38]}
{[DIM_TEMPO_RILASCIO].[39]}
{[DIM_TEMPO_RILASCIO].[40]}
{[DIM_TEMPO_RILASCIO].[41]}
{[DIM_TEMPO_RILASCIO].[42]}
{[DIM_TEMPO_RILASCIO].[43]}
{[DIM_TEMPO_RILASCIO].[44]}
{[DIM_TEMPO_RILASCIO].[45]}
{[DIM_TEMPO_RILASCIO].[46]}
{[DIM_TEMPO_RILASCIO].[47]}
{[DIM_TEMPO_RILASCIO].[49]}
{[DIM_TEMPO_RILASCIO].[50]}
{[DIM_TEMPO_RILASCIO].[51]}
{[DIM_TEMPO_RILASCIO].[52]}
{[DIM_TEMPO_RILASCIO].[53]}
{[DIM_TEMPO_RILASCIO].[54]}
{[DIM_TEMPO_RILASCIO].[56]}
{[DIM_TEMPO_RILASCIO].[57]}
{[DIM_TEMPO_RILASCIO].[58]}
{[DIM_TEMPO_RILASCIO].[59]}
{[DIM_TEMPO_RILASCIO].[60]}
{[DIM_TEMPO_RILASCIO].[61]}
{[DIM_TEMPO_RILASCIO].[62]}
{[DIM_TEMPO_RILASCIO].[63]}
{[DIM_TEMPO_RILASCIO].[64]}
{[DIM_TEMPO_RILASCIO].[65]}
{[DIM_TEMPO_RILASCIO].[70]}
{[DIM_TEMPO_RILASCIO].[71]}
{[DIM_TEMPO_RILASCIO].[72]}
{[DIM_TEMPO_RILASCIO].[73]}
{[DIM_TEMPO_RILASCIO].[74]}
{[DIM_TEMPO_RILASCIO].[75]}
{[DIM_TEMPO_RILASCIO].[76]}
{[DIM_TEMPO_RILASCIO].[77]}
{[DIM_TEMPO_RILASCIO].[78]}
{[DIM_TEMPO_RILASCIO].[79]}
{[DIM_TEMPO_RILASCIO].[80]}
{[DIM_TEMPO_RILASCIO].[81]}
{[DIM_TEMPO_RILASCIO].[82]}
{[DIM_TEMPO_RILASCIO].[83]}
{[DIM_TEMPO_RILASCIO].[84]}
{[DIM_TEMPO_RILASCIO].[85]}
{[DIM_TEMPO_RILASCIO].[86]}
{[DIM_TEMPO_RILASCIO].[87]}
{[DIM_TEMPO_RILASCIO].[90]}
{[DIM_TEMPO_RILASCIO].[91]}
{[DIM_TEMPO_RILASCIO].[92]}
{[DIM_TEMPO_RILASCIO].[95]}
{[DIM_TEMPO_RILASCIO].[97]}
{[DIM_TEMPO_RILASCIO].[98]}
{[DIM_TEMPO_RILASCIO].[99]}
{[DIM_TEMPO_RILASCIO].[101]}
{[DIM_TEMPO_RILASCIO].[104]}
{[DIM_TEMPO_RILASCIO].[105]}
{[DIM_TEMPO_RILASCIO].[106]}
{[DIM_TEMPO_RILASCIO].[107]}
{[DIM_TEMPO_RILASCIO].[111]}
{[DIM_TEMPO_RILASCIO].[112]}
{[DIM_TEMPO_RILASCIO].[114]}
{[DIM_TEMPO_RILASCIO].[120]}
{[DIM_TEMPO_RILASCIO].[124]}
{[DIM_TEMPO_RILASCIO].[132]}
{[DIM_TEMPO_RILASCIO].[133]}
{[DIM_TEMPO_RILASCIO].[134]}
{[DIM_TEMPO_RILASCIO].[135]}
{[DIM_TEMPO_RILASCIO].[137]}
{[DIM_TEMPO_RILASCIO].[142]}
{[DIM_TEMPO_RILASCIO].[147]}
{[DIM_TEMPO_RILASCIO].[149]}
{[DIM_TEMPO_RILASCIO].[173]}
Row #0: 468
Row #1: 69
Row #2: 70
Row #3: 53
Row #4: 90
Row #5: 77
Row #6: 78
Row #7: 119
Row #8: 159
Row #9: 155
Row #10: 86
Row #11: 47
Row #12: 66
Row #13: 34
Row #14: 61
Row #15: 42
Row #16: 46
Row #17: 30
Row #18: 24
Row #19: 21
Row #20: 9
Row #21: 29
Row #22: 27
Row #23: 35
Row #24: 34
Row #25: 21
Row #26: 18
Row #27: 22
Row #28: 44
Row #29: 36
Row #30: 35
Row #31: 28
Row #32: 12
Row #33: 8
Row #34: 18
Row #35: 16
Row #36: 41
Row #37: 17
Row #38: 17
Row #39: 18
Row #40: 7
Row #41: 12
Row #42: 12
Row #43: 27
Row #44: 19
Row #45: 9
Row #46: 16
Row #47: 6
Row #48: 2
Row #49: 5
Row #50: 5
Row #51: 5
Row #52: 1
Row #53: 1
Row #54: 5
Row #55: 4
Row #56: 5
Row #57: 3
Row #58: 1
Row #59: 1
Row #60: 4
Row #61: 7
Row #62: 2
Row #63: 3
Row #64: 1
Row #65: 6
Row #66: 2
Row #67: 1
Row #68: 1
Row #69: 1
Row #70: 1
Row #71: 3
Row #72: 3
Row #73: 1
Row #74: 2
Row #75: 1
Row #76: 1
Row #77: 2
Row #78: 3
Row #79: 3
Row #80: 2
Row #81: 1
Row #82: 2
Row #83: 3
Row #84: 1
Row #85: 1
Row #86: 1
Row #87: 3
Row #88: 1
Row #89: 1
Row #90: 1
Row #91: 2
Row #92: 1
Row #93: 1
Row #94: 1
Row #95: 2
Row #96: 1
Row #97: 1
Row #98: 2
Row #99: 1
Row #100: 1
Row #101: 1
Row #102: 1
Row #103: 2
Row #104: 1
Row #105: 1
Row #106: 1
Row #107: 1
Row #108: 1

I'd like to:
1) replace the #null label with NOT SPECIFIED
2) group the days into ranges such as [1 - 30] and [31 - 60]

Could someone help me?
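
A sketch for the ranges (the hierarchy and member names are taken from the result set above; everything else is an assumption, including that the day members sort correctly for the range operator). Calculated members built with Aggregate can stand in for the raw day members, and a calculated member can also re-label #null:

Code:

WITH
MEMBER [DIM_TEMPO_RILASCIO].[NOT SPECIFIED] AS [DIM_TEMPO_RILASCIO].[#null]
MEMBER [DIM_TEMPO_RILASCIO].[1 - 30] AS
    Aggregate([DIM_TEMPO_RILASCIO].[1] : [DIM_TEMPO_RILASCIO].[30])
MEMBER [DIM_TEMPO_RILASCIO].[31 - 60] AS
    Aggregate([DIM_TEMPO_RILASCIO].[31] : [DIM_TEMPO_RILASCIO].[60])
SELECT NON EMPTY {[Measures].[count_X]} ON 0,
    {[DIM_TEMPO_RILASCIO].[NOT SPECIFIED],
     [DIM_TEMPO_RILASCIO].[1 - 30],
     [DIM_TEMPO_RILASCIO].[31 - 60]} ON 1
FROM [Cube]

For a global rename of #null, Mondrian also has the mondrian.olap.NullMemberRepresentation property, if changing server configuration is acceptable.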

EST to GMT Conversion

Hi,

I am trying to convert a DateTime column from EST to GMT (Greenwich Mean Time).
How can I do this?
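
A naive sketch for a Modified Java Script Value step (the field name est_date is an assumption, and a fixed 5-hour offset ignores daylight saving, when the offset is only 4 hours):

Code:

// EST is UTC-5, so add 5 hours to reach GMT.
// dateAdd is a built-in helper of the Modified Java Script Value step.
var gmt_date = dateAdd(est_date, "h", 5);

For a DST-aware conversion, a User Defined Java Class step using java.util.TimeZone and Calendar is the safer route.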

Thanks for your help.

Change named parameter value

I need to change my named parameter during execution. How can I do this?

At the moment I have a named parameter with default value, say

my_named_parameter = 300

and I use it in a table input step:

SELECT * FROM MyTable WHERE id < ${my_named_parameter}


What about doing this:

1 - read an excel file with a single column
2 - check the maximum value in that column, say my_new_value
3 - replace it in the named parameter
4 - so that the Table Input step stays the same but I get a dynamic value every time I execute the transformation

No problem with steps 1, 2 and 4...
But what about the third one?
How do I say ${my_named_parameter} = my_new_value? :confused:
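
A common pattern (a sketch; the key assumption to verify is that variables set in a transformation only become visible to subsequent transformations in the parent job, so the Table Input must run in a second transformation): compute the maximum in the first transformation and push it into the variable there:

Code:

// Modified Java Script Value, first transformation.
// my_new_value is the field holding the column maximum.
setVariable("my_named_parameter", my_new_value, "r"); // "r" = root job scope

The script-free equivalent is a Group By step (Maximum aggregate) followed by a Set Variables step.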

CST 'Access Denied'

Hello! I've installed CST as specified here, I've thoroughly checked the permissions of the folders involved, and I still get an "Access Denied" error when the tab opens up immediately after login. Does anybody know what might be happening? Thanks in advance.

How to write the HQL to retrieve the records for one specific file for the same table

Hi, Hive experts,

When I import data into Hive, I use the component "Hadoop File Output" to load the data into hdfs://cloudera:cloudera@135.252.31.26:8020/user/hive/warehouse/user/abc.
Note: 1. The option "include date in the file name?" is set to yes.
2. user is the table name that I created in Hive.

So when I run this *.ktr file every day, I can see that a new file like abc_130723 is generated in /user/hive/warehouse/user/ (checked via "hadoop fs -ls /user/hive/warehouse/user"), and the records from all the abc* files are shown when running "select * from user".
Now I only want to read the records for one day, e.g. those in abc_130724. Is that doable? Can you share how to write the HQL?
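
One possibility (a sketch relying on Hive's INPUT__FILE__NAME virtual column, which records the source file of each row; the table and file names are taken from the post):

Code:

SELECT *
FROM user
WHERE INPUT__FILE__NAME LIKE '%abc_130724%';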

Thanks/Maria

Where CDA cached queries are written?

Hi all,
I would like to know where the CDA cached queries are stored (in a file, in a DB, etc.). I'd like to avoid caching my queries from the CDA cache manager because they have too many parameters, so I'm looking for a faster way. :)

Thanks in advance.
Marco

TableComponent drill down

Hi, I need your help.

I would like to know how to implement a CDF table component over a YQM (Year-Quarter-Month) hierarchy, so that it first shows the total for the year and, on click, drills down to the four quarters and then to the months.

I know how to pass the click event and the listener, but I don't know how to update the same table with the additional column for the deeper level and, of course, the new rows.

I hope someone could help me.

Thanks.

Replacing variable in Table Input in SELECT statement

I am trying to build a transformation that takes a table input for an ID and uses that ID to run a lookup on another DB. Unfortunately, the result table on the 2nd DB doesn't contain the ID, so I want to add a line such as

"123" as id

in the SELECT statement of the 2nd Table Input. Currently my query looks something like this:

select ? as id, ip_address, date
from table
where name = (select name
from table2
where id = ?
)

When I try to run this in Pentaho I get an error; however, if I change it to

select ip_address, date
from table
where name = (select name
from table2
where id = ?
)

then it works just fine. I made sure I am passing the correct number of variables to replace, but for some reason I can't replace variables in the SELECT list. How would I go about doing this?
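
One workaround (a sketch; it assumes the value is available as a Kettle variable, e.g. set by a Group By + Set Variables pair in a prior transformation, and that "Replace variables in script?" is ticked on the Table Input step) is to use ${} substitution for the SELECT-list literal and keep ? only in the WHERE clause:

Code:

SELECT '${ID_VALUE}' AS id, ip_address, date
FROM table
WHERE name = (SELECT name
              FROM table2
              WHERE id = ?)

${ID_VALUE} is a hypothetical variable name. The ? placeholders are bound as JDBC parameters, and some drivers reject a bare ? in the SELECT list because its type cannot be inferred, which would explain the error.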

New Weka 3.6.10 and 3.7.10 releases

Hi everyone!

New versions of Weka are available for download from the Weka homepage:

* Weka 3.6.10 - stable book 3rd edition version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.7.0_25, Win64 installer, Win64 installer incl. 64 bit JRE 1.7.0_25 and Mac OS X application (both Oracle and Apple JVM versions).

* Weka 3.7.10 - development version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.7.0_25, Win64 installer, Win64 installer incl. 64 bit JRE 1.7.0_25 and Mac OS X application (both Oracle and Apple JVM versions).

Both versions contain a significant number of bug fixes, so it is recommended to upgrade. Stable Weka 3.6 receives bug fixes only; the development version receives bug fixes and new features.

Weka homepage:
http://www.cs.waikato.ac.nz/~ml/weka/

Pentaho data mining community documentation:
http://wiki.pentaho.com/display/Pent...+Documentation

Packages for Weka>=3.7.2 can be browsed online at:
http://weka.sourceforge.net/packageMetaData/

The Pentaho Weka micro site at http://weka.pentaho.com/ will be updated to reflect the new releases soon.

Note: It might take a while before Sourceforge.net has propagated all the files to its mirrors.


What's new in 3.7.10?

Some highlights
---------------

In core weka:

* HoeffdingTree. Ported from the MOA implementation to a Weka classifier
* MergeInfrequentNominalValues filter
* MergeNominalValues filter. Uses a CHAID-style merging routine
* Zoom facility in the Knowledge Flow
* Epsilon-insensitive and Huber loss functions in SGD
* More CSVLoader improvements
* Class specific IR metric based evaluation in WrapperSubsetEval
* GainRatioAttributeEval now supports instance weights
* New command line option to force batch training mode when the classifier is an incremental one
* LinearRegression is now faster and more memory efficient thanks to a contribution from Sean Daugherty
* CfsSubsetEval can now use multiple CPUs/cores to pre-compute the correlation matrix (speeds up backward searches)
* GreedyStepwise can now evaluate multiple subsets in parallel

In packages:

* New kernelLogisticRegression package
* New supervisedAttributeScaling package
* New clojureClassifier package
* localOutlierFactor now includes a wrapper classifier that uses the LOF filter
* scatterPlot3D now includes new Java3D libraries for all major platforms
* New IWSS (Incremental Wrapper Subset Selection) package contributed by Pablo Bermejo
* New MODLEM package (rough set theory based rule induction) contributed by Szymon Wojciechowski

As usual, for a complete list of changes refer to the changelogs.

Cheers,
The Weka Team

How to Export ccc2 charts to PDFs in cdf

Hi,

How can I export CCC2 charts to PDF in a CDF dashboard? Please, can you help with this?



Thanks,

Savan.

Big Number null handling error

Hi All,

I'm facing a null-handling issue. My input has a BigNumber column; if it is null or 0, I want to set the value to -99999.

For this I am using a User Defined Java Class step, where I'm facing the error below:
QUOTE_KEY BigNumber : There was a data type error: the data type of java.lang.Integer object [-99999] does not correspond to value meta [BigNumber]

Code:
java.math.BigDecimal v = get(Fields.In, "QUOTE_KEY").getBigNumber(r);
// Null or zero becomes the sentinel. The field's meta is BigNumber, so the
// value must be a java.math.BigDecimal; an Integer causes the error above.
if (v == null || v.signum() == 0)
{
    get(Fields.Out, "QUOTE_KEY").setValue(r, new java.math.BigDecimal(-99999));
}


The same logic works for String fields. Suggestions are welcome.. :)

Error while connecting Remedy AR System

Hi

While creating a connection to the Remedy AR System I am getting the error below. Please help in resolving it. Also, the only access types available for connecting to the Remedy AR System in Pentaho are "ODBC" and "JNDI". Can't we get JDBC by adding a compatible driver?

Error connecting to database [Remedy] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database

Error connecting to database: (using class sun.jdbc.odbc.JdbcOdbcDriver)
[Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application


org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database

Error connecting to database: (using class sun.jdbc.odbc.JdbcOdbcDriver)
[Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application


at org.pentaho.di.core.database.Database.normalConnect(Database.java:366)
at org.pentaho.di.core.database.Database.connect(Database.java:315)
at org.pentaho.di.core.database.Database.connect(Database.java:277)
at org.pentaho.di.core.database.Database.connect(Database.java:267)
at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:86)
at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2469)
at org.pentaho.di.ui.core.database.dialog.DatabaseDialog.test(DatabaseDialog.java:120)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizardPage2.test(CreateDatabaseWizardPage2.java:167)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizardPage2$3.widgetSelected(CreateDatabaseWizardPage2.java:156)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
at org.eclipse.jface.window.Window.open(Window.java:796)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizard.createAndRunDatabaseWizard(CreateDatabaseWizard.java:115)
at org.pentaho.di.ui.spoon.Spoon.createDatabaseWizard(Spoon.java:6706)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:329)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:139)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:123)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem.access$100(JfaceMenuitem.java:26)
at org.pentaho.ui.xul.jface.tags.JfaceMenuitem$1.run(JfaceMenuitem.java:85)
at org.eclipse.jface.action.Action.runWithEvent(Action.java:498)
at org.eclipse.jface.action.ActionContributionItem.handleWidgetSelection(ActionContributionItem.java:545)
at org.eclipse.jface.action.ActionContributionItem.access$2(ActionContributionItem.java:490)
at org.eclipse.jface.action.ActionContributionItem$5.handleEvent(ActionContributionItem.java:402)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1219)
at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7049)
at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:8309)
at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:578)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
Error connecting to database: (using class sun.jdbc.odbc.JdbcOdbcDriver)
[Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application

at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:502)
at org.pentaho.di.core.database.Database.normalConnect(Database.java:350)
... 43 more
Caused by: java.sql.SQLException: [Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application
at sun.jdbc.odbc.JdbcOdbc.createSQLException(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.standardError(Unknown Source)
at sun.jdbc.odbc.JdbcOdbc.SQLDriverConnect(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcConnection.initialize(Unknown Source)
at sun.jdbc.odbc.JdbcOdbcDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
at org.pentaho.di.core.database.Database.connectUsingClass(Database.java:482)
... 44 more

Hostname :
Port :
Database name : AR System ODBC Data Source

Mapping step (sub-transformation) out of memory

Hi,

I'm currently using this step to modularize my transformations so I don't have to repeat the same designs.

The problem is that I'm using a lot of them (maybe 25 Mapping modules) and also working with nearly 200 variables. There's a point where I run out of memory (I've allocated up to 1 GB, but I'm just testing with a single row!), and I wanted to know whether it's down to the number of declared modules, or caused by all the parameters being passed between them, or whether memory usage could be reduced somehow.

Any help would be appreciated.

Regards!!

EDIT: Execution threw a "java.lang.OutOfMemoryError: unable to create new native thread" exception. It's not my intention to configure Pentaho around this: I see I have to design the transformation in a different fashion. I didn't know the limitations of the Mapping step. It's a pity :(

kettle logging in the BI Server

Hi all,

I am trying to trace down a problem I have with logging.

The problem is that on the BI Server I am logging to the PENTAHOFILE and CONSOLE appenders (I have not changed the config from the SourceForge download). I find that after a certain amount of time the pentaho.log file no longer receives entries, while catalina.log continues happily and contains warnings like

log4j:ERROR Attempted to append to closed appender named [PENTAHOFILE].

Restarting Tomcat fixes it, but it re-occurs regularly. After some trial and error I believe it is caused by running any report which uses Kettle as a data source (since I had reports of this nature scheduled, the logging would break regularly).

I can get around this by using scheduled Kettle jobs to import data to a common location and then changing the report to use SQL over that location rather than the Kettle source over multiple data sources, but it would be nice to understand the problem.

Reading around the subject, I understand that Kettle has its own log4j file in the jar and logging classes which build on log4j (new levels etc.) and override some of the behavior defined in the log4j file, including forcing a console appender if one cannot be found.

I don't know how to go about proving whether this is right or wrong, but my best guess is that for some reason the LogWriter decides an appropriate appender cannot be found, creates a new console appender (hence the output to catalina.out continues happily), and the PENTAHOFILE appender is lost somewhere.

Can anyone with more experience of this suggest anything?

Thanks,
James

Radius of a cluster - K-means

Hello

I am doing a project on clustering for my engineering course. I need to apply the K-means clustering algorithm provided by Weka to the WDBC (Wisconsin Breast Cancer Database), but the SimpleKMeans algorithm under the Cluster tab is disabled. I want to know how to convert the existing dataset to numeric so that I can apply K-means to it.

Also, I would like to know how to calculate the radius of each cluster once the clustering results are output by the K-means algorithm, and I need to find the distance of each object from the centroid of the cluster it belongs to.
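
For the radius and the distances, a minimal sketch against the Weka Java API (the file name and cluster count are assumptions; the radius is taken here as the maximum member-to-centroid distance):

Code:

import weka.clusterers.SimpleKMeans;
import weka.core.EuclideanDistance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClusterRadius {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("wdbc.arff"); // path is an assumption
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.buildClusterer(data); // data must have no class attribute set

        Instances centroids = km.getClusterCentroids();
        EuclideanDistance dist = new EuclideanDistance(data);
        double[] radius = new double[km.numberOfClusters()];

        for (int i = 0; i < data.numInstances(); i++) {
            int c = km.clusterInstance(data.instance(i));
            double d = dist.distance(data.instance(i), centroids.instance(c));
            radius[c] = Math.max(radius[c], d); // radius = farthest member
        }
        for (int c = 0; c < radius.length; c++) {
            System.out.println("Cluster " + c + " radius: " + radius[c]);
        }
    }
}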

Thanks

Shambhavi Joshi

Create a report with a dynamic number of columns and lines

Hi everyone

I need to create a report that will display a table.
This table will have some columns (the number depends on a field from my query) and some rows (same logic).

I'll give an example: I'm working on a financial report. Every column corresponds to an agency, and every row corresponds to a certain type of income/outcome.

Here is an example of what I need



[Attached image: Capture.PNG — a mock-up of the desired layout; the blank cells are where the values will go]

The number of agencies will grow in the future, which is why I don't want to hard-code them, but prefer to let the query tell me the number.

Thanks a lot for your help



EDIT
I changed my query and I now have one SQL row per agency, with all the data (the row of the table I want) on it.
The thing is, I now need each of my rows (= each agency) to display as a column, and not as a row (the normal behaviour).

The solution I thought of is to create one subreport per agency and then put all the subreports next to each other (I don't really like this solution, I think it's dirty).

Any other idea?

flush API not working as expected

I am trying to use the flush API to clear the dimension cache:

// Look up the cube and a schema reader bound to the current locus.
Cube metricsCube = rcon.getSchema().lookupCube("Metrics USD", true);
SchemaReader schemaReader = metricsCube.getSchemaReader(null).withLocus();
// Resolve the member whose cache entries should be flushed.
Member memberTimeJul22 = schemaReader.getMemberByUniqueName(
        Id.Segment.toList("Time", "2013", "Jul2013"), true);
// Build a member set (without descendants) and flush it.
final CacheControl cacheControl = rcon.getCacheControl(null);
CacheControl.MemberSet regionTime = cacheControl.createMemberSet(memberTimeJul22, false);
cacheControl.flush(regionTime);

But this call does not seem to clear the dimension cache. I am still seeing the stale data in the Saiku UI.
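
In case it is related: flushing a member set only touches the member (dimension) cache. A hedged sketch of additionally flushing the cell cache for that slice, reusing the objects above (method names per Mondrian's CacheControl API; whether Saiku keeps its own caching layer on top is a separate question):

Code:

// Flush the cells for every measure crossed with the member's region.
CacheControl.CellRegion measures = cacheControl.createMeasuresRegion(metricsCube);
CacheControl.CellRegion timeSlice = cacheControl.createMemberRegion(memberTimeJul22, true);
cacheControl.flush(cacheControl.createCrossjoinRegion(measures, timeSlice));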