Channel: Pentaho Community Forums

Transform flat data to parent / child (header / detail) streams?

I fear this is a rudimentary question, but we have spent quite a bit of time investigating to no avail:

We have a flat input of (x) columns. In that input stream, several of the columns represent header information, and the rest are detail information. We want to transform this into the target RDBMS parent/child tables, each of which has an internal primary key.

I *think* we can do this in two steps in Kettle: 1) use "Unique Rows" to generate a stream of the header data and write that to the parent table, 2) process all detail rows into the child table with a lookup to find the PK of the parent table. This feels clunky and not reusable.
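For what it's worth, here is the logic that two-step approach implements, sketched in plain JavaScript purely to make the data flow explicit (the field names like orderNo/lineNo are made up; in Kettle this would be Unique Rows plus a table output for the parent, and a Database Lookup or Combination Lookup for the child FK):

// Hypothetical flat rows: orderNo/customer are "header" columns, lineNo/amount are "detail" columns.
var rows = [
  { orderNo: "A1", customer: "Acme", lineNo: 1, amount: 10 },
  { orderNo: "A1", customer: "Acme", lineNo: 2, amount: 20 },
  { orderNo: "B7", customer: "Bolt", lineNo: 1, amount: 5 }
];

var headerPk = {};                 // natural header key -> surrogate parent PK
var parents = [], children = [];
var nextPk = 1;

rows.forEach(function (row) {
  if (!(row.orderNo in headerPk)) {                  // pass 1: "Unique Rows" on the header columns
    headerPk[row.orderNo] = nextPk++;
    parents.push({ pk: headerPk[row.orderNo], orderNo: row.orderNo, customer: row.customer });
  }
  children.push({                                    // pass 2: detail rows, FK found via lookup
    parentPk: headerPk[row.orderNo],
    lineNo: row.lineNo,
    amount: row.amount
  });
});
// parents  -> 2 rows for the parent table
// children -> 3 rows for the child table, each carrying the parent's PK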

Are we missing something? Is there a simple way to split the original stream into parent/child and have the PK from the RDBMS injected into the stream? Is there a way to use metadata injection to create and "teach" a GENERIC transformation which columns are header and which are detail? Has anyone created a third-party transformation to address this?

I feel like this is a very common activity for ETL/Dx, which makes me worry we are missing something simple...

All help is appreciated!

WEKA Experimenter: cross-validation results inconsistent

I've performed various feature selection techniques to obtain several feature subset candidates. I am now using the WEKA Experimenter to perform 10 x 10-fold CV to rank the feature subsets (using the same classifier).

I have noticed that MAE and RAE do not always agree, i.e., the "best" feature subset according to MAE is X whereas the "best" feature subset according to RAE is Y. I would expect MAE and RAE to always agree since RAE is just the MAE normalized by the "variance" of the training data. Does WEKA not use consistent train/test splits when comparing two datasets?
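For reference, a sketch of the usual definitions (check the WEKA source for the exact variant your version uses): on one test fold,

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,
\qquad
\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y}\rvert},
\]

where \(\bar{y}\) is the mean of the target values used as the naive baseline. So RAE is the absolute error divided by the baseline's absolute error rather than by a variance, and because the Experimenter averages these per-fold values over the 10 x 10 runs, the fold-dependent denominator weights folds differently and can make the two rankings diverge even when the train/test splits are identical.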

Format MySQL output, or run a script server-side, to use as a parameter in another query

I want to create an array from a MySQL query so I can use it as a parameter in another MySQL query: ".... WHERE country IN (${country}) .... "

If I set the output of the MySQL query to the "country" var, I get an array of arrays:

Array (2)
Array [0]
0: US,
Array [1]
1: UK

I can't use this as input as it is, because it fails.
I could format the output using a PostExecution script, but that would make it visible to the client, and I want to keep this data inside Pentaho.

So I was looking at 2 different approaches:
A) Format the output from the MySQL query so it returns a simple array: Array(1) -> ("US","UK")
B) Format the output using a script, but server-side

I can't find out whether either of these approaches is possible. Any ideas?
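Not a definitive answer, but a minimal sketch of approach (A) in plain dashboard JavaScript, assuming the "country" value currently holds the raw resultset (an array of rows, each row itself an array) with the country code in the first column; where you would call this (e.g. a post-fetch on the query component) is an assumption to adapt:

// Flattens [["US"], ["UK"]] into ["US", "UK"], which can then be joined
// (e.g. values.join(",")) for a "... WHERE country IN (${country}) ..." parameter.
function flattenCountries(resultset) {
  var values = [];
  for (var i = 0; i < resultset.length; i++) {
    values.push(resultset[i][0]);   // first column of each row
  }
  return values;
}

// Example: flattenCountries([["US"], ["UK"]]) -> ["US", "UK"]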

How to get a readable view of the file-based Solution Repository

Hi all,


I am trying to get familiar with the Solution Repository of the Pentaho BI Server.
-> In my understanding this is the place where the server stores the files that are shown in the "Browse Files" section.


I use BI Server version 6.1 on Windows (directly downloaded from SourceForge, without configuration changes).


I want to see where the server stores the new folders and files that I upload in the "Browse Files" section, for example these:
/public/Arved/
/public/Arved/test.ktr


I found this information:
"Jackrabbit contains the solution repository, examples, security data, and content data from reports that you use Pentaho software to create."


So I looked at "biserver-ce\pentaho-solutions\system\jackrabbit\repository\repository". It seems that Jackrabbit stores the files in binary form in the "datastore" folder. Unfortunately, this makes it hard to comprehend what is really stored there.


Is there a way to make it visible (readable) where the BI Server stores these new files?


Thanks for your tips.


Best Regards Arved

Migration from Pentaho EE to CE

Hello,

Can anyone help me migrate from the Pentaho EE edition to the CE edition, step by step, please?

Thanks in advance

Regex... why does this expression work?

Hi,

I'm trying to parse 2 dates from a text string. My starting point is a horrific multi-tab Excel file whose tab names are like this:

Balance sheet 2015_3_4
Balance sheet 2015_11_12
Balance sheet 2016_9

Why does this work?

(\D+)(\d{4})(_)(\d+)(_)?(\d+)?

Specifically, why don't I need a "?" after each "\d+", i.e. why isn't the regex this:

(\D+)(\d{4})(_)(\d+?)(_)?(\d+?)?

I think what I'm struggling to comprehend here is the need to specify "laziness".
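One way to convince yourself, a quick sketch in plain JavaScript (which uses the same regex syntax as PDI's "Modified Java Script Value" step): \d+ can only consume digits, so even a greedy \d+ stops on its own at the "_" or at the end of the string, and the two optional trailing groups simply match nothing for the single-month tab names. No laziness is needed.

var re = /(\D+)(\d{4})(_)(\d+)(_)?(\d+)?/;

var tabs = ["Balance sheet 2015_3_4",
            "Balance sheet 2015_11_12",
            "Balance sheet 2016_9"];

var parsed = [];
for (var i = 0; i < tabs.length; i++) {
  var m = tabs[i].match(re);
  // m[2] = year, m[4] = first month, m[6] = second month (undefined when absent)
  parsed.push({ year: m[2], month1: m[4], month2: m[6] });
}
// parsed -> [ {year:"2015", month1:"3",  month2:"4"},
//             {year:"2015", month1:"11", month2:"12"},
//             {year:"2016", month1:"9",  month2:undefined} ]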

The gist of the source sheet is that there are 2 "amount" columns on each tab. My next problem will be to construct a pair of full date columns, somehow stitch these together with this pair of "amount" columns, and in turn convert that into 1 x date and 1 x amount column... so if you can interpret that and suggest any tips, that would also be handy!


Thanks,


Andy

Error while Connecting Salesforce using Pentaho 6.1

I am unable to connect to Salesforce using Pentaho 6.1 and am getting the error below.



Error connecting to Salesforce!
; nested exception is:
java.net.ConnectException: Connection refused: connect

I am using the Salesforce web service URL https://login.salesforce.com/services/Soap/u/24.0

Please help!

Mondrian Input vs OLAP input

Hi!

I'm currently working on a setup where some PDI jobs and transformations will be executed from the BI Server scheduler.

In one of my data flows the initial input step is an MDX query.

Here it seems that I can choose between the Mondrian Input step and the OLAP Input step in PDI.

I'm having a hard time wrapping my head around the differences between the two, and how these differences might be affected by my setup.

It seems as if the OLAP Input connects to the BI Server's XMLA service, which then processes the query, whereas the Mondrian Input builds a Mondrian cache within PDI itself and fetches the data directly from the specified database.

How would this work when executing from within the BI Server? Which would be the best choice?

Thanks!

How to export a dashboard in Excel format

Hi,

I am using the Pentaho 5.0.1 stable version and the Community CTools stable versions.

I need to export the dashboard in Excel format. I used the Button Component, and the function is
function f(){ window.print() }
It works fine, but it exports in print format and I need Excel format.


Is there any way we can do this? I have tried through a toggle but was not able to achieve it.
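Not a tested recipe, but one common workaround is to export the data behind the dashboard rather than the rendered page, by pointing the Button Component at the CDA export of the query that feeds it. A minimal sketch, where the endpoint path, the CDA file path and the dataAccessId are assumptions you would have to adapt to your own dashboard and Pentaho version:

function exportToExcel() {
  // Dashboards.getWebAppPath() returns the web app context (usually "/pentaho").
  var url = Dashboards.getWebAppPath()
          + "/plugin/cda/api/doQuery"          // CDA query endpoint (version-dependent)
          + "?path=/public/myDashboard.cda"    // hypothetical CDA file behind the dashboard
          + "&dataAccessId=myQuery"            // hypothetical data access id
          + "&outputType=xls";                 // ask CDA to return an Excel file
  window.open(url);                            // triggers the download
}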

Sending the value of a filter/prompt via URL to PUC

Hello everyone,

I have created a report using Report Designer. It contains a filter, or prompt, on a field (e.g. ID).
I want to set the value of this filter by passing it as a parameter on the URL to PUC, something like:
http://localhost:8080/pentaho/........./'ID'=ID-value


Is it possible to do this? I.e. to pass a parameter to PUC via the URL "without using the GUI", and can Pentaho set this parameter as the value of a filter or prompt?

In that case the report would be filtered according to this ID value.

thanks for your help

Ideas/Best practices for releasing of PDI code with PDI servers?

Hi

A short description of how we currently do releases:
1. The developer creates/changes the ktr/kjb files as needed on the Dev PDI server.
2. Once ready, he copies all the ktr/kjb files to an import/export folder on the Dev PDI server.
3. He exports the import/export folder to his local machine as XML.
4. On Test, he imports the XML into the import/export folder.
5. Then he copies all the files to the same folders they were in on Dev.
6. Steps 4 and 5 are repeated in all other environments.

The problems with this:
1. A lot of manual copy/export/import steps where errors can happen.
2. No user rights management - a risk of making additional errors.
3. The PDI server has versioning, but the metadata for each version is limited to a comment. Is the intent to add some release identifier as the comment? Is there any method to extract only the latest file versions with comment "xyz" and then import the same files into the correct folders in other environments?

Any ideas or best practices for releasing PDI code with PDI servers?

Br,
pj

Run JOB in CDE dashboard by API

Hi Team,

I am looking for a way to run a job from a Pentaho dashboard. My problem starts with dynamic table columns, where I have to create columns based on the filters selected from a menu.

I can achieve this if I can use the getVariable and setVariable steps, which need to be independent, so I am left with only one choice: running a .kjb.

Please suggest.

Thanks in advance.

Getting an error while setting the variable value from a parameter

Hi,

Can someone help me understand why I am getting this sort of error?

XML file [file:///etljobs/wealth-marketing/email-growth/AddMissingAdvisersFromFirmAllocation.ktr]
2017/01/05 00:00:38 - AddMissingAdvisersFromFirmAllocation - Dispatching started for transformation [AddMissingAdvisersFromFirmAllocation]
2017/01/05 00:00:43 - Execute SQL To Add Missing RIs.0 - Finished reading query, closing connection.
2017/01/05 00:00:43 - Execute SQL To Add Missing RIs.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2017/01/05 00:00:43 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2017/01/05 00:00:43 - EmailGrowth - Starting entry [Deduce Email Addresses]
2017/01/05 00:00:43 - Deduce Email Addresses - Loading transformation from XML file [file:///etljobs/wealth-marketing/email-growth\DeduceEmailAddresses.ktr]
2017/01/05 00:00:43 - DeduceEmailAddresses - Dispatching started for transformation [DeduceEmailAddresses]
Continue (enter "y" for yes, "n" for no)?
2017/01/05 00:07:21 - Execute SQL to Deduce email addresses.0 - Finished reading query, closing connection.
2017/01/05 00:07:21 - Execute SQL to Deduce email addresses.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2017/01/05 00:07:21 - Dummy (do nothing).0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2017/01/05 00:07:21 - EmailGrowth - Starting entry [Create output folder if it doesn't exist]
2017/01/05 00:07:21 - Set variables - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : Could not create Folder [/dataload//WETL/Dropzone/Marketing/email-growth/outbound/]
2017/01/05 00:07:21 - Set variables - ERROR (version 5.0.1-stable, build 1 from 2013-11-15_16-08-58 by buildguy) : org.apache.commons.vfs.FileSystemException: Could not create folder "file:///dataload/WETL/Dropzone/Marketing/email-growth".
2017/01/05 00:07:21 - Set variables - at
org.apache.commons.vfs.provider.AbstractFileObject.createFolder(Unknown Source)
2017/01/05 00:07:21 - Set variables - at org.apache.commons.vfs.provider.AbstractFileObject.createFolder(Unknown Source)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.entries.createfolder.JobEntryCreateFolder.execute(JobEntryCreateFolder.java:199)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:678)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:815)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:815)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:815)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:815)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:815)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.execute(Job.java:500)
2017/01/05 00:07:21 - Set variables - at org.pentaho.di.job.Job.run(Job.java:407)
2017/01/05 00:07:21 - Set variables - Caused by: org.apache.commons.vfs.FileSystemException: Could not create directory "/dataload/WETL/Dropzone/Marketing/email-growth".
2017/01/05 00:07:21 - Set variables - at org.apache.commons.vfs.provider.local.LocalFile.doCreateFolder(Unknown Source)
2017/01/05 00:07:21 - Set variables - ... 11 more


The solutions I have tried, unfortunately without success:

1. The error is not related to the double slash "//" present in the path.

2. In the job I am setting this value (/dataload//WETL/Dropzone/Marketing/email-growth/outbound/) into a variable, and after a few transformations in the job I check whether the path actually exists before loading the data into the outbound directory; if not, the job creates the directory at that location.

3. At this point I am thinking that this is a permission-related error when creating the directory at that path.

Can someone help me sort this out?

-------------------------------
pdi-ce-6.0.1.0-386-stable
java 1.8.0_101
Window 7 (x86_64)
timezone IST
--------------------------

Thanks in Advance,
G_nish

IS EMPTY not found in Filter row step

Hi
I am new to PDI. For a client, the old server has PDI version 4.4.1 and the new server has 6.1.0. In one of the transformations on the old server, a Filter Rows step uses the IS EMPTY function. But I am not able to re-create the same step, since there is no IS EMPTY function to choose from the list in the Filter Rows step. Can I use IS NULL? Is it the same? I have attached the screenshots.
Thanks!


Priya

Set starting point in job with kitchen.bat

Hello, I'm running the command line below:

kitchen.bat /file:C:\Users\xyz\Documents\ETL\Jobs\XYZJob.kjb /level:Basic

But I'm getting the error:

"Couldn't find starting point in this job."

How do I set a starting point in my job? Is there an argument that I can pass in to kitchen.bat? When I run the job in Spoon it always asks me what the starting point is and the starting point never saves.

Any ideas?

Thanks!!

Kettle to MuleSoft ESB

Please let me know if anyone has worked on a migration from Kettle to Mule ESB.
Is it a good direction?

Red Hat 6 Trusted Sources/libwebkitgtk

My company will only install RPMs from trusted sources, and I am trying to figure out if there is an alternate repo for libwebkitgtk-1.0-0.

An rpm -qa | grep gdk only returns gtk2-2.24.3-6 (is it in that package?)

If I can figure out which RPM it is in, I can extract it to a directory and change my environment to make it run.

Otherwise I cannot even connect to a repo to work on my old scripts.

Thank you

CBF2 now supporting multiple instances running

The scenario

Last year I announced CBF2, the biggest, best, coolest way to manage Pentaho projects, available on GitHub. In case you don't recall, it relies on Docker to manage the images. Just read about it - it's really awesome.


Since then, we've been using it a lot here. It really helps with managing different projects and environments, and it has been put to the test in multiple real-world scenarios.


One of the limitations of CBF2 is that it can only run one project per machine, since it exposes its ports on the host machine.


Another immediate consequence is that we can't have a local Tomcat running, since we'd get port conflicts.


The need

However, sometimes it's useful to have containers running side by side. In this case, we wouldn't be able to run these two projects at the same time:


So I guess I have to run these two projects one at a time?

If you tried to run these two, it would complain about conflicting ports and the like.

The solution

Turns out Kleyson Rios is less clumsy than I am - so he implemented this feature in CBF2: the ability to have containers running side by side, by automagically detecting used ports and simply moving on to the next one.

Here you can see that both containers ran successfully:

2 projects running side by side

The result


The end result? Pretty cool, I have to admit! I can now run and test different versions side by side on my machine just by using the correct port :)


2 different versions, 5.4 and 7.1, running on my machine with 2 simple commands




This improvement is already committed, so simply pull the latest version and you're ready to go.




Thanks Kleyson! :)




-pedro



Loop over Files with mapping?

Good Morning,

I am actually pretty new to Pentaho Data Integration, and what I have "achieved" until now is just a bit of trial and error within PDI. I am also new to this forum, so please don't be upset if I have overlooked something or made any spelling mistakes, as I am not a native speaker. But I hope you can still understand and help me.

Right now I am trying to build a new ETL process. I work with Excel files, just to let you know.
I get about 15 different files every day, and each needs to be compared to the file from the previous day (date -1).
E.g. for today I compare the file xxx_10/01/2016 with the file xxx_09/01/2016.

For that I use the Merge Rows step.

My problem is that I don't want it to be done with 15 independent jobs that just use a different file as the source.

To solve this I tried to make it a bit more dynamic, so I built a job that loops over the file names and executes a job for every input file.
But my problem is that I don't know how to do the mapping. How can I prevent the job from comparing file xxx_10/01/2016 with zzz_09/01/2016 and vice versa?

Because in my solution I just hand over the path to the folder with today's files (all 15) and the path to the folder with yesterday's files (all 15).
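One way to do the mapping is to derive yesterday's file name from today's file name inside the loop, so a pair always shares the same prefix and xxx can never be compared with zzz. A minimal sketch for a "Modified Java Script Value" step, assuming each row carries today's name in a field called filename and that names look like prefix_dd_mm_yyyy.xls (the exact pattern and field name are assumptions - adapt the regex to your real names):

var filename = "xxx_10_01_2016.xls";   // example value of the incoming field

var m = filename.match(/^(.+)_(\d{1,2})_(\d{1,2})_(\d{4})(\.[^.]+)$/);
var prefix = m[1], ext = m[5];

// Build a Date from the parsed parts and subtract one day.
var d = new Date(parseInt(m[4], 10), parseInt(m[3], 10) - 1, parseInt(m[2], 10));
d.setDate(d.getDate() - 1);

// Yesterday's file name keeps the same prefix, e.g. "xxx_9_1_2016.xls";
// pad day/month with leading zeros here if your real names use "09"/"01".
var previous_filename = prefix + "_" + d.getDate() + "_" + (d.getMonth() + 1)
                      + "_" + d.getFullYear() + ext;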

Right now it's just those 15 jobs, but as it works fine it will be extended to about 300 files; that's why I can't do it with individual jobs. It would be awesome if someone had an idea to solve this.

Best wishes

kitchen.sh Error on Jarfile

I'm trying to execute my main job on a remote server. Until yesterday every test worked OK, but since then it has started to give this error and does not start my job execution:

root@DTCVSPTHMJ-01:/pentaho/data-integration# sudo ./kitchen.sh -file="/pentaho/VERSÃO SERVIDOR/TJSP_-_Movjud Solução XML/TJSP_-_JOB - MAQUINA DE EXTRAÇÃO.kjb"


Error: Unable to access jarfile /pentaho/data-integration/launcher/pentaho-application-launcher-6.1.0.1-196.jar
A screenshot of the execution is attached (pentaho-error.jpg).

Could someone help me?