Channel: Pentaho Community Forums

Hanging Transformation

Situation:

Two input steps read CSV files.

One CSV file has 5,000 rows.

The other CSV file has 98,000 rows.

When I try to do a left join, with the 5,000-row table on the left side, the transformation says "running" but nothing happens, even after 10 minutes. It just hangs there doing nothing.

Do you know why? This did not happen until I added a formula to retrieve the last 10 digits of a number.
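For reference, here is a minimal plain-Java sketch of the "last 10 digits" logic (the field name accountNumber is made up); if the Formula step turns out to be the slow part, the same calculation could be done in a User Defined Java Expression step instead:

Code:

// Plain-Java sketch of "take the last 10 digits of a number", assuming the value
// arrives as a String field (hypothetical name: accountNumber).
public class LastDigits {
    static String lastTenDigits(String accountNumber) {
        if (accountNumber == null) {
            return null;                                          // pass nulls through
        }
        String digitsOnly = accountNumber.replaceAll("\\D", "");  // strip non-digits
        return digitsOnly.length() <= 10
                ? digitsOnly
                : digitsOnly.substring(digitsOnly.length() - 10);
    }

    public static void main(String[] args) {
        System.out.println(lastTenDigits("ACC-0012345678901234")); // -> 5678901234
        System.out.println(lastTenDigits("42"));                   // -> 42
    }
}

That said, a formula over 5,000 rows should not take minutes by itself, so the hang may well be elsewhere (for example, a join step waiting on unsorted or blocked input).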

Dynamic and Scalable : Pentaho 6.1 has arrived!

Hello Kettle and Pentaho fans!
Yes indeed we’ve got another present for you in the form of a new Pentaho release: version 6.1
This predictable, steady flow of releases has, in my opinion, pushed the popularity of PDI/Kettle over the years, so it's great that we manage to keep this up.
The image above shows the evolution of PDI download counts over the years on SourceForge only.


There’s actually a ton of really nice stuff to be found in version 6.1 so for a more complete recap I’m going to refer to my friend and PDI product manager Jens on his blog.
However, there are a few favorite PDI topics I would like to highlight…
Dynamic ETL
Doing dynamic ETL has been on my mind for a long time. In fact, we started working on this idea in the summer of 2010 so we would have something to show at the Pentaho Community Meetup of that year in beautiful Cascais (Portugal). Back then I remember getting a lot of blank stares and uncomprehending grunts from the audience when I presented the idea. However, over the last couple of years dynamic ETL (or ETL metadata injection) has been a tremendous driver for solving the really complex cases out there in many areas like Big Data, data ingestion and archiving, IoT and many more.  For a short video explaining a few driving principles behind the concept, see here:

More comprehensive material on the topic can be found here.
Well in any case I’m really happy to see us keeping up the continued investment to make metadata injection better and more widely supported.  So in version 6.1 we’re adding support for a bunch of new steps including:

  • Stream Lookup (!!!)
  • S3 Input and Output
  • Metadata Injection (try to keep your sanity while you’re wrestling with this recursive puzzle)
  • Excel Output
  • XML Output
  • Value Mapper
  • Google Analytics

It’s really nice to see these new improvements drive solutions across the Pentaho stack, helping out with Streamlined Data Refinery, auto-modeling and much more.  Jens has a tutorial on his blog with step by step instructions so make sure to check it out!
Data Services
PDI data services is another of these core technologies which frankly take time to mature and be accepted by the larger Pentaho community.  However, I strongly feel that these technologies make a big difference compared with what anyone else in the DI/ETL market is doing. In this case, simply being able to run standard SQL against a Kettle transformation is a game changer.  As you can tell, I'm very happy to see the following advances being piled on top of the improvements of the last couple of releases:

  • Extra Parameter Pushdown Optimization for Data Services – You can improve the performance of your Pentaho data service through the new Parameter Pushdown optimization technique. This technique is helpful if your transformation contains any step that should be optimized, including input steps like REST where a parameter in the URL could limit the results returned by a web service.
  • Driver Download for Data Services in Pentaho Data Integration – When connecting to a Pentaho Data Service from a non-Pentaho tool, you previously needed to manually download a Pentaho Data Service driver and install it. Now in 6.1, you can use the Driver Details dialog in Pentaho Data Integration to download the driver (see the JDBC sketch after this list).
  • Pentaho Data Service as a Build Model Source – You can use a Pentaho Data Service as the source in your Build Model job entry, which streamlines the ability to generate data models when you are working with virtual tables.
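As a rough illustration of what "standard SQL on a Kettle transformation" looks like from a non-Pentaho client, here is a hedged JDBC sketch. The host, port, credentials and the service/column names (sales_service, country) are assumptions; the driver jar is the one the Driver Details dialog lets you download, and the WHERE clause is the kind of predicate the Parameter Pushdown optimization can map to, say, a REST URL parameter:

Code:

// Hedged sketch: querying a Pentaho Data Service over JDBC from plain Java.
// URL, credentials, service name (sales_service) and column (country) are
// hypothetical; adjust them to your own Carte/DI server setup. If the thin
// driver does not auto-register, load its class via Class.forName first.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataServiceClient {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:pdi://localhost:8080/kettle";    // thin driver URL (assumed port)

        try (Connection con = DriverManager.getConnection(url, "admin", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT * FROM sales_service WHERE country = 'US'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));        // print first column of each row
            }
        }
    }
}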


Virtual Data Sets overview in 6.1

Other noteworthy PDI improvements
As always, the change-list for even the point releases like 6.1 is rather large but I just wanted to pick 2 improvements that I really like:

  • JSON Input: we made it a lot faster and the step can now handle large files (hundreds of MBs) with 100% backward compatibility
  • The transformation and job execution dialogs have been cleaned up!

The new run dialog in 6.1

I hope you’re all as excited as I am to see these improvements release after release after release…
As usual, please keep giving us feedback on the forums or through our JIRA case tracking system.  This helps us to keep our software stable in an ever changing ICT landscape.
Cheers,
Matt



referencing undefined (dynamic) fields due to query scripting

I searched for a bit but couldn't find much, so I'll ask here. I have a report (PRD v6.0), but my query is NOT static: it has parameters, so the SQL is built in the "query scripting" tab. Doing it this way, I do not have my fields available in the drop-downs for certain things, one of them being the "running sum" function. How do you perform a running sum when the fields are not being shown? Every sample report I see has the fields available to select because it uses a static query. Any tips on how to handle dynamic queries/fields would be helpful!

Thanks in advance for any help,
Wes

User defined java expression problem

Hi,

I have the following problem: in my test transformation I defined the Java expression ( new java.text.SimpleDateFormat( "yyyy/MM/dd HH:mm:ss.SSS" ) ).parse( last_update.substring(0,23)) to convert a date string like '2016/04/13 21:48:35.000000000' into a Date value. When I run the job (the specified transformation is part of this job) using Kitchen, everything works fine, but when I run the same job from my own program I get the following error:

2016/04/14 00:23:57 - User Defined Java Expression.0 - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : Unexpected error
2016/04/14 00:23:57 - User Defined Java Expression.0 - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : org.pentaho.di.core.exception.KettleException:
2016/04/14 00:23:57 - User Defined Java Expression.0 - org.pentaho.di.core.exception.KettleValueException:
2016/04/14 00:23:57 - User Defined Java Expression.0 - java.lang.StringIndexOutOfBoundsException: String index out of range: 3
2016/04/14 00:23:57 - User Defined Java Expression.0 - String index out of range: 3
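For reference, a standalone sketch (with a hypothetical short value) reproduces the exception and shows a guarded variant of the same expression:

Code:

// Standalone reproduction: substring(0, 23) throws StringIndexOutOfBoundsException
// whenever the incoming value is shorter than 23 characters. The short value "n/a"
// is hypothetical; the real field may arrive empty or in a different format.
import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseLastUpdate {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss.SSS");

        String ok = "2016/04/13 21:48:35.000000000";     // 29 chars: substring(0, 23) works
        Date parsed = fmt.parse(ok.substring(0, 23));
        System.out.println(parsed);

        String bad = "n/a";                               // 3 chars: substring(0, 23) would throw
        // bad.substring(0, 23);                          // -> String index out of range: 3

        // Guarded version of the expression, usable in the UDJE step as well:
        Date safe = (bad != null && bad.length() >= 23)
                ? fmt.parse(bad.substring(0, 23))
                : null;
        System.out.println(safe);
    }
}

The "String index out of range: 3" message suggests that, when the job is launched from your program, the last_update value arrives with only 3 characters, so the difference is more likely in the data or the environment than in the expression itself.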

What's wrong? Please see the job and child transformations in the attachment.

Best regards,
Mikhail

PDI 5.4 kitchen.bat to overwrite log file instead of appending it.

Hi

We had been running 4.4.2 for a long time and have now upgraded to 5.4. We use the Windows scheduler to run our jobs and found that when -log is used to specify the log file name on the command line, the log output is appended instead of the log file being overwritten on each run.

Is this normal? How do I switch back to the old behaviour, if that's possible?

Thanks

Pentaho Platform User-Console Code Build

Hi,

I cloned pentaho-platform from GitHub and have been trying to build the source code.
The other projects built successfully, but the build fails for pentaho-user-console.

I downloaded the pentaho-platform 5.4 branch, and when trying to build it using
ant clean-all resolve create-dot-classpath publish-local, I got this on the console:

Quote:

create-dot-classpath:


BUILD FAILED
D:\PROJECTS\platform\pentaho-platform\user-console\build-res\subfloor.xml:2058:
Problem: failed to create task or type dot-classpath
Cause: The name is undefined.
Action: Check the spelling.
Action: Check that any custom tasks/types have been declared.
Action: Check that any <presetdef>/<macrodef> declarations have taken place.
Please help me figure out how to build this one.

Cheers!

Calculated Member

Hi,




If I have a measure A which is split into A1, A2 and A3 by a dimension X,
I would like to find a way to create a calculated measure that fixes one member of that dimension (for example, take only A where X is X2). Can I do that?
Thanks in advance

HTTP Client

Can someone tell me what step I need after an HTTP Client step that returns a CSV file in the result? I want to be able to do some transformations on the result after it's received from the HTTP Client step, but I can't figure out what step I need to get the result into a usable format. Appreciate it.
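For what it's worth, the parsing itself is simple once the payload is a string; the question is really which step exposes it. As a purely illustrative plain-Java sketch of turning a CSV body (held in a single field, hypothetically named result) into rows and columns, assuming no quoted commas:

Code:

// Illustrative sketch only: splitting a CSV payload held in one String field
// ("result" is a made-up name) into rows and columns. Assumes a simple CSV
// with a header line and no quoted commas.
import java.util.Arrays;
import java.util.List;

public class SplitCsvBody {
    public static void main(String[] args) {
        String result = "id,name\n1,Alice\n2,Bob";               // sample payload

        List<String> rows = Arrays.asList(result.split("\\r?\\n"));
        String[] header = rows.get(0).split(",");
        for (String row : rows.subList(1, rows.size())) {
            String[] cols = row.split(",");
            for (int i = 0; i < header.length; i++) {
                System.out.print(header[i] + "=" + cols[i] + "  ");
            }
            System.out.println();
        }
    }
}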

Wanting to set up PDI as a process running as a service - any info around implementation

I'd like to set up PDI to run as a service where the transform would stay resident in memory.

We have a large lookup file with over 100M records. It takes around 10 minutes to load the reference file each time.

Is it possible to have PDI resident in memory, running the transform with the lookup pre-loaded?

The idea would be to drop files into a directory and have the transform pick them up periodically and process them, looking up the key value in the pre-loaded lookup stream.

Any suggestions on how to approach this?

Is it even possible to create a job or transform that would run continually like this, essentially creating a service?
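One way to approximate this is to embed PDI via its Java API in a small long-running program: initialize the Kettle environment once, keep the JVM resident, and poll the drop directory in a loop. Below is a hedged sketch; the paths, the .ktr name, the INPUT_DIR parameter and the polling interval are all assumptions:

Code:

// Hedged sketch: a long-running Java process that keeps the Kettle environment
// resident and re-runs a transformation whenever CSV files appear in a drop
// directory. Paths, parameter name and interval are made up.
import java.io.File;
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class PdiPollingService {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();                                  // heavy initialization, done once
        TransMeta transMeta = new TransMeta("/opt/etl/lookup_and_process.ktr");

        File dropDir = new File("/opt/etl/incoming");
        while (true) {
            File[] files = dropDir.listFiles((dir, name) -> name.endsWith(".csv"));
            if (files != null && files.length > 0) {
                Trans trans = new Trans(transMeta);                // fresh run for this batch
                trans.setParameterValue("INPUT_DIR", dropDir.getAbsolutePath());
                trans.execute(new String[0]);
                trans.waitUntilFinished();
            }
            Thread.sleep(30_000);                                  // poll every 30 seconds
        }
    }
}

Note that this only keeps the JVM warm; a stock lookup step still re-reads its reference data on each execution, so keeping the 100M-row stream itself resident would require the transformation to keep running between files (for example, a never-ending transformation that watches the directory) rather than being relaunched per batch.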

Absolute Newbie question! How do I get started???

Believe it or not, I actually have 35 years in the computer industry ... and can install a Linux/Apache/MySQL/PHP stack in my sleep (installed my first Linux server in '95) ... I've got both Linux and Windows servers available to me as well as my Windows desktop ... although both servers that I would be inclined to use already have a web server installed ... PHP on Linux, ASPx/IIS on Windows.

I downloaded the manual and was greeted with a WAR file ... I had never heard of a WAR file before ... so that didn't help a lot. A quick google and I found out that it was a compressed file that could be extracted with 7-Zip ... and I had 7-Zip installed on my workstation ... so that was fairly easy ... but even with Java installed on the windows server, I couldn't get jsp pages to show me anything. (I told you it was a newbie question!)

So now that I feel like a freshman in University again ... can someone please point me to a fairly straightforward guide to getting a Java server up and running so I can read the manual ... even better would be a guide that gets pentaho up and running itself ...

Alternately a VirtualBox installation of the 6.0.1 community edition would be a solution that I would be very open to using ... as the data I'm working with is mostly spreadsheets that need to be pulled together in a consistent way periodically (quarterly or less) ... so I don't even need the server running constantly. My comfort zone would be a CentOS or RedHat 6.x OS ... or Windows ... but then there is the implicit need for a Windows License ... I just haven't moved up to the new file layout in 7.x (old school and proud of it!)! ;-)

Thanks in advance for your assistance.

open reports in pdf by default

Hi there, does anyone know how to open a report by default in PDF mode?
Your help is greatly appreciated !

upgrade the Salesforce Plugins to support version >=32 of the Salesforce API

The API version currently being called is version 21.

Salesforce has upgraded all production services to the latest API version, 35, which comes with the "Spring '16" release.

Please update the PDI/Kettle Salesforce plugins to support the latest API version.

I read in another thread about getting the WSDL from the Salesforce page and building a new jar to replace the old one. I have built a new jar, but it is not working. Is there any documentation on how to do this?

PLEASE fix Execute SQL statement escaping

Just making a comment --

There are two 'execute SQL statement' functions --- one as a job entry, one as a transformation step.

The job entry one as far as I know hardly escapes anything ... single quotes, double hyphens ... it's extremely difficult to roll your own logging system this way.


But even with the transformation step 'Execute SQL statement' -- say you want to write log text / error text to a database.

Well, it turns out that this step doesn't escape a double hyphen (aka "--"), which comments out a line in SQL.

Even if the double hyphen is surrounded by quotes, which actually makes it a string, not an operator, in SQL Server and MySQL, Pentaho Spoon will still treat it as a line comment operator. This is a glitch, plain and simple. Any help appreciated --- I have not been able to find a suitable workaround, other than sticking to physical log files, which is a pain. Thanks.
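Until the escaping is fixed, one workaround pattern is to avoid inlining the log text into the SQL string at all and bind it as a parameter instead (the transformation step accepts "?" arguments from fields when executed per row, if I remember right; the same idea in plain JDBC looks like this, with made-up table and column names etl_log/message):

Code:

// Hedged sketch: writing log/error text to a database with a bound parameter
// instead of string concatenation, so '--' and quotes are never interpreted as SQL.
// The connection URL and the etl_log/message names are made up.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SafeLogInsert {
    public static void main(String[] args) throws Exception {
        String logText = "step failed -- see 'details' for more";   // hostile characters

        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/etl", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO etl_log (message) VALUES (?)")) {
            ps.setString(1, logText);     // the driver handles quoting; no manual escaping
            ps.executeUpdate();
        }
    }
}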

Pentaho import.sh replaces the PDI Database connections and attributes

We are using Pentaho Database repository (version 5.1).
We have two different environments, DEV and QA. Both environments have the same database connection names, but they point to different servers/databases. I am trying to export a complete PDI folder into XML from DEV and then import the *.xml file into QA.

When I use import.sh to import the XML file, all the database connections (i.e. the database definitions in the r_database table and the database attributes in the r_database_attributes table) in the QA repository are replaced by the DB connection definitions from DEV.

Is there any way I can import the XML file into QA from DEV using import.sh without replacing the QA database connections (r_database, r_database_attributes) with those from DEV?

0437 MSDOS United States Equivalent encoding in Pentaho

Hi Experts,

I need help finding the equivalent encoding in Pentaho for "0437 MSDOS United States". Could anyone help me with this?
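If I remember correctly, "0437 MSDOS United States" is DOS code page 437, which Java charsets name IBM437 (alias Cp437), so that is the value to look for in the PDI encoding drop-down. A quick way to check what your own JVM calls it:

Code:

// Quick check of how the local JVM names code page 437 (MS-DOS United States).
// Whatever canonical name/aliases print here should match a PDI encoding entry.
import java.nio.charset.Charset;

public class FindCp437 {
    public static void main(String[] args) {
        for (String name : new String[] {"IBM437", "Cp437", "437"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
        Charset cs = Charset.forName("IBM437");
        System.out.println("Canonical name: " + cs.name() + ", aliases: " + cs.aliases());
    }
}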


Thanks-

Fail to run 'Transforming Data with Pig' example

Hi everyone,

I followed the steps in the example "Transforming Data with Pig" at: http://wiki.pentaho.com/display/BAD/...+Data+with+Pig.
But when I ran the example, it failed.
When I comment out the line STORE weblog_count INTO '/user/pdi/weblogs/aggregate_pig/'; in the aggregate_pig.pig script, it works without any error.
I don't know why that line of code causes a problem.
I am using Pentaho data-integration-5.2.0.0 with the Hortonworks sandbox 2.1.

Please help me to solve this problem.

many thanks,

New Weka 3.6.14, 3.8.0 and 3.9.0 releases!

Hi everyone!

New versions of Weka are available for download from the Weka homepage:

* Weka 3.8.0 - stable version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.8.0_77, Win64 installer, Win64 installer incl. 64 bit JRE 1.8.0_77 and Mac OS X application with Oracle 64 bit JRE 1.8.0_77.

* Weka 3.9.0 - development version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.8.0_77, Win64 installer, Win64 installer incl. 64 bit JRE 1.8.0_77 and Mac OS X application with Oracle 64 bit JRE 1.8.0_77.

* Weka 3.6.14 - stable book version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.8.0_77, Win64 installer, Win64 installer incl. 64 bit JRE 1.8.0_77 and Mac OS X application with Oracle 64 bit JRE 1.8.0_77.

Stable Weka 3.6 and 3.8 receive bug fixes only. The development version receives bug fixes and new features.

3.8.0 and 3.9.0 are the first second digit version increases since stable 3.6 was released in 2008! At this point there is no functional difference between 3.8.0 and 3.9.0 - 3.7 has been branched to create 3.8 and development of core Weka will continue in 3.9. We feel that the package management system is a nice mechanism for allowing stable Weka to be extended with new features, while at the same time maintaining a stable core.

NOTE 1: Users of Weka 3.6 will find that serialized models created in 3.6 cannot be used in 3.8. Unfortunately, there is no workaround for this. Models will need to be recreated in Weka 3.8. Similarly, developers using 3.6 will find that there are some small changes that they need to make to their code in order to compile against 3.8. A quick check of the javadoc for 3.8 will hopefully show what is necessary.

NOTE 2: We have changed the default look and feel in Weka 3.8 and 3.9 to "Nimbus". We feel that this looks reasonable under the three main OS's. Furthermore, it is more performant under Mac OS X than the default Aqua LAF - we found that the Explorer's list of attributes becomes very slow to update on datasets with a large number of attributes when using the Aqua LAF on OS X. From the GUIChooser you can alter the LAF by selecting "Settings" from the "Program" menu (a restart will be required if the LAF is changed).

NOTE 3: When upgrading to Weka 3.8.0/3.9.0, users of Weka 3.7.x may notice some exceptions thrown in the console relating to the package manager. To make these go away, simply delete the installedPackageCache.ser file in ~/wekafiles/packages and then restart Weka.


Weka homepage:
http://www.cs.waikato.ac.nz/~ml/weka/

Pentaho data mining community documentation:
http://wiki.pentaho.com/display/Pent...+Documentation

Packages for Weka>=3.7.2 can be browsed online at:
http://weka.sourceforge.net/packageMetaData/

The Pentaho Weka micro site at http://weka.pentaho.com/ will be updated to reflect the new releases soon.

Note: It might take a while before Sourceforge.net has propagated all the files to its mirrors.

What's new in 3.8.0/3.9.0 compared to Weka 3.7.13?

Some highlights
---------------

In core weka:

* JAMA-based linear algebra routines replaced with MTJ. Faster operation with the option to use native libraries for even more speed
* General efficiency improvements in core, filters and some classifiers
* GaussianProcesses now handles instance weights
* New Knowledge Flow implementation. Engine completely rewritten from scratch with a simplified API
* New Workbench GUI
* GUI package manager now has a search facility
* FixedDictionaryStringToWordVector filter allows the use of an external dictionary for vectorization. DictionarySaver converter can be used to create a dictionary file

In packages:

* Packages that were using JAMA are now using MTJ
* New netlibNativeOSX, netlibNativeWindows and netlibNativeLinux packages providing native reference implementations (and system-optimized implementation in the case of OSX) of BLAS, LAPACK and ARPACK linear algebra
* New elasticNet package, courtesy of Nikhil Kinshore
* New niftiLoader package for loading a directory with MRI data in NIfTI format into Weka
* New percentageErrorMetrics package - provides plugin evaluation metrics for root mean square percentage error and mean absolute percentage error
* New iterativeAbsoluteErrorRegression package - provides a meta learner that fits a regression model to minimize absolute error
* New largeScaleKernelLearning package - contains filters for large-scale kernel-based learning
* discriminantAnalysis package now contains an implementation for LDA and QDA
* New Knowledge Flow component implementations in various packages
* newKnowledgeFlowStepExamples package - contains code examples for new Knowledge Flow API discussion in the Weka Manual
* RPlugin updated to latest version of MLR
* scatterPlot3D and associationRulesVisualizer packages updated with latest Java 3D libraries
* Support for pluggable activation functions in the multiLayerPerceptrons package

As usual, for a complete list of changes refer to the changelogs.

Cheers,
The Weka Team

Named parameter arrangement in spoon

Hello,

Is there any specific reason why the named parameters always get rearranged in alphabetical order after saving the job/transformation, even if they were created/arranged in a different sequence?
We use a scheduling tool where business users can pass parameter values from a GUI. So it would be very helpful if developers could arrange all relevant parameters consecutively for ease of use. If the parameter window allows "Move up/Move down", isn't it expected to retain the sequence?

I guess I am missing something big here, but I need help figuring out the logic behind this.

Thank you for any help you can provide.

Regards,
Mallika

PDI 6.0.1 kitchen.sh not returning proper exit code on Ubuntu 14.04, but does on OSX

Hi all,

I'm running PDI 6.0.1 and can't get kitchen.sh to return a proper exit code on Ubuntu 14.04. It runs fine on my Mac OS X. I use Java 1.7 in both situations.

When running kitchen.sh without any parameters on my Mac I get exit code 9. When doing the same on Ubuntu 14.04 the exit code is 0.

Any idea as to why this happens?

Kind regards,

Eric

Configure the JAVA_HOME environment variable correctly

Hi,

I'm trying to install the biserver-ce-6.1.0.1-196 version.
I'm running Windows 8.1, and I want to run it locally.
My assumption is that I don't need prerequisites like a running Tomcat or MySQL database beforehand. Is that right? I didn't find any hint about that, and I thought the installation would set up its own services...
I don't seem to be configuring the JAVA_HOME environment variable correctly, because when I run the "start-pentaho" batch file the shell pops up for just a moment and then closes. I couldn't take a screenshot (too fast), but I read something like "the environment variable is not defined correctly".
My JRE path is C:\Program Files (x86)\Java\jre1.8.0_77
I tried to set the environment variable "JAVA_HOME" as a system variable with the path "C:\Program Files (x86)\Java\jre1.8.0_77" (without the quotation marks).

So then I tried to start the tomcat from biserver-ce first. Like the wiki said: http://wiki.pentaho.com/display/Serv...indows+service
There was the same problem.

For more information: I only have the JRE 1.8 mentioned above (no JDK or anything else).
I looked in C:\Program Files (x86)\Java\jre1.8.0_77 and only found client\jvm.dll, no server\jvm.dll.
(But the service batch file says it would take either.)

So that's why I'm stuck. I probably haven't defined the environment variable correctly...

Please help!

Thanks in advance.