Channel: Pentaho Community Forums

Reading a multiline field from AR Input

Hello Pentaho Community,

I've run into a problem when trying to read an AR form that contains a field with multiple lines in it.
On the form it looks like this:

Quote:

Site: Place ThePlace
InstanceId: OI-randomchars
Room: R1 RM34
When I read this field in Pentaho, nothing works, no matter which step I use: regex manipulation, JavaScript, row operations...
My output is a semicolon-separated text file, and when I open it in Excel the embedded line breaks scramble the results and make them unusable.
If I use an Excel file as output, it works fine, but when I then use that file as input, the same problem occurs.

Is there by any chance a way to work with this multi-line field?
Or is Pentaho simply not designed for such things?

I have tried a lot of things and nothing has worked so far.
Thanks in advance for your consideration.
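
In case it helps: the usual culprit is the embedded CR/LF characters in the field, which break the one-record-per-line assumption of a semicolon-separated text file. One common workaround is to collapse the line breaks before the output step, e.g. with a Replace in String step using the regex \r?\n (regex enabled). A minimal sketch of the same substitution in plain Java, with made-up sample data:

Code:

    public class FlattenMultilineField {
        public static void main(String[] args) {
            // Made-up sample matching the quoted field above.
            String field = "Site: Place ThePlace\nInstanceId: OI-randomchars\nRoom: R1 RM34";
            // Collapse embedded line breaks so the semicolon-separated output
            // keeps one record per line.
            String flat = field.replaceAll("\\r?\\n", " | ");
            System.out.println(flat);
        }
    }

Alternatively, keeping the line breaks but setting an enclosure character (e.g. a double quote) on the Text file output step may let Excel parse multi-line values correctly, since quoted CSV fields are allowed to contain newlines.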

org.pentaho.di.core.exception.KettleException: Cannot find repository "reportdev"

I'm unable to connect to one of my repositories with DI 5.4: RepositoriesMeta.findRepository is returning null when I try to find the repository by name.
Code used:
final RepositoriesMeta repositoriesMeta = new RepositoriesMeta();
repositoriesMeta.readData();
final RepositoryMeta repositoryMeta = repositoriesMeta.findRepository(repositoryName);

Here's what my repositories.xml looks like. It finds the file repository just fine, but it doesn't seem to find the enterprise repository. I'm able to connect to that repository with Spoon, so I know it is there. Is there something I'm missing?

<repositories>
  <repository>
    <id>PentahoEnterpriseRepository</id>
    <name>reportdev</name>
    <description>reportdev</description>
    <repository_location_url>http://10.32.18.41:9080/pentaho-di</repository_location_url>
    <version_comment_mandatory>N</version_comment_mandatory>
  </repository>
  <repository>
    <id>KettleFileRepository</id>
    <name>pentahoTransforms</name>
    <description>pentahoTransforms</description>
    <base_directory>/opt/verdeeco/sys/core/current/pentaho/pentahoTransforms</base_directory>
    <read_only>N</read_only>
    <hides_hidden_files>N</hides_hidden_files>
  </repository>
</repositories>

How to go to the parent directory using {Internal.Job.Filename.Directory}?

How can I go to the parent directory using {Internal.Job.Filename.Directory}? For example, if {Internal.Job.Filename.Directory} is C:\Pentaho_Lab\Test\ETL, how can I store a file in the parent directory, C:\Pentaho_Lab\Test, using the variable {Internal.Job.Filename.Directory}?
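
Since Kettle resolves file paths through Apache VFS, which understands relative ".." segments, one approach that often works is simply appending /.. to the variable in the file name field, for example:

Code:

    ${Internal.Job.Filename.Directory}/../myfile.csv

Whether every step accepts this is not guaranteed; if one rejects it, a fallback is to compute the parent path yourself (strip everything after the last path separator) in a scripting step and store it in a variable.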

Pentaho 6 startup error (MySQL DB)

Hello there,
Need help with Pentaho 6. My environment is as follows:

Windows 7, 64-bit
Oracle JDK 1.7.79, 64-bit
MySQL 5.6.26

A sample error appears in pentaho.log (I attached all log files from the /tomcat/logs folder as logs.zip). Thanks in advance.

2016-01-19 01:35:45,789 ERROR [org.apache.felix.configadmin.1.8.0] [[org.osgi.service.cm.ConfigurationAdmin]]Cannot use configuration org.pentaho.requirejs for [org.osgi.service.cm.ManagedService, id=550, bundle=187/mvn:pentaho/pentaho-requirejs-osgi-manager/6.0.1.0-386]: No visibility to configuration bound to mvn:pentaho/pentaho-server-bundle/6.0.1.0-386
2016-01-19 01:36:24,074 ERROR [org.pentaho.platform.repository2.unified.BackingRepositoryLifecycleManagerSystemListener]
org.pentaho.platform.api.engine.security.userroledao.AlreadyExistsException:
at org.pentaho.platform.security.userroledao.jackrabbit.JcrUserRoleDao.createRole(JcrUserRoleDao.java:123)
at org.pentaho.platform.repository2.mt.RepositoryTenantManager.createTenant(RepositoryTenantManager.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)

Find first non-null value scanning rows backwards from current row

Intro

Hi there, this is my first post on the Pentaho forums. I've been using PDI for some time now, but the issue I'm facing is one I haven't tackled in Kettle before. Actually, I haven't even stumbled upon such a thing in the past.

Software


I'm using Pentaho Data Integration 5.4

Input data & explanation

Input data from a file (simplified, there are more columns):

Code:

    number      name
    1009      ProductA
    2150      ProductB
    3235      ProductC
              ProductD
              ProductE
    1234      ProductF
    7765      ProductG
    4566      ProductH
              ProductI
    9907      ProductJ

The issue is that the data comes from an xlsx Excel file containing merged cells, so for one value of number there are 1..n rows of values.

After converting that file to CSV, the values in the continuation rows (all but the first row of a merged cell) are missing, except in the one column that was not merged (see records 3 and 6 in the example).

I'm generating a sequence using the Add sequence step; the input is sorted the way it was originally stored in the file.

Steps to achieve the goal

Basically what I need to do is:


  1. Find the first non-null value whose sequence_number is less than current_row.sequence_number
  2. Concatenate the value of the name field onto that matching row
  3. Keep scanning subsequent rows with sequence_number higher than the last one scanned


As stated before, there can be 1..n rows of values in such a case.

Expected output

Code:

    number      name
    1009      ProductA
    2150      ProductB
    3235      ProductC; ProductD; ProductE
    1234      ProductF
    7765      ProductG
    4566      ProductH; ProductI
    9907      ProductJ


My approach


I believe I could do this in a loop, using an Analytic Query step to compute LAG(1), concatenating the name column from a null-valued row onto the previous row and discarding the other column values from the null row, and then repeating this in a loop (say 20 times, assuming that is the maximum), but I consider this a bad idea.

There are probably better ways to achieve this result, for example a Modified Java Script Value step that scans the rows backward from the current one (based on the sequence number), but I'm not aware of such functions, if they even exist.

How can I achieve this using the Modified Java Script Value step, or in any other efficient way, without looping over the entire content of the file until there are no empty rows?

As an additional question: is there any place where the special functions available in Pentaho's JavaScript step are documented? It would probably be a lot easier if I knew what I can do with the existing functions, whose existence I'm unaware of for now.
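
For what it's worth, this can be done without any backward scanning: a single forward pass that carries the last non-empty number down ("fill down") followed by a Group by step with the aggregation type "Concatenate strings separated by ;" should produce exactly the expected output. A plain-Java sketch of that logic (sample data copied from above; in PDI the fill-down part would be one variable persisting across rows in a Modified Java Script Value step):

Code:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class FillDownAndConcat {
        public static void main(String[] args) {
            // number is null on the rows that lost their merged-cell value.
            String[][] rows = {
                {"1009", "ProductA"}, {"2150", "ProductB"}, {"3235", "ProductC"},
                {null, "ProductD"}, {null, "ProductE"}, {"1234", "ProductF"},
                {"7765", "ProductG"}, {"4566", "ProductH"}, {null, "ProductI"},
                {"9907", "ProductJ"}
            };
            Map<String, List<String>> grouped = new LinkedHashMap<>();
            String lastNumber = null; // carries the last non-null key forward
            for (String[] row : rows) {
                if (row[0] != null) {
                    lastNumber = row[0]; // "fill down" the merged-cell value
                }
                grouped.computeIfAbsent(lastNumber, k -> new ArrayList<>()).add(row[1]);
            }
            grouped.forEach((number, names) ->
                System.out.println(number + "\t" + String.join("; ", names)));
        }
    }

Because the fill-down only ever looks at the previous row, the whole file is processed in one streaming pass, with no loop over the dataset.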

Pentaho Reporting

Hi,
I have a scenario where I need to combine two line graphs with different line styles, one solid and one dotted, in a single chart. Can you please illustrate a method to do so?

Define measures on basis of dimensions

How do I define measures on the basis of dimensions? E.g., I want to disable a specific measure when a certain dimension is selected. How can I achieve this in Schema Workbench?

Saiku OLAP Wizard error

Hello all,
I am working with Pentaho BI Server 5.4. I was trying to use the "Saiku OLAP Wizard" but got the following error:
"Failed
No class registered for id saiku-ui
Server Version: Pentaho Open Source BA Server 5.4.0.1-130"
A screenshot of the error is also attached (saiku error.jpg).

How can I resolve this? Do I need to install a plugin? If so, where can I get it?
Thank you,
Preeti

CDE template and CSS

Hello all,

I've run into a little problem. I created a nice dashboard and saved it as a template so I can apply it to other departments.
This all works fine (the data is correct and the functionality stays the same).
My new dashboards pick up the CSS, but it is totally different from the original dashboard's.
This means my layout is completely broken. I've tried everything with permissions and CSS in different folders, but nothing seems to work.
Does anybody have any idea?

Kind Regards,

KBinfo measure differs with same training data

Hello,

I've got a problem with my RBF network. I thought I could use the Kononenko & Bratko index (KBInfo) to get a better measure of accuracy for the final prediction. The problem is that KBInfo varies from run to run and I don't understand why.

For example, with a training file of 17000 instances I initially had a KBInfo of -0.89. Later on, when I had added about 40 more instances to my training data, the KBInfo for the exact same individual test came out a positive 0.3. OK, I thought, if I regenerate my training data with the original 17000 instances I should get the same -0.89 as the day before; but no, with the old training data I now get a KBInfo of 0.29...

I haven't had time to study in more detail what is happening. I'd like to know if someone has experienced this before and what could be causing this high variance in the KBInfo.

Thx & regards,
Jordi
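
If the swings come from random initialisation rather than from the data, fixing the seeds should make runs repeatable: RBFNetwork places its basis functions with an internally seeded k-means, and cross-validation shuffles the folds. A minimal sketch with the Weka Java API (the training file name is hypothetical, and that this explains the KBInfo variance is only an assumption):

Code:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.RBFNetwork;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class KbInfoRepeatability {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("training.arff").getDataSet(); // hypothetical file
            data.setClassIndex(data.numAttributes() - 1);

            RBFNetwork rbf = new RBFNetwork();
            rbf.setClusteringSeed(1); // fix the k-means seed used to place the basis functions

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(rbf, data, 10, new Random(1)); // fix the fold shuffling too
            System.out.println("KB information (bits): " + eval.KBInformation());
        }
    }

If two runs with identical seeds and identical data still disagree, the cause lies elsewhere.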

Replace in String step in PDI throws java.lang.IllegalArgumentException

The following exception is encountered when using the "Replace in String" step in PDI Kettle. There is no variable or group name GKTIMESTAMP defined in the transformation, and regex-based replacement is turned off, yet it appears in the exception message. The environment is a 64-bit Linux machine with PDI 5.0.1 stable.

java.lang.IllegalArgumentException: No group with name {GKTIMESTAMP}
at java.util.regex.Matcher.appendReplacement(Matcher.java:849)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.replaceString(ReplaceString.java:79)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.getOneRow(ReplaceString.java:124)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.processRow(ReplaceString.java:202)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60)
at java.lang.Thread.run(Thread.java:745)

Can anyone suggest what the reason for this error could be and how to resolve it?
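
One plausible explanation, judging from the stack trace: the step performs the substitution through java.util.regex.Matcher even when regex matching is disabled, and in a Matcher replacement string ${name} denotes a named-group reference. A literal ${GKTIMESTAMP} in the "Replace with" value (for example an unresolved Kettle variable) would therefore raise exactly this exception. A small demo of the underlying Java behaviour:

Code:

    import java.util.regex.Matcher;

    public class NamedGroupReplacement {
        public static void main(String[] args) {
            String replacement = "ts=${GKTIMESTAMP}"; // literal text, not meant as a group reference

            // Throws IllegalArgumentException: No group with name {GKTIMESTAMP}
            // "a b".replaceAll(" ", replacement);

            // Quoting makes Matcher treat the replacement literally:
            System.out.println("a b".replaceAll(" ", Matcher.quoteReplacement(replacement)));
        }
    }

If that is what is happening here, defining the variable, or escaping the ${...} sequence in the step's replace-with field, should avoid the error.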

Time series prediction with overlay data in Java

Consuming a JSON variable coming out of MongoDB as BSON

Hello forum,

I am tasked with parsing a document in MongoDB that has a number of nested arrays or elements of data. The number of nested elements can vary, so I cannot preset it manually in the MongoDB Input transformation component (e.g. as $.Sensors[0].Value, then $.Sensors[1].Value and so forth); hence I attempted to operate on the JSON variable coming out of the Mongo Input step.

However, it appears to be not actually JSON-compliant, but raw BSON instead (contrary to what the Mongo Input component specifies).
Therefore I have trouble parsing it with JSONPath in a JavaScript task (using a helper library), because it is not valid JSON.
Stripping the BSON metadata bits appears unreliable (because of stray bytes in the data).

I am curious whether anyone has a BSON Path or BSON parser that can extract the nested elements.
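
If what the Mongo Input step emits is Mongo extended JSON (ISODate(...), NumberLong(...) and so on) rather than raw binary BSON, the MongoDB Java driver can normalise it to strict JSON that ordinary JSONPath libraries accept. A sketch assuming the 3.x driver is on the classpath (the record text is made up):

Code:

    import org.bson.Document;

    public class NormalizeMongoJson {
        public static void main(String[] args) {
            // Hypothetical record text with shell-style extended-JSON wrappers
            // that break strict JSON parsers.
            String raw = "{ \"ts\" : ISODate(\"2016-01-19T00:00:00Z\"),"
                       + " \"Sensors\" : [ { \"Value\" : 1 }, { \"Value\" : 2 } ] }";
            Document doc = Document.parse(raw); // tolerant of extended JSON
            System.out.println(doc.toJson());   // strict JSON for a JSONPath library
        }
    }

Iterating a path like $.Sensors[*].Value over the normalised text then avoids having to preset the array size.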

Impersonate User functionality

Hi,

I am looking for some advice on how to build impersonation functionality.

I would like to be able (as an admin user) to choose from a list of users on the system, impersonate them, and see my CDE dashboards from their point of view.

We have dynamic schemas, meaning the cubes and dimensions within them vary per user, as do the CDE pages a user can potentially access.

I was thinking of having a Kettle endpoint that lists the users, and then another that receives a user to impersonate.

This would kick off another process to set the session variables, perhaps?

And then use those when embedding another dashboard within the CDE page.

But I am not having any luck implementing this last part.

Has anybody done anything like this before? Or have any suggestions on the best way to achieve it?

Thanks,

Blueprint Container Issue

I am trying to parse a larger than ordinary number of records. I do not have a dedicated Hadoop cluster, but I do have several servers. My aim is to create a dynamic cluster across several machines. However, I am receiving two errors. The master starts and runs programs, but the slave server experiences issues before and after registering successfully with the master.

How can I resolve these issues? I do not want to use the big-data-plugin HDFS capabilities, and I cannot find ILineageClient.

Prior to successful registration, I am receiving the following error:


Code:

ERROR [KarafLifecycleListener] Error in Blueprint Watcher
org.pentaho.osgi.api.IKarafBlueprintWatcher$BlueprintWatcherException: Unknown error in KarafBlueprintWatcher
    at org.pentaho.osgi.impl.KarafBlueprintWatcherImpl.waitForBlueprint(KarafBlueprintWatcherImpl.java:89)
    at org.pentaho.di.osgi.KarafLifecycleListener$2.run(KarafLifecycleListener.java:112)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.pentaho.osgi.api.IKarafBlueprintWatcher$BlueprintWatcherException: Timed out waiting for blueprints to load: pdi-dataservice-server-plugin,pentaho-big-data-impl-shim-initializer,pentaho-big-data-impl-shim-hdfs,pentaho-big-data-impl-shim-pig,pentaho-big-data-impl-vfs-hdfs,pentaho-big-data-kettle-plugins-common-named-cluster-bridge,pentaho-big-data-kettle-plugins-guiTestActionHandlers,pentaho-big-data-kettle-plugins-pig,pentaho-hadoop-shims-mapr-osgi-jaas,pentaho-big-data-impl-clusterTests,pentaho-big-data-impl-shim-shimTests,pentaho-metaverse-core,pentaho-requirejs-osgi-manager,pentaho-angular-bundle,pentaho-marketplace-di
    at org.pentaho.osgi.impl.KarafBlueprintWatcherImpl.waitForBlueprint(KarafBlueprintWatcherImpl.java:77)
    ... 2 more

After loading the properties, I successfully connect to the master for a time.

Code:

2016/01/19 12:23:00 - Carte - Registered this slave server to master slave server [Master] on address [xxxxxxxxxxxxxxxx:8999]
2016/01/19 12:23:00 - Carte - Registered this slave server to master slave server [Master] on address [xxxxxxxxxxxxxxx:8999]
2016/01/19 12:23:00 - Carte - Created listener for webserver @ address : localhost:8199

However, after about a minute or so (the object timeout is set to 1 minute), I get the following log output. No tasks execute on the master before or after registration (actor model?).

Code:

[BlueprintContainerImpl] Unable to start blueprint container for bundle pdi-dataservice-server-plugin due to unresolved dependencies [(objectClass=org.pentaho.metaverse.api.ILineageClient)]
java.util.concurrent.TimeoutException
    at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:336)


Passing div class values

I've got a data source query that returns three values. I can successfully get those values passed to my HTML and displayed properly using <div id=""></div>. However, I want to style the panel those numbers are in differently based on those values: basically, if a number is positive, make it green; if negative, make it red; etc.

Code:

    if (percentChange < 0) {
        // Negative percent change: warning styling
        panelClass = 'panel-footer custom-panel-warning';
    } else {
        // Positive (or zero) percent change: success styling
        panelClass = 'panel-footer custom-panel-success';
    }
    // One way to apply it (the element id is just an example):
    $('#percentChangePanel').attr('class', panelClass);

How can I get the div class to accept the panelClass variable value?

Transformation works in Spoon but not in my Java Application and there are no errors!

I am new to Pentaho Kettle and I have created several simple transformations and jobs in Spoon.

I have a job that runs a transformation which simply pulls data from CSV files, adds a couple of fields to each row, and sends the rows to MongoDB. I also have an error step (Write to Log) coming off of the MongoDB Output step.

The job and transformation run perfectly in Spoon and all rows appear in MongoDB.

However, when I run the job from my Java app, everything runs perfectly until it gets to the MongoDB Output step. There, all the rows go to the Write to Log step and no errors are recorded. :(

When I take out the Write to Log step, there are still no errors recorded and no rows are written to MongoDB.

I'm wondering if there is any DB configuration I need to do in my Java app, but I thought Kettle would take care of that for the transformation.
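
A hedged guess: MongoDB Output is a plugin step, and a standalone Java application only knows the plugins that the Kettle environment registers at init time; Spoon finds them in its own plugins folder, which your app may not see, and a step that cannot fully initialise its plugin can fail rows without a useful message. A sketch of running the job with plugin discovery pointed at a PDI installation and row-level logging (paths are examples):

Code:

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.logging.LogLevel;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    public class RunJobWithPlugins {
        public static void main(String[] args) throws Exception {
            // Point plugin discovery at a full PDI install so MongoDB Output
            // is registered exactly as it is in Spoon (example path).
            System.setProperty("KETTLE_PLUGIN_BASE_FOLDERS",
                "/opt/pentaho/data-integration/plugins");
            KettleEnvironment.init();

            JobMeta jobMeta = new JobMeta("/path/to/my_job.kjb", null); // example path
            Job job = new Job(null, jobMeta);
            job.setLogLevel(LogLevel.ROWLEVEL); // surface errors hidden at BASIC level
            job.start();
            job.waitUntilFinished();
            System.out.println("Errors: " + job.getResult().getNrErrors());
        }
    }

Running at ROWLEVEL usually reveals what the MongoDB Output step is actually complaining about when the rows divert to the error hop.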

Remove Saiku image from PDF/PNG chart export

Hi! I need to know whether it is possible to remove the Saiku image from the PDF/PNG file when exporting a chart from Saiku Analytics. I'm using the Saiku Community Edition.
I attached a file containing a chart generated with Saiku (the image is in the corner).

Thank you a lot!
Regards

Move XML files after successful DB write

I am looking for a clean solution for moving a processed XML file to another folder. InputFilename and outputFilename are provided at the beginning of the job from the results of a previous transformation. The current transformation is called row by row, so at each iteration one file is processed, written to the DB, and finally moved to another location.

The problem I am facing is the following:
- If I put a "process file" component (move operation) after the Table Output step, it gets triggered as many times as there are rows coming out of the Table Output component, so this solution does not work well in this case.
- If instead the transformation waits for all Table Output rows to finish, a problem arises with referencing the outputFilename field from the beginning of the transformation. Is there any way to reference that column without assigning a variable at the beginning of the job?

Details in the picture:
000046.jpg

Thank you

Strange error encountered during Job execution

I am getting the following error in my environment; the exception is encountered when using the "Replace in String" step in PDI Kettle. There is no variable or group name GKTIMESTAMP defined in the transformation, and regex-based replacement is turned off, yet the field/group name GKTIMESTAMP appears in the exception message. The environment is a 64-bit Linux machine with PDI 5.0.1 stable.


Another thing I noticed this time during the failure: after the transformation name and step name in square brackets, the log message prints "null" (as shown in the failure log snippet below) instead of the step name (as in the successful log snippet), followed by a processing status such as "- Linenr 50000".


Failure log snippet:


2016-01-19 06:11:31,601 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] null - Linenr 50000
2016-01-19 06:11:31,714 INFO [TaskHandlerJob IncrementalTask - Group byField] null - Linenr 50000
2016-01-19 06:11:32,341 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] null - Linenr 100000
2016-01-19 06:11:32,455 INFO [TaskHandlerJob IncrementalTask - Group byField] null - Linenr 100000
2016-01-19 06:11:33,013 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] null - Linenr 150000
2016-01-19 06:11:33,133 INFO [TaskHandlerJob IncrementalTask - Group byField] null - Linenr 150000
2016-01-19 06:11:33,875 ERROR [TaskHandlerJob IncrementalTask - replace_string] null - Unexpected error
2016-01-19 06:11:33,875 ERROR [TaskHandlerJob IncrementalTask - replace_string] null - java.lang.IllegalArgumentException: No group with name {GKTIMESTAMP}
at java.util.regex.Matcher.appendReplacement(Matcher.java:849)
at java.util.regex.Matcher.replaceAll(Matcher.java:955)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.replaceString(ReplaceString.java:79)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.getOneRow(ReplaceString.java:124)
at org.pentaho.di.trans.steps.replacestring.ReplaceString.processRow(ReplaceString.java:202)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60)
at java.lang.Thread.run(Thread.java:745)




In the ideal case, i.e. when the job runs successfully, the step name appears where the null is shown above. The corresponding log snippet is given below. Is there any reason why the above scenario occurred?


Successful log snippet:


2016-01-19 06:11:31,601 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] Sorted Merge - Linenr 50000
2016-01-19 06:11:31,714 INFO [TaskHandlerJob IncrementalTask - Group byField] Group byField - Linenr 50000
2016-01-19 06:11:32,341 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] Sorted Merge - Linenr 100000
2016-01-19 06:11:32,455 INFO [TaskHandlerJob IncrementalTask - Group byField] Group byField - Linenr 100000
2016-01-19 06:11:33,013 INFO [TaskHandlerJob IncrementalTask - Sorted Merge] Sorted Merge - Linenr 150000
2016-01-19 06:11:33,133 INFO [TaskHandlerJob IncrementalTask - Group byField] Group byField - Linenr 150000