Coding Charts

September 16, 2015, 9:52 am

Hi,

Apologies if this is a silly question, but I think I am missing something with CTools. I have been using CTools for a few months and have created a few reports that I have been very pleased with. However, everything I have done so far has used the standard components and their advanced properties.

What I want to be able to do is use the scripting shown on the http://www.webdetails.pt/ctools/ccc/ page, as this would let me create standard templates for reports without having to go through the hundreds of options. But I don't know where to put this code. Should I be using a Protovis component and adding the code as a custom chart script, or should I be doing something else, such as putting it in the pre/post execution parts?

Looking around, I can see lots of places that give examples of the code I can use, just not how to use it. I'm probably asking Google the wrong questions, but I'm not sure what the right question is. Any help or a pointer in the right direction would be greatly appreciated.

Thank you,
Andy

sqleonardo does not open

September 16, 2015, 1:43 pm

Pentaho Report Designer 5.4.0.1-130 (also tested in previous versions).

I am not able to use SQLeonardo; it does not open a new window.

Steps: Report Wizard > JDBC DataSource > Available Queries > Static Query, then click the pencil icon. This produces the following error:
----------------x-----------------
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/sampledata
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at org.pentaho.reporting.engine.classic.core.modules.misc.datafactory.sql.DriverConnectionProvider.createConnection(DriverConnectionProvider.java:144)
at org.pentaho.reporting.ui.datasources.jdbc.ui.JdbcDataSourceDialog$InvokeQueryDesignerAction.actionPerformed(JdbcDataSourceDialog.java:241)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
----------------x-----------------

I then copied the 'mysql-connector-java-5.1.17.jar' file from

/home/jjuan/pentaho/biserver-ce/tomcat/lib

to

/home/jjuan/pentaho/report-designer/lib/jdbc

Now, when I click on the pencil (SQLeonardo), nothing happens, and prd.log shows this message:
----------------x-----------------
2015-09-16 22:01:22,405 [ 25368] ERROR - org.pentaho.reporting.designer.core.util.exceptions.UncaughtExceptionsModel - Unexpected Error encountered:
java.lang.NoClassDefFoundError: com/sun/image/codec/jpeg/ImageFormatException
at nickyb.sqleonardo.querybuilder.QueryActions.init(QueryActions.java:57)
at nickyb.sqleonardo.querybuilder.QueryBuilder.<init>(QueryBuilder.java:78)
at org.pentaho.reporting.ui.datasources.jdbc.ui.JdbcDataSourceDialog$InvokeQueryDesignerAction.actionPerformed(JdbcDataSourceDialog.java:253)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)

----------------x-----------------
Thanks,
Jose

Database connections reuse

September 16, 2015, 11:42 pm

Hi,

I have created a dashboard with 5 components. When the dashboard runs, it opens 5 different database connections, one per component. I want all the components to reuse a single database connection.

Please help, as this is very urgent.

Please note that I am not using a KTR; we are connecting through the JNDI option.

Thanks

Pallavi

Fuzzy Match - Find closest distance matrix

September 17, 2015, 12:44 am

I have logic to find the most optimised way to perform deliveries. Let's say I have locations A, B and C. I need the distance and duration from A to B, B to A, A to C, C to A, B to C and C to B.

An example result is the NewMatrix table in this fiddle:
http://sqlfiddle.com/#!6/9cce7/1
I have a table where I store the current matrix, based on past deliveries (AppMatrix in the fiddle above).
I need to look up the distance and duration in this table by finding the closest matching origin and destination. I have created the following function, which gives exactly the answer I need:

Code:

SELECT TOP 1 Distance, ([Time] / 60) AS Duration
FROM [AppMatrix]
-- origin proximity is ranked first; destination proximity only breaks ties
ORDER BY ABS([OriginSiteLat] - @OriginLat) + ABS([OriginSiteLng] - @OriginLong),
         ABS([DestSiteLat] - @DestinationLat) + ABS([DestSiteLng] - @DestinationLong)

The problem is speed. I need to run this query for every pair in the matrix, and with up to 700 deliveries in a day that is 700*700 = 490,000 lookups against a few million existing records. This is just too slow: it takes a few hours to return results.

I'm looking at using the Fuzzy Match step with either the Levenshtein or the Damerau-Levenshtein algorithm to find the closest origin and destination, and I'm hoping for some guidance here.
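To make the cost concrete, here is a minimal Python sketch of what the query above computes for a single lookup: a full scan that ranks every row by origin proximity (the sum of absolute latitude and longitude differences) and uses destination proximity only to break ties. The function names and sample rows are hypothetical. Note that this is a numeric comparison of coordinates, whereas Levenshtein and Damerau-Levenshtein measure edit distance between strings.

```python
from typing import Dict, List, Optional

Row = Dict[str, float]


def closest_match(rows: List[Row], o_lat: float, o_lng: float,
                  d_lat: float, d_lng: float) -> Optional[Row]:
    """Return the row the SQL's ORDER BY ... TOP 1 would pick (None if empty)."""
    def rank(r: Row):
        # Manhattan distance in degrees, as in the ABS(...) + ABS(...) terms.
        origin = abs(r["OriginSiteLat"] - o_lat) + abs(r["OriginSiteLng"] - o_lng)
        dest = abs(r["DestSiteLat"] - d_lat) + abs(r["DestSiteLng"] - d_lng)
        # Tuple comparison: origin distance first, destination breaks ties,
        # matching the two comma-separated ORDER BY expressions.
        return (origin, dest)

    return min(rows, key=rank, default=None)


def as_result(row: Row) -> Dict[str, float]:
    # [Time]/60 in the SQL; T-SQL would truncate if [Time] is an integer column.
    return {"Distance": row["Distance"], "Duration": row["Time"] / 60}


# Hypothetical AppMatrix rows for illustration only.
app_matrix = [
    {"OriginSiteLat": 1.0, "OriginSiteLng": 1.0, "DestSiteLat": 5.0,
     "DestSiteLng": 5.0, "Distance": 10.0, "Time": 600},
    {"OriginSiteLat": 1.0, "OriginSiteLng": 1.0, "DestSiteLat": 2.0,
     "DestSiteLng": 2.0, "Distance": 3.0, "Time": 120},
    {"OriginSiteLat": 9.0, "OriginSiteLng": 9.0, "DestSiteLat": 2.0,
     "DestSiteLng": 2.0, "Distance": 7.0, "Time": 300},
]
best = closest_match(app_matrix, 1.0, 1.0, 2.1, 2.1)
print(as_result(best))  # {'Distance': 3.0, 'Duration': 2.0}
```

Like the SQL, each call scans every row, so the total work grows as (number of lookups) × (number of matrix rows).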

Combine 2 streams with identical fields into 1 with prio fieldvalue

September 17, 2015, 2:09 am

Hi,

I'm new to Kettle and have a problem when I try to combine 2 streams (Excel files) with identical fields (but different values) into 1 stream.

Example:

Stream 1:

ID	Field1	Field2	Field3
1	A	B
2	C	D
3	E	F
5	G	H	Z

Stream 2:

ID	Field1	Field2	Field3
1	A	B	C
3	E
4	X	Y	Z
5	G	H	AA

The output should look like this:

ID	Field1	Field2	Field3
1	A	B	C
2	C	D
3	E	F
4	X	Y	Z
5	G	H	AA

Blank fields should be filled with the value from whichever input stream has one.
When the two streams have different values in a field, the value from stream 2 should be used in the output.
Rows that appear in only one stream should flow to the output as they are.

Any idea how to tackle this?
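The merge rule above amounts to a full outer join on ID in which a non-blank stream 2 value takes priority and blanks fall back to stream 1. A minimal Python sketch of that rule, assuming ID is the join key, IDs are numeric, and blanks arrive as empty strings; the helper names are hypothetical, and this shows the logic only, not a particular Kettle step:

```python
from typing import Dict, List

Row = Dict[str, str]
FIELDS = ["Field1", "Field2", "Field3"]


def merge_streams(s1: List[Row], s2: List[Row]) -> List[Row]:
    """Full outer join on ID: stream 2 wins when non-blank, else stream 1, else blank."""
    by_id_1 = {r["ID"]: r for r in s1}
    by_id_2 = {r["ID"]: r for r in s2}
    merged = []
    for key in sorted(by_id_1.keys() | by_id_2.keys(), key=int):  # IDs assumed numeric
        a, b = by_id_1.get(key, {}), by_id_2.get(key, {})
        row = {"ID": key}
        for f in FIELDS:
            # An empty string and a missing key both count as blank.
            row[f] = b.get(f) or a.get(f) or ""
        merged.append(row)
    return merged


def rows(*lines: str) -> List[Row]:
    """Hypothetical helper: build rows from tab-separated lines, padding blanks."""
    out = []
    for line in lines:
        parts = line.split("\t") + [""] * 3
        out.append(dict(zip(["ID"] + FIELDS, parts)))
    return out


stream1 = rows("1\tA\tB", "2\tC\tD", "3\tE\tF", "5\tG\tH\tZ")
stream2 = rows("1\tA\tB\tC", "3\tE", "4\tX\tY\tZ", "5\tG\tH\tAA")
for r in merge_streams(stream1, stream2):
    print("\t".join(r.values()))
```

Run on the sample data, this reproduces the expected output table above.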

RequireJS Dashboard use JQuery in external JS

September 17, 2015, 2:59 am

I want to reuse functionality across several dashboards that use the RequireJS feature. One piece of functionality should hide the charts when something has changed.
To do this, I simply used jQuery to hide the corresponding columns. This works fine as long as I define it as a code snippet resource.
However, when I paste the same code into an external JS file and use that as a resource, I get the error "TypeError: $ is not a function".
Following the suggestion from Šime Vidas in http://stackoverflow.com/questions/1...-anonymous-fun didn't solve the problem either.
So is it possible to use jQuery in an external JS resource, and if so, how do I achieve it?

Dynamically Creating Output Text File Names from fields?

September 17, 2015, 6:54 am

All -

I have a Get Files input step that reads in a set of XML files (by pattern). I would like to use the short_filename field (from Get Files) to dynamically name the files written by the Text File Output step, based on the value of short_filename.

Here are a few sample rows to help illustrate the scenario:

short_filename,json_details

nodes_Checks,{"id": "tsom.256d126e-41d6-11e5-a5ef-b72ef4e0a925", "label": "dcesx12", "text": "dcesx12", "image": "img/models/ESXi.png", "shape": "image", "color": { "background":"orange", "border":"orange", "highlight":{ "background":"orange","border":"orange"}}},
nodes_Checks,{"id": "entuity.813f3d1a-4530-11e5-a5ef-b72ef4e0a925", "label": "172.21.86.253", "text": "172.21.86.253", "image": "img/models/Router.png", "shape": "image", "color": { "background":"orange", "border":"orange", "highlight":{ "background":"orange","border":"orange"}}},
nodes_NIT,{"id": "entuity.813f3d1a-4530-11e5-a5ef-b72ef4e0a925", "label": "172.21.86.253", "text": "172.21.86.253", "image": "img/models/Router.png", "shape": "image", "color": { "background":"orange", "border":"orange", "highlight":{ "background":"orange","border":"orange"}}},
nodes_Tapps,{"id": "tsom.256d126e-41d6-11e5-a5ef-b72ef4e0a925", "label": "dcesx12", "text": "dcesx12", "image": "img/models/ESXi.png", "shape": "image", "color": { "background":"orange", "border":"orange", "highlight":{ "background":"orange","border":"orange"}}},
nodes_Tapps,{"id": "entuity.813f3d1a-4530-11e5-a5ef-b72ef4e0a925", "label": "172.21.86.253", "text": "172.21.86.253", "image": "img/models/Router.png", "shape": "image", "color": { "background":"orange", "border":"orange", "highlight":{ "background":"orange","border":"orange"}}},
...

Now I want to write the json_details field to output text files based on short_filename: all rows whose short_filename is 'nodes_Checks' would be written to nodes_Checks.txt, and so forth.

I cannot use Filter Rows, since the values of short_filename will change dynamically.
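The intended routing, one output file per distinct short_filename value, can be sketched in Python. The function name and the (short_filename, json_details) tuple input are assumptions for illustration, and the sample rows are shortened versions of the ones above. Within Kettle, the Text File Output step's "Accept file name from field?" option may be the closest built-in equivalent and is worth checking for your version.

```python
import os
import tempfile
from collections import defaultdict
from typing import Iterable, List, Tuple


def write_by_filename(rows: Iterable[Tuple[str, str]], out_dir: str) -> List[str]:
    """Write json_details to <out_dir>/<short_filename>.txt, one file per distinct name."""
    grouped = defaultdict(list)
    for short_filename, json_details in rows:
        grouped[short_filename].append(json_details)  # keeps input order per file
    paths = []
    for name, lines in grouped.items():
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(lines) + "\n")
        paths.append(path)
    return sorted(paths)


# Shortened sample rows (json_details trimmed to the id for readability).
sample = [
    ("nodes_Checks", '{"id": "tsom.256d126e-41d6-11e5-a5ef-b72ef4e0a925"}'),
    ("nodes_Checks", '{"id": "entuity.813f3d1a-4530-11e5-a5ef-b72ef4e0a925"}'),
    ("nodes_NIT", '{"id": "entuity.813f3d1a-4530-11e5-a5ef-b72ef4e0a925"}'),
    ("nodes_Tapps", '{"id": "tsom.256d126e-41d6-11e5-a5ef-b72ef4e0a925"}'),
]
out_dir = tempfile.mkdtemp()
written = write_by_filename(sample, out_dir)
print([os.path.basename(p) for p in written])
```

Because the file name comes from the data, new short_filename values produce new files without any change to the logic.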

Let me know how I can accomplish this. Thanks for the support!

KP

Distribute Rows Doesn't Enhance Performance.

September 17, 2015, 10:32 am

I'm reading data from a SQL table and executing a SQL stored procedure for each row.
To improve performance, I tried distributing the rows to 5 different steps, and it takes 3 minutes to process 1,000 rows.
So I added another 5, for 10 steps in total, yet it now takes 5 minutes. What am I doing wrong?

http://screencast.com/t/gdEoymuVu5MJ

Probabilistic classification problem with Weka

September 17, 2015, 11:09 am

Hello,
I am working on a problem of classifying mathematical symbols.
I have a training set that I used to train a Weka Bayes network (BayesNet) classifier. Using the saved classification model, I want to predict the class that a test instance belongs to, together with a membership percentage for each class. To explain: suppose I have 4 symbol classes (plus, minus, fraction, multiplication). When I input an unknown symbol, I want the probability that it belongs to each class. What I currently get: Weka returns only the one class my symbol belongs to, with a probability of 1, and 0 for the other classes.

Get JSON currency from currencylayer.com

September 17, 2015, 12:34 pm

Hi,

I am trying to get exchange rates from currencylayer.com. The returned JSON looks like this:

{
  "success": true,
  "terms": "https:\/\/currencylayer.com\/terms",
  "privacy": "https:\/\/currencylayer.com\/privacy",
  "timestamp": 1442515988,
  "source": "USD",
  "quotes": {
    "USDAUD": 1.375989,
    "USDCHF": 0.962855,
    "USDEUR": 0.877751,
    "USDGBP": 0.64024
  }
}
I would like to save the exchange rate data into tables, but when I run the JSON Input step with the following mapping, it returns an error.

[Screenshot: Sin título.png (Untitled), showing the JSON Input field mapping]

Error:
2015/09/17 21:16:19 - Json Input.0 - ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : Could not open file #1 : XXX --> org.pentaho.di.core.exception.KettleException:
2015/09/17 21:16:19 - Json Input.0 - The data structure is not the same inside the resource! We found 1 values for json path [$.timestamp], which is different that the number returned for path [$.quotes[*]] (4 values). We MUST have the same number of values for all paths.

I understand why the error occurs, but I have no idea how to solve it.
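As the error message says, $.timestamp yields 1 value while $.quotes[*] yields 4. The flattening a row-based load needs, repeating the single timestamp and source on every quote row, can be sketched in Python. Splitting each pair key by stripping the source prefix (USDAUD becomes AUD) is an assumption based on the sample keys, and the function name is hypothetical:

```python
import json
from typing import List, Tuple

# Sample response from the question (the privacy URL is omitted for brevity).
PAYLOAD = r'''{"success": true, "terms": "https:\/\/currencylayer.com\/terms",
 "timestamp": 1442515988, "source": "USD",
 "quotes": {"USDAUD": 1.375989, "USDCHF": 0.962855,
            "USDEUR": 0.877751, "USDGBP": 0.64024}}'''


def quotes_to_rows(payload: str) -> List[Tuple[int, str, str, float]]:
    """One (timestamp, source, target, rate) row per entry in 'quotes'."""
    data = json.loads(payload)  # also decodes the escaped "\/" slashes
    ts, source = data["timestamp"], data["source"]
    rows = []
    for pair, rate in data["quotes"].items():
        # Assumption: each key is the source code followed by the target code.
        target = pair[len(source):] if pair.startswith(source) else pair
        rows.append((ts, source, target, float(rate)))
    return rows


for row in quotes_to_rows(PAYLOAD):
    print(row)
```

Every output row now has the same four columns, which is the equal-count shape the error message asks for.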

Many thanks for your help!

Filtering CSV using multiple grouped AND/OR statements?

September 17, 2015, 12:35 pm

Is there a way to select which rows a transformation passes to the next step based on groups of AND/OR conditions? Normally I would do this at the database level, but I'm processing CSV files and only need a subset of the rows from the input files. I tried using the Filter Rows step, but that doesn't allow me to group my AND/OR conditions together. I wasn't able to find any other forum posts that address this, so it's quite possible there is a completely different way to do this and I'm going about it all wrong.

Any help would be appreciated.

Thanks,
Jared

A small example from the CSV input file:

| Device | Description | Property | Value |
---------------------------------------------------------------------------
| PC-a | BIOS | SMBIOS Version | 2.5 |
| PC-a | BIOS | BIOS Version | A08 |
| PC-a | Computer System | Model | Precision WorkStation T3400 |
| PC-a | Computer System | NetBios name | PC-a |
| PC-a | Computer System | System type | X86-based PC |

I would like to select specific rows based on the Description combined with multiple Property values, using a combination of AND and OR conditions:

(Description = ValueA
  AND (Property = ValueB
       OR Property = ValueC
       OR Property = ValueD))
OR (Description = ValueE
  AND (Property = ValueF
       OR Property = ValueG
       OR Property = ValueH))
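The grouped condition has a regular shape: an OR of groups, each of which ANDs a Description test with an OR over Property values. That makes it expressible as a lookup from Description to the set of allowed Property values. A minimal Python sketch, where the rule contents are placeholders drawn from the sample rows rather than the real ValueA to ValueH:

```python
import csv
import io
from typing import Dict, List, Set

# Placeholder rules: each key/value pair is one
# "(Description = X AND (Property = P1 OR Property = P2 ...))" group;
# a row passes if it matches any group (the outer OR).
RULES: Dict[str, Set[str]] = {
    "BIOS": {"SMBIOS Version", "BIOS Version"},
    "Computer System": {"Model"},
}


def keep(row: Dict[str, str], rules: Dict[str, Set[str]] = RULES) -> bool:
    return row["Property"] in rules.get(row["Description"], set())


def filter_csv(text: str) -> List[Dict[str, str]]:
    return [row for row in csv.DictReader(io.StringIO(text)) if keep(row)]


SAMPLE = """Device,Description,Property,Value
PC-a,BIOS,SMBIOS Version,2.5
PC-a,BIOS,BIOS Version,A08
PC-a,Computer System,Model,Precision WorkStation T3400
PC-a,Computer System,NetBios name,PC-a
PC-a,Computer System,System type,X86-based PC
"""
for row in filter_csv(SAMPLE):
    print(row["Description"], "|", row["Property"], "|", row["Value"])
```

This table form only works while every group is an equality test on Description plus equality tests on Property; more general conditions would need one predicate function per group.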

I'm running Kettle 5.4.0.1-130

The (servername) failed to respond error message when running a report.

September 17, 2015, 1:47 pm

Hello:

I have a report that has been running for over a year. We are using a cloud application, so I do not know the vendor's servers or environment configuration. Now, suddenly, this report returns:

ParentException: java.sql.SQLException: java.sql.SQLException: org.apache.commons.httpclient.NoHttpResponseException: The server reportdata.qualifacts.org failed to respond

Does anyone have any clues as to why this is happening and how to fix it? The vendor has not been able to resolve the issue.

Thanks,

Question of building self referencing dimension table

September 17, 2015, 8:31 pm

Hi There,

I have a table called Document with a foreign key column, ParentDocumentId, that points to DocumentId in the same table. I am trying to create a transformation to load a data warehouse table called dim_document. I understand that I might need an additional closure table to hold the distance between records, and also a foreign key, parent_document_key, in dim_document to record the direct parent document.

Consider the following possible scenario:
1. Parent Document created.
2. Child Document created.
3. Parent Document updated.
4. ETL transformation started.

Since the query that selects documents is based on the LastUpdatedOn field, the child document comes before the parent document. The load_parent_document_key step then returns a null value while processing the child document, because the parent document is not in the dimension table yet.

Is there any way to cope with this issue?
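One way to avoid the null lookup is to process each batch in parent-before-child order, whatever the LastUpdatedOn order. A minimal Python sketch of that ordering as a depth-first topological sort follows; the function name and the DocumentId-to-ParentDocumentId dictionary input are assumptions for illustration:

```python
from typing import Dict, List, Optional, Set


def parent_first(docs: Dict[int, Optional[int]]) -> List[int]:
    """Order DocumentIds so every parent in the batch precedes its children.

    docs maps DocumentId -> ParentDocumentId (None for a root). A parent that
    is not in the batch is assumed to be in dim_document already.
    """
    ordered: List[int] = []
    done: Set[int] = set()

    def visit(doc_id: int, path: Set[int]) -> None:
        if doc_id in done:
            return
        if doc_id in path:
            raise ValueError(f"cycle detected at DocumentId {doc_id}")
        parent = docs[doc_id]
        if parent is not None and parent in docs:
            visit(parent, path | {doc_id})  # load the parent first
        done.add(doc_id)
        ordered.append(doc_id)

    for doc_id in docs:
        visit(doc_id, set())
    return ordered


# Child 2 is listed first, as the LastUpdatedOn order in the scenario would return it.
print(parent_first({2: 1, 1: None}))  # [1, 2]
```

An alternative that avoids reordering is a two-pass load: insert every row first, then fill parent_document_key once all parents are present.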

Thanks so much.
Jing

Pentaho PDI features and limitation. Help Help Help

September 18, 2015, 3:13 am

Hi all,
I want to learn about some features and limitations of Pentaho. We are evaluating Pentaho PDI as our ETL tool and have been researching and using it for a few days. In particular, I want to know:

1. Is there a GUI ETL monitoring tool to track every job and its status on a day-to-day basis?
2. Is there a GUI or other way to manage all the jobs in one place (a list of all Pentaho jobs on the server), so we can easily run the different jobs?
3. Is there a way to control, dynamically or automatically, how dependencies and dependent flows are run if an issue occurs on a given day?
4. Is there a way to re-run only the particular flow where errors occurred, so that we don't need to run the whole flow again?

We will be using Pentaho PDI (Kettle) v5.4.0, so I would like to know whether Pentaho already has these features or tools, or whether they are limitations. I hope someone can help.
If there are useful links on this, please share them.

Thanks,
Anil Maharjan
BI Engineer
https://np.linkedin.com/in/maharjananil

Remove Duplicate Rows in Table Output

September 18, 2015, 4:16 am

Hi All,
I have taken input from one database table and copied it to another table. The copy succeeded.
But when I run the same transformation again, it generates duplicate rows: the same values are appended a second time.

E.g. if the input table has 5 rows, the first run produces 5 rows, but the second run produces 10 rows instead of 5.

I want the same output every time I run the KTR file. How can I do this?
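A plain insert appends every time it runs, which matches the 5-then-10 behaviour described. A minimal sketch of two common ways to make the load re-runnable, using Python's built-in sqlite3 module and a hypothetical target table (id, name): empty the target before each load, or load by key so existing rows are replaced. In Kettle, the Table Output step's "Truncate table" option and the Insert / Update step are the likely counterparts; check them in your version.

```python
import sqlite3
from typing import Iterable, Tuple


def reload(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Option 1: empty the target, then insert, in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM target")  # SQLite has no TRUNCATE statement
        conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)


def upsert(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Option 2: keyed load; an existing id is replaced instead of duplicated."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO target (id, name) VALUES (?, ?)", rows)


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
source = [(i, f"row{i}") for i in range(1, 6)]  # 5 input rows

for _ in range(2):  # run the "transformation" twice
    reload(conn, source)
print(count(conn))  # 5, not 10

for _ in range(2):
    upsert(conn, source)
print(count(conn))  # still 5
```

Option 1 also removes rows that have disappeared from the source; option 2 keeps them, so the choice depends on whether the target should mirror the source exactly.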

Regards
Ramya

how to control the parallel number of job in kettle

September 18, 2015, 5:40 am

Hi guys,

How do I set the number of jobs that run in parallel in Kettle v5.4?

E.g. I have 80+ jobs that can all run in parallel, meaning the jobs don't depend on each other.

I need to run 10 jobs in parallel at a time, and whenever one of the 10 finishes, one of the remaining 70 jobs should start.

How can I achieve this?
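What is described is a bounded worker pool: 10 slots, each of which picks up the next waiting job as soon as its current one finishes. A minimal Python sketch using concurrent.futures follows; fake_job is a hypothetical stand-in for whatever actually launches a Kettle job (for example a wrapper around Kitchen), and the .kjb names are made up:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

_lock = threading.Lock()
_running = 0
peak = 0  # highest number of jobs observed running at once


def fake_job(name: str) -> int:
    """Hypothetical stand-in for launching one Kettle job; returns an exit code."""
    global _running, peak
    with _lock:
        _running += 1
        peak = max(peak, _running)
    time.sleep(0.01)  # pretend to do work
    with _lock:
        _running -= 1
    return 0


def run_all(jobs: List[str], run_job: Callable[[str], int], width: int = 10) -> List[int]:
    """Run all jobs with at most `width` at a time; a free slot is refilled immediately."""
    with ThreadPoolExecutor(max_workers=width) as pool:
        return list(pool.map(run_job, jobs))  # results come back in submission order


codes = run_all([f"job_{i:02d}.kjb" for i in range(80)], fake_job)
print(len(codes), peak)  # 80 and the observed peak, which is at most 10
```

The pool size is the only knob: changing width changes how many jobs may run at once, without any change to the job list.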

Regards
tony

WEKA Algorithm for Uplift Modeling?

September 18, 2015, 5:43 am

Hi all,

I'm interested in doing some uplift modeling in medical datasets. The effort is to identify subgroups of patients in a randomized trial that may benefit from a specific intervention. Are there any Weka-implementable algorithms for this type of activity?

I'd appreciate any advice or pointers.

Thanks!

negative values

September 18, 2015, 6:11 am

Hello,

I am using Weka on a dataset that contains some negative values. What should I do: leave them negative in the dataset, or work with absolute values?

Thanks

PRD 5 Problem with mdx query columns names!! Can't format it!!

September 18, 2015, 7:51 am

I'm building a report in PRD 5.0.1 using an MDX query. It has some measures on rows (no problem with these) and one dimension on columns (a time dimension, from which I want to show only year-month).
The problem is with the column names the query generates: they use the full dimension member name, but I want to use only the last part of the name (as Saiku shows it for the same MDX query).

For example:
column name in Saiku: [2014-09]
column name in PRD: [Activacion].[Activacion.Activacion].[All Activacions].[2014].[2014-09]

How can I format these names to show only the last part (2014-09) in PRD?

I've tried using a formula expression in the "labels-detail-header" attribute of the header label, such as =RIGHT([::column::1], 9), which is supposed to get the last 9 characters of the name of column 1. In this case the label is empty (CASE1).
I've also tried the same formula in the "value" attribute of the label. It then shows the last 9 characters of the first data value of the ::column:: field, not of the column name (CASE2).

Here is a screenshot showing the problem. The first column (with the dark background) holds the measures. The second column is the first member of the time dimension, and its header is empty (CASE1). In the last column, the header shows the first data value (13819.0) as its label instead of the field name (CASE2):
[Screenshot: Captura.jpg]

This is the MDX query I'm using:
WITH MEMBER [Measures].[A- Activadas] as [Measures].[cantidad]
MEMBER [Measures].[B- Impactadas] as [Measures].[cantidadImpactadas]
MEMBER [Measures].[C- Efic. Vendedor (en %)] as VBA!round([Measures].[ImpactadasActivación] * 100,1)
MEMBER [Measures].[D- Entreg. PDV] as [Measures].[cantEntregaPDV]
MEMBER [Measures].[F- Efic. PDV (en %)] as VBA!round([Measures].[porcentajePDV] * 100,1)
MEMBER [Measures].[E- Imp. PDV] as [Measures].[cantEntregImpactadas]
SELECT
HEAD(TAIL(FILTER({Hierarchize({[Activacion].[mesActivacion].AllMembers})}, NOT ISEMPTY([Measures].[cantidad])), 13), 12) ON COLUMNS,
NON EMPTY {Hierarchize({{[Measures].[A- Activadas], [Measures].[B- Impactadas], [Measures].[D- Entreg. PDV], [Measures].[E- Imp. PDV], [Measures].[C- Efic. Vendedor (en %)], [Measures].[F- Efic. PDV (en %)]}})} ON ROWS
FROM [Impactadas]
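The string transformation being asked for, keeping only the last bracketed segment of the member's unique name, can be illustrated in Python. This shows the logic only, not a PRD formula, and it ignores MDX's escaping of a literal "]" as "]]" inside names. For reference, the last 9 characters of the full name are "[2014-09]", brackets included.

```python
import re

# One [...] segment of an MDX unique name; captures the text inside the brackets.
_SEGMENT = re.compile(r"\[([^\]]*)\]")


def last_member(unique_name: str) -> str:
    """Return the text of the last [...] segment, or the input unchanged if none."""
    parts = _SEGMENT.findall(unique_name)
    return parts[-1] if parts else unique_name


name = "[Activacion].[Activacion.Activacion].[All Activacions].[2014].[2014-09]"
print(last_member(name))  # 2014-09
```

Splitting on segments rather than counting characters keeps working if a member name is longer or shorter than 7 characters.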

Thanks in advance, and sorry for my English.

how to call job2 from other job1

September 18, 2015, 7:56 am

Hi,

PDI-CE-5.3, MySQL, Java 1.7, Windows

I am trying to call job2 from job1 once job1 has executed successfully. I searched the web but was not able to find a solution. Could you please help me with how to achieve this?

Here is my scenario: job1 performs an incremental data load, but job2 should always perform a full load.

I can run job2 manually once job1 has finished, but I want to apply this logic generically. Please advise.

Thank you
