Channel: Pentaho Community Forums

Customization Of the Y-Axis Scale - CCC Chart

Hi,

I am displaying the top 10 categories based on cost. Some categories have very small values, so their bars are barely visible on the chart, which does not look good. I have attached a screenshot below:

[Attached image: chart.jpg]

Is there any workaround to make the Y-axis scale depend on the values returned by the CDA query? For example, if my query returns 10 rows, the Y-axis would have 10 tick marks, each tick value based on the returned results.

Is it possible to do this in a CDE dashboard? Can anyone help me with this or suggest how it can be achieved?

Create OLAP cubes

Maybe someone can help...
I've never tried to create OLAP cubes. This subject is absolutely new to me.
I see Pentaho offers some products to create such a data layer: Schema Workbench, the Mondrian server...

Suppose I have a MySQL database. What do I need to convert the data into OLAP cubes, and what layers should I organize, with which tools? Is Pentaho Server the only tool to display that data?
Can I use Pentaho Kettle to do some part of the job?

Any help appreciated,

Regards

MaxMind GeoIP Lookup not working

Hi,
There is a plugin, "MaxMind GeoIP Lookup", but it's not working. I think that's because it does not support the current MaxMind databases, and MaxMind has discontinued the legacy databases.
Do you have any news about plugin updates?
Or any idea about alternative approach?
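One possible alternative, if updating the plugin is not an option: MaxMind's GeoIP2 Java library (com.maxmind.geoip2) reads the current GeoLite2/GeoIP2 .mmdb databases and could be called from, for example, a User Defined Java Class step. A minimal sketch; the database path and the test IP are placeholders:
Code:

import java.io.File;
import java.net.InetAddress;

import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;

public class GeoIp2LookupExample {
    public static void main(String[] args) throws Exception {
        // Path to a current MaxMind database file (placeholder).
        DatabaseReader reader = new DatabaseReader.Builder(
                new File("/opt/geoip/GeoLite2-City.mmdb")).build();

        // Look up one IP address and print a few of the returned fields.
        CityResponse response = reader.city(InetAddress.getByName("128.101.101.101"));
        System.out.println(response.getCountry().getIsoCode());
        System.out.println(response.getCity().getName());
        System.out.println(response.getLocation().getLatitude() + ", "
                + response.getLocation().getLongitude());

        reader.close();
    }
}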

Kettle log errors - Step works nonetheless

A Shell script step calls the program 'osmosis', which loads an OpenStreetMap .osm (XML format) file into PostgreSQL (version 11.x).
It works - the result is exactly what I want.
However, when it runs, the log shows a series of ERROR messages at this step (see below, in the middle of the log lines), even though subsequent job steps produce my desired results without errors.
I am ignoring this, but any idea what might be causing it?
Thanks.
RCD
=================== LOG ================================
2019/05/19 23:00:22 - Spoon - Starting job...
2019/05/19 23:00:22 - Load Larabanga OSM - Start of job execution
2019/05/19 23:00:22 - Load Larabanga OSM - Starting entry [DATA_HOME=/opt/pentaho/laradata]
2019/05/19 23:00:22 - Load Larabanga OSM - Starting entry [Get Geofabrik Ghana]
2019/05/19 23:00:22 - Get Geofabrik Ghana - Start of HTTP job entry.
2019/05/19 23:00:22 - Get Geofabrik Ghana - Connecting to URL: http://download.geofabrik.de/africa/...latest.osm.pbf
2019/05/19 23:00:22 - Get Geofabrik Ghana - Resource type: application/octet-stream, last modified on: Sun May 19 01:01:58 CEST 2019.
2019/05/19 23:00:26 - Get Geofabrik Ghana - Finished writing 41922383 bytes to result file [/opt/pentaho/laratmp/ghana.osm.pbf]
2019/05/19 23:00:26 - Load Larabanga OSM - Starting entry [..........]
2019/05/19 23:00:26 - Ghana pbf to Larabanga osm - Running on platform : Linux
2019/05/19 23:00:26 - Ghana pbf to Larabanga osm - Executing command : /tmp/kettle_205b0c0b-7a79-11e9-a189-3bd5352ed62bshell
2019/05/19 23:00:27 - Load Larabanga OSM - Starting entry [......... OSM exists]
2019/05/19 23:00:27 - Load Larabanga OSM - Starting entry [Wait for file creation]
2019/05/19 23:00:30 - Load Larabanga OSM - Starting entry [osm to geojson]
2019/05/19 23:00:30 - osm to geojson - Running on platform : Linux
2019/05/19 23:00:30 - osm to geojson - Executing command : /tmp/kettle_2281bc9c-7a79-11e9-a189-3bd5352ed62bshell
2019/05/19 23:00:31 - Load Larabanga OSM - Starting entry [OSMosis init SQL]
2019/05/19 23:00:31 - Load Larabanga OSM - Starting entry [SQL Trunc or Drop tables]
2019/05/19 23:00:31 - Load Larabanga OSM - Starting entry [Wait for SQL init]
2019/05/19 23:00:34 - Load Larabanga OSM - Starting entry [OSMosis to Posgtres DB]
2019/05/19 23:00:34 - OSMosis to Posgtres DB - Running on platform : Linux
2019/05/19 23:00:34 - OSMosis to Posgtres DB - Executing command : /tmp/kettle_24f1d23d-7a79-11e9-a189-3bd5352ed62bshell
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:34 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Osmosis Version 0.46
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:34 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Preparing pipeline.
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:34 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Launching pipeline execution.
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:34 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:34 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Pipeline executing, waiting for completion.
2019/05/19 23:00:36 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:36 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:36 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Pipeline complete.
2019/05/19 23:00:36 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) May 19, 2019 11:00:36 PM org.openstreetmap.osmosis.core.Osmosis run
2019/05/19 23:00:36 - OSMosis to Posgtres DB - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : (stderr) INFO: Total execution time: 2353 milliseconds.
2019/05/19 23:00:36 - Load Larabanga OSM - Starting entry [SQL for ways tags]
2019/05/19 23:00:36 - Load Larabanga OSM - Starting entry [SQL for nodes tags]
2019/05/19 23:00:37 - Load Larabanga OSM - Starting entry [SQL for relations tags]
2019/05/19 23:00:37 - Load Larabanga OSM - Starting entry [Delete old zip file]
2019/05/19 23:00:37 - Delete old zip file - File [$DATA_HOME/........_osm.zip] already deleted.
2019/05/19 23:00:37 - Load Larabanga OSM - Starting entry [Zip OSM input data]
2019/05/19 23:00:37 - Load Larabanga OSM - Starting entry [Store loaded OSM data]
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Store loaded OSM data] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Zip OSM input data] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Delete old zip file] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [SQL for relations tags] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [SQL for nodes tags] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [SQL for ways tags] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [OSMosis to Posgtres DB] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Wait for SQL init] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [SQL Trunc or Drop tables] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [OSMosis init SQL] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [osm to geojson] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Wait for file creation] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [........ OSM exists] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Ghana pbf to ......... osm] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [Get Geofabrik Ghana] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Finished job entry [DATA_HOME=/opt/pentaho/laradata] (result=[true])
2019/05/19 23:00:37 - Load Larabanga OSM - Job execution finished
2019/05/19 23:00:37 - Spoon - Job has ended.
2019/05/19 23:17:45 - Spoon - Spoon

partition bulk insert elasticsearch

Regards, I am writing because I am working with Pentaho and Elasticsearch, and I found something that could be handled relatively simply, although I do not understand why this option is not in Pentaho.


You have an index named
tickets-year-month
and an index pattern that reads tickets-*,
so Elasticsearch manages them as partitions.


Now I do not know how to put the year-month value into the index name for each row of the transformation. Since I am working with about 10 million tickets per month, this separation is essential.


[Attached image: Captura.jpg]
As illustrated in the attached image: how could I change the {$year} and {$month} placeholders to the real value of each row?
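For what it's worth, deriving the per-row index name itself is straightforward, for example in a User Defined Java Class step or a small helper like the sketch below (the class and field names are hypothetical); whether the Elasticsearch bulk insert step can then use a different index per row is exactly the open question here.
Code:

import java.text.SimpleDateFormat;
import java.util.Date;

public class TicketIndexName {
    // Builds an index name like "tickets-2019-06" from the ticket's date.
    public static String indexFor(Date ticketDate) {
        return "tickets-" + new SimpleDateFormat("yyyy-MM").format(ticketDate);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(new Date()));
    }
}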

Binary file (standard input) matches

When I try to use MySQL in a Table input step with an ORDER BY in the query, I get the error below and the ETL stops abruptly.
Code:

Binary file (standard input) matches
If I remove the ORDER BY from the query, it works. Is this a bug in Pentaho?

I'm using Pentaho 8.1.0.0 CE
OS: Ubuntu 16.04.4 LTS
MySQL Driver version: mysql-connector-java-5.1.46.jar

setOutputDone() and return true?

I have a unique situation which is quite complicated, but let's give it a try. This applies to a transformation I made which reads fact rows from files, applies various filters and calculations, then generates statistics using Memory Group By. This transformation handles big data, typically millions of rows. The summaries calculated are finally inserted into a fact table.

This transformation splits into multiple sub-streams as described above, and each sub-stream writes to a unique fact table. So the contents of the file are manipulated in multiple ways and written to multiple fact tables.

The bug:
There is a bug in the "Memory Group By" step. It will log "Return value <random field name> can't be found in the input row.". The problem is that the field that can't be found is some random temporary field used earlier, which is irrelevant and not referenced by the Memory Group By step. Perhaps due to a race condition or otherwise, some step/row metadata is leaking. This error happens periodically, perhaps a few times per day. The program is, however, executed every 5 minutes, exactly 288 times per day (12 five-minute intervals per hour, multiplied by 24 hours). So the crash frequency is about 0.7%.

Relevant links describing the bug (no solutions):
https://forums.pentaho.com/archive/i...p/t-95727.html
https://stackoverflow.com/questions/...-the-input-row
https://forums.pentaho.com/threads/2...the-input-row/

The gateway step: Some of the sub-streams must be enabled using a variable. This is achieved using a Java code step; let's call it the "gateway step". If the variable is "false", the gateway step disables the sub-stream. The purpose of the gateway step is simple: either let rows pass, depending on the variable, or stop all rows. The error above only happens when the sub-stream is disabled, meaning no rows are even passed to Memory Group By!

The gateway step must return true to avoid rows piling up: PDI can only buffer a certain number of "intermediate" rows between steps before crashing, so the gateway step returns true to avoid this. But what if it also calls setOutputDone() to quickly and immediately disable all steps in the disabled sub-stream?

I have had difficulty finding a detailed description of what exactly setOutputDone() does, but I'll try it and see if the error disappears. Due to the low crash frequency it can take some time to see any difference.
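For reference, a minimal sketch of such a gateway written as a User Defined Java Class step, assuming a variable named SUBSTREAM_ENABLED (the variable name is hypothetical). As far as the Kettle API goes, setOutputDone() marks the step's output row sets as finished, so downstream steps see end-of-input; the disabled branch below calls it once and then keeps consuming rows so the hop buffers upstream don't fill up, which is the experiment described above.
Code:

private boolean enabled;
private boolean signalled = false;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
  if (first) {
    first = false;
    // Read the controlling variable once (hypothetical variable name).
    enabled = "true".equalsIgnoreCase(getVariable("SUBSTREAM_ENABLED", "false"));
  }

  Object[] r = getRow();
  if (r == null) {
    // Input exhausted: normal end of step.
    setOutputDone();
    return false;
  }

  if (!enabled) {
    // Disabled sub-stream: signal downstream once that no rows will arrive,
    // then keep swallowing input rows so upstream buffers don't fill up.
    if (!signalled) {
      setOutputDone();
      signalled = true;
    }
    return true;
  }

  // Enabled: pass the row through unchanged.
  putRow(getInputRowMeta(), r);
  return true;
}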

Anyone have ideas here?

Report Designer 8.2: Query preview of Sub Report doesn't work

$
0
0
I'm using Report Designer 8.2 (prd-ce-8.2.0.0-342.zip) on Windows10.

On the main report, I add a JDBC data source to the Dataset and add an SQL query like "SELECT * FROM foo". Then I click the "Preview" button at the bottom right, and PRD shows a preview of the query results.

Next, I add a Sub Report to the main report, add a JDBC data source to the Dataset of the Sub Report, and add an SQL query as above. When I click the "Preview" button, PRD does not show anything, and an error is logged as shown below.

Although the query preview of the sub report does not work, the query itself works normally. I can use the result set of the query in the sub report; only the query preview does not work.

Does anyone have the same symptoms?


---- PRD error log ----
java.lang.ClassCastException: org.pentaho.reporting.engine.classic.core.SubReport cannot be cast to org.pentaho.reporting.engine.classic.core.MasterReport
at org.pentaho.reporting.ui.datasources.jdbc.ui.JdbcDataSourceDialog$PreviewAction.actionPerformed(JdbcDataSourceDialog.java:136)
at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
at java.awt.Component.processMouseEvent(Unknown Source)
at javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.awt.Component.processEvent(Unknown Source)
(snip)

Access Dashboards anonymously (Sparkl Plugin)

Hi!

How do I access the dashboards of my sparkl plugin anonymously?

Example of Change File Encoding

I would like to see an example of a transformation with a Change file encoding step.

I need to change an ISO-8859-1 file into a UTF-8 file.
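In case it helps while waiting for a sample transformation: one common approach, as far as I know, is simply to read the file with a Text file input step whose Encoding is set to ISO-8859-1 and write it with a Text file output step set to UTF-8. The same conversion in plain Java, as a minimal sketch with placeholder file names:
Code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    public static void main(String[] args) throws Exception {
        // Read the source as ISO-8859-1 and write the target as UTF-8 (paths are placeholders).
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("input_latin1.txt"), StandardCharsets.ISO_8859_1));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("output_utf8.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}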

Thanks

Regex Evaluation Issues

Hi Group

I hope someone can help.

I have a Regex Evaluation step, and the expression to find the email within a string is

([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)

The string is as follows:

Username: uceses9Name: Test SharifTelephone (including code): 01344322585Email: Testf958@gmail.com

The expression works with online regex testers, yet in PDI, whether I use the Regex Evaluation step or the Replace in string step, it brings back null values.
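A guess at the cause, which I can't confirm from the step's internals: if the Regex Evaluation step matches the pattern against the entire input string (Matcher.matches()) rather than searching within it (Matcher.find(), which is what most online testers do), the pattern above will fail because it does not cover the surrounding text. The difference is easy to see in plain Java, and wrapping the pattern so it spans the whole string makes a full match succeed:
Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexCheck {
    public static void main(String[] args) {
        String input = "Username: uceses9Name: Test SharifTelephone (including code): "
                + "01344322585Email: Testf958@gmail.com";
        String email = "([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+)";

        // Searching within the string succeeds and extracts the address.
        Matcher m = Pattern.compile(email).matcher(input);
        System.out.println(m.find() ? m.group(1) : "no match");   // Testf958@gmail.com

        // Matching the pattern against the ENTIRE string fails.
        System.out.println(input.matches(email));                 // false

        // Wrapping the pattern so it spans the whole string makes a full match succeed.
        Matcher whole = Pattern.compile(".*?" + email).matcher(input);
        System.out.println(whole.matches() ? whole.group(1) : "no match"); // Testf958@gmail.com
    }
}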

Can someone point me in the right direction?

Thanks in Advance
Chirag

Group By / Return Value cannot be found

I have a unique situation which is quite complicated, but let's give it a try. This applies to a transformation I made which reads fact rows from files, applies various filters and calculations, then generates statistics using Memory Group By. This transformation handles big data, typically millions of rows. The summaries calculated are finally inserted into a fact table.

This transformation splits into multiple sub-streams as described above, and each sub-stream writes to a unique fact table. So the contents of the file are manipulated in multiple ways and written to multiple fact tables.

The bug:
There is a bug in the "Memory Group By" step. It will log "Return value <random field name> can't be found in the input row.". The problem is that the field that can't be found is some random temporary field used earlier, which is irrelevant and not referenced by the Memory Group By step. Perhaps due to a race condition or otherwise, some step/row metadata is leaking. This error happens periodically, perhaps a few times per day. The program is, however, executed every 5 minutes, exactly 288 times per day (12 five-minute intervals per hour, multiplied by 24 hours). So the crash frequency is about 1%.

Relevant links describing the bug (no solutions):
https://forums.pentaho.com/archive/i...p/t-95727.html
https://stackoverflow.com/questions/...-the-input-row
https://forums.pentaho.com/threads/2...the-input-row/

The gateway step: Some of the sub-streams must be enabled using a variable. This is achieved using a Java code step; let's call it the "gateway step". If the variable is "false", the gateway step disables the sub-stream. The purpose of the gateway step is simple: either let rows pass, depending on the variable, or stop all rows. The error above only happens when the sub-stream is disabled, meaning no rows are even passed to Memory Group By!

The gateway step must return true to avoid rows piling up: PDI can only buffer a certain number of "intermediate" rows between steps before crashing, so the gateway step returns true to avoid this. But what if it also calls setOutputDone() to quickly and immediately disable all steps in the disabled sub-stream?

I have had difficulty finding a detailed description of what exactly setOutputDone() does, but I'll try it and see if the error disappears. Due to the low crash frequency it can take some time to see any difference.

Anyone have ideas here?

Data Mining -> PMI and Weka J48 algorithm

Hello,

If I understand it correctly, the PMI Decision Tree Classifier will produce a J48 decision tree model. Right?
All I see as a result from this step is:


Model
Code:

J48 unpruned tree
------------------

categories <= 2: yes (2.0)
categories > 2
|  turnover = down: yes (3.0/1.0)
|  turnover = up: no (2.0)

Number of Leaves  :    3

Size of the tree :    5

Do I then have to use this model in some other step to produce predictions?
If so, which one?
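For comparison, outside of PDI a saved J48 model of this kind can be applied with the Weka Java API. A minimal sketch, assuming the model has been serialized to a file and a compatible ARFF dataset is available (the file names are hypothetical):
Code:

import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ScoreWithJ48 {
    public static void main(String[] args) throws Exception {
        // Load a previously saved J48 model (hypothetical file name).
        Classifier model = (Classifier) SerializationHelper.read("j48.model");

        // Load the rows to score; the attributes must match the training data.
        Instances data = DataSource.read("new_rows.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // classifyInstance returns the index of the predicted class value.
        for (int i = 0; i < data.numInstances(); i++) {
            double predicted = model.classifyInstance(data.instance(i));
            System.out.println(data.classAttribute().value((int) predicted));
        }
    }
}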

Regards

Can not wait for remote job - Pentaho-Server 8.2

Good Morning,


If you can help me I would really appreciate it; I have been at this for days and I cannot understand what is happening.


We have installed version 8.2 of pentaho-server and data-integration on a server with Debian 9. The repository is in the database that the pentaho-server installation uses by default (I did not modify it).


Previously we had version 7.1 with the repository in MySQL (I also do not know what is recommended: leaving it in the default database or putting it in a MySQL database).


After importing the repository to the new server and running jobs from my new server (pentaho-server) at http://xxx.xxx.x.xxx:8080 (from the web interface), the jobs that contain other jobs give me these messages and skip them:


PROBLEM WITH THE SUB-JOBS:


2019/06/03 11:35:08 - j_loader_secundario - [j_loader_principal] can not wait for remote job [j_loader_secundario] to complete. [j_loader_principal] will continue to run.


and it continues executing the next one... (on the old server it did not continue to the next one until the sub-job had finished).


From the client I tried to enable the [wait for remote job to complete] checkbox in the configuration of the jobs that I execute inside the main job, but it appears disabled.


But when I launch it from the web interface, it always gives me the message I mention above and jumps to the next one.


If I launch it with kitchen, it runs without problems.


If you can help me with any suggestion I'd really appreciate it, because I do not know whether I can leave the jobs to run entirely locally with the [wait for remote job to complete] option, since with the pentaho-server configuration, from the server's web interface, it is impossible to run it without it skipping the sub-jobs (without actually executing them).


Thank you very much in advance and greetings,



How to remove HTML from the user console

I would like to remove the following HTML portions from the User Console:
1. Getting Started
2. Favorites
3. Recents
4. Log Out
[Attached image: PENTAHO_SR.jpg]

Docker Image for PDI CE

I would like to deploy pdi-ce-8.2.0.0-342 (Pentaho Data Integration Community Edition) on the AWS cloud as a container. Is there a PDI CE image available on Docker Hub or somewhere else?


Thanks in advance.

Load tables in parallel with a template transformation

Hey,

I have a question. I am trying to build an ETL job that extracts and loads my data using a template. For this I use a job with two steps, where I get my SQL queries and pass them to the result.
In the next step, I get the parameters, set them on the sub-transformation (or job), and load my data with the Table output step.


So, I have a job with two steps:
1. The first step gets my queries and copies them to the result.
2. The second step executes for every input row and passes the results as parameters, so that I can use them in the sub-transformation.

Now my question:

I want to execute the transformation in parallel for every input row, extracting and loading my data in parallel with a template.

Is it possible to do that?

I tried setting the first step to execute the next entries in parallel, but this doesn't work. I used the Transformation Executor too, with the same result.

The tables are loaded in sequence, not in parallel.

Does anyone have an idea how I could do that?

Thanks in advance.

Memory Group by from PDI to Spark through AEL ERROR: None.get

Hi to all,
I'm trying to set up some demos with PDI + AEL -> Hadoop + Spark.

I have made a simple job with one transformation that has one Generate rows step, one Memory Group by step and one Write to log step.

I start the job on my client PC and set the transformation to run through AEL on a Spark server.

So my setup is:
- my PC, which runs the job
- a server with Hadoop + Spark + the Pentaho AEL daemon

conf:
Pentaho 8.1 CE
hadoop-3.1.2
spark-2.4.3

As far as I know, AEL re-maps some steps onto Spark. Generate rows and Write to log are not among the re-mapped steps, but Memory Group by is.
Everything is fine as long as I don't use re-mapped steps: the daemon gets called, the transformation runs on a single Spark node, and that's OK.

If I use Memory Group by, I get an error on the PC that runs the job and the transformation fails; the error is copied at the end of this post.
The daemon on the remote machine simply says:
2019-06-14 10:47:54.349 INFO 6864 --- [io-53000-exec-9] o.p.a.d.spark.RequestServerEndpoint : Received Stop Message from driver
2019-06-14 10:47:54.362 INFO 6864 --- [io-53000-exec-2] o.p.a.d.spark.RequestServerEndpoint : Session 12 closed because of CloseReason: code [1000], reason [Transformation Complete Successfully]
2019-06-14 10:47:54.395 INFO 6864 --- [io-53000-exec-1] o.p.a.d.spark.RequestServerEndpoint : Received Driver Session Closed Message: Driver Session not Reused
2019-06-14 10:47:54.429 INFO 6864 --- [io-53000-exec-1] o.p.a.d.spark.RequestServerEndpoint : Session 13 closed because of CloseReason: code [1000], reason [Driver Finalized]



Any hint?
Thank you

2019/06/14 12:40:30 - Memory Group by.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : None.get
2019/06/14 12:40:30 - Memory Group by.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : org.pentaho.di.engine.api.remote.ExecutionException: None.get
2019/06/14 12:40:30 - Memory Group by.0 - at scala.None$.get(Option.scala:347)
2019/06/14 12:40:30 - Memory Group by.0 - at scala.None$.get(Option.scala:345)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.mapper.MemoryGroupByMetaDataMapper.mapPentahoSparkField(MemoryGroupByMetaDataMapper.java:175)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.mapper.MemoryGroupByMetaDataMapper.lambda$mapAggregateFields$0(MemoryGroupByMetaDataMapper.java:96)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.mapper.MemoryGroupByMetaDataMapper.mapAggregateFields(MemoryGroupByMetaDataMapper.java:104)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.mapper.MemoryGroupByMetaDataMapper.fromPdiToAelGroupByMeta(MemoryGroupByMetaDataMapper.java:70)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.MemoryGroupBySparkOperation.mapToGroupByAelMeta(MemoryGroupBySparkOperation.java:82)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.groupby.BaseGroupBySparkOperation.apply(BaseGroupBySparkOperation.java:110)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable.applyOperation(TaskObservable.java:264)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.Optional.ifPresent(Optional.java:159)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable$OperationSubscriber.lambda$addOutput$1(TaskObservable.java:355)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:778)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2140)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable$OperationSubscriber.addOutput(TaskObservable.java:354)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable$OperationSubscriber.lambda$setOutput$3(TaskObservable.java:399)
2019/06/14 12:40:30 - Memory Group by.0 - at java.lang.Iterable.forEach(Iterable.java:75)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable$OperationSubscriber.setOutput(TaskObservable.java:399)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.ops.GenericSparkOperation.apply(GenericSparkOperation.java:114)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable.applyOperation(TaskObservable.java:264)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.Optional.ifPresent(Optional.java:159)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.TaskObservable$OperationSubscriber.lambda$addOutput$1(TaskObservable.java:355)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019/06/14 12:40:30 - Memory Group by.0 - at org.pentaho.di.engine.spark.impl.execution.JobGroupDriverTask.lambda$new$0(JobGroupDriverTask.java:51)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2019/06/14 12:40:30 - Memory Group by.0 - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2019/06/14 12:40:30 - Memory Group by.0 - at java.lang.Thread.run(Thread.java:748)

Copy files (images) from one machine to another machine

Hi All,


How can we copy files (images) from one machine to another machine? Please give me an example of getting files from a remote server.
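As far as I know, the usual route in a job is the "Get a file with SFTP" job entry (or a Copy Files entry with a VFS URL such as sftp://user:password@host/path). As a plain-Java illustration of the same idea, here is a minimal JSch sketch; the host, credentials and paths are placeholders:
Code:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SftpGetExample {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "remote.host.example", 22);
        session.setPassword("secret");
        // For a quick test only; in production verify the host key instead.
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();

        // Open an SFTP channel and download one remote file to a local path.
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        sftp.get("/remote/images/photo1.jpg", "/local/images/photo1.jpg");
        sftp.disconnect();
        session.disconnect();
    }
}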


Thanks in advance,
Janu:(:(:(

Change column value if output-format is excel

Hello, I have a report where I show some photos. However, if the report is generated as Excel output, I don't want the photos there, because the size of the generated file is too big in that case.
Because of that, I want those columns to be empty if the output format is Excel (or even just "not HTML" would work). I have tried to do it through excel:formula in the Attributes section of the cell, but it did not work (I guess because that is the formula of the Excel cell itself).
I have also tried to create an Open Formula with an IF statement, but was not able to find any way to get the output format as a parameter.
Is there any way to solve this issue? How can I change the value of a cell depending on the output format? I have been trying to find it in the Pentaho documentation, but no luck so far.