Pentaho 8.1 is available
The team has once again over-delivered on a dot release! Below are what I think are the many highlights of Pentaho 8.1, as well as a long list of additional updates.
If you don’t have time to read to the end of my very long blog, just save some time and download it now.
Go get your Enterprise Edition or trial version from the usual places
For CE, you can find it on the community home!
Cloud
One of the biggest themes of the release: increased support for Cloud. A lot of vendors are fighting to become the best provider, and what we do is try to make sure Pentaho users watch all of that comfortably sitting in their chairs, having a glass of wine, and really not caring about the outcome. Like in a lot of areas, we want to be agnostic – which is not the same as saying we’ll leverage the best of each – and really focus on logic and execution.
It’s hard to do this as a one-time effort, so we’ve been adding support as needed (and by “as needed” I really mean based on the prioritization given by the market and our customers). A big focus of this release was Google and AWS:
Google Storage (EE)
Google Cloud Storage is a RESTful, unified storage service for storing and accessing data on Google’s infrastructure. PDI now supports importing and exporting data to/from Cloud Storage through a new VFS driver (gs://). You can use it in the many steps that support VFS, as well as browse its contents.
These are the roles required on Google Storage for this to work:
● Storage Admin
● Storage Object Admin
● Storage Object Creator
● Storage Object Viewer
In terms of authentication, you’ll need the following environment variable defined:
GOOGLE_APPLICATION_CREDENTIALS="/opt/Pentaho81BigQuery.json"
From this point on, just treat it as a normal VFS source.
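To make that concrete, here’s a minimal sketch of the setup – the key path matches the example above, while the bucket and file names are made up:

export GOOGLE_APPLICATION_CREDENTIALS="/opt/Pentaho81BigQuery.json"
# Launch Spoon with the variable in place; any VFS-aware step or browse dialog
# can then reference paths such as gs://my-bucket/input/sales.csv
./spoon.sh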
Google BigQuery – JDBC Support (EE/CE)
BigQuery is Google’s serverless, highly scalable, low-cost enterprise data warehouse. Fancy name for a database, and that’s how we treat it.
In order to connect to it, we first need the appropriate drivers. The steps are pretty simple:
1. Download the free driver: https://cloud.google.com/bigquery/partners/simba-drivers/
2. Copy the google*.* files from the Simba driver to the /pentaho/design-tools/data-integration/lib folder
The Host Name will default to https://www.googleapis.com/bigquery/v2, but your mileage may vary.
Unlike the previous item, authentication does not use the environment variable defined for the Google VFS driver. Authentication here is done at the JDBC driver level, through a driver option, OAuthPvtKeyPath, set in the Database Connection options, which needs to point to the Google Storage certificate in the P12 key format.
The following Google BigQuery roles are required:
1. BigQuery Data Viewer
2. BigQuery User
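To give an idea of how the pieces fit together, here’s a hedged sketch – the project, service-account e-mail and key path are placeholders, and the URL shape follows Simba’s documented options (ProjectId, OAuthType=0 for service accounts, OAuthPvtKeyPath) rather than anything Pentaho-specific:

# Drop the Google jars from the Simba bundle into PDI's lib folder
cp google*.* /pentaho/design-tools/data-integration/lib/

# What the resulting connection roughly boils down to at the JDBC level
JDBC_URL='jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my-project;OAuthType=0;OAuthServiceAcctEmail=pdi@my-project.iam.gserviceaccount.com;OAuthPvtKeyPath=/opt/Pentaho81BigQuery.p12'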
Google BigQuery – Bulk Loader (EE)
While you can use a regular Table Output step to insert data into BigQuery, that’s going to be slow as hell (who said hell was slow? This expression makes no sense at all!). So we’ve added a step for that: Google BigQuery Loader.
This step leverages Google’s native loading abilities, and the load is processed on Google’s side, not in PDI. So the data, which has to be in Avro, JSON or CSV format, must first be copied to Google Storage. From that point on it’s pretty straightforward. Authentication is done via the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the Google JSON file.
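A rough sketch of that staging dance, with a made-up bucket and file – the copy here uses Google’s own gsutil tool, not a PDI step:

# Data must already live in Google Storage (Avro, JSON or CSV) before the loader runs
gsutil cp /tmp/sales_extract.csv gs://my-bucket/staging/sales_extract.csv

# The loader step then authenticates with the same service-account key as before
export GOOGLE_APPLICATION_CREDENTIALS="/opt/Pentaho81BigQuery.json"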
Google Drive (EE/CE)
While Google Storage will probably be seen more frequently in production scenarios, we also added support for Google Drive, a file storage and synchronization service that allows users to store files on Google’s servers, synchronize files across devices, and share files.
This is also done through a VFS driver, but given that it uses per-user authentication, a few steps are needed to leverage this support:
● Copy your Google client_secret.json file into the credentials directory (the Google Drive option will not appear as a Location until you copy the file there and restart):
o Spoon: the data-integration/plugins/pentaho-googledrive-vfs/credentials directory, then restart Spoon.
o Pentaho Server: the pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials directory, then restart the server.
● Select Google Drive as your Location. You are prompted to log in to your Google account.
● Once you have logged in, the Google Drive permission screen displays.
● Click Allow to access your Google Drive Resources.
● A new file called StoredCredential will be added to the same place where you had the client_secret.json file. This file will need to be added to the Pentaho Server credentials location, and that authentication will be used (a shell sketch of the whole sequence follows).
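In shell terms, the credentials shuffle looks roughly like this (paths as listed above, run from wherever you downloaded client_secret.json):

# Spoon – then restart Spoon
cp client_secret.json data-integration/plugins/pentaho-googledrive-vfs/credentials/

# Pentaho Server – then restart the server
cp client_secret.json pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials/

# After the first interactive login, ship the generated StoredCredential file to the
# server's credentials directory so it can reuse that authentication
cp StoredCredential pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-googledrive-vfs/credentials/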
Analytics over BigQuery (EE/CE, depending on the tool used)
The JDBC connectivity to Google BigQuery, as described above for Spoon, can also be used throughout all the other Business Analytics browser and client tools – Analyzer, CTools, PIR, PRD, the modeling tools, etc. Some care has to be taken here, though, as BigQuery’s pricing is related to two factors:
● Data stored
● Data queried
While the first one is relatively straightforward, the second one is harder to control, as you’re charged according to the total data processed in the columns selected. For instance, a ‘select *’ query should be avoided if only specific columns are needed. To be absolutely clear, this has nothing to do with Pentaho – these are Google BigQuery pricing rules.
So ultimately, and a bit like we need to do with all databases / data warehouses, we need to be smart and work around the constraints (usually speed and volume, in this case price as well) to leverage the best of what these technologies have to offer. Some examples are given here:
● By default, there is BigQuery caching and cached queries are free. For instance, if you run a report in Analyzer, clear the Mondrian cache, and then reload the report, you will not be charged (thanks to the BigQuery caching)
● Analyzer: Turn off auto refresh, i.e., design your report layout first, including calculations and filtering, without querying the database automatically after each change
● Analyzer: Drag in filters before levels to reduce data queried (i.e. filter on state = California BEFORE dragging city, year, sales, etc. onto canvas)
● Pre-aggregate data in BigQuery tables so they are smaller in size where possible (to avoid queries across all raw data)
● GBQ administrators can set query volume limits by user, project, etc. (quotas)
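A cheap habit that helps with the second factor – and this is plain Google tooling, nothing Pentaho-specific, with a made-up table name – is to dry-run a query first, since a dry run only reports the bytes it would scan without charging you:

# Full scan vs. only the columns you actually need
bq query --dry_run --use_legacy_sql=false 'SELECT * FROM `my-project.sales.transactions`'
bq query --dry_run --use_legacy_sql=false 'SELECT state, SUM(amount) FROM `my-project.sales.transactions` GROUP BY state'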
AWS S3 Security Improvements (IAM) (EE/CE)
PDI is now able to get IAM security keys from the following places (in this order):
1. Environment Variables
2. Machine’s home directory
3. EC2 instance profile
This added flexibility helps accommodate different AWS security scenarios, such as integration with S3 data via federated SSO from a local workstation, by providing secure PDI read/write access to S3 without requiring users to provide hardcoded credentials.
The IAM user secret key and access key can be stored in one place so they can be leveraged by PDI without repeated hardcoding in Spoon. These are the environment variables that point to them:
● AWS_ACCESS_KEY_ID
● AWS_SECRET_ACCESS_KEY
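For reference, these are the usual shapes the first two options take (keys masked; the file layout is the standard AWS SDK one, nothing Pentaho-specific):

# 1. Environment variables, checked first
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="****************************************"

# 2. Credentials file in the home directory (~/.aws/credentials)
# [default]
# aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
# aws_secret_access_key = ****************************************

# 3. On EC2, no configuration at all – the instance profile supplies the keys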
Big Data / Adaptive Execution Layer (AEL) Improvements
Bigger and Better (EE/CE)
AEL provides spectacular scale-out capabilities (or is it scale-up? I can’t cope with these terminologies…) by seamlessly allowing a very big transformation to leverage a clustered processing engine.
Currently we have support for Spark through the AEL layer, and throughout the latest releases we’ve been improving it in 3 distinct areas:
● Performance and resource optimizations
o Added Spark Context Reuse which, under certain circumstances, can speed up startup performance in the range of 5x, proving especially useful under development conditions
o Spark History Server integration, providing centralized administration, auditing and performance review of the transformations executed in Spark
o Ability to pass customized Spark properties down to the cluster, allowing finer-grained control of the execution process (a few typical examples follow this list)
● Increased support for native steps (e.g., leveraging the Spark-specific Group By instead of the PDI engine one)
● Added support for more cloud vendors – and we just did that for EMR 5.9 and MapR 5.2
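As promised above, a few examples of the kind of properties you would typically want to push down – these are plain, standard Spark settings, not Pentaho-specific names, and the exact mechanism for handing them to AEL is part of the daemon configuration, so treat this as an illustration of the knobs rather than the wiring (file name is made up):

# Typical Spark tuning knobs for a heavy transformation
cat <<'EOF' > spark-overrides.properties
spark.executor.memory=4g
spark.executor.cores=4
spark.dynamicAllocation.enabled=true
spark.sql.shuffle.partitions=200
EOF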
This is the current support matrix for Cloud Vendors:
Sub Transformation support (EE/CE)
This one is big, as it was the result of a big and important refactor of the Kettle engine. AEL now supports executing sub-transformations through the Transformation Executor step, a long-standing request since the times of good old PMR (Pentaho MapReduce).
Big Data formats: Added support for ORC (EE/CE)
Not directly related to AEL, but in most of the use cases where we want AEL execution we’ll need to input data in a big data specific format. In previous releases we added support for Parquet and Avro, and we now added support for ORC (Optimized Row Columnar), a format favored by Hortonworks.
Like the others, ORC will be handled natively when transformations are executed in AEL.
Worker Nodes (EE)
Jumping from scale-out to scale-up (or the opposite, like I mentioned, I never know), we continue to make lots of improvements to the Worker Nodes project. This is an extremely strategic project for us as we integrate with the larger Hitachi Vantara portfolio.
Worker nodes allow you to execute Pentaho work items, such as PDI jobs and transformations, with parallel processing and dynamic scalability with load balancing in a clustered environment. It operates easily and securely across an elastic architecture, which uses additional machine resources as they are required for processing, operating on premise or in the cloud.
It uses the Hitachi Vantara Foundry project, which leverages popular technologies under the hood such as Docker (Container Platform), Chronos (Scheduler) and Mesos/Marathon (Container Orchestration).
For 8.1 there are several other improvements:
● Improvements in monitoring, with accurate propagation of work item status
● Performance improvements by optimizing the startup times for executing the work items
● Customizations are now externalized from the Docker build process
● Job clean up functionality
Streaming
In Pentaho 8.0 we introduced a new paradigm to handle streaming datasources. The fact that it’s a permanently running transformation required a different approach: the new streaming steps define the windowing mode and point to a sub-transformation that is then executed in a micro-batch approach.
That works not only for ETL within the Kettle engine but also in AEL, enabling Spark transformations to feed from Kafka sources.
New Streaming Datasources: MQTT and JMS (ActiveMQ / IBM MQ) (EE/CE)
Leveraging the new streaming approach, there are two new steps available – well, one new and one (two, actually) refreshed.
The new one is MQTT – Message Queuing Telemetry Transport – an ISO-standard publish-subscribe messaging protocol that works on top of TCP/IP. It is designed for connections with remote locations where a “small code footprint” is required or the network bandwidth is limited. Alternative IoT-centric protocols include AMQP, STOMP, XMPP, DDS, OPC UA and WAMP.
There are two new steps – MQTT Input and MQTT Output – that connect to the broker for consuming messages and publishing back the results.
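If you want something to point these steps at while experimenting, any broker works; here’s a quick hedged sketch using the Mosquitto command-line clients against a local broker, with made-up topics and payload:

# Publish a test message to the topic the MQTT Input step subscribes to
mosquitto_pub -h localhost -t sensors/temperature -m '{"deviceId":"dev-01","value":21.5}'

# Watch the topic the MQTT Output step publishes the results back to
mosquitto_sub -h localhost -t sensors/alerts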
Besides this new, IoT-centered streaming source, there are two new steps, JMS Input and JMS Output. These steps replace the old JMS Consumer/Producer and the IBM WebSphere MQ steps, supporting, in the new mode, the following message queue platforms:
● ActiveMQ
● IBM MQ
Safe Stop (EE/CE)
This new paradigm for handling streaming sources introduced a challenge we never had to face before. Usually, when we triggered jobs and transformations, they had a well-defined start and end; our stop functionality was used when we wanted to basically kill a running process because something was not going well.
However, on these streaming use cases, a transformation may never finish. So stopping a transformation the way we’ve always done – by stopping all steps at the same time – could have unwanted results.
So we implemented a different approach – a new option to safe-stop a transformation, available in Spoon, Carte and the Abort step, that instead of killing all the step threads stops the input steps and lets the other steps gracefully finish processing, so no records currently being processed are lost.
This is especially useful in real-time scenarios (for example reading from a message bus). It’s one of those things that when we look back seems pretty dumb that it wasn’t there from the start. It actually makes a lot of sense, so we went ahead and made this the default behavior.
Streaming results (EE/CE)
When we launched streaming in Pentaho 8.0 we focused on the processing piece. We could launch the sub-transformation but we could not get results back. Now we have the ability to define which step of the sub-transformation will send the results back to follow the rest of the flow.
Why is this important? Because of what comes next…
Streaming Dataservices (EE/CE)
There’s a new option to run a data service in streaming mode. This will allow consumers (in this case CTools dashboards) to get streaming data from the data service.
Once defined, we can test these options within the test dataservices page and see the results as they come.
This screen exposes the functionality as it would be called from a client. It’s important to know that the windows we define here are not the same as the ones we defined for the micro-batching service. The window properties are the following:
● Window Size – The number of rows that a window will have (row based), or the time frame that we want to capture new rows to a window (time based).
● Every - Number of rows (row based), or milliseconds (time based) that should elapse before creating a new window.
● Limit – Maximum number of milliseconds (row based) or rows (time based) to wait for a new window to be generated.
CTools and Streaming Visualizations (EE/CE)
We took a holistic approach to this feature. We want to make sure we can have a real-time / streaming dashboard leveraging what was set up before. And this is where the CTools come in. There’s a new datasource in CDE available to connect to streaming data services:
Then the configuration of the component selects the kind of query we want – time or number of records based, window size, frequency and limit. This gives us good control for a lot of use cases.
This will then allow us to connect the datasource to a component the usual way. While this will probably be more relevant for components like tables and charts, ultimately all of them will work.
It is possible to achieve a level of multi-tenancy by passing a user name parameter from the PUC session (via CDE) to the transformation as a data services push-down parameter. This enables restricting the data viewed on a user-by-user basis.
One important note is that the CTools streaming visualizations do not yet operate on a ‘push’ paradigm – that is on the current roadmap. In 8.1, the visualizations poll the streaming data service at a constant interval, which has a lower refresh limit of 1 second. But then again… if you’re doing a dashboard of this type and need a refresh of 1 second, you’re definitely doing something wrong…
Time Series Visualizations (EE/CE)
One of the biggest use cases for streaming, from a visualization perspective, is time series. We improved CCC’s support for time series line charts, so now data trends over time will be shown without needing workarounds.
This applies not only to CTools but also to Analyzer.
Data Exploration Tool Updates (EE)
We’re keeping on our path of improving the Data Exploration Tool. It’s no secret that we want to make it feature-complete so that it can become the standard data analysis tool for the entire portfolio.
This time we worked on adding filters to the Stream view.
We’ll keep improving this. Next on the queue, hopefully, will be filters on the model view and date filters!
Additional Updates
As usual, there were several additional updates that did not make it into my highlights above. So for the sake of your time and to avoid creating a 100-page blog – here are even more updates in Pentaho 8.1.
Additional updates:
● Salesforce connector API update (API version 41)
● Splunk connection updated to version 7
● Mongo version updated to 3.6.3 driver (supporting 3.4 and 3.6)
● Cassandra version updated to support version 3.1 and Datastax 5.1
● PDI repository browser performance updates, including lazy loading
● Improvements on the Text and Hadoop file outputs, including limit and control file handling
● Improved logging by removing auto-refresh from the kettle logging servlet
● Admins can empty the trash folder of other users in PUC
● Clear button in the PDI step search in Spoon
● Override JDBC driver class and URL for a connection
● Suppressed the Pentaho ‘session expired’ pop-up on SSO scenarios, redirecting to the proper login page
● Included the possibility to schedule generation of reports with a timestamp to avoid overwriting content
In summary (and wearing my marketing hat) with Pentaho 8.1 you can:
● Deploy in hybrid and multi-cloud environments with comprehensive support for Google Cloud Platform, Microsoft Azure and AWS for both data integration and analytics
● Connect, process and visualize streaming data from MQTT, JMS, and IBM MQ message queues, and gain insights from time series visualizations
● Get better platform performance and increase user productivity with improved logging, additional lineage information, and faster repository access
Download it
Go get your Enterprise Edition or trial version from the usual places
For CE, you can find it on the community home!
Pedro