Channel: Pentaho Community Forums

Filling gaps in stream from previous row on specific fields only.

Pentaho 5.3 EE

Hi All,
I am trying to figure out a solution but really struggling to resolve...

I have a stream of data originally read from an Excel spreadsheet. There are multiple spreadsheets, and I have the column metadata stored in a database, so I know for each file which fields I am expecting to receive.
I am able to validate and read the spreadsheet, which now needs converting to a flat (delimited) file.
Some columns in the spreadsheet are NULL values.
As part of the conversion process, I need to deal with NULL values.

Some of these NULL values will be converted to blank values; others, defined in the database metadata tables, will be populated using values from the previous row.

Example input stream:
row  my_ref  mycol1  mycol2  mycol3
1    0001    A       B       C
2    NULL    D       E       F
3    0005    G       NULL    H
4    0007    I       J       NULL

Example after processing:
row  my_ref  mycol1  mycol2   mycol3
1    0001    A       B        C
2    0001    D       E        F
3    0005    G       (blank)  H
4    0007    I       J        0

The my_ref column is defined to be copied forward / populated from the previous row.
So my_ref on row 2 would be set to the value from the previous row (0001).
There could be any number of rows in this state.
There could be any number of columns in a row that need to be copied forward.

NULL values in mycol1 / mycol2 / mycol3 are not defined to be copied forward and so will be converted to blank values (depending on data type).

I have found examples where all NULL values are populated from previous row values, but it's the field-specific nature of my problem that causes the complication.

My thoughts on a solution transformation:
  • For each row, look at each field and retrieve its field name (normalise each row to get field name / type / value?).
  • If the value is NULL, use the field name to look up the metadata to see if this field should be populated from the previous row.
  • If the field is to be populated from the previous row, determine the previous row's value.
  • If not, check the data type and set the value to 0 / blank string etc. depending on the data type.
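
The steps in the list above can be sketched in plain JavaScript as follows. This is only a minimal sketch of the logic, not a Kettle step configuration: `fieldMeta`, `defaultFor` and `fillGaps` are hypothetical names, rows are modelled as plain objects with `null` standing in for NULL, and the types given to mycol2 and mycol3 are assumptions chosen to match the example output.

```javascript
// Hypothetical metadata: which fields are copied forward, and each field's type.
// In the real process this would be loaded from the database metadata tables.
const fieldMeta = {
  my_ref: { fillForward: true, type: "string" },
  mycol1: { fillForward: false, type: "string" },
  mycol2: { fillForward: false, type: "string" },
  mycol3: { fillForward: false, type: "number" },
};

// Assumed defaults for NULLs that are not copied forward: 0 for numbers, "" otherwise.
function defaultFor(type) {
  return type === "number" ? 0 : "";
}

// Returns new rows with NULLs replaced; the input rows are left unmodified.
function fillGaps(rows, meta) {
  const lastSeen = {}; // last non-NULL value seen per fill-forward field
  return rows.map((row) => {
    const out = {};
    for (const [name, value] of Object.entries(row)) {
      const m = meta[name];
      if (value !== null) {
        out[name] = value;
        if (m && m.fillForward) lastSeen[name] = value;
      } else if (m && m.fillForward && name in lastSeen) {
        out[name] = lastSeen[name]; // copy forward from the previous row(s)
      } else {
        out[name] = defaultFor(m ? m.type : "string");
      }
    }
    return out;
  });
}

const input = [
  { my_ref: "0001", mycol1: "A", mycol2: "B", mycol3: "C" },
  { my_ref: null, mycol1: "D", mycol2: "E", mycol3: "F" },
  { my_ref: "0005", mycol1: "G", mycol2: null, mycol3: "H" },
  { my_ref: "0007", mycol1: "I", mycol2: "J", mycol3: null },
];

const filled = fillGaps(input, fieldMeta);
console.log(filled);
```

Because `lastSeen` is keyed by field name, any number of consecutive NULL rows and any number of fill-forward columns are handled in a single pass. A fill-forward field that is NULL on the very first row has no previous value, so this sketch falls back to the type default in that case.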


I am not sure I can use the Analytic Query step, as I do not know until run time which columns I will receive and so cannot set the step's metadata.
Also, as the key identifier (my_ref) may be NULL, it is difficult to group the records.
I don't think the Analytic Query step supports metadata injection?

I am not sure I can use a javascript step as I need to do the lookup.

I can retrieve the column metadata beforehand, so I know which fields I will receive and which should be populated from the previous row, but I need to find a way to relate it to the incoming data in order to do the comparison.

Hopefully the above makes sense...

Thanks
Jason

logicalRoleMap keeps deleted roles with the permissions they had when they were deleted

Hello,

I use the REST API to access the permissions. I have noticed that if I delete a role, its permissions aren't removed from the logical role map. If I change the permissions, they are updated in the map. So once you create a role and assign it any permission, the role stays in the role map forever: with its permissions if you delete the role while some permission is still assigned to it, or without them if you remove the permissions before deleting it.

Is it possible to modify this manually?

Thank you.

how to produce multiple files

I have a JavaScript module, see below:

var file_index_number = Math.floor( rowid/recordLimit) + 1;



The input file has 600 records and recordLimit = 100.

What is the best way to produce 6 files with file_name_file_index_number as the name and send them separately to the common plugin?
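
As a minimal sketch of the file-index arithmetic only (not of the file output itself): if rowid starts at 1, which is an assumption here, the formula above puts row 100 into file 2 and spreads 600 rows over 7 files. Subtracting 1 before dividing gives exactly 6 files of 100 rows. `fileIndexNumber` and `fileNameFor` are hypothetical helper names.

```javascript
// Assumption: rowid starts at 1. With a 0-based rowid, drop the "- 1".
function fileIndexNumber(rowid, recordLimit) {
  return Math.floor((rowid - 1) / recordLimit) + 1;
}

// Builds the name pattern from the post: file_name_<file_index_number>.
function fileNameFor(rowid, recordLimit) {
  return "file_name_" + fileIndexNumber(rowid, recordLimit);
}

// Count how many of the 600 rows land in each file.
const recordLimit = 100;
const counts = {};
for (let rowid = 1; rowid <= 600; rowid++) {
  const name = fileNameFor(rowid, recordLimit);
  counts[name] = (counts[name] || 0) + 1;
}
console.log(counts);
```

The computed name could then be added to each row as a field, so that a downstream output can take the file name from that field rather than from a fixed setting.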

thanks in advance,

Kettle Row Count in Pentaho Data Integration

Can anyone please provide an example I can use to count the number of rows in a table before and after a transformation / job?

Issue with Block this step until steps finish

Hi All,

I have a report where I am deleting from a table and then adding records to the same table. The following are the steps:
1) Delete from table
2) Block this step until steps finish (inbuilt step in Pentaho DI).
3) Insert into the same table.

The job calling this report failed in production because of a deadlock on this table. No other job was using the table, so steps 1 and 3 caused the deadlock. Can someone please explain the significance of step 2 and why it didn't work? Had it worked, the delete and insert statements would not have run simultaneously. Please suggest how to make sure that the steps don't run in parallel. I am planning to separate the steps into different jobs, but I want to know why step 2 doesn't work. Am I missing something in the configuration?

Thanks,
Megha

How can I use Kettle to connect to Hive2 with Kerberos authentication?

I want to use Kettle to work with data in Hive2, but Hive requires Kerberos authentication. What can I do to connect to Hive2? Thanks!

Modifying XML Document (Need same format output)

Hi All,

I have thousands of XML documents that I need to read, modify a key attribute value in, and output in the same format. An example would be:

<?xml version="1.0" encoding="utf-8"?>
<SampleXML version="3.0">
<AccountsListing name="First" code="1X">
<Imports />
<Interpret />
<ABCList date="20151231">
<ABC Account="1234567" bill="y" transaction="A" type="Open" name="Mike Jones"/>
<ABC Account="1234568" bill="y" transaction="N" type="Open" name="John Smith"/>
<ABC Account="1234569" bill="n" transaction="R" type="Open" name="Robert Xi"/>
</ABCList>
</AccountsListing>
</SampleXML>

Where name = "Mike Jones", change it to "Michael Jones".

Everything else would stay the same. Any suggestions?
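
One way to keep the output byte-for-byte identical apart from the changed value is a targeted text replacement instead of parsing and re-serialising the document. The following is a minimal sketch in plain JavaScript, under the assumptions that the attribute is always written with double quotes and that the names contain no XML-escaped characters. `renameAccountHolder` is a hypothetical helper name.

```javascript
// Renames one account holder inside <ABC ...> elements only, leaving every
// other character of the document (declaration, layout, attribute order) as is.
function renameAccountHolder(xml, oldName, newName) {
  // Escape regex metacharacters in the name being searched for.
  const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match an <ABC ...> tag up to and including name=", then the old name, then ".
  const pattern = new RegExp('(<ABC\\b[^>]*\\sname=")' + escaped + '(")', "g");
  // A replacer function is used so that "$" in newName is taken literally.
  return xml.replace(pattern, (match, before, after) => before + newName + after);
}

const sampleXml = [
  '<AccountsListing name="First" code="1X">',
  '<ABCList date="20151231">',
  '<ABC Account="1234567" bill="y" transaction="A" type="Open" name="Mike Jones"/>',
  '<ABC Account="1234568" bill="y" transaction="N" type="Open" name="John Smith"/>',
  '</ABCList>',
  '</AccountsListing>',
].join("\n");

const updatedXml = renameAccountHolder(sampleXml, "Mike Jones", "Michael Jones");
console.log(updatedXml);
```

The `\b` after `<ABC` keeps the pattern from matching `<ABCList ...>`, and the `name` attribute on `AccountsListing` is never touched. This is a text-level shortcut rather than real XML processing, so it would need revisiting if the files use single-quoted attributes or entity-escaped names.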

Problem with encoding and Report Web Viewer


Reports slow in BI Server

Hello,

My reports are quite slow in BI Server. For example, one of them takes about 1 minute to be available in PDF format, whereas in PRD it is ready in about 5 seconds. The queries are quite fast, but I have seen the following request in the network trace:

POST /pentaho/api/repos/%3Apublic%3ACognitio%3AReports%3Afunctions_improvement_data_basal_state_ef.prpt/report?ts=1459921222572&ACCESS=Total&CENTER_FILTER=%5BCenter%5D.%5BCenter%5D.%5BInstitut%20Guttmann%5D&CENTER=%5BCenter%5D.%5BAll%20Centers%5D&CATEGORY=%5BCategory%5D.%5BDa%C3%B1o%20Cerebral%5D&output-target=pageable%2Fpdf&accepted-page=-1&showParameters=true&renderMode=REPORT&htmlProportionalWidth=false HTTP/1.1
Host: localhost:8082
Connection: keep-alive
Content-Length: 0
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://localhost:8082
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://localhost:8082/pentaho/api/re...=1459921222572
Accept-Encoding: gzip, deflate
Accept-Language: es-ES,es;q=0.8
Cookie: JSESSIONID=60A66EBC7A1F93803C3D84D240121245; session-flushed=true; JSESSIONID=EBCFABD0EB3E3470A3DE8249724F27AA

with 1.0 min of response time.

This is a quite short report. With larger reports it takes several minutes, but only some seconds with PRD.

Sometimes I see errors from springframework in the logs saying that something failed while loading the report, but the report is generated correctly.

Can this be improved by changing or setting anything?

P.S.: I don't know if I should post this in the Reporting subforum.

Thank you.

Bar Chart label cut in pentaho report

Hi,

I am building a bar chart report using PRD 5.1. The problem is that my label gets cut off at the top of the bar. Is there any way to avoid the label being cut off? I would appreciate any ideas, help and responses.

Thank you.

Remove/Disable last line from text output

I'm doing a simple transformation: load data from a table and output it to a text file.
However, by default there is an empty line at the end of the text file output, caused by the "Add Ending line of file" option:
2016-04-06 10_07_11-Text file output.jpg

I see no way of disabling this, nor have I found a way to work around this.
Does anyone have suggestions? Any help would be appreciated!

Extract Number Field

Hi all,
I would like some help, please.
I want to extract ONLY the code number from multiple rows.
Ex: "Alian John c3434": I want just c3434.
"Dude barack david V6790": I want just V6790.
How can I do that?
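
Judging only from the two examples above, the code looks like a single letter followed by digits, standing as its own word. Under that assumption, a regular expression can pick it out. This is a minimal sketch in plain JavaScript, and `extractCode` is a hypothetical helper name.

```javascript
// Assumption: a code is one letter followed by one or more digits,
// appearing as a standalone word (for example c3434 or V6790).
function extractCode(text) {
  const match = /\b[A-Za-z]\d+\b/.exec(text);
  return match ? match[0] : null; // null when the row contains no code
}

const examples = ["Alian John c3434", "Dude barack david V6790"];
const codes = examples.map(extractCode);
console.log(codes);
```

If the real codes follow a different shape (several letters, dashes, and so on), only the pattern needs to change.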

Save As Report with Parameters

My users want to save a public report in their personal folders with specific parameters that apply to them. Is it possible to do this? I tried going through PRD's attributes and PUC attributes, but the Save As option on a report is always grayed out. Am I missing something?

Thanks.

Handling returned JSON using Modified Java Script Value

Does anyone have any examples of how to handle JSON returned from a URL using Modified Java Script Value?
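
As a starting point, here is a minimal sketch in plain JavaScript of parsing a JSON string and pulling fields out of it. It assumes the response body has already been fetched by an upstream step and is available as a string, and that the script engine provides `JSON.parse`. The payload shape (`status`, `items`, `id`, `name`) and the `parseItems` helper are hypothetical, and the modern syntax (`const`, arrow functions) may need rewriting with `var` and `function` depending on the engine version.

```javascript
// Hypothetical response body; in a transformation this string would come from
// an incoming field filled by the step that called the URL.
const responseBody =
  '{"status":"ok","items":[{"id":1,"name":"alpha"},{"id":2,"name":"beta"}]}';

// Parses the text and returns a flat list of {id, name} records.
// Returns an empty list when the text is not valid JSON or has no items array.
function parseItems(jsonText) {
  let parsed;
  try {
    parsed = JSON.parse(jsonText);
  } catch (e) {
    return [];
  }
  if (parsed === null || typeof parsed !== "object") return [];
  const items = Array.isArray(parsed.items) ? parsed.items : [];
  return items.map((item) => ({ id: item.id, name: item.name }));
}

const records = parseItems(responseBody);
console.log(records);
```

Each element of `records` corresponds to one output row, so the remaining work inside the step is assigning those values to output fields.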


Regards,
Santana, Marcos

Showing radio buttons on report in pentaho

Hi Folks!

I am facing an unusual situation with a Pentaho report. I am creating a report which emulates the behavior of an existing web form. The user inputs 2 parameters and the report pulls data from the database based on them. Some of this data goes into text boxes. The rest belongs to a set of questions, and the matching answer to each question needs to be shown as a selected radio button. This needs to be on the report itself and not up above in the bar at the top, as we are accustomed to with Pentaho. In other words, we need to show not just the selected answer from the database, but also the options that were not selected. Is there a way to do this in Pentaho? I am thinking along the lines of using HTML in the formula editor. Is this possible? Any suggestions will be helpful.

MDX with YTD

Hello.

I'm trying to make a measure using sum of another measure in this year using YTD function but I'm experiencing some strange behaviour, so I'm not sure what I'm missing here.

This is a simplified example ported to the FoodMart schema/DB.

1. This MDX gives the result I expected:
Code:

WITH
  MEMBER [Measures].[XYZ] AS SUM(YTD([Time].CurrentMember), [Measures].[Store Sales])
SELECT
  {[Product]} ON ROWS,
  {[Measures].[Store Sales], [Measures].[XYZ]} ON COLUMNS
FROM [Sales]
WHERE {[Time].[1997].[Q2].[5]}

Code:

|              | Store Sales | XYZ    |
+--------------+-------------+--------+
| All Products |      44456 | 226963 |

2. If I add more than 1 month in the WHERE clause, the new measure is null/empty.
Code:

WITH
  MEMBER [Measures].[XYZ] AS SUM(YTD([Time].CurrentMember), [Measures].[Store Sales])
SELECT
  {[Product]} ON ROWS,
  {[Measures].[Store Sales], [Measures].[XYZ]} ON COLUMNS
FROM [Sales]
WHERE {[Time].[1997].[Q2].[4], [Time].[1997].[Q2].[5]}

Code:

|              | Store Sales | XYZ |
+--------------+-------------+-----+
| All Products |      87335 |    |

Could someone shed some light on how to fix this, or on why it works this way?

I've tested this with 3.12 and 3.13-SNAPSHOT (build 684) with same results.

Thank you!

How to add fields dynamically in the "Add XML" step

Hi,
In a ktr I have used a "Table input" step and passed the query as a variable. I don't know how many columns the select statement will return if the query is (Select * from table), and I need to generate XML output from that query dynamically. So I want to fetch the fields dynamically in the "Add XML" step. I tried the "ETL Metadata Injection" step, but it does not support the "Add XML" step.
Please find attached sample.ktr.

Please share any suggestions you have.
Thanks in advance.

Significance of Garbage Collection in Pentaho Data Integration Server tuning

1. Types of Garbage collection
2. Heap Memory Allocation.
3. MaxPermSize significance in server performance
4. -Dsun.rmi.dgc.client.gcInterval and -Dsun.rmi.dgc.server.gcInterval
5. Best practices to improve server performance.

Can I use a variable in Select values

hey,

I want to know: can we use a variable for the field name in the Select values step?

please help me :)

Thanks
Rushikesh

KJB/KTR testing

I have a very complicated kjb to develop. While working out how to develop it, I am also thinking about how to test it. The best approach I can think of is to test step by step, or job entry by job entry. Is there a best practice for this in Kettle? How do I run this kind of unit testing? Thanks,