Channel: Pentaho Community Forums

Parse a CSV file with commas inside quoted fields

Hi,

I have a CSV file with a comma separator. The problem is that a few rows contain commas inside double quotes, i.e. commas within a field, and Pentaho parses those commas as field separators as well.

Let the following be the sample CSV file; the first line is the header:

name, title, age, degree, college
john,fork,23,btech,abcd
rita,saha,21,ba,dvdf
reme,singha,34,"btech,ba,msc",abcd
seema,tupi,56,"b.ed,phD,mphil,ma-music","ju,cu"

Any suggestions on how to parse the above file?

Thanks in advance for your support.
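
For what it's worth, the CSV file input and Text file input steps both have an Enclosure setting (typically the double quote) that keeps commas inside quoted fields together, which should handle the quoted rows above natively. If a manual fallback is ever needed, here is a minimal sketch for a Modified Java Script Value step, assuming the raw row arrives in a single field named line (a hypothetical name):

Code:

// Split one CSV row on commas only when outside double quotes.
var s = "" + line;                   // coerce the field to a JS string
var fields = [];
var current = "";
var inQuotes = false;
for (var i = 0; i < s.length; i++) {
    var c = s.charAt(i);
    if (c == '"') {
        inQuotes = !inQuotes;        // toggle enclosure state
    } else if (c == ',' && !inQuotes) {
        fields.push(current);        // comma outside quotes ends a field
        current = "";
    } else {
        current += c;                // anything else is field content
    }
}
fields.push(current);                // the last field
var degree = fields[3];              // e.g. "btech,ba,msc" stays intact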

Pentaho Ctools Release 16.08.18

Release Notes - Community Dashboard Editor - Version 6.1-16.08.18

Bug

  • [CDE-778] - Error in components when resultSet is empty and the 'Column Type' attribute is defined.
  • [CDE-802] - Export Button gets wrong results when tables use input html tags other than filter's one
  • [CDE-808] - Incorrect text in Map Component Reference
  • [CDE-822] - CGG Dial Component replicates when parameters change
  • [CDE-825] - DashboardComponent: parameter propagation should take into account all the mapped parameters
  • [CDE-831] - Sample View Manager doesn't open (only Legacy)
  • [CDE-837] - FilterComponent - html injection
  • [CDE-858] - Openlayers map not cleaning selection when some features are loaded selected
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.

Improvement

  • [CDE-824] - DashboardComponent: Make parameter propagation happen both ways
  • [CDE-826] - DashboardComponent: Expose a way to turn on and off the parameter propagation
  • [CDE-840] - Expose an option in the dashboard editor to modify the failureCallback property of the tablecomponent

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data

For 5.x:
Release Notes - Community Dashboard Editor - Version 16.08.18

Bug

  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDE-858] - Openlayers map not cleaning selection when some features are loaded selected




Release Notes - Community Dashboard Framework - Version 6.1-16.08.18

Bug

  • [CDF-449] - CCC - On timeseries charts, the zeroline of the base axis shows up on 1970
  • [CDF-452] - CCC - Treemap - specifying the colorMap option throws an error
  • [CDF-809] - Samples under plugin-samples > CDF have some issues
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDF-865] - Dashboard require - Radio Button component default type "checkbox"
  • [CDF-871] - PrptComponent sample executes the components in an arbitrary order
  • [CDF-872] - Missing dependency in the CDF AMD broadcast sample
  • [CDF-875] - On the Filter Component with the sortByLabel enabled, and a page length defined: scrolling down to fetch more items creates a loop until the last selected item is found
  • [CDF-881] - TableComponent with paginateServerSide set to true, will trigger 2 queries
  • [CDF-888] - Table Component cannot be updated if paginate Server side is true
  • [CDF-895] - CCC - Stacked area chart has an incorrect behaviour when one of the series has null and not null values.
  • [CDF-896] - CDF Storage: any user is able to change the storage of another user
  • [CDF-912] - CCC - Axis tick label overflows - layout fails to take axis offset into account
  • [CDF-913] - CCC - Axis tick label overflows - ignores fixed or maximum axis sizes
  • [CDF-917] - CCC - Axis tick label overflows - fails on fixed categorical bands layout
  • [CDF-918] - CCC - Metric/Scatter chart - cannot set axis offset to 0
  • [CDF-919] - CCC - Axis tick label overflows - fails when OverlappedLabelsMode is "hide"
  • [CDE-778] - Error in components when resultSet is empty and the 'Column Type' attribute is defined.
  • [CDE-837] - FilterComponent - html injection

Improvement

  • [CDF-670] - As a dashboard developer using a TableComponent, I would like to be able to provide a friendly error message when a query fails

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data

Story

  • [CDF-713] - As a user, I'd like to be able to easily change the datasource used by a component in the preExec function.

For 5.x:
Release Notes - Community Dashboard Framework - Version 16.08.18

Bug

  • [CDF-449] - CCC - On timeseries charts, the zeroline of the base axis shows up on 1970
  • [CDF-452] - CCC - Treemap - specifying the colorMap option throws an error
  • [CDF-826] - BlockUI does not appear when a certain dashboard is embedding another one using dashboard component.
  • [CDF-865] - Dashboard require - Radio Button component default type "checkbox"
  • [CDF-871] - PrptComponent sample executes the components in an arbitrary order
  • [CDF-872] - Missing dependency in the CDF AMD broadcast sample
  • [CDF-895] - CCC - Stacked area chart has an incorrect behaviour when one of the series has null and not null values.
  • [CDF-912] - CCC - Axis tick label overflows - layout fails to take axis offset into account
  • [CDF-913] - CCC - Axis tick label overflows - ignores fixed or maximum axis sizes
  • [CDF-917] - CCC - Axis tick label overflows - fails on fixed categorical bands layout
  • [CDF-918] - CCC - Metric/Scatter chart - cannot set axis offset to 0
  • [CDF-919] - CCC - Axis tick label overflows - fails when OverlappedLabelsMode is "hide"

New Feature

  • [CDF-603] - CCC - Realtime - Sliding Window to cope with constantly incoming data



Release Notes - Community Data Access - Version 6.1-16.08.18

Bug

  • [CDA-183] - CDA File Editor is not working correctly
  • [CDA-188] - Using "security:principalRoles" as cache key creates a new entry on the cache every time the query is run

Release Notes - Community Graphics Generator - Version 6.1-16.08.18

Improvement

  • Upgraded to the latest CCC release









Converting String to Date

I'm not able to convert a string to a date.

My string has this format:

yyyy-MM-dd-HH.mm.ss.SSSSSS

I've used the Select Values step, setting Type = Date and the format yyyy-MM-dd-HH.mm.ss.SSSSSS, but I get this error:

Quote:

2016/09/21 15:08:13 - Select values 5.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : date_string String(26) : couldn't convert string [2016-09-20-15.40.11.603173] to a date using format [yyyy-MM-dd-HH.mm.ss.SSSSSS]
2016/09/21 15:08:13 - Select values 5.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : Unparseable date: "2016-09-20-15.40.11.603173"
How can I fix it?

Thank you very much!
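
A hedged note on the likely cause: PDI's date masks go through Java's SimpleDateFormat, which resolves fractions of a second only down to milliseconds, so a six-digit fraction such as .603173 cannot be parsed as written. One workaround is to trim the fraction to three digits before converting, e.g. in a Modified Java Script Value step (date_string is the field name from the error message):

Code:

// "yyyy-MM-dd-HH.mm.ss." plus three fraction digits is 23 characters.
var trimmed = ("" + date_string).substring(0, 23);
var fmt = new java.text.SimpleDateFormat("yyyy-MM-dd-HH.mm.ss.SSS");
var parsed_date = fmt.parse(trimmed);   // a java.util.Date for a new field

The truncation discards the microseconds; if those matter downstream, keep the original string in a separate field.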

Slowness issue when opening job or transformation

Hi Sir/Madam,

Maybe my question is silly, but I am currently facing a slowness issue.

Previously I had a limited number of assets, but my file repository is growing day by day and is currently 34 MB. Opening jobs or transformations has become very slow.

Could you please help me? Is there any way to resolve this slowness?

PDI CE 6.1, MySQL, Windows OS, Java 1.7

Thank you

Dynamic report generation from a pie chart

Hi,
My requirement is to display a pie chart and, on click of a pie slice, display the drill-down details for that slice in a tabular structure.
Initially there won't be any tabular data; the details should be displayed only when the user clicks a pie slice.

1) Do I need to use sub-reports?
2) How do I pass the value of the clicked pie slice to the second dataset to retrieve the corresponding details?

Can anyone please help?

Thanks,
Priya.

Remove both duplicate rows

Hello guys.

I have an Excel file with sales records. Some records were sold and then cancelled in another row, so I don't need either of the two rows.

I sorted the Excel data and used the Analytic Query step:

analytic_query.jpg

At the moment this solves my problem, but is there a more conventional way or a best practice for doing this?

Thanks.

java.lang.OutOfMemoryError: Java heap space

Hi everyone, my question is:
I have a Java web application running in a Tomcat server, and the application calls a Kettle job. This job contains one transformation with 12 Table Input steps that write their query results to 12 txt files (1 MB to 25 MB each).
Before the first execution the memory consumption is 720 MB; after it the memory grows to 1.3 GB, then to 2.4 GB on the third run, and so on until an exception is thrown.
I've tried splitting the main job into two jobs, each with one transformation handling half of the input tables, and I've added blocking steps to serialize execution, but nothing has worked.
Now I am analyzing the application with jvisualvm.
What am I doing wrong?

Kettle Version: 4.2.0
Tomcat 7
Java 7

Thanks in advance.

Issue while loading a 60 MB CSV or Excel file into SQL Server - OutOfMemory exception

Hello, I am completely new to this and am trying to understand the limitations of this tool.

In my first run I was able to load a 100k-row CSV (7 MB) into my local SQL Server, but when I tried to load a 500k-row CSV (or Excel) file (60 MB), Pentaho timed out, as it was not able to read such a big list.

My question is: is there a published limit on the number of records Pentaho Data Integration can handle? If there is, please let me know.

Thanks,
Nimesh

MS SQL step not returning any results

Hi,

I'm having trouble getting successful results from a SQL query step in my transformation.
Basically, what I'm trying to do is:

1. retrieve a set of IDs from Table X

2. combine the results into a single row (comma-delimited, with single quotes around each ID)
3. use that list as the input to a SQL query on Table P

I've got it all working and running with no errors at all... but the issue is that no results are returned by the final step (the SQL query).

My database connections are good.
I've hard-coded my list of IDs into a separate step, so I know the query is good.

Here is what my transformation looks like:
Screen Shot 2016-09-21 at 4.23.07 PM.jpg


Here's what the input to my final SQL step (toward Table P) looks like:
Screen Shot 2016-09-21 at 4.45.01 PM.png

Here's what my SQL query looks like (the input list feeds the ID argument):
select *
from table_a (nolock)
where source = '${source}'
and ID in (?)



PDI version 6.1 Community
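
A hedged explanation of the empty result: a ? parameter is bound by JDBC as one single value, so ID in (?) compares ID against the whole string 'id1','id2','id3' rather than a list, and matches nothing. A common workaround is to put the quoted list into a variable in a previous transformation of the same job and substitute it as text:

Code:

// Sketch for a Modified Java Script Value step in an earlier
// transformation. "id_list" is a hypothetical field that already
// holds the combined string 'id1','id2','id3'.
setVariable("ID_LIST", id_list, "r");   // "r" = root job scope

The final query then reads and ID in (${ID_LIST}) with "Replace variables in script" checked, so the list is expanded into the SQL text before the statement is prepared.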

Non-aggregated (or semi-aggregated?) measure for "Targets"

Hi all, I'm hoping someone can answer this question for me. I have seen similar questions asked all over the web, some of which get solutions, but nothing I have seen directly helps my business problem.

We have a cube which uses a "Department" dimension with 9 levels, and a standard Date dimension. Measures include "Incident Count" which is a distinct-count aggregator.
A good example of a typical output would be a line graph visualization which shows over a 12 month period, for a given Level 3 Department, how many incidents are occurring month-by-month. This is all working happily.

We want to introduce the concept of "Incident Targets" - so that for each Department we can capture how many Incidents we would like to have, along with the already existing data we have about how many Incidents actually occurred. This would allow us to plot the actuals versus the targets on a line graph etc.
Targets would typically be captured and reported on per month.

The trick is that the targets should not be aggregated on the Department dimension.
If Department A is the parent of Departments X, Y and Z... and Departments X, Y and Z all have targets of 10... it is not guaranteed that the target of Department A is 30. It just doesn't work like that on the business side - different KPIs and management philosophies at different levels of the business etc.

I guess that aggregating over time does make sense... a monthly target of 10 over a year would give an annual target of 120.

So assuming my data captures a Target for every Department/Month combination... is there a way to define the Target measure (or dimension?) or refer to it in MDX so that it will not get aggregated - it will just return the single value stored... while any other measures in the query continue to be aggregated as normal? Ideally I am looking for a solution that would work in Analysis Reports - where the user can run the reports at any Department level. However, I'll take ANY solution!!

Any ideas?

Why doesn't my job process my txt file in batches?

I have a transformation with the following sequence:

Txt File -> Java Validation (User Defined Java Class) -> Tbl Output

My txt file has approximately one million rows. I thought Pentaho would process the file in batches (loading it part by part), but when I look at the log file I find this:

Code:

2016/09/21 23:09:54 - TXT_FILE_ENTRADA.0 - Opening file: file:///opt/tools/PENTAHO_JOBS/Myfile.txt
2016/09/21 23:09:58 - TXT_FILE_ENTRADA.0 - linenr 50000
2016/09/21 23:10:01 - TXT_FILE_ENTRADA.0 - linenr 100000
2016/09/21 23:10:04 - TXT_FILE_ENTRADA.0 - linenr 150000
2016/09/21 23:10:07 - TXT_FILE_ENTRADA.0 - linenr 200000
2016/09/21 23:10:10 - TXT_FILE_ENTRADA.0 - linenr 250000
2016/09/21 23:10:13 - TXT_FILE_ENTRADA.0 - linenr 300000
2016/09/21 23:10:16 - TXT_FILE_ENTRADA.0 - linenr 350000
2016/09/21 23:10:19 - TXT_FILE_ENTRADA.0 - linenr 400000
2016/09/21 23:10:23 - TXT_FILE_ENTRADA.0 - linenr 450000
2016/09/21 23:10:26 - TXT_FILE_ENTRADA.0 - linenr 500000
2016/09/21 23:10:29 - TXT_FILE_ENTRADA.0 - linenr 550000
2016/09/21 23:10:32 - TXT_FILE_ENTRADA.0 - linenr 600000
2016/09/21 23:10:35 - TXT_FILE_ENTRADA.0 - linenr 650000
2016/09/21 23:10:46 - TXT_FILE_ENTRADA.0 - linenr 700000
2016/09/21 23:10:55 - TXT_FILE_ENTRADA.0 - linenr 750000
2016/09/21 23:11:09 - TXT_FILE_ENTRADA.0 - linenr 800000
2016/09/21 23:11:24 - TXT_FILE_ENTRADA.0 - linenr 850000
2016/09/21 23:11:28 - TXT_FILE_ENTRADA.0 - linenr 900000
2016/09/21 23:11:31 - TXT_FILE_ENTRADA.0 - linenr 950000
2016/09/21 23:11:34 - TXT_FILE_ENTRADA.0 - linenr 1000000
2016/09/21 23:11:37 - TXT_FILE_ENTRADA.0 - linenr 1050000
2016/09/21 23:11:41 - TXT_FILE_ENTRADA.0 - linenr 1100000
2016/09/21 23:11:48 - TXT_FILE_ENTRADA.0 - linenr 1150000
2016/09/21 23:11:55 - TXT_FILE_ENTRADA.0 - Finished processing (I=1192940, O=0, R=0, W=1192940, U=1, E=0)
2016/09/21 23:11:57 - User Defined Java Class 2.0 - Finished processing (I=0, O=0, R=1192940, W=1192940, U=0, E=0)

It looks like Pentaho loads all the rows of my file before passing them to the Java step. Correct me if I'm wrong.

If I'm right: how can Kettle process my file in batches? (First load 50,000 rows, insert them into the DB, then continue with the next 50,000 rows.)

Thanks a lot

Page break on the Details section

Hi... I need to create a page break when the Details section has more than 15 rows, so that row 16 appears on page 2.

I tried pagebreak-after on the Details section's style with this formula:

=IF([itemCountRunning]>15;true;false) // i.e. when itemCountRunning is greater than 15, pagebreak is true, otherwise false

but it doesn't work; row 16 is still on the first page.
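
A hedged observation: [itemCountRunning]>15 stays true on every row after the 15th, not just once. If the goal is a break after every block of 15 detail rows, a formula along these lines may behave better (assuming the OpenFormula MOD function, which Pentaho Reporting is expected to provide):

Code:

=MOD([itemCountRunning];15)=0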

Join of two tables from two different DBs

Hello,
I have two tables, Table A and Table B, that I want to join,
where the date in Table A should be between Date 1 and Date 2 of Table B,
with an additional condition that a field of Table A = "s", for example.
The two tables are in two different databases.
How should I do this, and what steps should I follow in Kettle?
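
For what it's worth, one common pattern: read each table with its own Table Input step (one step per database connection), then combine the streams with a Join Rows (cartesian product) step, whose condition box accepts non-equi comparisons such as a date range. The same condition, sketched in a Modified Java Script Value step with hypothetical field names, in case filtering after the join is preferred:

Code:

// date_a and field_a come from Table A; date1_b and date2_b from
// Table B. Output "keep" as a Boolean and follow with a Filter rows step.
var keep = date_a.getTime() >= date1_b.getTime()
        && date_a.getTime() <= date2_b.getTime()
        && ("" + field_a) == "s";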

Getting multiple mails from the Mail step?

Hi ,

I designed a transformation like the one below. In the image, I create variables for the mail configuration and use them in the Mail step. I then use this transformation in a job, and after executing the job I get multiple mails in my inbox. Please help me with this. :(


transformation image.PNG

Null value in formula

Hi... I have a problem.

I have a Boolean field in my database. This field can hold TRUE, FALSE, or NULL.

In my formula I wrote this:

=IF([includeppn]= TRUE();([totalQtyXprice]-[totalDariDiskonPerRow])*(100/110);IF([includeppn]=NULL();[totalQtyXprice]-[totalDariDiskonPerRow];[totalQtyXprice]-[totalDariDiskonPerRow]))

It looks like the Boolean type cannot be compared to NULL this way.
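
A hedged suggestion: comparing a field against NULL() with = rarely behaves, since NULL is the absence of a value. If your Pentaho Reporting version ships OpenFormula's ISBLANK function (an assumption worth verifying in the formula editor), a test for the missing case could look like this:

Code:

=IF(ISBLANK([includeppn]);[totalQtyXprice]-[totalDariDiskonPerRow];IF([includeppn]=TRUE();([totalQtyXprice]-[totalDariDiskonPerRow])*(100/110);[totalQtyXprice]-[totalDariDiskonPerRow]))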



Unable to set manager value if manager is null

Hi,

I am using Kettle - Spoon release 5.04.0.1-130.

I am unable to set the manager value when the manager is null.
Below is the JavaScript code:

var ManagerID;
if (ManagerID == "") {
    ManagerID = 'XXX';
}

This did not work, so I tried:

var ManagerID;
if (ManagerID.isNull()) {
    ManagerID = 'XXX';
}

This also did not work.

Please provide the solution for this.
Thanks in advance.
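
For what it's worth: in the Modified Java Script Value step a missing value normally arrives as a JavaScript null, which neither == "" nor a method call on the null reference will handle, and an assignment only reaches the stream if the field appears in the step's output grid. A sketch, with manager_id_out as a hypothetical output field:

Code:

// A null field is not equal to "", so test for null explicitly.
// Add "manager_id_out" in the Fields grid at the bottom of the step
// (or tick "Replace value" to overwrite ManagerID itself).
var manager_id_out = ManagerID;
if (ManagerID == null || ManagerID == "") {
    manager_id_out = 'XXX';
}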

Loop XPath is empty!

I made a transformation with only one "Get XML Data" step, configured as follows:
- read the XML from a file;
- LoopXPath set to /test;
- a set of fields defined.

Run from Spoon, this transformation works: no errors, no warnings, only the expected result.
However, when I try to run the same ktr from a Java application I get the following error:

ERROR (version 6.1.0.1-196) : Loop XPath is empty!

But that is not true! I have already verified that the <loopxpath> tag in the ktr file has /test as its value.

Any advice?
Thanks in advance.

Character substitution

I'm trying to transform a string and so far have had no luck.

I have a string that is divided by a special character, and I'd like to replace that special character with a hyphen. For example, given "aýbýcýd", I'd like to get back "a-b-c-d".

I tried using regular expressions and "Replace in string", but the number of occurrences of the special character varies. I also tried "Execute a process" with a Python script to do the substitutions, but there are over 30,000 rows in my file and it takes too long to execute.

Anyone have any ideas? Many thanks!
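
Two hedged options: the Replace in string step with its RegEx column set to Y should replace every match, not just the first; and the same one-liner in a Modified Java Script Value step avoids spawning an external process per row. Here source_field is a hypothetical input field name:

Code:

// A global regex swaps every occurrence of the special character.
var cleaned = ("" + source_field).replace(/ý/g, "-");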

Passing Parameters into Job Executor

Hi all,
I'm back to having some issues passing parameters between my job and transformations and could use some help.

It's happening in my clean_and_execute_run_time_schema_list transformation (TR03):
  • I want to pass two parameters from my executed query (name, id) to a job executor.
  • I configure the job executor to take my two fields from above and map them to two parameter names it is asking for (schema_name, schema_id)
    • I've tried both checking and unchecking 'inherit all variables from the transformation', by the way, with no difference.

  • Inside of that job (select_run_time_schema_job) I specify the same two parameters (schema_name, schema_id) in the job properties
  • I then try to send those two parameters to a transformation (TR04), select_run_time_table_read
    • I map the parameters here to the local names I want to use in the transformation:
      • schema_id -> run_time_id
      • schema_name -> run_time_schema

    • I've checked and unchecked the transformation properties 'copy previous results to parameters' and 'execute for every row' (though there should only be one row). These didn't help.

  • The first step in this transformation is a query which uses the parameters identified.



When I set a default value in that lowest transformation (TR04) it works fine, but not when I try to feed the value in from the job executing it.
I can't figure out why though.

I've attached my jobs and transformations. hcp_anywhere_report_queries_main_job is the topmost level.

Use of Semicolons

We are using (don't laugh) version 3.2.0 of the Community Edition of PDI. One of its drawbacks is that a semicolon in a SQL step of a transformation is seen as a batch terminator (similar to using GO in a sqlcmd script or in SSMS), and all variable declarations are lost. As a result, if a semicolon is required in the code (e.g. at the end of a MERGE statement), all of the code must be built as a string and executed as dynamic SQL. If you've ever worked with dynamic SQL you know how messy that can get. Does a newer version of PDI eliminate this 'limitation' regarding the use of semicolons? I am looking to build my repertoire of reasons to upgrade to a later version (I've got quite a few so far).

I would also like to know whether anyone is aware of considerable improvements in the JDBC and other drivers for connecting to SQL Server between version 3.2.0 and the latest. I would think ETL processing times would be greatly enhanced after eight years.

TIA

JLP