Channel: Pentaho Community Forums

Check date in name of txt file against system date before starting transformation

Hi to all,
I have a question. My transformation works fine, but now I must modify it: I have a file named with a date, and I must check whether that date is the same as the system date. If it is, the transformation can execute; if the check fails, the transformation must not start. (See the sketch below.)
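(For reference, a minimal wrapper sketch: a shell script that compares the date embedded in the file name with the system date and only starts the transformation when they match. The file name pattern data_YYYYMMDD.txt and all paths here are assumptions.)

Code:

#!/bin/bash
# Run the transformation only if the file's date matches today.
# Assumes a name like data_20160310.txt; adjust pattern and paths.
FILE=$(ls /data/in/data_*.txt 2>/dev/null | head -n 1)
[ -z "$FILE" ] && { echo "No input file found."; exit 1; }
TODAY=$(date +%Y%m%d)
FILE_DATE=$(basename "$FILE" | sed 's/^data_\([0-9]\{8\}\)\.txt$/\1/')

if [ "$FILE_DATE" = "$TODAY" ]; then
    /opt/pdi/data-integration/pan.sh -file=/opt/etl/my_transformation.ktr
else
    echo "File date '$FILE_DATE' does not match system date $TODAY - not starting."
    exit 1
fi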
Can you help me?
Thanks a lot


Giorgio

Excel input that has rows representing a group

Hi,

I have to read an Excel file (a questionnaire) which contains rows that represent a category (see the example below). Category rows can be distinguished by the empty value in the '#' column. Which step should I use, and how, to get the category onto each question-answer row? (A sketch of the fill-down logic follows the example.)

#   Question      Answer
    Category 1
1   Question 1    Answer 1
2   Question 2    Answer 2
    Category 2
3   Question 3    Answer 3
4   Question 4    Answer 4
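(In PDI this is usually solved by carrying the last seen category forward, e.g. with a Modified Java Script Value step that remembers the previous category. The fill-down logic itself, sketched over a tab-separated export of the sheet — the file name, separator and column layout are assumptions:)

Code:

# Fill the category down onto each question row. Rows with an empty
# '#' field are category headers: remember them and emit nothing;
# data rows get the remembered category prepended.
awk -F'\t' 'BEGIN { OFS = "\t" }
$1 == "" { category = $2; next }
{ print category, $1, $2, $3 }' questionnaire.tsv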

Table join - concatenate multiple rows from table B into one cell / variable

Hi all

I have the following situation:

- a DB Lookup which gives me back 100 different car_ids from table car
- a DB left join on car_colors (which contains the car_id) to retrieve information about the color (color ID, color name); each car can have n entries in car_colors

I would need an Excel output which states

Car 1 Color ID: 1 - Color Name blue
Color ID: 2 - Color Name yellow
Color ID: 3 - Color Name pink
Car 2 Color ID: 4 - Color Name green
Color ID: 5 - Color Name yellow
... until Car 100

Can you tell me how I can combine the different columns and values from table car_color into one field / variable for each row of table car? (See the sketch below.)
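(For reference: inside PDI the usual answer is a Group By step with the aggregation type "Concatenate strings separated by". The same result can also be sketched at the database level with MySQL's GROUP_CONCAT; the dialect, database name and exact table/column names below are assumptions taken from the post.)

Code:

# Sketch (MySQL dialect): collapse all colors of a car into one field.
# GROUP_CONCAT is MySQL-specific; other databases offer e.g. STRING_AGG.
mysql mydb -e "
  SELECT c.car_id,
         GROUP_CONCAT(CONCAT('Color ID: ', cc.color_id, ' - Color Name ', cc.color_name)
                      SEPARATOR '; ') AS colors
  FROM car c
  LEFT JOIN car_colors cc ON cc.car_id = c.car_id
  GROUP BY c.car_id;"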

Another option I would be interested in is looping through the entries from table car_color via VB script, but I don't know how I can set that up in Spoon either.

Your help is very much appreciated!!

Best regards
Michael

DataCleaner plug-in for Kettle 6 will not start

I installed DataCleaner v4.5.2 (stable) into Kettle v6.0.1 CE via the Marketplace.

When I tried to use DataCleaner from within Spoon, it complained of "No configuration file". So I created a file "configuration.txt" within the DataCleaner plugin directory, containing the path to a local copy of DataCleaner Community Edition 4.5 that I downloaded from:
This fixed the "No configuration file" error, but now I get a different error when trying to use DataCleaner from within Spoon:
Unexpected error!
Message:
java.lang.NoSuchMethodError: org.datacleaner.connection.DatastoreConnectionImpl.<init>(Lorg/apache/metamodel/DataContext;Lorg/datacleaner/connection/Datastore;[Ljava/lang/AutoCloseable;)V
Level:
SEVERE
Stack Trace:
org.datacleaner.connection.DatastoreConnectionImpl.<init>(Lorg/apache/metamodel/DataContext;Lorg/datacleaner/connection/Datastore;[Ljava/lang/AutoCloseable;)V
org.pentaho.di.profiling.datacleaner.KettleDatastore.createDatastoreConnection(KettleDatastore.java:52)
org.datacleaner.connection.UsageAwareDatastore.getDatastoreConnection(UsageAwareDatastore.java:117)
org.datacleaner.connection.UsageAwareDatastore.openConnection(UsageAwareDatastore.java:128)
org.datacleaner.job.JaxbJobReader.create(JaxbJobReader.java:430)
org.datacleaner.job.JaxbJobReader.create(JaxbJobReader.java:403)
org.datacleaner.job.JaxbJobReader.create(JaxbJobReader.java:334)
org.datacleaner.job.JaxbJobReader.create(JaxbJobReader.java:313)
org.datacleaner.actions.OpenAnalysisJobActionListener.openAnalysisJob(OpenAnalysisJobActionListener.java:188)
org.datacleaner.actions.OpenAnalysisJobActionListener.open(OpenAnalysisJobActionListener.java:115)
org.datacleaner.bootstrap.Bootstrap.runInternal(Bootstrap.java:207)
org.datacleaner.bootstrap.Bootstrap.run(Bootstrap.java:98)
org.datacleaner.Main.main(Main.java:60)
org.datacleaner.Main.main(Main.java:46)

Can anyone suggest what I can do in order to get DataCleaner to function from within Kettle/Spoon 6.0.1 CE? I have run out of ideas.

Best regards,
Jeffrey

How to build the "agile-bi" plugin?

Hi all,

Does anyone know how to build the "agile-bi" plugin source for Kettle 6.0.1 CE? I can see that the source is available as a git repository at:

I have cloned this repository, but I have not been able to build it successfully. There do not seem to be any build instructions included. The old instructions at http://wiki.pentaho.com/display/AGIL...lding+Agile+BI do not make sense to me and are probably out of date anyway. If these instructions are in fact still correct, could someone please explain them in a clearer, step-by-step fashion?

This plug-in appears to provide very nice analytic features from within Spoon, if only I could manage to get it to work :mad:

Any help would be greatly appreciated.

Best regards,
Jeffrey

Kitchen.Error.No Rep Definition

Hi All

I have created 12 jobs which I run using Kitchen. They ran successfully in version 5.3. After upgrading to version 6.0 I got the error "ERROR [WebjarsURLConnection] Error Transforming zip". I tried my best to fix it but could not, so I decided to go back to version 5.3. Now, in 5.3, it randomly fails to load one of the jobs, saying the job could not be loaded. For example, on the first run it says the 3rd job could not be loaded from the repository while all the rest load fine; on the second run it says the 7th job could not be loaded; on the third run, the 9th job. Each run, a different random job fails to load. It is behaving very strangely. Any idea what the issue might be and how to fix it?
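(For reference, the "Kitchen.Error.NoRepDefinition" message in the title usually indicates that Kitchen was started without a usable repository definition, or with arguments that don't match an entry in repositories.xml. A sketch of a repository-based invocation — the repository name, credentials and job path are placeholders:)

Code:

# Placeholders throughout: -rep must match a repository name defined in
# ~/.kettle/repositories.xml of the user running Kitchen.
./kitchen.sh -rep=MyRepo -user=admin -pass=password \
             -dir=/etl -job=daily_load -level=Basic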

Thanks
Shekar

[Marketplace Spotlight] PDI Xero plugin

Starting a new series today, highlighting some of the contributions made by our community. There are so many that it's hard to keep track, but I will make a serious effort. After all... it's my job... :p




Marketplace Spotlight - PDI Xero Plugin

Product: PDI
Plugin: PDI Xero Plugin
Author: Bulletin.Net (NZ) Limited
Maturity Classification: Level 3, Community Lane (Unsupported)

Plugin info

This plugin lets you extract data from the Xero accounting software. I admit I didn't know about it (me and finances don't quite go well together), but it looks really interesting.


They offer a trial period, so I took it for a spin. Once I logged in to the application I went to the demo company:


[Image: Demo company dashboard]

What the plugin does is let you extract and then further process the data from Xero. After installing it from the Marketplace, this is what I see:

[Image: Xero GET step]
So the first thing I need is to get a customer key. A quick Google search takes me to http://api.xero.com, where I can register something they call a private application.

[Image: Registering a private application]


Ah - an X509 Public Key Certificate. Always a great excuse to resort to my rusty old openssl skills. So I generated a self-signed certificate:

$ openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes


The plugin doesn't seem to allow a passphrase, so I didn't use one. After generating the certificate and uploading it to the site, I was able to get the info I needed.

[Image: My test application]
So now I have everything I need to test this out. I decided to extract the contacts data:

[Image: Demo company contacts]


Turns out this was dead easy!

[Image: The exported results - just as expected]
Conclusion



Another great contribution focused on a highly useful use case. I'd like to thank the guys from Bulletin.Net for sharing this with the community!




-pedro





How to check the execution of a scheduled job?

Hi everybody,
I am new to PDI and I would like to know some basic stuff. Where can I find the execution log of a scheduled job in Pentaho? Do I need the Pentaho Admin Console or something else besides PDI? Indeed the question goes a little deeper: I need to check whether all the scheduled jobs ran OK. Where can I check that? Where can I manage the executed jobs?


For example, let's say I have a couple of jobs that will be executed from 03:00 AM onward. How can I check whether they ran OK? Is there a job execution control console where I can handle that?

Thanks.;)

Problems generating performance logging

Hi everyone,

Is it possible that, with performance monitoring enabled and a well-defined log table, PDI does not record anything during execution? At the end of the transformation's execution there is not a single row in the relational table that was defined... The most awkward thing is that the same table is used by another transformation, and there it works perfectly well... :confused:

Running kettle job in unix

Hi,
I created a Kettle job and transformation to manipulate a file. When I run the job in Unix, processing takes hours (5+) and the log displays some weird data that I haven't seen in other jobs I have created. However, when I run this on my desktop, I don't have the same issues.

I have attached the kettle job and transform.

Here is a screenshot of the log (I haven't seen other jobs display data from the file like this; the data appears after pol=):
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=025081704, prev=160
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=9212127, prev=170
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=1015776013, prev=178
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=00001, prev=189
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol= , prev=195
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol= , prev=208
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol= , prev=216
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=AMERICAN CENTURY INFL , prev=247
DEBUG 09-03 08:34:04,854 - trn split D1 rec - pol=ADJ BD INV , prev=272
DEBUG 09-03 08:34:04,855 - trn split D1 rec - pol= , prev

here is the command I am using:
/interfaces/etl/client/data-integration/kitchen.sh -norep -level=Debug -log=/interfaces/etl/log/fix_trans_swb_position.log -file=/interfaces/etl/code/fix_trans_swb_position.kjb swbschpos.asc swbschposd1etl.asc swbschposd2etl.asc
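(Worth noting: -level=Debug makes Kitchen extremely verbose, which is what fills the log above with per-row detail, and that output alone can slow long runs considerably. For production runs a quieter level is typical, e.g.:)

Code:

# Same invocation with a quieter log level; Debug/Rowlevel output can
# dominate runtime on large input files.
/interfaces/etl/client/data-integration/kitchen.sh -norep -level=Basic -log=/interfaces/etl/log/fix_trans_swb_position.log -file=/interfaces/etl/code/fix_trans_swb_position.kjb swbschpos.asc swbschposd1etl.asc swbschposd2etl.asc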

Thanks
Betsy

Pentaho 6 CE log4j.xml

Hi there,

I've recently updated to Pentaho 6.0 on my server and keep getting the error:

log4j:ERROR Could not parse file [plugins/kettle5-log4j-plugin/log4j.xml].
java.io.FileNotFoundException: /home/pentaho/plugins/kettle5-log4j-plugin/log4j.xml (No such file or directory)

The problem seems to be that /home/pentaho/plugins/kettle5-log4j-plugin/ is not the directory where I have Pentaho 6. I've even created that directory and manually uploaded the file, but then I receive:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassCastException: org.apache.log4j.ConsoleAppender cannot be cast to org.apache.log4j.Appender

Is there a way to specify where the directory is, or to disable the logging? Or maybe I am missing something?!

CPU usage is reaching 100%

Hi,

I am using PDI CE 6.0, MySQL 5.6, Java 1.7, a file repository, and Windows Server 2008 R2. Whenever I run ETLs with initial-load data (that is, not the incremental data), CPU usage reaches 100%.

I am running my job from a batch file instead of the Spoon Run button.

The total volume is 9,400,000 records. Previously I worked with Oracle and MS SQL Server and never faced this, but I am facing issues with the MySQL database. When I searched the MySQL forum, there are many people facing CPU usage problems. Still, I am hoping there is an answer in the Pentaho forum.

I am distributing the total records in the way shown below; is this the main reason for reaching 100% CPU?

Example: (Table Input: Product, 2,000,000 records with 101 columns) --> distributed to 4 steps

[Attached image: distributing.jpg]


Thanks,
Santhi

Is it possible to run Pentaho with something other than bat files and Windows Task Scheduler?

Hi everybody,
I am new to Pentaho Kettle, and so far it is not all clear to me.

Hope you guys may help me.


Well, let me lay out the scenario.

I have created a simple transformation in Pentaho and then a job which executes that transformation. Everything is working fine with no errors. Now I want to "deploy" that job to production. By deploy, I mean I need to schedule it and run it somehow without using the PDI GUI. I forgot to mention that PDI is running on Windows Server 2012.

Should I use batch files in order to execute the job?

If so, do I have no other option but to schedule it with Windows Task Scheduler or another scheduler, is that right? Isn't PDI able to do that itself?
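(For reference, with the Community Edition that is indeed the standard pattern: a small batch file that calls Kitchen, registered with Windows Task Scheduler. A sketch, with hypothetical paths and names:)

Code:

REM run_job.bat -- sketch; adjust paths to your installation
C:\pdi\data-integration\Kitchen.bat /file:C:\etl\my_job.kjb /level:Basic > C:\etl\logs\my_job.log 2>&1

REM Register it to run daily at 03:00 (one-time, from an admin prompt):
REM schtasks /create /tn "PDI my_job" /tr C:\etl\run_job.bat /sc daily /st 03:00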

If there is no better option, could you point me to some deployment-architecture best practices for Pentaho?


I am not sure if I made myself clear, but I hope someone can help me.


thanks in advance.

:)

Mongo input - Endless loop detected for substitution of variable: ${url}

Hi,

Here's my situation:

1) I have a transformation where I set up a JSON-formatted field for use in a Mongo Input step. I 'copy the rows to result'.

2) I call the Mongo Input transformation, checking 'copy previous results to parameters' and 'execute for every input row'. In the Parameters tab I set the variable.

3) In the Mongo Input transformation I set up the incoming parameter under Settings, and when I try to use the variable as input (see below) I get the endless-loop-for-variable error.

[Attached image: pentaho_1.jpg]

Any ideas? I've tried several variations and can't get it to work.

Thanks,

Advice on continuous data integration

Hello. We have an IVR application that saves call information in XML structures in real time. This data is extracted, transformed and loaded into database tables. Our client's requirement is to have the data loaded into the database on a 'near' real-time basis (meaning within a few minutes). From reading the PDI (Kettle) documentation, I understand that transformation jobs have to be run standalone in the Community Edition (using the OS scheduler if we want them to repeat) or through the DI Server scheduler when using the Enterprise Edition. It appears that to meet the near-real-time requirement we would have to schedule a job to run every couple of minutes, and due to the volume of data I am unsure that a transformation job would be able to complete the ETL process in that period of time, meaning that we would most likely end up with multiple instances running concurrently. This doesn't seem to be a very efficient way of handling our ETL requirements. Does anyone on the forum have experience with this kind of environment, or any advice for us? Is there any way we could set this up so that PDI would fit our requirement?
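(For reference, one common way to deal with the overlapping-instances concern when scheduling frequently is to wrap the Kitchen call in flock, so a new run is simply skipped while the previous one is still going. A sketch — all paths and the schedule are placeholders:)

Code:

#!/bin/bash
# run_etl.sh - invoked from cron, e.g. every 2 minutes:
#   */2 * * * * /etl/run_etl.sh
# flock -n exits immediately if the previous run still holds the lock,
# so frequent scheduling cannot pile up concurrent instances.
flock -n /tmp/ivr_etl.lock \
  /opt/pdi/data-integration/kitchen.sh -file=/etl/ivr_load.kjb -level=Basic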

Thank you in advance!

How to handle multiple XML input files in a ktr?

Hi,

I found that the "XML Input Stream (StAX)" step operates on just one XML file. How do I make the ETL work when the input is two or more XML files?
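(For reference, one simple pattern is to loop, either at the job level in PDI or at the shell level, running the transformation once per file. A shell sketch — the paths and the parameter name INPUT_FILE are placeholders, and the transformation would have to declare that parameter:)

Code:

# Run the transformation once per XML file, passing the file name
# as a named parameter (hypothetical parameter INPUT_FILE):
for f in /data/in/*.xml; do
  ./pan.sh -file=/etl/read_xml.ktr "-param:INPUT_FILE=$f"
done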

Comparison with other research on time series forecasting

Hi,

I would like to do a comparison with other research that uses Weka time series forecasting, preferably research that uses publicly available data such as the UCI Machine Learning Repository. Can anyone give me a link to any such research? Thanks.

Query dimension contents?

Is it possible to build an MDX query that just queries a cube dimension's contents?

For example, for a sales cube that includes a customer dimension, where the customer dimension has property columns 'name', 'customer id', 'industry'...: return a list of all the customer dimension member names and customer ids where the customer dimension property 'industry' is 'banking'.
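(For reference, a sketch in Mondrian-style MDX; the cube, hierarchy, level and property names are assumptions based on the description above:)

Code:

-- List customer members whose 'industry' member property is 'banking',
-- with nothing on the columns axis. All names here are assumed.
SELECT
  {} ON COLUMNS,
  FILTER(
    [Customer].[Customer].Members,
    [Customer].CurrentMember.Properties("industry") = "banking"
  ) ON ROWS
FROM [Sales]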

Error in Pan but no error in Spoon

Hello,

I am using Pentaho CE 5.4 with Java 1.7 and mysql-connector-java-5.1.28.jar as the MySQL connector on my Ubuntu 14.04 machine. I have created a ktr file with Spoon which runs perfectly in Spoon, but gives an error when I run it through Pan.

To execute the ktr file from Pan I am running:
Code:

./pan.sh -file=/home/kaushik/Documents/test/showDataInBI.ktr -level=Rowlevel

Please see below for the error I am getting while executing the ktr file from Pan.

2016/03/10 14:00:41 - Table input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : An error occurred, processing will be stopped:
2016/03/10 14:00:41 - Table input.0 - Error occured while trying to connect to the database
2016/03/10 14:00:41 - Table input.0 -
2016/03/10 14:00:41 - Table input.0 - Driver class 'org.gjt.mm.mysql.Driver' could not be found, make sure the 'MySQL' driver (jar file) is installed.
2016/03/10 14:00:41 - Table input.0 - org.gjt.mm.mysql.Driver
2016/03/10 14:00:41 - showDataInBI - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Something went wrong while trying to stop the transformation: org.pentaho.di.core.exception.KettleDatabaseException:
2016/03/10 14:00:41 - showDataInBI - Unable to get database metadata from this database connection
2016/03/10 14:00:41 - showDataInBI - at java.lang.Thread.run (Thread.java:745)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.StepInitThread.run (StepInitThread.java:69)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.init (TableInput.java:344)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.BaseStep.stopAll (BaseStep.java:2834)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.Trans.stopAll (Trans.java:1880)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.stopRunning (TableInput.java:287)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.cancelQuery (Database.java:673)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.getDatabaseMetaData (Database.java:2762)
2016/03/10 14:00:41 - showDataInBI - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
2016/03/10 14:00:41 - showDataInBI - Unable to get database metadata from this database connection
2016/03/10 14:00:41 - showDataInBI - at java.lang.Thread.run (Thread.java:745)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.StepInitThread.run (StepInitThread.java:69)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.init (TableInput.java:344)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.BaseStep.stopAll (BaseStep.java:2834)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.Trans.stopAll (Trans.java:1880)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.stopRunning (TableInput.java:287)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.cancelQuery (Database.java:673)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.getDatabaseMetaData (Database.java:2762)
2016/03/10 14:00:41 - showDataInBI -
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.getDatabaseMetaData(Database.java:2765)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.cancelQuery(Database.java:673)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.stopRunning(TableInput.java:287)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.Trans.stopAll(Trans.java:1880)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.BaseStep.stopAll(BaseStep.java:2834)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.steps.tableinput.TableInput.init(TableInput.java:344)
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.trans.step.StepInitThread.run(StepInitThread.java:69)
2016/03/10 14:00:41 - showDataInBI - at java.lang.Thread.run(Thread.java:745)
2016/03/10 14:00:41 - showDataInBI - Caused by: java.lang.NullPointerException
2016/03/10 14:00:41 - showDataInBI - at org.pentaho.di.core.database.Database.getDatabaseMetaData(Database.java:2762)
2016/03/10 14:00:41 - showDataInBI - ... 7 more
2016/03/10 14:00:41 - Table input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Error initializing step [Table input]
2016/03/10 14:00:41 - showDataInBI - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Step [Table input.0] failed to initialize!


I have copied "mysql-connector-java-5.1.28.jar" into the /../data-integration/lib/ folder.
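(One detail stands out: the log reports "version 5.1.0.0" while the post mentions CE 5.4, which could mean the pan.sh being invoked belongs to a different PDI installation than the one whose lib folder received the driver jar. A quick check — the path is a placeholder:)

Code:

# Confirm the driver jar is in the lib folder of the *same*
# installation that owns the pan.sh being run:
ls -l /home/kaushik/data-integration/lib/mysql-connector-java-*.jar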

Please let me know whether I am doing this correctly or if I am missing something.
Thank you in advance.

Kaushik Karan

List or array data structure

Hi,

I have a job that processes XML files and loads them into stage and fact tables. In the first transformation of the job I generate a list of the files that should be processed. At the end of the job I would like to move those files to another location, so I am looking for some kind of global data structure that could be populated in the first transformation and referenced in the last. The filenames are currently inserted into the Pentaho result set.

[Attached image: 000309.jpg]

Thank you.