Channel: Pentaho Community Forums

PDI Hadoop Integration [KETTLE - NO SUPPORT FOR 2.2.0 VERSION]

I am writing here to expose my doubts regarding the integration between Hadoop and Kettle.


I read in the official documentation (http://wiki.pentaho.com/display/BAD/...ro+and+Version) how to set up and configure Kettle for a specific Hadoop distribution. The way to communicate with Hadoop, and all other components of the ecosystem, is to download a “shim” (a small library that intercepts API calls and either redirects or handles them, or changes the calling parameters) for a specific Hadoop distro and version.


The component above exists for all commercial distributions, even for the most up-to-date versions. However, for vanilla Apache Hadoop there is no downloadable component that allows developing big data solutions without a commercial version of the platform (only a pre-configured 0.20.2 version exists).


I would like to ask the following:



1) The plug-ins for vanilla Hadoop distributions are not kept up to date: are there commercial or technical reasons for this? Since Hadoop support is limited to the commercial distributions, I suppose there could be commercial reasons without any specific technical reason.



2) If there are no technical reasons, what is the best way to develop a plug-in for Hadoop 2.2.0?
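For context, PDI selects its active shim in plugins/pentaho-big-data-plugin/plugin.properties, with each shim living in its own folder under plugins/pentaho-big-data-plugin/hadoop-configurations/. A minimal sketch of what pointing PDI at a home-built shim could look like; the folder name hadoop-22 is hypothetical, a directory you would have to create and populate with the Hadoop 2.2.0 client jars yourself:

Code:

# plugins/pentaho-big-data-plugin/plugin.properties
# Selects one of the folders under hadoop-configurations/.
# "hadoop-22" is a hypothetical shim folder for Apache Hadoop 2.2.0.
active.hadoop.configuration=hadoop-22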


P.S. The versions of the components installed on my cluster are: HBase 0.98 + Hive 0.11 + Zookeeper 3.4.6


Thanks in advance.
Regards
Pietro and Gaetano.

csv file parsing and validating using pentaho kettle

Hi,

I'm using Pentaho Data Integration (kettle) 5.0.1.

What I'm looking for is :

I have an input data file, say a .csv file, given below:

21,John,FL
23,,MI
2p,Taylor,FL
25,Tony,,



I also have a text file, say config.txt, where I define the schema of the data file:

id,integer
name,string,NOT NULL
location,string



What I'm trying to achieve is to create a job/transformation in Kettle such that, when I read the data file, it takes the schema from the config.txt file and validates the data based on data type, field length and nullable values. If an invalid record is found in the data file, that error record has to move to an error file, and the good, validated records have to be dumped on HDFS.

So in the above example the expected result (good records) is:

id,name,location
21,John,FL
25,Tony,,


The error file records are:
id,name,location
23,,MI --> 2nd field value cannot be NULL
2p,Taylor,FL --> 1st field value should be integer type


Here I do not want to create the schema on the fly. So please suggest how to read the data file by referring to config.txt for the schema, how to do the validation, and how to move the validated data onto HDFS.
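For illustration, a minimal sketch in plain Java (standalone code with hypothetical class names, not a ready-made Kettle step) of the kind of per-row check a User Defined Java Class step could run against the rules in config.txt:

Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RowValidator {

    // One rule per line of config.txt, e.g. "name,string,NOT NULL".
    static class FieldRule {
        final String name, type;
        final boolean notNull;
        FieldRule(String name, String type, boolean notNull) {
            this.name = name; this.type = type; this.notNull = notNull;
        }
    }

    static List<FieldRule> loadSchema(String path) throws IOException {
        List<FieldRule> rules = new ArrayList<FieldRule>();
        BufferedReader r = new BufferedReader(new FileReader(path));
        String line;
        while ((line = r.readLine()) != null) {
            String[] p = line.split(",");
            rules.add(new FieldRule(p[0].trim(), p[1].trim(),
                    p.length > 2 && "NOT NULL".equalsIgnoreCase(p[2].trim())));
        }
        r.close();
        return rules;
    }

    // Returns null when the record is valid, otherwise the failure reason.
    static String validate(String[] record, List<FieldRule> rules) {
        for (int i = 0; i < rules.size(); i++) {
            FieldRule rule = rules.get(i);
            String value = i < record.length ? record[i].trim() : "";
            if (value.isEmpty()) {
                if (rule.notNull) return rule.name + " cannot be NULL";
                continue;
            }
            if ("integer".equalsIgnoreCase(rule.type)) {
                try {
                    Integer.parseInt(value);
                } catch (NumberFormatException e) {
                    return rule.name + " should be integer type";
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        List<FieldRule> rules = loadSchema("config.txt");
        // Prints "id should be integer type" for the bad row from the example.
        System.out.println(validate("2p,Taylor,FL".split(",", -1), rules));
    }
}

Rows for which validate() returns non-null would be routed to the error file; the rest would flow on to a Hadoop File Output step.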

So please let me know if the scenario is not clear.

Thanks,
Shree

Duplicate /unifiedRepository webservices

I am migrating from 4.8.0-stable to 5.0.1-stable. I have downloaded JBoss 6.2.2, JDK 1.7, and am working on a Windows 7 box. I have been using the install-manual.pdf as a guide for installation. I figure I either missed something or screwed something up in the process.

When I start JBoss 6.2.2 and it tries to deploy the pentaho war, I get an error indicating that it is trying to register a duplicate /unifiedRepository webservice.

Code:

15:50:42,363 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service jboss.deployment.unit."pentaho.war".PARSE: org.jboss.msc.service.StartException in service jboss.deployment.unit."pentaho.war".PARSE: JBAS018733: Failed to process phase PARSE of deployment "pentaho.war"
        at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:127) [jboss-as-server-7.3.2.Final-redhat-2.jar:7.3.2.Final-redhat-2]
        at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
        at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]
Caused by: java.lang.IllegalArgumentException: JBAS015533: Web Service endpoint org.pentaho.platform.repository2.unified.webservices.jaxws.DefaultUnifiedRepositoryJaxwsWebService with URL pattern /unifiedRepository is already registered. Web service endpoint org.pentaho.platform.repository2.unified.webservices.DefaultUnifiedRepositoryWebService is requesting the same URL pattern.
        at org.jboss.as.webservices.metadata.model.AbstractDeployment.addEndpoint(AbstractDeployment.java:60)
        at org.jboss.as.webservices.metadata.model.JAXWSDeployment.addEndpoint(JAXWSDeployment.java:27)
        at org.jboss.as.webservices.deployers.WSIntegrationProcessorJAXWS_POJO.processAnnotation(WSIntegrationProcessorJAXWS_POJO.java:105)
        at org.jboss.as.webservices.deployers.AbstractIntegrationProcessorJAXWS.deploy(AbstractIntegrationProcessorJAXWS.java:92)
        at org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:120) [jboss-as-server-7.3.2.Final-redhat-2.jar:7.3.2.Final-redhat-2]
        ... 5 more

Can anyone point me to what can cause this to happen?
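For what it's worth, the trace shows JBoss's JAX-WS annotation scanning registering two Pentaho endpoint classes at the same /unifiedRepository URL pattern. One workaround sometimes used on JBoss AS 7 / EAP 6 based servers, offered here as a sketch rather than a confirmed fix for this deployment, is to exclude the webservices subsystem for the war so the annotations are not scanned at all:

Code:

<!-- WEB-INF/jboss-deployment-structure.xml -->
<jboss-deployment-structure>
    <deployment>
        <exclude-subsystems>
            <subsystem name="webservices"/>
        </exclude-subsystems>
    </deployment>
</jboss-deployment-structure>

Whether the platform's repository webservices still work afterwards depends on how the war wires them up, so treat this as a diagnostic step.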

Thanks

Launching a job in parallel from the result of a transformation

Hello everyone,

I'm facing a small problem: as mentioned in the title, I need to launch a job n times based on the rows returned by a transformation.

Here is the main job:

helpme.jpg

The "RECUPERO DIPENDENZE" transformation returns a series of tuples, each containing a value with which to start "Job 2". In the job entry settings I configured it to execute for every row of the input. Up to this point everything works, but the problem is that the executions run in sequence and not in parallel.
If "Job 2" has to be launched n times, it does so cyclically, always waiting for the previous execution to finish before restarting the job with the new parameter, whereas I need multiple instances of "Job 2" to be launched in parallel for all the values in the result table.

Is it possible to do this?

Thank you.

unbuffered entity error with SOAP in PDI

Hello,

I'm having the following problem with a SOAP call:

2014/04/16 08:55:41 - Generate SOAP request.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Because of an error, this step can't continue:
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Can not result from [https://10.3.1.75:8090/services/AdminMgmtService]
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unbuffered entity enclosing request can not be repeated.
2014/04/16 08:55:41 - getHandle - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.pentaho.di.core.exception.KettleException:
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Can not result from [https://10.3.1.75:8090/services/AdminMgmtService]
2014/04/16 08:55:41 - HTTP Post.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unbuffered entity enclosing request can not be repeated.

I was wondering if there was a way to know what is causing the "Unbuffered entity enclosing request can not be repeated".
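For what it's worth, that message comes from Apache Commons HttpClient 3.x (which PDI 4.x ships): it is thrown when the client tries to retry a POST, typically after a connection or SSL handshake failure, whose request body is not buffered and so cannot be resent. A minimal sketch of a repeatable (buffered) request body, with a placeholder SOAP envelope:

Code:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.StringRequestEntity;

public class SoapPostExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        PostMethod post = new PostMethod("https://10.3.1.75:8090/services/AdminMgmtService");
        // A String-backed entity is buffered in memory (isRepeatable() == true),
        // so a retry can resend it; a raw stream-backed entity cannot be replayed.
        post.setRequestEntity(new StringRequestEntity(
                "<soapenv:Envelope>...</soapenv:Envelope>", "text/xml", "UTF-8"));
        int status = client.executeMethod(post);
        System.out.println("HTTP status: " + status);
        post.releaseConnection();
    }
}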

It seems the call is never reaching the server (based on the server logs not updating).

If anyone experienced this before I would be happy to know about it.
Otherwise, are there any logs in PDI that I could go read?

Thank you

Oracle DB connect error: TNS does not know SID

I am getting a TNS error when trying to create a new Oracle database connection:
TNS: listener does not currently know of SID given in connect descriptor

Pentaho: pdi-ee 5.0.2
Using Oracle 10g
Native (JDBC)
Entered SID in the Database Name field.
From the Database Connection/Feature list window:
Driver class: oracle.jdbc.driver.OracleDriver
URL: jdbc:oracle:thin@server_ip:1521:my_SID


I have copied the ojdbc14.jar file into data-integration/lib, which I believe is the correct version for Oracle 10g.
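For reference, a minimal standalone connection test (user, password and host are placeholders; needs ojdbc14.jar on the classpath). Note that the thin-driver URL takes a colon after thin, and that the listener distinguishes a SID (colon form) from a service name (slash form), a common source of this exact TNS error:

Code:

import java.sql.Connection;
import java.sql.DriverManager;

public class OracleConnectTest {
    public static void main(String[] args) throws Exception {
        // SID form:          jdbc:oracle:thin:@host:port:SID
        // Service-name form: jdbc:oracle:thin:@//host:port/service_name
        String url = "jdbc:oracle:thin:@server_ip:1521:my_SID";
        Connection c = DriverManager.getConnection(url, "user", "password");
        System.out.println("Connected: " + !c.isClosed());
        c.close();
    }
}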

Please advise,
thanks,
-John

Installing a JDBC jar

Hi,
I need to connect PDI to a FileMaker database (an old version 5).
I found an sljc.jar file (which I think fits the above version) and put it into the /lib folder of
my 5.0.1 PDI version.
But it always complains that the driver is not found.

Questions:
1. is /lib the right folder?
2. is dropping the file into the dir the only thing to do, or do I need to register the jar in an appropriate file? If so, which one?
3. does someone have experience with FileMaker + PDI? Is the task possible, or am I losing time? (A quick classpath check is sketched below.)
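To separate "wrong folder" from "wrong driver class", a tiny standalone check may help; the driver class name below is purely hypothetical, so substitute whatever the sljc.jar documentation specifies:

Code:

public class DriverCheck {
    public static void main(String[] args) {
        try {
            // Hypothetical driver class name -- check the sljc.jar docs.
            Class.forName("com.fmi.jdbc.JdbcDriver");
            System.out.println("Driver found on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("Driver NOT on the classpath: " + e.getMessage());
        }
    }
}

Running it as java -cp .;sljc.jar DriverCheck (Windows path separator) tells you whether the jar actually exposes the class PDI is asking for.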

Thank you in advance.
Best regards,
Nico

KettleDatabaseRepositoryDatabaseDelegate.getDatabaseID() fails in 5.0.3/PostgreSQL9

Looking up a connection by name generates 'SELECT ID_DATABASE FROM R_DATABASE WHERE "name" = ?', which fails in Postgres, as name is a reserved word. Is there a way around this?
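For background, a sketch of PostgreSQL's general identifier behaviour (illustration only, not a confirmed fix): name is a non-reserved keyword in Postgres, so the failure usually comes down to how the repository columns were created. Unquoted identifiers fold to lower case, while quoted ones are case-sensitive:

Code:

-- Created unquoted, the table is stored as r_database(id_database, name):
CREATE TABLE R_DATABASE (ID_DATABASE INT, NAME VARCHAR(255));
SELECT ID_DATABASE FROM R_DATABASE WHERE "name" = 'mydb';  -- works: "name" matches the folded column
-- If the repository tables were instead created with quoted upper-case
-- identifiers, the same generated query fails with: column "name" does not exist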

how to add an open password for excel

Hi All

I want to generate an Excel file and add an open password to protect the data, but I found the password only protects the Excel file against editing.

How can I do it?
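As far as I can tell, PDI's Excel output steps only set the sheet-protection (edit) password. One possible approach, sketched here as a post-processing pass rather than anything the step does for you, is Apache POI's encryption API (POI 3.10+, .xlsx files only; file names are placeholders):

Code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.crypt.EncryptionMode;
import org.apache.poi.poifs.crypt.Encryptor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class AddOpenPassword {
    public static void main(String[] args) throws Exception {
        POIFSFileSystem fs = new POIFSFileSystem();
        EncryptionInfo info = new EncryptionInfo(EncryptionMode.agile);
        Encryptor enc = info.getEncryptor();
        enc.confirmPassword("secret"); // the open password

        // Re-save the workbook written by the transformation into the
        // encrypted container, leaving the original file untouched.
        OPCPackage opc = OPCPackage.open(new File("report.xlsx"), PackageAccess.READ_WRITE);
        OutputStream os = enc.getDataStream(fs);
        opc.save(os);
        opc.revert();

        FileOutputStream fos = new FileOutputStream("report-protected.xlsx");
        fs.writeFilesystem(fos);
        fos.close();
    }
}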

Thanks

kettle stops running a job with no error

When I use a shell script to run Kettle, the Kettle job stops running and no error information is written to the log. I want to know why Kettle stops running the job. What should I do?

how to log on to a url and get the session id

Hi All

I want to get something from a website, but the website needs a login. Only when the login succeeds can I do the following things.

But how can I get the session id?
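A generic sketch with plain java.net (the URL and form-field names are hypothetical; the same idea works with a PDI HTTP Post step followed by reading the response headers): the session id usually arrives in a Set-Cookie response header, such as JSESSIONID=..., which you then send back on subsequent requests.

Code:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class LoginSession {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/login"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        OutputStream os = conn.getOutputStream();
        os.write("username=me&password=secret".getBytes("UTF-8"));
        os.close();
        conn.getResponseCode(); // forces the request to be sent

        // The session id is normally delivered via Set-Cookie.
        Map<String, List<String>> headers = conn.getHeaderFields();
        System.out.println("Set-Cookie: " + headers.get("Set-Cookie"));
    }
}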

[CDE] wd.helpers undefined error when trying to Export Chart via Export Pop-up

I've followed the following tutorial (http://type-exit.org/adventures-with...ards-with-cde/) to create a basic dashboard based on the SteelWheels dataset using the latest release of CTools on Pentaho 5.0.
I've added some modifications to try out the export options and get the following error when I attempt to 'Export Chart' via the 'ExportPopupComponent'.

(Taken from the Firebug console log:)

Code:

CDF: Exporting to png

script...9d5172a (line 3639)



TypeError: wd.helpers is undefined - CDF.js?v=2faa73ec4b1288e5319e3e1dec435534 (line 295)

Can anyone help to resolve this issue?

Thanks

Unable to create stock/candlestick chart in pentaho

Hi,

As per a client requirement, I have to implement a stock chart or candlestick chart in my project, but Pentaho does not have this chart facility.
Please refer to the attachment.

So I want to know: is there any third-party plug-in or code available to integrate with Pentaho for this kind of chart type? If yes, then how do I implement it?
OR
Is there any other alternative for this kind of chart type?

Thanks.

CSV file upload issue

Hi,

I get the following message when I try to upload a csv file to the Pentaho BA 5.0 server:
"the compressed file uploaded contained more than one file. any compressed files must contain only one file".

The file contains approx 1.75 million lines and is about 475 MB.

Any assistance will be appreciated.

TIA.

F

How to track users that run reports with parameters

Hi,

I am using Pentaho 4.8 BI Community Edition.

I want to know how to keep track of users that run reports with parameters.

I know there is a PentahoAuditLog.log file which can give us some information,

but I am not able to find the parameters that the user enters.

Can anybody help me solve this?

Thanks in advance.

The CDE component NewMapComponent doesn't work in Pentaho 5.0.5

The CDE NewMapComponent component works perfectly in version 5.0.1 Community Edition (an example can be viewed at: http://localhost:8080/pentaho/api/re...neratedContent). However, the same NewMapComponent component does not work in Pentaho Enterprise Edition version 5.0.5. The errors I get from Firebug are:
Dashboards.getWebAppPath is not a function
...
CDF: Object type can not be mapped to NewMapComponent a valid class


Any tips on how to solve this problem?

Thanks

Does Mondrian Support subqueries???

I have the following MDX query using a subquery approach ...

SELECT
NON EMPTY {{[Measures].[totalpayments]}} ON COLUMNS,
NON EMPTY {[organization_masterentity.bemaster].[master].Members} ON ROWS
FROM (Select ({[organization_masterentity.bemaster].[master].&[1]}) ON COLUMNS
FROM [opsCharges_Rev001_cub])

The syntax is correct; however, I keep getting the following error on the '(' token:

MondrianException: Mondrian Error:Syntax error at line 4, column 6, token '('

Does this mean that Mondrian doesn't support subqueries?
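If subselects turn out to be unsupported in your Mondrian version, one possible rewrite for this particular query, restricting the ROWS set directly instead of through a subquery, would be (a sketch, assuming &[1] is the only member you want):

SELECT
NON EMPTY {[Measures].[totalpayments]} ON COLUMNS,
NON EMPTY {[organization_masterentity.bemaster].[master].&[1]} ON ROWS
FROM [opsCharges_Rev001_cub]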

Thanks!!

R script executor external lib load

Hi!

I tested the R Script Executor step using a script that works in the R console, and it fails.

The conflicting line seems to be the use of the RJDBC lib, which I installed previously through the console.

I can test:
data(iris)
iris

works fine, but

require(RJDBC)
drvVertica <- JDBC("com.vertica.jdbc.Driver",'C:\\pdi-ce-5.0.1.A-stable\\data-integration\\lib\\vertica-jdk5-6.1.2-0.jar'); # should only load the driver into a variable
data(iris)
iris

throws the error message "no output data" when tested, so I assume something in the JDBC loading line breaks the script.

Might it be that library loading is disabled in the step? The same code works in the R console.
Could somebody shed some light?
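One thing worth checking, on the assumption that the step's embedded R engine may resolve a different library path than your console session, is to make the failure visible inside the step:

# Run inside the R Script Executor step to surface what the step's R sees.
print(R.home())
print(.libPaths())
print(require(RJDBC))   # prints FALSE instead of failing silently
# If FALSE, prepend the console's library location (path is an example):
# .libPaths(c("C:/Users/me/Documents/R/win-library/3.0", .libPaths()))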

Thank you very much!

How to copy reports to the pentaho bi server?

Hello,

I have a lot of .prpt reports and it will be tough for me to publish them one by one.

I read somewhere that copying the folder (which contains the reports) into the solution folder also works?

Can anyone give a step by step approach to accomplish this?
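One caveat first: on a 5.x server, content lives in a Jackrabbit (JCR) repository, so copying files into pentaho-solutions no longer registers them the way it did in 4.x. For batch publishing, the server ships an import-export script; the invocation below is a sketch from memory (URL, credentials and paths are placeholders), so verify the exact flags with import-export.sh --help before relying on it:

Code:

# Import a local folder of .prpt files into /public/reports on the server.
./import-export.sh --import --url=http://localhost:8080/pentaho \
  --username=admin --password=password \
  --charset=UTF-8 --path=/public/reports \
  --file-path=/home/me/reports --overwrite=true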

Thanks in advance

Suggestion Required on this KTR

Hi all,

I have been working with PDI for a month and am still not able to deal with this requirement.

Let me explain the requirement:

1. The property file data grid contains formulas whose results should be like field1=field1-field2.
2. The data grid contains values for 73 ids, with hourly data, and formulae exist for only 13 ids.

The result should contain all the id values, replacing just those 13 id values with the calculated values.

What I have achieved till now is that I am able to calculate only those 13 id values, and that too for only one hourly timestamp.

Is there a way to dynamically split the formula field and get this right? Please have a look at the KTR if you have some time. Thank you.

I have been stuck with this for a long time.
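On the splitting part specifically, here is a small sketch (plain Java; the formula format is taken from the example field1=field1-field2 and assumes a single binary operation) of breaking the formula string into target, operands and operator, which a User Defined Java Class step could then evaluate per row:

Code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormulaSplit {
    public static void main(String[] args) {
        String formula = "field1=field1-field2";
        String[] sides = formula.split("=", 2);
        String target = sides[0].trim();
        // Matches "<left> <op> <right>" with op one of + - * /
        Matcher m = Pattern.compile("(\\w+)\\s*([-+*/])\\s*(\\w+)")
                .matcher(sides[1].trim());
        if (m.matches()) {
            System.out.println("target=" + target
                    + " left="  + m.group(1)
                    + " op="    + m.group(2)
                    + " right=" + m.group(3));
        }
    }
}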


Formula Calculations.ktr