Channel: Pentaho Community Forums

Date conversion issue when passing a null value in a column

Hi folks,
I'm getting an issue with date conversion.

Currently the value of the termination date column coming in the file is null, but in the future it can have a value.
When the termination date column is passed through the Pentaho CSV file input step, it gives a conversion error.

The error occurs during the date conversion.
(The field datatype settings are shown in the attached screenshots Term_date_datatype.jpg and Termination_date_null.jpg.)

How can I handle this? If the value is null it should remain null, and if we get a value in the future it should be converted into the date format.

(Error screenshot attached: Termination error.jpg)
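One way to make the intended behavior explicit is to do the conversion in code, for example in a User Defined Java Class step: parse the value only when it is non-empty and pass null through otherwise. This is a minimal sketch of that logic; the yyyy-MM-dd format and the class name are assumptions for illustration, not taken from the attached screenshots.

Code:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class NullSafeDateParser {
    // Assumed incoming format; adjust to whatever the file actually uses.
    private static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

    // Returns null when the incoming value is null or blank; otherwise parses it.
    public static Date parseOrNull(String raw) throws ParseException {
        if (raw == null || raw.trim().isEmpty()) {
            return null;
        }
        return FORMAT.parse(raw.trim());
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseOrNull(null));          // null stays null
        System.out.println(parseOrNull("2015-09-30"));  // parsed as a Date
    }
}

If the file sometimes carries a literal placeholder (such as "NULL") instead of an empty string, that string would also need to be mapped to null before parsing.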

Updating schema on the server

Whenever I'm making changes to the schema file, I have to upload it through the BI Platform's web interface. We're using the community version 5.2.

Isn't there any other way to update the schema file? Can't I simply upload it to the server in some folder somewhere?

Kind regards,
Marcus

LDAP Security

Hi folks,

Does anyone know how to return the value of an LDAP user group? I need to build a filter that controls which groups have permission to log in to the system and which groups cannot.


Thanks for your help

Pass fields into stream with Execute SQL Scripts

Hi,
Is it possible to pass the query fields into the stream with the "Execute SQL script" step, the way the "Table Input" step does?

I have a requirement where I need to select MAX(DATE) from a table and use the date as a parameter in another query based on some condition.

I have defined the transformation in this way:

Table Input(MAX DATE) --> Filter rows (Check condition) ----> (if true)--> Table input(Query with date as parameter).

Can the first Table Input be replaced with some other step?

OData connectivity

As a follow-up to OData in BI: is there a step to connect to an OData data source in Pentaho Data Integration (Spoon)?

For example, OData.svc/ lists a number of tables from which data can be extracted, and it refers to OData.svc/$metadata, where the attributes and their datatypes are defined.

Thank you for your response!
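In case no dedicated step turns out to be available, a common fallback is to call the OData service over plain HTTP and parse the response downstream (for example with the JSON Input step). Below is a minimal sketch of that fallback, assuming the service can return JSON; the URL and entity set name are purely illustrative.

Code:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ODataFetch {
    public static void main(String[] args) throws Exception {
        // Hypothetical service URL; replace with the real OData.svc collection.
        URL url = new URL("http://example.com/OData.svc/Customers?$format=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            // Hand the JSON payload to a downstream step (e.g. JSON Input) for parsing.
            System.out.println(body);
        }
    }
}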

Increase width of page

Hi,

I need to design an Excel report with many columns. The widest page size available is A4-PLUS, and that is not enough for me; I need more columns. How can I make the page width dynamic?

Thanks for any help,
Pablo.

Documentation inconsistency: PostgreSQL JDBC driver

PDI version 5.4 ships with the PostgreSQL driver named postgresql-9.3-1102-jdbc4 (in the lib directory).

But the documentation, https://help.pentaho.com/Documentati...010#PostgreSQL, guides us to use the file postgresql-8.x-xxx.jdbc4.jar. In addition, the documentation at https://help.pentaho.com/Documentation/5.4/0D0/160/000#Data_Sources states that jdbc3 should be used.

Which combination of database and driver versions is recommended for a production environment? We want to use the latest PostgreSQL version possible. We don't use repositories and use the database only to store data.

Thanks in advance.

Dyrson

Connection name change in DATABASE connection settings

Hi,

I have 140 transformations and 30 jobs that use database connections, i.e. SOURCE and TARGET. Now I want to rename the database connections.
For example, renaming SOURCE to OLTP is easy, but I then need to remap each and every transformation and job to the new connection name, which is a big process to do manually.

Is there an easy way to update the DB connection for all .ktr and .kjb files instead of doing it manually?

(java 1.8, PDI-CE-5.3 File Repository, MySQL database).

(Screenshot attached: connection.png)
Thanks in advance.
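With a file repository, the .ktr/.kjb files are plain XML, so one option is a small script that rewrites the connection references in place. This is a minimal sketch under the assumption that steps reference the connection as <connection>SOURCE</connection>; the repository path is illustrative, the connection definition's own <name> element would need a similar (more careful) replacement, and the repository should be backed up first.

Code:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RenameConnection {
    public static void main(String[] args) throws IOException {
        Path repo = Paths.get("/path/to/file-repository"); // illustrative path
        try (Stream<Path> files = Files.walk(repo)) {
            files.filter(p -> p.toString().endsWith(".ktr") || p.toString().endsWith(".kjb"))
                 .forEach(RenameConnection::rename);
        }
    }

    private static void rename(Path file) {
        try {
            String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            // Replace step-level references to the old connection name.
            String updated = xml.replace("<connection>SOURCE</connection>",
                                         "<connection>OLTP</connection>");
            if (!updated.equals(xml)) {
                Files.write(file, updated.getBytes(StandardCharsets.UTF_8));
                System.out.println("Updated " + file);
            }
        } catch (IOException e) {
            System.err.println("Skipping " + file + ": " + e.getMessage());
        }
    }
}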

Get data from XML problem with Wildcard regex on Linux

I am currently using Version 4.4.
I have a mixed environment: development on Windows and scheduled runs on Linux. I am having an issue when I try to process a set of files that match the pattern current_file_1.xml, current_file_2.xml, current_file_3.xml, and so on.

I use the same file path on each system:
/home/pentaho/data/inputfiles

and the regex I am using is
current_file_.*\.xml

On Windows, this collects the files properly, but on Linux I get:

INFO 02-09 15:26:33,356 - Table output - Connected to database [DRM] (commit=10)
ERROR 02-09 15:26:33,415 - FileInputList - org.apache.commons.vfs.FileSystemException: Could not find files in "file:///home/pentaho/data/inputfiles".
at org.apache.commons.vfs.provider.AbstractFileObject.findFiles(Unknown Source)
at org.apache.commons.vfs.provider.AbstractFileObject.findFiles(Unknown Source)
at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:211)
at org.pentaho.di.core.fileinput.FileInputList.createFileList(FileInputList.java:151)
at org.pentaho.di.trans.steps.getxmldata.GetXMLDataMeta.getFiles(GetXMLDataMeta.java:1145)
at org.pentaho.di.trans.steps.getxmldata.GetXMLData.processRow(GetXMLData.java:629)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.commons.vfs.FileSystemException: Invalid descendent file name "C:".
at org.apache.commons.vfs.impl.DefaultFileSystemManager.resolveName(Unknown Source)
at org.apache.commons.vfs.provider.AbstractFileObject.getChildren(Unknown Source)
at org.apache.commons.vfs.provider.AbstractFileObject.traverse(Unknown Source)
... 8 more

The path exists and the files exist, and if I specify actual file names it works. But as the list grows to over 50 files, I do not want to have to keep editing two XML steps just to add files.

Any suggestions on what is going wrong?
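The exception itself ("Invalid descendent file name "C:"") suggests that a Windows-style path is still being handed to VFS somewhere, rather than the regex failing to match. As a quick way to rule the regex out, here is a standalone sanity check (not Kettle code) that the wildcard matches the Linux file names; the sample names are the ones quoted above.

Code:

import java.util.regex.Pattern;

public class WildcardCheck {
    public static void main(String[] args) {
        // Same wildcard regex as configured in the Get data from XML step.
        Pattern pattern = Pattern.compile("current_file_.*\\.xml");
        String[] names = {"current_file_1.xml", "current_file_2.xml", "other.xml"};
        for (String name : names) {
            System.out.println(name + " -> " + pattern.matcher(name).matches());
        }
    }
}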

Pan.bat will not execute from Windows command line

Hi Everyone,

When I try to run pan.bat from the Windows command line, I'm not getting any output. I navigate to C:\Users\a75043\Desktop\PDI\data-integration-4.4.0.0> (where pan.bat lives) and run pan.bat. All I get back is the command prompt C:\Users\a75043\Desktop\PDI\data-integration-4.4.0.0>.

Imagine that the below is my command line:

C:\Users\a75043\Desktop\PDI\data-integration-4.4.0.0>pan.bat (hit return)

C:\Users\a75043\Desktop\PDI\data-integration-4.4.0.0>

Could someone please assist? I hope this makes sense.

Thank you.

Grabbing data from months other than parameter dates for comparison

I'm having trouble even wrapping my head around this one. I have parameters to select a month and year (although I can change that to whatever works better), and I will be getting all data timestamped in that month. What I need is to get columns of data for the month before, the month after, and 13, 12, and 11 months prior, one column for each of those six months. I'm just not sure how to access that past data when the parameter is filtering it out.

Right now the select statement I'm using to get the current month is set up as a case statement as a template for the other months.

sum(case when MonthInCalendarYear = ${Month}
then itemqty * unitprice
else null
end) as 'CM Sales'

I'm just not sure how to grab the previous or next month's data (as far as adding +1 to the month, or whatever).

Any thoughts?

Thanks!
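The fiddly part of adding +1 or -1 to a month parameter is the year wrap-around (January minus one month is December of the previous year). Below is a minimal sketch of how the six year/month pairs could be precomputed and then bound as extra parameters, so each CASE branch compares against a concrete year and month; the parameter values shown are illustrative.

Code:

import java.time.YearMonth;

public class MonthOffsets {
    public static void main(String[] args) {
        YearMonth selected = YearMonth.of(2015, 1); // illustrative ${Year}/${Month} values
        // Offsets follow the post: current month, -1, +1, -13, -12, -11 months.
        int[] offsets = {0, -1, +1, -13, -12, -11};
        for (int offset : offsets) {
            YearMonth ym = selected.plusMonths(offset);
            // These year/month values could be passed as additional query parameters.
            System.out.printf("offset %+d -> year=%d month=%d%n",
                    offset, ym.getYear(), ym.getMonthValue());
        }
    }
}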

Getting error while establishing the generic database ( Yolus datasource ) connection

I am using Pentaho build version 5.4.0.1, in which I am trying to establish a generic database (Yolus datasource) connection using the Yolus JDBC drivers.

I get the error message below when I test the connection:

Error connecting to database [Gas_DataBase ] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occurred while trying to connect to the database
Driver class 'com.yolus.api.yes.jdbc.YolusJDBCDriver' could not be found, make sure the 'Generic database' driver (jar file) is installed.
com/yolus/yes/sql/common/IllegalArgumentSQLException

org.pentaho.di.core.exception.KettleDatabaseException:
Error occurred while trying to connect to the database
Driver class 'com.yolus.api.yes.jdbc.YolusJDBCDriver' could not be found, make sure the 'Generic database' driver (jar file) is installed.
com/yolus/yes/sql/common/IllegalArgumentSQLException

at org.pentaho.di.core.database.Database.normalConnect(Database.java:428)
at org.pentaho.di.core.database.Database.connect(Database.java:358)
at org.pentaho.di.core.database.Database.connect(Database.java:311)
at org.pentaho.di.core.database.Database.connect(Database.java:301)
at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:80)
at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2686)
at org.pentaho.ui.database.event.DataHandler.testDatabaseConnection(DataHandler.java:546)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.ui.xul.impl.AbstractXulDomContainer.invoke(AbstractXulDomContainer.java:313)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:157)
at org.pentaho.ui.xul.impl.AbstractXulComponent.invoke(AbstractXulComponent.java:141)


I have placed all the required driver and supporting JARs in the libext/JDBC directory. The connection URL and credentials are verified and correct.

But I am still getting the connection error above and would like to know the reason for it.
Can anyone also confirm whether Pentaho supports a generic database connection through com.yolus.api.yes.jdbc.YolusJDBCDriver?


Any input will be highly appreciated.

Starting a new kitchen process from within a Transformation

Hello everybody,

I'm looking for a way to start another (separate) kitchen instance from within my transformation, fire-and-forget. My scenario is as follows:

I have a transformation which
1. collects all the data needed for the execution of one or more new kitchen processes,
2. creates a command string for each of them. This command string basically looks like this example:

Code:

nohup /path/to/kitchen.sh -rep:"MY_PDI_FILE_REPOSITORY" -dir:"/MY/MODULE" -job:"MY_MODULE" -level:Basic -logfile:/path/to/my/logfile.log >&/dev/null &
3. executes each command string fire-and-forget (don't wait for any result; the process keeps running even when the calling transformation gets "killed"),
4. ends the transformation.

I'm using PDI 5.4 and tried the "Execute a process" step, but it waits for the process to finish and returns the output results.

Thank you in advance for any help!
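One option, if the "Execute a process" step keeps blocking, is to launch the process yourself, for example from a User Defined Java Class step, and simply not wait for it. A minimal sketch under that assumption follows; the paths and job names are the illustrative ones from the command above, and whether the child survives the parent being killed still depends on how the parent is terminated, so keeping nohup or setsid in front of the command remains the safer route.

Code:

import java.io.File;
import java.io.IOException;

public class FireAndForgetKitchen {
    public static void main(String[] args) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(
                "/path/to/kitchen.sh",
                "-rep:MY_PDI_FILE_REPOSITORY",
                "-dir:/MY/MODULE",
                "-job:MY_MODULE",
                "-level:Basic",
                "-logfile:/path/to/my/logfile.log");
        // Redirect output so the child never blocks on a full pipe buffer,
        // then simply don't call waitFor(): the transformation can finish immediately.
        pb.redirectErrorStream(true);
        pb.redirectOutput(new File("/dev/null"));
        pb.start();
    }
}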

Steps to Upgrade BI Community Edition 4.8 to 5.0

Hi,

Can anyone help me upgrade BI Community Edition 4.8 to 5.0?

Is there any migration tool available for this in the Community Edition?

Regards,
Chintan Mehta

Metadata Injection - could use some help setting up my first

I’m having difficulty wrapping my head around the ETL Metadata Injection step which I’m assuming is the right tool for the job I’m trying to do.
Here’s the scenario - I’ve got about 50 input files that have almost the same format, but over time some fields have been added and others have gone away. All of these should end up in the same table, skipping fields that no longer exist. Without Metadata Injection the only way I can think to do this is come up with a whole new Text Input step for every time the source files change slightly. The problem is I can’t find a simple step-by-step example of using Metadata Injection that doesn’t presuppose some knowledge I just don’t have. Can anyone point me to a solid well-documented example of same?

Here’s a quick example of a test project that I’m hoping to get working which I should be able to extrapolate.

File1
FieldA | FieldB | FieldC | FieldE | FieldF
14 | 1 | 1000 | 44.3 | BLUE



File2
FieldA | FieldC | FieldD | FieldE | FieldF | FieldG | FieldH
12 | 1021 | 4/12/2010 | 49.4 | RED | 23 | 42.314


So these two files contain the same kind of data, but the provider of the data has added/removed certain fields over the years. The target DB table would be defined by the set of all possible fields, but the transform performed on an individual file should only import the data it contains, obviously.

Is this in fact the right tool for the job, and if so, can anybody give me guidance on how to implement?

Many thanks in advance,

-Bill


Plugins Not Working

Hello. This is my first posting to the forums.

Our group is interested in accessing Pentaho and related third-party content from the Pentaho Marketplace. However, our server has web restrictions in place to prevent malicious web content from reaching the server. We have allowed access to the Pentaho website and to the Pentaho-related GitHub repository. However, we cannot allow carte blanche web access from this server, as it contains healthcare-related data that must be stored securely. Not only does the Marketplace lead to many different domains, the plugins themselves also redirect to many additional domains, which, when modified, could break our reports.

Have any of you worked in healthcare and faced a similar issue? Thank you, Joel

List of integers as parameter, how to use in query?

Hi,

I got a little problem here:
I'm trying to create a report that has a parameter "param_a" which contains a comma-separated list of integers (actually they're years, like 2015,2014 etc., that the user selects in a web front-end); the type of the parameter in Pentaho is String. What I need to do is use the parameter in an SQL query like this:

SELECT * FROM table WHERE columnA IN (${param_a})

columnA in the database is an integer. What I tried was creating a hidden parameter and using CSVARRAY([param_a]) in it to convert param_a to an array. That didn't work out as planned: the query fails because Pentaho tells me I can't compare integer and varchar (an exception is thrown). So I guess the result of CSVARRAY is always an array of String values.

How can I convert this array of strings into an array of integers? Did I miss a function? Can I use a custom function and/or BeanShell scripting to actually manipulate the array before it's passed into the query?
SQL casting is not an option, and I think I can rule out query scripting as well.
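For what it's worth, the conversion itself is small enough to do in plain Java (e.g. from a BeanShell expression on the hidden parameter). A minimal sketch of that logic; the class and method names are illustrative.

Code:

public class CsvToIntegers {
    // Splits a comma-separated string and returns real Integer values.
    public static Integer[] toIntegerArray(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return new Integer[0];
        }
        String[] parts = csv.split(",");
        Integer[] result = new Integer[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Integer.valueOf(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[] years = toIntegerArray("2015,2014");
        for (Integer y : years) {
            System.out.println(y); // 2015, 2014 as integers, not strings
        }
    }
}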

Detect Substring and do a Lookup

I'm having trouble with a piece of functionality that I can't figure out how to do inside the Data Integrator.

Basically, I want to find a specific set of words in a column which contains a long string (about 15 to 20 words). If I find such a word (it can be only one), I want to add it to a new column; if none is found, the new column must remain empty or be set to a default value.

This is a vital part of my process, and I want to be able to do it automatically. For the moment I am using other methods (a Java application) to do this sort of thing, but I believe there must be a way to do it in the Data Integrator.

Hope to find some help.
Thanks in advance
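For reference, here is a minimal sketch of the matching logic itself: scan the long text field for the first keyword from a lookup list and emit it, or a default when nothing matches. The keyword list and default value are illustrative; in PDI the same logic could live in a User Defined Java Class or Modified Java Script Value step.

Code:

import java.util.Arrays;
import java.util.List;

public class KeywordDetector {
    private static final List<String> KEYWORDS = Arrays.asList("URGENT", "REFUND", "CANCEL");
    private static final String DEFAULT_VALUE = ""; // or any default marker

    public static String detect(String text) {
        if (text == null) {
            return DEFAULT_VALUE;
        }
        String upper = text.toUpperCase();
        for (String keyword : KEYWORDS) {
            if (upper.contains(keyword)) {
                return keyword; // first match wins
            }
        }
        return DEFAULT_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(detect("Customer asked to cancel the order next week")); // CANCEL
        System.out.println(detect("Routine status update"));                        // default
    }
}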

Gotchas With Upserts + MySQL

I've been trying to avoid LOAD DATA INFILE (the MySQL Bulk Loader) because it can put some intense load on my database and it can cause cluster replication issues.

Here are some of the alternatives to the bulk upload step that I've tried for doing Upserts (Update + Insert):

Insert/Update Step
  • Fatal Flaw: This step runs a single SELECT statement for each row. This will be seriously slow if your database is remote.


Using Table Output + Error Handling To Update

  • You will need to enable "Skip Lookup" in the update step and "Use batch update" in both steps.
  • This approach is recommended in the "Note:" section of this doc: http://wiki.pentaho.com/display/EAI/Insert+-+Update
  • You won't see any major speed benefits from batch updates without tweaking JDBC settings. See the recommendation to change useServerPrepStmts=false and rewriteBatchedStatements=true in this article: https://anonymousbi.wordpress.com/20...a-integration/
  • Fatal Flaw: Table output will stop inserting when it reaches the first duplicate in the batch when rewriteBatchedStatements=true. So, you need to enable continueBatchOnError=true and set rewriteBatchedStatements=false to allow Table Output to continue inserting once it hits an error in the batch. You can't get both the speed benefits from batch updating and all of your records inserted.
  • If you look at the queries in "show full processlist" during the "init" status when "rewriteBatchedStatements=true" you clearly see batches of values. When it's turned off, it looks like it's not batching at all.
  • I ended up only enabling rewriteBatchedStatements=true on the update step.


"Modified Java Script Value" + "Execute row SQL script"

  • The idea is to create a query for each row (using either ON DUPLICATE KEY UPDATE or REPLACE INTO) with the "Modified Java Script Value" step then feed those query strings to "Execute row SQL script".
  • Fatal Flaw: The "Execute row SQL script" step does not allow batching and will be slower than Table Output + Update.


"User Defined Java Expression" + "Execute row SQL script"
  • Haven't tried this yet, but it seems like it might be the only option (see the sketch at the end of this post).
  • The idea is to aggregate the rows and create a REPLACE INTO or ON DUPLICATE KEY UPDATE query with multiple rows in the query then pass the query to "Execute row SQL script".
  • I'm not excited about duplicating this code across all of my transformations, so it should probably be a plugin.


Other gotchas:
  • Duplicate keys in the same batch will cause deadlock errors... It can take a while to troubleshoot this.
  • With UTF8 data, the MySQL bulk uploader probably won't work unless you add -Dfile.encoding="UTF-8" to the OPT= definition in your spoon.sh file.


I'm using the MySQL JDBC connector version 5.1.35 and Kettle 5.4.0.1.

Is there a better alternative for doing Upserts without the MySQL bulk loader?
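To make the untried "User Defined Java Expression" + "Execute row SQL script" option concrete, here is a minimal sketch of aggregating rows into a single multi-row INSERT ... ON DUPLICATE KEY UPDATE statement that could then be handed to the "Execute row SQL script" step. The table and column names are illustrative, and real code should escape or parameterize values rather than concatenate literals.

Code:

import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class UpsertQueryBuilder {
    // Builds one multi-row upsert statement from a batch of rows.
    public static String build(List<Object[]> rows) {
        StringJoiner values = new StringJoiner(", ");
        for (Object[] row : rows) {
            // Kept naive for brevity; escape/parameterize in real use.
            values.add("(" + row[0] + ", '" + row[1] + "')");
        }
        return "INSERT INTO my_table (id, name) VALUES " + values
                + " ON DUPLICATE KEY UPDATE name = VALUES(name)";
    }

    public static void main(String[] args) {
        List<Object[]> batch = Arrays.asList(
                new Object[]{1, "alice"},
                new Object[]{2, "bob"});
        System.out.println(build(batch));
    }
}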