Channel: Pentaho Community Forums

Windows Server is Rate Limiting SQL connections to database over network - help?

Hi there-

Currently my instance of Spoon is installed and running on a machine separate from the destination data warehouse database (apparently for security purposes).

So, unfortunately, it must send data and write records across a network/web connection.

Because this final step (writing the data to a database over the network) is a bottleneck, I have used parallelization to speed things up. I have Spoon "round-robin" the data to 15 copies of the "Table Output" step. It has worked just fine in this regard: 15 connections are opened, the data is written in 20 seconds, and this process is repeated about 1,500 times for the initial data load.



Recently, we have switched our data warehouse from Machine 1 to Machine 2. Now I'm getting connection errors:

Quote:

Spoon error:
I/O Error: Connection reset by peer: socket write error

MS SQL Studio:
A transport-level error has occurred when receiving results from the server. Error: 0 – the semaphore timeout period has expired. (Microsoft SQL Server, Error: 121).
The process works just fine on Machine 1. It's suddenly failing on Machine 2. I'm inclined to believe that the server settings are the issue.

After talking to our DBA, I was informed that the suspected difference is that Machine 1 has Windows Service Pack 1 installed and Machine 2 has Windows Service Pack 2. Apparently the latter may be blocking connections after a certain point to prevent a DDoS attack --- or enough open connections may cause the firewall to jump in.



Here are my thoughts. The connections are closing just fine. I know this because, for testing, I used a MySQL DB on HostGator which limits me to 24 simultaneous connections (and would let me know if I exceeded it).

But maybe merely opening connections (or sessions?) repeatedly is triggering some kind of security/rate-limiting on this server.



My question is --- how do I proceed from here?

The way I see it, I have two options:

1. Make Spoon close enough to the destination DB that parallelization (15 connections) is no longer necessary. So far I've been sending data across the US (ping 48 ms) and across the Atlantic (ping 112 ms). I'm not certain whether a response time of 1 ms (possible by putting the two endpoints in the same room) would make the SQL write speed much faster --- see the rough calculation after these options. I would love for it to be on the same computer, but I'm getting pushback.

2. Somehow work around Machine 2's attempts to cut off my connections. I mean, sure, maybe it's some sort of spam/DDoS defense. But honestly --- I thought it was quite common for a server to take lots and lots of connections and queries like this.
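A rough back-of-the-envelope check on option 1, assuming (as a simplification) that each batch commit costs one network round trip: at 112 ms RTT a single connection manages at most about 9 round trips per second, so 15 parallel connections top out around 130 per second, and at 48 ms about 20 per connection. At 1 ms the same single connection could do on the order of 1,000 round trips per second --- more than the 15 parallel connections achieve today --- so co-locating Spoon and the warehouse could plausibly remove the need for parallelization entirely, though actual throughput also depends on batch size and the server-side write speed.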

XML to JSON

Hello,

I want to create a transformation that reads an XML file and transforms it into a JSON object. (Then I will put it into a MongoDB database.)

The XML structure is as follows:

<PARENT info1="blah" info2="bleh">
  <CHILD info="blih">
    <GRAN GRAND="bloh"/>
    <GRAN GRAND="bluh"/>
  </CHILD>
  ...
</PARENT>

And I want a JSON output like this:

{
  "PARENT" : {
    "info1" : "blah",
    "info2" : "bleh",
    "CHILD" : {
      "info" : "blih",
      "GRAN" : [
        { "GRAN" : "bloh" },
        { "GRAN" : "bluh" }
      ]
    }
    ...
  }
}

The loaded files do not necessarily have the exact same elements. E.g. one file has PARENT, CHILD, GRAN and another has PARENT, FOO, GRAN.

I've tried to manipulate the data with "Get data from XML" and "XML Input Stream (StAX)", but after many hours of work and research I could not get the result I wanted.
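In case it helps frame the question: outside of PDI, the structural conversion itself can be done with a generic library. Here is a rough sketch, assuming the org.json jar is available on the classpath (e.g. for a User Defined Java Class step); note that the key layout it produces is not exactly my target above (repeated GRAN elements become an array of objects), so some reshaping would still be needed:

Code:

// Rough sketch, assuming the org.json library is on the classpath.
// XML.toJSONObject maps attributes to JSON fields and repeated elements to arrays.
import org.json.JSONObject;
import org.json.XML;

public class XmlToJson {
    public static void main(String[] args) {
        String xml =
              "<PARENT info1=\"blah\" info2=\"bleh\">"
            + "  <CHILD info=\"blih\">"
            + "    <GRAN GRAND=\"bloh\"/>"
            + "    <GRAN GRAND=\"bluh\"/>"
            + "  </CHILD>"
            + "</PARENT>";

        JSONObject json = XML.toJSONObject(xml);
        System.out.println(json.toString(2)); // pretty-print with a 2-space indent
    }
}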

Any help?

PDI - Spoon - JSON input step error msg

2015/09/18 18:10:45 - Json Input.0 - The data structure is not the same inside the resource! We found 1 values for json path [$1], which is different that the number returned for path [$2] (2 values). We MUST have the same number of values for all paths.
2015/09/18 18:10:45 - Json Input.0 - ERROR (version 5.3.0.2-261, build 1 from 2015-04-06_20-35-13 by buildguy) : org.pentaho.di.core.exception.KettleException:
2015/09/18 18:10:45 - Json Input.0 - The data structure is not the same inside the resource! We found 1 values for json path [$1], which is different that the number returned for path [$2] (2 values). We MUST have the same number of values for all paths.

LDAP Output for new users

good afternoon All,

We currently use FIM 2010 to create users in Active Directory. It is wicked slow. I would love to use Pentaho Data Integration instead.

I can import the users from SQL and can update the existing users in AD, but creating new users always gives me the error

"Caused by: javax.naming.directory.NoSuchAttributeException: [LDAP: error code 16 - 00000057: LdapErr: DSID-0C090C3E, comment: Error in attribute conversion operation, data 0, v1db1"

I assume this is because I am not passing a password. I have not been able to find out how to do this using the LDAP Output step.
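For context, from what I understand Active Directory only accepts passwords through the unicodePwd attribute, set over an SSL/TLS (ldaps) connection as the quoted password encoded in UTF-16LE. A rough plain-JNDI sketch of that convention (not the LDAP Output step itself; how to make the step do this is exactly what I can't find):

Code:

// Rough sketch using plain JNDI (not the PDI LDAP Output step).
// AD expects the password in the unicodePwd attribute, as the literal quoted
// password encoded in UTF-16LE, and only over an ldaps:// connection.
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

public class SetAdPassword {
    static void setPassword(DirContext ctx, String userDn, String newPassword) throws Exception {
        byte[] unicodePwd = ("\"" + newPassword + "\"").getBytes("UTF-16LE");
        ModificationItem[] mods = {
            new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                new BasicAttribute("unicodePwd", unicodePwd))
        };
        ctx.modifyAttributes(userDn, mods); // ctx must be bound over ldaps://
    }
}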

Any help would be awesome!

Can someone validate this approach

Hi,

I have the following requirement:

1) Load data from the interface table.
2) Look up the translation table.
3) If found in the translation table, it means it is an updated record. Fetch the DIMENSION key from the translation table lookup and update the DIMENSION table (Type 2). This also inserts a new record with the same natural key into the DIMENSION table and returns the DIMENSION key, which is used to update the translation table.
4) If not found in the translation table, it means it is a new record. Insert into the DIMENSION table (Type 2) and return the DIMENSION key, which then needs to be inserted into the translation table.
5) If any error is encountered in 3) or 4), roll back the data from the DIMENSION and translation tables and mark the record in the interface table as ERROR.

I have attached the transformations through which I have achieved this. Can someone validate whether the approach is correct?
Attached Files

Problem writing errors to the log on Linux - PDI 5.3

Hi,

I have a process on a Linux server with PDI 5.3, and I created a .sh to execute the process, for example:
fecha=$(date +"%Y-%m-%d-%H_%M_%S")
/opt/data-integration/kitchen.sh -file=/opt/ETL/Job_01.kjb -level=Minimal >/opt/ETL/Job_01_$fecha.log

When I run the process with version 4.4, the file "Job_01_$fecha.log" shows me the process content, the errors, etc.
However, when I run the process with version 5.3, the file "Job_01_$fecha.log" doesn't show me any errors like version 4.4 does; the errors only appear in the Linux console.

I need the process to show me the errors in the log file, not in the console.

I don't understand why this happens if the process is the same for both versions.

I attach the logs for the process in 4.4 and 5.3:
Job_01_Version 5.3_2015-09-18-22_56_48.log
Job_01_Version_4.4_2015-09-20-20_52_46.log



Thanks for your help.

Is it possible to upload PDF files to BI Server CE 5.0.1?

Hi there,
I have an issue uploading PDF files to BI Server CE 5.0.1; the upload is unsuccessful.
I can generate a PDF from PRD on the BI Server, but I can't upload a PDF to the BI Server.
Does anybody know how to upload PDF files?
Thanks in advance.
Regards
Jenny

Missing plugins while running a transformation via Java involving NoSQL databases

It's been 2 months that I've been facing this problem. I've tried a lot of the things I could think of, but still could not get it to work.

I've been trying to read the data in my MongoDB (I've tried Cassandra as well), but it always returns "missing plugins".

I have posted a thread about that too, but still no clear answer. Is there anyone who could help me with this problem?

http://forums.pentaho.com/showthread...issing-Plugins
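For reference, a minimal sketch of what I believe the embedding code needs to look like (paths are illustrative; the KETTLE_PLUGIN_BASE_FOLDERS property is my assumption about how Kettle locates plugin folders, so please correct me if that's wrong):

Code:

// Minimal sketch of running a .ktr from Java with the PDI plugin folder
// registered. Paths are illustrative; KETTLE_PLUGIN_BASE_FOLDERS is assumed
// to be the property Kettle uses to locate plugin folders.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunMongoTransformation {
    public static void main(String[] args) throws Exception {
        // Point Kettle at the installed plugins folder *before* initialising,
        // otherwise MongoDB/Cassandra steps come back as "missing plugins".
        System.setProperty("KETTLE_PLUGIN_BASE_FOLDERS",
                "/opt/data-integration/plugins");
        KettleEnvironment.init();

        TransMeta meta = new TransMeta("/opt/ETL/read_mongo.ktr"); // illustrative path
        Trans trans = new Trans(meta);
        trans.execute(null);        // no extra command-line arguments
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}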

Is it possible to include user comments on a dashboard?

Does Pentaho have a collaborative feature? Is it possible to send messages and comments in the Pentaho tools? If yes, could someone please explain how it is done? Thanks in advance!!

Weka 3.6.13 and 3.7.13 releases



Hi everyone! Passing along some great news from Mark Hall and the Weka team.

New versions of Weka are available for download from the Weka homepage:


  • Weka 3.6.13 - stable book 3rd edition version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.7.0_80, Win64 installer, Win64 installer incl. 64 bit JRE 1.7.0_80 and Mac OS X application (both Oracle and Apple JVM versions).
  • Weka 3.7.13 - development version. It is available as ZIP, with Win32 installer, Win32 installer incl. JRE 1.7.0_80, Win64 installer, Win64 installer incl. 64 bit JRE 1.7.0_80 and Mac OS X application (both Oracle and Apple JVM versions).


Both versions contain a significant number of bug fixes, so it is recommended to upgrade. Stable Weka 3.6 receives bug fixes only; the development version receives bug fixes and new features.

Weka homepage:
http://www.cs.waikato.ac.nz/~ml/weka/

Pentaho data mining community documentation:
http://wiki.pentaho.com/display/Pent...+Documentation

Packages for Weka>=3.7.2 can be browsed online at:
http://weka.sourceforge.net/packageMetaData/


What's new in 3.7.13?

Some highlights
---------------

In core weka:


  • Numerically stable implementation of variance calculation in core Weka classes - thanks to Benjamin Weber
  • Unified expression parsing framework (with compiled expressions) is now employed by filters and tools that use mathematical/logical expressions - thanks to Benjamin Weber
  • Developers can now specify GUI and command-line options for their Weka schemes via a new unified annotation-based mechanism
  • ClassConditionalProbabilities filter - replaces the value of a nominal attribute in a given instance with its probability given each of the possible class values
  • GUI package manager's available list now shows both packages that are not currently installed, and those installed packages for which there is a more recent version available that is compatible with the base version of Weka being used
  • ReplaceWithMissingValue filter - allows values to be randomly (with a user-specified probability) replaced with missing values. Useful for experimenting with methods for imputing missing values
  • WrapperSubsetEval can now use plugin evaluation metrics


In packages:


  • alternatingModelTrees package - alternating trees for regression
  • timeSeriesFilters package, contributed by Benjamin Weber
  • distributedWekaSpark package - wrapper for distributed Weka on Spark
  • wekaPython package - execution of CPython scripts and wrapper classifier/clusterer for Scikit Learn schemes
  • MLRClassifier in RPlugin now provides access to almost all classification and regression learners in MLR 2.4


As usual, for a complete list of changes refer to the changelogs.

Cheers,
The Weka Team


How to decide? Loading the second table using a lookup or a join

Hello,

I have two tables Table 1 and Table 2 in the source DB. (Parent Child Relationship)

While applying the transformation on Table 1, I apply a few validations and generate the output as flat file 1. During this process some entries may be discarded while generating flat file 1.

Now, for Table 2, I have to take the references from flat file 1, apply the transformation, and generate flat file 2.

Do I need to use a lookup step or a join step when matching flat file 1 against Table 2? There are millions of records. Is there any other way to simplify this?

Please help.

Pentaho & PostgreSQL

Hi,

I am new to Pentaho. I just downloaded PDI. I do not have a database installed on my PC. So can I use the PostgreSQL that comes with PDI as a temporary platform to play around (creating my own database, tables, etc.) and get my hands dirty with PDI?

Please advise.

Input each record as a variable

Hi Folks,

Please help with this, Gurus.

I am having a problem with a Pentaho design where the ETL needs to connect to a number of DBs in different offices sequentially and then perform the same job on each.

(The IPs will be stored in a table.)

For example :

(Job)
Site A, 1.1.1.1 → connect to 1.1.1.1 and do something
Site B, 2.2.2.2 → after Site A finishes, connect to 2.2.2.2 and do the same
Site C, 3.3.3.3 → after Site B finishes, connect to 3.3.3.3 and do the same

Repeat this until the end of the rows (Site, IP).


Thanks in advance, guys.

Parameters not showing when publishing Pentaho report

I have made a basic report that takes 6 parameter inputs as follows:
1: Group number (Drop down with query)
2: Account number (Drop down with query) (Mandatory field) (00000 default)
3: Branch number (Drop down with query)
4: Department number (Drop down with query)
5: Start Date (Date picker) (dd/MM/yyyy format)
6: End Date (Date picker) (dd/MM/yyyy format)

When this report is published, none of the parameters show when the report is launched.

Please suggest where I could possibly be going wrong!

Awaiting help.

Regards,
Siddhant

CGG - working with extension points

Hi

I am struggling with the following problem: I am using something like this in the Pre-execution of a chart component:

function f() {
    var cd = this.chartDefinition;
    cd.baseAxisLabel_text = function() {
        return 'somevalue';
    };
}

This works fine in the dashboard, except when using CGG, in which case the baseAxisLabel_text function is not called at all. Is this supposed to be so?
The reason for all this is that I want to customize the base axis label text based on some of my dashboard parameters.

Thx very much

dejan

Big data resources??

What resources are available on the internet for big data?

BioMe - Sparkl Transformations

Hi there,

We're looking thoroughly at the BioMe source code in order to expand our knowledge, and we're having some trouble making the Jobs/Transformations run in PDI.

We know these transformations are launched from the Sparkl backend, but we thought they should also be able to be opened using PDI (we're using 5.4).
This error pops up for many of the files:
Code:

Unfortunately a file could not be loaded because of missing plugins.
Here are the missing plugins:
Step : ****
Do you want to go to the marketplace to see if the plugins are available?

There are many step types that fail, such as
- FieldMetadataAnnotation (endpoints/kettle/_BioMe/_BioMe_REFINERY.ktr)
- Publish_Model (endpoints/kettle/_BioMe/_BioMe.kjb)
- DataRefineryBuildModel (")

Is there anything we might be missing? We've looked in the PDI Marketplace for those step types, but nothing came up.
Any tip would be greatly welcomed, thx!

edit: it might be that those files are for internal usage (they are prefixed with _, so Sparkl would not mark them as backend endpoints) and are not intended to work outside their original development environment?
I would still be interested in a response regarding the plugins though :)

A couple of questions about weka and bayesian network implementation

Hi, I have a couple of questions. Note: I'm using the java API.

The first is related to the .arff format: is it possible to have relational attributes within other relational attributes?
Next, once I have generated a Bayesian network, is it possible to extract the network information? The data I need is how each node in the network is weighted, so I can find out which attributes from the data affected the prediction the most.
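For the second question, the closest thing I've found so far is a rough sketch like the one below (assuming Weka 3.7.x on the classpath and a class attribute in the last position): graph() returns the learned network in XML BIF form, which lists each node, its parents and its conditional probability tables, and seems like a starting point for seeing how nodes are weighted. Is there a more direct API for this?

Code:

// Rough sketch, assuming Weka 3.7.x is on the classpath and the class is the
// last attribute. graph() returns the learned network in XML BIF form.
import weka.classifiers.bayes.BayesNet;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectBayesNet {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("training.arff"); // illustrative file name
        data.setClassIndex(data.numAttributes() - 1);

        BayesNet net = new BayesNet();
        net.buildClassifier(data);

        // XML BIF output: nodes, parent sets and conditional probability tables.
        System.out.println(net.graph());
    }
}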

Thanks in advance!

Import Repo

Hello.
I created a .bat file as described here. When I run it directly, everything works. But when I run it from Pentaho through a Shell job entry, I get an error.

2015/09/22 13:51:24 - Import.bat - ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : (stderr) Unexpected occurrence: =256m""=="".

Model to represent transformations and jobs !!

Hi,

Thanks in advance for your support

I have developed a POC for one of our clients using PDI CE.

Instead of giving them the ktr's and kjb's, I would like to present the POC in the form of a model (a kind of UML representation) at the first introduction.

Is there any tool that can build such a model from the source code of PDI kjb's and ktr's (reverse engineering)?

or

A tool where we can describe a model conveying how the ktr's and kjb's have been organised and written?
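If no such tool exists, a fallback I'm considering is extracting a model directly from the files, since ktr's and kjb's are plain XML. A rough sketch (assuming the usual /transformation/step and /transformation/order/hop layout of a .ktr; the file name is illustrative) that lists steps and hops, which could then feed a diagramming tool:

Code:

// Rough sketch: list the steps and hops of a .ktr by parsing its XML.
// Assumes the usual layout /transformation/step/{name,type} and
// /transformation/order/hop/{from,to}; verify against your PDI version.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class KtrToGraph {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse("sample.ktr"); // illustrative file
        XPath xp = XPathFactory.newInstance().newXPath();

        NodeList steps = (NodeList) xp.evaluate("/transformation/step", doc, XPathConstants.NODESET);
        for (int i = 0; i < steps.getLength(); i++) {
            String name = xp.evaluate("name", steps.item(i));
            String type = xp.evaluate("type", steps.item(i));
            System.out.println("step: " + name + " (" + type + ")");
        }

        NodeList hops = (NodeList) xp.evaluate("/transformation/order/hop", doc, XPathConstants.NODESET);
        for (int i = 0; i < hops.getLength(); i++) {
            System.out.println("hop: " + xp.evaluate("from", hops.item(i))
                    + " -> " + xp.evaluate("to", hops.item(i)));
        }
    }
}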

Regards,
Hari