September 25, 2014, 9:04 am
I have a text file with 1.5 million records. I need to find a specific record, using a key field with a specific id. When I find that record, I need to set a variable that will be used later downstream. Since this record is not the first record in the file, I need to search until I find it, but I don't want to waste the processing time to read all 1.5 million records, then throw away all but a single record. How can I get the record I am looking for and reduce the load time?
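For illustration, the kind of match-and-drop logic involved might look like this in a Modified Java Script Value step (a sketch; the field names, the target id, and the variable name are all placeholders):
Code:
// Sketch for a Modified Java Script Value step. key_field, value_field,
// "12345" and MY_VAR are placeholders, not names from the actual file.
if (key_field == "12345") {
    // Found the record: publish the value for use later downstream.
    setVariable("MY_VAR", value_field, "r"); // "r" = root job scope
} else {
    trans_Status = SKIP_TRANSFORMATION; // silently drop every other row
}
This still reads the whole file, though; stopping early would also need something like an Abort step after the matching row.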
Thanks in advance for your response.
↧
September 25, 2014, 10:38 am
Hello everyone--
I'm a bit new to XML/XPath in general, as well as Pentaho, so bear with me on this question.
Essentially, I'm trying to fetch data from an XML-based application to ultimately automatically give me a single number "hours worked" every day.
Now, the required data in the application appears to be spread over several different XML documents (I think).
That is, there is localhost:8080/employees.xml as one file, localhost:8080/groups.xml as another, and localhost:8080/schedule/2014-09-20.xml as yet another.
As you can see by that last one, it appears every day's schedule data is in yet ANOTHER xml document, right?
ANYWAY -- I need two of these documents for my query. The schedules XML document contains all the granular schedule data -- and employee IDs. The GROUPS document links employee IDs to specific groups -- I need to fetch one particular group, AGENTS, and find the number of hours worked per day.
So any regular XML query/database management application, like BaseX for example, allows you to do this with XPath/XQuery along the lines of:
for $x in doc("blahblah.xml")/groups[id = "AGENTS"] return doc("blah.schedule.xml")/schedule[agents = data($x)]/hours_worked
HOWEVER -- I'm not sure how to replicate this in Pentaho gracefully.
I could pull ALL the schedule data for a given day, then separately pull the list of employees in group AGENTS, then somehow do a join of those two inputs, I suppose --- but is there an easier way to do it in just one XML data pull?
↧
September 25, 2014, 11:22 am
Hello there--
I managed to merge two XML datasets into the raw data I need for minor calculations (in the form of one table of fields/rows).
I need to do some very simple calculations, like subtract the Field A datetime from Field B datetime, then sum the results.
This is extraordinarily easy to do in SQL. There's even an extremely easy conversion from XML datetimes to SQL datetimes, e.g. CONVERT(datetime, "XMLdate").
So I thought I could use the SQL scripting in Pentaho instead of laboring over JavaScript. However, this functionality seems to want to connect to a database (again, I just want to run SQL operations on fields I created from XML docs).
I can connect it to a dummy database anyway, but where does it write the output fields from the SELECT statement? It doesn't seem to write them anywhere --- are "insert" or "update" statements the only possible output of this scripting step? Help please!
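For what it's worth, the subtraction itself is only a couple of lines of JavaScript; a sketch for a Modified Java Script Value step, assuming two datetime fields named field_a and field_b:
Code:
// Sketch for a Modified Java Script Value step; field_a and field_b are
// assumed names for the two datetime fields (they arrive as java.util.Date).
var diffMs = field_b.getTime() - field_a.getTime(); // difference in milliseconds
var diffHours = diffMs / (1000 * 60 * 60);          // same difference in hours
The summing could then be done with a Group By step rather than more script.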
↧
September 25, 2014, 11:40 am
Hi, I am trying to create a dashboard as shown in pic 1, where I click the drop-down and the type should be populated with values fetched from the database, but I am getting the "error processing component" error.
So the select component is used to get the values in the drop-down, and the query component to display the type based on the value in the drop-down.
I created a simple parameter name_param and passed it as shown in the images; the queries used are also mentioned in the attached images. I am not sure what I am missing. Kindly check the images and let me know any suggestions. Thanks
Regards, yashwanth
↧
September 25, 2014, 1:05 pm
Hi,
I have a PDI 4.4 job that decrypts a PGP file (to disk) and then starts a transformation to read in the new file (which is CSV once decrypted) and save the contents to a Cassandra cluster. That works. Great, I'm happy.
However, I've been given a new requirement that we don't want to dump the unencrypted information to disk, for both security and space reasons. I noticed that PDI 5.1 has a PGP Decrypt stream step, so I thought I'd install and try out the new version. I used Load file content into memory and then funneled it into PGP Decrypt stream. For my first test with a tiny test file it worked great, and I thought I was good. However, I then tried a larger file and realized that I didn't get all my data. The output from PGP Decrypt stream seems to be truncated to 32768 characters (32 KiB). And there was no error or warning; it's only because I knew what my test data should have been that I realized there was a problem.
So my questions are:
1] Is this limit a hard stop, or is there a way around it?
2] If it's a limit that I can't get around, is there a way to stream the file from Decrypt files with PGP instead of writing to disk?
Thanks,
Linnea
↧
September 25, 2014, 1:17 pm
Is there a way to have optional tags in the 'JSON Input' step?
ex.
{"address":{"street":"123 Main St","unit":"1","city":"San Francisco","state":"CA"}}
I want to be able to extract unit if it is there, and have unit set to null or an empty string if it is not there.
It appears that if the path is not found ($.address.unit) the step fails with "We can not find any data with path [$.address.unit]!".
Adding an error-handling step still generates the error, halting the transformation.
Is there a way to specify a JSON schema that defines the field as optional? Does the 'JSON Input' step support this?
e.g. in the JSON schema, having a definition like "unit":{"type":"string","optional":"true"}.
I have JSON input in which almost everything is optional except a few key fields, and I don't want to have to separately check each field. Is there a recommended Java package that can do this? (I guess I would have to use a 'Modified Java Script Value' step as a wrapper in this case.)
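Along the lines of that wrapper idea, a sketch of what the check could look like in a Modified Java Script Value step, assuming the whole JSON document arrives in a single string field called json_raw:
Code:
// Sketch for a Modified Java Script Value step; json_raw is an assumed
// field holding the raw JSON document as a string.
var obj = JSON.parse(json_raw); // older Rhino engines may need: eval('(' + json_raw + ')')
var street = (obj.address) ? obj.address.street : null;
var unit = (obj.address && obj.address.unit != null) ? obj.address.unit : null; // null when absent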
↧
September 25, 2014, 2:09 pm
Hello everyone,
I am trying to do a call to a web service which uses JSON as both the input and the output.
I have done this in PHP and Java, but I am not sure if I can do it in Kettle using the native steps.
I looked in the forum and I couldn't find anything like it.
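A sketch of one possible work-around, calling the Java networking classes from a Modified Java Script Value step (the endpoint URL and the payload field name are assumptions):
Code:
// Sketch for a Modified Java Script Value step; the URL and the input
// field payload (holding the JSON request body) are assumptions.
var conn = new java.net.URL("http://localhost:8080/service").openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json");
conn.setDoOutput(true);
var out = new java.io.OutputStreamWriter(conn.getOutputStream(), "UTF-8");
out.write(payload);
out.close();
var reader = new java.io.BufferedReader(
    new java.io.InputStreamReader(conn.getInputStream(), "UTF-8"));
var response = "";
var line;
while ((line = reader.readLine()) != null) {
    response += line; // accumulate the JSON reply for a downstream JSON Input step
}
reader.close();
Depending on the Kettle version, the HTTP Post or REST Client steps may also cover this natively.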
Regards,
Leonardo
↧
September 25, 2014, 2:31 pm
I'm working with the metadata detector discussed on Matt Casters' blog here:
http://www.ibridge.be/?p=273
There's a sub-transformation called 'Determine fields' that uses an ETL Metadata Injection step. The 'Determine fields' transformation takes a filename as input, defaulting to sales_data_sample.csv. It's supposed to pass this down to the injection step, but it doesn't. If I execute 'Determine fields' with another value for the filename ("test.csv"), the injection step still uses its own default filename (which is also sales_data_sample.csv, so if you delete that file the transformation will break).
How can I pass the filename down to the injection step? See attachment for relevant files.
Thanks!
Metadata detector.zip
↧
September 25, 2014, 3:43 pm
Hello everyone--
I just started using Pentaho this week and am very excited about its functionality so far.
Anyway, I know it has a vast array of potential outputs -- SQL-based, XML files, text files, JSON -- my question is: is there a particular output or structure that would be ideal for displaying data/dashboards on SharePoint? Would MS SQL Server be the best back-end for that purpose, or something else?
The main reason I'm thinking of SharePoint is because the company I work at loves storing data and files on SharePoint, and our login system is already based around it -- I'm just not an expert with it.
Kind of a random question -- any help much appreciated!
↧
September 25, 2014, 6:30 pm
I installed Data-Integration on both my notebook and the remote Ubuntu machine. Now I am trying to run my notebook job on the remote machine through the slave server interface. However, I am not able to do that, even though I tried very hard with many different explanations from the documentation and the forums. Here's what I have done so far.
First:
- Run the Data Integration server at the remote machine by
./ctlscript.sh start data-integration-server
- Then I tried to add a new slave in the Data Integration designer (Spoon) with the IP address, the port number (9080, 8080, 8081, I tried all of them), some random server name and Web App name, and my SSH credentials as the username and password.
Second:
- I ran ./carte.sh from design-tools/data-integration
- I tried the same on the local machine as above
In both tries I cannot see any sign of a successful connection.
Would you tell me what I am missing? How can I get a successful slave server connection?
↧
September 25, 2014, 7:30 pm
I thought of posting this in a dev forum, but couldn't find an active one. We need the ability to add repositories to a remote Carte server. The most obvious place seems to be RepositoriesMeta.writeData(), after updating the in-memory repository list data. The update could come through a servlet registered in kettle-servlets.xml.
Thnx.
↧
September 25, 2014, 11:37 pm
Hi,
Can you please explain what is meant by a metadata-driven approach in Pentaho, and how it is useful in developing ETL solutions?
Thanks
Avinash
↧
September 26, 2014, 2:01 am
Background:
We are four developers developing a Kettle solution using the Community Edition.
Our process includes a peer-review step where we discuss changes before accepting them into the "master branch".
We base our process on a file-based repository that is in turn version-controlled by git.
Due to the lack of a "visual diff" in Spoon, we actually diff the .ktr/.kjb file versions in order to isolate the changes.
Unfortunately the XML is not pretty-printed by Spoon when the file is saved.
Question:
Is there some option in Spoon to make it write "pretty-printed" XML instead of super-long lines for some parts?
(Something like the output of "xmllint --format")
(I understand that I can do this manually, but it would mess up the process to do it before every commit of a change.)
↧
September 26, 2014, 2:43 am
Hello.
I'd really appreciate some help. I'm trying to send an email through a transformation, but I'm getting the following error message:
We cannot find destination field "example@hotmail.com" in destination field.
Below is my transformation. I'm simply pulling some data from a database, writing a text file, and then I want to send an email. I don't understand why an input stream is needed in a transformation; the same thing works perfectly fine in a job.
pentahoy.PNG
It seems like it wants everything added in the input stream, such as port, sender address, context, etc.
If anybody could help then I'd be really grateful.
Thank you and regards,
Adam
↧
September 26, 2014, 5:05 am
Hello,
Is there anyone who has found out how to implement the D3 component using JSON as the data in Pentaho CDE :(
Code:
d3.json("/d/4063550/flare.json", function(error, root) {
var path = svg.selectAll("path")
.data(partition.nodes(root))
.enter().append("path")
.attr("d", arc)
.style("fill", function(d) { return color((d.children ? d : d.parent).name); })
.on("click", click);
function click(d) {
path.transition()
.duration(750)
.attrTween("d", arcTween(d));
}
});
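One possibility is to point d3.json at a CDA endpoint instead of a static file; a sketch, in which the doQuery URL (which varies across BA server versions), the dataAccessId and the buildHierarchy() helper are all assumptions:
Code:
// Sketch: feeding the D3 component from a CDA query instead of flare.json.
d3.json("/pentaho/plugin/cda/api/doQuery?path=/home/dash.cda"
    + "&dataAccessId=flare&outputType=json", function(error, data) {
  if (error) { return console.log(error); }
  // CDA replies with {metadata: [...], resultset: [...]} rather than the
  // nested hierarchy of flare.json, so reshape it before binding:
  var root = buildHierarchy(data.resultset); // buildHierarchy is a hypothetical helper
  // ...then bind partition.nodes(root) exactly as in the snippet above.
});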
Thank you in advance :)
↧
September 26, 2014, 6:41 am
I have been unable to determine whether Pentaho Report Designer has the capability to fulfill my requirement. I need a bar-line graph that operates like a time-series bar graph, with the vertical Y axis in percent and the horizontal X axis in weeks, with tick marks that are only labeled per month. The secondary data source will supply the moving-average line. I have attached a PDF example of my current results, which displays the graph and moving average correctly but incorrectly labels every recorded day on the x-axis. I have also attached a PDF sample of how I would like the X axis to display. Is this change possible by manipulating the x-axis with a BeanShell script, or are there any other suggested methods?
↧
September 26, 2014, 6:47 am
Hi,
I've been using Pentaho Report Designer for about a month now... It's very cool! Only one thing is annoying me... I set my group footer with pagebreak-after = true, and it works. The only problem is that I get a blank page at the end of the report.
How can I fix this?
Thanks!
----
Almir
↧
September 26, 2014, 8:21 am
Hello everybody,
I want to place messages.properties file in the same directory as my cde dashboard.
For the moment I load i18n like this, with a js file:
Code:
function loadBundles(lang) {
  jQuery.i18n.properties({
    name: 'messages',
    path: '/pentaho/content/pentaho-cdf/js/',
    mode: 'both',
    language: lang
  });
}
I tried to use ${res:} but it was not working (I don't know exactly how the res tag works in CDE).
I found this:
Quote:
we have two levels of message files in CDF:
- Global message file: located in <biserver_home>/pentaho_solutions/system/pentaho_cdf/resources/languages and useful when used in dashboard global templates.
- Dashboard-specific message file: located in <dashboard_home>; they can be distributed together with a single dashboard.
To use the internationalization support you've to use the CDF.i18n tag with the following syntax:
Code:
CDF.i18n(<message_key>)
But this only works with xcdf, not wcdf.
Do you have another suggestion? I'm very interested!
Thanks.
↧
September 26, 2014, 9:32 am
Friends,
I am a BI/DWH professional, interested in exploring and learning Pentaho Data Integration (PDI) and Pentaho Report Designer. Kindly direct me to an authentic link to download the Pentaho BI 5.0 Suite (Enterprise Edition). Also suggest some documents or guides for the same.
email :
prabhakar.ujjain@gmail.com
Prabhakar
↧
September 26, 2014, 5:53 pm
Pentaho 5.2 CE Release Candidate is available for testing.
Download the builds from here.
What's new? As usual, lots of bug fixes and improvements. Biggest thing you'll notice in the BA server? A shiny new Marketplace!
A couple of things you'll notice: we're keeping (and improving, with the help of the translators) the i18n support.
The maturity classification is also reflected there, so users and organizations can see the maturity level of the plugins. Also... a new Theme available from the marketplace, an offer by Webdetails - hope you enjoy it!
In order for us to get to a great quality release, your feedback is fundamental. Please report any issues you find.
Bring on the feedback!
-pedro
↧