Checksum of large file

November 19, 2015, 12:19 pm

Hi, I am currently using the Calculator step to create the SHA-1 checksum of my input files, which range from 1K to 1G. Everything was working fine until I got to the larger files. With a larger file it fails instantly with the following exception:

Code:

2015/11/19 14:08:55 - Calculator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : UnexpectedError:

2015/11/19 14:08:55 - Calculator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : java.lang.OutOfMemoryError: Java heap space

2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.core.row.ValueDataUtil.createChecksum(ValueDataUtil.java:310)

2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.calcFields(Calculator.java:394)

2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.processRow(Calculator.java:162)

2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)

2015/11/19 14:08:55 - Calculator.0 -     at java.lang.Thread.run(Unknown Source)

With CRC-32 or Adler-32 it succeeds after ~25 seconds. With MD5 it also fails immediately.

Does anyone have a workaround for generating SHA-1 for large files, or a way to speed up the other algorithms?

I know I could probably just add memory, but it would be a shame to do that for this one function when the actual file contents process just fine.

Thanks!
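
One general way to keep memory use flat while checksumming is to feed the file to the hash in fixed-size chunks rather than loading it whole. The sketch below illustrates that idea in plain Python, outside of PDI; it does not describe how the Calculator step is implemented, and the function name `sha1_of_file` and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file,
        # so at most one chunk is held in memory at a time.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The same pattern works for MD5 by swapping `hashlib.sha1()` for `hashlib.md5()`.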

↧

Unable to connect to a secured Mongo Database

November 19, 2015, 12:27 pm

I have a project where I need to read data from MongoDB. I have not worked with Mongo before, so I set up a MongoDB server and did some testing using Spoon's MongoDB Input step. I was able to easily read data from a collection in a test database and write it into a SQL database.

Because of security requirements, I enabled authorization on the Mongo database and added a user to access that database. Since enabling authentication, I have been unable to connect to the Mongo database from Spoon, although I am able to connect with several Mongo tools that I downloaded from the Internet.

My guess is that I somehow need to authenticate against the admin database and then use the test database and the collections I created. I have looked through this forum and on the Internet for posts dealing with this problem. I see some previous posts on the subject, but no solutions. Has anyone been able to connect to a secured Mongo database using the MongoDB Input step? Any help would be greatly appreciated.

Best Regards,
Steve Johnson

pdi-ce-6.0.0.0-353
Java 1.7.0_51
Windows 7 Pro 64-bit
MongoDB 3.0.7

↧

Error using Mondrian 4 aggregate tables

November 19, 2015, 12:27 pm

I'm trying to port my schema from v3 to v4, and I'm having trouble with aggregate tables. The docs haven't been updated yet to describe what everything is, which is frustrating, and the Mondrian in Action book only talks about it briefly. I also can't find anything on the internet that helps with these problems.

I'm getting the following error:

Caused by: mondrian.rolap.RolapSchema$PhysSchemaException: Could not find a path from agg_year_employment to any of [Employment as employment]
at mondrian.rolap.RolapSchema$PhysSchemaGraph.findUniquePath(RolapSchema.java:1211)
at mondrian.rolap.RolapSchema$PhysSchemaGraph.addHopsBetween(RolapSchema.java:1161)
at mondrian.rolap.RolapSchema$PhysSchemaGraph.findPath(RolapSchema.java:1307)
at mondrian.rolap.RolapSchemaLoader.registerExpr(RolapSchemaLoader.java:2222)
...

Here are the relevant sections from my schema:

Code:

   <PhysicalSchema>

      <Table name='agg_year_employment'/>

   </PhysicalSchema>



<Cube name="workforce" visible="true" caption="%{cube.workforce}" cache="true" enabled="true" enableScenarios="false">

        <Dimensions>

            <Dimension name="employmentDate" source="Time" visible="true" caption="%{dimension.employmentDate}"/>

            <Dimension name="employee" source="Employee" visible="true" caption="%{dimension.employee}"/>

            <Dimension name="organization" source="Organization" visible="true" caption="%{dimension.organization}"/>

            <Dimension name="workLocation" source="WorkLocation" visible="true" caption="%{dimension.workLocation}"/>

            <Dimension name="residenceLocation" source="Region" visible="true" caption="%{dimension.residenceLocation}"/>

            <Dimension name="employeeStatus" visible="true" caption="%{dimension.employeeStatus}" key="$Id" hanger="false">

                <Hierarchies>

                    <Hierarchy name="employeeStatus" visible="true" hasAll="true" caption="%{dimension.employeeStatus}">

                        <Level name="employeeStatus" visible="true" attribute="employeeStatus" hideMemberIf="Never" caption="%{level.employeeStatus}">

                        </Level>

                    </Hierarchy>

                </Hierarchies>

                <Attributes>

                    <Attribute name="employeeStatus" caption="%{level.employeeStatus}" levelType="Regular" table="Employee_Status" datatype="Integer" hasHierarchy="false">

                        <Key>

                            <Column table="Employee_Status" name="id"/>



                        </Key>

                        <Name>

                            <Column table="Employee_Status" name="default_description"/>



                        </Name>

                        <OrderBy>

                            <Column table="Employee_Status" name="id"/>



                        </OrderBy>

                    </Attribute>

                    <Attribute name="$Id" levelType="Regular" table="Employee_Status" keyColumn="id" hasHierarchy="false">

                    </Attribute>

                </Attributes>

            </Dimension>

        <MeasureGroups>

            <MeasureGroup name="workforce" type="fact" table="employment">

                <Measures>

                    <Measure name="headCount" column="headcount" formatString="#,###" datatype="Integer" aggregator="sum" caption="%{measure.headCount}"/>

                    <Measure name="terminations" column="termination" formatString="#,###" datatype="Integer" aggregator="sum" caption="%{measure.terminations}"/>

                    <Measure name="newHires" column="new_hire" formatString="#,###" datatype="Integer" aggregator="sum" caption="%{measure.newHires}"/>

                    <Measure name="retroHires" column="retro_hire" formatString="#,###" datatype="Integer" aggregator="sum" caption="%{measure.retroHires}"/>

                    <Measure name="transfers" column="transfer" formatString="#,###" datatype="Integer" aggregator="sum" caption="%{measure.transfers}"/>

                    <Measure name="hourlyRate" column="usd_hourly_rate" formatString="Currency" datatype="Numeric" aggregator="sum" caption="%{measure.hourlyRate}"/>

                    <Measure name="averageTenure" column="tenure" datatype="Integer" aggregator="avg" caption="%{measure.avgTenure}"/>

                    <Measure name="avgPerformanceBand" column="performance_band" formatString="0.0%" datatype="Integer" aggregator="avg" caption="%{measure.avgPerformanceBand}"/>

                </Measures>

                <DimensionLinks>

                    <ForeignKeyLink dimension="employee" foreignKeyColumn="employee_id"/>

                    <ForeignKeyLink dimension="organization" foreignKeyColumn="company_structure_id"/>

                    <ForeignKeyLink dimension="workLocation" foreignKeyColumn="work_address_id"/>

                    <ForeignKeyLink dimension="residenceLocation" foreignKeyColumn="residence_address_id"/>

                    <ForeignKeyLink dimension="employeeStatus" foreignKeyColumn="employee_status_id"/>

                </DimensionLinks>

            </MeasureGroup>

           <MeasureGroup table="agg_year_employment" type="aggregate">

               <Measures>

                   <MeasureRef name="headCount" aggColumn="headcount"/>

                   <MeasureRef name="terminations" aggColumn="termination"/>

                   <MeasureRef name="newHires" aggColumn="new_hire"/>

                   <MeasureRef name="retroHires" aggColumn="retro_hire"/>

                   <MeasureRef name="transfers" aggColumn="transfer" />

                   <MeasureRef name="hourlyRate" aggColumn="usd_hourly_rate"/>

                   <MeasureRef name="averageTenure" aggColumn="tenure"/>

                   <MeasureRef name="avgPerformanceBand" aggColumn="performance_band"/>

               </Measures>

               <DimensionLinks>

                   <ForeignKeyLink dimension="employee" foreignKeyColumn="employee_id"/>

                   <ForeignKeyLink dimension="organization" foreignKeyColumn="company_structure_id"/>

                   <ForeignKeyLink dimension="workLocation" foreignKeyColumn="work_address_id"/>

                   <ForeignKeyLink dimension="residenceLocation" foreignKeyColumn="residence_address_id"/>

                   <ForeignKeyLink dimension="employeeStatus" foreignKeyColumn="employee_status_id"/>

                   <CopyLink dimension="employmentDate" attribute="Year">

                       <Column table="time" name="year" aggColumn="yearquartermonth_year"/>

                   </CopyLink>

               </DimensionLinks>

           </MeasureGroup>

     </Cube>

So how is this "linked" to the Employment fact table? Do I need to use Link tags in PhysicalSchema?

↧

k-means|| clustering implementation with distributedWekaHadoop??

November 19, 2015, 6:26 pm

Dear all...

Could anyone please give me an example of a k-means|| clustering implementation with distributedWekaHadoop? I have already read Mark Hall's blog [1]. It seems similar to the common k-means algorithm, but it is not clear enough how to configure, run, and evaluate clusters with Hadoop. Compared to the traditional k-means algorithm [2] and another k-means enhancement [3], is k-means|| accurate when clustering large datasets? Does k-means|| guarantee a better or optimal result, or just better performance, meaning faster computation than other k-means algorithms? I'm sorry for my newbie question; I would really appreciate any help you can provide. Thank you.

[1] http://markahall.blogspot.co.id/2014...or-hadoop.html
[2] http://theory.stanford.edu/~sergei/p...db12-kmpar.pdf
[3] http://www.eecs.tufts.edu/~dsculley/...fastkmeans.pdf

↧

Cannot view report in different output type using pentaho ce 5.1

November 19, 2015, 7:28 pm

Hi,
I'm using BA server CE 5.1. My report works fine when opened with the default HTML output type. When I try to change the output type to another format such as PDF, Excel, or RTF, it does not work: I still get the same result as with the default output type. Is there any solution to this issue?

↧

Spoon terminates while editing javascript in LINUX machine

November 19, 2015, 7:45 pm

Hi,

I am using Pentaho in a Linux environment. Spoon runs fine, but terminates when I edit something in JavaScript.

Kindly suggest a solution for this.

Thanks,

Hema

↧

[CDE] Datasource Properties For SQL Server Connection

November 19, 2015, 8:45 pm

I'm using CDE dashboards in Pentaho BI Server 5.4. I use MDX over Mondrian JDBC, with SQL Server as the datasource.
What are the property values for Driver and URL for a SQL Server datasource?

For a MySQL datasource I use:
Driver : com.mysql.jdbc.Driver
URL : jdbc:mysql://localhost/MyDatabase

But I don't know the values for a SQL Server datasource.

Thank you.

↧

How to freeze header in Saiku Dashboard

November 19, 2015, 9:46 pm

Hi,

I have created a Saiku dashboard.

The dashboard has a vertical scroll bar because of the amount of data. I want to freeze the column header so that it stays visible when I scroll down.

Any advice?

↧

CWM standard in Pentaho DI

November 20, 2015, 2:11 am

Hi,

I want to know if it's possible to use, in Pentaho DI, the metadata of an ETL process built in another ETL tool.
This metadata is exported in the CWM standard.

Thanks in advance,

↧

Query dividing multiple statements when running Saiku

November 20, 2015, 2:18 am

Hi,

I have a Saiku dashboard (version 3.1.8) that takes too long to run when I use two dimensions.

When I checked the mondrian_sql.log file, I found that multiple queries are being executed.

For example, it fetches the data using queries with "IN", like select * from tb where a in (''''), select * from tb where a in ('''') .....................

Why is it doing this? Any ideas?

↧

Pivot4J not showing in Pentaho Business Analytics - Community Edition.

November 20, 2015, 5:41 am

I have downloaded and unzipped the latest Pentaho Business Analytics server (Community Edition) and the Pivot4J plugin, as directed in most tutorials. However, when I launch the web interface and click on New, I do not see the Create Pivot4J View option shown in most demos. I have launched the interface on 2 different systems, running Windows 8.1 Pro and Windows 10 Pro, and both have the same outcome.

Can someone help me out on how the Pivot4J interface can "show up" on the Pentaho Business Analytics page please?

Thank you.

Please see screenshot PBA.jpg of what I have tried to explain above.

Attached Images

PBA.jpg (30.3 KB)

↧

Connect Report Designer to BAP Data Source

November 20, 2015, 6:16 am

Hi folks,

I'd like to set up Data Sources for users to access in Dashboards in the BAP (6.0.0.0-353). I'd also like to make those Data Sources available to users building reports in the Report Designer (6.0.0.0-353), both for local use and for when they publish the reports to the BAP for other users to run.

It seems like choosing a Community Data Access Data Source in the Report Designer should give me that capability, but instead of letting me choose a Data Source, I believe it expects me to choose a .cda file from the server.

According to this old question from 2013, every time I create a Dashboard I should get a .cda, but I'm using Pentaho 6 and only get a .wcdf and a .cdfde.

Thanks for any help,
Nick

↧

Issue with navigation component, tab name is not changing

November 20, 2015, 6:29 am

Hi Friends,

I have created 3 dashboards using Pentaho CDE.

The names of the dashboards are given below:

1) Product sales by country
2) Future product sales
3) Products in development

In each dashboard, I am using the CDE 'Navigation Component' at the top so that I can open any dashboard without clicking the browse folder option, i.e. if I have opened 'Product sales by country', I can go to the navigation component at the top of the dashboard, select the 2nd or 3rd dashboard, and open it in the same tab.

The issue is this: the first time I select a dashboard, its name appears in the tab, but if I then select any other dashboard, the tab still shows the name of the first one. Suppose I opened 'Product sales by country'; the tab shows that name. But if I then select the 'Future product sales' dashboard from the navigation menu, the tab still shows 'Product sales by country'.

Please help.

↧

Report in body of email?

November 20, 2015, 7:16 am

I need to send a report in the body of an email. On version 3.6 I was able to do this with an xaction, but xactions have since been deprecated. Is there any way to do this in Pentaho 6?

↧

Index usage while performance issue

November 20, 2015, 7:25 am

Hi,

I would like to ask a question; please forgive me if it is misguided.

I want to load 4700000 records from a source database to a target database. For this I am using Table input -> Text file output and then Text file input -> Insert/Update. I have an index on the id column in both the source and target databases, and the Insert/Update step needs the id column to tell new records from existing ones.

The index is useful when comparing the id column, but it adds overhead while rows are being inserted or updated. If I turn the index off before the Insert/Update step, the comparison will be slow. Am I correct? How can I resolve this performance issue on the indexed column?

Please tell me if my question is unclear.

select
id,
key,
--,
--,
.
.
from prd

↧

How to create multiple rows from one input row

November 20, 2015, 8:49 am

For instance, given the input data below:

account_number, dob, ssn, name1, name2, name3, name4

200, 12/01/1988, 999999999, mike, Michael, john, Johnson

How can I create the 4 rows below in Kettle?

200, 12/01/1988, 999999999, mike
200, 12/01/1988, 999999999, Michael
200, 12/01/1988, 999999999, john
200, 12/01/1988, 999999999, Johnson

Thanks for your help.
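
The reshaping asked for here is a column-to-row split: the leading columns are repeated and each name column becomes its own row. PDI's Row Normaliser step is aimed at this kind of transformation; the sketch below only illustrates the logic in plain Python. The function name `split_names` and the choice to treat the first three columns as fixed are illustrative assumptions.

```python
import csv
import io


def split_names(text, fixed_columns=3):
    """Yield one output row per non-empty name column of each input row."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    for row in reader:
        fixed, names = row[:fixed_columns], row[fixed_columns:]
        for name in names:
            if name.strip():
                yield fixed + [name.strip()]


sample = (
    "account_number, dob, ssn, name1, name2, name3, name4\n"
    "200, 12/01/1988, 999999999, mike, Michael, john, Johnson\n"
)
rows = list(split_names(sample))
```

With the sample input, `rows` holds four lists, each starting with the account number, date of birth, and SSN, followed by one of the four names.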

↧

Read first line from multiple files

November 20, 2015, 10:50 am

Hi

I need to get the length of the first line of each file from a stream of filenames. I tried to read the first line of each file using Text File Input with Limit = 1, followed by a Calculator step. It works with one file, but it seems the limit applies to the whole step, across files, so the Text File Input never even opens the additional files since it has already reached the limit. I'm not sure how I would accomplish this now without some very heavy combination of reading everything and then grouping.

Thanks!
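
The intended behaviour is a per-file limit: open each file, take one line, and move on. The sketch below shows that logic in plain Python rather than as a PDI transformation; the function name `first_line_lengths` and the UTF-8 default are illustrative assumptions.

```python
def first_line_lengths(paths, encoding="utf-8"):
    """Map each path to the length of its first line, without the line ending."""
    lengths = {}
    for path in paths:
        # newline="" keeps the original line endings so they can be stripped explicitly.
        with open(path, "r", encoding=encoding, newline="") as handle:
            # readline() stops at the first line ending, so only the start
            # of each file is consumed.
            first_line = handle.readline()
        lengths[path] = len(first_line.rstrip("\r\n"))
    return lengths
```

An empty file yields a length of 0, since `readline()` returns an empty string at end of file.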

↧

Multi-Selector Component Tool Tip

November 20, 2015, 2:55 pm

Hi, I'm currently using CDE version 5.4.0.1 on the Pentaho BI Server release 5.4.0.0.128 CE to create a number of dashboards. I've got a series of multi select components that I've set up to filter my data set and I'd like to enable a tooltip when hovering over this component. I've added some text to the tooltip property under the advanced properties tab. It is my understanding that I now have to enable the tooltip functionality for that particular component's css element. I have inserted the following code into the css file but the tooltip is still not visible on the dashboard. Can anyone advise as to where I am going wrong?

Many Thanks

Richard

Code:



.multiSelector {
    overflow: hidden;
    width: 175px;
    font-size: 10px;
    -moz-border-radius: 0 0 0 0;
    -webkit-border-radius: 0 0 0 0;
    border-radius: 0 0 0 0;
    color: #585858;
}

.multiSelector:hover data-tooltip {
    visibility: visible;
}

↧

MongoDB to Oracle

November 20, 2015, 3:02 pm

I have a document structure in MongoDB with a nested array of objects:
{
    "_id" : ObjectId("564f9ca55fc3ca3cad383de4"),
    "deviceId" : "deviceId",
    "deviceType" : "CM",
    "cmMacAddress" : "00:00:00:00:00:00",
    "portCapacity" : 0.0,
    "make" : "CISCO",
    "model" : "CI123",
    "activationStatus" : "READY",
    "service" : [
        {
            "serviceId" : "serviceId",
            "rateCode" : "1AA",
            "type" : "DATA",
            "rank" : "1",
            "activationStatus" : "READY"
        },
        {
            "serviceId" : "serviceId",
            "rateCode" : "1AB",
            "type" : "DATA",
            "rank" : "1",
            "activationStatus" : "READY"
        }
    ]
}

My requirement is to update rows in Oracle tables. For each such document there will be one entry in the device table and one or more entries in the service table. Any suggestions on how to achieve the second part, that is, splitting the service array into individual objects that can be inserted into the Oracle SERVICE table?
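
The split described above amounts to producing one parent row plus one child row per array element. The sketch below shows that shape in plain Python, independent of PDI and of any database driver; the function name `split_device_document` and the use of `deviceId` as the key linking service rows back to the device are assumptions, not something the document structure dictates.

```python
def split_device_document(doc):
    """Return (device_row, service_rows) for one device document."""
    # Everything except the nested array goes into the single device row.
    device_row = {key: value for key, value in doc.items() if key != "service"}
    # Each array element becomes its own row, tagged with the parent deviceId
    # (assumed here to be the linking key).
    service_rows = [
        {"deviceId": doc["deviceId"], **service}
        for service in doc.get("service", [])
    ]
    return device_row, service_rows


document = {
    "deviceId": "deviceId",
    "deviceType": "CM",
    "service": [
        {"serviceId": "serviceId", "rateCode": "1AA", "type": "DATA"},
        {"serviceId": "serviceId", "rateCode": "1AB", "type": "DATA"},
    ],
}
device, services = split_device_document(document)
```

Here `device` carries the scalar fields only, and `services` holds two rows that differ in `rateCode`, each ready to be written as a separate SERVICE record.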

↧

Trouble understanding association rules output

November 21, 2015, 1:21 am

What are the values to the right of the itemsets (for example, the 18 in [A=1]: 18)?
If they are absolute support, as I thought, why are they different for the same items in different rules?

[A=1]: 18 ==> [B=1]: 6 <conf:(0.33)> lift:(52.63)
...
[B=1]: 38 ==> [A=1]: 6 <conf:(0.16)> lift:(52.63)

Clearly I'm missing something here,
I really hope you can help me please.
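
One reading that is consistent with the numbers shown: the value after the left-hand itemset counts the instances matching the premise, while the value after the right-hand itemset counts the instances matching premise and consequence together. Under that reading, 6/18 and 6/38 reproduce the two reported confidences. The sketch below checks the arithmetic; the total of 6000 instances is an assumption that does not appear in the post and was picked only because it reproduces the reported lift.

```python
def confidence(premise_count, joint_count):
    """Confidence of premise ==> consequence: joint count / premise count."""
    return joint_count / premise_count


def lift(premise_count, consequence_count, joint_count, total):
    """Lift: confidence divided by the consequence's overall relative frequency."""
    return confidence(premise_count, joint_count) / (consequence_count / total)


# Counts taken from the two rules above.
count_a, count_b, count_both = 18, 38, 6
# Assumption: total number of instances, not stated in the post.
total = 6000

conf_a_to_b = confidence(count_a, count_both)  # 6 / 18
conf_b_to_a = confidence(count_b, count_both)  # 6 / 38
lift_a_to_b = lift(count_a, count_b, count_both, total)
lift_b_to_a = lift(count_b, count_a, count_both, total)
```

Rounded to two decimals, the confidences come out as 0.33 and 0.16, and both lifts as 52.63. Lift is symmetric in the two itemsets, which is why both rules report the same value.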

↧