Channel: Pentaho Community Forums

Doubts about Dimension lookup/update version field

Hi.

I've been experimenting with PDI lately and now I'm loading some dimensions with SCD type 2. So I decided to use the Dimension lookup/update step. But there's something that confused me a bit: the version field. I don't understand why it's required, since the start and end dates already serve the purpose of versioning a business key.

I had to create an additional field in my dimensions just because of that, and I probably won't need it. So, why is it required?

Escaping Special Characters

My value works when used as a parameter or directly, in all cases except one.

In this one particular case, the value contains an ampersand (&). So the parameter points to this value, and the value has an ampersand in it.

If I put in the value itself, it works fine.

If I put in the parameter pointing to the value, it does not work.

I tried putting a backslash (\) character in front of it, but had no luck.

What do you guys think?

Workaround for Excel Writer NPE when no rows exist

Hi everyone,
I have a question about a known bug for which I really need some form of a workaround.
I have a transformation that gets data from two tables and writes it to a single spreadsheet. Each table's data goes to a different sheet in the workbook. We recently started running into issues where the job would fail with a Null Pointer Exception. After some troubleshooting, I found an existing JIRA issue for it. Since we still need this output, I attempted to use the Detect empty stream step, as pictured below, as a workaround. However, if the second table input has no results, the Excel Writer corrupts the Excel file and it contains no data at all.
[Attached image: NPETrans.jpg]

Any idea how I can still get the Excel file written without corruption issues, even if there are no resulting rows? We are unlikely to get any version 6 release into our Production environment to fix this bug, so even when the fix is released, I don't know that it will help me here.

Thanks.

Trouble reading xlsx produced by PDI

I created an xlsx Excel file in PDI/Spoon using the Microsoft Excel Writer step, with Extension = "xlsx [Excel 2007 and above]".

The file opens fine in Excel, but I get an error when trying to read it with the Microsoft Excel Input step, with Spread sheet type (engine) = "Excel 2007 XLSX (Apache POI Streaming)".

The error is: "I was unable to find any fields in the Excel file(s)." This happens when I try Get fields from header row.

If I try Spread sheet type (engine) = "Excel 2007 XLSX (Apache POI)", I get "Unable to open dialog for this step: java.lang.OutOfMemoryError: GC overhead limit exceeded".

I found that on a smaller file, where I don't hit the out-of-memory error, using "Excel 2007 XLSX (Apache POI)" allows the fields to be read.

Another interesting thing: if I open the file in Excel 2013 and save it, then I can read the fields no problem using "Excel 2007 XLSX (Apache POI Streaming)". So somehow PDI does not like an xlsx file created by PDI being read with "Excel 2007 XLSX (Apache POI Streaming)". I will see if I can increase my Java memory settings, but I'm curious whether there is anything I can do to get "Excel 2007 XLSX (Apache POI Streaming)" working. I don't want to have to use Excel to get the file into a readable state.

Delete step ignores FOREIGN_KEY_CHECKS=0

Hello,

I have a problem with the Delete step. It ignores FOREIGN_KEY_CHECKS=0.

I have a job in which I disable the foreign key checks. Then, in an inner transformation, I have a Delete step, but it throws an error because a constraint fails.

If I try to delete the same row directly in MySQL Workbench, it works. So I suppose it's a Delete step problem.
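
One detail that may matter: in MySQL, FOREIGN_KEY_CHECKS is a session variable, so it only affects statements run on the connection that set it. A minimal JDBC sketch of that behaviour (the connection URL, credentials, and table name are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FkChecksDemo {
    public static void main(String[] args) throws Exception {
        // Connection details below are placeholders.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
             Statement st = con.createStatement()) {
            // Only this session/connection ignores foreign key constraints.
            st.execute("SET FOREIGN_KEY_CHECKS=0");
            st.executeUpdate("DELETE FROM parent_table WHERE id = 42");
            st.execute("SET FOREIGN_KEY_CHECKS=1");
        }
        // The same DELETE issued on a different connection would still fail
        // with a constraint violation, because that session never disabled the checks.
    }
}

So if the job entry and the Delete step end up using different connections, the setting from the job may simply not be in effect for the delete.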

Any help with this?

Thank you.

Pentaho scheduling

Hi All

Please assist with the questions below; I am still new.
1. How do I schedule reports in Pentaho Community Edition and send them via email as PDF, Excel, CSV, and so forth?
2. Can I create a mobile app using Pentaho Community Edition? If yes, what plugins do I need to install?

Thank you in advance.

Using Pentaho MapReduce to Parse Weblog Data - MapReduce won't start, with no error



I followed http://wiki.pentaho.com/display/BAD/...se+Weblog+Data to create a job "weblog_parse_mr.kjb" and a transformation "weblog_parse_mapper.ktr".


I use the following command to start the job, but MapReduce doesn't start and there's no error; it just keeps waiting.
/mnt/kettle/data-integration/kitchen.sh -file=/home/hduser/kettle_jobs/ini_test_jobs/weblog_parse_mr_less.kjb -level=Debug


Logs as follows (there are some Chinese characters, I think they're not very important; and my English is not good, I'm sorry):
--------------------------------------------------------------------------------------------------------------------
[hduser@master data-integration]$ /mnt/kettle/data-integration/kitchen.sh -file=/home/hduser/kettle_jobs/ini_test_jobs/weblog_parse_mr_less.kjb -level=Debug
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
16:25:47,012 INFO [KarafInstance]
*******************************************************************************
*** Karaf Instance Number: 1 at /mnt/kettle/data-integration/./system/karaf ***
*** //data1 ***
*** Karaf Port:8801 ***
*** OSGI Service Port:9050 ***
*******************************************************************************
四月 01, 2016 4:25:48 下午 (Apr 01, 2016 4:25:48 PM) org.apache.karaf.main.Main$KarafLockCallback lockAquired
信息 (INFO): Lock acquired. Setting startlevel to 100
2016/04/01 16:25:48 - Kitchen - Logging is at level : 调试 (Debug)
2016/04/01 16:25:48 - Kitchen - Start of run.
2016/04/01 16:25:48 - Kitchen - Allocate new job.
2016/04/01 16:25:48 - Kitchen - Parsing command line options.
2016/04/01 16:25:49 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016-04-01 16:25:52.339:INFO:oejs.Server:jetty-8.1.15.v20140411
2016-04-01 16:25:52.390:INFO:oejs.AbstractConnector:Started NIOSocketConnectorWrapper@0.0.0.0:9050
log4j:ERROR Could not parse url [file:/mnt/kettle/data-integration/./system/osgi/log4j.xml].
java.io.FileNotFoundException: /mnt/kettle/data-integration/./system/osgi/log4j.xml (没有那个文件或目录 / No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at sun.net.www.protocol.file.FileURLConn...ction.java:90)
at sun.net.www.protocol.file.FileURLConn...tion.java:188)
at org.apache.log4j.xml.DOMConfigurator$2.parse(DOMConfigurator.java:765)
at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:871)
at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:778)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.springframework.osgi.extender.internal.activator.ContextLoaderListener.<clinit>(ContextLoaderListener.java:253)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.felix.framework.Felix.createBundleActivator(Felix.java:4362)
at org.apache.felix.framework.Felix.activateBundle(Felix.java:2149)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2072)
at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1299)
at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
at java.lang.Thread.run(Thread.java:745)

........ (the text I entered was too long, so I removed this part of the logs)

2016/04/01 16:26:02 - weblog_parse_mr_less - 开始执行任务(Begin to execute job)
2016/04/01 16:26:02 - weblog_parse_mr_less - exec(0, 0, START.0)
2016/04/01 16:26:02 - START - Starting job entry
2016/04/01 16:26:02 - weblog_parse_mr_less - 开始项 (Starting entry) [Pentaho MapReduce - mr]
2016/04/01 16:26:02 - weblog_parse_mr_less - exec(1, 0, Pentaho MapReduce - mr.0)
2016/04/01 16:26:02 - Pentaho MapReduce - mr - Starting job entry
2016/04/01 16:26:02 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:///mnt/kettle/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp22/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:///mnt/kettle/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp22/lib/client/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:///mnt/kettle/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hdp22/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/kettle/data-integration/launcher/../lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/kettle/data-integration/plugins/pentaho-big-data-plugin/lib/slf4j-log4j12-1.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016/04/01 16:26:03 - weblog_parse_mapper - 为了转换解除补丁开始 (Dispatching started for transformation) [weblog_parse_mapper]
Attempting to load ESAPI.properties via file I/O.
Attempting to load ESAPI.properties as resource file via file I/O.
Not found in 'org.owasp.esapi.resources' directory or file not readable: /mnt/kettle/data-integration/ESAPI.properties
Not found in SystemResource Directory/resourceDirectory: .esapi/ESAPI.properties
Not found in 'user.home' (/home/hduser) directory: /home/hduser/esapi/ESAPI.properties
Loading ESAPI.properties via file I/O failed. Exception was: java.io.FileNotFoundException
Attempting to load ESAPI.properties via the classpath.
SUCCESSFULLY LOADED ESAPI.properties via the CLASSPATH from '/ (root)' using current thread context class loader!
SecurityConfiguration for Validator.ConfigurationFile not found in ESAPI.properties. Using default: validation.properties
Attempting to load validation.properties via file I/O.
Attempting to load validation.properties as resource file via file I/O.
Not found in 'org.owasp.esapi.resources' directory or file not readable: /mnt/kettle/data-integration/validation.properties
Not found in SystemResource Directory/resourceDirectory: .esapi/validation.properties
Not found in 'user.home' (/home/hduser) directory: /home/hduser/esapi/validation.properties
Loading validation.properties via file I/O failed.
Attempting to load validation.properties via the classpath.
validation.properties could not be loaded by any means. fail. Exception was: java.lang.IllegalArgumentException: Failed to load ESAPI.properties as a classloader resource.
SecurityConfiguration for Logger.LogServerIP not either "true" or "false" in ESAPI.properties. Using default: true
2016/04/01 16:26:03 - Pentaho MapReduce - mr - Using org.apache.hadoop.io.Text for the map output value
2016/04/01 16:26:05 - Pentaho MapReduce - mr - Cleaning output path: hdfs://172.16.189.123:9000/user/pdi/weblogs/parse_less
2016/04/01 16:26:05 - Pentaho MapReduce - mr - Using Kettle installation from /opt/pentaho/mapreduce/6.0.1.0-386-6.0.1.0-386-hdp22
2016/04/01 16:26:05 - Pentaho MapReduce - mr - Configuring Pentaho MapReduce job to use Kettle installation from /opt/pentaho/mapreduce/6.0.1.0-386-6.0.1.0-386-hdp22
2016/04/01 16:26:05 - Pentaho MapReduce - mr - mapreduce.application.classpath: classes/,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
2016/04/01 16:26:07 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:26:17 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:26:27 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:26:37 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:26:47 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:26:57 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
......
2016/04/01 16:39:27 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
2016/04/01 16:39:37 - weblog_parse_mr_less - Triggering heartbeat signal for weblog_parse_mr_less at every 10 seconds
--------------------------------------------------------------------------------------------------------------------
As shown above, there's no error and MapReduce doesn't start. I don't know what to do.


PS:
I have two Hadoop cluster dev environments.


One dev environment can run this job successfully.
Parts of its logs are as follows:
------------------------------------------------
......
SecurityConfiguration for Logger.LogServerIP not either "true" or "false" in ESAPI.properties. Using default: true
2016/04/01 16:02:08 - Pentaho MapReduce - mr - Using org.apache.hadoop.io.Text for the map output value
2016/04/01 16:02:09 - Pentaho MapReduce - mr - Cleaning output path: hdfs://192.168.124.129:9000/user/pdi/weblogs/parse_less
2016/04/01 16:02:10 - Pentaho MapReduce - mr - Using Kettle installation from /opt/pentaho/mapreduce/6.0.1.0-386-6.0.1.0-386-hdp22
2016/04/01 16:02:10 - Pentaho MapReduce - mr - Configuring Pentaho MapReduce job to use Kettle installation from /opt/pentaho/mapreduce/6.0.1.0-386-6.0.1.0-386-hdp22
2016/04/01 16:02:10 - Pentaho MapReduce - mr - mapreduce.application.classpath: classes/,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
2016/04/01 16:02:14 - Pentaho MapReduce - mr - Setup Complete: 0.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:02:24 - Pentaho MapReduce - mr - Setup Complete: 0.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:02:34 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:02:44 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:02:54 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:03:55 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:04:05 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2016/04/01 16:04:15 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 44.475254 Reducer Completion: 0.0
2016/04/01 16:04:25 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 61.912037 Reducer Completion: 0.0
2016/04/01 16:04:35 - Pentaho MapReduce - mr - Setup Complete: 100.0 Mapper Completion: 66.66667 Reducer Completion: 0.0
2016/04/01 16:04:35 - Pentaho MapReduce - mr - [SUCCEEDED] -- Task: attempt_1459477876939_0003_m_000001_0 Attempt: attempt_1459477876939_0003_m_000001_0 Event: 0
2016/04/01 16:04:35 - Pentaho MapReduce - mr - Container killed by the ApplicationMaster.
2016/04/01 16:04:35 - Pentaho MapReduce - mr - Container killed on request. Exit code is 143
2016/04/01 16:04:35 - Pentaho MapReduce - mr - Container exited with a non-zero exit code 143
2016/04/01 16:04:35 - Pentaho MapReduce - mr - [SUCCEEDED] -- Task: attempt_1459477876939_0003_m_000000_0 Attempt: attempt_1459477876939_0003_m_000000_0 Event: 1
......
------------------------------------------------


The versions of the software I used are as follows:


pdi-ce-6.0.1.0-386
hadoop-2.7.1
CentOS 6.4


Thanks for reading my problem.

unable to add Step into generated XML

While adding the step element into the TransMeta, I get an error related to threads and the XML is not generated; adding the DB details also causes a problem. Below is my code to add the step.

// Create the transformation metadata and give it a name
TransMeta transMeta = new TransMeta(templname,templname);
transMeta.setName("Testing");

LOGGER.info("transMeta----->"+transMeta);

// Define the MS SQL Server connection details
DatabaseMeta dbmeta = new DatabaseMeta();
LOGGER.info("dbmeta----->"+dbmeta);
dbmeta.setName("MS_SQL");
dbmeta.setHostname("172.25.164.63");
dbmeta.setDBPort("1433");
dbmeta.setDBName("DM_MetaData");
dbmeta.setUsername("sa");
dbmeta.setPassword("Password123");
@SuppressWarnings("deprecation")
Database db = new Database(dbmeta);

//Add Step Source
StepMeta inputStep = new StepMeta();
inputStep.setName("source");
inputStep.setDistributes(true);
inputStep.setCopies(1);
StepPartitioningMeta stepPartMeta = new StepPartitioningMeta();
stepPartMeta.setMethod("none");
inputStep.setStepPartitioningMeta(stepPartMeta);
LOGGER.info("inputStep----->"+inputStep);
transMeta.addStep(inputStep);

//Generated XML from Java Code and write it to disk
String filepath = "D:\\Pentaho\\output\\"+templname+".xml";

transMeta.writeXML(filepath);
return filepath;


Please suggest where I am wrong.
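
For reference, here is a minimal, self-contained sketch of generating a transformation XML with the same API. It is only a sketch: the KettleEnvironment.init() call, the Dummy placeholder step, the "MSSQL" database type id, and the output path are assumptions, not taken from the code above.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.database.DatabaseMeta;
import org.pentaho.di.trans.TransMeta;
import org.pentaho.di.trans.step.StepMeta;
import org.pentaho.di.trans.steps.dummytrans.DummyTransMeta;

public class GenerateTransXml {
    public static void main(String[] args) throws Exception {
        // Initialise the Kettle environment (plugin/step registry) before
        // building or serialising any transformation metadata.
        KettleEnvironment.init();

        TransMeta transMeta = new TransMeta();
        transMeta.setName("Testing");

        // name, type id, access, host, database, port, user, password
        DatabaseMeta dbMeta = new DatabaseMeta(
                "MS_SQL", "MSSQL", "Native", "172.25.164.63",
                "DM_MetaData", "1433", "sa", "Password123");
        transMeta.addDatabase(dbMeta); // attach the connection to the transformation

        // Every StepMeta needs a concrete StepMetaInterface; a Dummy step
        // is used here purely as a placeholder.
        StepMeta source = new StepMeta("source", new DummyTransMeta());
        transMeta.addStep(source);

        // Serialise the transformation and write the XML to disk.
        String xml = transMeta.getXML();
        Files.write(Paths.get("D:/Pentaho/output/Testing.xml"),
                xml.getBytes(StandardCharsets.UTF_8));
    }
}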

Thanks,
Satya

Dynamic job name into Job log table

Hi guys!
I want to load many tables from source to target.
I've made a dynamic job and transformations for loading multiple tables in one process, consecutively.

"get params" gets the table names, intervals, SQL and so on.
"loading" just loads a table using the retrieved params.
"runner" is the parent (root) job which runs the child jobs.

Also, I want to use the job log table to get the "startdate" and "enddate" params for CDC loading.
I want to run one and the same job many times under different job names.

My log table looks like this:



I tried to use "Set variables" but it didn't help me; I've tried everything without success.

Could you help me, please?

Thanks

Data fitting to Linear result

Hello,

Complete newbie to Data Mining and not very good at maths :) I was wondering if WEKA can do something like this.
ID4     gain    ID10    ID1     ID3_75C
2665    1.482   105     7040    8005
Above is a sample of the data I get from a thermal device calibration. ID3_75C in an ideal scenario needs to equal 9500, but due to poor calibration it does not. I need WEKA to give me an equation that best fits the ideal linear result (9500) using the other data available.

If I run WEKA on the available data, including the ID3_75C from the device, I can get a solution that fits the "wrong data". But if I add a column manually, populate it with the 9500 result, and then try to solve for it, I always get ID3_75C = +9500; it ignores all the other variables. I need it to do the opposite: ignore ID3_75C and solve for it using everything else available.

Any help would be greatly appreciated :)

Thank You and Kind Regards,

DrD

How to build WEKA dataset from arrays?

Hello

I want to use the Java WEKA library for classification. I only need classifiers such as LibSVM, NaiveBayes or C4.5 trees.
My labels are stored in a 1D double array. My training and test set are stored in a 2D double array where the rows are the data points and the columns are the features.

Code:

double[] labels = new double[N];
double[][] trainingData = new double[N][D];
double[][] testData = new double[X][D];

N is the number of data points in the training data, X is the number of data points in the test data and D is the number of features.

Now, I have seen that WEKA needs Instances to build the model and Instance for prediction.
How can I get this type of objects from my data?

Second, is it ok if my classes are 0 and 1 or do they have to be -1 and 1?
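
A minimal sketch of building the training set from such arrays, written against the WEKA 3.7+ API (the attribute names and the helper class are made up):

Code:

import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ArraysToInstances {

    // Wraps a 2D feature array and a 1D label array into a WEKA dataset.
    static Instances toInstances(String name, double[][] data, double[] labels, int numFeatures) {
        ArrayList<Attribute> attrs = new ArrayList<Attribute>();
        for (int d = 0; d < numFeatures; d++) {
            attrs.add(new Attribute("feature" + d));          // numeric feature attributes
        }
        ArrayList<String> classValues = new ArrayList<String>();
        classValues.add("0");                                  // class values "0" and "1" are fine;
        classValues.add("1");                                  // they do not have to be -1 and 1
        attrs.add(new Attribute("class", classValues));

        Instances dataset = new Instances(name, attrs, data.length);
        dataset.setClassIndex(dataset.numAttributes() - 1);    // last attribute is the class

        for (int i = 0; i < data.length; i++) {
            double[] values = new double[numFeatures + 1];
            System.arraycopy(data[i], 0, values, 0, numFeatures);
            values[numFeatures] = labels[i];                   // 0.0/1.0 = index of the nominal class value
            dataset.add(new DenseInstance(1.0, values));       // weight 1.0
        }
        return dataset;
    }
}

The test set can be wrapped the same way (with the same attribute list), and a single row can then be classified with classifier.classifyInstance(instance).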

Populating text field with a Query

Hi All!
I am working on a report in PRD 5.4, and I have written a query that takes 2 parameters and pulls up about 20 different fields that need to be populated into 20 text/number fields in the report. How can I associate that query with the fields in the report? In other words, how do I specify, for fields like Name, DOB, etc., which column value to receive from the DB?

thanks,

constrain fact table in saiku with mondrian 3 schema?

Hi,

I'm new to the use of mondrian schemas, currently using mondrian 3 schema with saiku. I have a database model which contains a date key in our fact table. This date key is used to distinguish the most recent fact records. Is there any way I can add a constraint to the schema so that when retrieving data from the fact and dimension tables I will only retrieve records with a specific date found in the fact table? Otherwise, I am ending up with a much larger dataset which contains all dates in the fact table, not just the date I'm interested in.

Ideally, I'd like to be able to initialize the date constraint in my queries, not in the schema itself. I just can't seem to find a way to allow me to do this.

Thanks very much

Customizing the Data Integration Design Tool

We have a requirement to expose a design surface to business users so they can design jobs. Is there any way to customize the steps in the designer tool (remove all the unwanted steps and rename some of the existing steps)?
Also, can we hide steps based on user roles and permissions?

Execute as single statement doesn't work with mysql

Hello,

I'm trying to execute some MySQL statements in a single transaction. I have tried 'Execute SQL script' and 'Execute row SQL script', but I get an error saying that I have an error in my SQL syntax. If I execute those statements without the single statement option, they work.

Does the single statement option work, or is it a bug?

Value mapper - several fields

Hello everyone. I was wondering if there is a step to map the values of several fields to a new field.

For example, if field1 is equal to A and field2 is equal to B then newfield is equal to C. Else, if field1 is equal to A and field2 is equal to D then newfield is equal to Z. And so on. It is an if...else statement but with several fields.

I currently use JavaScript to do this and I was wondering if it can be done in a different way.
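
For illustration, the same logic expressed in plain Java (the field names and the A/B/C/D/Z values are just the examples above):

public class MultiFieldMapper {

    // Maps the combination of two input fields to one output value,
    // mirroring the if...else described above.
    static String mapFields(String field1, String field2) {
        if ("A".equals(field1) && "B".equals(field2)) {
            return "C";
        } else if ("A".equals(field1) && "D".equals(field2)) {
            return "Z";
        }
        return null; // no mapping defined for this combination
    }
}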

thanks

Sample ETL Process Data for educational purposes

Hey there everybody.

I'm part of a college project studying the ETL process, starting from the Dimensional Data Model onward.

Showing up and asking for help here seemed to me like the best opportunity to get help finding a source of data that is useful to work with.

So I'm currently looking for data in a specific field that I will work with: create a model to describe the relevant data, prepare transformation tables in Spoon, then run the ETL process and report on the findings.

Examples of fields of interest would be: financial situation & national well-being, a currency drop and its effect on a crisis, mining reasons that affect the crime rate within a certain area, some kind of geographical data, demographical data, or the relationship between education and happiness or salary of the population of a country/state/area.
Pretty much any kind of closed package of tables (relational database data, Excel tables, CSV files, ...) is good data, as long as a meaningful model can be built, the ETL process run, and conclusions derived.

It would be very welcome to have at least 4 meaningful dimensions.

This might seem like an unusual topic, but I'm sure there must be people working with data who know the sources better than someone newly thrown into it all.

So I'm asking you to help me with this.

Where can I find & download such data?

Best option/practice to add a new field to a Fact table

Hi all.

I would like to know the best option/practice for adding a new field to a fact table without having to reprocess everything.

Could someone point me in the right direction?

Thanks in advance.

connection to database for the whole session

Hi,

I am working with Pentaho Data Integration Community.

I am creating a transformation and I create a database connection to a MySQL database. The connection works well, but when I create a second transformation and use a database step, I can't select the database connection that I created for the first transformation.
It seems like a database connection is tied to a single transformation, but... how can I create a database connection for all objects of my session?
For example, I can select AgileBI (the connection that ships with the installation) from every transformation or job.

Thanks

Weka Java API - Linear Regression ...

Hi there,

I have been looking at the Java API for Weka, and have been having great difficulty trying to find out how
to pass a CSV file to the regression method. It seems that there is a method to convert the CSV file to a specific file format called ARFF, which is another text file.

My question is how the fields of data should be arranged, given many independent variables (for the case of multiple linear regression) and one dependent variable.

Should a row look like this:
x1,x2,x3,x4,x5,Y
Or the reverse
Y,x1,x2,x3,x4,x5

I am asking this because if the only way to get data into the Regression is via this ARFF file structure, then I need to know how to structure the CSV data for it to be transformed properly to ARFF.


This is what I have discovered so far.

BUT, is there a way - a method that takes something like a 2D array structure as an input to the regression method of Weka in its Java code?


Also, once this data is passed in, which method returns the array of coefficients?
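
A minimal sketch of what this might look like, assuming WEKA's CSVLoader and LinearRegression classes (the file name and column layout are made up; the dependent variable Y is taken to be the last column):

import java.io.File;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class CsvRegression {
    public static void main(String[] args) throws Exception {
        // Load the CSV directly into memory as Instances;
        // no intermediate ARFF file on disk is needed.
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("data.csv"));
        Instances data = loader.getDataSet();

        // Column order does not matter; you tell WEKA which attribute is the
        // dependent variable. Here the last column (Y) is chosen as the class.
        data.setClassIndex(data.numAttributes() - 1);

        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(data);

        double[] coefficients = lr.coefficients(); // the fitted coefficient array
        System.out.println(lr);                    // prints the regression equation with attribute names
    }
}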

Hope someone can help.

Regards,