Hi,
I created a job in the Spoon design tool to make predictions with linear regression. The job is as follows:
Start --> get_model_transformation --> predict_transformation --> End Job
where
get_model_transformation loads the training dataset and trains a model on it using Knowledge Flow:
Table input --> Knowledge Flow
and the Knowledge Flow trains the model with linear regression:
Kettle Inject --> Training Set Maker --> LinearRegression --> Serialized Model Saver
predict_transformation reads the test dataset, calculates the predicted values, and then saves them to the database:
Table input --> Weka Scoring --> Update
Because the job first saves the model to a file and then reads that file back for Weka Scoring, there is file I/O overhead.
I want to reduce this overhead so that the job runs faster.
Q: Is there any way to predict with linear regression without saving the model to a file (maybe by keeping it in memory)? Any other advice to make the job faster is also welcome.
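To illustrate what I mean by keeping the model in memory, here is a minimal sketch in plain Python. It is not Kettle or Weka API; the names LinearModel and fit and the sample rows are hypothetical, and it handles only a single input attribute. The point is just that the fitted object is passed straight to the scoring code, with no serialized model file in between.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearModel:
    """Fitted coefficients, held in memory only (never written to disk)."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def fit(xs: Sequence[float], ys: Sequence[float]) -> LinearModel:
    """Ordinary least squares for a single input attribute."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


# Hypothetical stand-ins for the rows read by the two Table input steps.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
test_x = [5.0, 6.0]

model = fit(train_x, train_y)                     # train once
predictions = [model.predict(x) for x in test_x]  # score from the same in-memory object
print(predictions)
```

What I am looking for is the equivalent inside the job: the training step hands the model directly to the scoring step instead of going through a file.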
Many thanks,
Kusuma