Hello,
I have an R program that I'm converting to a PDI transformation to run in my organization's Pentaho environment. In R, I use split() to partition the data by user ID into a list of data frames. Each user ID's data frame is passed to a function that performs a calculation on that user's rows and returns a new data frame of calculated values, so Pentaho needs to process each partition independently of the others. At the end I merge these per-user data frames (one per user ID) into a single data frame.
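For reference, the workflow above is the usual split-apply-combine pattern. Below is a minimal Python sketch of it, roughly equivalent to calling split() on the user ID, applying a function to each piece, and binding the results back together. The field names `user_id` and `amount` and the `calculate` function (a running total per user) are hypothetical stand-ins for the real data and calculation.

```python
from collections import defaultdict


def split_by_user(rows):
    """Group rows into {user_id: [row, ...]}, analogous to R's split() by user ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["user_id"]].append(row)
    return groups


def calculate(user_rows):
    """Hypothetical per-user calculation: running total of 'amount'."""
    total = 0
    out = []
    for row in user_rows:
        total += row["amount"]
        out.append({"user_id": row["user_id"], "amount": row["amount"],
                    "running_total": total})
    return out


def split_apply_combine(rows):
    """Apply calculate() to each user's rows and concatenate the results."""
    result = []
    for user_rows in split_by_user(rows).values():
        result.extend(calculate(user_rows))
    return result


rows = [
    {"user_id": "a", "amount": 5},
    {"user_id": "b", "amount": 2},
    {"user_id": "a", "amount": 3},
]
for r in split_apply_combine(rows):
    print(r)
```

Note that this version holds every user's rows in memory at once, just as the list of data frames does in R.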
So in PDI I'm trying to use partitioning to do the same thing as split() in R. The problem is that one partition per user ID would mean thousands of partitions, and I don't see how to create that many. Is there a better approach?
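One alternative that avoids creating a partition per user ID, assuming the input can be sorted by user ID first (PDI has a Sort rows step for this), is to process the rows as a stream and start a new group whenever the user ID changes. The sketch below shows the idea in Python with `itertools.groupby`. As before, `user_id`, `amount` and the running-total `calculate` function are hypothetical placeholders, and this is an illustration of the technique rather than a PDI implementation.

```python
from itertools import groupby
from operator import itemgetter


def calculate(user_rows):
    """Hypothetical per-user calculation: running total of 'amount'."""
    total = 0
    for row in user_rows:
        total += row["amount"]
        yield {**row, "running_total": total}


def process_sorted_stream(rows):
    """Yield results one user group at a time; rows must be sorted by user_id."""
    # groupby only starts a new group when the key changes, so unsorted
    # input would split one user's rows into several groups.
    for _user_id, user_rows in groupby(rows, key=itemgetter("user_id")):
        # Each group is fully consumed before the next one is read.
        yield from calculate(user_rows)


rows = sorted(
    [{"user_id": "b", "amount": 2},
     {"user_id": "a", "amount": 5},
     {"user_id": "a", "amount": 3}],
    key=itemgetter("user_id"),  # stable sort keeps each user's original row order
)
for r in process_sorted_stream(rows):
    print(r)
```

The design trade-off is that the sort does the work the partitions would have done: the number of distinct user IDs no longer matters, only that each user's rows arrive together.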
Thanks.