Partitioning data with thousands of values

Hello,

I have an R program that I'm converting to a PDI transformation to run in my organization's Pentaho environment. In R, I use the split() function to partition the data into a list of data frames by user ID. Each partition is passed to a function that performs a calculation on that user's rows, so each partition has to be processed independently, and Pentaho would need to do the same. The function returns a new data frame of calculated values, and at the end I merge these per-user data frames (one per user ID) into a single data frame.
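The split/apply/combine pattern described above can be sketched in Python as a minimal illustration. The column names `user_id` and `amount` and the running-total calculation are hypothetical placeholders for the real per-user function. The point of the sketch is that the groups come from the data itself, so the number of user IDs does not have to be known or configured in advance.

```python
from collections import defaultdict


def split_by_user(rows):
    """Group rows into lists keyed by user ID (the role split() plays in R)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["user_id"]].append(row)
    return groups


def calculate(user_rows):
    """Hypothetical per-user calculation: a running total of 'amount'."""
    total = 0
    out = []
    for row in user_rows:
        total += row["amount"]
        out.append({"user_id": row["user_id"],
                    "amount": row["amount"],
                    "running_total": total})
    return out


def split_apply_combine(rows):
    """Apply calculate() to each user's rows, then merge the results."""
    result = []
    # Sorting the keys mimics R's split(), which orders groups by factor level.
    groups = split_by_user(rows)
    for user_id in sorted(groups):
        result.extend(calculate(groups[user_id]))
    return result


rows = [
    {"user_id": "a", "amount": 5},
    {"user_id": "b", "amount": 2},
    {"user_id": "a", "amount": 3},
]
for r in split_apply_combine(rows):
    print(r)
```

Because each group is processed independently, the per-user calculation never sees another user's rows, which is the property the R version relies on.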

So in PDI I'm trying to use partitioning to do the same thing as split() in R. The problem is that I don't see how to create thousands of partitions, one per user ID. Is there a better approach?

Thanks.
