Hi,
I've discovered PDI/Spoon for the first time this weekend and I like it. I'd like to use this tool to replace a whole load of my crappy VBA transformation and parsing code. The workflow, labeling and visibility of the effects of each step in PDI are something I really admire.
I've been replicating my work on one of the "easy" Excel files from the 60 or so I transformed and normalised for a recent project (the purpose was to clean up files before putting them in a homogenized, consolidated, reconciled database). I made quite good progress in a day today using PDI/Spoon, but I have 3 requirements I'd like some guidance on in terms of how to replicate them using Spoon.
1. Split a string field based on the first occurrence of a space, i.e. ' '
I think I need to go to school regarding Regex. I've skirted this a few times in the last few years but I probably need to actually understand it now. I tried using "Replace in string" with use regex set to "Y" and '/[^ ]*/' in the search field, but I'm not having any luck. What should I be doing differently?
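To make requirement 1 concrete, here is roughly the behaviour I'm after, sketched in plain Python rather than PDI. The function names and the sample value are made up for illustration, and the regex is only my guess at an equivalent pattern, not something I have working in Spoon.

```python
import re

def split_on_first_space(value):
    """Split value into (head, tail) at the first space only."""
    head, _sep, tail = value.partition(" ")
    return head, tail

# Assumed regex equivalent: everything before the first space,
# an optional single space, then the rest of the string.
FIRST_SPACE = re.compile(r"^([^ ]*) ?(.*)$")

def split_with_regex(value):
    """Same split expressed as a regex with two capture groups."""
    match = FIRST_SPACE.match(value)
    return match.group(1), match.group(2)

# Hypothetical account line: a code followed by a description.
print(split_on_first_space("1000 Cash at bank"))
print(split_with_regex("1000 Cash at bank"))
```

Only the first space acts as the separator; later spaces stay in the second field.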
2. "Decumulation"
I've invented a new word - sorry about that. I've got a few files from this project which show, for a collection of accounts, balances that are non-zero and cumulative instead of "normal" incremental. I'd like them to be "normal" incremental instead. I tried exporting to an H2 table and then importing from the same H2 table to get to "Execute SQL script", but have since discovered that Execute SQL is not designed for data manipulation. That was my "get out of jail" move... So I need another way to strip out the cumulative effect from a time series of numbers. How else can I do this?
To make this question harder: in some of the files I received, certain account lines had cumulative data, except that in between some months the account value fell to zero (sweeps, manual entries etc.). So I would want to say "subtract the last non-zero value for account x from the current month's value". Does this take me into JavaScript, and can it be done in Spoon?
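In case it helps, this is the logic I mean by "decumulation", sketched in plain Python rather than PDI. It assumes the values for one account arrive in date order, and the function name, the `skip_zeros` flag and the sample numbers are all made up. Reporting a movement of 0 for a month that fell to zero is my own assumption about what should happen there.

```python
def decumulate(values, skip_zeros=False):
    """Turn cumulative balances for ONE account into per-period movements.

    values:     cumulative balances in date order.
    skip_zeros: if True, a zero is treated as a gap. It yields a movement
                of 0 (an assumption) and the next non-zero value is
                compared with the last non-zero value seen.
    """
    movements = []
    baseline = 0  # last value we subtract from; starts at zero
    for value in values:
        if skip_zeros and value == 0:
            movements.append(0)
            continue  # keep the old baseline
        movements.append(value - baseline)
        baseline = value
    return movements

# Simple case: plain cumulative series.
print(decumulate([100, 250, 400]))
# Harder case: the balance drops to zero for one month.
print(decumulate([100, 250, 0, 400], skip_zeros=True))
```

For several accounts the same function would be called once per account, after sorting the rows by account and month.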
3. "Pushdown"/"Pushup"
Sorry - another made-up word. Client accounting data is basically pretty ****ty. A pattern I saw in more than one of these files was that there are certain lines so important you want to push their values down (in a new field) until you hit another value you know about, and then push that one down. For instance, you have the heading "Revenues" and you want to tag the small accounts that sit below it as belonging to "Revenues". The next important line item you get to is "Cost of Goods Sold", and you then want to push this value down in the same new field until you reach, say, "R&D Expenses". I use the word "Pushup" to describe the scenario where those key values in the file from the client sit below the line items you want to tag, meaning you start from the bottom and work up. Basically you want to use one approach or the other for any given file like that.
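Here is the "Pushdown"/"Pushup" idea as a plain Python sketch (again not PDI). The function names are mine, the set of known headings is assumed to be supplied by hand, and the small account names in the example are invented. Rows that come before the first known heading get `None`, which is an assumption on my part.

```python
def push_down(labels, headings):
    """Tag each row with the most recent known heading at or above it."""
    current = None  # no heading seen yet
    tagged = []
    for label in labels:
        if label in headings:
            current = label  # a new key line starts here
        tagged.append(current)
    return tagged

def push_up(labels, headings):
    """Same idea, but the key lines sit BELOW the rows they describe."""
    return push_down(labels[::-1], headings)[::-1]

headings = {"Revenues", "Cost of Goods Sold", "R&D Expenses"}

# Headings above their accounts.
rows = ["Revenues", "Product sales", "Services",
        "Cost of Goods Sold", "Materials", "R&D Expenses"]
print(push_down(rows, headings))

# Headings below their accounts.
rows_below = ["Product sales", "Services", "Revenues",
              "Materials", "Cost of Goods Sold"]
print(push_up(rows_below, headings))
```

Pushup is just pushdown applied to the rows in reverse order, so only one of the two needs real logic.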
Any assistance appreciated
Regards,
Andy