Channel: Pentaho Community Forums

Repeat a value till the next identified increment in unstructured text

Hi Everyone

It's been a very long time since I last used PDI, and I can see that a lot of work has been done since the days of version 2.x. Kudos to all the developers and Matt for keeping this project on a good path.

I am slowly getting back into this and things are starting to come back to me. I need to ask for a little help in processing a file I would have probably laughed at years ago, but alas I find myself a babe in the woods again.

In short, the file is unstructured text that has been extracted from a PDF. The text file is like a printout, with each page of the PDF marked in the TXT by a --------Page xx--------- line at the top of it. I can easily identify the page with a regex, and am in fact using the Regex Evaluation step to do just that. The page number (xx) is caught using a capture group and placed into a new string field, pageNum, so it is only populated on the marker rows. What I would like to do is repeat this value on each following row until the next page marker changes it. I can remember doing something like this in the past, but I confess the solution eludes me.
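
To make the goal concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than as a PDI transformation. The marker pattern is my guess at how the page lines look, and the function name and sample rows are made up purely for illustration.

```python
import re

# Assumed marker format: a run of dashes, "Page", the number, a run of dashes.
PAGE_MARKER = re.compile(r"^-+\s*Page\s+(\d+)\s*-+$")


def fill_page_numbers(lines):
    """Yield (pageNum, line) pairs, carrying the last seen page number forward."""
    current_page = None  # rows before the first marker have no page number
    for line in lines:
        match = PAGE_MARKER.match(line.strip())
        if match:
            # New page marker: remember its number (kept as a string, like pageNum).
            current_page = match.group(1)
        yield current_page, line


# Made-up sample rows standing in for the extracted text file.
sample = [
    "--------Page 1---------",
    "first line of text",
    "second line of text",
    "--------Page 2---------",
    "more text",
]

rows = list(fill_page_numbers(sample))
for page, text in rows:
    print(page, "|", text)
```

In other words, the only state needed is the last page number seen; every row simply receives whatever that value currently is.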

Would anyone be so kind as to point me in the right direction?

Cheers
An extremely grateful Frog
