I am trying to use the XML Input step to parse only part of an XML file.
Example file:
-----------------
<main>
  <account number="1">
    <salutation>Mr.</salutation>
    <action_1>
      <activity>1</activity>
      <activity>2</activity>
      <activity>3</activity>
      <activity>2</activity>
      <activity>3</activity>
    </action_1>
    <action_2>
      <topic>so long</topic>
      <activity>sad</activity>
      <activity>sad</activity>
    </action_2>
  </account>
  <account number="2">
    <salutation>Mrs.</salutation>
    <action_1>
      <activity>332</activity>
      <activity>22</activity>
      <activity>333</activity>
      <activity>22</activity>
      <activity>333</activity>
    </action_1>
    <action_2>
      <topic>welcome</topic>
      <activity>happy</activity>
      <activity>happy</activity>
    </action_2>
  </account>
</main>
--------------------------------
What I'm trying to accomplish is to read the account number and the salutation and parse them into separate fields, but pass everything nested deeper (actions, activities, etc.) "as is", as a single string field, to be written to a database column.
So the table layout is:
ACCOUNT | SALUTATION | XML_STUFF
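For reference, the desired row shape can be pinned down outside Kettle. This is a minimal sketch, not a Kettle/PDI step: it assumes Python's standard-library `xml.etree.ElementTree`, a trimmed-down hypothetical sample in the same shape as the example file, and an illustrative function name `split_accounts`. Each `<account>` becomes one ACCOUNT | SALUTATION | XML_STUFF row, where XML_STUFF is every child except `<salutation>` re-serialized with its tags intact.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample in the same shape as the example file.
SAMPLE = """<main>
<account number="1">
<salutation>Mr.</salutation>
<action_1><activity>1</activity><activity>2</activity></action_1>
<action_2><topic>so long</topic><activity>sad</activity></action_2>
</account>
<account number="2">
<salutation>Mrs.</salutation>
<action_1><activity>332</activity></action_1>
</account>
</main>"""


def split_accounts(xml_text):
    """Return (ACCOUNT, SALUTATION, XML_STUFF) tuples, one per <account>."""
    root = ET.fromstring(xml_text)
    rows = []
    for account in root.findall("account"):
        number = account.get("number")              # attribute -> ACCOUNT
        salutation = account.findtext("salutation")  # element text -> SALUTATION
        # Re-serialize every other child as markup. tostring() also emits the
        # element's tail (here the newline after it); strip() removes it.
        parts = [
            ET.tostring(child, encoding="unicode").strip()
            for child in account
            if child.tag != "salutation"
        ]
        rows.append((number, salutation, "".join(parts)))
    return rows


for row in split_accounts(SAMPLE):
    print(row)
```

Keeping XML_STUFF as re-serialized markup (rather than text content) is what preserves the tags for the database column.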
The problem is that when the XML Input step reaches the nested part that should become XML_STUFF, it strips out all the XML tags and passes only the text values, with whitespace in between.
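The symptom looks like the difference between reading an element's text content and serializing the element itself. The minimal Python sketch below (standard-library ElementTree, hypothetical fragment) illustrates that difference; it is an assumption-based analogy, not a description of what the Kettle step does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment used only for this comparison.
FRAGMENT = "<action_2>\n<topic>so long</topic>\n<activity>sad</activity>\n</action_2>"

node = ET.fromstring(FRAGMENT)

# Text-only view: concatenates all text and tail strings and drops every tag,
# which resembles the symptom above (values plus whitespace).
text_only = "".join(node.itertext())

# Markup view: re-serializes the element, so the tags survive in the string.
markup = ET.tostring(node, encoding="unicode")

print(repr(text_only))
print(repr(markup))
```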
In another case I was able to work around this by wrapping that part of the source XML in a CDATA section, but here the XML arrives in a fixed format that I have no control over.
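The CDATA workaround behaves this way because an XML parser treats a CDATA section as plain character data, so a text-only read returns the wrapped markup verbatim. A minimal Python sketch with ElementTree follows; the `<xml_stuff>` wrapper element is hypothetical. It also shows the limitation: the source XML itself has to be changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical input where the nested part has been wrapped in CDATA
# inside an illustrative <xml_stuff> element.
WRAPPED = (
    '<account number="1"><salutation>Mr.</salutation>'
    "<xml_stuff><![CDATA[<action_1><activity>1</activity></action_1>]]></xml_stuff>"
    "</account>"
)

account = ET.fromstring(WRAPPED)
# Inside CDATA the markup is character data, so the plain text value
# of <xml_stuff> is the original markup, tags included.
stuff = account.findtext("xml_stuff")
print(stuff)
```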
How can I accomplish this with Kettle? Is it possible at all?
Help, please!
Thanks :)