Mate - I am processing a JSON document that is hierarchical in nature. It holds survey information. One survey can have many questions, one question can have many answers, and one answer can have many plugins.
I need to produce normalized data, broken out at the plugin level. So I need 5 records in total (4 answers, one of which has 2 plugins). I am attaching the output (survey.xls) for reference.
I have a working solution, but it is not very clean. I read everything from MongoDB into a single row using array indexes: 5 questions ($.questions[0]._id and so on), then 5 answers for each question ($.questions[0].answers[0]._id and so on), etc. I then normalize this data and filter out the null questions, answers, etc.
Instead, I want to use something like $unwind so that I get this format directly out of the MongoDB Input step, and it should handle any number of questions or answers.
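To make it concrete, this is roughly the kind of aggregation pipeline I have in mind. It is only an untested sketch: the output field names (survey_title, question_id, etc.) are names I made up for illustration, and I am assuming the pipeline can be pasted into the MongoDB Input step with its "Query is aggregation pipeline" option enabled; I have not verified the exact formatting PDI expects there. The last $unwind uses preserveNullAndEmptyArrays (available from MongoDB 3.2) so that answers without any plugins are kept instead of being dropped.

Code:
```json
[
  {"$unwind": "$questions"},
  {"$unwind": "$questions.answers"},
  {"$unwind": {
    "path": "$questions.answers.plugins",
    "preserveNullAndEmptyArrays": true
  }},
  {"$project": {
    "_id": 0,
    "survey_title": "$title",
    "question_id": "$questions._id",
    "question_body": "$questions.body",
    "answer_id": "$questions.answers._id",
    "answer_body": "$questions.answers.body",
    "plugin_id": "$questions.answers.plugins._id",
    "plugin_name": "$questions.answers.plugins.name"
  }}
]
```

If this behaves the way I expect, the sample survey should come out as 5 documents: 2 for "Google Chrome" (one per plugin) and 1 each for the other three answers, with the plugin fields simply missing where an answer has no plugins.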
Any help will be much appreciated.
env:- PDI 5.4.0.1-131 CE, Windows 10, Java build 1.8.0_25-b18, MongoDB 3.2.3
Regards,
Ritesh
Source data:-
Code:
{
  "_id": {"$oid": "5673417f677aff45e70001c5"},
  "title": "Survey 01",
  "questions": [
    {
      "_id": {"$oid": "56734183677aff45e70001c6"},
      "body": "Which browser do you use? Any plugin to mention?",
      "answers": [
        {
          "_id": {"$oid": "56734183677aff45e70001c7"},
          "body": "Google Chrome",
          "plugins": [
            {
              "_id": {"$oid": "56ce0767988ceb49c5000002"},
              "name": "Ad blocker",
              "browser_plugins_id": {"$oid": "566886c040041260650003cc"}
            },
            {
              "_id": {"$oid": "56ce076c988ceb49c5000003"},
              "name": "Chrome password manager",
              "browser_alert_id": {"$oid": "5644fa0a5241491f9abe0200"}
            }
          ]
        },
        {
          "_id": {"$oid": "56ce07b2988ceb49c5000006"},
          "body": "Internet Explorer 11"
        },
        {
          "_id": {"$oid": "56ce07b4988ceb49c5000007"},
          "body": "Mozilla Firefox"
        }
      ]
    },
    {
      "_id": {"$oid": "56ce079b988ceb49c5000004"},
      "body": "Which website you surf most?",
      "answers": [
        {
          "_id": {"$oid": "56ce079b988ceb49c5000005"},
          "title": "Facebook"
        }
      ]
    }
  ],
  "status": "active"
}