Channel: Pentaho Community Forums

How to load content from an HTML document?

Hello everyone,
I am new to Kettle (and to Pentaho in general) and I am looking for a way to process data from an HTML web page. I have searched this forum and some Kettle eBooks, but I am not sure whether a native solution (or one via a module) exists. Here is my scenario:

Let's take the following web page as an example: http://www.basketball-reference.com/...013_games.html. If I had this HTML document saved locally, could I somehow use Kettle to access/parse the page source and load the columns "Date", "Visitor", "Score", "Home", "Score" into a target database? (This page is just an example; I know there is a link at the top that lets you export this data to CSV, which would be a no-brainer to process...)
Or would I have to parse the webpage first with an external script, e.g. using Python?

Many thanks for your answers!
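
To make the external-script option concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. Everything in it is an assumption for illustration: the class `TableParser`, the helper functions, and the `SAMPLE_HTML` snippet with its placeholder teams and scores are made up, and the real page is probably more complex (several tables, nested markup). The idea is to flatten the table into CSV, which Kettle should then be able to read with one of its text/CSV file input steps.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row.

    Hypothetical helper: it does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell (this includes link text).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html_text):
    """Return all table rows found in html_text as lists of strings."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that a file input step could read."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up sample standing in for the saved page; not real game data.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Visitor</th><th>Score</th><th>Home</th><th>Score</th></tr>
  <tr><td>2012-10-30</td><td><a href="#">Team A</a></td><td>84</td>
      <td><a href="#">Team B</a></td><td>94</td></tr>
  <tr><td>2012-10-31</td><td>Team C</td><td>107</td><td>Team D</td><td>120</td></tr>
</table>
"""

rows = html_table_to_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

For a real file I would read the saved page with `open(path, encoding="utf-8").read()` and write the CSV text to disk instead of printing it. I would still prefer to do all of this inside Kettle if that is possible.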
