The challenge

Did I say that Sparkl is amazing? Probably a few times :)
When we're developing operational applications, utilities, or even dashboards, there's a common requirement that often comes into play: the ability to have a storage engine available.
Sparkl is itself a plugin that runs on the Pentaho BA server (check this older blog post for the Sparkl fundamentals), and the BA server needs to be connected to a database, and is usually connected to several.
However, this poses a problem if we're distributing our Sparkl app, as we would need to know the database configuration in advance so we can connect to it. On top of that, we'd need to handle the initialization of the database, create tables, and so on.
Francesco Corti took a great approach to this in the AAAR plugin, a Pentaho-Alfresco integration, where he built a configuration screen to set up all the necessary dependencies.
Embedding a database

But wouldn't it be cool if we could remove that dependency and somehow have a database that would just work (tm)?
Not only would it be cool, it actually works like a charm. We did exactly that when we built the SDR sample form that was part of the Pentaho 5.2 release.
What we did was use the H2 database in embedded mode; just by playing with the connection string, we can create a database on the fly that's persisted to a file. Obviously, this is hardly something that should be used to store huge amounts of data. I'm giving you the tool; use it wisely.
So, how does it work? The trick, like I mentioned, is in the connection string. All we need to do is create a connection in PDI with the following connection string:
jdbc:h2:${cpk.solution.system.dir}/${cnp_db};DB_CLOSE_DELAY=-1;AUTO_SERVER=TRUE;INIT=RUNSCRIPT FROM '${cpk.plugin.dir}/resources/ddl/cnp_db.ddl'
This uses some of the CPK magic variables. CPK, which stands for Community Plugin Kickstarter, is the library that powers Sparkl; at runtime it injects a set of variables with system environment information, which is exactly what this connection needs. The other clauses matter too: DB_CLOSE_DELAY=-1 tells H2 to keep the database open for as long as the JVM lives, and AUTO_SERVER=TRUE will come in handy later on. So, using this connection string, we can define a new database connection in Spoon:
[Image: New connection in PDI]
The available variables can be seen in the transformation properties of the Sparkl endpoints (when the UI is used to create them).
[Image: CPK variables]
Here you can see the available variables. By default they are "commented out" (a Sparkl convention); to use them, all you need to do is un-comment them. Their meaning is described in the description field. Obviously, standard parameters can be used as well, and here you can see that I defined our database name as _cnp_db. There's another Sparkl convention at work here: a parameter that starts with an underscore, like this one holding the database name, is internal only and can't be passed in from the endpoint calls.
Initializing the database

We're building applications, utilities, tools; while it's cool to have access to a database connection, it's kind of pointless to have an empty database. That's what the following instruction in the connection string is for:
INIT=RUNSCRIPT FROM '${cpk.plugin.dir}/resources/ddl/cnp_db.ddl'
I literally have in my Sparkl app a cnp_db.ddl file with the initialization script, which is run every time a connection is opened, and it's simply a set of create ... if not exists instructions. My script currently reads:
create table if not exists notifications (
id INT PRIMARY KEY AUTO_INCREMENT,
eventtype VARCHAR(64) NOT NULL,
author VARCHAR(1024) NOT NULL,
rcpt VARCHAR,
title VARCHAR(2048),
message VARCHAR,
style VARCHAR(64) NOT NULL,
link VARCHAR
);
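Note that RUNSCRIPT is itself a regular H2 SQL statement, so the same script can also be run by hand from any H2 session, which is handy when testing the DDL in isolation. For instance, using the expanded path from my setup:

-- run the initialization script manually, e.g. while testing it
runscript from '/home/pedro/tex/pentaho/project-sugar/solution/system/cnp/resources/ddl/cnp_db.ddl';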
You can complement this initialization script with whatever SQL instructions are needed to make sure the system is properly initialized.
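One thing to keep in mind: since the script runs on every new connection, everything in it has to be idempotent. Here's a couple of hypothetical additions, just to illustrate the pattern (the index and the seed values below are made up; they're not part of the actual cnp script):

-- indexes can be created idempotently too
create index if not exists idx_notifications_eventtype
    on notifications (eventtype);

-- MERGE behaves as an upsert keyed on id, so this seed row is
-- inserted once and left untouched on subsequent connections
merge into notifications (id, eventtype, author, style)
    key (id)
    values (1, 'system', 'cnp', 'info');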
Developing the KTR from within Spoon

As you won't be surprised to hear, those variables are only populated when the transformation or job is executed from the BA server. It's CPK / Sparkl that's responsible for injecting them; Spoon has no idea what CPK is. But we still need to develop and test the transformation in our own environment.
So what we'll do is simply define in kettle.properties the values that CPK would inject. Here's the snippet from my kettle.properties (I'm working on a plugin called cnp, but more on that later):
cpk.plugin.id = cnp
cpk.plugin.dir = /home/pedro/tex/pentaho/project-sugar/solution/system/cnp
cpk.plugin.system.dir = /home/pedro/tex/pentaho/project-sugar/solution/system/cnp/system
cpk.solution.system.dir = /home/pedro/tex/pentaho/project-sugar/solution/system
cpk.webapp.dir = /home/pedro/tex/pentaho/target-dist/server/webapps/pentaho/
cpk.session.username = admin
cpk.session.roles = Administrator,Authenticated
Now we can work as we usually do, and the behaviour will be the same whether running from within Spoon or from the server.
Exploring the database contents

Not only during development but also for some extra debugging, we need access to the database contents. PDI offers some database exploration abilities, but I don't think that's a replacement for a proper SQL client.
This is exactly why the AUTO_SERVER=TRUE flag is in the connection string: it tells H2 to run in mixed mode, which allows other clients to connect to the same database at the same time. I've been a SQuirreL user for a long time, and to use it all I needed to do was use the same connection string:
[Image: Configuring SQuirreL to use our database]
The only thing we need to be careful about is using the expanded path, as SQuirreL has no idea what Kettle variables are. So my connection string is:
jdbc:h2:file:/home/pedro/tex/pentaho/project-sugar/solution/system/.cnp_dbx;DB_CLOSE_DELAY=-1;AUTO_SERVER=TRUE;INIT=RUNSCRIPT FROM '/home/pedro/tex/pentaho/project-sugar/solution/system/cnp/resources/ddl/cnp_db.ddl'
This allows me to use this database as if it were a regular server, from within Spoon, the BA server, or SQuirreL.
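Once connected, a quick sanity check is just a query away; for instance, against the notifications table defined above:

-- peek at the most recent notifications
select id, eventtype, author, title
from notifications
order by id desc
limit 10;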
Putting it all together

In the end I was able to achieve what I wanted: a plugin that includes a storage layer and... just works.
[Image: Plugin using H2 embedded database]
Cheers!
-pedro