Oracle APEX – Let’s integrate with Pentaho! Part 2

25 November 2019, Lech Cieślik

ETL processes are key parts of Business Intelligence systems. Their creation involves many challenges that organizations must face if they want to base decision-making processes on real-time data analysis. And here, tools such as Pentaho Data Integration can help. In applications that we create for our clients, we often face challenges related to the design and implementation of ETL processes. For example, for systems whose main purpose is to prepare and send Standard Audit Files for Tax (SAF-T), the challenge will be to download sales data from files in a format such as xlsx, csv, txt and even MT940 that contain information about bank transfers. For example, ODS systems will be focused on integrating data from many sources where, in addition to files, source data is stored in databases such as PostgreSQL and MS SQL. Among these many use cases, one requirement is common to all: the solution should be ready for quick and easy addition of new and, most importantly, different data sources that can be transformed and loaded into previously created systems. This is exactly what Pentaho DI gives us, which, thanks to its advanced graphical interface, allows you to create ETL processes with no knowledge of programming languages. This is the second and last entry in the series “Let’s integrate with Pentaho”. In the first article we showed how to create a simple transformation using the Pentaho DI tool. This article will about integration of this tool with Oracle APEX.

Carte Server – basic information

Carte is a simple web server that allows for jobs and transformations to be run remotely. In addition, it allows the jobs to be monitored, started and stopped. After downloading Pentaho DI, Carte can be started from the command line via the command:

Where the configuration.xml file can look like this:

In addition, it is worth mentioning that Carte provides its own API.

More about Carte can be found on the websites:

Carte Server and Oracle APEX Web Source Modules

APEX Web Source Modules allows developers to access REST services. In addition, it provides the ability to use them in many Oracle APEX components or in a PL / SQL block, to invoke the service and process the responses. So, we can create our own Web Source Module and use it to integrate with Carte. The best part is that this step is quick and simple; it will take us but a moment.

In Shared Components, in the Data Sources section, select the Web Source Modules option and then click the Create button. A wizard window will appear on the screen that will guide us through the entire process. At the beginning we choose the option to create our Web Source Module, in our example it will be ‘From scratch’ and we move on to the next step. Here we specify the Endpoint Name and URL.

Then we define Base URL and Service URL Path, the address of the service we are interested in.

In the last step, we choose the type of authentication. In our example it will be the Basic option, and the login and password will be set as standard values ​​for Carte (more on password management and Carte security can be found on the Pentaho website).

After creating the Web Source Module, we need to edit it. In the Operations section, select POST. In the Advanced section we set the appropriate Static ID value.

We’ve just created our own Web Source Module which we can call using PL / SQL code using the apex_exec package.

Carte Server and PL / SQL

If we do not decide to use Web Source Modules, we can run the transformation on the Carte server directly from PL/SQL code. Just write the PL/SQL procedure responsible for sending the HTTP request and receiving the response. For its implementation we will use the PL/SQL library mentioned in previous articles – utl_http.

Pan and Kitchen

It is also worth mentioning about Pan and Kitchen. These are command line tools that allow you to run Pentaho transformations and jobs. They are usually used to schedule cron jobs. If we want to use the Pan tool in a PL/SQL block, this can be done using dbms_scheduler according to the following example:

Oracle APEX and Pentaho DI

After configuring Carte and preparing any of the above integration methods, it was time to create an Oracle APEX application that would allow the user to run transformations.

Sample solution:

  • Create a report with the transformations names.
  • Add Run column. Clicking the icon calls the integration process procedure.
  • Add Carte column. Clicking on the icon starts the redirection to http://localhost:8088/kettle/status/ which allows you to manage transformations from the Carte server interface.

Carte GUI

  • Add a report with the content of the employees table.

The final result:

Pentaho Manager – Oracle APEX application

Summary

The first entry in the series ‘Let’s integrate with Pentaho!’ was about the implementation of transformation responsible for extracting data from a CSV file (E), transforming the values of columns (T) and loading data into a table (L), that creating a complete ETL process. In the second article we presented ways to integrate Pentaho DI with the Oracle APEX application using Carte and APEX Web Source Modules, PL/SQL, jobs and the Pan tool. Particularly noteworthy is the first method, which is limited to a few clicks, and its implementation takes literally just a moment. The end result was the creation of a simple Oracle APEX Pentaho Manager application that allows us to run transformations.

Tagged with: , , , , , , ,

Hire us!

Pretius is a software development company.
We create web applications using: Java, Oracle DB, Oracle Apex, AngularJS.
Contact us to talk about how we can help you with your software project!