The most common solution is converting XLSX file to CSV format and then uploading data from file to a database table using custom PL/SQL procedure. This is a simple and proven way, but due to the low flexibility is only suitable for loading files with fixed and predictable data format (the number of columns and their data types). In case of adding new column to the file or deletion of existing, we are forced to modify loading process and / or the target table. If we want to give user the ability to share data in a form of a report, then we have to also remember about update of APEX application.
It doesn’t change the fact, that there is a need of importing data directly from XLSX files and our customers won’t stop asking for it. So if there is a need, we must find a solution, right?
A perfect situation would be a possibility of choosing which data from XLSX file we want to load and which not. Or we could just preview what is in the file that was attached by user, without need of parsing data and loading it to temporary table. With a helping hand comes AS_READ_XLSX packege. I found it on Amis Technology Blog, where you can find some of its basic usage examples.
Package AS_READ_XLSX enables us to draw data directly from the BLOBs inserted into database. It is also possible to query files which are on the filesystem. To parse XML files it uses standard Oracle library. I tested it – it works also on XE database edition.
Sample query might look like this:
1
2
3
|
select
*
from table( as_read_xlsx.read( as_read_xlsx.file2blob( ‘DOC’, ‘Book1.xlsx’ ) ) ); /* for file “Book1.xlsx” which resides in dictionary “DOC” */
|
or
1
2
3
|
select
*
from table( as_read_xlsx.read( v_blob ) ); /* where “v_blob” is a variable that contains BLOB with XLSX file */
|
As a result of such query we will get flattened tree structure of XML file:
As we can see, there is all information we need: number and sheet name, number of a row, number of a column, cell address, data type and the data of appropriate database type. That’s something, but this is not yet the report we’d expect.
Using a bit of SQL acrobatics, we can make the above data become more readable. For this purpose, we will use the PIVOT function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
with xlsx as (
select
ROW_NR,
COL_NR,
case CELL_TYPE
when ‘S’ then STRING_VAL
when ‘N’ then to_char(NUMBER_VAL)
when ‘D’ then to_char(DATE_VAL, ‘YYYY-MM-DD HH24:MI:SS’)
else FORMULA
end CELL_VAL — to make PIVOT works we have to have one data type for this column – in our case CHAR
from (select *
from table( as_read_xlsx.read( (select blob_content from XLSX_FILES where id = :P2_FILE_ID ),’employees’ ) )))
select ad.* fromxlsx
PIVOT
( max(CELL_VAL)
FOR COL_NR in (1 as ID,2 as LAST_NAME,3 as FIRST_NAME,4 as USERID,5 as START_DATE,6 as COMMENTS,7 as MANAGER_ID,8 as TITLE,9 as DEPT_ID,10 as SALARY,11 as COMMISSION_PCT )) ad
where row_nr >1 ; /* we want to omit headers row */
|
If everything is written correctly, we should see report like this:
Having a report in this state, we are only one step before loading data to target table or creating report in APEX. And we achieve this without processing BLOB 🙂
For our example, let’s assume that we want to report based on the first sheet of loaded XLSX file. For simplicity, assume that the first row contains column headers and data is in basic tabular form.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
|
declare
v_query varchar2(4000);
v_sheet_name varchar2(100);
v_sheet_nr number;
v_cnt number;
begin
select min(sheet_name) intov_sheet_name
from(select * from table( as_read_xlsx.read( (select blob_content from XLSX_FILES where id = :P2_FILE_ID )) )) where sheet_nr = 1;
v_query := ‘with xlsx as (
select
ROW_NR,
COL_NR,
case CELL_TYPE
when ‘‘S’‘ then STRING_VAL
when ‘‘N’‘ then to_char(NUMBER_VAL)
when ‘‘D’‘ then to_char(DATE_VAL, ‘‘YYYY-MM-DD HH24:MI:SS’‘)
else FORMULA
end CELL_VAL
from (select *
from table( as_read_xlsx.read( (select blob_content from XLSX_FILES where id = :P3_FILE_ID ),’”||v_sheet_name||”‘ ) )))
select ad.* from xlsx
PIVOT
( max(CELL_VAL)
FOR COL_NR in (‘;
v_cnt := 0;
for x in (select col_nr, case CELL_TYPE
when ‘S’ then STRING_VAL
when ‘N’ then to_char(NUMBER_VAL)
when ‘D’ then to_char(DATE_VAL, ‘YYYY-MM-DD HH24:MI:SS’)
else FORMULA
end CELL_VAL
from table( as_read_xlsx.read( (select blob_content from XLSX_FILES where id = :P3_FILE_ID ),v_sheet_name ) ) where row_nr = 1)
loop
v_cnt := v_cnt+1;
if v_cnt = 1 then
v_query := v_query||x.col_nr||‘ as ‘||x.cell_val;
else
v_query := v_query||‘,’||x.col_nr||‘ as ‘||x.cell_val;
end if;
end loop;
v_query := v_query||‘ )) ad
where row_nr >1 ‘; /* we want to omit headers row */
return v_query;
end;
|
Main limitations of this method are related to its main advantage, that is reading data directly from XLSX file. Because we are working on uploaded file, not on data in database tables, we can’t use many of database goodies such as indexes. Operations on data types are more difficult, cause using PIVOT function forces us to unification of returned data types – in above example it’s CHAR. What’s more, all operations are performed inside memory, so it can happen that trying to open a very large file we will fill the entire pool of memory. Additionaly, this mechanism will scale poorly – the bigger the file will be, the longer it will take to get results. The structure of data inside the file is also important. If data in spreadsheet is scattered instead of being in table form, then displaying it on report will be a challenge.
This method works well in case of sharing relatively small files in form of compact tabular reports, in which number of rows won’t exceed few hundreds. Another use case is previewing of data from file that user just have uploaded before it is inserted into database table. This package can also be used in process of transferring data from XLSX file to database table. However, if you want to process bigger files, you should consider pre-loading “raw” data to some kind of staging table before doing PIVOT transformation for further operations on data.
I attach a sample APEX application. Its functionality is very limited, but it allows to check how to implement file uploading process, pre-viewing of “raw” data from file and sample report with dynamic PIVOT clause and option to choose from which file and sheet display data. Application contains installation script, which creates any needed objects including AS_READ_XLSX package. If you want to download only package click here. I added to it one change, that APEX forced me to make. When APEX checks if query is valid, it inserts NULL into every bind variable. This resulted in hanging database session every time I wanted to create a report.
For demonstration purpose I assumed that uploaded files will be of simple structure: the data begins from A1 cell and first row contains column headers. You can download my demo file here. I used this file in above examples.