Reading Excel Spreadsheets from MCP

I am currently working on an EAE project where we have some data in Excel spreadsheets that need to go into a DMSII database. In this case the spreadsheet data are only read once and after that maintained in DMSII. Here it is easy enough to save the spreadsheet as a CSV file, copy the file to the MCP platform and read the file with a one time EAE report. But what if you have data coming in on Excel sheets on a regular basis? Do you still want to go through the hassle of opening the file and saving it as a CSV? No need, the Apache Software Foundation (who else?) has a Java API that knows how to read Excel spreadsheets.

The Apache POI project provides a Java API that can read, and write, XLS files (in the HSSF/Horrible Spread Sheet Format as they call it) and XLSX files. To read a spreadsheet from ALGOL, all I needed to do was wrap a little Java code around the POI interface to make it easily accessible from an ALGOL program.

The code presented here was just an excercise for me, to see how complicated it is to read Excel. I can imagine the ALGOL program to be replaced by a library, which can be called from an EAE report. You can copy the Excel file to the MCP platform using FTP or Windows copy. Alternatively, users could be allowed to upload Excel data using their web browser. In that case the Tomcat web server running on your JProcessor can implement a servlet that uses the Apache (again) Commons FileUpload package to accept Excel file uploads from authorized clients. The uploaded files can be written to an MCP pack for batch processing, or they can be processed right away using, for example, JDBC.

You’ll need to download the Apache POI API from http://poi.apache.org/download.html to compile the ExcelReaderImpl class. Documentation of the POI API and examples can also be found on the Apache site.

The method openSpreadsheet() is not very complicated. It instantiates some POI objects and if an exception is thrown, it returns a non zero result.

The getRow() method is a bit more involved because the type of cell contents dictates the method with which to retrieve the cell data. I find the handling of cell of type CELL_TYPE_FORMULA a bit clumsy. First, I used a FormulaEvaluator class to calculate the result of the formala in a cell. However, since the FormulaEvaluator runs on Java, there is no guarantee that the result is exactly the same as the result Excel originally produced. That is why I commented out that bit and opted for the approach where I try and catch until a valid result is returned.

SpreadsheetReader.java This defines a generic interface to read spreadsheets.
ExcelReaderImpl.java This class implements an Excel xls-type file reader
ExcelReaderLibrary.java This class contains static methods to make calls from ALGOL easy

The ALGOL include file was, as usual, generated using the JifGen utility. But, since I’m running Java 7 on my PC and the Java version of my MCP is still version 6, some more path info was needed in the JifGen arguments:

\"program files"\java\jdk1.6.0_25\jre\bin\java -cp /libs/mcp/mcp.jar;/libs/mcp/mcptools.jar;/"program files"/java/jdk1.6.0_25/lib/tools.jar com.unisys.mcp.tools.JifGen -algol -classpath dist/excel2mcp.jar -d mcp/algol  -output EXCEL_ -verbose com.mcpjava.mcp.spreadsheet.ExcelReaderLibrary

The generated ALGOL file and the test driver are nothing special:

EXCEL_INCL.ALG_M
READEXCEL.alg_m

Leave a Reply