Hi:
I wrote a simple clover code to replace an existing function using SQL-Loader. Using an XLS with 3 sheets, the execution time for sql-loader is 2,485 millisecond, for clover it is 5,124 milliseconds. Some XLS file may contain more than 30 sheets, the executime time using clover was very slow.
The graph has an XLS reader and inserts all records of all sheets into a DB table. for each sheet on a Excel file, change the sheetname, init graph object and execute. I need your help optimizing my clover code.
Below is my graph file, the transformation component is a java class MyReformat.java which copies the xls row into DB table.
----------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Graph author="bpilla" created="Fri Mar 21 16:56:02 CET 2008" guiVersion="1.10" id="1206534077725" licenseType="Evaluation license." modified="Tue May 20 11:48:14 EDT 2008" modifiedBy="bpilla" name="Test" revision="1.337">
<Global>
<Metadata fileURL="${cloverDir}${fileSeparator}meta${fileSeparator}xls_import.fmt" id="Metadata0"/>
<Metadata fileURL="${cloverDir}${fileSeparator}meta${fileSeparator}bulkuploaddata.fmt" id="Metadata3"/>
</Global>
<Phase number="0">
<Node cloverFields="SESSIONID;UPLOADTYPESTR;FILEID;UPLOADROWNUM;COLUMN01STR;COLUMN02STR " dbConnection="conn1" dbTable="BULKUPLOADDATA" enabled="enabled" guiHeight="0" guiName="DB Output Table" guiWidth="0" guiX="498" guiY="171" id="DB_OUTPUT_TABLE0" maxErrors="10" sqlQuery="INSERT INTO BULKUPLOADDATA (SESSIONID, UPLOADTYPESTR, FILEID, UPLOADROWNUM, COLUMN01STR, COLUMN02STR) VALUES ($SESSIONID, $UPLOADTYPESTR, $FILEID, $UPLOADROWNUM, $COLUMN01STR, $COLUMN02STR ) " type="org.jetel.component.DBOutputTable"/>
<Node enabled="enabled" guiHeight="0" guiName="Reformat" guiWidth="0" guiX="260" guiY="165" id="REFORMAT0" transformClass="MyReformat" type="org.jetel.component.Reformat" sessionid="${sessionid}" fileid="${fileid}" origFileName="${origFileName}" sheetname="${sheetname}" />
<Node enabled="enabled" fileURL="${copiedFile}" guiHeight="0" guiName="XLS Data Reader" guiWidth="0" guiX="1" guiY="132" id="XLS_READER0" sheetName="${sheetname}" startRow="2" type="org.jetel.component.XLSReader"/>
<Edge debugMode="false" fromNode="REFORMAT0:0" guiBendpoints="" id="edge2" inPort="Port 0 (in)" metadata="Metadata3" outPort="Port 0 (out)" toNode="DB_OUTPUT_TABLE0:0"/>
<Edge debugMode="false" fromNode="XLS_READER0:0" guiBendpoints="" id="edge1" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="REFORMAT0:0"/>
</Phase>
</Graph>
-------------------------------------------------------------------------
my clover code:
//if properties file has not been initialized, set it up now
if (propertiesFile == null) {
propertiesFile = cloverDir + File.separator + "defaultProperties";
EngineInitializer.initEngine(null, propertiesFile, null);
initDBConnection();
}
//in a windows environment, clover needs this
copiedFile = copiedFile.replace("\\", "\\\\");
cloverDir = cloverDir.replace("\\", "\\\\");
String graphFile = cloverDir + File.separator + graphName.replace("\\", "\\\\");
Properties p = new Properties();
p.setProperty("origFileName", origFileName);
p.setProperty("copiedFile", copiedFile);
p.setProperty("sessionid", sessionID);
p.setProperty("cloverDir", cloverDir);
p.setProperty("fileSeparator", File.separator.replace("\\", "\\\\"));
p.setProperty("fileid", String.valueOf(fileID));
TsExcelFile f = new TsExcelFile(copiedFile);
TransformationGraph graph = null;
TransformationGraphXMLReaderWriter graphReader = new TransformationGraphXMLReaderWriter(p);
GraphRuntimeContext runtimeContext = new GraphRuntimeContext();
runtimeContext.setUseJMX(false);
runtimeContext.setVerboseMode(false);
GraphExecutor executor = new GraphExecutor();
for (int i = 0; i < f.getNumsheets(); ++i) {
String sheetName = f.sheetNames[i];
p.setProperty("sheetname", sheetName);
try {
graph = graphReader.read(new FileInputStream(graphFile));
} catch (Exception e) {
logger.error("Failed to read graph !\n" + e.getMessage());
return;
}
graph.addConnection(dbconn);
try {
GraphExecutor.initGraph(graph);
} catch (ComponentNotReadyException e) {
logger.error("Exception occurred: ", e);
return;
}
Future<Result> result;
try {
result = executor.runGraph(graph, runtimeContext);
while (result.isDone()) {
;
}
if (!result.get().equals(Result.FINISHED_OK)) {
logger.error("Failed graph execution!");
return;
}
} catch (Exception e) {
logger.error("Failed graph execution!\n" + e.getMessage());
return;
}
} // end of for
}
Note:
The above code sets the properties of the graph file from Properties object. for each FOR-loop, the sheetname was changed in the Properties object, initialize a new Graph object, execute the Graph.
I also tried to dynamically modify the XLSReader object and set the sheetname. But the graph object did not detect the new sheetname.
Map <String, Node> nodes = graph.getNodes();
XLSReader xls = (XLSReader) nodes.get("XLS_READER0");
xls.setSheetName(sheetName);
xls.init(); <---- graph object did not detect the new sheetname.
I wrote a simple clover code to replace an existing function using SQL-Loader. Using an XLS with 3 sheets, the execution time for sql-loader is 2,485 millisecond, for clover it is 5,124 milliseconds. Some XLS file may contain more than 30 sheets, the executime time using clover was very slow.
The graph has an XLS reader and inserts all records of all sheets into a DB table. for each sheet on a Excel file, change the sheetname, init graph object and execute. I need your help optimizing my clover code.
Below is my graph file, the transformation component is a java class MyReformat.java which copies the xls row into DB table.
----------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Graph author="bpilla" created="Fri Mar 21 16:56:02 CET 2008" guiVersion="1.10" id="1206534077725" licenseType="Evaluation license." modified="Tue May 20 11:48:14 EDT 2008" modifiedBy="bpilla" name="Test" revision="1.337">
<Global>
<Metadata fileURL="${cloverDir}${fileSeparator}meta${fileSeparator}xls_import.fmt" id="Metadata0"/>
<Metadata fileURL="${cloverDir}${fileSeparator}meta${fileSeparator}bulkuploaddata.fmt" id="Metadata3"/>
</Global>
<Phase number="0">
<Node cloverFields="SESSIONID;UPLOADTYPESTR;FILEID;UPLOADROWNUM;COLUMN01STR;COLUMN02STR " dbConnection="conn1" dbTable="BULKUPLOADDATA" enabled="enabled" guiHeight="0" guiName="DB Output Table" guiWidth="0" guiX="498" guiY="171" id="DB_OUTPUT_TABLE0" maxErrors="10" sqlQuery="INSERT INTO BULKUPLOADDATA (SESSIONID, UPLOADTYPESTR, FILEID, UPLOADROWNUM, COLUMN01STR, COLUMN02STR) VALUES ($SESSIONID, $UPLOADTYPESTR, $FILEID, $UPLOADROWNUM, $COLUMN01STR, $COLUMN02STR ) " type="org.jetel.component.DBOutputTable"/>
<Node enabled="enabled" guiHeight="0" guiName="Reformat" guiWidth="0" guiX="260" guiY="165" id="REFORMAT0" transformClass="MyReformat" type="org.jetel.component.Reformat" sessionid="${sessionid}" fileid="${fileid}" origFileName="${origFileName}" sheetname="${sheetname}" />
<Node enabled="enabled" fileURL="${copiedFile}" guiHeight="0" guiName="XLS Data Reader" guiWidth="0" guiX="1" guiY="132" id="XLS_READER0" sheetName="${sheetname}" startRow="2" type="org.jetel.component.XLSReader"/>
<Edge debugMode="false" fromNode="REFORMAT0:0" guiBendpoints="" id="edge2" inPort="Port 0 (in)" metadata="Metadata3" outPort="Port 0 (out)" toNode="DB_OUTPUT_TABLE0:0"/>
<Edge debugMode="false" fromNode="XLS_READER0:0" guiBendpoints="" id="edge1" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="REFORMAT0:0"/>
</Phase>
</Graph>
-------------------------------------------------------------------------
my clover code:
//if properties file has not been initialized, set it up now
if (propertiesFile == null) {
propertiesFile = cloverDir + File.separator + "defaultProperties";
EngineInitializer.initEngine(null, propertiesFile, null);
initDBConnection();
}
//in a windows environment, clover needs this
copiedFile = copiedFile.replace("\\", "\\\\");
cloverDir = cloverDir.replace("\\", "\\\\");
String graphFile = cloverDir + File.separator + graphName.replace("\\", "\\\\");
Properties p = new Properties();
p.setProperty("origFileName", origFileName);
p.setProperty("copiedFile", copiedFile);
p.setProperty("sessionid", sessionID);
p.setProperty("cloverDir", cloverDir);
p.setProperty("fileSeparator", File.separator.replace("\\", "\\\\"));
p.setProperty("fileid", String.valueOf(fileID));
TsExcelFile f = new TsExcelFile(copiedFile);
TransformationGraph graph = null;
TransformationGraphXMLReaderWriter graphReader = new TransformationGraphXMLReaderWriter(p);
GraphRuntimeContext runtimeContext = new GraphRuntimeContext();
runtimeContext.setUseJMX(false);
runtimeContext.setVerboseMode(false);
GraphExecutor executor = new GraphExecutor();
for (int i = 0; i < f.getNumsheets(); ++i) {
String sheetName = f.sheetNames[i];
p.setProperty("sheetname", sheetName);
try {
graph = graphReader.read(new FileInputStream(graphFile));
} catch (Exception e) {
logger.error("Failed to read graph !\n" + e.getMessage());
return;
}
graph.addConnection(dbconn);
try {
GraphExecutor.initGraph(graph);
} catch (ComponentNotReadyException e) {
logger.error("Exception occurred: ", e);
return;
}
Future<Result> result;
try {
result = executor.runGraph(graph, runtimeContext);
while (result.isDone()) {
;
}
if (!result.get().equals(Result.FINISHED_OK)) {
logger.error("Failed graph execution!");
return;
}
} catch (Exception e) {
logger.error("Failed graph execution!\n" + e.getMessage());
return;
}
} // end of for
}
Note:
The above code sets the properties of the graph file from Properties object. for each FOR-loop, the sheetname was changed in the Properties object, initialize a new Graph object, execute the Graph.
I also tried to dynamically modify the XLSReader object and set the sheetname. But the graph object did not detect the new sheetname.
Map <String, Node> nodes = graph.getNodes();
XLSReader xls = (XLSReader) nodes.get("XLS_READER0");
xls.setSheetName(sheetName);
xls.init(); <---- graph object did not detect the new sheetname.
-
Hi,
the slowest part of the program is initializing XLSReeader - then whole excel file is read into memory. To avoid it, you should set sheetNumber or sheetName attribute as pattern (if you want read all sheets set sheetName="*") - then sheet name is changed dynamically, when all records from previous sheet are read. For pattern format see XLReader
Please sign in to leave a comment.
Comments 1