
Running out of memory with MySQL JDBC

Comments 8

  • oldforum
    Setting the fetch size to Integer.MIN_VALUE (i.e. fetchSize="-2147483648") would work, except for one little problem.

    The property is set in the DBInputTable component but is never passed on to the SQLDataParser, which continues to use the default fetch size.

    A simple fix would be to change the setFetchSize method in DBInputTable:

    public void setFetchSize(int fetchSize) {
        this.fetchSize = fetchSize;
        // propagate the fetch size to the parser as well
        parser.setFetchSize(fetchSize);
    }


    This would also explain why setFetchSize doesn't seem to work with other drivers either.


    P
  • oldforum
    The fetch size is now set by DBInputTable within the init() method, just before the parser's open() method is called - this way, the fetch size should be set on the new JDBC statement created internally by the parser:

    public void init() {
        // set fetch size (if defined)
        if (fetchSize != 0) parser.setFetchSize(fetchSize);
        // try to open file & initialize data parser
        parser.open(this.graph.getDBConnection(dbConnectionName),
                getOutputPort(WRITE_TO_PORT).getMetadata());
    }

    This change will be part of Clover starting with version 1.1.7 (it is currently in CVS).
  • oldforum
    Thanks, this helps. You are probably right, since Oracle works fine!

    Regards
    Akhil
  • oldforum
    I can't say much about this, as I don't have experience using Clover with MySQL.

    I have used Clover with Oracle, PostgreSQL, Sybase ASA, MS SQL, MS Access (through the JDBC-ODBC bridge) and Firebird DB.

    In general, there is no massive memory allocation done by Clover when loading or unloading data through JDBC. This must be some problem in the MySQL JDBC driver.

    You mentioned Oracle - with Oracle, you can use batch processing (the "batchMode" option). It significantly speeds up loading of data into target tables (see the sketch below for what batching amounts to at the JDBC level).

    David.
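
    For illustration, here is a minimal sketch of plain JDBC batch inserts - presumably the mechanism batchMode relies on internally; the connection URL, table and column names here are made up:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchInsertSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO target_table (id, name) VALUES (?, ?)");
            for (int i = 0; i < 10000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row " + i);
                ps.addBatch();               // queue the row instead of sending it now
                if (i % 1000 == 999) {
                    ps.executeBatch();       // send 1000 queued rows in one round trip
                }
            }
            ps.executeBatch();               // flush any remaining rows
            conn.commit();
            ps.close();
            conn.close();
        }
    }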
  • oldforum
    Hi,

    I've also run into this issue when querying large data sets using DB_INPUT_TABLE and MySQL.

    I couldn't find an easy way to extend CloverETL to provide this support (without having to re-implement the rest of DB_INPUT_TABLE), so I've taken the interim step of patching a local copy of the source to use a different SQLParser instance that creates the statement with the options MySQL requires (based on a new parameter to DB_INPUT_TABLE).

    Although this isn't a problem with CloverETL itself, I wonder whether it would be possible to include a similar change, or refactor the source so that such a customisation is possible without modifying the source? I'd be happy to assist with any changes/testing.

    Regards,
    Peter
  • oldforum
    Hi Parth,

    Thanks a bunch for your reply... I did not actually have any resolution! What you are saying makes a lot of sense!

    Since I am working with a DW, I decided to write the results out to a file and bulk load them -- a really fast option!

    Thanks
    Akhil
  • oldforum
    Hello Peter and others!

    As usual, I am ready to accept any patch/modification which helps prevent bad things from happening.
    If you could send me the fix/change, I may try to fit it into Clover and ensure that it does not clobber other stuff.

    David.

    PS: Nonetheless, the fetch size can be set with the new release of Clover, although it does not work consistently with all JDBC drivers.
  • oldforum
    Akhil,
    I don't know if you ever resolved this one. I ran into the problem the first time I used Clover with a 1 GB file.
    Turns out the problem _is_ with the MySQL JDBC driver. From the MySQL Connector/J documentation:
    --------------------------------------------
    By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to 'stream' the results back one row at-a-time.

    To enable this functionality, you need to create a Statement instance in the following manner:

    stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
            java.sql.ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    ----------------------------------------------

    Unfortunately, the JDBC driver doesn't implement the setFetchSize call correctly. Apparently the call has little effect (other than setting the fetch size to fetch one row at a time) in driver versions earlier than 3.2.1. Version 3.2.1 is currently in alpha. The fix mentioned above does work, though at the cost of speed (see the fuller sketch below).

    P
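
    For reference, a minimal self-contained sketch of the streaming-read approach described in the Connector/J documentation quoted above; the connection URL, query and table name are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StreamingReadSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://dbhost/testdb", "user", "password");
            // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + a fetch size of Integer.MIN_VALUE
            // tell Connector/J to stream rows one at a time instead of buffering the
            // whole ResultSet in memory.
            Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                    ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = stmt.executeQuery("SELECT id FROM big_table");
            long count = 0;
            while (rs.next()) {
                count++;                 // process each row as it arrives
            }
            System.out.println("rows read: " + count);
            rs.close();
            stmt.close();
            conn.close();
        }
    }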
