EclipseLink batch insert very very slow

妖精的绣舞 提交于 2019-12-05 00:21:02

问题


I'm trying to optimize my JPA implementation, using EclipseLink. I've added batch operations to it. But it is still taking A LOT of time to do 50 000 inserts. It takes more than 10 times the amount of time it takes to do the exact same insert using raw SQL with JDBC.

To make sure batch operations were in fact working I used Wireshark to check my packets and it is not using batch inserts.

Here's one of the insert packets:

It is not doing:

INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0'), ('sfirosijfhgdoi 0', 'dsufius0')... and so on

I was expecting it to do as above but it is inserting one line per packet and not multiple lines per packet.

Here's is my Entity Class:

@Entity
public class EntityClassTest implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private String lastname;

    public EntityClassTest() {
    }

    public EntityClassTest(Long id, String name, String lastname) {
        this.id = id;
        this.name = name;
        this.lastname = lastname;
    }

    public EntityClassTest(String name, String lastname) {
        this.name = name;
        this.lastname = lastname;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public String getLastName() {
        return lastname;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setLastName(String lastname) {
        this.lastname = lastname;
    }

    @Override
    public int hashCode() {
        int hash = 0;
        hash += (id != null ? id.hashCode() : 0);
        return hash;
    }

    @Override
    public boolean equals(Object object) {
        // TODO: Warning - this method won't work in the case the id fields are not set
        if (!(object instanceof EntityClassTest)) {
            return false;
        }
        EntityClassTest other = (EntityClassTest) object;
        if ((this.id == null && other.id != null) || (this.id != null && !this.id.equals(other.id))) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return "database.EntityClassTest [id=" + id + " ]";
    }

}

And here is my persist method that receives a List and persists all the objects inside.

public void insertListToTable(final String persistenceUnit, final List list) throws SQLException {
        final EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory(persistenceUnit);
        final EntityManager entityManager = entityManagerFactory.createEntityManager();
        final EntityTransaction transaction = entityManager.getTransaction();

        try {            
            final int listSize = list.size();
            transaction.begin();

            for (int i = 0; i<listSize; i++) { //Object object : list) {
                final Object object = list.get(i);
                entityManager.persist(object);

                if ( i % 500 == 0 ) { //500, same as the JDBC batch size defined in the persistence.xml
                    //flush a batch of inserts and release memory:
                    entityManager.flush();
                    entityManager.clear();
                }
            }
            transaction.commit();
        }
        catch(Exception e) {
            if (transaction != null) {
                transaction.rollback();
            }
            throw new SQLException(e.getMessage());
        } 
        finally {
            entityManager.close();
        }
    }

And my persistence.xml, where I set 500 as the batch value, file is:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1" xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="ExternalServer" transaction-type="RESOURCE_LOCAL">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <!-- List of Entity classes -->
    <class>model.EntityClassTest</class>
    <properties>
      <property name="javax.persistence.jdbc.url" value="jdbc:mysql://myServer:3306/testdb?zeroDateTimeBehavior=convertToNull"/>
      <property name="javax.persistence.jdbc.user" value="testdbuser"/>
      <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver"/>
      <property name="javax.persistence.jdbc.password" value="myPassword"/>
      <property name="javax.persistence.schema-generation.database.action" value="create"/>

      <!-- Weaving -->
      <property name="eclipselink.weaving" value="static"/>
      <!-- SQL dialect / Database type -->
      <property name="hibernate.dialect" value="org.hibernate.dialect.MySQLDialect"/>
      <property name="eclipselink.target-database" value="MySQL"/>
      <!-- Tell the JPA provider to, by default, create the table if it does not exist. -->
      <property name="javax.persistence.schema-generation.database.action" value="create"/>
      <!-- No logging (For development change the value to "FINE") -->
      <property name="eclipselink.logging.level" value="OFF"/>
      <!-- Enable batch writing -->
      <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
      <!-- Batch size -->
      <property name="eclipselink.jdbc.batch-writing.size" value="500"/>

    </properties>
  </persistence-unit>
</persistence>

So my question is, why is it not batch inserting? I believe I've EclipseLink well configured to do so from what I've been reading around on EclipseLink website and here SO as well.

//////////////////////////// EDIT //////////////////////////

As suggested by Chris's answer, I changed in my EntityClassTest this value @GeneratedValue(strategy = GenerationType.IDENTITY) to @GeneratedValue(strategy = GenerationType.SEQUENCE) and re-run the test and the packets are being sent as before (like the image I posted above). So it didn't fix my problem I'm afraid.

//////////////////////////// EDIT 2 ////////////////////////

I've changed the logging level in the persistence.xml file to FINEST as shown next.

<property name="eclipselink.logging.level" value="FINEST"/>

And here is the log generated. I placed it in a pastebin because it is quite long.

http://pastebin.com/rKihCKMW

It seems to be calling Execute query InsertObjectQuery quite a lot of times.

//////////////////////////// EDIT 3 ////////////////////////

Here's the version for each component I'm using.

+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.6.12                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.6.12-log                   |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Win64                        |
+-------------------------+------------------------------+

Netbeans 8.0

EclipseLink (JPA 2.1)

mysql-connector-java-5.1.24.jar

//////////////////////////// EDIT 4 ////////////////////////

Following CuriousMind's answer I've edited my EntityClassTest id annotation to:

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO, generator="id-seq-gen")
    @SequenceGenerator( name="id-seq-gen", sequenceName="ID_SEQ_GEN", allocationSize=500 )
    private Long id;

But it didn't solve my problem, I'm still getting one single insert per packet (as described by the image above) and on the EclipseLink log I'm getting:

[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--SELECT LAST_INSERT_ID()
[EL Finest]: sequencing: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--assign sequence to the object (1.251 -> database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--Execute query InsertObjectQuery(database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--ClientSession(824177287)--Thread(Thread[main,5,main])--Execute query ValueReadQuery(name="ID_SEQ_GEN" sql="SELECT LAST_INSERT_ID()")
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES (?, ?)
    bind => [sfirosijfhgdoi 2068, dsufius1034]
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--SELECT LAST_INSERT_ID()
[EL Finest]: sequencing: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--assign sequence to the object (1.252 -> database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--UnitOfWork(1985011414)--Thread(Thread[main,5,main])--Execute query InsertObjectQuery(database.EntityClassTest [id=null ])
[EL Finest]: query: 2014-10-19 06:44:02.608--ClientSession(824177287)--Thread(Thread[main,5,main])--Execute query ValueReadQuery(name="ID_SEQ_GEN" sql="SELECT LAST_INSERT_ID()")
[EL Fine]: sql: 2014-10-19 06:44:02.608--ClientSession(824177287)--Connection(1674390738)--Thread(Thread[main,5,main])--INSERT INTO ENTITYCLASSTEST (LASTNAME, NAME) VALUES (?, ?)
    bind => [sfirosijfhgdoi 2244, dsufius1122]

And so on...


回答1:


You are using GenerationType.IDENTITY for sequencing, which requires retrieving the IDs from each insert statement one by one. Try a sequencing scheme that allows preallocation in batches of 500 and you will see improvements: http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Identity_sequencing




回答2:


It's been one year already, probably it's too late to answer. In my case, I found that flush() cause the problem. I call flush() after persist() for each record. This prevent batch writing to make an optimization and results in poor insert performance. After removing flush(), everything went well.




回答3:


It seems the sequence generation has been causing the issue, you refer to this post. It provides approach of preallocated sequence generation.




回答4:


This could be becuase of the sequence/indentity (need to use preallocation), but also for mysql I think you need a connection string property to allow bulk insert:

rewriteBatchedStatements=true

jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true


来源:https://stackoverflow.com/questions/26416847/eclipselink-batch-insert-very-very-slow

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!