Analyzing Connection Closed Exception in Spring/JPA/Mysql/Tomcat app

Posted by 风格不统一 on 2019-11-28 23:32:31

This is woefully late for the OP, but maybe it will help someone else in the future:

I ran into something similar in a production environment with long-running batch jobs. The problem arises when your code holds a connection longer than the time specified by this property:

name="removeAbandonedTimeout" value="60"

and you have enabled:

<property name="removeAbandoned" value="true" />

then the connection will be closed mid-processing once those 60 seconds are up. One possible workaround (which didn't work for me) is to enable the interceptor:

jdbcInterceptors="ResetAbandonedTimer"

This resets the abandoned timer for that connection on every read/write that occurs. Unfortunately, in my case the processing would sometimes still take longer than the timeout before anything was read from or written to the database. So I was forced to either bump the timeout length or disable removeAbandoned (I chose the former).
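To make the timing argument concrete, here is a minimal sketch of the idea. It is not Tomcat's implementation: the class AbandonTimerSketch and its methods are hypothetical names, and it ignores the fact that a real pool only checks for abandoned connections periodically. It only shows why resetting the timer on every read/write does not help when the gap between two database operations is itself longer than the timeout.

```java
import java.time.Duration;

/** Hypothetical model of an abandoned-connection timer; not Tomcat's actual code. */
final class AbandonTimerSketch {
    private final long timeoutMillis;
    private long lastResetMillis;

    AbandonTimerSketch(Duration timeout, long borrowedAtMillis) {
        this.timeoutMillis = timeout.toMillis();
        this.lastResetMillis = borrowedAtMillis;
    }

    /** Called on every read/write, mimicking the reset behaviour described above. */
    void onDatabaseOperation(long nowMillis) {
        lastResetMillis = nowMillis;
    }

    /** True once the connection has gone longer than the timeout without a reset. */
    boolean isAbandoned(long nowMillis) {
        return nowMillis - lastResetMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // 60 second timeout, connection borrowed at t=0
        AbandonTimerSketch timer = new AbandonTimerSketch(Duration.ofSeconds(60), 0L);
        timer.onDatabaseOperation(30_000L);              // a query at t=30s resets the timer
        System.out.println(timer.isAbandoned(80_000L));  // 50s since last operation: false
        System.out.println(timer.isAbandoned(95_000L));  // 65s of in-memory processing: true
    }
}
```

In other words, the interceptor only helps if no single stretch of non-database work exceeds the timeout; otherwise the timeout itself has to grow.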

Hope this helps someone else if they run into something similar!

Vahid

I was recently asked to investigate why a production system sometimes goes down. I wanted to share my findings, since it takes a correlation of events for a JVM Tomcat app with the JDBC issues outlined above to actually crash. This uses MySQL as the backend, so it is probably most useful for that scenario, but if the issue is hit on another platform the cause is likely to be the same.

Simply getting a connection closed error does not imply the application is broken.

This is a Grails application, but it is relevant to all JVM-based apps:

tomcat/context.xml DB configuration; notice the very small DB pool and removeAbandonedTimeout="10" (yes, we want things to break):

<Resource
 name="jdbc/TestDB"  auth="Container" type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/test"
              username="XXXX"
              password="XXXX"
              testOnBorrow="true"
              testWhileIdle="true"
              testOnReturn="true"
              factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
              removeAbandoned="true"
              logAbandoned="true"
              removeAbandonedTimeout="10"
              maxWait="5000"
              initialSize="1"
              maxActive="2"
              maxIdle="2"
              minIdle="2"
              validationQuery="Select 1" />

A Quartz job that runs every minute (not that it matters; I think the app dies on the first attempt):

class Test2Job {
    static triggers = {
        cron name: 'test2', cronExpression: "0 0/1 * * * ?"
    }

    def testerService

    def execute() {
        println "starting job2 ${new Date()}"
        testerService.basicTest3()
    }
}

Now our testerService, with comments, so please follow the comments:

def dataSource

  /**
   * When this method is used in Quartz, all the JDBC settings appear to be ignored:
   * the job actually completes. Notice the huge sleep times compared to basicTest.
   * Strange and very different behaviour.
   * If I add Tester t = Tester.get(1L) and then execute the query below, I get
   * a connection pool closed error.
   * @return
   */
  def basicTest2() {
      int i=1
      while (i<21) {
          def sql = new Sql(dataSource)
          def query="""select id as id  from tester t
                  where id=:id"""
          def instanceList = sql.rows(query,[id:i as Long],[timeout:90])
          sleep(11000)
          println "-- working on ${i}"
          def sql1 = new Sql(dataSource)
          sql1.executeUpdate(
                  "update tester t set t.name=? where t.id=?",
                  ['aa '+i.toString()+' aa', i as Long])

          i++
          sleep(11000)
      }
      println "run ${i} completed"
  }


  /**
   * This is the oddity described above:
   * if this method is called instead, you will see connection closed issues.
   */
  def basicTest3() {
      int i=1
      while (i<21) {
          def t = Tester.get(i)
          println "--->>>> test3 t ${t.id}"

          /**
           * APP CRASHER - this is vital and the most important part.
           * Without this call there are lots of closed connections, yet the app
           * keeps working absolutely fine.
           * The test was originally based on execRun(), which returns 6650 records or something.
           * That test query returns in time and does not appear to crash the app.
           *
           * The moment this method is called (please check what it is currently doing), it simply
           * runs a huge query which goes beyond the timeout values and as explained in previous emails MYSQL states
           *
           * The app is then unresponsive and the logs clearly show the application is broken.
           */
          execRun2()


          def sql1 = new Sql(dataSource)
          sleep(10000)
          sql1.executeUpdate("update tester t set t.name=? where t.id=?",['aa '+i.toString()+' aa', t.id])
          sleep(10000)
          i++
      }

  }


  def execRun2() {
      def query="""select new map (t as tester) from Tester t left join t.children c
left join t.children c
                  left join c.childrena childrena
                  left join childrena.childrenb childrenb
                  left join childrenb.childrenc childrenc , Tester t2 left join t2.children c2 left join t2.children c2
                  left join c2.childrena children2a
                  left join children2a.childrenb children2b
                  left join children2b.childrenc children2c
             where ((c.name like (:name) or
                  childrena.name like (:name) or
                  childrenb.name like (:name) or (childrenc is null or childrenc.name like (:name))) or
                  (
                  c2.name like (:name) or
                  children2a.name like (:name) or
                  children2b.name like (:name) or (children2c is null or children2c.name like (:name))
      ))

          """
      //println "query $query"
      def results = Tester.executeQuery(query,[name:'aa'+'%'],[timeout:90])
      println "Records: ${results.size()}"

      return results
  }


  /**
   * This is no different from basicTest2, and yet
   * it throws a connection closed error. Notice the sleep is 20, not 20000.
   * A connection closed error is thrown almost instantly when a .get is used,
   * whereas sql = new Sql(..) is a manual connection.
   */
  def basicTest() {
      int i=1
      while (i<21) {
          def t = Tester.get(i)
          println "--- t ${t.id}"
          sleep(20)
          //println "publishing event ${event}"
          //new Thread({
          //    def event=new PurchaseOrderPaymentEvent(t,t.id)
          //    publishEvent(event)
          //} as Runnable ).start()

          i++
      }
  }

It only happens when the query takes longer than the expected time, but there has to be another element: the query itself has to keep sitting on MySQL even though it has been killed on the app side. MySQL keeps chewing away at it.

I think what is going on is:

 job 1  - hits app -> hits mysql ->    (9/10 left)
          {timeout} -> app killed  -> mysql running (9/10)
 job 2  - hits app -> hits mysql ->    (8/10 left)
          {timeout} -> app killed  -> mysql running (8/10)
 .....
 job 10 - hits app -> hits mysql ->    (0/10 left)
          {timeout} -> app killed  -> mysql running (0/10)
 job 11 - hits app ->

If job 1 has not completed by this time, nothing is left in the pool and the app is simply broken: JDBC errors are thrown, and so on. It does not matter if the query completes after the crash.
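The theory above can be modelled in a few lines. This is only a sketch of my reading of it, not a reproduction of the real pool: the class PoolExhaustionSketch is a hypothetical name, a Semaphore stands in for the connection pool, and the key assumption is that a slot is never handed back while its query is still running on MySQL.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of the pool running dry; a Semaphore stands in for the pool. */
final class PoolExhaustionSketch {

    /**
     * Returns the 1-based number of the first job that cannot get a connection,
     * or -1 if every job gets one.
     */
    static int firstStarvedJob(int poolSize, int jobs) {
        Semaphore pool = new Semaphore(poolSize);
        for (int job = 1; job <= jobs; job++) {
            // Each job borrows a slot. Assumption: it is never released, because
            // the query behind it is still being processed by MySQL.
            if (!pool.tryAcquire()) {
                return job;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Pool of 10, as in the diagram above: job 11 is the first to find it empty.
        System.out.println(firstStarvedJob(10, 11)); // prints 11
    }
}
```

With the maxActive="2" from the context.xml above, the same model runs dry on the third job, which would fit the app appearing to die almost immediately.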

You can monitor what is going on by checking MySQL. The queries appeared to run for longer than the timeout, which goes against what they have suggested this value should be doing, but then again maybe this isn't really related to any of this and the problem lies elsewhere.

Whilst testing, I noticed two states in the output of show processlist, Sending data and Sending to client:

| Id  | User | Host            | db   | Command | Time | State             | Info                                                                                                 |
|  92 | root | localhost:58462 | test | Query   |   80 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  95 | root | localhost:58468 | test | Query   |  207 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  96 | root | localhost:58470 | test | Query   |  147 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  97 | root | localhost:58472 | test | Query   |  267 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  98 | root | localhost:58474 | test | Sleep   |   18 |                   | NULL                                                                                                 |
|  99 | root | localhost:58476 | test | Query   |  384 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 100 | root | localhost:58478 | test | Query   |  327 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |

Seconds later:

|  91 | root | localhost:58460 | test | Query   |   67 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  92 | root | localhost:58462 | test | Query   |  148 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  97 | root | localhost:58472 | test | Query   |  335 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 100 | root | localhost:58478 | test | Query   |  395 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |

Seconds after that (all dead):

|  58 | root | localhost       | NULL | Query   |    0 | starting | show processlist |
|  93 | root | localhost:58464 | test | Sleep   |  167 |          | NULL             |
|  94 | root | localhost:58466 | test | Sleep   |  238 |          | NULL             |
|  98 | root | localhost:58474 | test | Sleep   |   74 |          | NULL             |
| 101 | root | localhost:58498 | test | Sleep   |   52 |          | NULL             |

It may be worth writing a script to monitor the process list, perhaps capturing a deeper result set with the exact queries running, to work out which of your queries or events is killing your app.
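As a starting point for such a script, here is a minimal sketch that takes rows in the pipe-delimited format shown above and picks out the connection ids whose Command is Query and whose Time is at or above a threshold. The class ProcessListSketch and the 90 second threshold are assumptions for illustration; how the rows are fetched (the mysql client, JDBC, a cron job) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that flags long-running queries in show processlist output. */
final class ProcessListSketch {

    /** Returns the ids of rows whose Command is Query and whose Time >= thresholdSeconds. */
    static List<Long> longRunningQueryIds(List<String> rows, long thresholdSeconds) {
        List<Long> ids = new ArrayList<>();
        for (String row : rows) {
            // Index 0 is the empty text before the leading pipe, so the columns are
            // 1=Id, 2=User, 3=Host, 4=db, 5=Command, 6=Time, 7=State, 8=Info.
            String[] cols = row.split("\\|");
            if (cols.length < 7) {
                continue; // separator line or otherwise not a data row
            }
            long id;
            long timeSeconds;
            try {
                id = Long.parseLong(cols[1].trim());
                timeSeconds = Long.parseLong(cols[6].trim());
            } catch (NumberFormatException e) {
                continue; // header row
            }
            if (cols[5].trim().equals("Query") && timeSeconds >= thresholdSeconds) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Rows taken from the output above, with the Info column shortened.
        List<String> rows = Arrays.asList(
            "|  92 | root | localhost:58462 | test | Query   |   80 | Sending data      | select ... |",
            "|  95 | root | localhost:58468 | test | Query   |  207 | Sending to client | select ... |",
            "|  98 | root | localhost:58474 | test | Sleep   |   18 |                   | NULL       |");
        System.out.println(longRunningQueryIds(rows, 90L)); // prints [95]
    }
}
```

Logging the flagged ids together with the Info column over time should show which query keeps running after the app has given up on it.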

The code uses a GenericDao which is extended in every Dao class. The GenericDao uses Spring's JpaTemplate to fetch an EntityManager instance, which in turn is used for all DB operations. My understanding is that using the JpaTemplate handles the nitty-gritty of closing DB connections internally.

This is probably the root of your problem. You shouldn't use the JpaTemplate to get the EntityManager, as this will give you an unmanaged EntityManager. In fact, you shouldn't be using JpaTemplate at all.

It is recommended to write DAOs based on the plain EntityManager API and simply inject the EntityManager as you normally would (with @PersistenceContext).

If you really want to use the JpaTemplate, use the execute method and pass in a JpaCallback, which will give you a managed EntityManager.

Also make sure that you have set up transactions correctly. Without proper transaction setup, connections will not be closed, as Spring doesn't know that it should close the connection.
