将发送端sleep时间定为0, 即sleep(0).
事例率也在700Hz持续3分钟后降为0.查看log文件:
1. SFI 的log文件:
WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130745 misses 1 data fragment(s) from: ROS-Eth-07,
WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130749 misses 1 data fragment(s) from: ROS-Eth-07,
WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130753 misses 1 data fragment(s) from: ROS-Eth-07,
2. ROS 的log文件:
Warning could not load libEthClientSequentialReadoutModule.so, trying libROSEthClientSequentialReadoutModule.so
ERROR 2017-Feb-08 15:44:10 [virtual void ROS::RequestHandler::run(...) at ROSCore/src/RequestHandler.cpp:274] Unclassified error: RequestHandler caught exception, deleting Request
was caused by: ERROR 2017-Feb-08 15:44:10 [unknown at unknown:0] No pages available
找到ROSCore/src/RequestHandler.cpp:274:
catch (std::exception& issue) {
274 ENCAPSULATE_ROS_EXCEPTION(newIssue, CoreException, UNCLASSIFIED, issue, "RequestHandler caught exception, deleting Request ");
275 ers::error(newIssue);
276 deleteRequest();
277 }
先找到 ENCAPSULATE_ROS_EXCEPTION的定义:并没有看出啥,只看出来例外的类型是CoreException, errorCode是UNCLASSIFIED.
/** \def ENCAPSULATE_ROS_EXCEPTION(instanceName, exceptionClass, errorCode, oldException, messageContent)
Macro to encapsulate an exception inside a ROSException along with
some explanatory text.
*/
#define ENCAPSULATE_ROS_EXCEPTION(instanceName, exceptionClass, errorCode, oldException, messageContent) \
std::ostringstream instanceName##_tStream; \
instanceName##_tStream << messageContent; \
exceptionClass instanceName(oldException, exceptionClass::errorCode, instanceName##_tStream.str(), ERS_HERE)
#endif
然后再查看try里面的语句
243 try {
244 TS_RECORD(TS_H1,stamp_offset + 2100);
245 int requestOk;
246 //unsigned int tsdata = m_handlerId * 100 + 3000000000;
247 //TS_RECORD(TS_H5,tsdata);
248
249 requestOk = m_request->execute();
250 TS_RECORD(TS_H1,stamp_offset + 2200);
251
252 if (requestOk == Request::REQUEST_OK) {
253 m_requestsHandled++;
254 deleteRequest();
255 TS_RECORD(TS_H1, stamp_offset + 2300);
256 }
257 else if (requestOk == Request::REQUEST_TIMEOUT) {
258 m_requestsTimedOut++;
259 deleteRequest();
260 TS_RECORD(TS_H1, stamp_offset + 2300);
261 }
262 else {
263 TS_RECORD(TS_H1, stamp_offset + 2400);
264 DEBUG_TEXT(DFDB_ROSCORE, 15, "RequestHandler(" << m_handlerId <<")::run: Request non completed: will be requeued.");
265 m_requestsFailed++;
266 m_request = m_requestQueue->swap(m_request);
267 DFThread::yieldOrCancel();
268 TS_RECORD(TS_H1, stamp_offset + 2500);
269 }
270
271 TS_RECORD(TS_H1, stamp_offset + 2990);
272 }
整个try...catch语句是在requesthandler的run函数里面,run函数是requesthandler的执行线程,从request queue里面拿出一个request, 执行这个request,然后再删除这个request.
通过加打印得到:try语句里执行了第一个if语句,证明request被执行后返回了REQUEST_OK,在执行if内部的语句deleteRequest();时,出现了例外。
再查看deleteRequest()函数里面写了什么:
193 void RequestHandler::deleteRequest()
194 {
195 if (m_request != 0) {
197 m_mutex->lock() ; // Memory deallocation has to be protected as
198 delete(m_request); // it is thread-unsafe!
199 m_request=0;
200 m_mutex->unlock();
201 }
202 }
在 if 里面每条语句里面加打印,在requesthandler的析构函数里面加打印,并查看log文件
void RequestHandler::deleteRequest()
194 {
195 if (m_request != 0) {
196 std::cout << "m_request is not 0:1!" << std::endl;
197 m_mutex->lock() ; // Memory deallocation has to be protected as
198 std::cout << "m_request is not 0:2!" << std::endl;
199 delete(m_request); // it is thread-unsafe!
200 std::cout << "m_request is not 0:3!" << std::endl;
201 m_request=0;
202 std::cout << "m_request is not 0:4!" << std::endl;
203 m_mutex->unlock();
204 std::cout << "m_request is not 0:5!" << std::endl;
205 }
206 }
查看ROS-Eth-04的log文件,发现:
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:1" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261133
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:2" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239
[lhaaso@cmm03node01 part_dk_ef]$ grep "~Request" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
130624
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:3" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:4" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:5" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261238
这几条语句的执行次数,发现deleteRequest操作的次数比其前后打印的次数要少很多,证明有很多次delteRequest并没有成功,抛出例外,并导致程序报错了。
(批注:实际上是Request有两个子类,FragmentRequest 和 ReleaseFragment类, RequestHandler执行的时候根据收到SFI的消息种类判断执行哪一种Request。)
来源:https://www.cnblogs.com/zengtx/p/6378547.html