问题
I have a strange problem with my server application. My system is simple: I have 1+ devices and one server app that communicate over a network. Protocol has binary packets with variable length, but fixed header (that contain info about current packet size). Example of packet:
char pct[maxSize] = {}
pct[0] = 0x5a //preambule
pct[1] = 0xa5 //preambule
pct[2] = 0x07 //packet size
pct[3] = 0x0A //command
... [payload]
The protocol is built on the principle of a command-answer.
I use boost::asio for communication - io_service with thread pull (4 threads) + async read/write operation (code example below) and create a "query cycle" - each 200ms by timer:
- query one value from device
- get result, query second value
- get result, start timer again
This work very well on boost 1.53 (Debug and Release). But then i switch to boost 1.54 (especially in Release mode) magic begins. My server successfuly starts, connects to device and starts "query cycle". For about 30-60 seconds everything work well (I receive data, data is correct), but then I start receive asio::error on last read handle (always in one place). Error type: EOF. After recieving the error, I must disconnect from device.
Some time of googling give me info about EOF indicate that other side (device in my case) initiated disconnect procedure. But, according to the logic of the device it can not be true. May somebody explain what's going on? May be i need set some socket option or defines? I see two possible reason:
- my side init disconnect (with some reason, that i don't know) and EOF is answer of this action.
- some socket timeout firing.
My environment:
- OS: Windows 7/8
- Compiler: MSVC 2012 Update 3
Sample code of main "query cycle". Is adapted from official boost chat example All code simplified for reduce space :)
- SocketWorker - low level wrapper for sockets
- DeviceWorker - class for device communication
- ERes - internal struct for error store
- ProtoCmd and ProtoAnswer - wrapper for raw array command and answer (chat_message analog from boost chat example)
- lw_service_proto namespace - predefined commands and max sizes of packets
So, code samples. Socket wrapper:
namespace b = boost;
namespace ba = boost::asio;
typedef b::function<void(const ProtoAnswer answ)> DataReceiverType;
class SocketWorker
{
private:
typedef ba::ip::tcp::socket socketType;
typedef std::unique_ptr<socketType> socketPtrType;
socketPtrType devSocket;
ProtoCmd sendCmd;
ProtoAnswer rcvAnsw;
//[other definitions]
public:
//---------------------------------------------------------------------------
ERes SocketWorker::Connect(/*[connect settings]*/)
{
ERes res(LGS_RESULT_ERROR, "Connect to device - Unknow Error");
using namespace boost::asio::ip;
boost::system::error_code sock_error;
//try to connect
devSocket->connect(tcp::endpoint(address::from_string(/*[connect settings ip]*/), /*[connect settings port]*/), sock_error);
if(sock_error.value() > 0) {
//[work with error]
devSocket->close();
}
else {
//[res code ok]
}
return res;
}
//---------------------------------------------------------------------------
ERes SocketWorker::Disconnect()
{
if (devSocket->is_open())
{
boost::system::error_code ec;
devSocket->shutdown(bi::tcp::socket::shutdown_send, ec);
devSocket->close();
}
return ERes(LGS_RESULT_OK, "OK");
}
//---------------------------------------------------------------------------
//query any cmd
void SocketWorker::QueryCommand(const ProtoCmd cmd, DataReceiverType dataClb)
{
sendCmd = std::move(cmd); //store command
if (sendCmd .CommandLength() > 0)
{
ba::async_write(*devSocket.get(), ba::buffer(sendCmd.Data(), sendCmd.Length()),
b::bind(&SocketWorker::HandleSocketWrite,
this, ba::placeholders::error, dataClb));
}
else
{
cerr << "Send command error: nothing to send" << endl;
}
}
//---------------------------------------------------------------------------
// boost socket handlers
void SocketWorker::HandleSocketWrite(const b::system::error_code& error,
DataReceiverType dataClb)
{
if (error)
{
cerr << "Send cmd error: " << error.message() << endl;
//[send error to other place]
return;
}
//start reading header of answer (lw_service_proto::headerSize == 3 bytes)
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Data(), lw_service_proto::headerSize),
b::bind(&SocketWorker::HandleSockReadHeader,
this, ba::placeholders::error, dataClb));
}
//---------------------------------------------------------------------------
//handler for read header
void SocketWorker::HandleSockReadHeader(const b::system::error_code& error, DataReceiverType dataClb)
{
if (error)
{
//[error working]
return;
}
//decode header (check preambule and get full packet size) and read answer payload
if (rcvAnsw.DecodeHeaderAndGetCmdSize())
{
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Answer(), rcvAnsw.AnswerLength()),
b::bind(&SocketWorker::HandleSockReadBody,
this, ba::placeholders::error, dataClb));
}
}
//---------------------------------------------------------------------------
//handler for andwer payload
void SocketWorker::HandleSockReadBody(const b::system::error_code& error, DataReceiverType dataClb)
{
//if no error - send anwser to 'master'
if (!error){
if (dataClb != nullptr)
dataClb(rcvAnsw);
}
else{
//[error process]
//here i got EOF in release mode
}
}
};
Device worker
class DeviceWorker
{
private:
const static int LW_QUERY_TIME = 200;
LWDeviceSocketWorker sockWorker;
ba::io_service& timerIOService;
typedef std::shared_ptr<ba::deadline_timer> TimerPtr;
TimerPtr queryTimer;
bool queryCycleWorking;
//[other definitions]
public:
ERes DeviceWorker::Connect()
{
ERes intRes = sockWorker.Connect(/*[connect settings here]*/);
if(intRes != LGS_RESULT_OK) {
//[set result to error]
}
else {
//[set result to success]
//start "query cycle"
StartNewCycleQuery();
}
return intRes;
}
//---------------------------------------------------------------------------
ERes DeviceWorker::Disconnect()
{
return sockWorker.Disconnect();
}
//---------------------------------------------------------------------------
void DeviceWorker::StartNewCycleQuery()
{
queryCycleWorking = true;
//start timer
queryTimer = make_shared<ba::deadline_timer>(timerIOService, bt::milliseconds(LW_QUERY_TIME));
queryTimer->async_wait(boost::bind(&DeviceWorker::HandleQueryTimer,
this, boost::asio::placeholders::error));
}
//---------------------------------------------------------------------------
void DeviceWorker::StopCycleQuery()
{
//kill timer
if (queryTimer)
queryTimer->cancel();
queryCycleWorking = false;
}
//---------------------------------------------------------------------------
//timer handler
void DeviceWorker::HandleQueryTimer(const b::system::error_code& error)
{
if (!error)
{
ProtoCmd cmd;
//query for first value
cmd.EncodeCommandCore(lw_service_proto::cmdGetAlarm, 1);
sockWorker.QueryCommand(cmd, boost::bind(&DeviceWorker::ReceiveAlarmCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive first value
void DeviceWorker::ReceiveAlarmCycle(ProtoAnswer adata)
{
//check and fix last bytes (remove \r\n from some commands)
adata.CheckAndFixFooter();
//[working with answer]
if (queryCycleWorking)
{
//query for second value
ProtoCmd cmd;
cmd.EncodeCommandCore(lw_service_proto::cmdGetEnergyLevel, 1);
sockWorker.QueryCommand(cmd, b::bind(&DeviceWorker::ReceiveEnergyCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive second value
void DeviceWorker::ReceiveEnergyCycle(ProtoAnswer edata)
{
//check and fix last bytes (remove \r\n from some commands)
edata.CheckAndFixFooter();
//[working with second value]
//start new "query cycle"
if (queryCycleWorking)
StartNewCycleQuery();
}
};
Any ideas are welcome :)
edit: After several test I see anower picture:
- this issue reproduce on boost 1.54 only (Debug and Release mode, Release - much more faster), with boost 1.53 no more error (maybe i poorly clean my code then rebuild first times....)
- with boost 1.54 and 1 thread (instead of 4) all work well
I also spend some time with debugger and boost source and making some conclusion:
- When i receive EOF my data is already fully received.
- This EOF indicate that is nothing to transfer in this operation, i.e. socket result flag is 0 (no error), but boost operation flag if EOF (transfer bytes == 0)
At this moment I am forced to switch on boost 1.53...
回答1:
I had the exact same problem and I am quite sure that this is a bug of boost::asio 1.54.0
Here is the bug report.
The solution is effectively to get back to 1.53, although there is a patch available for 1.54 in the bug report page.
回答2:
If your application works fine with a single thread invoking io_service::run()
but fails with four threads, you very likely have a race condition. This type of problem is difficult to diagnose. Generally speaking you should ensure your devSocket
has at most one outstanding async_read()
and async_write()
operation. Your current implementation of SocketWorker::QueryCommand()
unconditionally invokes async_write()
which may violate the ordering assumption documented as such
This operation is implemented in terms of zero or more calls to the stream's
async_write_some
function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such asasync_write
, the stream'sasync_write_some
function, or any other composed operations that perform writes) until this operation completes.
The classic solution to this problem is to maintain a queue of outgoing messages. If a previous write is outstanding, append the next outgoing message to the queue. When the previous write completes, initiate the async_write()
for the next message in the queue. When using multiple threads invoking io_service::run()
you may need to use a strand as the linked answer does.
来源:https://stackoverflow.com/questions/17653366/eof-in-boostasync-read-with-thread-pull-and-boost-1-54