Question
I have a function that will read a CSV file line by line. For each line, it will split the line into a vector. The code to do this is
std::stringstream ss(sText);
std::string item;
while(std::getline(ss, item, ','))
{
m_vecFields.push_back(item);
}
This works fine unless it reads a line whose last value is blank. For example,
text1,tex2,
I would want this to return a vector of size 3 where the third value is just empty. However, instead it just returns a vector of size 2. How can I correct this?
Answer 1:
bool addEmptyLine = !sText.empty() && sText.back() == ','; // back() on an empty string is undefined
/* your code here */
if (addEmptyLine) m_vecFields.push_back("");
or append a delimiter before splitting:
sText += ','; // "text1,tex2," becomes "text1,tex2,,"
/* your code */
assert(m_vecFields.size() == 3);
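The first variant can be wrapped into a self-contained function. This is only a sketch: splitLine and its delim parameter are hypothetical names that do not appear in the question, and treating an empty line as one empty field is an assumption you may want to change.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits one CSV line on
// `delim` and keeps a trailing empty field.
std::vector<std::string> splitLine(const std::string& sText, char delim = ',')
{
    std::vector<std::string> fields;
    std::stringstream ss(sText);
    std::string item;
    while (std::getline(ss, item, delim))
    {
        fields.push_back(item);
    }
    // std::getline fails once the stream is exhausted, so a trailing delimiter
    // (or an entirely empty line) yields no final empty field on its own.
    // Assumption: an empty line counts as a single empty field.
    if (sText.empty() || sText.back() == delim)
    {
        fields.push_back("");
    }
    return fields;
}
```

With this, splitLine("text1,tex2,") gives three fields, the last of which is empty, while a line without a trailing comma is split exactly as before.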
Answer 2:
You could just use boost::split to do all this for you.
http://www.boost.org/doc/libs/1_50_0/doc/html/string_algo/usage.html#id3207193
It has the behaviour that you require in one line.
Example boost::split code:
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
using namespace std;
int main()
{
vector<string> strs;
boost::split(strs, "please split,this,csv,,line,", boost::is_any_of(","));
for ( vector<string>::iterator it = strs.begin(); it != strs.end(); ++it )
cout << "\"" << *it << "\"" << endl;
return 0;
}
Results
"please split"
"this"
"csv"
""
"line"
""
Answer 3:
You can use a function similar to this:
template <class InIt, class OutIt>
void Split(InIt begin, InIt end, OutIt splits)
{
InIt current = begin;
while (begin != end)
{
if (*begin == ',')
{
*splits++ = std::string(current,begin);
current = ++begin;
}
else
++begin;
}
*splits++ = std::string(current,begin);
}
It iterates through the string and, whenever it encounters the delimiter, extracts the text since the previous delimiter and writes it to the splits output iterator.
The interesting part is
- when current == begin it will insert an empty string (test case: "text1,,tex2")
- the last insertion guarantees there will always be the correct number of elements.
If there is a trailing comma, current == begin at that point (the first bullet), so an empty string is added; otherwise the last element is added to the vector.
You can use it like this:
std::stringstream ss(sText);
std::string item;
std::vector<std::string> m_vecFields;
while(std::getline(ss, item))
{
Split(item.begin(), item.end(), std::back_inserter(m_vecFields));
}
std::for_each(m_vecFields.begin(), m_vecFields.end(), [](std::string& value)
{
std::cout << value << std::endl;
});
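To check the two cases from the bullet points in isolation, a minimal sketch follows. Split is repeated so the block compiles on its own; splitFields is a hypothetical wrapper that is not part of the answer.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Split as given in this answer, repeated so the sketch is self-contained.
template <class InIt, class OutIt>
void Split(InIt begin, InIt end, OutIt splits)
{
    InIt current = begin;
    while (begin != end)
    {
        if (*begin == ',')
        {
            *splits++ = std::string(current, begin); // field before the comma
            current = ++begin;                       // next field starts after it
        }
        else
            ++begin;
    }
    *splits++ = std::string(current, begin);         // last field, possibly empty
}

// Hypothetical convenience wrapper: returns the fields of one line.
std::vector<std::string> splitFields(const std::string& line)
{
    std::vector<std::string> fields;
    Split(line.begin(), line.end(), std::back_inserter(fields));
    return fields;
}
```

Here splitFields("text1,,tex2") produces an empty middle field and splitFields("text1,tex2,") produces an empty last field. An empty line produces a single empty field, which may or may not be what you want.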
Answer 4:
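A flexible solution for parsing CSV text that also handles quoted fields, where:
source - the CSV text to split (the function does not treat newlines specially, so pass one record at a time)
delimeter - the CSV delimiter, e.g. ',' or ';'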
std::vector<std::string> csv_split(std::string source, char delimeter) {
std::vector<std::string> ret;
std::string word = "";
bool inQuote = false;
for(std::size_t i=0; i<source.size(); ++i){
if(inQuote == false && source[i] == '"'){
inQuote = true; // opening quote
continue;
}
if(inQuote == true && source[i] == '"'){
// a doubled quote inside a quoted field is one literal quote
if(i + 1 < source.size() && source[i+1] == '"'){
++i;
} else {
inQuote = false;
continue;
}
}
if(inQuote == false && source[i] == delimeter){
ret.push_back(word);
word = "";
} else {
word += source[i];
}
}
ret.push_back(word);
return ret;
}
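As a rough illustration of how the quote handling behaves, here is a self-contained sketch. It is a copy of csv_split from this answer, written with an explicit bounds check before reading source[i + 1] and taking the string by const reference; both of those are my adjustments, not part of the original.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copy of csv_split from this answer so the sketch compiles on its own.
std::vector<std::string> csv_split(const std::string& source, char delimeter) {
    std::vector<std::string> ret;
    std::string word;
    bool inQuote = false;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (!inQuote && source[i] == '"') {
            inQuote = true;            // opening quote
            continue;
        }
        if (inQuote && source[i] == '"') {
            if (i + 1 < source.size() && source[i + 1] == '"') {
                ++i;                   // "" inside quotes: keep one literal quote
            } else {
                inQuote = false;       // closing quote
                continue;
            }
        }
        if (!inQuote && source[i] == delimeter) {
            ret.push_back(word);       // delimiter outside quotes ends the field
            word.clear();
        } else {
            word += source[i];
        }
    }
    ret.push_back(word);               // last field, possibly empty
    return ret;
}
```

For the input a,"b,c", (with a trailing comma) this yields the three fields a, b,c and an empty string, so the trailing blank value from the question is preserved and the quoted comma is not treated as a delimiter.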
Answer 5:
C++11 makes it exceedingly easy to handle even escaped commas using regex_token_iterator:
std::stringstream ss(sText);
std::string item;
const regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
std::getline(ss, item);
m_vecFields.insert(m_vecFields.end(), sregex_token_iterator(item.begin(), item.end(), re, 1), sregex_token_iterator());
Incidentally, if you simply wanted to construct a vector<string> from a CSV string such as item, you could just do:
const regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
vector<string> m_vecFields{sregex_token_iterator(item.begin(), item.end(), re, 1), sregex_token_iterator()};
Some quick explanation of the regex is probably in order. (?:[^\\\\,]|\\\\.) matches escaped characters or non-',' characters. (See here for more info: https://stackoverflow.com/a/7902016/2642059) The *? means that it is not a greedy match, so it will stop at the first ',' reached. All that's nested in a capture, which is selected by the last parameter, the 1, to regex_token_iterator. Finally, (?:,|$) will match either the ','-delimiter or the end of the string.
To make this standard CSV reader ignore empty elements, the regex can be altered to only match strings with at least one character.
const regex re{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
Notice the '+' has now replaced the '*', indicating that 1 or more matching characters are required. This prevents it from matching the empty field at the end of an item string that ends with a ','. You can see an example of this here: http://ideone.com/W4n44W
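The two patterns can be compared side by side with a small self-contained sketch. regexSplit and its keepEmpty flag are hypothetical names introduced here for illustration; the patterns themselves are the ones from this answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits one line with the pattern from this answer.
// keepEmpty == true uses the '*?' form (empty fields are captured),
// keepEmpty == false uses the '+?' form (empty fields are skipped).
std::vector<std::string> regexSplit(const std::string& item, bool keepEmpty)
{
    static const std::regex keep{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
    static const std::regex skip{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
    const std::regex& re = keepEmpty ? keep : skip;
    // Submatch 1 is the field without its trailing delimiter.
    return std::vector<std::string>(
        std::sregex_token_iterator(item.begin(), item.end(), re, 1),
        std::sregex_token_iterator());
}
```

For "text1,tex2," the '*?' form gives three fields with an empty last one, and the '+?' form gives two. It may be worth testing lines that do not end in a delimiter as well, since the '*?' pattern can also match the empty string at the end of the input.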
Source: https://stackoverflow.com/questions/11310947/splitting-a-line-of-a-csv-file-into-a-stdvector