Decoding/parsing CSV and CSV-like files in Swift

Submitted by 时光怂恿深爱的人放手 on 2020-06-01 07:38:07

Question


I'll have to write a very customised CSV-like parser/decoder. I have looked for open-source ones on GitHub but haven't found any that fit my needs. I can solve this, but my question is whether implementing it as a TopLevelDecoder in Swift would be a total violation of key/value decoding.

I have keys, but not exactly key/value pairs. In CSV files, there is instead a key for each column of data.

There are a number of problems with the files I need to parse:

  1. Commas are not only used to separate fields; some fields also contain commas. Example:
// If I convert to an array
struct Family {
    let name: String?
    let parents: [String?]
    let siblings: [String?]
}

In this example, both parents' names are in the same field and need to be converted into an array; the same goes for the Siblings field.

"Name", "Parents","Siblings"
"Danny", "Margaret, John","Mike, Jim, Jane"
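A minimal sketch of a quote-aware line splitter, assuming standard CSV quoting (a doubled "" inside a quoted field stands for one literal quote) and that whitespace around a field can be discarded. The function name splitCSVLine is hypothetical. It walks the line character by character and only treats the separator as a field boundary when it is outside double quotes, so "Margaret, John" stays one field:

```swift
import Foundation

/// Splits one CSV line into fields, honouring double quotes.
/// Separators inside quotes stay part of the field, and a doubled
/// quote ("") inside a quoted field becomes one literal quote.
/// Whitespace around each field is trimmed (an assumption of this sketch).
func splitCSVLine(_ line: String, separator: Character = ",") -> [String] {
    var fields: [String] = []
    var field = ""
    var inQuotes = false
    let chars = Array(line)
    var i = 0
    while i < chars.count {
        let ch = chars[i]
        if inQuotes {
            if ch == "\"" {
                if i + 1 < chars.count && chars[i + 1] == "\"" {
                    field.append("\"")   // escaped quote inside a quoted field
                    i += 1
                } else {
                    inQuotes = false     // closing quote
                }
            } else {
                field.append(ch)
            }
        } else if ch == "\"" {
            inQuotes = true              // opening quote
        } else if ch == separator {
            fields.append(field.trimmingCharacters(in: .whitespaces))
            field = ""
        } else {
            field.append(ch)
        }
        i += 1
    }
    fields.append(field.trimmingCharacters(in: .whitespaces))
    return fields
}
```

Passing separator: "\t" covers the tab-separated files. Turning a cell such as "Margaret, John" into an array is then a separate, per-column step.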

In the case of the parents, I could have split that into two fields in a struct like

struct Family {
    let name: String?
    let mother: String?
    let father: String?
}

but that doesn't work for the Siblings field, since there can be anywhere from zero to many siblings. Therefore I will have to use an array.

There are cases where I will split a field into two, though.
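For that per-column step, a minimal sketch: a hypothetical splitList helper turns a multi-value cell into an array (an empty cell gives zero siblings), and a hypothetical splitParents helper covers the cases where a cell is split into two fields instead. It assumes items are separated by commas and, for parents, that the mother is listed first, as in the "Margaret, John" example; the question does not say that order is guaranteed.

```swift
import Foundation

/// Splits a multi-value cell such as "Mike, Jim, Jane" into its items.
/// An empty cell gives an empty array, i.e. zero siblings.
func splitList(_ cell: String, separator: Character = ",") -> [String] {
    cell.split(separator: separator)
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

struct Parents {
    let mother: String?
    let father: String?
}

/// Splits a two-name cell into separate fields.
/// Assumption: the mother is listed first, as in "Margaret, John".
func splitParents(_ cell: String) -> Parents {
    let names = splitList(cell)
    return Parents(mother: names.first, father: names.dropFirst().first)
}
```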

  2. Not all the files I need to parse are strictly CSV. All of them contain tabular data (comma- or tab-separated), but some have a few rows of comments at the top (sometimes containing metadata) that I need to take into account. Those files have a .txt extension instead of .csv.
## File generated 2020-05-02
"Name", "Parents","Siblings"
"Danny", "Margaret, John","Mike, Jim, Jane"

Therefore I need to peek at the first line(s) to determine whether there are such comments; once those have been parsed, I can treat the rest of the file as CSV.
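A sketch of the peeking step, assuming comment lines start with "#" (the sample uses "##") and that blank lines before the header can be skipped. PreambleSplit and splitPreamble are hypothetical names. The comment text is kept so that metadata such as the generation date can be parsed afterwards, and the remaining lines start with the header row:

```swift
import Foundation

/// Leading comment lines and the tabular part of a file, kept apart.
struct PreambleSplit {
    let comments: [String]  // comment text with the markers removed
    let body: [String]      // header row first, then the data rows
}

/// Assumption: comment lines start with "#" (the sample uses "##"),
/// and blank lines before the header can be ignored.
func splitPreamble(_ text: String, marker: Character = "#") -> PreambleSplit {
    let lines = text.components(separatedBy: .newlines)
    let isPreamble: (String) -> Bool = { line in
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        return trimmed.isEmpty || trimmed.first == marker
    }
    // Count how many lines at the top belong to the preamble.
    let count = lines.prefix(while: isPreamble).count
    let comments = lines[..<count]
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
        .map { line in line.drop(while: { $0 == marker }).trimmingCharacters(in: .whitespaces) }
    // Drop empty lines (e.g. from "\r\n" line endings) in the tabular part.
    let body = lines[count...].filter { !$0.trimmingCharacters(in: .whitespaces).isEmpty }
    return PreambleSplit(comments: comments, body: body)
}
```

A file without comments comes back unchanged with an empty comments array, so the same entry point can serve both the .csv and the .txt variants.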

I plan to make it look like any other Decoder from the application's point of view, but internally my decoder can handle things as if they were key/value pairs, because there is just one set of keys: the first line of the file, if there are no comments at the beginning. I still want to use CodingKeys, though.
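To make the idea concrete, here is a minimal sketch of such a decoder under several assumptions: header and data rows have already been split into cells, every property maps to one column through its CodingKeys raw values, an empty cell decodes as nil, and a list-valued cell such as "Mike, Jim, Jane" is split on commas (only [String] is handled, not the question's [String?]). CSVRowDecoder, CSVRowContainer and decodeRows are hypothetical names, not an existing library. The keyed container simply looks up a header name, which suggests that a single header row fits Decoder's keyed model fairly naturally:

```swift
import Foundation

/// Hypothetical decoder for one CSV row. The header row supplies the keys,
/// so a type's CodingKeys are matched against column names.
struct CSVRowDecoder: Decoder {
    let headers: [String]
    let values: [String]
    var codingPath: [CodingKey] = []
    var userInfo: [CodingUserInfoKey: Any] = [:]

    func container<Key: CodingKey>(keyedBy type: Key.Type) throws -> KeyedDecodingContainer<Key> {
        // Pair each header with the cell in the same column (first wins on duplicates).
        let row = Dictionary(zip(headers, values), uniquingKeysWith: { first, _ in first })
        return KeyedDecodingContainer(CSVRowContainer<Key>(row: row, codingPath: codingPath))
    }

    // A row is only ever decoded through its header keys.
    private var notKeyed: DecodingError {
        .typeMismatch([String: String].self, .init(codingPath: codingPath, debugDescription: "A CSV row is keyed by its header"))
    }
    func unkeyedContainer() throws -> UnkeyedDecodingContainer { throw notKeyed }
    func singleValueContainer() throws -> SingleValueDecodingContainer { throw notKeyed }
}

struct CSVRowContainer<Key: CodingKey>: KeyedDecodingContainerProtocol {
    let row: [String: String]
    var codingPath: [CodingKey]

    var allKeys: [Key] { row.keys.compactMap { Key(stringValue: $0) } }
    func contains(_ key: Key) -> Bool { row[key.stringValue] != nil }

    // An empty cell counts as nil, so optional properties decode to nil.
    func decodeNil(forKey key: Key) throws -> Bool { row[key.stringValue]?.isEmpty ?? true }

    func decode(_ type: String.Type, forKey key: Key) throws -> String {
        guard let cell = row[key.stringValue] else {
            throw DecodingError.keyNotFound(key, .init(codingPath: codingPath, debugDescription: "No column named \(key.stringValue)"))
        }
        return cell
    }

    // Other scalars are parsed from the trimmed cell text.
    private func parse<T: LosslessStringConvertible>(_ type: T.Type, _ key: Key) throws -> T {
        let text = try decode(String.self, forKey: key).trimmingCharacters(in: .whitespaces)
        guard let value = T(text) else {
            throw DecodingError.dataCorruptedError(forKey: key, in: self, debugDescription: "Not a \(T.self): \(text)")
        }
        return value
    }
    func decode(_ type: Bool.Type, forKey key: Key) throws -> Bool { try parse(type, key) }
    func decode(_ type: Double.Type, forKey key: Key) throws -> Double { try parse(type, key) }
    func decode(_ type: Float.Type, forKey key: Key) throws -> Float { try parse(type, key) }
    func decode(_ type: Int.Type, forKey key: Key) throws -> Int { try parse(type, key) }
    func decode(_ type: Int8.Type, forKey key: Key) throws -> Int8 { try parse(type, key) }
    func decode(_ type: Int16.Type, forKey key: Key) throws -> Int16 { try parse(type, key) }
    func decode(_ type: Int32.Type, forKey key: Key) throws -> Int32 { try parse(type, key) }
    func decode(_ type: Int64.Type, forKey key: Key) throws -> Int64 { try parse(type, key) }
    func decode(_ type: UInt.Type, forKey key: Key) throws -> UInt { try parse(type, key) }
    func decode(_ type: UInt8.Type, forKey key: Key) throws -> UInt8 { try parse(type, key) }
    func decode(_ type: UInt16.Type, forKey key: Key) throws -> UInt16 { try parse(type, key) }
    func decode(_ type: UInt32.Type, forKey key: Key) throws -> UInt32 { try parse(type, key) }
    func decode(_ type: UInt64.Type, forKey key: Key) throws -> UInt64 { try parse(type, key) }

    // Assumption: a list-valued cell separates items with commas.
    // Only String and [String] are handled beyond the scalars above.
    func decode<T: Decodable>(_ type: T.Type, forKey key: Key) throws -> T {
        let text = try decode(String.self, forKey: key)
        if let string = text as? T { return string }
        let items = text.split(separator: ",").map { $0.trimmingCharacters(in: .whitespaces) }
        if let list = items as? T { return list }
        throw DecodingError.dataCorruptedError(forKey: key, in: self, debugDescription: "Unsupported type \(T.self)")
    }

    // CSV rows are flat, so nesting is rejected.
    private var flat: DecodingError {
        .dataCorrupted(.init(codingPath: codingPath, debugDescription: "CSV rows are flat"))
    }
    func nestedContainer<NestedKey: CodingKey>(keyedBy type: NestedKey.Type, forKey key: Key) throws -> KeyedDecodingContainer<NestedKey> { throw flat }
    func nestedUnkeyedContainer(forKey key: Key) throws -> UnkeyedDecodingContainer { throw flat }
    func superDecoder() throws -> Decoder { throw flat }
    func superDecoder(forKey key: Key) throws -> Decoder { throw flat }
}

/// Decodes every data row, using the header row for the keys.
func decodeRows<T: Decodable>(_ type: T.Type, header: [String], rows: [[String]]) throws -> [T] {
    try rows.map { try T(from: CSVRowDecoder(headers: header, values: $0)) }
}
```

For example, if Family declares name: String? and siblings: [String]? with CodingKeys raw values "Name" and "Siblings", then decodeRows(Family.self, header: ["Name", "Parents", "Siblings"], rows: [["Danny", "Margaret, John", "Mike, Jim, Jane"]]) yields one Family whose siblings are ["Mike", "Jim", "Jane"]; the Parents column is ignored because no key asks for it.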

What are your thoughts? Should I implement it as a decoder (actually a TopLevelDecoder in Swift), or would that be an abuse of the idea of key/value decoding? The alternative is to implement this as a parser, but I have to handle several types of files (JSON, GraphQL, CSV and CSV-like files), and I think my application code would be a lot simpler if I could use Decoders for all of them.
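One way to keep the application code uniform, sketched under the assumption that every decoder can work on the raw file Data: a small protocol (the name FileDecoder is hypothetical) with the same decode(_:from:) shape that JSONDecoder already has. Combine's TopLevelDecoder has a similar method but an associated Input type; this sketch drops it so that decoders for different formats can sit in one dictionary. A CSV decoder would conform by implementing the same method:

```swift
import Foundation

/// Hypothetical common interface for every file format the app reads.
/// Unlike Combine's TopLevelDecoder it has no associated Input type,
/// so decoders for different formats can be stored together.
protocol FileDecoder {
    func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T
}

// JSONDecoder already provides a method with exactly this signature.
extension JSONDecoder: FileDecoder {}

enum FileDecodingError: Error {
    case unsupportedExtension(String)
}

/// Picks a decoder by file extension; a CSV decoder conforming to
/// FileDecoder could be registered under "csv" and "txt".
func decodeFile<T: Decodable>(_ type: T.Type, from data: Data, fileExtension: String,
                              using registry: [String: any FileDecoder]) throws -> T {
    guard let decoder = registry[fileExtension.lowercased()] else {
        throw FileDecodingError.unsupportedExtension(fileExtension)
    }
    return try decoder.decode(type, from: data)
}
```

The application then calls decodeFile the same way for every format and never needs to know which concrete decoder was used.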

For JSON there's no problem, since Swift already has a JSON decoder. For GraphQL it's not a problem either, because I can write a decoder with an unkeyed container. The problem files are the CSV and CSV-like ones.

Some of them have everything in double quotes, both the "keys" in the CSV header and the values. Some have double quotes only for the keys, not for the values. Some have comma-separated fields, and some tab-separated. Some have commas within fields that need special handling. Some have comments at the beginning of the file that need to be skipped before parsing the rest of the file as CSV.

Some files have two fields in the first column. I have no influence whatsoever over the format of these files, so I just have to deal with it.
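A sketch of two small normalization helpers for those variations, assuming the separator can be guessed from the header line and that surrounding quotes carry no meaning once a cell has been split out. detectSeparator and normalizeCell are hypothetical names; the heuristic counts commas and tabs outside double quotes and picks the more frequent one:

```swift
import Foundation

/// Heuristic (an assumption): whichever of tab or comma occurs more often
/// outside double quotes in the header line is taken as the separator.
func detectSeparator(in headerLine: String) -> Character {
    var inQuotes = false
    var commas = 0
    var tabs = 0
    for ch in headerLine {
        switch ch {
        case "\"": inQuotes.toggle()
        case "," where !inQuotes: commas += 1
        case "\t" where !inQuotes: tabs += 1
        default: break
        }
    }
    return tabs > commas ? "\t" : ","
}

/// Normalizes a cell whether or not it was quoted: trims surrounding
/// whitespace, then removes one pair of enclosing double quotes.
func normalizeCell(_ raw: String) -> String {
    let trimmed = raw.trimmingCharacters(in: .whitespaces)
    guard trimmed.count >= 2, trimmed.hasPrefix("\""), trimmed.hasSuffix("\"") else { return trimmed }
    return String(trimmed.dropFirst().dropLast())
}
```

With this, "Name" quoted and Name unquoted end up as the same key, so one set of CodingKeys works for both kinds of file.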

If you wonder what files they are: they are raw DNA data files, files with DNA matches, and files with DNA segments I share with the people I match. There are quite a few slightly different formats, from several DNA testing companies. I wish they had all used JSON in a standard format, with the same keys across all the companies. But they all have different CSV headers, and other differences.
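Since each company uses its own header names, one hedged option is to rename headers to a single canonical set before decoding, so that one CodingKeys enum can serve all of them. The alias table below uses made-up header names purely for illustration; the real ones would come from each company's files, and canonicalHeaders is a hypothetical helper:

```swift
/// Hypothetical alias table mapping each company's header to one canonical
/// column name. These header names are made up for illustration.
let headerAliases: [String: String] = [
    "Full Name": "Name",
    "Match Name": "Name",
    "Sibling Names": "Siblings",
]

/// Renames known headers and leaves unknown ones unchanged, so a single
/// CodingKeys enum can match files from different companies.
func canonicalHeaders(_ headers: [String], aliases: [String: String]) -> [String] {
    headers.map { aliases[$0] ?? $0 }
}
```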

I also have to decode GEDCOM files, which sort of have key/value pairs too, but that format doesn't conform to pure key/value coding either.
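For comparison, GEDCOM is line-based: each line holds a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value, and a line at level n+1 belongs to the nearest preceding line at level n. That nesting is one reason it does not map onto flat key/value decoding. A minimal line parser, assuming well-formed, space-separated lines (GedcomLine and parseGedcomLine are hypothetical names):

```swift
import Foundation

/// One GEDCOM line: level, optional cross-reference ID, tag, optional value.
struct GedcomLine {
    let level: Int
    let xref: String?
    let tag: String
    let value: String?
}

/// Minimal parser for a single line such as "1 NAME John /Smith/".
/// Returns nil if the line has no numeric level or no tag.
func parseGedcomLine(_ line: String) -> GedcomLine? {
    var rest = Substring(line.trimmingCharacters(in: .whitespaces))
    guard let levelEnd = rest.firstIndex(of: " "), let level = Int(rest[..<levelEnd]) else { return nil }
    rest = rest[levelEnd...].drop(while: { $0 == " " })

    // An optional cross-reference ID such as @I1@ comes before the tag.
    var xref: String? = nil
    if rest.hasPrefix("@"), let idEnd = rest.firstIndex(of: " ") {
        xref = String(rest[..<idEnd])
        rest = rest[idEnd...].drop(while: { $0 == " " })
    }

    // The tag runs to the next space; anything after it is the value.
    let tagEnd = rest.firstIndex(of: " ") ?? rest.endIndex
    let tag = String(rest[..<tagEnd])
    guard !tag.isEmpty else { return nil }
    let value = rest[tagEnd...].drop(while: { $0 == " " })
    return GedcomLine(level: level, xref: xref, tag: tag, value: value.isEmpty ? nil : String(value))
}
```

Building records would then be a second pass that attaches each line to its parent by level, which is closer to a parser than to a keyed decoder.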

Also: I have searched for others with similar problems, but none were exactly the same, so I didn't want to hijack their threads. See, for example, this thread: Advice for going from CSV > JSON > Swift objects

That was more a question of how to convert from CSV to JSON and then to internal data structs in Swift. I know I can write a parser to solve this, but I think it would be more elegant to handle all these files with decoders, and I'd like your thoughts on that.

I was also thinking of making a new protocol:

protocol ColumnCodingKey: CodingKey {
}

I haven't decided yet what to put in the protocol, if anything. It might work to leave it empty, as in the example, and let my decoder conform to it; then it might not be a very big violation of key/value decoding.
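A sketch of how such a protocol could carry more than an empty marker, assuming the decoder wants per-column metadata. The requirement isListValued, its default, the FamilyKeys enum and the helper functions are all hypothetical. Because the protocol refines CodingKey, it is the key enums rather than the decoder that adopt it; a generic decoder can then check at runtime whether the keys it receives carry that information, and fall back to plain CodingKey behaviour otherwise:

```swift
/// Hypothetical refinement of CodingKey carrying per-column metadata.
protocol ColumnCodingKey: CodingKey {
    /// Whether the column holds several comma-separated values.
    var isListValued: Bool { get }
}

extension ColumnCodingKey {
    // Default: a column holds a single value.
    var isListValued: Bool { false }
}

// The key enum, not the decoder, adopts the protocol.
enum FamilyKeys: String, CodingKey, ColumnCodingKey {
    case name = "Name"
    case parents = "Parents"
    case siblings = "Siblings"

    var isListValued: Bool {
        switch self {
        case .parents, .siblings: return true
        case .name: return false
        }
    }
}

/// Inside a generic decoder: do these keys carry column metadata?
func isColumnKeyed<Key: CodingKey>(_ type: Key.Type) -> Bool {
    type is any ColumnCodingKey.Type
}

/// Reads the flag for one key, treating plain CodingKeys as single-valued.
func columnIsListValued<Key: CodingKey>(_ key: Key) -> Bool {
    (key as? any ColumnCodingKey)?.isListValued ?? false
}
```

Types that only declare ordinary CodingKeys keep working, so the decoder stays usable like any other Decoder.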

Thanks in advance!


Answer 1:


CSV files can be parsed using a regular expression, which might save you some time getting started. It's hard to know exactly what you need, because there seem to be many different scenarios, and there may be even more of them later.

A regular expression to parse one line of a CSV file might look something like this:
(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)

A detailed description of how this expression works, with a JavaScript sample, is in "Build a CSV parser".
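The expression above can be used from Swift through Foundation's NSRegularExpression, as sketched below; regexFields and unquote are hypothetical helper names. One caveat: the quoted-field alternative only matches when the quote is the first character after the separator, so with a space after the comma, as in the question's sample lines, a quoted field such as "Margaret, John" is split at its inner comma. Files like that would need the spaces removed first, or a character-by-character splitter instead.

```swift
import Foundation

// The answer's pattern as a Swift raw string, so its quotes need no escaping.
let csvFieldPattern = #"(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)"#

/// Returns the raw text of each field match in one CSV line.
/// Quoted fields keep their surrounding quotes at this stage.
func regexFields(in line: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: csvFieldPattern)
    let range = NSRange(line.startIndex..., in: line)
    return regex.matches(in: line, range: range).compactMap { match in
        Range(match.range, in: line).map { String(line[$0]) }
    }
}

/// Strips one pair of enclosing quotes and turns "" back into ".
func unquote(_ field: String) -> String {
    guard field.count >= 2, field.hasPrefix("\""), field.hasSuffix("\"") else { return field }
    return String(field.dropFirst().dropLast()).replacingOccurrences(of: "\"\"", with: "\"")
}
```

The regex is compiled on every call to keep the sketch short; for large files it would be created once and reused.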



Source: https://stackoverflow.com/questions/61578545/decoding-parsing-csv-and-csv-like-files-in-swift
