Question
I have implemented Cuong's solution here: C# Processing Fixed Width Files
Here is my code:
var lines = File.ReadAllLines(@fileFull);
var widthList = lines.First().GroupBy(c => c)
    .Select(g => g.Count())
    .ToList();
var list = new List<KeyValuePair<int, int>>();
int startIndex = 0;
for (int i = 0; i < widthList.Count(); i++)
{
    var pair = new KeyValuePair<int, int>(startIndex, widthList[i]);
    list.Add(pair);
    startIndex += widthList[i];
}
var csvLines = lines.Select(line => string.Join(",",
    list.Select(pair => line.Substring(pair.Key, pair.Value))));
File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);
(Here @fileFull holds the file's full path and name.)
The issue I have is the first line of the input file also contains digits. So it could be AAAAAABBC111111111DD2EEEEEE etc. For some reason the output from Cuong's code gives me CSV headings like 1111RRRR and 222223333.
Does anyone know why this is and how I would fix it?
Header row example:
AAAAAAAAAAAAAAAABBBBBBBBBBCCCCCCCCDEFCCCCCCCCCGGGGGGGGHHHHHHHHIJJJJJJJJKKKKLLLLMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPPPPQQQQ1111RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR222222222333333333444444444555555555666666666777777777888888888999999999S00001111TTTTTTTTTTTTUVWXYZ!"£$$$$$$%&
Converted header row:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCCDEFCCCCCC C C C GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111RRRR RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR2222 222223333 333334444 444445555 555556666 666667777 777778888 888889999 99999S000 0 1111 TTTTTTTTTTTT U V W X Y Z ! ",�,$$$$$$,%,&,"
Jodrell - I implemented your suggestion but the header output is like:
BBBBBBBBBBCCCCCC CCCCCCCCD DEFCCCC GGGGGGGG HHHHHHH IJJJJJJ KKKKLLL LLL MMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPPQQQQ1111RRRRRRRRRRRRRRRRR QQQ 111 RRR 33333333 44444444 55555555 66666666 77777777 88888888 99999999 S0000111 111 TTT UVWXYZ!"�$$ %&
Answer 1:
As Jodrell already mentioned, your code doesn't work because it assumes that the character representing each column header is distinct. Changing the code that parses the header widths fixes it.
Replace:
var widthList = lines.First().GroupBy(c => c)
    .Select(g => g.Count())
    .ToList();
With:
var widthList = new List<int>();
var header = lines.First().ToArray();
for (int i = 0; i < header.Length; i++)
{
    // Start a new column whenever the character differs from the previous one.
    if (i == 0 || header[i] != header[i-1])
        widthList.Add(0);
    widthList[widthList.Count-1]++;
}
Parsed header columns:
AAAAAAAAAAAAAAAA BBBBBBBBBB CCCCCCCC D E F CCCCCCCCC GGGGGGGG HHHHHHHH I JJJJJJJJ KKKK LLLL MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO PPPP QQQQ 1111 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 222222222 333333333 444444444 555555555 666666666 777777777 888888888 999999999 S 0000 1111 TTTTTTTTTTTT U V W X Y Z ! " £ $$$$$$ % &
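To make the difference concrete, here is a small sketch that runs both width calculations on a shortened header. The class and method names (HeaderWidthDemo, WidthsByGroupBy, WidthsByRuns) and the sample header are made up for illustration; only the two loop bodies come from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderWidthDemo
{
    // Counts every occurrence of each character, wherever it appears
    // in the header (the approach from the question).
    public static List<int> WidthsByGroupBy(string header)
    {
        return header.GroupBy(c => c).Select(g => g.Count()).ToList();
    }

    // Counts only adjacent repeats, so a character that reappears
    // later in the header starts a new column.
    public static List<int> WidthsByRuns(string header)
    {
        var widths = new List<int>();
        for (int i = 0; i < header.Length; i++)
        {
            if (i == 0 || header[i] != header[i - 1])
                widths.Add(0);
            widths[widths.Count - 1]++;
        }
        return widths;
    }
}
```

For the hypothetical header QQQQ1111RR00001111TT, WidthsByGroupBy returns 4, 8, 2, 4, 2 (the two blocks of 1s are merged into a single width of 8), while WidthsByRuns returns 4, 4, 2, 4, 4, 2, which matches the actual column layout.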
Answer 2:
EDIT
Because the problem annoyed me, I wrote some code that handles " and ,. This code replaces the header row with comma-delimited alternating zeros and ones. Any commas or double quotes in the body are appropriately escaped.
// Needs: System, System.Collections.Generic, System.IO, System.Linq, System.Text
static void FixedToCsv(string sourceFile)
{
    if (sourceFile == null)
    {
        throw new ArgumentNullException("sourceFile");
    }
    var dir = Path.GetDirectoryName(sourceFile);
    var destFile = string.Format(
        "{0}{1}",
        Path.GetFileNameWithoutExtension(sourceFile),
        ".csv");
    if (dir != null)
    {
        destFile = Path.Combine(dir, destFile);
    }
    if (File.Exists(destFile))
    {
        throw new IOException("Destination file already exists: " + destFile);
    }
    // Each pair holds the (start index, width) of one column.
    var blocks = new List<KeyValuePair<int, int>>();
    using (var output = File.OpenWrite(destFile))
    {
        using (var input = File.OpenText(sourceFile))
        {
            var outputLine = new StringBuilder();
            // Make header
            var header = input.ReadLine();
            if (header == null)
            {
                return;
            }
            var even = false;
            var lastc = header.First();
            var counter = 0;
            var blockCounter = 0;
            foreach (var c in header)
            {
                counter++;
                if (c == lastc)
                {
                    blockCounter++;
                }
                else
                {
                    // The run that just ended starts blockCounter
                    // characters before the current one.
                    blocks.Add(new KeyValuePair<int, int>(
                        counter - blockCounter - 1,
                        blockCounter));
                    blockCounter = 1;
                    outputLine.Append(',');
                    even = !even;
                }
                outputLine.Append(even ? '1' : '0');
                lastc = c;
            }
            // Record the final run.
            blocks.Add(new KeyValuePair<int, int>(
                counter - blockCounter,
                blockCounter));
            outputLine.AppendLine();
            var lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
            outputLine.Clear();
            output.Write(lineBytes, 0, lineBytes.Length);
            // Process Body
            var inputLine = input.ReadLine();
            while (inputLine != null)
            {
                foreach (var block in blocks.Select(b =>
                    inputLine.Substring(b.Key, b.Value)))
                {
                    var sanitisedBlock = block;
                    if (block.Contains(',') || block.Contains('"'))
                    {
                        // Quote the field and double any embedded quotes.
                        sanitisedBlock = string.Format(
                            "\"{0}\"",
                            block.Replace("\"", "\"\""));
                    }
                    outputLine.Append(sanitisedBlock);
                    outputLine.Append(',');
                }
                // Drop the trailing comma.
                outputLine.Remove(outputLine.Length - 1, 1);
                outputLine.AppendLine();
                lineBytes = Encoding.UTF8.GetBytes(outputLine.ToString());
                outputLine.Clear();
                output.Write(lineBytes, 0, lineBytes.Length);
                inputLine = input.ReadLine();
            }
        }
    }
}
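The quoting rule applied to each block in the body loop can be looked at in isolation. This is only a sketch: CsvFieldDemo and Escape are hypothetical names, and the rule covers commas and double quotes only, on the assumption that a field taken from a single line never contains a line break.

```csharp
using System;

public static class CsvFieldDemo
{
    // Wraps the field in double quotes only when it contains a comma
    // or a double quote, doubling any embedded double quotes.
    public static string Escape(string field)
    {
        if (field.IndexOf(',') < 0 && field.IndexOf('"') < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

With this rule, the field a,b is written as "a,b", the field say "hi" is written as "say ""hi""", and a field without either character is written unchanged.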
1 is repeated in your header row, so your two fours get counted as one eight and everything goes wrong from there.
(There is a block of four 1s after the Qs and another block of four 1s after the 0s.)
Essentially, your header row is invalid or, at least, doesn't work with the proposed solution.
Okay, you could do something like this.
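public void FixedToCsv(string fullFile)
{
    var lines = File.ReadAllLines(fullFile);
    var firstLine = lines.First();
    // Each pair holds the (start index, width) of one column.
    var widths = new List<KeyValuePair<int, int>>();
    var innerCounter = 0; // width of the current run
    var outerCounter = 0; // position in the header
    var firstLineChars = firstLine.ToCharArray();
    var lastChar = firstLineChars[0];
    foreach (var c in firstLineChars)
    {
        if (c == lastChar)
        {
            innerCounter++;
        }
        else
        {
            // The run that just ended starts innerCounter characters back.
            widths.Add(new KeyValuePair<int, int>(
                outerCounter - innerCounter,
                innerCounter));
            innerCounter = 1;
            lastChar = c;
        }
        outerCounter++;
    }
    // Record the final run, which the loop never closes.
    widths.Add(new KeyValuePair<int, int>(
        outerCounter - innerCounter,
        innerCounter));
    var csvLines = lines.Select(line => string.Join(",",
        widths.Select(pair => line.Substring(pair.Key, pair.Value))));
    // Get filePath and fileName from fullFile.
    var filePath = Path.GetDirectoryName(fullFile) ?? string.Empty;
    var fileName = Path.GetFileNameWithoutExtension(fullFile);
    File.WriteAllLines(Path.Combine(filePath, fileName + ".csv"), csvLines);
}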
Source: https://stackoverflow.com/questions/12778173/c-sharp-processing-fixed-width-files-solution-not-working