How to parse a text file with C#

后端未结

关注

 7  1911

面向向阳花

By text formatting I meant something more complicated.

At first I began manually adding the 5000 lines from the text file I\'m asking this question for,into my proje

相关标签:

7条回答

南方客

2020-12-08 15:11

Try regular expressions. You can find a certain pattern in your text and replace it with something that you want. I can't give you the exact code right now but you can test out your expressions using this.

http://www.radsoftware.com.au/regexdesigner/

0 讨论(0)
发布评论:

提交评论
- 加载中...
隐瞒了意图╮

2020-12-08 15:12
You could do something like:
```
using (TextReader rdr = OpenYourFile()) {
    string line;
    while ((line = rdr.ReadLine()) != null) {
        string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
        int theInt = Convert.ToInt32(fields[1]);
    }
}
```
The reason you didn't find relevant result when searching for 'formatting' is that the operation you are performing is called 'parsing'.
0 讨论(0)
发布评论:

提交评论
- 加载中...

执念已碎

2020-12-08 15:14

Another solution, this time making use of regular expressions:

using System.Text.RegularExpressions;

...

Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

StreamReader reader = FileInfo.OpenText("filename.txt");
string line;
while ((line = reader.ReadLine()) != null) {
    Match match = parts.Match(line);
    if (match.Success) {
        int number = int.Parse(match.Group(1).Value);
        string path = match.Group(2).Value;

        // At this point, `number` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like.
    }
}

That expression's a little complex, so here it is broken down:

^        Start of string
\d+      "\d" means "digit" - 0-9. The "+" means "one or more."
         So this means "one or more digits."
\t       This matches a tab.
(\d+)    This also matches one or more digits. This time, though, we capture it
         using brackets. This means we can access it using the Group method.
\t       Another tab.
.+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
         This is to stop it grabbing everything in sight - it'll only grab as much
         as it needs to for the regex to work.
\t       Another tab.

(item\\[^\t]+\.ddj)
    Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"

0 讨论(0)

渐次进展

2020-12-08 15:14

You could open the file up and use StreamReader.ReadLine to read the file in line-by-line. Then you can use String.Split to break each line into pieces (use a \t delimiter) to extract the second number.

As the number of items is different you would need to search the string for the pattern 'item\*.ddj'.

To delete an item you could (for example) keep all of the file's contents in memory and write out a new file when the user clicks 'Save'.

0 讨论(0)
发布评论:

提交评论
- 加载中...

青春惊慌失措

2020-12-08 15:15

One way that I've found really useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file to read large tab-delimited files in using ADO.Net. Obviously, this method is really only useful if you know the format of the file to be imported.

public void ImportCsvFile(string filename)
{
    FileInfo file = new FileInfo(filename);

    using (OleDbConnection con = 
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
            file.DirectoryName + "\";
            Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format
                                  ("SELECT * FROM [{0}]", file.Name), con))
        {
            con.Open();

            // Using a DataReader to process the data
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Process the current reader entry...
                }
            }

            // Using a DataTable to process the data
            using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
            {
                DataTable tbl = new DataTable("MyTable");
                adp.Fill(tbl);

                foreach (DataRow row in tbl.Rows)
                {
                    // Process the current row...
                }
            }
        }
    }
}

Once you have the data in a nice format like a datatable, filtering out the data you need becomes pretty trivial.

0 讨论(0)

星月不相逢

2020-12-08 15:16

Like it's already mentioned, I would highly recommend using regular expression (in System.Text) to get this kind of job done.

In combo with a solid tool like RegexBuddy, you are looking at handling any complex text record parsing situations, as well as getting results quickly. The tool makes it real easy.

Hope that helps.

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页