How to parse a text file with C#

面向向阳花 2020-12-08 14:57

By text formatting I meant something more complicated.

At first I began manually adding the 5000 lines from the text file I'm asking this question about into my project…

7 answers
  • 2020-12-08 15:11

    Try regular expressions. You can find a certain pattern in your text and replace it with whatever you want. I can't give you the exact code right now, but you can test your expressions with this tool:

    http://www.radsoftware.com.au/regexdesigner/
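    A minimal C# sketch of the find-and-replace idea, using Regex.Replace. The class and method names are hypothetical, and NormalizeSeparators assumes no column contains spaces of its own (such a column would be split apart):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating Regex.Replace; not from the original answer.
public static class RegexReplaceSketch
{
    // Matches one or more whitespace characters (spaces, tabs, ...).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Assumes columns never contain spaces: every whitespace run becomes one tab,
    // so a later line.Split('\t') sees exactly one separator between columns.
    public static string NormalizeSeparators(string line)
    {
        // Trim first so leading/trailing whitespace does not turn into stray tabs.
        return Whitespace.Replace(line.Trim(), "\t");
    }

    // Swaps the first two tab-separated columns using the captured groups $1 and $2.
    public static string SwapFirstTwoColumns(string line)
    {
        return Regex.Replace(line, @"^([^\t]+)\t([^\t]+)", "$2\t$1");
    }
}
```

    The second method shows that the replacement string can refer back to captured groups ($1, $2), which is often all a reformatting job needs.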

  • 2020-12-08 15:12

    You could do something like:

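    using System;
    using System.IO;
    
    // Replace "filename.txt" with the path to your file.
    using (TextReader rdr = File.OpenText("filename.txt")) {
        string line;
        while ((line = rdr.ReadLine()) != null) {
            // THIS LINE DOES THE MAGIC: split the line on tab characters,
            // so fields[0] is the first column, fields[1] the second, and so on.
            string[] fields = line.Split('\t');
            int theInt = Convert.ToInt32(fields[1]);
        }
    }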
    

    The reason you didn't find relevant results when searching for 'formatting' is that the operation you are performing is called 'parsing'.
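    The snippet above assumes every line has at least two columns and that the second one is always a number; otherwise fields[1] or Convert.ToInt32 throws. A slightly more defensive sketch, with a hypothetical SecondColumnValues helper that skips short or non-numeric lines instead of failing:

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; a sketch, not the original answer's code.
public static class TabParsingSketch
{
    // Returns the integer in the second tab-separated column of each line,
    // skipping lines that have fewer than two columns or a non-numeric value there.
    public static List<int> SecondColumnValues(IEnumerable<string> lines)
    {
        var values = new List<int>();
        foreach (string line in lines)
        {
            string[] fields = line.Split('\t');
            if (fields.Length < 2)
                continue; // e.g. a blank line at the end of the file

            if (int.TryParse(fields[1], NumberStyles.Integer,
                             CultureInfo.InvariantCulture, out int value))
                values.Add(value);
        }
        return values;
    }
}
```

    It can be called as SecondColumnValues(File.ReadLines("filename.txt")), which reads the file lazily, one line at a time.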

  • 2020-12-08 15:14

    Another solution, this time making use of regular expressions:

    using System;
    using System.IO;
    using System.Text.RegularExpressions;
    
    ...
    
    Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");
    
    // File.OpenText is static; the using block closes the file when we're done.
    using (StreamReader reader = File.OpenText("filename.txt")) {
        string line;
        while ((line = reader.ReadLine()) != null) {
            Match match = parts.Match(line);
            if (match.Success) {
                // Captured groups are read through the Groups collection.
                int number = int.Parse(match.Groups[1].Value);
                string path = match.Groups[2].Value;
    
                // At this point, `number` and `path` contain the values we want
                // for the current line. We can then store those values or print them,
                // or anything else we like.
            }
        }
    }
    

    That expression's a little complex, so here it is broken down:

    ^        Start of string.
    \d+      "\d" means "digit", such as 0-9. The "+" means "one or more."
             So this means "one or more digits."
    \t       This matches a tab.
    (\d+)    This also matches one or more digits. This time, though, we capture it
             using parentheses. This means we can access it later as match.Groups[1].
    \t       Another tab.
    .+?      "." means "any character except a newline," so this is "one or more of
             anything." In addition, it's lazy. This stops it grabbing everything in
             sight: it only takes as much as it needs for the rest of the regex to match.
    \t       Another tab.
    
    (item\\[^\t]+\.ddj)
        Here's the meat. This captures (as match.Groups[2]) "item\<one or more of
        anything but a tab>.ddj". The "\\" matches a literal backslash, and "\." a
        literal dot.
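    To check the pattern against a sample line, here is the same regex wrapped in a hypothetical TryParse helper. The sample line used below is made up to fit the pattern; the real file's layout may differ:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the answer above.
public static class ItemLineSketch
{
    // Group 1 = second column (digits), group 2 = the item\...\.ddj path.
    private static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Returns true and fills number/path when the line matches; false otherwise.
    public static bool TryParse(string line, out int number, out string path)
    {
        Match match = Parts.Match(line);
        if (!match.Success)
        {
            number = 0;
            path = string.Empty;
            return false;
        }
        number = int.Parse(match.Groups[1].Value);
        path = match.Groups[2].Value;
        return true;
    }
}
```

    For example, TryParse("1\t42\tsome name\titem\\weapon\\sword.ddj", out int n, out string p) returns true with n = 42 and p = item\weapon\sword.ddj, while a line that does not start with digits returns false.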
    
  • 2020-12-08 15:14

    You could open the file up and use StreamReader.ReadLine to read the file in line-by-line. Then you can use String.Split to break each line into pieces (use a \t delimiter) to extract the second number.

    Because the number of items differs from line to line, you would need to search each line for the pattern 'item\*.ddj' rather than relying on a fixed column position.
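    If you would rather avoid regular expressions, a plain string search also works. A minimal sketch with a hypothetical FindItemPath helper; unlike the regex version it does not stop at tabs, so it assumes the first ".ddj" after "item\" always closes the path:

```csharp
using System;

// Hypothetical helper; a non-regex alternative, not from the original answer.
public static class ItemSearchSketch
{
    // Finds the first substring that starts with "item\" and ends with ".ddj",
    // wherever it sits on the line. Returns null if there is none.
    public static string FindItemPath(string line)
    {
        int start = line.IndexOf(@"item\", StringComparison.Ordinal);
        if (start < 0)
            return null;

        int end = line.IndexOf(".ddj", start, StringComparison.Ordinal);
        if (end < 0)
            return null;

        // Include the ".ddj" extension itself in the result.
        return line.Substring(start, end + ".ddj".Length - start);
    }
}
```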

    To delete an item you could (for example) keep all of the file's contents in memory and write out a new file when the user clicks 'Save'.
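    The delete-then-save approach might look like this minimal sketch. ItemFileSketch, RemoveItem and Save are hypothetical names, and identifying an item's line by a plain substring match (Contains) is an assumption:

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical class: keeps the whole file in memory and writes it out on demand.
public class ItemFileSketch
{
    private readonly List<string> lines;

    // Load every line into memory once.
    public ItemFileSketch(string path)
    {
        lines = new List<string>(File.ReadAllLines(path));
    }

    public int Count => lines.Count;

    // Remove every line that mentions the given item path; returns how many were removed.
    public int RemoveItem(string itemPath)
    {
        return lines.RemoveAll(l => l.Contains(itemPath));
    }

    // Write the in-memory contents to a new file (e.g. when the user clicks 'Save').
    public void Save(string path)
    {
        File.WriteAllLines(path, lines);
    }
}
```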

  • 2020-12-08 15:15

    One way that I've found really useful in situations like this is to go old-school and use the Jet OLEDB provider, together with a schema.ini file, to read large tab-delimited files in using ADO.NET. Obviously, this method is really only useful if you know the format of the file to be imported.
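    This approach relies on a schema.ini file sitting in the same directory as the data file. As a sketch, it could be generated from C# like this; SchemaIniSketch is a hypothetical name, and the column names and Jet types passed in are assumptions that must match your actual file:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper that writes a schema.ini for one tab-delimited file.
public static class SchemaIniSketch
{
    // Builds schema.ini text for a tab-delimited file whose first row holds column names.
    // columns: (name, Jet type) pairs, e.g. ("Number", "Long") or ("Path", "Text").
    public static string Build(string fileName, IList<(string Name, string Type)> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + fileName + "]");
        sb.AppendLine("Format=TabDelimited");
        sb.AppendLine("ColNameHeader=True");
        for (int i = 0; i < columns.Count; i++)
        {
            // Columns are numbered from 1: Col1, Col2, ...
            sb.AppendLine($"Col{i + 1}={columns[i].Name} {columns[i].Type}");
        }
        return sb.ToString();
    }

    // schema.ini must be stored in the same directory as the data file.
    public static void Write(string dataFilePath, IList<(string Name, string Type)> columns)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(dataFilePath));
        File.WriteAllText(Path.Combine(dir, "schema.ini"),
                          Build(Path.GetFileName(dataFilePath), columns));
    }
}
```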

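    using System.Data;
    using System.Data.OleDb;
    using System.IO;
    
    public void ImportCsvFile(string filename)
    {
        FileInfo file = new FileInfo(filename);
    
        // The text driver treats the directory as the "database" and each file as a table.
        using (OleDbConnection con = 
                new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
                file.DirectoryName + "\";" +
                "Extended Properties='text;HDR=Yes;FMT=TabDelimited';"))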
        {
            using (OleDbCommand cmd = new OleDbCommand(string.Format
                                      ("SELECT * FROM [{0}]", file.Name), con))
            {
                con.Open();
    
                // Using a DataReader to process the data
                using (OleDbDataReader reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Process the current reader entry...
                    }
                }
    
                // Using a DataTable to process the data
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable tbl = new DataTable("MyTable");
                    adp.Fill(tbl);
    
                    foreach (DataRow row in tbl.Rows)
                    {
                        // Process the current row...
                    }
                }
            }
        }
    } 
    

    Once you have the data in a convenient structure like a DataTable, filtering out the data you need becomes pretty trivial.
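    For example, DataTable.Select accepts a filter expression similar to a SQL WHERE clause. The sketch below builds a small in-memory table with hypothetical Number and Path columns standing in for the imported data:

```csharp
using System.Data;

// Hypothetical example of filtering a DataTable; column names are assumptions.
public static class DataTableFilterSketch
{
    // Builds a small in-memory table shaped like the imported file.
    public static DataTable Sample()
    {
        var tbl = new DataTable("MyTable");
        tbl.Columns.Add("Number", typeof(int));
        tbl.Columns.Add("Path", typeof(string));
        tbl.Rows.Add(10, @"item\a.ddj");
        tbl.Rows.Add(25, @"item\b.ddj");
        tbl.Rows.Add(40, "other.txt");
        return tbl;
    }

    // Returns rows whose Number exceeds the minimum and whose Path starts with "item".
    public static DataRow[] ItemsAbove(DataTable tbl, int minimum)
    {
        return tbl.Select($"Number > {minimum} AND Path LIKE 'item*'");
    }
}
```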

  • 2020-12-08 15:16

    As already mentioned, I would highly recommend using regular expressions (System.Text.RegularExpressions) to get this kind of job done.

    Combined with a solid tool like RegexBuddy, you can handle even complex text-record parsing and get results quickly. The tool makes it really easy.

    Hope that helps.
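    One more regex variation, sketched under the same assumptions about the file layout as the line-by-line version elsewhere in this thread: run the pattern over the whole file text at once with RegexOptions.Multiline (so ^ matches at the start of every line) and use named groups instead of numbered ones. NamedGroupSketch and ParseAll are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: parses every matching record in a block of text.
public static class NamedGroupSketch
{
    // Multiline: ^ matches at the start of each line.
    // [^\r\n] keeps the lazy middle part and the path from running into the next line.
    private static readonly Regex Record = new Regex(
        @"^\d+\t(?<number>\d+)\t[^\r\n]+?\t(?<path>item\\[^\t\r\n]+\.ddj)",
        RegexOptions.Multiline);

    public static List<(int Number, string Path)> ParseAll(string text)
    {
        var results = new List<(int Number, string Path)>();
        foreach (Match m in Record.Matches(text))
        {
            // Named groups are looked up by name instead of by position.
            results.Add((int.Parse(m.Groups["number"].Value), m.Groups["path"].Value));
        }
        return results;
    }
}
```

    The text could come from File.ReadAllText("filename.txt"); for very large files, the line-by-line approach keeps memory use lower.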
