How to separate paragraphs in a string

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-10 16:45:37

Question


Hi guys, I really need your help. I was trying to take a multi-line string consisting of a few paragraphs and split it into individual texts.

I realized that whenever I skip a line there is a sequence of \n\r in there. Afterwards I concluded that each new line starts with a \n and ends with a \r. Therefore, I wrote the following code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication15
{
   class Program
   {
    struct ParagraphInfo
    {
        public ParagraphInfo(string text)
        {
            int i;
            Text = text;
            i = text.IndexOf('.');
            FirstSentence = text.Substring(0, i);
        }

        public string Text, FirstSentence;
    }

    static void Main(string[] args)
    {
        int tmp = 0;
        int tmp1 = 0;
        string MultiParagraphString = @"AA.aa.

BB.bb.

CC.cc.

DD.dd.

EE.ee.";

        List<ParagraphInfo> Paragraphs = new List<ParagraphInfo>();

        Regex NewParagraphFinder = new Regex(@"[\n][\r]");
        MatchCollection NewParagraphMatches = NewParagraphFinder.Matches(MultiParagraphString);


        for (int i = 0; i < NewParagraphMatches.Count; i++)
        {
            if (i == 0)
            {
                Paragraphs.Add(new ParagraphInfo((MultiParagraphString.Substring(0, NewParagraphMatches[0].Index))));
            }
            else if (i == (NewParagraphMatches.Count - 1))
            {
                tmp = NewParagraphMatches[i].Index + 3;
                tmp1 = MultiParagraphString.Length - NewParagraphMatches[i].Index - 3;
                Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
            }
            else
            {
                tmp = NewParagraphMatches[i].Index + 3;
                tmp1 = NewParagraphMatches[i + 1].Index - NewParagraphMatches[i].Index+3;
                Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
            }
        }

        Console.WriteLine(MultiParagraphString);
        foreach (ParagraphInfo Paragraph in Paragraphs)
        {
            Console.WriteLine(Paragraph.Text);

        }


    }
}
}

When I printed each member of Paragraphs one after another, alongside the entire text, something rather bizarre appeared. The output of the Paragraphs list was this:

AA.aa.


CC.cc.

DD.


DD.dd.

EE.


EE.ee.


I cannot understand why this keeps happening, and moreover I cannot figure out why the output is so different each time.

Sorry if it's a mess, but I really need some help here. By the way, if anyone has a better way to do this, feel free to share.

Thanks


Answer 1:


You may try the following:

MultiParagraphString.Split(new [] {Environment.NewLine}, 
           StringSplitOptions.RemoveEmptyEntries);

That will return a string[] (an array, which can also be used as an IEnumerable<string>). If you want to turn the entries into your structs, just use Select:

MultiParagraphString.Split(new [] {Environment.NewLine}, 
           StringSplitOptions.RemoveEmptyEntries)
          .Select(s => new ParagraphInfo(s)).ToList();
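
Note that splitting on Environment.NewLine with RemoveEmptyEntries yields one entry per non-empty line, and it relies on the string using the platform's line ending. As a minimal sketch, assuming paragraphs are separated by at least one blank line and that the text may contain either \r\n or \n, a regular expression can split on the blank lines themselves. The names ParagraphSplitter and SplitParagraphs are hypothetical and not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ParagraphSplitter
{
    // Hypothetical helper: treats two or more consecutive line breaks
    // (either \r\n or \n, optionally followed by spaces or tabs) as a
    // paragraph separator.
    public static string[] SplitParagraphs(string text)
    {
        return Regex.Split(text, @"(?:\r?\n[ \t]*){2,}")
                    .Select(p => p.Trim())          // drop stray whitespace around each paragraph
                    .Where(p => p.Length > 0)       // skip empty pieces at the start or end
                    .ToArray();
    }

    static void Main()
    {
        // Mixed line endings on purpose: Windows-style, then Unix-style.
        string text = "AA.aa.\r\n\r\nBB.bb.\n\nCC.cc.";

        foreach (string paragraph in SplitParagraphs(text))
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

With the sample string above this prints AA.aa., BB.bb. and CC.cc. on separate lines. A paragraph that spans several lines stays in one piece, because a single line break is not treated as a separator.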



Answer 2:


 string text = richTextBox1.Text;

You can ignore paragraphs by using this:

text = text.Replace((char)10, ' ');

You can detect paragraphs by using this:

// Run this on the original text, before the Replace above removes the line feeds.
string[] words = text.Split(' ');
foreach (string word in words)
{
    // (char)10 is the line feed character '\n'.
    if (word.Contains((char)10))
    {
        MessageBox.Show("A paragraph is here (with brilliant English accent)");
    }
}

Note: This code works only when paragraphs are separated by the Enter key in the text.
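
The same idea can be tried outside a Windows Forms application. The following is only a rough sketch, assuming the text is available as a plain string and that every line feed (character code 10) marks a break; the names LineFeedFinder and FindLineFeeds are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class LineFeedFinder
{
    // Hypothetical helper: returns the index of every line feed ('\n', code 10).
    public static List<int> FindLineFeeds(string text)
    {
        var positions = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == (char)10)
            {
                positions.Add(i);
            }
        }
        return positions;
    }

    static void Main()
    {
        string text = "AA.aa.\r\n\r\nBB.bb.";

        foreach (int position in FindLineFeeds(text))
        {
            Console.WriteLine("Line feed at index " + position);
        }
    }
}
```

For this sample the line feeds are found at indexes 7 and 9, each one directly after a carriage return (code 13). That shows the order on Windows is \r\n rather than \n\r.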



Source: https://stackoverflow.com/questions/14564846/how-to-separate-paragraphs-in-a-string