How to access and replace text in certain paragraphs using Open XML PowerTools, case by case

Submitted by 牧云@^-^@ on 2019-12-11 11:11:56

Question


I am trying to redact some Word files using C# and Open XML. I need to do a controlled replacement of numbers with a certain phrase. Each Word file contains a different amount of information. I want to use Open XML PowerTools for this purpose.

I first used the plain Open XML SDK to do the replacement, but it is very unreliable and throws seemingly random errors, such as a zero-length error. I then used a regex replace, which seems to work, but it replaces matches throughout the whole document, which is highly undesirable.

Here is a snippet of the code:

private void redact_Replaceall(string wfile)
{
    try
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(wfile, true))
        {
            // Load the main document part as LINQ to XML (PowerTools extension).
            var ydoc = doc.MainDocumentPart.GetXDocument();
            IEnumerable<XElement> content = ydoc.Descendants(W.body);

            // Matches one or more digits, a dot, then two or three digits.
            Regex regex = new Regex(@"\d+\.\d{2,3}");
            int count1 = OpenXmlPowerTools.OpenXmlRegex.Match(content, regex);

            // replace_text is defined elsewhere (not shown).
            int count2 = OpenXmlPowerTools.OpenXmlRegex.Replace(content, regex, replace_text, null);

            statusBar1.Text = "Try 1: Found: " + count1 + ", Replaced: " + count2;

            // Write the modified XML back to the part.
            doc.MainDocumentPart.PutXDocument();
        }
    }
    catch (Exception e)
    {
        MessageBox.Show("Replace all experienced error: " + e.Message);
    }
}

Basically, I want to do this redaction based on the content of each paragraph. I am able to get the paragraphs using the following, but not their IDs:

IEnumerable<XElement> content = ydoc.Descendants(W.p);
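
WordprocessingML does not require a paragraph to carry an id attribute, so one possible workaround is to address paragraphs by their position in document order. The following is only a minimal sketch using plain LINQ to XML; the class name ParagraphIndex and the method Number are hypothetical names made up for illustration, and the namespace constant is declared locally instead of relying on the PowerTools W class.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphIndex
{
    // WordprocessingML main namespace, declared here so the sketch is self-contained.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Pairs every w:p element under root with its zero-based position in document order.
    // Hypothetical helper: the position only stays valid while no paragraphs are added or removed.
    public static List<KeyValuePair<int, XElement>> Number(XContainer root)
    {
        return root.Descendants(W + "p")
            .Select((p, i) => new KeyValuePair<int, XElement>(i, p))
            .ToList();
    }
}
```

With the code above, ParagraphIndex.Number(ydoc) would give a list whose keys can be logged or shown next to each paragraph's text when deciding which ones to redact.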

Here is my approach using the plain Open XML SDK, but I get a lot of errors depending on the file.

// bod, first and second are defined elsewhere (not shown).
foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph para in bod.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>())
{
    foreach (var run in para.Elements<Run>())
    {
        foreach (var text in run.Elements<Text>())
        {
            string temp = text.Text;
            int firstlength = first.Length + 1;
            int secondlength = second.Length + 1;

            // Replace only when the text node is at most one character longer than the search term.
            if (text.Text.Contains(first) && !(temp.Length > firstlength))
            {
                text.Text = text.Text.Replace(first, "DELETED");
            }

            if (text.Text.Contains(second) && !(temp.Length > secondlength))
            {
                text.Text = text.Text.Replace(second, "DELETED");
            }
        }
    }
}

Here is my latest approach, but I am stuck on it:

private void redact_Replacebadones(string wfile)
{
    try
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(wfile, true))
        {
            var ydoc = doc.MainDocumentPart.GetXDocument();
            /*  from XElement xele in ydoc.Root.Elements();
            List<string> lhsElements = xele.Elements("lhs")
                       .Select(el => el.Attribute("id").Value)
                       .ToList();
            */
            IEnumerable<XElement> content = ydoc.Descendants(W.p);

            foreach (var p in content)
            {
                // Only touch paragraphs that mention "each" and are not already redacted.
                if (p.Value.Contains("each") && !p.Value.Contains("DELETED"))
                {
                    string to_overwrite = p.Value;
                    Regex regexop = new Regex(@"\d+\.\d{2,3}");

                    // Regex.Replace returns a new string, so the result must be assigned.
                    to_overwrite = regexop.Replace(to_overwrite, "Deleted");

                    // Note: SetValue replaces the paragraph's child elements (runs, properties) with plain text.
                    p.SetValue(to_overwrite);

                    MessageBox.Show("NAME :" + p.GetParagraphInfo() + " Value:" + to_overwrite);
                }
            }

            doc.MainDocumentPart.PutXDocument();
        }
    }
    catch (Exception e)
    {
        MessageBox.Show("Replace each experienced error: " + e.Message);
    }
}
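
One possible way to restrict the replacement to selected paragraphs while keeping their runs intact is to filter the w:p elements by their text and then rewrite only the w:t text nodes inside them. The following is a minimal sketch using plain LINQ to XML, not a definitive implementation: the class ParagraphRedactor and the method RedactWhere are hypothetical names, the namespace constant is declared locally, and a match that is split across several w:t elements is not handled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ParagraphRedactor
{
    // WordprocessingML main namespace, declared here so the sketch is self-contained.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces regex matches only inside paragraphs whose text contains keyword.
    // Returns the number of matches replaced. Hypothetical helper for illustration.
    public static int RedactWhere(XContainer root, string keyword, Regex pattern, string replacement)
    {
        int replaced = 0;

        // Materialize the selection first so the tree is not modified while it is being enumerated.
        var paragraphs = root.Descendants(W + "p")
            .Where(p => p.Value.Contains(keyword))
            .ToList();

        foreach (XElement p in paragraphs)
        {
            // Edit each text node in place; runs and their formatting stay untouched.
            foreach (XElement t in p.Descendants(W + "t").ToList())
            {
                replaced += pattern.Matches(t.Value).Count;
                t.Value = pattern.Replace(t.Value, replacement);
            }
        }

        return replaced;
    }
}
```

In the method above, this could be called as ParagraphRedactor.RedactWhere(ydoc, "each", regexop, "DELETED") just before doc.MainDocumentPart.PutXDocument(). Because the sketch works per text node, a number that Word has split across two runs would be missed.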

Source: https://stackoverflow.com/questions/37036159/how-to-access-and-replace-text-in-certain-paragraphs-using-openxml-powertools-ca
