regex

Parsing Interview Text

社会主义新天地 submitted on 2021-02-20 04:14:07
Question: I have a text file of a presidential debate. Eventually, I want to parse the text into a dataframe where each row is a statement, with one column for the speaker's name and another for the statement. For example, "Bob Smith: Hi Steve. How are you doing? Steve Brown: Hi Bob. I'm doing well!" would become:

   name          text
1  Bob Smith     Hi Steve. How are you doing?
2  Steve Brown   Hi Bob. I'm doing well!

Question: How do I split the statements from the names? I tried splitting on the colon: data
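
A minimal sketch, assuming every speaker label is one or more capitalized words (such as "Bob Smith") immediately followed by a colon. `split_statements` is a hypothetical helper. It relies on re.split with a capturing group, which keeps the captured names in the result list so they alternate with the statements:

```python
import re

# Assumption: a speaker label is capitalized words ("Bob Smith") followed by ":".
# The capturing group makes re.split keep the names in its output list.
SPEAKER = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*):\s*")

def split_statements(transcript):
    """Return a list of (name, text) pairs, one per statement."""
    parts = SPEAKER.split(transcript)
    # parts = [text before first label, name1, text1, name2, text2, ...]
    names = parts[1::2]
    texts = [t.strip() for t in parts[2::2]]
    return list(zip(names, texts))

rows = split_statements(
    "Bob Smith: Hi Steve. How are you doing? Steve Brown: Hi Bob. I'm doing well!"
)
for name, text in rows:
    print(name, "|", text)
```

If pandas is available, pd.DataFrame(rows, columns=["name", "text"]) builds the two-column frame. The pattern misfires when a statement ends in a capitalized word right before the next label (for example "...see you Monday Steve Brown:"), and all-caps labels would need different character classes.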

python regex to find accented words

不羁岁月 submitted on 2021-02-20 04:08:23
Question: Please, I need help. I have a problem finding accented words in a Spanish text. I have to search a large text for the first paragraph starting with the words 'Nombre vernáculo'. For example, the text looks like: "Nombre vernáculo registrado en la zona de ..." But accented words are not recognized by my Python script. I've tried: re.compile('/(?<!\p{L})(vern[áa]culo*)(?!\p{L})/') re.compile(r'Nombre vern[a\xc3\xa1]culo\.', re.UNICODE) re.compile ('[A-Z][a-záéíóúñ]+') \p{Lu}]
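
A possible explanation, offered as a sketch rather than a definitive answer. Python's built-in re module has no \p{L} class (compiling it raises "bad escape \p"), and the /.../ delimiters are not regex syntax in Python. In the second attempt, \xc3 and \xa1 inside a str pattern mean the two characters "Ã" and "¡", not the UTF-8 bytes of "á", and the trailing \. demands a period that the sample text does not have. Once the file is decoded to str, a literal á in the pattern matches. The sketch also normalizes to NFC in case the text stores á as "a" plus a combining accent. The helper names are hypothetical, and paragraphs are assumed to be separated by blank lines:

```python
import re
import unicodedata

# Python's re has no \p{L}; [^\W\d_] ("word character that is not a digit
# or underscore") is a common stand-in for "any letter".
LETTER_RUN = re.compile(r"[^\W\d_]+")
HEADER = re.compile(r"Nombre\s+vern[áa]culo\b")

def first_vernacular_paragraph(text):
    """Return the first paragraph starting with 'Nombre vernáculo', or None."""
    # NFC turns "a" + U+0301 (combining acute) into the single code point "á".
    text = unicodedata.normalize("NFC", text)
    for para in re.split(r"\n\s*\n", text):  # assumption: blank-line separated
        para = para.strip()
        if HEADER.match(para):
            return para
    return None

def accented_words(text):
    """Return the words that contain at least one non-ASCII letter."""
    text = unicodedata.normalize("NFC", text)
    return [w for w in LETTER_RUN.findall(text) if not w.isascii()]

sample = "Introducción.\n\nNombre vernáculo registrado en la zona de ..."
print(first_vernacular_paragraph(sample))
print(accented_words(sample))
```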

Calculate the string length in sed

心不动则不痛 submitted on 2021-02-20 02:40:32
Question: I was forced to calculate the string length in sed. The string is always a nonempty sequence of a's. sed -n ':c /a/! be; s/^a/1/; s/0a/1/; s/1a/2/; s/2a/3/; s/3a/4/; s/4a/5/; s/5a/6/; s/6a/7/; s/7a/8/; s/8a/9/; s/9a/a0/; /a/ bc; :e p' It's quite long :) So now I wonder whether this script can be rewritten more concisely using y or another sed command. I know that it is better to use awk or another tool; however, that is not the question here. Note that the sed script basically simulates
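
To make the script's behavior easier to follow, here is a Python model of its loop. It is a reading of the script, not a shorter sed answer. Each pass applies the question's substitutions in the same order, and re.sub with count=1 mirrors sed's s command without the g flag (only the leftmost match is replaced). A digit followed by a is incremented, and 9a becomes a0, which moves the a one place left as a carry, so the loop behaves like a decimal counter. `sed_length` is a hypothetical name:

```python
import re

# The substitutions from the question, in the same order as in the sed script:
# s/^a/1/, then s/0a/1/ ... s/8a/9/, then s/9a/a0/.
STEPS = [(r"^a", "1")] + [(f"{d}a", str(d + 1)) for d in range(9)] + [(r"9a", "a0")]

def sed_length(s):
    """Model of the sed loop: repeat the passes while any 'a' remains."""
    while "a" in s:  # mirrors ':c /a/! be' ... '/a/ bc'
        for pattern, repl in STEPS:
            # count=1: only the leftmost match, like sed's s/// without g
            s = re.sub(pattern, repl, s, count=1)
    return s

print(sed_length("a" * 12))  # → 12
```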

RegEx to find credit card numbers with embedded spaces

喜欢而已 submitted on 2021-02-20 00:43:46
Question: We currently have a content compliance rule in place whereby we monitor anything that contains a credit card number with no spaces (e.g. 5100080000000000). What we need is a regex that picks up credit card numbers entered with a space every 4 digits (e.g. 5100 0800 0000 0000). We've been looking at alternative regexes but have not yet found one that works for both scenarios above. The regex we currently use is below: ^((4\d{3})|(5[1-5]\d{2})|(6011)|(34\d{1})|(37\d{1}))-?\d{4}-?\d{4}
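
A sketch that accepts both forms, under stated assumptions: it only covers 16-digit numbers grouped 4-4-4-4 with the Visa (4), Mastercard (51 to 55) and Discover (6011) prefixes from the question's regex, and it requires the same separator (nothing, one space or one hyphen) between every group through the backreference \1. American Express numbers (15 digits, usually grouped 4-6-5) are left out, and there is no Luhn check. Digit lookarounds replace ^ so the pattern can match anywhere in a message. `find_card_numbers` is a hypothetical name:

```python
import re

# Assumption: 16-digit numbers in four groups of four, with the prefixes
# 4xxx, 51xx-55xx and 6011 from the question. ([ -]?) captures the first
# separator and \1 forces every later separator to be identical.
CARD = re.compile(
    r"(?<!\d)"                      # not preceded by another digit
    r"(?:4\d{3}|5[1-5]\d{2}|6011)"  # first group: issuer prefix
    r"([ -]?)"                      # separator: nothing, one space or one hyphen
    r"\d{4}\1\d{4}\1\d{4}"          # remaining groups, same separator each time
    r"(?!\d)"                       # not followed by another digit
)

def find_card_numbers(text):
    """Return every candidate card number found anywhere in text."""
    return [m.group(0) for m in CARD.finditer(text)]

print(find_card_numbers("a 5100080000000000 b 5100 0800 0000 0000 c"))
```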

Python Regex for Words & single space

不想你离开。 submitted on 2021-02-19 22:21:07
Question: I am using re.sub to forcibly convert a "bad" string into a "valid" string via regex. I am struggling to create the right regex to parse a string and "remove the bad parts". Specifically, I would like to force a string to be all alphabetical, allowing a single space between words. Anything that violates this rule I would like to substitute with ''. This includes multiple spaces. Any help would be appreciated! import re list_of_strings = ["3He2l2lo Wo45rld!",
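
A minimal two-pass sketch, assuming "alphabetical" means the ASCII letters A to Z and that runs of spaces should collapse to a single space rather than vanish (the question can be read either way). The first re.sub deletes everything that is not a letter or a space; the second collapses repeated spaces, which also repairs the double space left behind when something between two words is deleted. `clean` and `samples` are hypothetical names:

```python
import re

def clean(s):
    """Keep ASCII letters, with at most one space between words."""
    s = re.sub(r"[^A-Za-z ]+", "", s)  # drop digits, punctuation, tabs, etc.
    s = re.sub(r" {2,}", " ", s)       # collapse runs of spaces to one
    return s.strip()                   # no leading/trailing space

samples = ["3He2l2lo Wo45rld!", "  too   many   spaces  "]
print([clean(s) for s in samples])  # ['Hello World', 'too many spaces']
```

The order matters: running the space collapse first would leave "Hello 42 World" as "Hello  World" with two spaces.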

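Aspose.Words v21.2, a Word document development and processing tool, released! (with demos of new features)

旧街凉风 submitted on 2021-02-19 12:04:55
Aspose.Words for .NET is an advanced Word document processing API for performing a wide range of document management and manipulation tasks. The API supports generating, modifying, converting, rendering and printing documents in cross-platform applications without using Microsoft Word directly. The February 2021 update is here: Aspose.Words for .NET has been updated to v21.2! The main features are:
- Implemented an API for manipulating the theme properties of Font objects.
- Added an option to update the CreatedTime property on save.
- Extended SaveOptions with the new CustomTimeZoneInfo option.
- Extended the FindReplaceOptions class with the new SmartParagraphBreakReplacement option.
- Added the ability to load documents from an IStream object in COM applications.
You can download Aspose.Words for .NET v21.2 to try it out. Detailed changes (key | summary | category):
WORDSNET-21363 | Support dynamically adding combo box and drop-down list items in the LINQ Reporting Engine | New feature
WORDSNET-6146 | Allow extracting visible plain text from OLE objects | New feature
WORDSNET11848 | Add a save option to mimic, or not mimic, MS Word behavior for created, modified and printed dates | New feature
WORDSNET-6125 | Add an option to export images in the document as SVG in HTML | New feature
WORDSNET-10148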

How to query text-nodes from DOM, find markdown-patterns, replace matches with HTML-markup and replace the original text-node with the new content?

百般思念 submitted on 2021-02-19 09:22:12
Question: Markdown-like functionality for tooltips. Problem: Using vanilla JavaScript, I want to change this: <div> <p> Hello [world]{big round planet we live on}, how is it [going]{verb that means walking}? </p> <p> It is [fine]{a word that expresses gratitude}. </p> </div> To this: <div> <p> Hello <mark data-toggle="tooltip" data-placement="top" title="big round planet we live on">world</mark>, how is it <mark data-toggle="tooltip" data-placement="top" title="verb that means walking">going</mark>? </p> <p>
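
Only the pattern-and-replace step is regex work; walking the DOM text nodes and swapping in the new markup is browser-side JavaScript and is not shown here. For consistency with the other examples this sketch is in Python; the same pattern works with JavaScript's String.prototype.replace and a replacer function. It assumes the label contains no ] and the tooltip contains no }, and it HTML-escapes both before building the <mark> element, since text taken from a text node is unescaped. `to_tooltips` is a hypothetical name:

```python
import html
import re

# [label]{tooltip}; assumes no "]" inside the label and no "}" inside the tooltip.
MARKUP = re.compile(r"\[([^\]]+)\]\{([^}]+)\}")

def to_tooltips(text):
    """Replace every [label]{tooltip} with a <mark> element carrying the tooltip."""
    def repl(m):
        label = html.escape(m.group(1))
        title = html.escape(m.group(2), quote=True)  # safe inside title="..."
        return ('<mark data-toggle="tooltip" data-placement="top" '
                f'title="{title}">{label}</mark>')
    return MARKUP.sub(repl, text)

print(to_tooltips("Hello [world]{big round planet we live on}, how is it "
                  "[going]{verb that means walking}?"))
```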

Parse measurements (multiple dimensions) from a given string in Python 3

只愿长相守 submitted on 2021-02-19 08:30:06
Question: I'm aware of this post and this library, but they didn't help me with the specific cases below. How can I parse measurements like these? I have strings like: "Square 10 x 3 x 5 mm" "Round 23/22; 24,9 x 12,2 x 12,3" "Square 10x2" "Straight 10x2mm" I'm looking for a Python package or some other way to get results like the following: >>> a = amazing_parser.parse("Square 10 x 3 x 5 mm") >>> print(a) 10 x 3 x 5 mm Likewise: >>> a = amazing_parser.parse("Round 23/22; 24,9x12,2") >>> print(a) 24,9 x 12,2 I
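
amazing_parser in the question appears to be a placeholder rather than a real package. A regex-only sketch, assuming a measurement is two or more numbers (each with an optional "," or "." decimal part) joined by x, optionally followed by a mm or cm unit. It returns the first such run normalized to "a x b x c unit", or None; the normalized form for "Square 10x2" and "Straight 10x2mm" is an assumption, since the question does not show it. `parse_dimensions` is a hypothetical name:

```python
import re

NUM = r"\d+(?:[.,]\d+)?"  # 10, 24,9 or 24.9
DIMS = re.compile(rf"({NUM}(?:\s*[xX]\s*{NUM})+)(?:\s*(mm|cm))?")

def parse_dimensions(s):
    """Return the first 'a x b [x c] [unit]' run in s, normalized, or None."""
    m = DIMS.search(s)
    if m is None:
        return None
    numbers = re.split(r"\s*[xX]\s*", m.group(1))  # "24,9x12,2" -> ["24,9", "12,2"]
    result = " x ".join(numbers)
    if m.group(2):  # optional unit
        result += " " + m.group(2)
    return result

for s in ["Square 10 x 3 x 5 mm", "Round 23/22; 24,9x12,2",
          "Square 10x2", "Straight 10x2mm"]:
    print(parse_dimensions(s))
```

"23/22" is skipped because the pattern needs an x between numbers, so the first match in the "Round" string is the 24,9 run.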