Splitting string to parts with specific pattern and conditions

百般思念 submitted on 2020-01-03 16:58:47

Question


I have an array of 5k+ strings like the ones below, output by a certain application (for security reasons I cannot provide the exact data, but the example format closely matches the actual data):

kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad
asdfasdf66sdafsdfsdf4560sdfasdfasdf
sdfaasdfs96sadfasdf65459asdfasdf
sadfasdf8asdfasdas06666654asdfasdfsd
fasdjfsdjfhgasdf6456sadfasdfasdf9sdfasdfsadf

Put simply, each entry is an unbroken alphanumeric string that consists of 5 parts:

[latin letters][1 or more digits][latin letters][1 or more digits][latin letters]

The length of the letter parts, as well as the number of digits, is random, and the overall string length may vary from a few to 200-300 characters, but the pattern is always as above.

In practice I'm only interested in the leading and trailing parts: the middle [1 or more digits][latin letters][1 or more digits] can simply be thrown away, but the other two parts should be extracted to separate cells.
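The five-part layout maps directly onto a regular expression with two capture groups, one for each part worth keeping. As an illustration outside Excel, here is a minimal Python sketch, assuming every part uses only the ASCII letters A-Z/a-z and the digits 0-9; the names `FIVE_PARTS` and `split_outer` are hypothetical.

```python
import re
from typing import Optional, Tuple

# [letters][digits][letters][digits][letters]; only the outer letter runs are captured.
FIVE_PARTS = re.compile(r"([A-Za-z]+)[0-9]+[A-Za-z]+[0-9]+([A-Za-z]+)")

def split_outer(s: str) -> Optional[Tuple[str, str]]:
    """Return (leading, trailing) letters, or None if s does not fit the pattern."""
    m = FIVE_PARTS.fullmatch(s.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_outer("asdfasdf66sdafsdfsdf4560sdfasdfasdf"))  # ('asdfasdf', 'sdfasdfasdf')
```

Using `fullmatch` means a line that strays from the pattern yields None instead of a partial split, which makes malformed rows easy to spot among thousands of entries.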

I tried the SUBSTITUTE and SEARCH functions, but I still cannot handle the random number of digits. VBA is my least preferred approach, but it is acceptable if pure formulas can't do the job. Moreover, the solution should be flexible enough for possible future use with similar patterns, so any guidance or general approach will be appreciated.
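Since the solution should be reusable for similar patterns, one general approach is to keep the pattern as a parameter and export the captured groups to CSV, which Excel can open with one group per cell. This is a minimal Python sketch, not an Excel feature; `extract_rows`, `write_csv` and `OUTER_LETTERS` are hypothetical names, and an in-memory buffer stands in for a real output file.

```python
import csv
import io
import re
from typing import Iterable, List, TextIO

def extract_rows(lines: Iterable[str], pattern: str) -> List[list]:
    """One row per line: the pattern's capture groups, or the raw line if it does not match."""
    regex = re.compile(pattern)
    rows = []
    for line in lines:
        line = line.strip()
        m = regex.fullmatch(line)
        rows.append(list(m.groups()) if m else [line])
    return rows

def write_csv(rows: List[list], out: TextIO) -> None:
    """Write rows as CSV so each captured group lands in its own cell when opened in Excel."""
    csv.writer(out).writerows(rows)

# Only the regex changes for a new layout; this one keeps the leading and trailing letters.
OUTER_LETTERS = r"([A-Za-z]+)[0-9]+[A-Za-z]+[0-9]+([A-Za-z]+)"
sample = ["asdfasdf66sdafsdfsdf4560sdfasdfasdf", "sdfaasdfs96sadfasdf65459asdfasdf"]
buf = io.StringIO()
write_csv(extract_rows(sample, OUTER_LETTERS), buf)
print(buf.getvalue())
```

For a real file, pass a file opened with `open("parts.csv", "w", newline="")` instead of the buffer. Depending on regional settings, Excel may expect `;` rather than `,` as the separator, in which case `csv.writer(out, delimiter=";")` is the adjustment.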


Answer 1:


If you don't mind using MS Word instead of Excel, there's a very straightforward approach for such tasks: the built-in Find and Replace routine with wildcards. Assuming the data can be opened in Word, do the following:

  1. Press CTRL+H to open the Replace dialog.
  2. Tick the Use wildcards option.
  3. The part of your data you want to throw away matches the following pattern: [0-9]{1,}*[0-9]{1,}, which means one or more digits, followed by any characters, followed by one or more digits. Depending on your regional settings, you may need ; instead of , inside {1,}.
  4. As the replacement, specify any character you like, e.g. ^t (Tab) or ;, to split the parts later.
  5. Perform the replacement.
  6. Optionally, convert the result to a table using the Ribbon's Insert > Table > Convert Text to Table... feature.
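The same find-and-replace idea can be sketched in Python with `re.sub`. This is a translation, not Word's engine: `[0-9]+.*[0-9]+` is an assumed regex counterpart of the wildcard pattern in step 3, and because `.*` is greedy the match always runs from the first digit to the last one, whatever lies in between. `replace_middle` is a hypothetical helper name.

```python
import re

def replace_middle(line: str, sep: str = "\t") -> str:
    """Replace everything from the first digit to the last digit with sep."""
    # [0-9]+ then greedy .* then [0-9]+: the leftmost match ends at the last digit in the line.
    return re.sub(r"[0-9]+.*[0-9]+", sep, line, count=1)

for line in ["asdfasdf66sdafsdfsdf4560sdfasdfasdf", "sadfasdf8asdfasdas06666654asdfasdfsd"]:
    print(replace_middle(line, ";").split(";"))
# ['asdfasdf', 'sdfasdfasdf']
# ['sadfasdf', 'asdfasdfsd']
```

Lines containing fewer than two digits are returned unchanged, since the pattern needs a digit at each end of the match.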

All that's left is to save or paste the result.

This approach is quite powerful: many routine text-parsing tasks similar to yours can be done quickly without special skills or programming. And you don't need any third-party tool for this, since most PCs have Word installed nowadays.

Read more about patterns and applicable cases:

  • Find and Replace using wildcards
  • Finding and replacing characters using wildcards



Answer 2:


Based on this tutorial from the great Chandoo (whom you should follow if you want to be awesome in Excel):

Use this formula (note that it is an array formula, so you need to enter it with CTRL+SHIFT+ENTER) to find where the digits start:

{=MIN(IFERROR(FIND(lstNumbers,E1),""))}

where lstNumbers is a named range on the sheet whose cells contain the digits 0-9 (one digit per cell) and E1 is the cell containing the data.

This returns the position of the first digit; you can then extract the first section with:

=LEFT(E1,G1-1)

where E1 contains the data and G1 contains the previous formula.

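To get the end of the numeric section (the position of the last digit), use this array formula, also entered with CTRL+SHIFT+ENTER:

{=MAX(IF(ISNUMBER(--MID(E1,ROW(INDIRECT("1:"&LEN(E1))),1)),ROW(INDIRECT("1:"&LEN(E1)))))}

FIND only returns the first occurrence of each digit, so taking MAX over FIND results would not reliably give the last digit; testing every character instead does.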

Then you can use MID to extract the numeric section and, assuming the MAX formula is in H1, RIGHT(E1,LEN(E1)-H1) to extract the rest of the string. The same treatment applies throughout: get the first digit with MIN, the last with MAX, and so on.

Good luck! This is a really hard one; doing it in a real programming language would perhaps be easier.
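As a rough illustration of that remark, the MIN/MAX idea translates into a few lines of Python. This is a sketch, not the answer's own code: `outer_parts` is a hypothetical name, positions are 0-based (Excel's FIND is 1-based, which is why LEFT(E1,G1-1) needs the -1), and `str.rfind` supplies the last occurrence of each digit, something Excel's FIND does not offer.

```python
import string
from typing import Tuple

def outer_parts(s: str) -> Tuple[str, str]:
    """Split off the letters before the first digit and after the last digit."""
    present = [d for d in string.digits if d in s]   # digits that occur at all
    if not present:
        return s, ""                                 # no digits: nothing to split
    first = min(s.find(d) for d in present)          # MIN over the first occurrence of each digit
    last = max(s.rfind(d) for d in present)          # MAX over the last occurrence of each digit
    return s[:first], s[last + 1:]

print(outer_parts("kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad"))
# ('kasdfhkasdhfaskdfj', 'askfhsad')
```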




Answer 3:


UPDATED:

This array formula will give you the first string portion:

  =LEFT(A1,MATCH(0,1*ISERROR(1*MID(A1,ROW(INDIRECT("$A1:$A"&LEN(A1))),1)),0)-1)

This array formula will give you the last string portion:

  =RIGHT(A1,MATCH(0,1*ISERROR(1*MID(A1,LEN(A1)+1-ROW(INDIRECT("$A1:$A"&LEN(A1))),1)),0)-1)
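Both formulas test each character with 1*MID(...) and use MATCH to find the first one that is a number, scanning left to right for the first portion and right to left for the last. The same scan can be sketched in Python; the function names are hypothetical, and where MATCH would return #N/A for a string without digits, this sketch returns the whole string instead.

```python
from typing import Iterable, Optional

DIGITS = "0123456789"  # ASCII only; str.isdigit() would also accept characters like '²'

def first_digit_pos(chars: Iterable[str]) -> Optional[int]:
    """1-based position of the first digit, like MATCH(0, 1*ISERROR(1*MID(...)), 0)."""
    return next((i for i, ch in enumerate(chars, start=1) if ch in DIGITS), None)

def first_portion(s: str) -> str:
    pos = first_digit_pos(s)
    return s if pos is None else s[:pos - 1]              # LEFT(A1, pos - 1)

def last_portion(s: str) -> str:
    pos = first_digit_pos(reversed(s))                    # characters read right to left
    return s if pos is None else s[len(s) - (pos - 1):]   # RIGHT(A1, pos - 1)

s = "sdfaasdfs96sadfasdf65459asdfasdf"
print(first_portion(s), last_portion(s))  # sdfaasdfs asdfasdf
```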


Source: https://stackoverflow.com/questions/14880438/splitting-string-to-parts-with-specific-pattern-and-conditions
