How can I split a string at the first occurrence of a letter in Python?


I have a series of strings in the following format. Examples look like this:

71 1 * abwhf

8 askg


                      
4 answers

小鲜肉, 2020-12-20 01:44

sample1 = '71 1 * abwhf'
sample2 = '8 askg'
sample3 = '*14 snbsb'
sample4 = '00ab'
sample5 = '1234'

def split_at_first_letter(txt):
    for value in txt:
        if value.isalpha():
            result = txt.split(value, 1)
            return [result[0], value + result[1]]

    return [txt]

print(split_at_first_letter(sample1))
print(split_at_first_letter(sample2))
print(split_at_first_letter(sample3))
print(split_at_first_letter(sample4))
print(split_at_first_letter(sample5))


Result

['71 1 * ', 'abwhf']
['8 ', 'askg']
['*14 ', 'snbsb']
['00', 'ab']
['1234']

                                                                        
                                                        
            
            
              
                
遇见更好的自我, 2020-12-20 01:48

Use re.split()

import re

strings = [
    "71 1 * abwhf",
    "8 askg",
    "*14 snbsb",
    "00ab",
]

for string in strings:
    a, b, c = re.split(r"([a-z])", string, maxsplit=1, flags=re.I)
    print(repr(a), repr(b + c))


Produces:

'71 1 * ' 'abwhf'
'8 ' 'askg'
'*14 ' 'snbsb'
'00' 'ab'


The trick here is that we split on any letter but ask for only a single split. Putting the pattern in parentheses (a capturing group) makes re.split keep the matched letter, which would otherwise be lost. We then add that letter back onto the front of the second string. Note that the three-way unpacking raises ValueError if the string contains no letter.
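To make the role of the capturing group concrete, the sketch below shows what re.split returns with and without the parentheses, and wraps the idea in a helper that also copes with strings containing no letter. The helper name split_at_letter is made up for illustration and is not part of the answer above.

```python
import re

# Without a capturing group the matched letter is discarded.
print(re.split(r"[a-z]", "8 askg", maxsplit=1, flags=re.I))    # ['8 ', 'skg']

# With a capturing group the matched letter is kept as its own list item.
print(re.split(r"([a-z])", "8 askg", maxsplit=1, flags=re.I))  # ['8 ', 'a', 'skg']


def split_at_letter(text):
    """Hypothetical helper: split text just before its first ASCII letter."""
    parts = re.split(r"([a-z])", text, maxsplit=1, flags=re.I)
    if len(parts) == 1:
        # No letter matched, so re.split returned the string unchanged.
        return [text]
    head, letter, tail = parts
    return [head, letter + tail]


print(split_at_letter("71 1 * abwhf"))  # ['71 1 * ', 'abwhf']
print(split_at_letter("1234"))          # ['1234']
```

Checking the length of the result before unpacking is one way to avoid the ValueError that a bare three-way unpacking would raise on input such as '1234'.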
                                                                        
                                                        
            
            
              
                
你的背包, 2020-12-20 01:56

The only way I can think of is to write the function yourself:

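import string

def split_letters(old_string):
    # Find the index of the first ASCII letter.
    for i, char in enumerate(old_string):
        if char in string.ascii_letters:  # string.letters exists only in Python 2
            index = i
            break
    else:
        # The loop finished without break, so no letter was found.
        raise ValueError("No letters found")  # or return old_string
    return [old_string[:index], old_string[index:]]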
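
As a hedged illustration of the same index-then-slice idea, the sketch below finds the index with next() over a generator instead of a for/else loop, and returns the string unchanged when no letter is present. The function name split_at_first_letter_index is hypothetical, and using str.isalpha() (which also accepts non-ASCII letters) is an assumption rather than something the answer specifies.

```python
def split_at_first_letter_index(text):
    """Hypothetical variant: slice text at the index of its first letter."""
    # next() returns the first index whose character is a letter,
    # or None when the generator is exhausted without finding one.
    index = next((i for i, char in enumerate(text) if char.isalpha()), None)
    if index is None:
        return [text]
    return [text[:index], text[index:]]


for sample in ["71 1 * abwhf", "8 askg", "*14 snbsb", "00ab", "1234"]:
    print(split_at_first_letter_index(sample))
```

Comparing against None rather than testing truthiness matters here, because an index of 0 (a string that starts with a letter) is a valid result.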

                                                                        
                                                        
            
            
              
                
日久生厌, 2020-12-20 02:08

Using re.search:

import re

strs = ["71 1 * abwhf", "8 askg", "*14 snbsb", "00ab"]

# Matches one letter: a word character that is neither a digit nor an underscore.
LETTER = re.compile(r"[^\W\d_]")


def split_on_letter(s):
    match = LETTER.search(s)
    if match is None:
        # No letter in the string, so there is nothing to split on.
        return [s]
    return [s[:match.start()], s[match.start():]]


for s in strs:
    print(split_on_letter(s))


The regex [^\W\d_] matches any letter.

\W matches all non-word characters (anything other than letters, digits, and the underscore) and \d matches all digits. ^ at the beginning of the set inverts the selection, so the set matches everything that is not a non-word character, a digit, or an underscore, which leaves only letters. The underscore has to be excluded explicitly because \w treats it as a word character.

search scans the string for the first occurrence of the pattern and returns a match object (or None if there is none). Slicing the original string at match.start() gives the two substrings.
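
The sketch below compares three ways of spelling "a letter" as a character class, since the choice changes where the split happens. The sample string is invented for illustration; it assumes Python 3 str patterns, where \w is Unicode-aware by default.

```python
import re

text = "42_é7b"

# ASCII-only letters, case-insensitive: skips '_' and 'é', finds 'b'.
print(re.search(r"[a-z]", text, flags=re.I).group())  # b

# Word characters minus digits: this still accepts the underscore.
print(re.search(r"[^\W\d]", text).group())            # _

# Word characters minus digits and underscore: Unicode letters only.
print(re.search(r"[^\W\d_]", text).group())           # é
```

Which class is appropriate depends on whether underscores and non-ASCII letters should count as letters in the input at hand.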
                                                                        
                                                        
            
            
              
                