Regex Protein Digestion
Question: I'm digesting a protein sequence with an enzyme (Asp-N, for the curious) that cleaves before the residues coded B or D in a single-letter-coded sequence. My actual analysis uses String#scan for the captures. I'm trying to figure out why the following regular expression doesn't digest it correctly:

(\w*?)(?=[BD])|(.*\b)

Here the second alternative, (.*\b), exists to capture the end of the sequence. For:

MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN

this should give something like: [MTM,
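
For reference, here is a minimal sketch of one way to express "cleave before B or D" with String#scan. It is not necessarily the fix the asker needs. One likely reason the pattern above misbehaves is that the lookahead does not consume the B or D, so the next match attempt starts on that residue and the lazy `\w*?` can match zero characters there. The sketch avoids this by making the cleavage residue the first character of each fragment. The names `ASP_N_FRAGMENT` and `digest` are hypothetical, chosen only for illustration.

```ruby
# Hypothetical pattern: either a B/D followed by any run of non-B/D residues,
# or (only possible at the start of the sequence) a run of non-B/D residues.
ASP_N_FRAGMENT = /[BD][^BD]*|[^BD]+/

# Hypothetical helper: returns the fragments produced by cutting before every B or D.
# The pattern has no capture groups, so scan returns a flat array of strings.
def digest(sequence)
  sequence.scan(ASP_N_FRAGMENT)
end

seq = "MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN"
fragments = digest(seq)
p fragments

# Sanity check: the fragments should reassemble into the original sequence.
p fragments.join == seq
```

Because every alternative consumes at least one character, the scan never yields empty strings, and each fragment after the first starts with the B or D it was cut before.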