RegEx to match Bitcoin addresses?

深忆病人 2020-12-14 06:44

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:

A Bitcoin address, or simply address, is an iden

8 Answers
  • 2020-12-14 07:12
    ^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
    

    This also covers the newer Bech32 address type, which uses the bc1 prefix.

  • 2020-12-14 07:19

    Based on the description in BIP 173 (https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki), I would say the regex for a mainnet-only Bech32 Bitcoin address, covering version 0 and version 1, is:

    \bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b

    Here are the links where I found more information:

    • https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
    • http://r6.ca/blog/20180106T164028Z.html
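As a rough illustration, the pattern above can be dropped into Python's `re` module to pull Bech32-looking strings out of free text. The helper name `find_bech32` and the sample sentence are made up for this sketch; the address is the mainnet P2WPKH example from BIP 173. This only checks the shape of the string, not the Bech32 checksum.

```python
import re
from typing import List

# Pattern copied from the answer above; \b keeps it from matching inside longer words.
BECH32_RE = re.compile(
    r"\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b"
)

def find_bech32(text: str) -> List[str]:
    """Return every substring of text shaped like a mainnet Bech32 address."""
    # finditer + group(0) avoids the tuple-of-groups result that findall would return
    return [m.group(0) for m in BECH32_RE.finditer(text)]

sample = "Pay to bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4 today."
print(find_bech32(sample))
```

Because of the word boundaries, the pattern can be used with a search over a whole message body rather than only on an isolated string.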
  • 2020-12-14 07:20

    The OP didn't give a specific use case (only matching criteria). I came across this question while researching ways to detect Bitcoin addresses and wanted to share my findings with the community.

    The regexes provided in the other answers only find Bitcoin addresses at the start and/or end of a line. My use case was finding Bitcoin addresses in the body of an email, given the rise of blackmail/sextortion scams (reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/), so those weren't effective solutions, as outlined later. The proposed regexes also catch many false positives (FPs) in email, caused by filenames and other identifiers within URLs. I am not knocking the solutions; they work for certain use cases, just not for mine. One variation caught many spam emails within a short period of passive alerting (examples follow).

    Here are my test cases:

    --------------------------------------------------------
    BitCoin blackmail formats observed (my org and online):
    --------------------------------------------------------
    BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 
    BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
    BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
    BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
    BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
    
    Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
    Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
    
    --------------------------------------------------------
    Other possible BitCoin test cases I added:
    --------------------------------------------------------
    - What if text comes before and/or after on same line?  Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
        Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
        1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
        Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
    
    - Standalone address:
        1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
    
    --------------------------------------------------------
    Redacted Body content generating FPs from spam emails:
    --------------------------------------------------------
    src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
    
    "cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
    
    src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg" 
    
    src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
    
    href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
    

    Some regex samples I tested (I won't list the ones I'd fault for greedy matching with heavy backtracking):

    ^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
    [13][a-km-zA-HJ-NP-Z1-9]{25,34}$
        (Too narrow and misses BitCoin addresses within a paragraph)
    
    (bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
        (Still misses text after BTC on same line and triples execution time)
    
    \W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
        (Too broad and catches URL formats)
    

    The regex I am currently evaluating catches all my known and crafted sample cases and eliminates the known FPs. It requires trailing whitespace rather than any non-word character, so a following period, as in URL filenames, does not produce a match:

    [13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
    

    One reference point for execution times (shows cost in steps and time): https://regex101.com/

    Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
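The candidate pattern above can be exercised against a few of the listed test lines with a short Python sketch. The function name `find_addresses` and the variable `sample_body` are invented for this example; the lines themselves are taken from the test cases above.

```python
import re
from typing import List

# Candidate pattern from the answer: a legacy-style address followed by whitespace.
CANDIDATE = re.compile(r"[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s")

def find_addresses(body: str) -> List[str]:
    """Return candidate addresses found in body, without the trailing whitespace."""
    return [m.group(0).rstrip() for m in CANDIDATE.finditer(body)]

sample_body = (
    "BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 \n"
    "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.\n"
    'src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg" \n'
)

# The two addresses are reported; the hash-like filename in the URL is not,
# because it is followed by ".jpg" rather than whitespace.
print(find_addresses(sample_body))
```

One thing this makes visible: an address at the very end of the body with no trailing whitespace or newline is not matched, since the pattern insists on a `\s` after the address.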

    Seth

  • 2020-12-14 07:21

    Based on the answers of runeks and Erhard Dinhobl, I came up with this, which accepts both Bech32 and legacy addresses:

    \b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
    

    Including testnet addresses:

    \b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
    

    Only testnet:

    \b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
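
A small Python sketch of how the combined mainnet-plus-testnet pattern might be used to label a string by its shape. The function name `classify` and the returned labels are hypothetical, and the capture-group numbers rely on the pattern being copied exactly as written above. No checksum is verified here.

```python
import re
from typing import Optional

# Combined pattern from the answer above (mainnet + testnet, Bech32 + legacy).
ANY_NET_RE = re.compile(
    r"\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})"
    r"|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b"
)

def classify(candidate: str) -> Optional[str]:
    """Label a string by shape only; return None when the pattern does not match."""
    m = ANY_NET_RE.fullmatch(candidate)
    if m is None:
        return None
    if m.group(2):  # group 2 is the (bc|tb) prefix
        net = "mainnet" if m.group(2) == "bc" else "testnet"
        return net + " bech32"
    # group 5 is the first character of a legacy address
    net = "mainnet" if m.group(5) in ("1", "3") else "testnet"
    return net + " legacy"

print(classify("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"))
print(classify("1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5"))
```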
    
  • 2020-12-14 07:24

    [^OIl] matches any character that's not O, I or l. The problems in your regex are:

    • You don't have a $ at the end, so it would match any string that merely begins with a Bitcoin address.
    • You didn't count the first character in your {27,34} - that should be {26,33}

    However, as mentioned in a comment, a regex is not a good way to validate a Bitcoin address: it checks only the shape of the string, not the checksum embedded in the address.
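
To illustrate the point about validation: legacy addresses are Base58Check-encoded, so their last four bytes are a checksum over the rest. Below is a minimal sketch of that check, assuming the standard 25-byte legacy layout (1 version byte, 20-byte hash, 4-byte checksum). The function name `base58check_ok` is made up for this example, and Bech32 addresses are not covered.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check_ok(address: str) -> bool:
    """Return True if address decodes to 25 bytes with a matching checksum."""
    num = 0
    for ch in address:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:          # character outside the Base58 alphabet (0, O, I, l, ...)
            return False
        num = num * 58 + idx
    try:
        raw = num.to_bytes(25, "big")   # leading '1' characters become leading zero bytes
    except OverflowError:               # decoded value does not fit in 25 bytes
        return False
    payload, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return checksum == expected

# The genesis-block address is a well-known valid example.
print(base58check_ok("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
```

A regex can serve as a cheap pre-filter, with a check like this applied to whatever it finds.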

  • 2020-12-14 07:24
    ^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
    

    will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).
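
A quick way to see the effect of the anchors and the character class, sketched in Python. The helper name `looks_like_legacy` is hypothetical, and the sample strings reuse an address quoted elsewhere in this thread.

```python
import re

# Anchored pattern from the answer above: the whole string must be the address.
LEGACY_RE = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")

def looks_like_legacy(s: str) -> bool:
    """True if s, taken as a whole, has the shape of a legacy address."""
    return LEGACY_RE.match(s) is not None

print(looks_like_legacy("1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5"))            # bare address
print(looks_like_legacy("Send here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5"))  # text before it
print(looks_like_legacy("1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og0"))            # contains a 0
```

In Python, `$` also matches just before a single trailing newline, so `re.fullmatch` (or `\Z`) may be preferable when that matters.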
