Parsing street addresses in Ruby

萌比男神i 2021-02-09 18:08

I am processing addresses into their respective field formats for the database. I can get the house number and the street type out, but I am trying to determine the best method to get the

5 Answers
  •  花落未央
    2021-02-09 18:30

    Carefully check your dataset to make sure this problem hasn't already been handled for you.

    I spent a fair amount of time first creating a taxonomy of probable street name endings and using regexp conditionals to try to pluck the street number out of the full address strings. It then turned out that the attributes table for my shapefiles had already segmented out these components.

    Before you go forward with the process of parsing address strings, which is always a bit of a chore due to the inevitably strange variations (some parcel addresses are for landlocked parcels and have weird addresses, etc), make sure your dataset hasn't already done this for you!!!


    But if it hasn't, run through the address strings: address.split(" ") creates an array of "words". In most cases the first "word" is the street number. That worked for about 95% of my addresses. (NOTE: my :address strings did not contain city, county, state, or zip; they were only the local addresses.)

    I ran through the entire population of addresses, plucked the last "word" from each one, examined the resulting array, and removed any "words" that were not street types such as "Lane", "Road", or "Rd". From this list of address endings I created one large matching regexp object:

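    # street_endings is the array of ending strings collected above, e.g. ["Lane", "Road", "Rd"].
    # Regexp.union escapes each string and joins them into a single alternation.
    endings_matches = Regexp.union(street_endings)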
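
    The collection step described above might look roughly like the sketch below. The sample addresses and the reviewed suffix list are made up for illustration; only Regexp.union comes from the answer itself.

    ```ruby
    # Hypothetical sample data; real input would be your local address strings.
    addresses = [
      "123 Main Street",
      "45 Oak Lane",
      "9 Elm Rd",
      "700 Old Mill Road",
      "12 Broadway"
    ]

    # Count how often each final "word" occurs so the candidates can be reviewed by eye.
    last_word_counts = addresses.each_with_object(Hash.new(0)) do |address, counts|
      counts[address.split(" ").last] += 1
    end

    # After review, keep only the words that really are street types.
    # (This list is an assumed outcome of that manual step; "Broadway" was rejected.)
    street_endings = %w[Street Lane Rd Road]

    # Regexp.union escapes each string and joins them into one alternation.
    endings_matches = Regexp.union(street_endings)
    ```

    Counting the last words first makes it easier to spot rare or misspelled endings before deciding what belongs in the list.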
    

    I ran through each address string, shifting out the first array member because, again, that was almost always the street number. I then gsub'd out the street endings to get what should be the street name sans street number or street name endings, which databases generally do not like:

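    parcels.each do |parcel|
      remainder = parcel.address.split(" ")
      # The first "word" is almost always the street number.
      parcel.streetnum = remainder.shift
      # Remove the street ending, then trim the whitespace it leaves behind.
      parcel.streetname = remainder.join(" ").gsub(endings_matches, "").strip
      parcel.save
    end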
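
    For trying this out without a database, here is a minimal sketch on plain strings. It is a variation rather than the exact method above: the suffix is matched only as the final word, so "Lane" inside "Delaney" is left alone, and the first word is taken as the street number only when it starts with a digit. The names parse_address, STREET_ENDINGS and ENDING_AT_END are hypothetical, and the suffix list is an assumption.

    ```ruby
    # Assumed suffix list; in practice this comes from reviewing your own data.
    STREET_ENDINGS = %w[Street Lane Road Rd].freeze

    # Match a suffix only as the final word, optionally followed by a period.
    ENDING_AT_END = /\s+(#{Regexp.union(STREET_ENDINGS).source})\.?\z/i

    def parse_address(address)
      words = address.strip.split(" ")
      # Treat the first word as the street number only if it starts with a digit.
      number = words.first.to_s.match?(/\A\d/) ? words.shift : nil
      rest = words.join(" ")
      match = rest.match(ENDING_AT_END)
      {
        number: number,
        name: match ? match.pre_match : rest,
        type: match ? match[1] : nil
      }
    end

    p parse_address("123 Delaney Lane")
    p parse_address("Rural Route 4")
    ```

    The first call gives number "123", name "Delaney" and type "Lane". The second has no leading number and no recognised ending, so the whole string is kept as the name and the other two fields are nil.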
    

    It didn't always work but it worked most of the time.
