Normalize data according to business entity (Legal name, class of business, DNS domain, company type) [closed]

99封情书 submitted on 2019-12-19 05:09:14

Question


I'm trying to normalize data and link records according to legal business entity name.

Where can I look up the legal business name and general information about a company? I will have at least one of the following: stock symbol, DBA (short name), DNS name, or full legal name.

So far I've discovered that:

  • Relying on WHOIS gives me private or out-of-date information.
  • The Wolfram Alpha API gives me most of what I need for public companies, but nothing helpful for private companies like LEGO.
  • Parsing webpages for the (c) symbol may help in the resolution process, but it doesn't match a name against an authoritative list.
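
For the last bullet, a minimal sketch of pulling a candidate name out of a page's copyright notice might look like the following. The regular expression and the function name `extract_copyright_holder` are assumptions made for illustration; real pages vary widely, and the result is only a candidate string that still has to be checked against an authoritative list.

```python
import re

# Hypothetical pattern: an optional "Copyright" word, a copyright mark,
# an optional year or year range, then the holder name up to a delimiter.
COPYRIGHT_RE = re.compile(
    r"(?:copyright\s*)?(?:©|\(c\)|&copy;)\s*"
    r"(?:\d{4}(?:\s*[-–]\s*\d{4})?\s*,?\s*)?"
    r"(?P<holder>[^<>|\n]+?)"
    r"\s*(?:\.?\s*All rights reserved|[<|\n]|$)",
    re.IGNORECASE,
)


def extract_copyright_holder(page_text):
    """Return the first copyright holder found in page_text, or None."""
    match = COPYRIGHT_RE.search(page_text)
    return match.group("holder").strip() if match else None


footer = "<p>© 2000–2019 The Metropolitan Museum of Art. All Rights Reserved.</p>"
candidate = extract_copyright_holder(footer)
```

The extracted string is unverified input for the resolution step, not a confirmed legal name.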

Since all stock symbols are categorized, that one is easy.

How can I convert, normalize, and verify a DBA (short name), DNS name, or full legal name for non-public or non-profit businesses that may even be located overseas?

(e.g. "MET Museum" as the DBA, metmuseum.org as the site, or "Metropolitan Museum of Art" as the legal name)
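
To make the goal concrete, here is a rough sketch of the linking step, assuming an authoritative list of legal names with known aliases and domains is already available (which is the hard part being asked about). The class `EntityIndex`, the helper functions, and the suffix list are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical, incomplete list of legal-form suffixes to ignore when matching.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited",
                  "corp", "corporation", "co", "company", "plc", "gmbh"}


def normalize_name(name):
    """Lowercase, drop punctuation, a leading 'the', and trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    if tokens and tokens[0] == "the":
        tokens = tokens[1:]
    return " ".join(tokens)


def normalize_domain(host):
    """Lowercase a host name and strip a trailing dot and a leading 'www.'."""
    host = host.lower().strip().rstrip(".")
    return host[4:] if host.startswith("www.") else host


class EntityIndex:
    """Maps normalized names and domains to one canonical legal name."""

    def __init__(self):
        self._by_key = {}

    def add(self, legal_name, aliases=(), domains=()):
        for name in (legal_name, *aliases):
            self._by_key[("name", normalize_name(name))] = legal_name
        for domain in domains:
            self._by_key[("domain", normalize_domain(domain))] = legal_name

    def resolve(self, value):
        # Treat values with a dot and no spaces as possible host names first.
        if "." in value and " " not in value:
            hit = self._by_key.get(("domain", normalize_domain(value)))
            if hit:
                return hit
        return self._by_key.get(("name", normalize_name(value)))


index = EntityIndex()
index.add("Metropolitan Museum of Art",
          aliases=["MET Museum"], domains=["metmuseum.org"])
```

With this index, `index.resolve("MET Museum")`, `index.resolve("www.metmuseum.org")`, and `index.resolve("The Metropolitan Museum of Art")` all return the same legal name, so records keyed by any of the three can be linked.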


Answer 1:


I'm not sure this is the best place to ask your question. Maybe your local librarian could help. Anyway, I'm answering because I've done a lot of work along these lines in the past, and because I've found that programmers and database designers often know where to find data--especially authoritative and standard data.

At the local level (in the USA), we accepted whatever the local Chamber of Commerce gave us. At the national level, we bought lists from InfoUSA. Chamber of Commerce data can be pretty flaky; InfoUSA data is very clean.

Dun & Bradstreet is the closest I know of to a one-stop global business registry. They're not cheap.

RBA, a company in the UK, seems to have a really useful introduction with a global perspective. See Official Company Registers. Much of the data there is free.




Answer 2:


I have been doing some research in this area and found a recent paper that discusses an approach to extract, discover (via clustering), and normalize (via an enhanced edit-distance calculation) organization names: NEMO.
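
As a rough illustration of the general idea only, the sketch below groups name variants using plain Levenshtein distance and a greedy single-pass clustering. This is not the paper's enhanced edit-distance method; the function names and the 0.8 threshold are assumptions picked for the example.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance scaled to a 0..1 score, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def cluster_names(names, threshold=0.8):
    """Greedily assign each name to the first cluster whose seed is similar."""
    clusters = []
    for name in names:
        key = name.lower().strip()
        for cluster in clusters:
            if similarity(key, cluster[0].lower().strip()) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


groups = cluster_names([
    "Metropolitan Museum of Art",
    "Metropolitan Museum of Arts",
    "Metropolitian Museum of Art",
    "LEGO Group",
])
```

Here the three museum spellings end up in one cluster and "LEGO Group" in its own. Pure character-level distance will not link a short DBA such as "MET Museum" to the full legal name, so an alias list or an authoritative registry is still needed for those cases.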



Source: https://stackoverflow.com/questions/4835318/normalize-data-according-to-business-entity-legal-name-class-of-business-dns
