Looking for a unique ID pattern that is easily indexed by search engines

Posted by 筅森魡賤 on 2019-12-02 02:34:52

Question


For example, Microsoft uses "KB2756872", the National Vulnerability Database uses "CVE-2010-1428", Red Hat uses "RHSA-2010:0376", OIDs look like "1.3.6.1.4.1.311", and UUIDs/GUIDs look like "550e8400-e29b-41d4-a716-446655440000".

I want my UIDs to serve several purposes, described below.

I develop blog software and have the idea of putting a unique ID in the body of each post, so that I can easily tell that a copy in local storage corresponds to a remotely published copy.

I also want to post to many different blogging services, so that if one is down the articles remain accessible from another. A link can go dead, but if I add a UID, anyone can try a web search to find the post on another service!

This would also let me gather statistics on how articles spread. Many sites just replicate content (via copywriting and rewriting bots and people) to game search engines. With a UID I can easily identify such sites...

So my question is how to form UIDs (in which format) so that they are easily indexed by search engines (web ones like Google/Yahoo, and corporate ones like Lucene/Solr/Sphinx/Xapian/etc.).

I know about some limitations of search engines, such as:

  • each part of a search query must be at least 3 characters long
  • random-looking "dust" like gfh6wytrh6wu56he5gahj763 is not indexed

so this task is not easy...

Any advice is appreciated (books/blog articles/etc).


Answer 1:


You could use Tag URIs, as defined by RFC 4151.

They are globally unique, and anyone who has owned a domain name or an email address for at least a day can mint them.

Note that these URIs only identify, they don’t locate. So a Tag URI doesn’t say anything about where something is published.

Let’s say your site’s domain is "example.com". If you create a blog post, you could create the following Tag URI:

tag:example.com,2012-12:cute-cat

Note that the date in this URI is not a publication date! It must be a (past) date on which you owned the domain (or the email address, respectively). If you registered your domain in 2003, you could always use Tag URIs starting with tag:example.com,2004: (not "2003", because "2003" would mean "2003-01-01", which might be a time when you didn’t own the domain yet), followed by a (unique) string under your control. However, you could also use the publication date if you like. But don’t use future dates.
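As a rough illustration of the rules above, here is a minimal Python sketch that assembles a Tag URI and rejects future dates. The function name `make_tag_uri`, its parameters, and the character check are assumptions made for this example; they are not part of RFC 4151 or of any library. The sketch always writes the full YYYY-MM-DD date, although shorter forms such as "2012-12" are also valid.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical helper: a conservative character check for the part after the
# second colon. RFC 4151 defines the exact grammar; this is only a subset.
_SPECIFIC_RE = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?]*$")


def make_tag_uri(authority: str, owned_on: date, specific: str,
                 today: Optional[date] = None) -> str:
    """Build a Tag URI of the form tag:<authority>,<date>:<specific>.

    `owned_on` must be a date on which you controlled `authority`
    (a domain name or an email address). It is not the publication date.
    """
    today = today or date.today()
    if owned_on > today:
        raise ValueError("the date in a Tag URI must not be in the future")
    if not _SPECIFIC_RE.match(specific):
        raise ValueError("unsupported character in the specific part")
    return f"tag:{authority},{owned_on.isoformat()}:{specific}"


if __name__ == "__main__":
    print(make_tag_uri("example.com", date(2012, 12, 1), "cute-cat"))
```

Whether a given search engine keeps such a string together as one token is a separate question and would need to be tested against that engine.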




Answer 2:


You can use a year- and number-based article identifier, just like CVE identifiers. Since you need revisions as well, you can append a dot and a revision number to the identifier to indicate the version. For example, for an AWesome Blog Service, AWBS-2012-1.0 would refer to the original document, AWBS-2012-1.1 to the first revision, and so on.

However, you need to make sure that each AWBS identifier is unique before you use it. CVEs are assigned manually from a pool. You would probably need some kind of service that assigns AWBS identifiers from a pool. It could be a simple database query.
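One way the "simple database query" might look is sketched below in Python with the standard `sqlite3` module. The table name `counters`, the helper names, and the choice of a per-year counter are all assumptions made for illustration; any store that can increment a number atomically would work.

```python
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny registry holding one counter per year."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        " year INTEGER PRIMARY KEY,"
        " last_seq INTEGER NOT NULL)"
    )
    return conn


def next_article_id(conn: sqlite3.Connection, year: int,
                    prefix: str = "AWBS") -> str:
    """Reserve the next sequence number for `year` and format the ID."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO counters (year, last_seq) VALUES (?, 0)",
            (year,),
        )
        conn.execute(
            "UPDATE counters SET last_seq = last_seq + 1 WHERE year = ?",
            (year,),
        )
        (seq,) = conn.execute(
            "SELECT last_seq FROM counters WHERE year = ?", (year,)
        ).fetchone()
    return f"{prefix}-{year}-{seq}"


def with_revision(article_id: str, revision: int) -> str:
    """Append the revision number after a dot, e.g. AWBS-2012-1.0."""
    return f"{article_id}.{revision}"


if __name__ == "__main__":
    registry = open_registry()
    first = next_article_id(registry, 2012)
    print(with_revision(first, 0))  # original document
    print(with_revision(first, 1))  # first revision
```

Keeping the counter in a single place is what guarantees uniqueness; if several blog instances mint identifiers independently, they would need to share this registry or use distinct prefixes.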



Source: https://stackoverflow.com/questions/13904733/look-for-unique-id-pattern-which-easy-indexed-by-search-engines
