SQL performance searching for long strings

半城伤御伤魂 posted on 2019-12-01 17:54:21

Your idea of hashing long strings to create a token that serves as a lookup key in a store (cache or database) is a good one. I have seen this done for extremely large strings and in high-volume environments, and it works well.

"Which hash would you use for this application?"

  • I don't think the choice of hashing algorithm really matters here. Hashing is not encryption, and you are not trying to protect the data; you are hashing to create a token to use as a key for looking up longer values. So the choice of algorithm should be based on speed.
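
If speed is the deciding factor, it is easy to measure on your own hardware. The following is only a rough sketch, assuming Python's standard hashlib and timeit modules; the sample string, the run count, and the three algorithms compared are illustrative choices, not recommendations from the original question.

```python
import hashlib
import timeit

# Illustrative sample: a user-agent string repeated to make it longer.
sample = (
    "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0" * 4
).encode("utf-8")

def time_hash(name, runs=20000):
    """Return the seconds taken to hash the sample `runs` times with `name`."""
    return timeit.timeit(lambda: hashlib.new(name, sample).digest(), number=runs)

# Compare a few commonly available algorithms; fastest is printed first.
timings = {name: time_hash(name) for name in ("md5", "sha1", "sha256")}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s")
```

The ordering depends on the machine and the build of the crypto library, so treat the numbers as a local measurement only.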

"Would you compute the hash in code or let the db handle it?"

  • If it were my project, I would compute the hash at the app layer and then pass it through to look up the value in the store (cache first, then database).

"Is there a radically different approach for storing/searching long strings in a database?"

  • As I mentioned, I think for your specific purpose, your proposed solution is a good one.

Table recommendations (demonstrative only):

user

  • id int(11) unsigned not null
  • name_first varchar(100) not null

user_agent_history

  • user_id int(11) unsigned not null
  • agent_hash varchar(255) not null

agent

  • agent_hash varchar(255) not null
  • browser varchar(100) not null
  • agent text not null

A few notes on the schema:

  • From your original post, it sounds like you need a many-to-many (M:M) relationship between user and agent, because a user may use Firefox at work but then switch to IE9 at home. Hence the need for the pivot table, user_agent_history.

  • The varchar(255) used for agent_hash is up for debate. The MySQL documentation suggests storing hash values in a binary string column type, such as VARBINARY or one of the BLOB types, rather than in a character column.

  • I would also suggest either making agent_hash a primary key, or at the very least, adding a UNIQUE constraint to the column.
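
The tables and notes above could be sketched as follows. This assumes SQLite through Python's sqlite3 module as a runnable stand-in for MySQL, so the column types are simplified (BLOB plays the role of a binary column, and lengths are dropped); the sample user and user-agent strings are made up for illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id         INTEGER PRIMARY KEY,
    name_first TEXT NOT NULL
);
CREATE TABLE agent (
    agent_hash BLOB PRIMARY KEY,   -- raw 16-byte MD5 digest, unique per agent
    browser    TEXT NOT NULL,
    agent      TEXT NOT NULL
);
CREATE TABLE user_agent_history (  -- pivot table for the M:M relationship
    user_id    INTEGER NOT NULL REFERENCES user(id),
    agent_hash BLOB NOT NULL REFERENCES agent(agent_hash)
);
""")

def digest(agent):
    """Raw binary MD5 digest of the agent string."""
    return hashlib.md5(agent.encode("utf-8")).digest()

firefox = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"
ie9 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"

conn.execute("INSERT INTO user VALUES (1, 'Alice')")
for browser, agent in (("Firefox", firefox), ("IE9", ie9)):
    # The primary key on agent_hash means each distinct agent is stored once.
    conn.execute(
        "INSERT OR IGNORE INTO agent VALUES (?, ?, ?)",
        (digest(agent), browser, agent),
    )
    conn.execute("INSERT INTO user_agent_history VALUES (1, ?)", (digest(agent),))

# All browsers a given user has been seen with, joined on the short hash.
browsers = [row[0] for row in conn.execute(
    "SELECT a.browser FROM user_agent_history h "
    "JOIN agent a ON a.agent_hash = h.agent_hash "
    "WHERE h.user_id = ? ORDER BY a.browser",
    (1,),
)]
```

In MySQL the equivalent column would more likely be declared as a fixed-width binary type sized to the digest; that choice is an assumption here, not something stated in the question.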

Your hash idea is sound. I've actually used hashing to speed up some searches on millions of records. A hash index will be quicker since each entry is the same size. MD5 will likely be fine in your case and will probably give you the shortest hash length. If you are worried about hash collisions, you can also include the length of the agent string in the key.
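
One way to fold the length into the key, shown as a small sketch in Python with the standard hashlib module. The function name `agent_key` and the `hash:length` layout are illustrative assumptions; storing the length in a separate column would work just as well.

```python
import hashlib

def agent_key(agent):
    """Build a lookup key from the MD5 hex digest plus the byte length.

    Two different strings would need both the same MD5 hash and the
    same length to produce the same key.
    """
    data = agent.encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}"

short_key = agent_key("Mozilla/5.0")
long_key = agent_key("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)")
```

The key is no longer a fixed size once the length is appended, so a fixed-width encoding of the length may be preferable if uniform entry size matters for the index.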
