What databases do the World Wide Web's biggest sites run on? [closed]

ぃ、小莉子 提交于 2020-04-05 07:03:23

问题


This question is meant to serve as a list of databases and their configurations that the major web sites use and would be a great reference for anyone thinking of scaling their web site to the size of Twitter, Facebook or even Google.

Please keep your answers to a minimum and be sure to cite any sources used.

EDIT:

Also, please bold both the web-site name and the database for easier scanning.


回答1:


Facebook.com

  • MySQL with MyRocks. Used to store user info and social activities such as likes, comments, and shares.
  • Hive (Data warehouse for Hadoop, supports tables and a variant of SQL called hiveQL). Used for "simple summarization jobs, business intelligence and machine learning and many other applications"
  • Cassandra (Multi-dimensional, distributed key-value store). Currently used for Facebook's private messaging.

Currently running 610 (soon to be 1000) Hadoop nodes in a single cluster with Hive datastore. Both Hive and Cassandra have been open-sourced by Facebook.

Facebook stats:

  • More than 200 million active users
  • More than 100 million users log on to Facebook at least once each day
  • More than 30 million users update their statuses at least once each day
  • Average user has 120 friends on the site

Sources:

  • http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/
  • http://www.facebook.com/note.php?note_id=89508453919
  • http://www.facebook.com/press/info.php?statistics
  • http://hadoop.apache.org/hive/
  • http://wiki.apache.org/hadoop/Hive/Design
  • http://www.facebook.com/note.php?note_id=24413138919
  • https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database



回答2:


Stack Overflow - SQL Server.

Jeff Atwood wrote a nice blog post on this

https://blog.stackoverflow.com/2008/09/what-was-stack-overflow-built-with/




回答3:


LinkedIn.com

  • Oracle (Relational Database)
  • MySQL (Relational Database)

Databases replicated on multiple servers for high availability. Each specific Service uses its own domain-specific DB.

LinkedIn stats:

  • 22 million members
  • 4+ million unique visitors/month
  • 40 million page views/day
  • 2 million searches/day

Sources:

  • http://hurvitz.org/blog/2008/06/linkedin-architecture/



回答4:


Flickr uses MySQL.

YouTube uses MySQL but they are moving to Google's BigTable.

Myspace uses SQL Server.

Wikipedia uses MySQL.




回答5:


Microsoft.com

  • SQL Server (no surprise there)

Microsoft.com stats:

  • 250 million unique visits/month.
  • 70 million page views/day.
  • 15,000 connections/second.
  • Maintains an average of 35,000 concurrent connections to a total of 80 Web servers.

Sources:

  • http://technet.microsoft.com/en-us/mscomops/default.aspx



回答6:


Yahoo.com

  • PostgreSQL (modified) - A client can connect to any of the nodes in the cluster (or a policy restricted subset). A query flows from the client to the server it chose to connect with. The SQL compiler on that node compiles and optimizes the query on that single node (no parallelism).

Yahoo.com stats:

  • 24 billion events a day
  • 2-petabyte, claims largest database (Mar 2008)

Source:

  • http://perspectives.mvdirona.com/2008/05/23/PetascaleSQLDBAtYahoo.aspx
  • http://www.computerworld.com/s/article/9087918/Size_matters_Yahoo_claims_2_petabyte_database_is_world_s_biggest_busiest



回答7:


Twitter.com

  • MySQL (Relational Database).
  • Cassandra (Multi-dimensional, distributed key-value store). Twitter is just "beginning to use Cassandra at Twitter" (see second source).

In May 2008, Twitter had 1 MySQL instance for writes with multiple MySQL slave instances for reads.

Twitter stats:

  • Total Users: 1+ million
  • Total Active Users: 200,000 per week
  • Total Twitter Messages: 3 million/day
  • 5% of Twitter users account for 75% of all activity
  • 72.5% of all users joining during the first five months of 2009

Sources:

  • http://blog.twitter.com/2008/05/its-not-rocket-science-but-its-our-work.html
  • http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
  • http://www.sysomos.com/insidetwitter/
  • http://www.techcrunch.com/2008/04/29/end-of-speculation-the-real-twitter-usage-numbers/



回答8:


Digg

  • MySQL (Relational Database) for scaling out reads
  • MemcacheDB (Key-Value Store) for scaling out writes

Both data stores are distributed across multiple servers.

Digg stats:

  • 30M users
  • 26M uniques per month
  • 2 billion requests a month
  • 13,000 requests a second, peak at 27,000 requests a second.

Sources:

  • http://www.krisjordan.com/2008/09/18/joe-stump-scaling-digg-and-other-web-applications/
  • http://highscalability.com/scaling-digg-and-other-web-applications



回答9:


Google uses BigTable: http://research.google.com/archive/bigtable.html




回答10:


PlentyOfFish.com using Microsoft SQL Server:

https://blog.codinghorror.com/scaling-up-vs-scaling-out-hidden-costs/



来源:https://stackoverflow.com/questions/1113381/what-databases-do-the-world-wide-webs-biggest-sites-run-on

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!