BigTable

Installing HBase 1.2.4

天涯浪子 submitted on 2019-12-01 00:09:07
Paired with Hadoop 2.6.0. This assumes Hadoop is correctly installed and running, and that ZooKeeper is correctly installed and running.
Download HBase:
cd /opt
mkdir hbase
wget http://apache.fayea.com/hbase/1.2.4/hbase-1.2.4-bin.tar.gz
tar xvzf hbase-1.2.4-bin.tar.gz
cd hbase-1.2.4-bin/conf
Once in the installation directory, edit hbase-env.sh and uncomment the following setting. Setting it to false means an external ZooKeeper cluster is used and HBase does not manage it (starting, stopping, and so on):
export HBASE_MANAGES_ZK=false
Then edit hbase-site.xml:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- /** * * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information *

Why is Google's BigTable referred to as a NoSQL database?

雨燕双飞 submitted on 2019-11-30 10:30:32
From Wikipedia: Notable production implementations [of NoSQL databases] include Google's BigTable, Amazon's Dynamo and Cassandra. But Google's BigTable does have some variant of SQL, called GQL. What am I missing? NoSQL is an umbrella term for all the databases that are different from 'the standard' SQL databases, such as MySQL, Microsoft SQL Server and PostgreSQL. These 'standard' SQL databases are all relational databases, feature the SQL query language and adhere to the ACID properties. These properties basically boil down to consistency. A NoSQL database is different because it doesn't

Background on Big Data Technology (Beginner's Guide)

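前提是你 submitted on 2019-11-30 07:32:07
1. What is big data? Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that requires new processing models to deliver stronger decision-making power, insight discovery, and process optimization. The 5 V's of big data (proposed by IBM): Volume: the size of the data determines the value and potential information of the data under consideration; Velocity: the speed at which data is acquired; Variety: the diversity of data types; Value: using big data sensibly to create high value at low cost; Veracity: the quality of the data. 2. The significance of big data. Today's society is developing rapidly: technology is advanced, information flows freely, people communicate ever more closely, and life grows ever more convenient. Big data is a product of this high-tech era. Alibaba founder Jack Ma (马云) noted in a speech in Taiwan that the coming era will be not the IT era but the DT era, where DT stands for Data Technology, which shows how central big data is to Alibaba Group. Some compare data to a coal mine that stores energy. Coal is classified by its properties into coking coal, anthracite, fat coal, lean coal, and so on, and the cost of mining differs between open-pit mines and deep mountain mines. Similarly, what matters about big data is not that it is "big" but that it is "useful": value content and extraction cost matter more than quantity. For many industries, how to exploit this large-scale data is the key to winning the competition. The value of big data shows up in the following areas: companies that provide products or services to large numbers of consumers can use big data for precision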

What's your approach for optimizing large tables (+1M rows) on SQL Server?

主宰稳场 submitted on 2019-11-30 03:38:29
I'm importing Brazilian stock market data into a SQL Server database. Right now I have a table with price information for three kinds of assets: stocks, options and forwards. I'm still on 2006 data and the table has over half a million records. I have 12 more years of data to import, so the table will exceed a million records for sure. Now, my first approach to optimization was to keep the data to a minimum size, so I reduced the row size to an average of 60 bytes, with the following columns:
[Stock] [int] NOT NULL
[Date] [smalldatetime] NOT NULL
[Open] [smallmoney] NOT NULL
[High] [smallmoney]

Tree structures in a nosql database

十年热恋 submitted on 2019-11-29 21:56:44
I'm developing an application for Google App Engine which uses BigTable for its datastore. It's an application about writing a story collaboratively. It's a very simple hobby project that I'm working on just for fun. It's open source and you can see it here: http://story.multifarce.com/ The idea is that anyone can write a paragraph, which then needs to be validated by two other people. A story can also be branched at any paragraph, so that another version of the story can continue in another direction. Imagine the following tree structure: Every number would be a paragraph. I want to be able
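One way to picture the branching story described above is a parent-pointer tree, sketched here in plain Python. This is only an illustration under stated assumptions: the paragraph ids, the `parent` field, and the `story_path` helper are all hypothetical, and nothing here uses the App Engine datastore API or reflects how the linked project actually stores its data.

```python
# Hypothetical in-memory model: each paragraph records the id of the
# paragraph it continues from (None for the opening paragraph).
paragraphs = {
    1: {"parent": None, "text": "Opening paragraph"},
    2: {"parent": 1, "text": "Second paragraph"},
    3: {"parent": 2, "text": "Original continuation"},
    4: {"parent": 2, "text": "Branch: an alternative continuation"},
    5: {"parent": 4, "text": "The branch continues"},
}


def story_path(leaf_id, paragraphs):
    """Return the paragraph ids from the root down to leaf_id."""
    path = []
    current = leaf_id
    # Walk parent pointers upward until the root is passed.
    while current is not None:
        path.append(current)
        current = paragraphs[current]["parent"]
    path.reverse()
    return path


# Two versions of the story share paragraphs 1 and 2, then diverge.
print(story_path(3, paragraphs))
print(story_path(5, paragraphs))
```

With parent pointers, adding a branch is a single write, but reading one version of the story costs one lookup per paragraph. Storing the full ancestor path on each paragraph is a common alternative that trades larger records for fewer reads.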

Is BigTable slow or am I dumb?

邮差的信 submitted on 2019-11-29 21:31:31
I basically have the classic many-to-many model: a user, an award, and a "many-to-many" table mapping between users and awards. Each user has on the order of 400 awards and each award is given to about 1/2 the users. I want to iterate over all of the user's awards and sum up their points. In SQL it would be a table join between the many-to-many and then walk through each of the rows. On a decent machine with a MySQL instance, 400 rows should not be a big deal at all. On App Engine I'm seeing around 10 seconds to do the sum, with most of the time being spent in Google's datastore. Here is the first
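The aggregation the question describes (join the mapping table to the awards, then sum the points for one user) can be sketched in plain Python with in-memory stand-ins for the three tables. All names and values below are hypothetical, and the sketch deliberately does not use the App Engine datastore API. It only shows the logic being timed, not the asker's code.

```python
# Hypothetical stand-ins for the tables in the question.
awards = {"a1": 10, "a2": 25, "a3": 5}  # award id -> points

# The many-to-many mapping: one (user id, award id) pair per row.
user_awards = [
    ("u1", "a1"),
    ("u1", "a3"),
    ("u2", "a2"),
    ("u2", "a1"),
]


def total_points(user_id, user_awards, awards):
    """Sum the points of every award mapped to user_id."""
    return sum(
        awards[award_id]
        for mapped_user, award_id in user_awards
        if mapped_user == user_id
    )


print(total_points("u1", user_awards, awards))
```

In memory this is a single pass over the mapping rows. Against a remote datastore, the same logic can turn into one fetch per award if each `awards[award_id]` lookup becomes its own request, which is one plausible source of the latency described.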

storing massive ordered time series data in bigtable derivatives

為{幸葍}努か submitted on 2019-11-29 18:48:52
I am trying to figure out exactly what these newfangled data stores such as Bigtable, HBase and Cassandra really are. I work with massive amounts of stock market data: billions of rows of price/quote data that can add up to hundreds of gigabytes every day (although these text files often compress by at least an order of magnitude). This data is basically a handful of numbers, two or three short strings and a timestamp (usually millisecond level). If I had to pick a unique identifier for each row, I would have to pick the whole row (since an exchange may generate multiple values for the same

What database does Google use?

南楼画角 submitted on 2019-11-29 18:31:00
Is it Oracle or MySQL or something they have built themselves? splattne: Bigtable: A Distributed Storage System for Structured Data. Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from

Why is Google's BigTable referred to as a NoSQL database?

给你一囗甜甜゛ submitted on 2019-11-29 15:46:04
Question: From Wikipedia: Notable production implementations [of NoSQL databases] include Google's BigTable, Amazon's Dynamo and Cassandra. But Google's BigTable does have some variant of SQL, called GQL. What am I missing? Answer 1: NoSQL is an umbrella term for all the databases that are different from 'the standard' SQL databases, such as MySQL, Microsoft SQL Server and PostgreSQL. These 'standard' SQL databases are all relational databases, feature the SQL query language and adhere to the ACID

Database design - Google App Engine

房东的猫 submitted on 2019-11-29 10:28:17
Question: I am working with Google App Engine and using the low-level Java API to access BigTable. I'm building a SaaS application with 4 layers: client web browser, RESTful resources layer, business layer, and data access layer. I'm building an application to help manage my mobile auto detailing company (and others like it). I have to represent these four separate concepts, but am unsure if my current plan is a good one: Appointments, Line Items, Invoices, and Payments. Appointment: An "Appointment" is a place and