BigTable

What's your approach for optimizing large tables (1M+ rows) on SQL Server?

[亡魂溺海] submitted on 2019-11-29 00:42:13
Question: I'm importing Brazilian stock market data into a SQL Server database. Right now I have a table with price information for three kinds of assets: stocks, options, and forwards. I'm still on the 2006 data and the table already has over half a million records. I have 12 more years of data to import, so the table will certainly exceed a million records. My first approach to optimization was to keep the data at a minimum size, so I reduced the row size to an average of 60 bytes, with the following columns:
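A back-of-the-envelope sketch of the raw data volume implied by the numbers above (the ~60-byte average row size is from the question; the assumption that each of the ~13 years contributes roughly the same ~500k rows is mine):

```python
def estimated_table_size_mb(rows, avg_row_bytes=60):
    """Rough storage estimate: row count times average row size,
    ignoring index and page overhead."""
    return rows * avg_row_bytes / 1024 / 1024

# ~500k rows covered one year (2006); assume ~13 years of data in total
rows_per_year = 500_000
total_rows = rows_per_year * 13          # 6,500,000 rows
print(round(estimated_table_size_mb(total_rows)))  # ~372 MB of raw row data
```

At this scale the raw data easily fits in memory on a modest server, which suggests indexing and query shape, not row width, will dominate performance.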

Pros of databases like BigTable, SimpleDB

穿精又带淫゛_ submitted on 2019-11-28 16:52:36
Question: New-school datastore paradigms like Google BigTable and Amazon SimpleDB are designed specifically for scalability, among other things. Basically, disallowing joins and denormalizing are the ways this is being accomplished. In this topic, however, the consensus seems to be that joins on large tables don't necessarily have to be too expensive and that denormalization is "overrated" to some extent. Why, then, do these aforementioned systems disallow joins and force everything together in a single
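A toy illustration of the trade-off being asked about (plain Python dicts standing in for tables; not any real BigTable or SimpleDB API): a normalized layout needs a join-like second lookup, while a denormalized layout answers the same question with a single key fetch, which is the access pattern these stores are built around.

```python
# Normalized: two "tables"; answering the query needs a join-like step.
users = {1: {"name": "alice", "city_id": 10}}
cities = {10: {"city": "Recife"}}

def user_city_normalized(user_id):
    user = users[user_id]
    return cities[user["city_id"]]["city"]  # second lookup = the "join"

# Denormalized: the city name is copied into each user row,
# so one key fetch suffices -- the BigTable/SimpleDB style.
users_denorm = {1: {"name": "alice", "city": "Recife"}}

def user_city_denormalized(user_id):
    return users_denorm[user_id]["city"]

print(user_city_normalized(1), user_city_denormalized(1))
```

The denormalized copy costs extra storage and write-time bookkeeping, but every read stays local to one row, which is what makes it cheap to shard across many machines.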

Storing massive ordered time-series data in Bigtable derivatives

[亡魂溺海] submitted on 2019-11-28 13:46:40
Question: I am trying to figure out exactly what these newfangled data stores such as Bigtable, HBase, and Cassandra really are. I work with massive amounts of stock market data: billions of rows of price/quote data that can add up to hundreds of gigabytes every day (although these text files often compress by at least an order of magnitude). This data is basically a handful of numbers, two or three short strings, and a timestamp (usually millisecond-level). If I had to pick a unique identifier for each row
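One common row-key design for this kind of data in Bigtable-style stores (a sketch of the general technique, not code from the question; the symbol and timestamps are made up) is to concatenate the instrument symbol with a fixed-width timestamp, so that all ticks for one symbol are stored contiguously and sort chronologically under the store's lexicographic key ordering:

```python
def row_key(symbol, ts_millis):
    """Fixed-width key: rows for one symbol sort by time lexicographically."""
    return f"{symbol}#{ts_millis:013d}"

keys = [row_key("PETR4", 1262304000123),
        row_key("PETR4", 1262304000045)]
print(sorted(keys))  # the earlier tick sorts first
```

Zero-padding the timestamp matters: without a fixed width, "9" would sort after "10" as a string and the time ordering would break.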

What database does Google use?

牧云@^-^@ submitted on 2019-11-28 13:06:53
Question: Is it Oracle, or MySQL, or something they have built themselves? Answer 1: Bigtable: A Distributed Storage System for Structured Data. Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in

App Engine BadValueError On Bulk Data Upload - TextProperty being construed as StringProperty

倾然丶 夕夏残阳落幕 submitted on 2019-11-28 09:30:43
bulkloader.yaml:

transformers:
- kind: ExampleModel
  connector: csv
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: data
      external_name: data
    - property: type
      external_name: type

model.py:

class ExampleModel(db.Model):
    data = db.TextProperty(required=True)
    type = db.StringProperty(required=True)

Everything seems to be fine, yet when I upload I get this error: BadValueError: Property data is 24788 bytes long; it must be 500 or less. Consider Text instead, which can store strings of any length. For some reason, it thinks data is
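A commonly suggested explanation (an assumption based on how the App Engine bulkloader handles CSV input, not something stated in the excerpt) is that the loader hands every CSV field over as a plain string, which is then validated against StringProperty's 500-byte limit regardless of the model's TextProperty declaration; declaring an import_transform that wraps the value in db.Text is the fix usually proposed:

```yaml
# bulkloader.yaml -- sketch of the commonly suggested fix:
# wrap the incoming CSV string in db.Text so the value is not
# validated against the 500-byte StringProperty limit.
- property: data
  external_name: data
  import_transform: db.Text
```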

Google Bigtable vs BigQuery for storing large number of events

风流意气都作罢 submitted on 2019-11-27 21:36:56
Question: Background: we'd like to store our immutable events in a (preferably) managed service. The average size of one event is less than 1 KB, and we have between 1 and 5 events per second. The main reason for storing these events is to be able to replay them (perhaps using table scanning) once we create future services that might be interested in them. Since we're on Google Cloud, we're obviously looking at Google's services as the first choice. I suspect that Bigtable would be a good fit for this but
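A quick sizing sketch under the numbers given above (1 KB per event and the upper end of 1-5 events/second are from the question; the yearly figure is just arithmetic, not a quote):

```python
EVENT_BYTES = 1024      # "less than 1 KB" -> worst case 1 KiB per event
EVENTS_PER_SEC = 5      # upper end of the stated 1-5 events/second
SECONDS_PER_DAY = 86_400

events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY      # 432,000 events/day
bytes_per_day = events_per_day * EVENT_BYTES
gib_per_year = bytes_per_day * 365 / 1024**3
print(events_per_day, round(gib_per_year, 1))          # ~150 GiB per year
```

At roughly 150 GiB/year and a few writes per second, the workload is tiny by Bigtable standards, which is why pricing and access pattern (sequential replay vs. analytical queries) tend to drive the Bigtable-vs-BigQuery choice more than raw scale.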

TableStore (表格存储) 2.0 released, delivering more powerful data management capabilities

限于喜欢 submitted on 2019-11-27 09:52:31
TableStore (表格存储) is Alibaba Cloud's self-developed serverless NoSQL multi-model database for massive structured and semi-structured data, widely used in social, IoT, AI, metadata, and big-data scenarios. TableStore adopts a wide-table model similar to Google Bigtable, with a natively distributed architecture that supports high-throughput data writes and PB-scale storage. The native wide-table data model has some inherent limitations: for example, it cannot properly support multi-condition combined queries on attribute columns, or more advanced full-text or spatial search. In addition, when integrating with compute systems, especially in stream-computing scenarios, the traditional big-data Lambda architecture requires users to maintain multiple storage and compute systems and cannot naturally support data flowing between storage and compute. These problems gradually surfaced as TableStore supported workloads within Alibaba Group, on Alibaba Cloud's public cloud, and in dedicated clouds. With its simple and reliable data model and architecture, TableStore has come to carry an increasingly rich variety of data types, such as time-series and spatio-temporal data, metadata, message data, user-behavior data, and trace/provenance data. More and more customers are also treating TableStore as a unified online big-data storage platform, so we urgently needed to support efficient query, analysis, and search over massive data. We also needed to move closer to the business, abstracting data models that better match business needs so that data onboarding becomes simpler. At the Alibaba Cloud new-product launch on March 6, 2019, TableStore made major upgrades in the following areas:

When Spring Cloud meets SOFAStack | Meetup #2 recap

老子叫甜甜 submitted on 2019-11-26 19:27:17
Author: Xuanbei (玄北, Cao Jie), a core member of the Ant Financial SOFAStack open-source team. This article is based on the talk "When Spring Cloud Meets SOFAStack" given at SOFA Meetup #2 in Shanghai on May 26, and mainly discusses the key features included in spring-cloud-antfin and how to quickly build a microservice system with SOFAStack and Spring Cloud. Links to the session video and slides are at the end of the article. Concepts: Spring Cloud is a microservice development framework open-sourced by the Spring community that helps developers quickly build distributed applications. The Spring Cloud website describes it as follows: Spring Cloud provides tools for developers to quickly build some of the common patterns in distributed systems (e.g. configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, control bus, one-time tokens, global locks, leadership election, distributed sessions, cluster state). Since 2007, Ant Financial