bigdata

Likes in MongoDB

不想你离开。 Submitted on 2020-01-05 11:58:36
Question: Here is my problem: I'd like to create a network which allows users to upload posts and like them. I think I can store each post in a single collection called 'post', and I have no problem doing that. But where can I store the likes on each post along with their related data (time, the user they came from, etc.)? I can't store them inside the post document, because the maximum size of a document is 16 MB, and given that I expect to record thousands of likes with related data for each post, I can't do this. I could relate each post to a…
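
The standard pattern here is a separate likes collection holding one small document per like, keyed by the post's _id. Below is a minimal sketch using pymongo; the database name, collection names, and field names are assumptions for illustration, not part of the question.

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["social"]  # hypothetical database name

    def add_like(post_id, user_id):
        # One document per like: sidesteps the 16 MB per-document limit
        # that an embedded array of likes inside the post would hit.
        db.likes.insert_one({
            "post_id": post_id,
            "user_id": user_id,
            "created_at": datetime.now(timezone.utc),
        })
        # Optional denormalized counter on the post for cheap display.
        db.post.update_one({"_id": post_id}, {"$inc": {"like_count": 1}})

    def recent_likes(post_id, limit=20):
        # An index on (post_id, created_at) would normally back this query.
        return list(db.likes.find({"post_id": post_id})
                    .sort("created_at", -1).limit(limit))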

HRegionServer shows “error telling master we are up”. Showing socket exception: Invalid argument

有些话、适合烂在心里 Submitted on 2020-01-05 08:37:26
Question: I am trying to create an HBase cluster on 3 CentOS machines. Hadoop (v2.8.0) is up and running, and on top of it I configured HBase (v1.2.5). HBase starts up fine: it starts the HMaster and the region servers, but the region servers still show the following error, and the HMaster log shows that no region servers have checked in. 2017-04-20 19:30:33,950 WARN [regionserver/localhost/127.0.0.1:16020] regionserver.HRegionServer: error telling master we are up com.google.protobuf.ServiceException: java.net…
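
The log line regionserver/localhost/127.0.0.1:16020 suggests the region server resolved its own hostname to the loopback address, so the address it reports to the HMaster is unreachable. A common fix is to make each node's /etc/hosts map its hostname to its real network IP; the hostnames and addresses below are placeholders:

    # /etc/hosts on each node (example entries)
    127.0.0.1      localhost
    # Do NOT map the machine's own hostname to 127.0.0.1 or 127.0.1.1.
    192.168.1.10   hbase-master
    192.168.1.11   hbase-rs1
    192.168.1.12   hbase-rs2

After fixing name resolution, restart HBase so the region servers re-register with routable addresses.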

How to load multi-line column data into a Hive table? Columns contain newline characters

一世执手 Submitted on 2020-01-05 07:44:29
Question: I have a column (not the last column) in an Excel file that contains data spanning a few lines. Some cells of the column are blank and some hold single-line entries. When saving as a .CSV file or a tab-separated .txt from Excel, all of the multi-line data and a few of the single-line entries end up wrapped in double quotes; none of the blank fields are quoted, and some of the single-line entries are not within quotes either. Is it possible to store the data with this same structure in a Hive table? If…
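
One commonly used approach is OpenCSVSerde, which understands the quoting Excel produces; note, though, that Hive's default text InputFormat still splits records on newlines, so line breaks embedded inside quoted fields generally have to be removed or escaped before loading. A sketch of the table definition, with the table name, column names, and path as placeholders:

    -- Hedged sketch: OpenCSVSerde parses quoted fields, but with a plain
    -- text file, records that span physical lines still need preprocessing.
    CREATE EXTERNAL TABLE staging_excel_export (
      id          STRING,
      description STRING,  -- the multi-line column
      status      STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      "separatorChar" = ",",
      "quoteChar"     = "\""
    )
    STORED AS TEXTFILE
    LOCATION '/user/hive/staging/excel_export';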

Not able to connect to HBase from Windows

安稳与你 Submitted on 2020-01-05 07:22:11
Question: I am trying to run an HBase Java client program from Windows. All I have is 1) a Java program without any compiler errors and 2) hbase-site.xml (I have no other HDFS or HBase config files, only the one above). When I run the program I get the error given in the last block. Am I missing something? Both are given here. <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.zookeeper.quorum</name> <value>IP Address1,IPAddress2…
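
For reference, a minimal HBase 1.x client sketch; the table and column names are placeholders. Two things commonly bite Windows clients: hbase-site.xml must be on the classpath so HBaseConfiguration picks it up, and the machine must be able to resolve the region servers' hostnames (ZooKeeper hands back hostnames, not IPs, so hosts-file entries are often needed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSmokeTest {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("test"))) {
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                              Bytes.toBytes("value"));
                table.put(put);
            }
        }
    }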

Partition a very large INNER JOIN SQL query

梦想的初衷 Submitted on 2020-01-05 05:19:23
Question: The SQL query is a fairly standard inner-join type. For example, comparing n tables to see which customerIds exist in all n tables would be a basic WHERE ... AND type query. The problem is the size of the tables: more than 10 million records each. The database is denormalized, and normalization is not an option. The query either takes too long to complete or never completes. I'm not sure if it's relevant, but we are using Spring XD job modules for other types of queries. I'm not sure how to partition this sort…
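
One way to partition such a query is to drive it by customerId ranges, so each chunk joins a bounded slice and the chunks can run sequentially or be farmed out (for example, one Spring XD job per range). A sketch, assuming a numeric customerId and generic DB-API access; the table names, bounds, and step size are placeholders:

    import sqlite3  # stand-in for whatever DB-API driver is actually in use

    QUERY = """
        SELECT a.customerId
        FROM table_a a
        INNER JOIN table_b b ON b.customerId = a.customerId
        INNER JOIN table_c c ON c.customerId = a.customerId
        WHERE a.customerId >= ? AND a.customerId < ?
    """

    def matching_customers(conn, lo, hi, step=1_000_000):
        """Run the join one bounded customerId range at a time."""
        cur = conn.cursor()
        for start in range(lo, hi, step):
            cur.execute(QUERY, (start, start + step))
            for (customer_id,) in cur:
                yield customer_id

    # Each range is independent, so chunks can also be dispatched in
    # parallel, e.g. one job per (start, start + step) window.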

How to incrementally create a sparse matrix in Python?

﹥>﹥吖頭↗ Submitted on 2020-01-05 04:40:07
Question: I am creating a co-occurrence matrix of size 1M by 1M integer entries. Once the matrix is created, the only operation I will do on it is to get the top N values per row (or per column, as it is a symmetric matrix). I have to create the matrix as sparse to be able to fit it in memory. I read input data from a big file and update the co-occurrence of two indexes (row, col) incrementally. The sample code for the sparse dok_matrix specifies that I should declare the size of the matrix beforehand. I know…
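
For what the question describes, scipy's dok_matrix does want its shape up front, but the shape is only a bound: entries can be added incrementally, and memory grows only with the number of non-zeros. A small sketch (the update stream is faked inline); converting once to CSR afterwards makes per-row top-N cheap:

    import numpy as np
    from scipy.sparse import dok_matrix

    N = 1_000_000                      # 1M x 1M, as in the question
    cooc = dok_matrix((N, N), dtype=np.int32)

    # Incremental updates while streaming the input file; the pairs here
    # stand in for the real (row, col) stream.
    for i, j in [(3, 7), (3, 7), (7, 42)]:
        cooc[i, j] += 1
        cooc[j, i] += 1                # keep the matrix symmetric

    # Convert once to CSR afterwards: row slicing becomes cheap.
    csr = cooc.tocsr()

    def top_n(row, n=10):
        start, end = csr.indptr[row], csr.indptr[row + 1]
        cols, vals = csr.indices[start:end], csr.data[start:end]
        order = np.argsort(vals)[::-1][:n]
        return list(zip(cols[order].tolist(), vals[order].tolist()))

    print(top_n(3))                    # e.g. [(7, 2)]

If the per-entry dict overhead of dok_matrix becomes the bottleneck, a common alternative is to accumulate (row, col) -> count in a plain dict and build a coo_matrix once at the end.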

Is SparkSQL RDBMS or NoSQL?

北慕城南 Submitted on 2020-01-05 04:04:06
Question: Recently I was having a discussion with my friend about the features of SparkSQL when we came across these questions: does SparkSQL support ACID transactions? Does it follow the CAP theorem? I am a little new to this field; help me out. Thanks in advance. Answer 1: SparkSQL is a query language, not storage like Hive or MySQL. Although it can register a table that can be used by others, that table is only temporary. SparkSQL supports whatever the underlying databases support. Answer 2: SparkSQL follows the relational database…
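
A small PySpark illustration of Answer 1's point: Spark SQL queries data but does not own durable storage, so the "table" below is only a session-scoped temp view; durability, transactions, and CAP trade-offs come from whatever store sits underneath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparksql-demo").getOrCreate()

    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.createOrReplaceTempView("users")   # session-scoped, not persisted

    spark.sql("SELECT name FROM users WHERE id = 1").show()
    # Durability, transactions, and consistency guarantees come from the
    # underlying source (Hive/HDFS, JDBC, etc.), not from Spark SQL itself.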

Hive (Bigdata): difference between bucketing and indexing

怎甘沉沦 Submitted on 2020-01-04 21:35:10
Question: What is the main difference between bucketing and indexing of a table in Hive? Answer 1: The main difference is the goal. Indexing: the goal of Hive indexing is to improve the speed of query lookups on certain columns of a table. Without an index, queries with predicates like 'WHERE tab1.col1 = 10' load the entire table or partition and process all the rows. But if an index exists for col1, then only a portion of the file needs to be loaded and processed. Indexes become even more essential when the…
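
To make the contrast concrete, a hedged DDL sketch (names are placeholders): an index is an auxiliary lookup structure built after the fact (note that Hive removed index support in 3.0), while bucketing is a physical layout decision baked into the table definition that mainly helps joins and sampling:

    -- Indexing: auxiliary structure to narrow predicate lookups (Hive < 3.0).
    CREATE INDEX idx_col1 ON TABLE tab1 (col1)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD;
    ALTER INDEX idx_col1 ON tab1 REBUILD;

    -- Bucketing: rows are hash-distributed into a fixed number of files
    -- at write time; useful for bucketed joins and sampling, not lookups.
    CREATE TABLE tab2 (id INT, col1 INT)
    CLUSTERED BY (col1) INTO 32 BUCKETS;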

Row locking in HBase: single-row transaction support

ぃ、小莉子 Submitted on 2020-01-04 11:05:25
Question: In HBase, to provide single-row transaction support, a row-locking concept is used. Suppose, for example: Put p = new Put("/*Row Key*/"); I expected this statement to lock the row, so that until table.put(p) completes the lock won't get released. So if, in between, I start a new put, i.e. Put p1 = new Put("/*Row Key*/");, the p1 put should not work, since the row has already been locked. But in HBase 0.94, when I tried it, it works. Regarding the row-lock link where I had read about row locks: is there anything…
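
The observed behaviour is expected: constructing a Put does not lock anything; HBase latches the row server-side only while table.put() executes, so two Put objects for the same row can coexist on the client. For compare-and-set style single-row atomicity, the 0.94-era client offers checkAndPut; a sketch with placeholder keys and values:

    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowAtomicitySketch {
        static void demo(HTableInterface table) throws Exception {
            byte[] row = Bytes.toBytes("rowKey");      // example row key
            byte[] cf  = Bytes.toBytes("cf");
            byte[] col = Bytes.toBytes("col");

            // Creating the Put does NOT lock the row; the row is latched
            // server-side only for the duration of the mutation.
            Put p = new Put(row);
            p.add(cf, col, Bytes.toBytes("newValue")); // Put.add() in 0.94

            // Atomically apply p only if the cell still holds the
            // expected value: compare-and-set on a single row.
            boolean applied = table.checkAndPut(
                    row, cf, col, Bytes.toBytes("expectedValue"), p);
            System.out.println("checkAndPut applied: " + applied);
        }
    }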