bigdata

Likes in MongoDB

不想你离开。 Submitted on 2020-01-05 11:58:36
Question: Here is my problem: I'd like to create a network which allows users to upload posts and like them. I think I can store each post in a single collection called 'post', and I have no problem doing that. But where can I store the likes on each post along with their related data (time, the user they came from, etc.)? I can't store them inside the post document, because the maximum size of a document is 16 MB, and given that I expect to record thousands of likes with related data for each post, I can't do this. I could relate each post to a…
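
The standard pattern here is a separate likes collection holding one small document per like, keyed by the post's _id. Below is a minimal sketch using pymongo; the database name, collection names, and field names are assumptions for illustration, not part of the question.

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["social"]  # hypothetical database name

    def add_like(post_id, user_id):
        # One document per like: sidesteps the 16 MB per-document limit
        # that an embedded array of likes inside the post would hit.
        db.likes.insert_one({
            "post_id": post_id,
            "user_id": user_id,
            "created_at": datetime.now(timezone.utc),
        })
        # Optional denormalized counter on the post for cheap display.
        db.post.update_one({"_id": post_id}, {"$inc": {"like_count": 1}})

    def recent_likes(post_id, limit=20):
        # An index on (post_id, created_at) would normally back this query.
        return list(db.likes.find({"post_id": post_id})
                    .sort("created_at", -1).limit(limit))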

HRegionServer shows “error telling master we are up”. Showing socket exception: Invalid argument

有些话、适合烂在心里 Submitted on 2020-01-05 08:37:26
Question: I am trying to create an HBase cluster on 3 CentOS machines. Hadoop (v2.8.0) is up and running, and on top of it I configured HBase (v1.2.5). HBase starts up fine: it starts the HMaster and the region servers, but the region servers still show the following error, and the HMaster log shows that no region servers have checked in. 2017-04-20 19:30:33,950 WARN [regionserver/localhost/127.0.0.1:16020] regionserver.HRegionServer: error telling master we are up com.google.protobuf.ServiceException: java.net…
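
The log line regionserver/localhost/127.0.0.1:16020 suggests the region server resolved its own hostname to the loopback address, so the address it reports to the HMaster is unreachable. A common fix is to make each node's /etc/hosts map its hostname to its real network IP; the hostnames and addresses below are placeholders:

    # /etc/hosts on each node (example entries)
    127.0.0.1      localhost
    # Do NOT map the machine's own hostname to 127.0.0.1 or 127.0.1.1.
    192.168.1.10   hbase-master
    192.168.1.11   hbase-rs1
    192.168.1.12   hbase-rs2

After fixing name resolution, restart HBase so the region servers re-register with routable addresses.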

How to load multi-line column data into a Hive table? Columns contain newline characters

一世执手 Submitted on 2020-01-05 07:44:29
Question: I have a column (not the last column) in an Excel file that contains data spanning a few lines. Some cells of the column are blank and some hold single-line entries. When saving as a .CSV file or a tab-separated .txt from Excel, all of the multi-line data and a few of the single-line entries end up wrapped in double quotes; none of the blank fields are quoted, and some of the single-line entries are not within quotes either. Is it possible to store the data with this same structure in a Hive table? If…
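
One commonly used approach is OpenCSVSerde, which understands the quoting Excel produces; note, though, that Hive's default text InputFormat still splits records on newlines, so line breaks embedded inside quoted fields generally have to be removed or escaped before loading. A sketch of the table definition, with the table name, column names, and path as placeholders:

    -- Hedged sketch: OpenCSVSerde parses quoted fields, but with a plain
    -- text file, records that span physical lines still need preprocessing.
    CREATE EXTERNAL TABLE staging_excel_export (
      id          STRING,
      description STRING,  -- the multi-line column
      status      STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      "separatorChar" = ",",
      "quoteChar"     = "\""
    )
    STORED AS TEXTFILE
    LOCATION '/user/hive/staging/excel_export';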

Not able to connect to HBase from Windows

安稳与你 Submitted on 2020-01-05 07:22:11
Question: I am trying to run an HBase Java client program from Windows. All I have is 1) a Java program without any compiler errors and 2) hbase-site.xml (I have no other HDFS or HBase config files, only the one above). When I run the program I get the error given in the last block. Am I missing something? Both are given here. <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hbase.zookeeper.quorum</name> <value>IP Address1,IPAddress2…
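
For reference, a minimal HBase 1.x client sketch; the table and column names are placeholders. Two things commonly bite Windows clients: hbase-site.xml must be on the classpath so HBaseConfiguration picks it up, and the machine must be able to resolve the region servers' hostnames (ZooKeeper hands back hostnames, not IPs, so hosts-file entries are often needed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSmokeTest {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("test"))) {
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                              Bytes.toBytes("value"));
                table.put(put);
            }
        }
    }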

Partition a very large INNER JOIN SQL query

梦想的初衷 Submitted on 2020-01-05 05:19:23
Question: The SQL query is a fairly standard inner-join type. For example, comparing n tables to see which customerIds exist in all n tables would be a basic WHERE ... AND type query. The problem is the size of the tables: more than 10 million records each. The database is denormalized, and normalization is not an option. The query either takes too long to complete or never completes. I'm not sure if it's relevant, but we are using Spring XD job modules for other types of queries. I'm not sure how to partition this sort…
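
One way to partition such a query is to drive it by customerId ranges, so each chunk joins a bounded slice and the chunks can run sequentially or be farmed out (for example, one Spring XD job per range). A sketch, assuming a numeric customerId and generic DB-API access; the table names, bounds, and step size are placeholders:

    import sqlite3  # stand-in for whatever DB-API driver is actually in use

    QUERY = """
        SELECT a.customerId
        FROM table_a a
        INNER JOIN table_b b ON b.customerId = a.customerId
        INNER JOIN table_c c ON c.customerId = a.customerId
        WHERE a.customerId >= ? AND a.customerId < ?
    """

    def matching_customers(conn, lo, hi, step=1_000_000):
        """Run the join one bounded customerId range at a time."""
        cur = conn.cursor()
        for start in range(lo, hi, step):
            cur.execute(QUERY, (start, start + step))
            for (customer_id,) in cur:
                yield customer_id

    # Each range is independent, so chunks can also be dispatched in
    # parallel, e.g. one job per (start, start + step) window.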

How to incrementally create a sparse matrix in Python?

﹥>﹥吖頭↗ Submitted on 2020-01-05 04:40:07
Question: I am creating a co-occurrence matrix of size 1M by 1M integer entries. Once the matrix is created, the only operation I will do on it is to get the top N values per row (or per column, as it is a symmetric matrix). I have to create the matrix as sparse to be able to fit it in memory. I read input data from a big file and update the co-occurrence of two indexes (row, col) incrementally. The sample code for the sparse dok_matrix specifies that I should declare the size of the matrix beforehand. I know…
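
For what the question describes, scipy's dok_matrix does want its shape up front, but the shape is only a bound: entries can be added incrementally, and memory grows only with the number of non-zeros. A small sketch (the update stream is faked inline); converting once to CSR afterwards makes per-row top-N cheap:

    import numpy as np
    from scipy.sparse import dok_matrix

    N = 1_000_000                      # 1M x 1M, as in the question
    cooc = dok_matrix((N, N), dtype=np.int32)

    # Incremental updates while streaming the input file; the pairs here
    # stand in for the real (row, col) stream.
    for i, j in [(3, 7), (3, 7), (7, 42)]:
        cooc[i, j] += 1
        cooc[j, i] += 1                # keep the matrix symmetric

    # Convert once to CSR afterwards: row slicing becomes cheap.
    csr = cooc.tocsr()

    def top_n(row, n=10):
        start, end = csr.indptr[row], csr.indptr[row + 1]
        cols, vals = csr.indices[start:end], csr.data[start:end]
        order = np.argsort(vals)[::-1][:n]
        return list(zip(cols[order].tolist(), vals[order].tolist()))

    print(top_n(3))                    # e.g. [(7, 2)]

If the per-entry dict overhead of dok_matrix becomes the bottleneck, a common alternative is to accumulate (row, col) -> count in a plain dict and build a coo_matrix once at the end.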

Is SparkSQL RDBMS or NoSQL?

北慕城南 Submitted on 2020-01-05 04:04:06
Question: Recently I was having a discussion with my friend about the features of SparkSQL when we came across these questions: does SparkSQL support ACID transactions? Does it follow the CAP theorem? I am a little new to this field; help me out. Thanks in advance. Answer 1: SparkSQL is a query language, not storage like Hive or MySQL. Although it can register a table that can be used by others, that table is only temporary. SparkSQL supports whatever the underlying databases support. Answer 2: SparkSQL follows the relational database…
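
A small PySpark illustration of Answer 1's point: Spark SQL queries data but does not own durable storage, so the "table" below is only a session-scoped temp view; durability, transactions, and CAP trade-offs come from whatever store sits underneath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparksql-demo").getOrCreate()

    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.createOrReplaceTempView("users")   # session-scoped, not persisted

    spark.sql("SELECT name FROM users WHERE id = 1").show()
    # Durability, transactions, and consistency guarantees come from the
    # underlying source (Hive/HDFS, JDBC, etc.), not from Spark SQL itself.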

Hive (Bigdata): difference between bucketing and indexing

怎甘沉沦 Submitted on 2020-01-04 21:35:10
Question: What is the main difference between bucketing and indexing of a table in Hive? Answer 1: The main difference is the goal. Indexing: the goal of Hive indexing is to improve the speed of query lookups on certain columns of a table. Without an index, queries with predicates like 'WHERE tab1.col1 = 10' load the entire table or partition and process all the rows. But if an index exists for col1, then only a portion of the file needs to be loaded and processed. Indexes become even more essential when the…
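
To make the contrast concrete, a hedged DDL sketch (names are placeholders): an index is an auxiliary lookup structure built after the fact (note that Hive removed index support in 3.0), while bucketing is a physical layout decision baked into the table definition that mainly helps joins and sampling:

    -- Indexing: auxiliary structure to narrow predicate lookups (Hive < 3.0).
    CREATE INDEX idx_col1 ON TABLE tab1 (col1)
    AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    WITH DEFERRED REBUILD;
    ALTER INDEX idx_col1 ON tab1 REBUILD;

    -- Bucketing: rows are hash-distributed into a fixed number of files
    -- at write time; useful for bucketed joins and sampling, not lookups.
    CREATE TABLE tab2 (id INT, col1 INT)
    CLUSTERED BY (col1) INTO 32 BUCKETS;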

Row locking in HBase: single-row transaction support

ぃ、小莉子 Submitted on 2020-01-04 11:05:25
Question: In HBase, to provide single-row transaction support, a row-locking concept is used. Suppose, for example: Put p = new Put("/*Row Key*/"); I expected this statement to lock the row, so that until table.put(p) completes the lock won't get released. So if, in between, I start a new put, i.e. Put p1 = new Put("/*Row Key*/");, the p1 put should not work, since the row has already been locked. But in HBase 0.94, when I tried it, it works. Regarding the row-lock link where I had read about row locks: is there anything…
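
The observed behaviour is expected: constructing a Put does not lock anything; HBase latches the row server-side only while table.put() executes, so two Put objects for the same row can coexist on the client. For compare-and-set style single-row atomicity, the 0.94-era client offers checkAndPut; a sketch with placeholder keys and values:

    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowAtomicitySketch {
        static void demo(HTableInterface table) throws Exception {
            byte[] row = Bytes.toBytes("rowKey");      // example row key
            byte[] cf  = Bytes.toBytes("cf");
            byte[] col = Bytes.toBytes("col");

            // Creating the Put does NOT lock the row; the row is latched
            // server-side only for the duration of the mutation.
            Put p = new Put(row);
            p.add(cf, col, Bytes.toBytes("newValue")); // Put.add() in 0.94

            // Atomically apply p only if the cell still holds the
            // expected value: compare-and-set on a single row.
            boolean applied = table.checkAndPut(
                    row, cf, col, Bytes.toBytes("expectedValue"), p);
            System.out.println("checkAndPut applied: " + applied);
        }
    }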