bigdata

Set up our own vertex ID in Titan using the Java API

旧时模样 submitted on 2019-12-13 04:41:42
Question: I want to set my own ID on a vertex, in the way shown below:

BaseConfiguration configuration = new BaseConfiguration();
configuration.setProperty("storage.backend", "hbase");
configuration.setProperty("storage.hostname", "slave05");
configuration.setProperty("storage.port", "2181");
configuration.setProperty("storage.tablename", "REC_GRAPH1");
TitanGraph graph = TitanFactory.open(configuration);
Vertex vertex = graph.addVertex(200);
graph.commit();

But I'm not able to. I guess I'm…

HDP 2.4: how to collect Hadoop MapReduce logs into one file using Flume, and what is the best practice?

淺唱寂寞╮ submitted on 2019-12-13 03:44:22
Question: We are using HDP 2.4 and have many MapReduce jobs written in various ways (Java MR, Hive, etc.). The logs are collected in the Hadoop file system under the application ID. I want to collect all the logs of an application and append them into a single file (in HDFS, or in the OS file system of one machine) so that I can analyze my application's logs in a single location without hassle. Please also advise the best way to achieve this in HDP 2.4 (stack version info: HDFS 2.7.1.2.4 / YARN 2.7.1.2.4 / MapReduce2 2.7.1.2.4 / Log…
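The usual first step on HDP is YARN's own log aggregation: with yarn.log-aggregation-enable set, `yarn logs -applicationId <appId>` prints every container's log in one stream, which can be redirected to a single file. The merge step itself is simple; a minimal Python sketch of it (the directory layout and file names here are assumptions for illustration, not HDP specifics):

```python
import os

def merge_logs(log_dir, out_path):
    """Append every per-container log file found in log_dir into one
    output file, with a header naming the source of each section."""
    # List the inputs before creating the output, so the merged file
    # is never picked up as one of its own inputs.
    names = sorted(
        n for n in os.listdir(log_dir)
        if os.path.isfile(os.path.join(log_dir, n))
    )
    with open(out_path, "w") as out:
        for name in names:
            out.write("==== %s ====\n" % name)
            with open(os.path.join(log_dir, name)) as f:
                out.write(f.read())
    return out_path
```

In practice, fetching the logs first (e.g. `yarn logs -applicationId <appId> > app.log`) often makes a custom merge step unnecessary.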

Transferring millions of rows from Teradata to MySQL

徘徊边缘 submitted on 2019-12-13 03:38:31
Question: I have to transfer around 5 million rows of data from Teradata to MySQL. Can anyone suggest the fastest way to do this over the network, without using the filesystem? I am new to Teradata and MySQL. I want to run this transfer as a weekly batch job, so I am looking for a solution that can be fully automated. Any suggestions or hints will be greatly appreciated. I have already written code using JDBC to get the records from Teradata and insert them into MySQL…
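One generic pattern (not from the original thread) is to stream a SELECT cursor from Teradata and write batched, committed inserts on the MySQL side, so no row set is ever fully held in memory. The sketch below shows only the batching logic, using Python's sqlite3 module as a stand-in for the MySQL connection; the table name, column count, and batch size are assumptions:

```python
import sqlite3

def copy_in_batches(rows, conn, batch_size=10000):
    """Insert an iterable of row tuples in batches, committing once
    per batch rather than once per row."""
    cur = conn.cursor()
    batch, total = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany("INSERT INTO target VALUES (?, ?)", batch)
            conn.commit()
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany("INSERT INTO target VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total
```

With real MySQL over JDBC, batched statements plus `rewriteBatchedStatements=true` in the connection URL are the usual speed levers; exporting with Teradata FastExport and importing with LOAD DATA is faster still, but needs the filesystem the question wants to avoid.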

HiveServer2 generates a lot of directories in HDFS under /tmp/hive/hive

限于喜欢 submitted on 2019-12-13 03:00:21
Question: We created a new cluster with HiveServer2 (on the Hortonworks HDP 2.2 distribution). After some time we had more than 1048576 directories in /tmp/hive/hive on HDFS, because the Hive server generates them in this location. Has anyone seen a similar problem? Logs from HiveServer2:

2015-08-31 06:48:15,828 WARN [HiveServer2-Handler-Pool: Thread-1104]: conf.HiveConf (HiveConf.java:initialize(2499)) - HiveConf of name hive.heapsize does not exist
2015-08-31 06:48:15,829 WARN [HiveServer2-Handler-Pool: Thread-1104]…
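Later Hive releases ship tooling for exactly this (a cleardanglingscratchdir service); on older stacks such as HDP 2.2, a scheduled cleanup job is a common workaround. A hypothetical sketch of the age-based cleanup logic, written against a local path for clarity; against HDFS the same idea would go through `hdfs dfs -rm -r` or the Java FileSystem API, and care is needed not to delete scratch dirs of sessions that are still live:

```python
import os
import shutil
import time

def clean_scratch_dirs(root, max_age_days=7, now=None):
    """Remove top-level directories under root whose modification time
    is older than max_age_days. Returns the names removed, sorted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isdir(path) and os.path.getmtime(path) < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return sorted(removed)
```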

Getting a "Failed to create Data Storage" error when trying to load MovieLens data from HDFS

爱⌒轻易说出口 submitted on 2019-12-13 02:26:09
Question: I am trying to load data from HDFS into Pig, but I am getting the error "Failed to create Data Storage". The command I executed was:

movies = LOAD 'hdfs://localhost:9000/Movie_Lens/ratings' USING PigStorage(':') AS (user_id, dummy1, movie_id, dummy2, movie_rating, dummy3, timestamp);

I tried to find this problem on Stack Overflow, but the links I found are not related to HDFS and Pig; they are related to HDFS and HBase, or Pig and HBase. The details from the log file are below…

"Part of this bucket may contain partial data" warning in Kibana

一曲冷凌霜 submitted on 2019-12-13 02:24:39
Question: I am facing a problem while visualizing a graph in Kibana: it does not display all the items from my bucket, and it gives the warning "Part of this bucket may contain partial data". Here is a screenshot of the same. I am not sure what I am doing wrong; kindly help me get it resolved.

Answer 1: You've asked Kibana to use "years" as the x-axis. Since 2016 isn't done yet, any data from 1/1 to now would be in the "2016" bucket, but it's "not complete". Make sense?

Source: https://stackoverflow.com/questions/35745552

Nxlog im_dbi is not working

我是研究僧i submitted on 2019-12-13 01:26:12
Question: I am able to insert data into PostgreSQL using nxlog (om_dbi), but I am not able to select (fetch) data from PostgreSQL using nxlog. I have tried many options and nothing works. In the nxlog documentation, the im_dbi module description also only says "FIXME". Document link: http://nxlog.org/documentation/nxlog-community-edition-reference-manual-v20928#im_dbi Please help me solve this. Config:

<Input dbiin>
Module im_dbi
SavePos TRUE
SQL SELECT * FROM NEW_TABLE
Driver pgsql
Option host…
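For comparison, a complete nxlog pipeline also needs an Output block and a Route wiring the input to it, which the truncated excerpt above does not show. A hedged sketch of the full shape (host, credentials, and file path are placeholders; im_dbi is thinly documented in the community edition, and a missing libdbi pgsql driver on the nxlog host is also worth ruling out):

```
<Input dbiin>
    Module   im_dbi
    SavePos  TRUE
    SQL      SELECT * FROM NEW_TABLE
    Driver   pgsql
    Option   host 127.0.0.1
    Option   username myuser
    Option   password mypass
    Option   dbname mydb
</Input>

<Output fileout>
    Module   om_file
    File     "/var/log/nxlog/dbi_out.log"
</Output>

<Route dbi_to_file>
    Path     dbiin => fileout
</Route>
```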

How to find items in one collection that are not in another collection with MongoDB

你离开我真会死。 submitted on 2019-12-13 01:09:43
Question: I want to query MongoDB to perform a non-match between two collections. Here is my structure:

CollectionA: _id, name, firstname, website_account_key, email, status
CollectionB: _id, website_account_key, lifestage, category, target, flag_sistirt

I am trying to find items in B for which there is no row in A (website_account_key is unique and allows finding the element in B for each A [one-to-one]). I tried to do:

dataA_ids = db.dataA.find().map(function(a){return a.website_account_key;})…
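What is being attempted is a set difference on website_account_key. A minimal sketch of that logic in Python over plain lists of documents (collection and field names taken from the question):

```python
def items_not_in(collection_b, collection_a, key="website_account_key"):
    """Return the documents of collection_b whose key value never
    appears in collection_a (an anti-join on `key`)."""
    a_keys = {doc[key] for doc in collection_a}
    return [doc for doc in collection_b if doc[key] not in a_keys]
```

In the mongo shell, the natural continuation of the snippet in the question is db.dataB.find({website_account_key: {$nin: dataA_ids}}); on MongoDB 3.2+, an aggregation with $lookup followed by a $match on an empty joined array achieves the same anti-join server-side.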

Reading LabVIEW binary files in MATLAB?

别来无恙 submitted on 2019-12-13 00:35:16
Question: I have large .bin files (10 GB to 60 GB) created by LabVIEW software; the .bin files represent the output of two sensors used in experiments that I have done. The problem I have is importing the data into MATLAB. The only way I have achieved this so far is by converting the .bin files to .txt files in LabVIEW and then importing the data into MATLAB using the following code:

Nlines = 1e6; % set number of lines to sample per cycle
sample_rate = (1); % sample rate
DECE = 1000; % decimation factor
…
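Reading the .bin files directly avoids the .txt detour entirely. LabVIEW writes binary data big-endian by default, but the record layout assumed below (two interleaved float64 values per sample, one per sensor) is a guess and must be matched to how the VI actually writes the file. A Python sketch of the idea, reading record by record so a 60 GB file is never loaded at once:

```python
import struct

def read_two_sensor_records(path, n_records):
    """Read up to n_records records of two interleaved big-endian
    float64 values, returning one list per sensor."""
    rec = struct.Struct(">dd")  # big-endian, two doubles per record
    sensor1, sensor2 = [], []
    with open(path, "rb") as f:
        for _ in range(n_records):
            chunk = f.read(rec.size)
            if len(chunk) < rec.size:  # truncated tail or EOF
                break
            a, b = rec.unpack(chunk)
            sensor1.append(a)
            sensor2.append(b)
    return sensor1, sensor2
```

The MATLAB equivalent is fopen(file, 'r', 'ieee-be') followed by fread with 'double' precision, again reading in chunks rather than all at once.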

Number of cycles from a list of values that is a mix of positives and negatives, in Spark and Scala

别说谁变了你拦得住时间么 submitted on 2019-12-13 00:19:56
Question: I have an RDD with a list of values that is a mix of positives and negatives, and I need to compute the number of cycles from this data. For example:

val range = List(sampleRange(2020,2030,2040,2050,-1000,-1010,-1020,Starting point,-1030,2040,-1020,2050,2040,2020,end point,-1060,-1030,-1010)

The interval between each value in the list above is 1 second, i.e. 2020 and 2030 are recorded at a 1-second interval, and so on. How many times does it turn from negative to positive and stay positive for >= 2 seconds? If >= 2…
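The counting rule described (a negative-to-positive transition counts as one cycle only if the values then stay positive for at least 2 consecutive 1-second samples) can be stated independently of Spark. A sketch of that logic in plain Python; the "Starting point"/"end point" markers from the truncated question are ignored here, and in real Spark code this per-sequence logic would need extra handling for runs that straddle partition boundaries:

```python
def count_cycles(values, min_positive_run=2):
    """Count negative-to-positive transitions where the values stay
    non-negative for at least min_positive_run consecutive samples."""
    count, i, n = 0, 0, len(values)
    while i < n:
        if values[i] < 0:
            # skip to the end of the negative stretch
            while i < n and values[i] < 0:
                i += 1
            # measure the positive run that follows it
            j = i
            while j < n and values[j] >= 0:
                j += 1
            if j - i >= min_positive_run:
                count += 1
            i = j
        else:
            i += 1
    return count
```

On the numbers from the question (markers removed), only the run 2050, 2040, 2020 follows a negative stretch and lasts at least 2 seconds, so the count is 1: the first positive run opens the sequence rather than following a negative stretch, and the single 2040 after the first negative stretch is too short.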