large-data

How can I analyse ~13GB of data?

徘徊边缘 submitted on 2019-12-09 05:08:15
Question: I have ~300 text files that contain data on trackers, torrents and peers. Each file is organised like this: tracker.txt time torrent time peer time peer ... time torrent ... I have several files per tracker and much of the information is repeated (same information, different time). I'd like to be able to analyse what I have and report statistics on things like: how many torrents are at each tracker, how many trackers torrents are listed on, how many peers torrents have, how many torrents to
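A minimal sketch of one way to aggregate such files in Java is below. The directory name, the exact line layout ("time torrent <id>" / "time peer <id>") and the class name are assumptions for illustration, not the asker's actual format:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: count distinct torrents per tracker and distinct peers per torrent
// across many files, assuming the first line of each file names the tracker
// and later lines look like "time torrent <id>" or "time peer <id>".
public class TrackerStats {
    public static void main(String[] args) throws IOException {
        Map<String, Set<String>> torrentsPerTracker = new HashMap<>();
        Map<String, Set<String>> peersPerTorrent = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get("data"), "*.txt")) {
            for (Path file : files) {
                String tracker = null;
                String currentTorrent = null;
                for (String line : Files.readAllLines(file)) {
                    if (line.trim().isEmpty()) continue;
                    String[] parts = line.trim().split("\\s+");
                    if (tracker == null) { tracker = parts[0]; continue; }   // first line: tracker name
                    if (parts.length < 3) continue;
                    if (parts[1].equals("torrent")) {
                        currentTorrent = parts[2];
                        torrentsPerTracker.computeIfAbsent(tracker, k -> new HashSet<>()).add(currentTorrent);
                    } else if (parts[1].equals("peer") && currentTorrent != null) {
                        peersPerTorrent.computeIfAbsent(currentTorrent, k -> new HashSet<>()).add(parts[2]);
                    }
                }
            }
        }
        torrentsPerTracker.forEach((t, set) -> System.out.println(t + ": " + set.size() + " torrents"));
        System.out.println("Torrents with peer data: " + peersPerTorrent.size());
    }
}

Because only the distinct IDs are kept in memory, a pass like this stays far below the ~13GB of raw input even though every file is read.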

How to pass a string larger than 200 characters to a stored procedure via param

六眼飞鱼酱① submitted on 2019-12-09 03:58:34
Question: I got stuck with one problem. In my code I have to make a sum request of all articles present in my datatable, so I concatenate all article IDs into one string like 'a1,a2,a3', and this is supposed to work. But I have large IDs and around 150 articles, so the string I try to pass to the stored procedure is around 1300 characters, and it is truncated at 200 characters when it goes to the stored procedure. Do you know any solution to pass a large string to a stored procedure without SQL Server to
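Truncation at a fixed length like this usually means the parameter is bound with a 200-character limit, either on the application side or in the procedure's declaration; declaring the procedure parameter as NVARCHAR(MAX) (or VARCHAR(MAX)) avoids it. A minimal JDBC sketch of such a call, where the procedure name, parameter and connection details are placeholders invented for illustration:

import java.sql.*;
import java.util.Collections;

// Sketch: passing a ~1300-character comma-separated ID list to a stored procedure.
// Assumes the procedure declares its parameter wide enough, e.g.
//   CREATE PROCEDURE SumArticles @ids NVARCHAR(MAX) AS ...
// (procedure name, parameter and URL are placeholders, not the asker's code).
public class CallSumArticles {
    public static void main(String[] args) throws SQLException {
        String ids = String.join(",", Collections.nCopies(150, "ART12345"));   // roughly 1300 characters
        String url = "jdbc:sqlserver://localhost;databaseName=shop;user=app;password=secret";
        try (Connection con = DriverManager.getConnection(url);
             CallableStatement cs = con.prepareCall("{call SumArticles(?)}")) {
            cs.setString(1, ids);   // not truncated as long as @ids is NVARCHAR(MAX)
            try (ResultSet rs = cs.executeQuery()) {
                if (rs.next()) {
                    System.out.println("Sum: " + rs.getBigDecimal(1));
                }
            }
        }
    }
}

If the ID list keeps growing, a table-valued parameter or splitting the list server-side are common alternatives to one long string.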

SQL query on H2 database table throws ArrayIndexOutOfBoundsException

穿精又带淫゛_ submitted on 2019-12-08 22:12:59
Question: I have an H2 database on which some queries work, while others throw an ArrayIndexOutOfBoundsException. For example: SELECT COLUMN_1 FROM MY_TABLE; // works fine SELECT COUNT(COLUMN_1) FROM MY_TABLE; // gives the following error message: [Error Code: 50000, SQL State: HY000] General error: "java.lang.ArrayIndexOutOfBoundsException"; SQL statement: SELECT COUNT(COLUMN_1) FROM MY_TABLE [50000-167] What is the cause of this error message? Answer 1: The reason for the error message was a corrupt
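The answer is cut off above, but it points at a corrupt database file. For that situation H2 ships a Recover tool that dumps the database to a SQL script which can be re-imported into a fresh file; a hedged sketch, with the database name and directory as placeholders:

import org.h2.tools.Recover;

// Sketch: run H2's Recover tool programmatically to dump a possibly corrupt
// database ("my_database" in the current directory) to a .sql script.
// The resulting script can be loaded into a newly created database with RUNSCRIPT.
public class RecoverH2Db {
    public static void main(String[] args) throws Exception {
        Recover.execute(".", "my_database");   // writes a .sql dump next to the database file
    }
}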

Java : a method to do multiple calculations on arrays quickly

扶醉桌前 submitted on 2019-12-08 13:34:33
Question: Sorry if this isn't the right place for these questions, but I'm in need of some basic help. I have a class called Differential that has a member variable mValues (List>). What I want to do is iterate through all the values and compare them with each other. This means for five values I am doing 10 comparisons. However, this is going to be used for at least 20,000 lists. There are three calculations that I want to do, and I was wondering how I should approach this. My current idea comes from
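A minimal sketch of the pairwise iteration pattern being described, in plain Java; the calculation and the names are placeholders, since the asker's three calculations and the element type of mValues are not shown in the excerpt:

import java.util.List;

// Sketch: visit every unordered pair in a list exactly once.
// n values give n*(n-1)/2 comparisons, i.e. 10 comparisons for 5 values.
public class PairwiseDiff {
    static double sumOfAbsoluteDifferences(List<Double> values) {
        double total = 0.0;
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                total += Math.abs(values.get(i) - values.get(j));   // placeholder calculation
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfAbsoluteDifferences(List.of(1.0, 2.0, 4.0, 8.0, 16.0)));
    }
}

With 20,000 independent lists, computing all three calculations in a single pass over the pairs and processing the lists in parallel (for example with parallelStream) usually matters more than micro-optimising the inner loop.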

dcast for huge dataframe [R]

断了今生、忘了曾经 submitted on 2019-12-08 10:50:21
Question: Assume a DF of: pnr <- c(1, 1, 1, 2, 2, 3, 4, 5, 5) diag <- c("a", "a", NA, "b", "a", NA, "c", "a", "f") year <- rep(2007, 9) ht <- data.frame(pnr, diag, year) Now I need to reshape such that: require(reshape2) md <- melt(ht, id = c("pnr", "year")) output <- dcast(md, pnr ~ value) Output is now in the format I want. But when I run this on a large data frame, 13 million rows, it will crash RStudio. Is there some smart way to split a data frame, do the dcast, and tie it back together? EDIT: The solutions

Apache solr adding/editing/deleting records frequently

谁说胖子不能爱 submitted on 2019-12-08 02:37:44
Question: I'm thinking about using Apache Solr. In my db I will have around 10,000,000 records. The worst case where I will use it has around 20 searchable/sortable fields. My problem is that these fields may change values frequently during the day. For example, I might change some fields of 10,000 records at the same time, and this may happen 0, 1 or 1000 times a day, etc. The point is that each time I update a value in the db I want it to be updated in Solr too, so I can search with the updated
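One common way to keep Solr in sync without re-indexing whole documents is an atomic update, which sends only the changed fields. A minimal SolrJ sketch; the core name, document id and the price field are placeholders:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Sketch: push a single changed field to Solr with an atomic update
// instead of re-indexing the whole document.
public class SolrAtomicUpdate {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "12345");              // unique key of the record that changed

            Map<String, Object> change = new HashMap<>();
            change.put("set", 19.99);                 // "set" = overwrite this field's value
            doc.addField("price", change);

            solr.add(doc);
            solr.commit();                            // with frequent changes, prefer commitWithin or autoSoftCommit
        }
    }
}

Atomic updates need the schema's uniqueKey and fields that are stored (or have docValues), and with many updates per day visibility is normally managed through commitWithin or soft commits rather than an explicit commit per change.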

Angular.Js Performance, large dataset, ng-repeat, html table with filters and two way binding

雨燕双飞 submitted on 2019-12-08 01:55:04
Question: So I have a simple layout of a page which includes a panel of filters and an HTML table of records using ng-repeat. I am using MVC5 and an AngularJS controller. I may have to deal with up to 100,000 records. Filters will occur for most of the columns, including dates and text fields. The records need to deal with two-way binding (the user has to select records which will be returned to the server). I'd like to get opinions on the best design ideas for this... i.e. would you load all the data to the

Fastest way of doing field comparisons in the same table with large amounts of data in oracle

↘锁芯ラ submitted on 2019-12-07 15:14:23
Question: I am receiving information from a csv file from one department to compare with the same information in a different department to check for discrepancies (about 3/4 of a million rows of data with 44 columns in each row). After I have the data in a table, I have a program that will take the data and send reports based on an HQ. I feel like the way I am going about this is not the most efficient. I am using Oracle for this comparison. Here is what I have: I have a vb.net program that parses the data and inserts it into an extract table. I run a procedure to do a full outer join on the two tables
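A sketch of the kind of full-outer-join comparison described, executed from Java over JDBC; the table names (EXTRACT_DATA, DEPT_DATA), the key and the compared column are placeholders rather than the asker's actual 44-column schema:

import java.sql.*;

// Sketch: find rows that exist in only one table, or whose values differ,
// using a single FULL OUTER JOIN instead of comparing row by row in code.
public class FieldComparison {
    public static void main(String[] args) throws SQLException {
        String sql =
            "SELECT NVL(a.rec_id, b.rec_id) AS rec_id, a.amount AS extract_amount, b.amount AS dept_amount " +
            "FROM extract_data a FULL OUTER JOIN dept_data b ON a.rec_id = b.rec_id " +
            "WHERE a.rec_id IS NULL OR b.rec_id IS NULL OR a.amount <> b.amount";   // NULL amounts on one side need extra handling
        try (Connection con = DriverManager.getConnection("jdbc:oracle:thin:@//localhost:1521/ORCL", "user", "pw");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.printf("%s: extract=%s dept=%s%n",
                        rs.getString("rec_id"), rs.getString("extract_amount"), rs.getString("dept_amount"));
            }
        }
    }
}

Letting Oracle perform the join and only pulling back the mismatched rows is usually far faster than fetching all 3/4 million rows and comparing the fields in application code.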

Method for copying large amounts of data in C#

自闭症网瘾萝莉.ら submitted on 2019-12-07 14:19:56
Question: I am using the following method to copy the contents of a directory to a different directory.

public void DirCopy(string SourcePath, string DestinationPath)
{
    if (Directory.Exists(DestinationPath))
    {
        // clear out the destination before copying
        System.IO.DirectoryInfo downloadedMessageInfo = new DirectoryInfo(DestinationPath);
        foreach (FileInfo file in downloadedMessageInfo.GetFiles())
        {
            file.Delete();
        }
        foreach (DirectoryInfo dir in downloadedMessageInfo.GetDirectories())
        {
            dir.Delete(true);
        }
    }
    //======================================