large-data

updating line in large text file using scala

老子叫甜甜 Submitted on 2020-01-05 12:32:26
Question: I have a large text file, around 43 GB, in .ttl format, containing triples of the form: <http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://la.dbpedia.org/resource/Mahatma_Gandhi> . <http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://lad.dbpedia.org/resource/Mohandas_Gandhi> . I want to find the fastest way to update a specific line inside the file without rewriting all of the text that follows, either by updating the line in place or by deleting it and appending the replacement.
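Since bytes in the middle of a file cannot be shifted cheaply, a common approach is to seek to the line's byte offset and overwrite it in place with a replacement of exactly the same byte length, padding with spaces if the new triple is shorter. Below is a minimal sketch of that idea in Python (the file name, line number, and replacement triple are hypothetical; the same RandomAccessFile-style approach carries over to Scala/Java):

```python
# Sketch: overwrite one line in place. Assumes the replacement fits in
# the original line's byte length (padded with spaces if shorter).
# "data.ttl", the line number, and the triple are hypothetical examples.

def overwrite_line(path, target_line, new_text):
    with open(path, "r+b") as f:
        offset = 0
        for i, line in enumerate(f):
            if i == target_line:
                old = line.rstrip(b"\n")
                new = new_text.encode("utf-8")
                if len(new) > len(old):
                    raise ValueError("replacement longer than original line")
                f.seek(offset)                      # jump back to line start
                f.write(new.ljust(len(old), b" "))  # pad to identical length
                return
            offset += len(line)
        raise IndexError("line not found")

overwrite_line("data.ttl", 1, "<subject> <predicate> <object> .")
```

If the replacement is longer than the original line, an in-place overwrite is impossible, and the usual fallback is exactly what the question mentions: blank out the old line and append the new one at the end of the file.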

Querying Large Table in sql server 2008 [duplicate]

核能气质少年 Submitted on 2020-01-05 08:59:23
Question: This question already has answers here: Improve SQL Server query performance on large tables (9 answers). Closed 6 years ago. We have a table with 250 million records (a unique 15-digit number, the clustered unique index column) that is queried by roughly 0.7 to 0.9 million requests per day on average. Multiple applications access this table. Each application compares 500,000 values against these 260 million records. We also have an application that will add more data to this large …

Serialize/Deserialize Large DataSet

此生再无相见时 Submitted on 2020-01-05 04:15:33
Question: I have a reporting tool that sends query requests to a server. After the server finishes the query, the result is sent back to the requesting reporting tool. The communication is done using WCF. The queried data, stored in a DataSet object, is very large, usually around 100 MB. To speed up the transmission I serialize (BinaryFormatter) and compress the DataSet; the object transmitted between the server and the reporting tool is a byte array. However, after a few requests the reporting …
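The pattern in the question (serialize, then compress, then ship a byte array) is language-agnostic. As a rough Python analog of the BinaryFormatter-plus-compression step, with a made-up payload standing in for the DataSet:

```python
import pickle
import gzip

# Hypothetical result set standing in for the DataSet in the question.
rows = [{"id": i, "value": i * 0.5} for i in range(100_000)]

# Serialize, then compress: the wire format is just a byte array,
# mirroring the BinaryFormatter + compression approach described above.
payload = gzip.compress(pickle.dumps(rows))

# Receiving side: decompress, then deserialize.
restored = pickle.loads(gzip.decompress(payload))
assert restored == rows
```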

Best way to search for partial words in large MySQL dataset

你离开我真会死。 Submitted on 2020-01-04 05:52:28
Question: I've looked for this question on Stack Overflow, but didn't find a really good answer to it. I have a MySQL database with a few tables containing information about a specific product. When end users use the search function in my application, it should search across all of those tables, in specific columns. Because the joins and the many WHERE clauses were not performing well, I created a stored procedure that splits up all the single words in these tables and columns and inserts them into a lookup table.
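As a hedged illustration of that word-index idea, here is a small Python/sqlite sketch; all table and column names are invented, and the real schema would of course live in MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE word_index (word TEXT, product_id INTEGER)")
conn.execute("CREATE INDEX idx_word ON word_index (word)")

conn.execute("INSERT INTO product VALUES (1, 'stainless steel water bottle')")

# Split each description into single words and index them, mirroring
# what the stored procedure in the question does.
for pid, text in conn.execute("SELECT id, description FROM product").fetchall():
    conn.executemany(
        "INSERT INTO word_index VALUES (?, ?)",
        [(w, pid) for w in text.lower().split()],
    )

# Partial-word search then becomes a prefix LIKE on the indexed word table.
hits = conn.execute(
    "SELECT DISTINCT product_id FROM word_index WHERE word LIKE ?", ("stain%",)
).fetchall()
print(hits)  # [(1,)]
```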

Optimize Python: Large arrays, memory problems

安稳与你 Submitted on 2020-01-04 03:32:06
Question: I'm having a speed problem running Python/numpy code. I don't know how to make it faster; maybe someone else does? Assume there is a surface with two triangulations: a fine one (..._fine) with M points, and a coarse one with N points. There is also data on the coarse mesh at every point (N floats). I'm trying to do the following: for every point on the fine mesh, find the k closest points on the coarse mesh and take their mean value. In short: interpolate data from coarse to fine. My code right now goes like this: …
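The standard fast approach to "find the k closest coarse points for every fine point, then average" is a KD-tree query instead of a Python loop. A minimal sketch with scipy; the mesh sizes, dimensionality, and data are made-up stand-ins:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical meshes: N coarse points carrying data, M fine points.
rng = np.random.default_rng(0)
coarse_pts = rng.random((1000, 3))     # N x 3 coordinates
coarse_data = rng.random(1000)         # one float per coarse point
fine_pts = rng.random((50_000, 3))     # M x 3 coordinates

k = 4
tree = cKDTree(coarse_pts)
_, idx = tree.query(fine_pts, k=k)     # indices of the k nearest coarse points

# Mean of the k nearest coarse values, fully vectorized.
fine_data = coarse_data[idx].mean(axis=1)
```

tree.query vectorizes the nearest-neighbour search across all M fine points at once, which is typically orders of magnitude faster than looping in Python.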

Not able to print very large strings in java (neither in Eclipse nor in cmd)

拈花ヽ惹草 Submitted on 2020-01-03 15:56:44
Question: I am working with very large strings whose lengths range from 0 to 2*10^5. When I try to print the strings on the console, or on the command line via System.out.println, nothing shows up; only strings/substrings of 4096 characters show up. I also get no errors. I tried printing the characters one at a time using System.out.print(chararray[i]), but to no avail. I even tried to use the below, but it did not work: StringWriter stringWriter = new StringWriter(); BufferedWriter …

Log-computations in Python

时光毁灭记忆、已成空白 Submitted on 2019-12-30 07:49:21
Question: I'm looking to compute something like the expression shown in the original post [equation image not reproduced], where f(i) is a function that returns a real number in [-1,1] for any i in {1,2,...,5000}. Obviously the result of the sum lies in [-1,1], but I can't seem to compute it in Python with straightforward code: 0.5^5000 underflows to 0 and comb(5000,2000) overflows to inf, which turns the computed sum into NaN. The required solution is to work with logs, that is, to use the identity a × b = 2^(log2(a) + log2(b)); if I …
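A sketch of that log-space approach: compute log C(n,i) with the log-gamma function, add n·log(0.5), and combine terms with log-sum-exp, splitting positive and negative f(i) since logs of negative numbers are undefined. The f(i) below is a made-up placeholder:

```python
import numpy as np
from scipy.special import gammaln, logsumexp

n = 5000
i = np.arange(1, n + 1)

def f(i):
    # Hypothetical stand-in for the f(i) in [-1, 1] from the question.
    return np.cos(i / 100.0)

# log C(n, i) via log-gamma, plus log(0.5^n) = n * log(0.5).
log_w = gammaln(n + 1) - gammaln(i + 1) - gammaln(n - i + 1) + n * np.log(0.5)

# f(i) may be negative, so track sign and magnitude separately.
fi = f(i)
log_terms = log_w + np.log(np.abs(fi))
pos = logsumexp(log_terms[fi > 0]) if np.any(fi > 0) else -np.inf
neg = logsumexp(log_terms[fi < 0]) if np.any(fi < 0) else -np.inf
total = np.exp(pos) - np.exp(neg)  # safe: each piece is at most 1 in magnitude
print(total)
```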

file based merge sort on large datasets in Java

久未见 Submitted on 2019-12-29 05:22:10
Question: Given large datasets that don't fit in memory, is there any library or API to perform a sort in Java? The implementation would presumably be similar to the Linux utility sort. Answer 1: Java provides a general-purpose sorting routine which can be used as part of the larger solution to your problem. A common approach to sorting data that's too large to fit in memory is this: 1) read as much data as will fit into main memory, let's say 1 GB; 2) quicksort that 1 GB (here's where you'd use Java's built-in …
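As a compact sketch of that recipe (sort memory-sized runs, spill them to temporary files, then k-way merge the runs), here it is in Python; the chunk size is arbitrary, and each input line is assumed to end with a newline:

```python
import heapq
import tempfile

def _spill(sorted_lines):
    # Write one sorted run to a temp file and hand back its handle.
    run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    run.writelines(sorted_lines)
    run.close()
    return run

def external_sort(lines, chunk_size=100_000):
    """Sort an iterable of text lines too large to fit in memory.

    1) Sort chunk_size lines at a time in memory, spilling each
       sorted run to a temp file.  2) K-way merge the runs lazily.
    """
    runs, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    # heapq.merge streams the runs, keeping only one line per run in memory.
    return heapq.merge(*(open(r.name) for r in runs))

# Usage: for line in external_sort(open("big.txt")): ...
```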

What causes a Python segmentation fault?

a 夏天 Submitted on 2019-12-27 11:03:44
Question: I am implementing Kosaraju's strongly connected components (SCC) graph search algorithm in Python. The program runs great on a small data set, but when I run it on a super-large graph (more than 800,000 nodes), it says "Segmentation Fault". What might be the cause of it? Thank you! Additional info: I first got this error when running on the super-large data set: "RuntimeError: maximum recursion depth exceeded in cmp". I then raised the recursion limit using sys.setrecursionlimit(50000) but got a …
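The segfault here is typically the C stack overflowing once sys.setrecursionlimit is raised past what the thread's stack can hold. Two standard remedies, sketched below on a made-up graph: run the search in a thread with an enlarged stack, or better, rewrite the DFS iteratively so deep recursion never happens:

```python
import sys
import threading

sys.setrecursionlimit(10**6)
threading.stack_size(64 * 1024 * 1024)  # 64 MB stack for the worker thread

graph = {0: [1], 1: [2], 2: [0]}  # hypothetical adjacency list

def run():
    # Iterative DFS with an explicit stack avoids deep recursion entirely.
    visited, stack = set(), [0]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))
    print(len(visited))

# Run in a worker thread so the enlarged stack size takes effect.
t = threading.Thread(target=run)
t.start()
t.join()
```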