analysis

Ruby Text Analysis

泄露秘密 submitted on 2019-11-29 07:37:14
Question: Is there a Ruby gem, or anything else, for text analysis? Word frequency, pattern detection, and so forth (preferably with an understanding of French). Answer 1: The generalization of word frequencies is the language model, e.g. unigrams (= single-word frequency), bigrams (= frequency of word pairs), trigrams (= frequency of word triples), ..., in general: n-grams. You should look for an existing toolkit for language models; it is not a good idea to reinvent the wheel here. There are a few standard toolkits
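To make the n-gram idea concrete, here is a minimal sketch in Python (not Ruby, and not one of the toolkits the answer has in mind). The function name `ngram_counts` and the sample sentence are made up for illustration, and the regex tokenizer is a simplifying assumption; real toolkits also handle smoothing, tokenization rules, and large corpora.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count n-grams, i.e. tuples of n consecutive lowercase word tokens."""
    # \w+ is Unicode-aware in Python 3, so accented French letters stay inside a token.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

sample = "le chat dort et le chat mange"   # illustrative sentence
unigrams = ngram_counts(sample, 1)          # single-word frequencies
bigrams = ngram_counts(sample, 2)           # word-pair frequencies
print(unigrams[("chat",)])                  # 2
print(bigrams[("le", "chat")])              # 2
```

A text shorter than n tokens simply yields an empty counter, so the same function serves for unigrams, bigrams, and trigrams.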

Find out the real file type

我只是一个虾纸丫 submitted on 2019-11-29 04:46:44
Question: I am working on an ASP web page that handles file uploads. Only certain types of files are allowed to be uploaded, like .XLS, .XML, .CSV, .TXT, .PDF, .PPT, etc. I have to decide if a file really has the same type as the extension shows. In other words, if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file. What techniques would you use to analyze these uploaded files? Where can I get the best information

How are exponents calculated?

主宰稳场 submitted on 2019-11-29 02:20:34
I'm trying to determine the asymptotic run-time of one of my algorithms, which uses exponents, but I'm not sure how exponents are calculated programmatically. I'm specifically looking for the pow() algorithm used for double-precision floating-point numbers. I've had a chance to look at fdlibm's implementation. The comments describe the algorithm used:

 * Method: Let x = 2^n * (1+f)
 * 1. Compute and return log2(x) in two pieces:
 *      log2(x) = w1 + w2,
 *    where w1 has 53-24 = 29 bit trailing zeros.
 * 2. Perform y*log2(x) = n+y' by simulating muti-precision
 *    arithmetic, where |y'|<=0.5.
 * 3.
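The quoted steps rest on the identity x^y = 2^(y * log2(x)), with the exponent split into an integer n and a remainder y' with |y'| <= 0.5. The sketch below follows that outline in plain double precision. It is an illustration only: the name `pow_via_log2` is hypothetical, and it does not reproduce fdlibm's two-piece logarithm or its simulated multi-precision arithmetic, so it is less accurate than the real pow().

```python
import math

def pow_via_log2(x, y):
    """Approximate x**y for x > 0 using x**y = 2**(y * log2(x))."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    t = y * math.log2(x)      # full exponent of 2 (fdlibm computes this in extra precision)
    n = round(t)              # integer part n
    y_rem = t - n             # remainder y', with |y'| <= 0.5
    # 2**n is an exact exponent shift; only 2**y' needs a real approximation.
    return math.ldexp(2.0 ** y_rem, n)

print(pow_via_log2(2.0, 10.0))                 # close to 1024.0
print(pow_via_log2(3.0, 2.5), 3.0 ** 2.5)      # the two values should agree closely
```

Every step is a fixed number of floating-point operations, which suggests that a double-precision pow() built this way costs constant time per call.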

Search times for binary search tree

人盡茶涼 submitted on 2019-11-28 19:38:22
Does anyone know how to figure out the search time for a binary search tree (i.e. worst case, best case, and average case)? For a non-self-balancing tree (possible but unusual for a search tree), the worst case is O(n), which occurs for the degenerate binary tree (a linked list). In this case, you have to search, on average, half the list before finding your desired element. Best case is O(log_2 n) for a perfectly balanced tree, since you cut the search space in half at every tree level. Average case is somewhere in between those two and depends entirely on the data :-) Since you rarely get to control the
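The two extremes are easy to observe by counting visited nodes in a plain, non-balancing tree. The following is a minimal sketch; the names `Node`, `insert`, `search_steps`, and `balanced_order` are illustrative, and 15 keys are used because they fill a perfect tree of 4 levels.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into an unbalanced BST (iteratively) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child

def search_steps(root, key):
    """Return the number of nodes visited while searching for key."""
    steps, node = 0, root
    while node is not None:
        steps += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return steps

def balanced_order(keys):
    """Yield sorted keys median-first, so plain insertion builds a balanced tree."""
    if keys:
        mid = len(keys) // 2
        yield keys[mid]
        yield from balanced_order(keys[:mid])
        yield from balanced_order(keys[mid + 1:])

keys = list(range(1, 16))
degenerate = None
for k in keys:                      # sorted insertion degenerates into a linked list
    degenerate = insert(degenerate, k)
balanced = None
for k in balanced_order(keys):      # median-first insertion gives a perfect tree
    balanced = insert(balanced, k)

print(search_steps(degenerate, 15))                    # 15 nodes visited: O(n)
print(max(search_steps(balanced, k) for k in keys))    # 4 nodes visited: about log_2 n
```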

Algorithms or libraries for textual analysis, specifically: dominant words, phrases across text, and collection of text

╄→гoц情女王★ submitted on 2019-11-28 15:01:33
Question: I'm working on a project where I need to analyze a page of text and collections of pages of text to determine dominant words. I'd like to know if there is a library (preferably C# or Java) that will handle the heavy lifting for me. If not, is there an algorithm, or several, that would achieve my goals below? What I want to do is similar to word clouds built from a URL or RSS feed that you find on the web, except I don't want the visualization. They are used all the time for analyzing the

Recurrence T(n) = T(n^(1/2)) + 1

允我心安 submitted on 2019-11-28 10:18:06
I've been looking at this recurrence and wanted to check whether I was taking the right approach. T(n) = T(n^(1/2)) + 1 = T(n^(1/4)) + 1 + 1 = T(n^(1/8)) + 1 + 1 + 1 ... = 1 + 1 + 1 + ... + 1 (a total of sqrt(n) times) = n^(1/2). So the answer would come to a theta bound of n^(1/2). Hint: assume n = 2^(2^m), or m = log_2 log_2 n, and you know 2^(2^(m-1)) * 2^(2^(m-1)) = 2^(2^m). So, if you define S(m) = T(n), your S will be: S(m) = S(m-1) + 1 → S(m) = Θ(m) → S(m) = T(n) = Θ(log_2 log_2 n). Extend it for the general case. In a recursion like T(n) = T(n/2) + 1, in each iteration, we reduce the height of the tree to half. This
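The hint's bound can be checked numerically by counting how many square roots it takes to bring n down to a constant. This is a small sketch under two assumptions not stated in the post: the base case is taken to be n <= 2, and n is chosen as 2^(2^m) so every square root is exact. The function name is illustrative.

```python
import math

def sqrt_recurrence_steps(n):
    """Count applications of n -> sqrt(n) until n <= 2 (assumed base case)."""
    steps = 0
    while n > 2:
        n = math.isqrt(n)   # exact integer square root, so no rounding error
        steps += 1
    return steps

for m in range(1, 6):
    n = 2 ** (2 ** m)
    # The step count equals m = log_2 log_2 n, far below sqrt(n).
    print(m, sqrt_recurrence_steps(n))
```

For n = 2^16 = 65536 the loop passes through 256, 16, 4, 2, which is 4 steps, whereas sqrt(n) would be 256. The unrolling in the question therefore produces log_2 log_2 n ones, not sqrt(n) of them.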

Find out the real file type

痴心易碎 submitted on 2019-11-28 07:49:37
I am working on an ASP web page that handles file uploads. Only certain types of files are allowed to be uploaded, like .XLS, .XML, .CSV, .TXT, .PDF, .PPT, etc. I have to decide if a file really has the same type as the extension shows. In other words if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file. What techniques would you use to analyze these uploaded files? Where can I get the best information about the format of these files? One way would be to check for certain signatures or magic numbers in the
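A signature check of the kind mentioned here can be sketched as follows. The sketch is in Python rather than ASP, the names `SIGNATURES` and `looks_like` are hypothetical, and the table is deliberately tiny: it covers the PDF header and the OLE2 compound-file header used by legacy .XLS and .PPT files. Plain-text formats such as .CSV, .TXT, and .XML have no fixed signature, so they need a different kind of validation.

```python
# Leading "magic number" bytes for a few formats (illustrative, not exhaustive).
OLE2_HEADER = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy Office compound file
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".xls": [OLE2_HEADER],
    ".ppt": [OLE2_HEADER],
}

def looks_like(extension, head):
    """Return True if the first bytes of a file match a signature for extension."""
    signatures = SIGNATURES.get(extension.lower())
    if signatures is None:
        raise ValueError(f"no signature known for {extension!r}")
    return any(head.startswith(sig) for sig in signatures)

# A Windows executable starts with b"MZ", so a renamed trojan.exe fails the check.
print(looks_like(".pdf", b"%PDF-1.4\n..."))       # True
print(looks_like(".pdf", b"MZ\x90\x00\x03"))      # False
```

A matching signature only shows that the header is plausible; it does not prove the rest of the file is well-formed or harmless, so this check is a first filter rather than a complete defense.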

Where is the Query Analyzer in SQL Server Management Studio 2008 R2?

末鹿安然 submitted on 2019-11-27 20:24:05
I have some SQL that's getting run, and it is taking too long to return the results / parse / display, etc. in an ASP.NET C# application. I have SQL Server Management Studio 2008 R2 installed to connect to a remote SQL Server 2000 machine. Is there a Query Analyzer or profiler I can use to see what's going on? I'm not sure if I'm sending too many requests, if the requests are taking too long, if there are additional indexes I can add to speed things up, etc. EDIT: Any free tools out there that are replacements for the Microsoft tools? Default locations: Programs > Microsoft SQL Server 2008 R2 > SQL

Difference between average case and amortized analysis

孤街浪徒 submitted on 2019-11-27 17:50:31
I am reading an article on amortized analysis of algorithms. The following is a text snippet. Amortized analysis is similar to average-case analysis in that it is concerned with the cost averaged over a sequence of operations. However, average case analysis relies on probabilistic assumptions about the data structures and operations in order to compute an expected running time of an algorithm. Its applicability is therefore dependent on certain assumptions about the probability distribution of algorithm inputs. An average case bound does not preclude the possibility that one will get “unlucky”
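One way to see what "cost averaged over a sequence of operations" means, with no probability involved, is the classic doubling array. This example is not taken from the quoted article; it is a standard illustration, and the cost model (one unit per element written or copied) and the name `append_costs` are assumptions of this sketch.

```python
def append_costs(n):
    """Return the cost of each of n appends to an array that doubles when full."""
    capacity, size, costs = 1, 0, []
    for _ in range(n):
        cost = 1                 # writing the new element
        if size == capacity:     # array is full: copy everything into a bigger one
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs

costs = append_costs(1000)
print(max(costs))                # 513: a single append can be expensive
print(sum(costs) / len(costs))   # 2.023: yet the average over the sequence is constant
```

The bound on the total (at most 3n units for n appends) holds for every sequence of appends, regardless of input distribution, which is what separates an amortized guarantee from an average-case one.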

Showing an image with pylab.imshow()

混江龙づ霸主 submitted on 2019-11-27 17:41:54
Question: I'm relatively new to all this, and I started to do the tutorial on image analysis here: http://www.pythonvision.org/basic-tutorial I have installed all the modules, but I didn't get very far before hitting a snag. When trying to perform the pylab.imshow(dna) step, it returns the following error: In [10]: pylab.imshow(dna) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-10-fc86cadb4e46> in <module>() ----> 1