probability

Combining individual probabilities in Naive Bayesian spam filtering

断了今生、忘了曾经 submitted on 2019-12-03 06:55:52
Question: I'm currently trying to build a spam filter by analyzing a corpus I've amassed. I'm using the Wikipedia entry http://en.wikipedia.org/wiki/Bayesian_spam_filtering to develop my classification code. I've implemented code to calculate the probability that a message is spam given that it contains a specific word, using the following formula from the wiki: My PHP code: public function pSpaminess($word) { $ps = $this->pContentIsSpam(); $ph = $this->pContentIsHam(); $pws = $this-
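For reference, the per-word formula and the combining step described in the linked Wikipedia entry can be sketched in Python as follows. This is a minimal sketch, not the asker's PHP code: the function names, the 0.5 prior, and the clamping constant are assumptions chosen for illustration.

```python
import math

def word_spamicity(p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """P(spam | word) from Bayes' theorem for a single word."""
    p_ham = 1.0 - p_spam
    numerator = p_word_given_spam * p_spam
    return numerator / (numerator + p_word_given_ham * p_ham)

def combine_spamicities(probabilities, eps=1e-9):
    """Combine per-word P(spam | word) values, assuming the words are independent.

    Equivalent to p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn)), but computed
    in log space so long messages do not underflow to 0.0.
    """
    # Clamp away from 0 and 1 so the logarithms stay finite (eps is an assumption).
    clamped = [min(max(p, eps), 1.0 - eps) for p in probabilities]
    eta = sum(math.log(1.0 - p) - math.log(p) for p in clamped)
    if eta > 700:  # math.exp would overflow; the result is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(eta))

print(combine_spamicities([0.9, 0.9, 0.2]))
```

Working with the sum of logarithms rather than the raw product matters once a message contains more than a few hundred words.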

scikit-learn return value of LogisticRegression.predict_proba

天涯浪子 submitted on 2019-12-03 04:55:43
What exactly does the LogisticRegression.predict_proba function return? In my example I get a result like this: [[ 4.65761066e-03 9.95342389e-01] [ 9.75851270e-01 2.41487300e-02] [ 9.99983374e-01 1.66258341e-05]] From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be, because my samples are reviews, which are texts and not numbers. The documentation also says that the second column is n_classes. That certainly can't be, since I only have two classes (namely +1 and
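In the scikit-learn documentation, (n_samples, n_classes) describes the shape of the returned array, not the content of its columns: there is one row per sample and one column per class, and each row sums to 1. The check below uses only the three rows quoted in the question and plain Python. The variable names are illustrative.

```python
import math

# The three rows quoted in the question: one row per sample, one column per class.
proba = [
    [4.65761066e-03, 9.95342389e-01],
    [9.75851270e-01, 2.41487300e-02],
    [9.99983374e-01, 1.66258341e-05],
]

n_samples = len(proba)       # number of rows
n_classes = len(proba[0])    # number of columns

# Each row is a probability distribution over the classes.
rows_sum_to_one = all(math.isclose(sum(row), 1.0, abs_tol=1e-6) for row in proba)

# Index of the most probable column for each sample. In scikit-learn the
# column order follows the fitted model's classes_ attribute.
most_probable_column = [row.index(max(row)) for row in proba]

print(n_samples, n_classes, rows_sum_to_one, most_probable_column)
# prints: 3 2 True [1, 0, 0]
```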

How to do weighted random sample of categories in python

房东的猫 submitted on 2019-12-03 04:14:18
Question: Given a list of tuples where each tuple consists of a probability and an item, I'd like to sample an item according to its probability. For example, given the list [ (.3, 'a'), (.4, 'b'), (.3, 'c')] I'd like to sample 'b' 40% of the time. What's the canonical way of doing this in Python? I've looked at the random module, which doesn't seem to have an appropriate function, and at numpy.random, which has a multinomial function but doesn't seem to return the results in a nice form for this
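On Python 3.6 and later, the standard library's random.choices accepts a weights argument, which covers this case directly. A minimal sketch, where the helper name weighted_sample and the fixed seed are assumptions for illustration:

```python
import random
from collections import Counter

weighted_items = [(.3, 'a'), (.4, 'b'), (.3, 'c')]

def weighted_sample(pairs, k=1, rng=random):
    """Draw k items (with replacement) from (probability, item) pairs."""
    weights, items = zip(*pairs)
    return rng.choices(items, weights=weights, k=k)

# Seeded generator so the frequency check is repeatable.
rng = random.Random(0)
draws = 100_000
counts = Counter(weighted_sample(weighted_items, k=draws, rng=rng))
share_b = counts['b'] / draws
print(round(share_b, 2))
```

The weights do not have to sum to 1; random.choices treats them as relative weights.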

What is O value for naive random selection from finite set?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-03 03:23:06
This question on getting random values from a finite set got me thinking... It's fairly common for people to want to retrieve X unique values from a set of Y values. For example, I may want to deal a hand from a deck of cards. I want 5 cards, and I want them all to be unique. Now, I can do this naively by picking a random card 5 times and trying again each time I get a duplicate, until I have 5 cards. This isn't so great, however, for large numbers of values from large sets. If I wanted 999,999 values from a set of 1,000,000, for instance, this method gets very bad. The question is: how bad? I'm
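One way to quantify "how bad" is the expected number of picks. Assuming every pick is uniform over the whole set and duplicates are simply retried, collecting the next new value when i distinct values are already held takes Y / (Y - i) picks on average. The function name below is an assumption for illustration.

```python
def expected_draws(set_size, wanted):
    """Expected number of uniform random picks (retrying on duplicates)
    needed to collect `wanted` distinct values out of `set_size`."""
    if not 0 <= wanted <= set_size:
        raise ValueError("wanted must be between 0 and set_size")
    # With i distinct values held, a pick is new with probability
    # (set_size - i) / set_size, so the wait for the next new value
    # averages set_size / (set_size - i) picks.
    return sum(set_size / (set_size - i) for i in range(wanted))

hand = expected_draws(52, 5)                  # dealing 5 cards from 52
extreme = expected_draws(1_000_000, 999_999)  # the case from the question
print(round(hand, 2), round(extreme))
```

Under these assumptions the card hand needs about 5.2 picks, while 999,999 values out of 1,000,000 need roughly 13.4 million picks, which grows like Y ln Y rather than linearly in X.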

Python equivalent for MATLAB's normplot?

牧云@^-^@ submitted on 2019-12-03 03:21:01
Is there a Python equivalent of MATLAB's normplot? Perhaps in matplotlib? MATLAB syntax: x = normrnd(10,1,25,1); normplot(x) Gives: I have tried using the matplotlib and numpy modules to determine the probability/percentile of the values in the array, but the y-axis of the output plot is scaled linearly, unlike the plot from MATLAB. import numpy as np import matplotlib.pyplot as plt data = [-11.83,-8.53,-2.86,-6.49,-7.53,-9.74,-9.44,-3.58,-6.68,-13.26,-4.52] plot_percentiles = range(0, 110, 10) x = np.percentile(data, plot_percentiles) plt.plot(x, plot_percentiles, 'ro-') plt.xlabel(
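The nonlinear y-axis of a normal probability plot comes from plotting each sorted value against a standard normal quantile instead of against the percentile itself. The sketch below computes those coordinates with the standard library only. The midpoint plotting positions (i - 0.5) / n are an assumed convention, and the function name is illustrative. If SciPy is available, scipy.stats.probplot produces a comparable plot.

```python
from statistics import NormalDist

data = [-11.83, -8.53, -2.86, -6.49, -7.53, -9.74, -9.44, -3.58, -6.68, -13.26, -4.52]

def normal_plot_points(values):
    """Return (sorted value, standard normal quantile) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Midpoint plotting position for the i-th sorted value (assumed convention).
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

points = normal_plot_points(data)
for value, quantile in points:
    print(f"{value:8.2f}  {quantile:6.3f}")
```

Plotting the quantile column on a linear y-axis and labelling the ticks with the corresponding percentages gives the stretched probability scale; normally distributed data then falls close to a straight line.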

R: Calculate and interpret odds ratio in logistic regression

别说谁变了你拦得住时间么 submitted on 2019-12-03 02:51:53
Question: I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1, not take or take a product, respectively). My predictor variable is Thoughts and is continuous, can be positive or negative, and is rounded up to the 2nd decimal place. I want to know how the probability of taking the product changes as Thoughts changes. The logistic regression equation is: glm(Decision ~ Thoughts, family = binomial, data = data) According to this
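In a logistic regression the coefficient is a change in log-odds, so exponentiating it gives the odds ratio for a one-unit increase in the predictor, while the change in probability depends on where on the curve the predictor starts. A small Python sketch of that arithmetic follows. The intercept and slope values are hypothetical and are not taken from the question's model.

```python
import math

def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds when the predictor rises by delta."""
    return math.exp(coefficient * delta)

def predicted_probability(intercept, coefficient, x):
    """Inverse logit: P(Decision = 1) at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * x)))

# Hypothetical fitted values, for illustration only.
b0, b1 = -0.5, 0.8

p_at_0 = predicted_probability(b0, b1, 0.0)
p_at_1 = predicted_probability(b0, b1, 1.0)
print(round(odds_ratio(b1), 3), round(p_at_0, 3), round(p_at_1, 3))
```

The odds ratio is constant across the range of Thoughts, but the difference between p_at_1 and p_at_0 is not, which is why probabilities are usually reported at specific predictor values.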

Probability of 64bit Hash Code Collisions

笑着哭i submitted on 2019-12-03 02:31:44
The book Numerical Recipes offers a method to calculate 64-bit hash codes in order to reduce the number of collisions. The algorithm is shown at http://www.javamex.com/tutorials/collections/strong_hash_code_implementation_2.shtml and is copied here for reference: private static final long[] createLookupTable() { byteTable = new long[256]; long h = 0x544B2FBACAAF1684L; for (int i = 0; i < 256; i++) { for (int j = 0; j < 31; j++) { h = (h >>> 7) ^ h; h = (h << 11) ^ h; h = (h >>> 10) ^ h; } byteTable[i] = h; } return byteTable; } public static long hash(CharSequence cs) { long h = HSTART; final long
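For an idealized hash whose outputs are uniformly distributed, the collision probability can be estimated with the birthday approximation. The sketch below makes that uniformity assumption, which may not hold exactly for the algorithm quoted above. The function name is illustrative.

```python
import math

def collision_probability(n_items, bits=64):
    """Birthday approximation of P(at least one collision) among n_items
    hash codes drawn uniformly from a space of 2**bits values."""
    space = 2 ** bits
    expected_pairs = n_items * (n_items - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    return -math.expm1(-expected_pairs)

p64 = collision_probability(1_000_000, bits=64)
p32 = collision_probability(1_000_000, bits=32)
print(p64, p32)
```

Under this model, a million keys collide somewhere with a probability of a few in a hundred million at 64 bits, whereas at 32 bits a collision is practically certain.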

Is this a good or bad 'simulation' for Monty Hall? How come?

有些话、适合烂在心里 submitted on 2019-12-03 02:27:15
Question: While trying to explain the Monty Hall problem to a friend during class yesterday, we ended up coding it in Python to prove that if you always swap, you will win 2/3 of the time. We came up with this: import random as r #iterations = int(raw_input("How many iterations? >> ")) iterations = 100000 doors = ["goat", "goat", "car"] wins = 0.0 losses = 0.0 for i in range(iterations): n = r.randrange(0,3) choice = doors[n] if n == 0: #print "You chose door 1." #print "Monty opens door 2. There is a goat
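For comparison, one way to structure such a simulation in Python 3 is to randomize both the car's position and the player's pick, and to let Monty choose among the goat doors he is allowed to open. This is an independent sketch, not a completion of the truncated code above. The function names and the seed are assumptions.

```python
import random

def play(switch, rng):
    """Play one round; return True if the final pick is the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, iterations=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(iterations)) / iterations

print(win_rate(switch=True), win_rate(switch=False))
```

With enough iterations the switching strategy should land near 2/3 and the staying strategy near 1/3.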

Generate Random Boolean Probability

江枫思渺然 submitted on 2019-12-03 02:19:37
I only know how to generate a random boolean value (true/false) with the default 50:50 probability. How can I generate a true/false value with my own probability? Let's say it returns true with a probability of 40:60 or 20:80, etc.
E. Moffat: Well, one way is Random.Next(100) < 20 ? true : false, using the integer value of Next to force your own probability. I can't speak to the true 'randomness' of this method though. More detailed example: Random gen = new Random(); int prob = gen.Next(100); return prob < 20;
Peter O.: You generate a random number up to 100 exclusive and see if it
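The same idea, sketched in Python rather than C# to keep the examples on this page in one language: draw a uniform number and compare it with the desired probability using a strict less-than. With an integer draw from 0 to 99, a strict comparison against 20 matches exactly 20 of the 100 outcomes. The function name and seed are assumptions.

```python
import random

def random_bool(p_true, rng=random):
    """Return True with probability p_true (a float between 0.0 and 1.0)."""
    # rng.random() is uniform on [0.0, 1.0), so the strict comparison
    # succeeds with probability p_true.
    return rng.random() < p_true

rng = random.Random(0)
trials = 100_000
share_true = sum(random_bool(0.4, rng) for _ in range(trials)) / trials
print(round(share_true, 2))
```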

What is the probability of collision with a 6 digit random alphanumeric code?

*爱你&永不变心* submitted on 2019-12-03 01:54:07
I'm using the following Perl code to generate random alphanumeric strings (uppercase letters and numbers only) to use as unique identifiers for records in my MySQL database. The database is likely to stay under 1,000,000 rows, but the absolute realistic maximum would be around 3,000,000. Do I have a dangerous chance of 2 records having the same random code, or is it likely to happen an insignificantly small number of times? I know very little about probability (if that isn't already abundantly clear from the nature of this question) and would love someone's input. perl -le 'print map { ("A"..
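With 26 uppercase letters and 10 digits there are 36^6 = 2,176,782,336 possible 6-character codes. Assuming each code is drawn uniformly and independently, the expected number of colliding pairs among n records is about n(n-1) / (2 * 36^6). The Python sketch below evaluates that for the row counts mentioned in the question. The names are illustrative.

```python
import math

ALPHABET_SIZE = 36   # 26 uppercase letters + 10 digits
CODE_LENGTH = 6

def collision_stats(n_records, alphabet=ALPHABET_SIZE, length=CODE_LENGTH):
    """Return (code space size, expected colliding pairs, P(any collision))."""
    space = alphabet ** length
    expected_pairs = n_records * (n_records - 1) / (2 * space)
    p_any = -math.expm1(-expected_pairs)  # 1 - exp(-expected_pairs)
    return space, expected_pairs, p_any

for rows in (1_000_000, 3_000_000):
    space, pairs, p_any = collision_stats(rows)
    print(rows, space, round(pairs), p_any)
```

Under these assumptions a million rows already produce on the order of a couple of hundred colliding pairs, so the codes cannot be treated as unique without a check or a uniqueness constraint in the database.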