statistics | 易学教程

Instance of scipy.stats.rv_discrete subclass throws error on pmf() method

Question: I want to create a subclass of scipy.stats.rv_discrete to add some additional methods. However, when I try to access the pmf() method of the subclass, an error is raised. Please see the following example:

    import numpy as np
    from scipy import stats

    class sub_rv_discrete(stats.rv_discrete):
        pass

    xk = np.arange(2)
    pk = (0.5, 0.5)
    instance_subclass = sub_rv_discrete(values=(xk, pk))
    instance_subclass.pmf(xk)

This results in:

    Traceback (most recent call last): File "<ipython-input-48-129655c38e6a

How to compare feature selection regression-based algorithm with tree-based algorithms?

Question: I'm trying to compare which feature selection model is more efficient for a specific domain. The current state of the art in this domain (GWAS) is regression-based algorithms (LR, LMM, SAIGE, etc.), but I want to try tree-based algorithms (I'm using LightGBM's LGBMClassifier with boosting_type='gbdt', which cross-validation selected as the most efficient one). I managed to get something like:

    Regression based alg
    ---------------------
    Features  P-Values
    f1        2.49746e-21
    f2        5.63324e

How can I sample a multivariate log-normal distribution in Python?

Question: Using Python, how can I sample data from a multivariate log-normal distribution? For instance, for a multivariate normal, there are two options. Let's assume we have a 3 x 3 covariance matrix and a 3-dimensional mean vector mu.

    # Method 1
    sample = np.random.multivariate_normal(mu, covariance)

    # Method 2
    L = np.linalg.cholesky(covariance)
    sample = L.dot(np.random.randn(3)) + mu

I found numpy's numpy.random.lognormal, but that only seems to work for univariate samples. I also noticed scipy's
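One way to read the question: if Z is multivariate normal, then X = exp(Z), taken element-wise, is multivariate log-normal. The sketch below follows the Cholesky route from Method 2 and then exponentiates. It uses only the standard library so each step is explicit. The helper names and the parameter values are hypothetical, and `mu` and `covariance` are assumed to describe the underlying normal (log-space) distribution, not the mean and covariance of the log-normal samples themselves.

```python
import math
import random


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_multivariate_lognormal(mu, covariance, rng=random):
    """Draw one sample X = exp(Z), where Z ~ N(mu, covariance)."""
    lower = cholesky(covariance)
    # Independent standard normal draws, one per dimension.
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    # Correlate and shift them: Z = L z + mu.
    normal_sample = [
        m + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i, m in enumerate(mu)
    ]
    # Exponentiate element-wise to obtain the log-normal sample.
    return [math.exp(v) for v in normal_sample]


# Hypothetical log-space parameters, chosen only for illustration.
mu = [0.0, 0.5, 1.0]
covariance = [
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
sample = sample_multivariate_lognormal(mu, covariance, random.Random(0))
```

With numpy available, the same idea would presumably reduce to exponentiating the result of either method shown in the question.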

plotting a histogram on a Log scale with Matplotlib

Question: I have a pandas DataFrame that has the following values in a Series:

    x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]

I was instructed to plot two histograms in a Jupyter notebook with Python 3.6. No sweat, right?

    x.plot.hist(bins=8)
    plt.show()

I chose 8 bins because that looked best to me. I have also been instructed to plot another histogram with the log of x.

    x.plot.hist(bins=8)
    plt.xscale('log')
    plt.show()

This
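The excerpt is cut off before the actual problem is stated, so the following is only a guess at what is relevant. A common issue with a log-scaled x-axis is that bins of equal linear width no longer look equal. One approach is to compute bin edges that are evenly spaced in log10 instead. The sketch below does this with the standard library only; the helper names are hypothetical.

```python
import math

x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27,
     22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]


def log_spaced_edges(values, bins):
    """Return bins + 1 edges spaced evenly in log10 between min and max."""
    lo, hi = math.log10(min(values)), math.log10(max(values))
    step = (hi - lo) / bins
    return [10 ** (lo + i * step) for i in range(bins + 1)]


def histogram(values, edges):
    """Count values per bin; the last bin also takes everything at or above its left edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v >= edges[i]):
                counts[i] += 1
                break
    return counts


edges = log_spaced_edges(x, bins=8)
counts = histogram(x, edges)
```

matplotlib's `hist` accepts a sequence of bin edges for its `bins` argument, so edges computed this way can be passed to the plotting call before switching the axis to a log scale. All values must be strictly positive for the logarithm to be defined.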

How to make a bell curve in VB.net [closed]

Question (closed as off-topic 6 years ago): I have a program that builds a histogram chart, and I want to add a bell curve that represents the ideal curve based on the goal I got from engineering. Here is a snapshot to give you an idea of what I'm working with. I formatted the seconds to represent hh:mm:ss. In this example: bins: 20 the goal: 672 seconds

Looping through parallel lists/arrays in an SPSS macro

Question: I would like to write an SPSS macro to perform three operations: generate a custom table, clean the output window, and export the table. As you know, the SPSS macro facility allows two types of loops: 'numeric', like ( !do !i = !x !to !y ), and 'list'/'for each', like ( !do !i !in (!1) ). My goal is to create a macro with a call as below:

    col v1 v2 / "Sheet A" "Sheet B".

working this way (with a 'list'-like loop):

1. Get first variable name (v1)
2. Put it in the ctables macro section
3. Get first sheet name

Side by Side BarPlot

Question: I'm trying to create this kind of "side by side" barplot with seaborn and pandas. This is how I create the data frame:

    dfs = pd.DataFrame(data={'investors': ['first','second','third'], 'stocks': [23, 123, 54], 'bonds': [54, 67, 123], 'real estate': [45, 243, 23]})

And here is the barplot code:

    sns.factorplot(x='investors', y='bonds', data=dfs, kind='bar')

Can anyone please help? Thanks

Answer 1: Use melt on your DataFrame, then plot it with seaborn.

    dfs = pd.DataFrame(data={'investors': ['first','second',
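To make the reshaping step in the answer concrete, here is a plain-Python sketch of what a wide-to-long "melt" produces for this data. It does not use pandas, and it is not the answer's own code (which is cut off above). The function and the output column names `asset` and `amount` are hypothetical.

```python
# The same data as in the question, held as a plain dict of columns.
data = {
    'investors': ['first', 'second', 'third'],
    'stocks': [23, 123, 54],
    'bonds': [54, 67, 123],
    'real estate': [45, 243, 23],
}


def melt(table, id_var, var_name='asset', value_name='amount'):
    """Reshape a wide dict of columns into a list of long-format row dicts."""
    rows = []
    for column, values in table.items():
        if column == id_var:
            continue  # the identifier column is repeated on every row instead
        for key, value in zip(table[id_var], values):
            rows.append({id_var: key, var_name: column, value_name: value})
    return rows


long_rows = melt(data, id_var='investors')
```

Each investor now appears once per asset class, which is the shape a grouped bar plot needs: one column for the x-axis category, one for the grouping (hue), and one for the bar height.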

Python Earth Mover Distance of 2D arrays

Question: I would like to compute the Earth Mover Distance between two 2D arrays (these are not images). Right now I go through two libraries: scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wasserstein_distance.html) and pyemd (https://pypi.org/project/pyemd/).

    # define a sampling method
    def sampeling2D(n, mu1, std1, mu2, std2):
        # sample from N(0, 1) in the 2D hyperspace
        x = np.random.randn(n, 2)
        # scale N(0, 1) -> N(mu, std)
        x[:,0] = (x[:,0]*std1) + mu1
        x[:,1] = (x[:,1]*std2) +

Eliminating outliers by standard deviation in SQL Server

Question: I am trying to eliminate outliers in SQL Server 2008 by standard deviation. I would like only records that contain a value in a specific column within +/- 1 standard deviation of that column's mean. How can I accomplish this?

Answer 1: If you are assuming a bell-curve (normal) distribution of events, then only about 68% of values will be within 1 standard deviation of the mean (about 95% are covered by 2 standard deviations). I would load a variable with the standard deviation of your range (derived using stdev
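The filtering idea described above (keep only values within a chosen number of standard deviations of the mean) can be sketched outside the database. The following is a minimal Python illustration, not SQL Server code and not the answer's own query, which is cut off. The sample values and the helper name `within_k_sigma` are hypothetical.

```python
import statistics

# Hypothetical column values, for illustration only.
values = [10, 12, 11, 13, 12, 11, 40, 12, 10, 11]


def within_k_sigma(data, k=1.0):
    """Keep values within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    lower, upper = mean - k * sigma, mean + k * sigma
    return [v for v in data if lower <= v <= upper]


kept = within_k_sigma(values, k=1.0)
```

Note that an extreme value inflates both the mean and the standard deviation it is compared against, so the bounds themselves shift with the outliers. With k=1 on roughly normal data, about a third of ordinary records would also be excluded, which is the point the answer makes with its 68% figure.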