data-analysis | 易学教程

What to do with missing values when plotting with seaborn?

Question: I replaced the missing values with NaN using the following lambda function: data = data.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x), where data is the DataFrame I am working on. Afterwards, I tried to plot one of its attributes, alcconsumption, using seaborn.distplot as follows: seaborn.distplot(data['alcconsumption'], hist=True, bins=100) plt.xlabel('AlcoholConsumption') plt.ylabel('Frequency(normalized 0->1)') It gives me the following error:
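The error message is cut off in this excerpt, so the exact cause is unknown. One common source of trouble is that the column still holds NaN values or strings when it reaches the plotting call. The following is a minimal sketch under that assumption: it turns whitespace-only cells into NaN, coerces the column to numeric, and drops the missing entries before plotting. The sample values are hypothetical; only the column name comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column name comes from the question.
data = pd.DataFrame({"alcconsumption": ["0.03", " ", "7.29", "", "10.71"]})

# Turn empty or whitespace-only cells into NaN.
data = data.replace(r"^\s*$", np.nan, regex=True)

# Coerce the column to numeric and drop the missing entries.
clean = pd.to_numeric(data["alcconsumption"], errors="coerce").dropna()

# `clean` can then be handed to the plotting call from the question, e.g.
# seaborn.distplot(clean, hist=True, bins=100)
# (distplot is deprecated in recent seaborn releases in favor of
# histplot and displot).
```

Dropping the missing values only changes what is plotted; the original DataFrame keeps its NaN entries for any later analysis.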

When should I use C++ instead of SQL?

Question: I am a C++ programmer who occasionally uses MySQL to work with databases, but my SQL knowledge is rather limited. However, I am certainly willing to change that. At the moment I am trying to do analysis(!) on the data in my database solely with SQL queries. But I am about to give up and instead import the data into C++ and do the analysis with C++ code. I have discussed this with my colleagues, and they also push me to use C++, saying that SQL is not meant for complex analysis but mainly

How to plot two DataFrame on same graph for comparison

Question: I have two DataFrames (trail1 and trail2) with the following columns: Genre, City, and Number Sold. I want to create a bar graph of both data sets for a side-by-side comparison of Genre vs. total Number Sold. For each genre, I want two bars: one representing trail 1 and the other representing trail 2. How can I achieve this using pandas? I tried the following approach, which did NOT work. gf1 = df1.groupby(['Genre']) gf2 = df2.groupby(['Genre']) gf1Plot = gf1.sum().unstack().plot(kind=
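One possible way to get two bars per genre is to sum Number Sold per Genre in each DataFrame and place the two results side by side as columns of a single DataFrame. This is a sketch, not necessarily the answer given on the original page; the sample rows are hypothetical, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "Genre": ["Rock", "Jazz", "Rock"],
    "City": ["A", "B", "C"],
    "Number Sold": [10, 5, 7],
})
df2 = pd.DataFrame({
    "Genre": ["Rock", "Jazz", "Pop"],
    "City": ["A", "B", "C"],
    "Number Sold": [4, 6, 3],
})

# One column per DataFrame, indexed by Genre. Genres missing from one
# DataFrame are aligned automatically and filled with 0.
totals = pd.DataFrame({
    "trail1": df1.groupby("Genre")["Number Sold"].sum(),
    "trail2": df2.groupby("Genre")["Number Sold"].sum(),
}).fillna(0)

# With matplotlib installed, this draws two bars per genre side by side:
# totals.plot(kind="bar")
```

Because each column of a DataFrame becomes its own bar series, the grouped bars follow from the layout of `totals` without any manual positioning.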

Fourier transform with python

I have a set of data. It obviously has some periodic nature. I want to find its frequency using the Fourier transform and plot it. Here is my attempt, but the result does not look good. This is the corresponding code; I don't know why it fails: import numpy from pylab import * from scipy.fftpack import fft,fftfreq import matplotlib.pyplot as plt dataset = numpy.genfromtxt(fname='data.txt',skip_header=1) t = dataset[:,0] signal = dataset[:,1] npts=len(t) FFT = abs(fft(signal)) freqs = fftfreq(npts, t[1]-t[0]) subplot(211) plot(t[:npts], signal[:npts]) subplot(212) plot
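The excerpt does not show the plot or the data, so the reason the attempt "fails" is unknown. A frequent cause of an unhelpful spectrum is a large constant offset, which puts a dominant spike at 0 Hz. The sketch below assumes evenly spaced samples and uses numpy.fft (swapped in for the scipy.fftpack calls in the question) on a synthetic signal; the helper name dominant_frequency is hypothetical.

```python
import numpy as np

def dominant_frequency(t, signal):
    """Return the frequency with the largest spectral magnitude.

    Assumes `t` is evenly spaced. The mean is subtracted first so that
    a constant offset does not dominate the spectrum at 0 Hz.
    """
    dt = t[1] - t[0]
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    return freqs[np.argmax(spectrum)]

# Synthetic test signal: a 5 Hz sine on top of a constant offset,
# sampled at 100 Hz for 2 seconds.
t = np.arange(200) * 0.01
signal = 3.0 + np.sin(2 * np.pi * 5.0 * t)

peak = dominant_frequency(t, signal)
```

Using rfft and rfftfreq keeps only the non-negative frequencies, which is usually what is wanted when plotting the spectrum of a real-valued signal.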

Pandas: convert datetime timestamp to whether it's day or night?

Question: I am trying to determine whether it is day or night based on a list of timestamps. Would it be correct to just check whether the hour falls between 7:00 AM and 6:00 PM and classify it as "day", otherwise "night", as I have done in the code below? I am not sure about this because sometimes it is still day after 6 PM, so what is an accurate way to differentiate between day and night using Python? Sample data (timezone = UTC/Zulu time): timestamps = ['2015-03-25 21:15:00', '2015-06-27 18:24:00', '2015-06-27 18:22:00', '2015-06-27 18
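The fixed-hour rule described in the question can be sketched as below. It is only the simple approximation the asker describes: it ignores location and season, so it cannot capture days that last past 6 PM. A more accurate classification would need sunrise and sunset times for a known location, which is not shown here. The first three timestamps come from the question; the fourth is a hypothetical addition so that both labels appear.

```python
import numpy as np
import pandas as pd

timestamps = [
    "2015-03-25 21:15:00",
    "2015-06-27 18:24:00",
    "2015-06-27 18:22:00",
    "2015-06-27 12:00:00",  # hypothetical, not from the question
]

# Parse the strings and extract the hour of each timestamp.
hours = pd.Series(pd.to_datetime(timestamps)).dt.hour

# Hours 7..17 cover 07:00 up to 17:59, i.e. 7:00 AM until 6:00 PM.
labels = np.where(hours.between(7, 17), "day", "night").tolist()
```

Since the timestamps are in UTC, the hour boundaries would also need shifting to local time before a rule like this means anything for a specific place.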

Converting list in panda dataframe into columns

Question:

city        state  neighborhoods  categories
Dravosburg  PA     [asas,dfd]     ['Nightlife']
Dravosburg  PA     [adad]         ['Auto_Repair','Automotive']

I have the above DataFrame and want to convert each element of a list into a column, e.g.:

city        state  asas  dfd  adad  Nightlife  Auto_Repair  Automotive
Dravosburg  PA     1     1    0     1          1            0

I am using the following code to do this: def list2columns(df): """ to convert list in the columns of a dataframe """ columns=['categories','neighborhoods'] for col in columns: for i in range(len(df)): for
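The nested loops in the question are cut off, so the sketch below does not complete them. It shows one alternative, assuming every cell in the list columns holds a list of strings and that no value contains the "|" character: join each list into one string and let Series.str.get_dummies build the 0/1 columns. The function name lists_to_indicator_columns is hypothetical, and the result keeps one output row per input row.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Dravosburg", "Dravosburg"],
    "state": ["PA", "PA"],
    "neighborhoods": [["asas", "dfd"], ["adad"]],
    "categories": [["Nightlife"], ["Auto_Repair", "Automotive"]],
})

def lists_to_indicator_columns(frame, list_columns):
    """Replace each list-valued column with one 0/1 column per element."""
    parts = [frame.drop(columns=list_columns)]
    for col in list_columns:
        # Join each list into "a|b|c", then split it into indicator columns.
        parts.append(frame[col].str.join("|").str.get_dummies(sep="|"))
    return pd.concat(parts, axis=1)

result = lists_to_indicator_columns(df, ["neighborhoods", "categories"])
```

This avoids the row-by-row Python loop, which tends to be slow on larger DataFrames.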

plotting a timeseries graph in python using matplotlib from a csv file

Question: I have some CSV data in the following format.

Ln  Dr  Tag  Lab  0:01  0:02  0:03  0:04  0:05  0:06  0:07  0:08  0:09
L0  St  vT   4R   0     0     0     0     0     0     0     0     0
L2  Tx  st   4R   8     8     8     8     8     8     8     8     8
L2  Tx  ss   4R   1     1     9     6     1     0     0     6     7

I want to plot a time series graph using the columns (Ln, Dr, Tag, Lab) as the keys and the 0:0n fields as the values. I have the following code. #!/usr/bin/env python import matplotlib.pyplot as plt import datetime import numpy as np import csv import sys with open("test.csv", 'r'

ECG Data Analysis on a real-time signal in Python

I am using Python to produce an electrocardiogram (ECG) from signals obtained by an Arduino. I want to perform some analysis on it, although I have not yet decided what type of analysis. My question is: is it possible to do this analysis on a real-time flow of data coming through the serial port, or is it easier or better to first save the data to, say, a text file and then perform the analysis on it? Right now I can't wrap my head around how to do it. An extra note: I would at the very minimum like to detect the peaks of the signal (R wave) and the R-R interval (so

Pandas DataFrame find the max after Groupby two columns and get counts

Question: I have a DataFrame df as follows:

    userId   pageId  tag
0   3122471  e852    18
1   3122471  f3e2    18
2   3122471  7e93    18
3   3122471  2768    6
4   3122471  53d9    6
5   3122471  06d7    15
6   3122471  e31c    15
7   3122471  c6f3    2
8   1234123  fjwe    1
9   1234123  eiae    4
10  1234123  ieha    4

After using df.groupby(['userId', 'tag'])['pageId'].count() to group the data by userId and tag, I get:

userId   tag
3122471  2      1
         6      2
         15     2
         18     3
1234123  1      1
         4      2

Now I want to find the tag that each user has the most, as follows: userId tag
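The desired output is cut off in the excerpt, so the sketch below assumes the goal is one row per userId holding the tag with the highest count. It turns the grouped counts back into a regular DataFrame and then picks, for each user, the row where the count is largest. The column name count is an illustrative choice; if two tags tie, idxmax keeps the first one it encounters.

```python
import pandas as pd

df = pd.DataFrame({
    "userId": [3122471] * 8 + [1234123] * 3,
    "pageId": ["e852", "f3e2", "7e93", "2768", "53d9", "06d7",
               "e31c", "c6f3", "fjwe", "eiae", "ieha"],
    "tag": [18, 18, 18, 6, 6, 15, 15, 2, 1, 4, 4],
})

# Count pages per (userId, tag) and flatten the result into columns.
counts = (
    df.groupby(["userId", "tag"])["pageId"]
      .count()
      .reset_index(name="count")
)

# For each user, keep the row whose count is the largest.
top = counts.loc[counts.groupby("userId")["count"].idxmax()]

# Plain-Python view of the result: userId -> most frequent tag.
top_tags = {int(u): int(t) for u, t in zip(top["userId"], top["tag"])}
```

Here `top` also carries the winning count for each user, so both the tag and how often it occurs are available in one place.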