data-visualization | 易学教程

Simple logistic regression with Statsmodels: Adding an intercept and visualizing the logistic regression equation

Question: Using Statsmodels, I am trying to generate a simple logistic regression model to predict whether a person smokes or not (Smoke) based on their height (Hgt). I have a feeling that an intercept needs to be included in the logistic regression model, but I am not sure how to implement one using the add_constant() function. Also, I am unsure why the error below is generated. This is the dataset, Pulse.CSV: https://drive.google.com/file/d/1FdUK9p4Dub4NXsc-zHrYI-AGEEBkX98V/view?usp=sharing The full
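
The excerpt is cut off before the asker's code and the error, so the following is only a rough sketch of what "adding an intercept" means, not the asker's or any answerer's solution. In statsmodels the array-based `sm.Logit` does not add an intercept on its own; `sm.add_constant(X)` prepends a column of ones so that one coefficient can act as the intercept. The plain-Python version below mimics that step and fits the two coefficients with Newton-Raphson. The data are made up (the Pulse.CSV file is not reproduced here), and the helper names `add_constant` and `fit_logit` are hypothetical stand-ins, not statsmodels functions.

```python
import math
import random


def add_constant(xs):
    """Prepend a constant 1.0 to every observation (the intercept column)."""
    return [(1.0, x) for x in xs]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(rows, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0*1 + b1*x) with Newton-Raphson steps."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for (const, x), y in zip(rows, ys):
            p = sigmoid(b0 * const + b1 * x)
            w = p * (1.0 - p)
            g0 += (y - p) * const      # gradient w.r.t. the intercept
            g1 += (y - p) * x          # gradient w.r.t. the slope
            h00 += w * const * const   # entries of X'WX
            h01 += w * const * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) * step = gradient in closed form
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data standing in for the Hgt and Smoke columns (assumption)
random.seed(0)
heights = [random.gauss(68.0, 4.0) for _ in range(200)]
smokes = [1 if random.random() < sigmoid(-0.3 + 0.1 * (h - 68.0)) else 0
          for h in heights]

b0, b1 = fit_logit(add_constant(heights), smokes)
fitted = [sigmoid(b0 + b1 * h) for h in heights]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

To visualize the fitted equation, one could evaluate `sigmoid(b0 + b1 * h)` over a grid of heights and plot the resulting S-shaped curve against the observed 0/1 values.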

Bar chart with rounded corners in Matplotlib?

Question: How can I create a bar plot with rounded corners, like the one shown in this image? Can it be done with matplotlib?

Answer 1: It looks like there's no way to directly add rounded corners to a bar chart. But matplotlib does provide a FancyBboxPatch class, a demo of which is available here. So in order to create a plot like the one shown in the question, we could first make a simple horizontal bar chart:

    import pandas as pd
    import numpy as np
    # make up some example data
    np.random.seed(0)
    df = pd.DataFrame(np.random
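
The answer excerpt stops before the patch step, so what follows is a separate minimal sketch of the FancyBboxPatch idea, not the answerer's code. It computes one box per value and, if matplotlib is installed, draws each box with `boxstyle="round"`. The values, the bar height, the rounding size and the helper name `rounded_bar_boxes` are all assumptions chosen for illustration.

```python
def rounded_bar_boxes(values, bar_height=0.6):
    """Return (x, y, width, height) for one horizontal bar per value."""
    return [(0.0, i - bar_height / 2, v, bar_height)
            for i, v in enumerate(values)]


values = [3.0, 7.5, 5.2, 9.1]  # made-up example data
boxes = rounded_bar_boxes(values)

try:
    import matplotlib
    matplotlib.use("Agg")  # render off-screen so no display is needed
    import matplotlib.pyplot as plt
    from matplotlib.patches import FancyBboxPatch
except ImportError:
    plt = None  # drawing is skipped when matplotlib is unavailable

if plt is not None:
    fig, ax = plt.subplots()
    for x, y, w, h in boxes:
        # pad=0 keeps the patch at the bar's extent; rounding_size sets the corner radius
        ax.add_patch(FancyBboxPatch(
            (x, y), w, h,
            boxstyle="round,pad=0,rounding_size=0.25",
            facecolor="steelblue", edgecolor="none"))
    # Patches do not update the axis limits automatically, so set them explicitly
    ax.set_xlim(0, max(values) * 1.05)
    ax.set_ylim(-0.5, len(values) - 0.5)
    ax.set_yticks(list(range(len(values))))
    fig.canvas.draw()
    plt.close(fig)
```

Because the rounding size is given in data units, the corners can look stretched when the x and y scales differ a lot; adjusting the axis aspect or the patch's `mutation_aspect` is one way to compensate.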

Animated dot histogram, built observation by observation (using gganimate in R)

Question: I would like to sample points from a normal distribution, and then build up a dotplot one by one using the gganimate package until the final frame shows the full dotplot. A solution that works for larger datasets (~5,000 - 20,000 points) is essential. Here is the code I have so far:

    library(gganimate)
    library(tidyverse)
    # Generate 100 normal data points, along with an index for each sample
    samples <- rnorm(100)
    index <- seq(1:length(samples))
    # Put data into a data frame
    df <- tibble(value=samples,

Plot data from pandas DataFrame, colour of points dependent on a column

Question: I have a pandas DataFrame with 3 columns, shown below.

    col1  value    flag
    1     0        0
    2     0.03915  0
    3     0.13     1

I want to create a scatterplot from this dataframe where col1 is the x axis and value is the y axis. The part I'm struggling with is the colouring: for rows that have flag=0 I want the point to be blue, and for rows that have flag=1 I want it to be red. Is there a simple way to check the flag column per row and color the point accordingly?

Answer 1: You can use the built-in df.plot.scatter(
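
The answer is cut off mid-call, so its exact arguments are unknown. As an independent sketch of the general idea (map each flag to a color name, then pass the per-point colors to the plotting call), the block below uses the three rows from the question and matplotlib's `Axes.scatter` rather than the pandas call the answer starts to name. The `FLAG_COLORS` mapping is an assumption taken from the question's wording (0 is blue, 1 is red).

```python
# The three rows shown in the question
rows = [
    {"col1": 1, "value": 0.0, "flag": 0},
    {"col1": 2, "value": 0.03915, "flag": 0},
    {"col1": 3, "value": 0.13, "flag": 1},
]

FLAG_COLORS = {0: "blue", 1: "red"}  # assumed mapping from the question's wording
colors = [FLAG_COLORS[row["flag"]] for row in rows]

try:
    import matplotlib
    matplotlib.use("Agg")  # render off-screen so no display is needed
    import matplotlib.pyplot as plt
except ImportError:
    plt = None  # plotting is skipped when matplotlib is unavailable

if plt is not None:
    fig, ax = plt.subplots()
    # c accepts one color per point, in the same order as the x and y values
    ax.scatter([row["col1"] for row in rows],
               [row["value"] for row in rows],
               c=colors)
    ax.set_xlabel("col1")
    ax.set_ylabel("value")
    fig.canvas.draw()
    plt.close(fig)
```

With a real DataFrame, the same list of colors could be built from the flag column with `df["flag"].map(FLAG_COLORS)`.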

Plot another point on top of swarmplot

Question: I want to plot a "highlighted" point on top of a swarmplot, like this. The swarmplot doesn't have a y-axis, so I have no idea how to plot that point.

    import seaborn as sns
    sns.set(style="whitegrid")
    tips = sns.load_dataset("tips")
    ax = sns.swarmplot(x=tips["total_bill"])

Answer 1: This approach is predicated on knowing the index of the data point you wish to highlight, but it should work, although it becomes slightly more complex if you have multiple swarmplots on a single Axes instance. import

Height values for each point in a plot

Question: I have data on protein-protein interactions in a data frame named s1m. Each DB and AD pair makes an interaction, and I can plot it as well:

    > head(s1m)
         DB_num AD_num
    [1,]      2   8153
    [2,]      7   3553
    [3,]      8   4812
    [4,]     13   7838
    [5,]     24   3315
    [6,]     24   6012

The plot of the data looks like: I then used code I found on this site to plot filled contour lines:

    ## compute 2D kernel density, see MASS book, pp. 130-131
    require(MASS)
    z <- kde2d(s1m[,1], s1m[,2], n=50)
    plot(s1m, xlab="X label", ylab="Y label", pch=19,

TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe' while plotting a seaborn.regplot

Question: I'm trying to plot a regplot using seaborn, but I'm unable to plot it and am facing TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. My data has 731 rows and 16 columns:

    >>> bike_df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 731 entries, 0 to 730
    Data columns (total 16 columns):
     #   Column   Non-Null Count  Dtype
    ---  ------   --------------  -----
     0   instant  731 non-null    int64
     1   dteday   731 non-null    object
     2   season   731 non-null    int64
     3

Connected points in ggplot boxplot

Question: I'm trying to create a simple boxplot with connected lines similar to the one described in this question: Connect ggplot boxplots using lines and multiple factor. However, the interaction term in that example produces an error: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic? I would like to connect each point using the index variable. Here is the code:

    group <- c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")

Display seaborn plots at some point later in code

Question: Let's say at some point in my code I have the following two graphs, i.e. graph_p_changes and graph_p_contrib:

    line_grapgh_p_changes = df_p_change[['year','interest accrued', 'trade debts', 'other financial assets']].melt('year', var_name='variables', value_name='p_changes')
    graph_p_changes = sns.factorplot(x="year", y="p_changes", hue='variables', data=line_grapgh_p_changes, height=5, aspect=2)
    graph_p_changes.set(xlabel='year', ylabel='percentage change in self value across the years')
    line

How to deal with a lot of plots in R

Question: I have a for loop which produces 60 plots. I would like to save all these plots in only one file. If I set par(mfrow=c(10,6)) it says: Error in plot.new() : figure margins too large. What can I do? My code is as follows:

    pdf(file="figure.pdf")
    par(mfrow=c(10,6))
    for(i in 1:60){
      x=rnorm(100)
      y=rnorm(100)
      plot(x,y)
    }
    dev.off()

Answer 1: Your default plot, as stated in the loop, does not use the space very effectively. If you look at just a single plot, you can see it has large margins, both between