How to make a multiline graph with matplotlib subplots and pandas?

春和景丽 2021-01-06 14:42

I'm fairly new to coding (completely self-taught) and have started using it at my job as a research assistant in a cancer lab. I need some help setting up a few line gr

1 answer
  • 2021-01-06 15:10

    I wrote a subplot function that may give you a hand. I modified the data a tad to help illustrate the plotting functionality.

    gene,yaxis,xaxis,pt #,gene #
    ASXL1-3,34,1,3,1
    ASXL1-3,3,98,3,1
    IDH1-3,24,1,3,11
    IDH1-3,7,98,3,11
    RUNX1-3,38,1,3,21
    RUNX1-3,2,98,3,21
    U2AF1-3,33,1,3,26
    U2AF1-3,0,98,3,26
    ASXL1-3,39,1,4,1
    ASXL1-3,8,62,4,1
    ASXL1-3,0,119,4,1
    IDH1-3,27,1,4,11
    IDH1-3,12,62,4,11
    IDH1-3,1,119,4,11
    RUNX1-3,42,1,4,21
    RUNX1-3,3,62,4,21
    RUNX1-3,1,119,4,21
    U2AF1-3,16,1,4,26
    U2AF1-3,1,62,4,26
    U2AF1-3,0,119,4,26
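
To try the sample data without first saving it as `data.csv`, it can be read straight from a string. This is only a minimal sketch: the `csv_text` and `timepoints` names are illustrative and not part of the original answer. It also shows how `groupby` splits the rows by patient, which is what the plotting code relies on.

```python
import io

import pandas as pd

# The sample rows from the answer, held in a string instead of a file
csv_text = """gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
RUNX1-3,38,1,3,21
RUNX1-3,2,98,3,21
U2AF1-3,33,1,3,26
U2AF1-3,0,98,3,26
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
RUNX1-3,42,1,4,21
RUNX1-3,3,62,4,21
RUNX1-3,1,119,4,21
U2AF1-3,16,1,4,26
U2AF1-3,1,62,4,26
U2AF1-3,0,119,4,26
"""

# read_csv accepts any file-like object, so StringIO stands in for data.csv
df = pd.read_csv(io.StringIO(csv_text))

# Collect the assay timepoints (the xaxis column) for each patient
timepoints = {}
for patient, patient_df in df.groupby("pt #"):
    timepoints[int(patient)] = [int(x) for x in sorted(patient_df["xaxis"].unique())]

print(timepoints)
```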
    

    This is the subplotting function...with some extra bells and whistles :)

    def plotByGroup(df, group, xCol, yCol, title = "", xLabel = "", yLabel = "", lineColors = ["red", "orange", "yellow", "green", "blue", "purple"], lineWidth = 2, lineOpacity = 0.7, plotStyle = 'ggplot', showLegend = False):
        """
        Plot multiple lines from a Pandas Data Frame for each group using DataFrame.groupby() and MatPlotLib PyPlot.
        @params
            df          - Required  - Data Frame    - Pandas Data Frame
            group       - Required  - String        - Column name to group on           
            xCol        - Required  - String        - Column name for X axis data
            yCol        - Required  - String        - Column name for y axis data
            title       - Optional  - String        - Plot Title
            xLabel      - Optional  - String        - X axis label
            yLabel      - Optional  - String        - Y axis label
            lineColors  - Optional  - List          - Colors to plot multiple lines
            lineWidth   - Optional  - Integer       - Width of lines to plot
            lineOpacity - Optional  - Float         - Alpha of lines to plot
            plotStyle   - Optional  - String        - MatPlotLib plot style
            showLegend  - Optional  - Boolean       - Show legend
        @return
            MatPlotLib Plot Object
    
        """
        # Import MatPlotLib Plotting Function & Set Style
        from matplotlib import pyplot as plt
        plt.style.use(plotStyle)
        figure = plt.figure()                   # Initialize Figure
        grouped = df.groupby(group)             # Set Group
        i = 0                                   # Set iteration to determine line color indexing
        for idx, grp in grouped:
            colorIndex = i % len(lineColors)    # Define line color index
            lineLabel = grp[group].values[0]    # Get a group label from first position
            xValues = grp[xCol]                 # Get x vector
            yValues = grp[yCol]                 # Get y vector
            plt.subplot(1,1,1)                  # Initialize subplot and plot (on next line)
            plt.plot(xValues, yValues, label = lineLabel, color = lineColors[colorIndex], lw = lineWidth, alpha = lineOpacity)
            # Plot legend
            if showLegend:
                plt.legend()
            i += 1
        # Set title & Labels
        axis = plt.gca()                        # Get the axes the lines were drawn on
        axis.set_title(title)
        axis.set_xlabel(xLabel)
        axis.set_ylabel(yLabel)
        # Return plot for saving, showing, etc.
        return plt
    

    And to use it...

    import pandas
    
    # Load the Data into Pandas
    df = pandas.read_csv('data.csv')    
    
    #
    # Plotting - by Patient
    #
    
    # Create Patient Grouping
    patientGroup = df.groupby('pt #')
    
    # Iterate Over Groups
    for idx, patientDf in patientGroup:
        # Let's give them specific titles
        plotTitle = "Gene Frequency over Time by Gene (Patient %s)" % str(patientDf['pt #'].values[0])
        # Call the subplot function
        plot = plotByGroup(patientDf, 'gene', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Gene Frequency")
        # Add Vertical Lines at Assay Timepoints
        timepoints = set(patientDf.xaxis.values)
        for timepoint in timepoints:
            plot.axvline(x = timepoint, linewidth = 1, linestyle = "dashed", color='gray', alpha = 0.4)
        # Let's see it
        plot.show()
    

    And of course, we can do the same by gene.

    #
    # Plotting - by Gene
    #
    
    # Create Gene Grouping
    geneGroup   = df.groupby('gene')
    
    # Generate Plots for Groups
    for idx, geneDf in geneGroup:
        plotTitle = "%s Gene Frequency over Time by Patient" % str(geneDf['gene'].values[0])
        plot = plotByGroup(geneDf, 'pt #', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Frequency")
        plot.show()
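
If you would rather have all patients side by side in one figure than in separate windows, `plt.subplots` can lay out one panel per patient. The following is a rough sketch rather than part of the function above: `plot_patient_grid` is a hypothetical helper, it uses only a few of the sample rows, and it selects the `Agg` backend so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# A subset of the sample rows (two genes, two patients)
csv_text = """gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
"""
df = pd.read_csv(io.StringIO(csv_text))


def plot_patient_grid(df):
    """Hypothetical helper: one panel per patient, one line per gene."""
    patients = sorted(df["pt #"].unique())
    # squeeze=False keeps axes two-dimensional even with a single patient
    fig, axes = plt.subplots(1, len(patients), sharey=True, squeeze=False,
                             figsize=(5 * len(patients), 4))
    for ax, patient in zip(axes[0], patients):
        patient_df = df[df["pt #"] == patient]
        # Draw one line per gene on this patient's panel
        for gene, gene_df in patient_df.groupby("gene"):
            ax.plot(gene_df["xaxis"], gene_df["yaxis"], label=gene)
        ax.set_title("Patient %s" % patient)
        ax.set_xlabel("Days")
        ax.legend()
    axes[0][0].set_ylabel("Gene Frequency")
    return fig, axes


fig, axes = plot_patient_grid(df)
```

From there, `fig.savefig(...)` writes the figure to a file, or `plt.show()` displays it once the `Agg` line is removed.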
    

    If this isn't what you're looking for, provide a clarification and I'll take another crack at it.
