Plotly: How to define the structure of a sankey diagram using a pandas dataframe?

天涯浪人 2020-12-03 00:04

This may sound like a very broad question, but if you'll let me describe some details I can assure you it's very specific. As well as discouragin

1 answer
  • 2020-12-03 00:29

    This problem looks really strange, but only until you analyze how a Sankey plot is created in Plotly.

    When you create the Sankey plot, you pass it:

    1. Nodes list
    2. Links list

    These lists are bound to each other. When you create a node list of length 5, a link can only refer to the indices 0, 1, 2, 3, 4 in its source and target. In your program, you create the nodes incorrectly: you build the list of links and then iterate over it to create the nodes. Look at your diagram. It has two black nodes labeled undefined. And what is the length of your dataset? Yes, 5. Your node indices end at 4, so the target nodes are never actually defined. Add a sixth row to your dataset and, bingo, nodes[5] exists! Just try adding another row to your dataset:

    [1,7,1,'#FF0000','WAKA','rgba(219, 233, 246,0.5)']

    You will see that another black bar turns red. You have five nodes (because you have 5 links and you create the nodes by iterating over the links list), but the links' target indices are 5, 6, 7. You can fix this in two ways:

    1. Change the Target values in your dataset to 2, 3, 4
    2. Create the nodes and the links separately (the right way)
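
    The mismatch described above can be checked before plotting. Here is a minimal sketch in plain Python. Note that find_dangling_links is a hypothetical helper and the rows are made-up data shaped like the question's (5 nodes, targets up to 7), not the original dataset:

```python
def find_dangling_links(node_count, links):
    """Return the links whose Source or Target is not a valid node index."""
    valid = range(node_count)
    return [link for link in links if link[0] not in valid or link[1] not in valid]


# Hypothetical rows: [Source, Target, Value]
links = [
    [0, 5, 20],
    [0, 6, 3],
    [1, 7, 14],
    [1, 2, 10],
]

# With only 5 nodes (indices 0..4), every link that targets 5, 6 or 7 dangles
dangling = find_dangling_links(5, links)
print(dangling)  # [[0, 5, 20], [0, 6, 3], [1, 7, 14]]
```

    An empty result means every link points at a node that actually exists.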

    I hope this helps with your problem and, more importantly in my opinion, with understanding how the plot is built.

    Edit: Here is an example that creates the nodes and the links separately. Note that the node part of data_trace uses only nodes_df, the link part uses only links_df, and nodes_df and links_df have different lengths:

    import pandas as pd
    from plotly.offline import init_notebook_mode, iplot

    # Load plotly.js so that iplot can render inline in a Jupyter notebook
    init_notebook_mode(connected=True)
    
    nodes = [
        ['ID', 'Label', 'Color'],
        [0,'Remain+No – 28','#F27420'],
        [1,'Leave+No – 16','#4994CE'],
        [2,'Remain+Yes – 21','#FABC13'],
        [3,'Leave+Yes – 14','#7FC241'],
        [4,'Didn’t vote in at least one referendum – 21','#D3D3D3'],
        [5,'46 – No','#8A5988']
    ]
    links = [
        ['Source','Target','Value','Link Color'],
        [0,3,20,'rgba(253, 227, 212, 0.5)'],
        [0,4,3,'rgba(242, 116, 32, 1)'],
        [0,2,5,'rgba(253, 227, 212, 0.5)'],
        [1,5,14,'rgba(219, 233, 246, 0.5)'],
        [1,3,1,'rgba(73, 148, 206, 1)'],
        [1,4,1,'rgba(219, 233, 246,0.5)'],
        [1,2,10,'rgba(8, 233, 246,0.5)'],
        [1,3,5,'rgba(219, 77, 246,0.5)'],
        [1,5,12,'rgba(219, 4, 246,0.5)']
    ]
    
    nodes_headers = nodes.pop(0)
    nodes_df = pd.DataFrame(nodes, columns = nodes_headers)
    links_headers = links.pop(0)
    links_df = pd.DataFrame(links, columns = links_headers)
    
    data_trace = dict(
        type='sankey',
        domain = dict(
          x =  [0,1],
          y =  [0,1]
        ),
        orientation = "h",
        valueformat = ".0f",
        node = dict(
          pad = 10,
          thickness = 30,
          line = dict(
            color = "black",
            width = 0
          ),
          label =  nodes_df['Label'].dropna(axis=0, how='any'),
          color = nodes_df['Color']
        ),
        link = dict(
          source = links_df['Source'].dropna(axis=0, how='any'),
          target = links_df['Target'].dropna(axis=0, how='any'),
          value = links_df['Value'].dropna(axis=0, how='any'),
          color = links_df['Link Color'].dropna(axis=0, how='any'),
        )
    )
    
    layout =  dict(
        title = "Scottish Referendum Voters who now want Independence",
        height = 772,
        font = dict(
          size = 10
        ),    
    )
    
    fig = dict(data=[data_trace], layout=layout)
    iplot(fig, validate=False)
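
    To make the independence of the two tables easier to see, here is a small sketch in plain Python (no pandas or plotly needed) that builds the same kind of trace dict. rows_to_columns is a hypothetical helper, and the labels A, B, C are placeholders rather than the referendum data:

```python
def rows_to_columns(table):
    """Split a header-plus-rows table into a dict of column lists."""
    header, *rows = table
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}


nodes = [
    ['ID', 'Label', 'Color'],
    [0, 'A', '#F27420'],
    [1, 'B', '#4994CE'],
    [2, 'C', '#FABC13'],
]
links = [
    ['Source', 'Target', 'Value', 'Link Color'],
    [0, 1, 20, 'rgba(253, 227, 212, 0.5)'],
    [0, 2, 5, 'rgba(242, 116, 32, 1)'],
]

node_cols = rows_to_columns(nodes)
link_cols = rows_to_columns(links)

# The node part reads only node columns; the link part reads only link columns
data_trace = dict(
    type='sankey',
    node=dict(label=node_cols['Label'], color=node_cols['Color']),
    link=dict(
        source=link_cols['Source'],
        target=link_cols['Target'],
        value=link_cols['Value'],
        color=link_cols['Link Color'],
    ),
)

# Three nodes, two links: the lengths do not have to match
print(len(data_trace['node']['label']), len(data_trace['link']['source']))  # 3 2
```

    Unlike pop(0), this helper does not modify the original lists, so the header row is still there if the code is run a second time on the same data.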
    

    Edit 2: Let's dive in even deeper :) Nodes and links in Sankey diagrams are almost fully independent. The only thing that binds them is the indices in the links' Source and Target columns. So we can create many nodes with no links to them (just replace nodes/links in the Edit 1 code with the following):

    nodes = [
        ['ID', 'Label', 'Color'],
        [0,'Remain+No – 28','#F27420'],
        [1,'Leave+No – 16','#4994CE'],
        [2,'Remain+Yes – 21','#FABC13'],
        [3,'Leave+Yes – 14','#7FC241'],
        [4,'Didn’t vote in at least one referendum – 21','#D3D3D3'],
        [5,'46 – No','#8A5988'],
        [6,'WAKA1','#8A5988'],
        [7,'WAKA2','#8A5988'],
        [8,'WAKA3','#8A5988'],
        [9,'WAKA4','#8A5988'],
        [10,'WAKA5','#8A5988'],
        [11,'WAKA6','#8A5988'],
    
    ]
    links = [
        ['Source','Target','Value','Link Color'],
        [0,3,20,'rgba(253, 227, 212, 0.5)'],
        [0,4,3,'rgba(242, 116, 32, 1)'],
        [0,2,5,'rgba(253, 227, 212, 0.5)'],
        [1,5,14,'rgba(219, 233, 246, 0.5)'],
        [1,3,1,'rgba(73, 148, 206, 1)'],
        [1,4,1,'rgba(219, 233, 246,0.5)'],
        [1,2,10,'rgba(8, 233, 246,0.5)'],
        [1,3,5,'rgba(219, 77, 246,0.5)'],
        [1,5,12,'rgba(219, 4, 246,0.5)']
    ]
    

    These extra nodes will not appear in the diagram.

    We can also create links without any nodes:
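
    If you want to know in advance which nodes will be left out, you can collect every index that the links use. A minimal sketch, where unused_node_ids is a hypothetical helper and the rows are the Source, Target and Value columns of the links above:

```python
def unused_node_ids(node_count, links):
    """Return node indices that no link uses as Source or Target."""
    used = {index for source, target, *_ in links for index in (source, target)}
    return [i for i in range(node_count) if i not in used]


# [Source, Target, Value] rows taken from the links list above
links = [
    [0, 3, 20], [0, 4, 3], [0, 2, 5],
    [1, 5, 14], [1, 3, 1], [1, 4, 1],
    [1, 2, 10], [1, 3, 5], [1, 5, 12],
]

# 12 nodes are defined (IDs 0..11), but the links only touch 0..5
unused = unused_node_ids(12, links)
print(unused)  # [6, 7, 8, 9, 10, 11]
```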

    nodes = [
        ['ID', 'Label', 'Color'],
    ]
    links = [
        ['Source','Target','Value','Link Color'],
        [0,3,20,'rgba(253, 227, 212, 0.5)'],
        [0,4,3,'rgba(242, 116, 32, 1)'],
        [0,2,5,'rgba(253, 227, 212, 0.5)'],
        [1,5,14,'rgba(219, 233, 246, 0.5)'],
        [1,3,1,'rgba(73, 148, 206, 1)'],
        [1,4,1,'rgba(219, 233, 246,0.5)'],
        [1,2,10,'rgba(8, 233, 246,0.5)'],
        [1,3,5,'rgba(219, 77, 246,0.5)'],
        [1,5,12,'rgba(219, 4, 246,0.5)']
    ]
    

    And we will have only links from nowhere to nowhere.

    If you want to add (1) a new source with a link, add a new row to nodes, work out its index (this is why I have the ID column), and add a new row to links with Source equal to that node index.

    If you want to add (2) a new target for an existing node, just add a new row to links with the correct Source and Target:
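
    Case (1) can be sketched like this. add_source_with_link is a hypothetical helper, the data is made up, and it assumes the header row has already been removed and the IDs run from 0 to len(nodes) - 1:

```python
def add_source_with_link(nodes, links, label, color, target, value, link_color):
    """Append a node and a link that starts at it; return the new node's index."""
    new_id = len(nodes)  # assumption: IDs are 0..len(nodes)-1 and there is no header row
    nodes.append([new_id, label, color])
    links.append([new_id, target, value, link_color])
    return new_id


# Made-up data: two nodes and one link between them
nodes = [[0, 'A', '#F27420'], [1, 'B', '#4994CE']]
links = [[0, 1, 10, 'rgba(253, 227, 212, 1)']]

new_id = add_source_with_link(
    nodes, links, 'C', '#FABC13', target=1, value=4, link_color='rgba(242, 116, 32, 1)'
)
print(new_id)  # 2
```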

        [1,100500,10,'rgba(219, 233, 246,0.5)'],
        [1,100501,10,'rgba(8, 233, 246,0.5)'],
        [1,100502,10,'rgba(219, 77, 246,0.5)'],
        [1,100503,10,'rgba(219, 4, 246,0.5)']
    

    (Here I created 4 new links for 4 new targets. Source is the node with index 1 for all of them).

    (3+4): Sankey diagrams don't distinguish between sources and targets. To Sankey, all of them are just nodes, and every node can be both a source and a target. Take a look:

    nodes = [
        ['ID', 'Label', 'Color'],
        [0,'WAKA WANNA BE SOURCE','#F27420'],
        [1,'WAKA WANNA BE TARGET','#4994CE'],
        [2,'WAKA DON\'T KNOW WHO WANNA BE','#FABC13'],
    
    ]
    links = [
        ['Source','Target','Value','Link Color'],
        [0,1,10,'rgba(253, 227, 212, 1)'],
        [0,2,10,'rgba(242, 116, 32, 1)'],
        [2,1,10,'rgba(253, 227, 212, 1)'],
    ]
    

    Here you will get a 3-column Sankey diagram. Node 0 is a source, node 1 is a target, and node 2 is a target for 0 and a source for 1.
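
    The role of each node follows purely from where its index shows up in the links. A minimal sketch that works this out for the three links above (node_roles is a hypothetical helper, and the colors are left out):

```python
def node_roles(links):
    """Classify each node index as 'source', 'target' or 'both' from the links."""
    sources = {link[0] for link in links}
    targets = {link[1] for link in links}
    roles = {}
    for node in sorted(sources | targets):
        if node in sources and node in targets:
            roles[node] = 'both'
        elif node in sources:
            roles[node] = 'source'
        else:
            roles[node] = 'target'
    return roles


# [Source, Target, Value] rows from the example above
links = [[0, 1, 10], [0, 2, 10], [2, 1, 10]]
roles = node_roles(links)
print(roles)  # {0: 'source', 1: 'target', 2: 'both'}
```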
