3 interesting features of NetworkX

“NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.”

NetworkX lets the user create a graph and then study it: for example, find the shortest path between nodes, compute node degrees, find a maximal clique, find a coloring of a graph, and so on. In this post, I’ll present a few features that I find interesting and that may be less well known.
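Here is a quick sketch of the operations listed above, run on a small made-up graph. The graph itself, and the choice of `nx.find_cliques` and `nx.greedy_color` with the `largest_first` strategy, are my illustration rather than anything from the original post:

```python
import networkx as nx

# Toy graph: a triangle (1, 2, 3) with a tail 3-4-5
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])

path = nx.shortest_path(G, source=1, target=5)
degrees = dict(G.degree())
cliques = [sorted(c) for c in nx.find_cliques(G)]  # all maximal cliques
coloring = nx.greedy_color(G, strategy="largest_first")  # node -> color index

print(path)                   # [1, 3, 4, 5]
print(degrees)                # {1: 2, 2: 2, 3: 3, 4: 2, 5: 1}
print(max(cliques, key=len))  # [1, 2, 3]
```

A greedy coloring is not guaranteed to use the minimum number of colors, but it always assigns different colors to adjacent nodes.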

Multigraphs

A multigraph is a graph that can store multiedges, i.e. multiple edges between the same two nodes (this is different from a hypergraph, where an edge can connect any number of nodes and not just two). NetworkX has 4 graph types: the commonly used undirected and directed graphs (nx.Graph and nx.DiGraph) and 2 multigraph types, nx.MultiGraph for undirected multigraphs and nx.MultiDiGraph for directed multigraphs.

The example below shows that if the graph type is not chosen correctly, functions such as degree calculation may return the wrong value: a plain nx.Graph silently merges the repeated edge (1, 2) into a single edge.

import networkx as nx

# MultiGraph keeps both (1, 2) edges, so node 1 has degree 3
G = nx.MultiGraph()
G.add_nodes_from([1, 2, 3])
G.add_edges_from([(1, 2), (1, 3), (1, 2)])
print(G.degree())  # [(1, 3), (2, 2), (3, 1)]

# Graph merges the repeated (1, 2) edge into one
H = nx.Graph()
H.add_nodes_from([1, 2, 3])
H.add_edges_from([(1, 2), (1, 3), (1, 2)])
print(H.degree())  # [(1, 2), (2, 1), (3, 1)]
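To see where the extra degree comes from, the sketch below inspects the parallel edges directly. It is an illustration added here with made-up weights, and it assumes NetworkX 2.x or later, where `MultiGraph.add_edge` returns the key assigned to the new edge:

```python
import networkx as nx

G = nx.MultiGraph()
# Each parallel edge gets its own key and its own attribute dict
k1 = G.add_edge(1, 2, weight=4)
k2 = G.add_edge(1, 2, weight=3)
print(k1, k2)                                        # 0 1
print(G.number_of_edges(1, 2))                       # 2
print(sorted(d["weight"] for d in G[1][2].values()))  # [3, 4]

# Converting to a simple graph keeps only one edge per node pair
H = nx.Graph(G)
print(H.number_of_edges(1, 2))                       # 1
```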

Create a graph from a pandas DataFrame

Pandas is the Swiss Army knife of every data scientist, so naturally, it is handy to create a graph directly from a pandas DataFrame. The other way around is also possible. See the documentation here. The example below shows how to create a multigraph from a pandas DataFrame where each edge has a weight attribute.

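import pandas as pd
import networkx as nx

# Each row is one edge; the repeated (1, 1) row is a second, parallel self-loop
df = pd.DataFrame([[1, 1, 4], [2, 1, 5], [3, 2, 6], [1, 1, 3]],
                  columns=['source', 'destination', 'weight'])
print(df)
#    source  destination  weight
# 0       1            1       4
# 1       2            1       5
# 2       3            2       6
# 3       1            1       3

# A MultiGraph keeps both (1, 1) edges, each with its own weight
G = nx.from_pandas_edgelist(df, 'source', 'destination', ['weight'],
                            create_using=nx.MultiGraph)
# nx.info() was removed in NetworkX 3.0; printing the graph gives a summary instead
print(G)
# MultiGraph with 3 nodes and 4 edges
print(round(2 * G.number_of_edges() / G.number_of_nodes(), 4))  # average degree
# 2.6667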
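The reverse direction mentioned above can be sketched with `nx.to_pandas_edgelist`. This round trip rebuilds the same multigraph and is my illustration, not part of the original example:

```python
import networkx as nx
import pandas as pd

# Same toy edge list as above
df = pd.DataFrame([[1, 1, 4], [2, 1, 5], [3, 2, 6], [1, 1, 3]],
                  columns=["source", "destination", "weight"])
G = nx.from_pandas_edgelist(df, "source", "destination", ["weight"],
                            create_using=nx.MultiGraph)

# Back to a DataFrame: one row per edge, so both parallel (1, 1) edges survive
edges = nx.to_pandas_edgelist(G, source="source", target="destination")
print(len(edges))                        # 4
print(sorted(edges["weight"].tolist()))  # [3, 4, 5, 6]
```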

Graph generators

This is one of the features I find most interesting and powerful. The graph generator interface lets you create many types of graphs with just one line of code. Some generators are deterministic given their parameters (e.g. the complete graph on n nodes), while others are random (e.g. the binomial graph). Below are a few examples of deterministic and random graphs; they are only the tip of the iceberg of what the graph generators can do.

Complete graph – creates a graph with n nodes and an edge between every two nodes.

Empty graph – creates a graph with n nodes and no edges.

Star graph – creates a graph with one central node connected to n external nodes (n + 1 nodes in total).

G = nx.complete_graph(n=9)
print(len(G.edges()), len(G.nodes()))
# 36 9
H = nx.complete_graph(n=9, create_using=nx.DiGraph)
print(len(H.edges()), len(H.nodes()))
# 72 9
J = nx.empty_graph(n=9)
print(len(J.edges()), len(J.nodes()))
# 0 9
K = nx.star_graph(n=9)
print(len(K.edges()), len(K.nodes()))
# 9 10
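The counts printed above follow from simple counting. Writing K_n for the complete graph on n nodes, its directed version with one edge in each direction per pair, and S_n for the star with n external nodes (notation added here for illustration), with n = 9:

```latex
|E(K_n)| = \binom{n}{2} = \frac{n(n-1)}{2} = \frac{9 \cdot 8}{2} = 36, \qquad
|E(\vec{K}_n)| = n(n-1) = 9 \cdot 8 = 72, \qquad
|V(S_n)| = n + 1 = 10, \quad |E(S_n)| = n = 9
```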

Binomial graph – creates a graph with n nodes in which each possible edge is created independently with probability p (binomial_graph is an alias for gnp_random_graph and erdos_renyi_graph). Passing the same seed reproduces the same graph.

G1 = nx.binomial_graph(n=9, p=0.5, seed=1)
G2 = nx.binomial_graph(n=9, p=0.5, seed=1)
G3 = nx.binomial_graph(n=9, p=0.5)
print(G1.edges() == G2.edges(), G1.edges() == G3.edges())
# True False  (same seed gives the same graph; G3 is unseeded, so it almost surely differs)
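Because each of the n(n - 1)/2 possible edges appears independently with probability p, the expected number of edges is p * n * (n - 1) / 2. The sketch below checks this empirically; averaging over 1,000 seeds is an arbitrary choice of mine:

```python
import networkx as nx

n, p = 9, 0.5
# Expected edge count of a G(n, p) graph
expected = p * n * (n - 1) / 2
print(expected)  # 18.0

# Seeds make every sample reproducible; the sample mean should be close to 18
counts = [nx.binomial_graph(n=n, p=p, seed=s).number_of_edges() for s in range(1000)]
mean = sum(counts) / len(counts)
print(round(mean, 2))
```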

Random regular graph – creates a graph with n nodes whose edges are chosen at random such that every node has degree d (n * d must be even).

import matplotlib.pyplot as plt

G = nx.random_regular_graph(d=4, n=10)
nx.draw(G)
plt.show()
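A quick way to confirm the generator's guarantee without drawing anything. The seed is an arbitrary choice for reproducibility, and the error case reflects NetworkX's requirement that n * d be even:

```python
import networkx as nx

d, n = 4, 10
G = nx.random_regular_graph(d=d, n=n, seed=42)

degrees = {deg for _, deg in G.degree()}
print(degrees)              # {4}
print(G.number_of_edges())  # 20, i.e. n * d / 2

# An odd n * d cannot form a d-regular graph, so NetworkX rejects it
raised = False
try:
    nx.random_regular_graph(d=3, n=5)
except nx.NetworkXError as err:
    raised = True
    print("invalid parameters:", err)
```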

Random regular graph

Random tree – creates a uniformly random labeled tree on n nodes.

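# nx.random_tree is deprecated in recent NetworkX releases in favor of nx.random_labeled_tree
G = nx.random_labeled_tree(n=10)
nx.draw(G)
plt.show()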
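Similarly, a tree on n nodes must be connected and have exactly n - 1 edges. The sketch below checks this; the `getattr` fallback is my assumption, meant to cover both older releases (which only have `nx.random_tree`) and newer ones (`nx.random_labeled_tree`):

```python
import networkx as nx

n = 10
# Prefer the newer generator when available, otherwise fall back to the old one
make_tree = getattr(nx, "random_labeled_tree", None) or nx.random_tree
T = make_tree(n, seed=7)

print(T.number_of_nodes(), T.number_of_edges())  # 10 9
print(nx.is_tree(T))                             # True
```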

Random tree

All the code in this post can be found here

Additional resources

Official site

SO questions

https://www.datacamp.com/community/tutorials/networkx-python-graph-tutorial

https://www.geeksforgeeks.org/directed-graphs-multigraphs-and-visualization-in-networkx/amp/

Visualization – Data scientist toolkit

Data scientists are said to have better development knowledge than the average statistician and better statistical knowledge than the average developer. However, along with those skills, one also needs marketing skills: the ability to communicate your not-so-simple work and results to other people. Those people can be the CTO or VP R&D, team members, customers, or sales and marketing people. They don’t necessarily share your knowledge or dive into the details as fast as you do.

One of the best ways to make data and results accessible is creating visualizations, automatically of course. In this post I’ll review several visualization tools, mostly for Python, with some additional sidekicks.

Matplotlib – probably the best-known Python visualization package. It includes most of the standard charts (bar charts, pie charts, scatter plots), the ability to embed images, etc. Since it has many users, there are many questions, examples, and documentation around the web. However, the downside for me is that it is more complex than it should be. I have used it in several projects and I still haven’t acquired the intuition to use it to its full extent.
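For reference, here is a minimal Matplotlib sketch of two of the standard charts mentioned above. It uses toy data and the non-interactive Agg backend so that it writes a PNG file instead of opening a window; the file name `charts.png` is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # render to files only, no GUI window
import matplotlib.pyplot as plt

categories = ["a", "b", "c"]
values = [3, 7, 5]  # toy data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(categories, values)
ax1.set_title("Bar chart")
ax2.scatter([1, 2, 3, 4], [2, 4, 1, 3])
ax2.set_title("Scatter")
fig.tight_layout()
fig.savefig("charts.png")
```

The explicit figure/axes style (`fig, ax = plt.subplots(...)`) tends to scale better than calling `plt.*` functions directly once a figure has more than one panel.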

Matplotlib has several extensions, including –

Graphviz – graph drawing software, designed specifically for drawing graphs, with a Python package. PyGraphviz is a Python package for Graphviz which provides a drawing layer and graph layout algorithms. The first downside is that you need to install the Graphviz software itself. I have done it several times on several different machines (most of them running Ubuntu); it never went smoothly, and I was not able to do it from the command line alone, which makes it problematic if one wants to deploy it on remote machines. I believe that it can be done, but at the moment I find this process an irksome overhead.

Sidekicks –

  • PyDot – implements the DOT graph description language. PyDot is a pure-Python interface to Graphviz’s DOT language: it reads and writes DOT and calls the Graphviz programs for layout and rendering. The main advantage of DOT files is standardization: one can create a DOT file in one process and use it in another. DOT is an intuitive language that focuses on drawing the graph and not on computing it. I would say that it is the last step in the chain.
  • NetworkX – a package for creating and manipulating graphs. It implements many graph algorithms such as shortest path, clustering, minimum spanning tree, etc. Graphs created in NetworkX can be drawn using either Matplotlib or PyGraphviz, and NetworkX can also write DOT files.
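To make the DOT idea concrete, here is a sketch that serializes a small NetworkX graph to DOT text by hand. The `to_dot` helper is hypothetical and only covers nodes and edges (no attributes or escaping); in practice NetworkX can write DOT through `nx.nx_pydot.write_dot` (needs pydot) or `nx.nx_agraph.write_dot` (needs PyGraphviz):

```python
import networkx as nx


def to_dot(G, name="G"):
    """Hypothetical helper: emit a minimal DOT description of G (nodes and edges only)."""
    kind, arrow = ("digraph", "->") if G.is_directed() else ("graph", "--")
    lines = [f"{kind} {name} {{"]
    for node in G.nodes():
        lines.append(f'  "{node}";')
    for u, v in G.edges():
        lines.append(f'  "{u}" {arrow} "{v}";')
    lines.append("}")
    return "\n".join(lines)


G = nx.DiGraph([("a", "b"), ("b", "c"), ("a", "c")])
dot_text = to_dot(G)
print(dot_text)
```

The resulting text can be saved to a `.dot` file in one process and rendered later by any Graphviz tool, which is exactly the standardization benefit described above.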

Vincent – a relatively new Python visualization package. Vincent translates Python to Vega, which is a visualization grammar. I like it because it is easy, interactive, and simple to output either as JSON or as HTML. However, I’m not sure that either Vincent or Vega is mature enough at this point to answer all needs. It is important to mention that Vega is actually built on top of D3, which is an amazing tool with a growing community.

Additional related tools I’m not (yet) experienced with –

  • xlsxwriter – creates Excel files (xlsx format), including charts embedded in those files.
  • plot.ly – a much-talked-about tool for data collaboration and graphing, which has a Python client. I try to keep my data as private as possible and don’t want to depend on an internet connection (for example, when creating a graph with a lot of data), so this is the downside of this tool for me. However, the social/collaborative aspect of this product is also an important part, and graphing is only one aspect of it.
  • Google Charts – same downside as plot.ly, since I like to be as independent as possible. However, compared to plot.ly it looks more mature and currently offers far more options and chart types, and there is also a sandbox to play with. Plot.ly has the advantage over Google Charts in ease of use for non-programmers.
  • Bokeh – nice, interactive charts for large datasets. Maybe the next big thing for plotting in Python.