Top Python Visualization Libraries for Data Scientists

Introduction to essential data visualization libraries for data scientists. Find the best package to visualize your data.

Top Python Visualization Libraries for Data Scientists

Visualization is one of the essential tools in every data scientist's toolbox. Every data scientist should have access to varying types of plots and graphs.

We will go through some of the most popular visualization libraries for Python. Namely, we will have a look at

  • Matplotlib
  • Seaborn
  • Plotly
  • Bokeh

All of these are well-established open-source libraries under active development and lively community support.

Demonstration  

We plot a scatter plot using all of the compared visualization libraries. The same data is used to test all the libraries. The data is generated using NumPy and stored in a Pandas data frame.

# Import Pandas and NumPy
import pandas as pd
import numpy as np

# Create a data frame with random 'x', 'y' and 'size' columns
df = pd.DataFrame(data={
    'x': 10*np.random.randn(100, ), 
    'y': 10*np.random.randn(100, ),
    'size': 10*np.abs(np.random.randn(100, ))})

Matplotlib

Matplotlib is the defacto library for Python numerical visualization. It was first released in 2003 and has been under active development ever since.

Matlotlib is the replacement of Matlab plotting engine the same as NumPy is replacement of Matlab linear algebra engine.

Matplotlib provides support for all the basic plot types: line charts, bar charts,  box diagrams, scatter plots, and pie charts. It is highly customizable and can produce publication-quality figures. Matplotlib is pretty much universally supported. You can use it with NumPy, Pandas, Jupyter, etc... It is safe to assume that you can rely on Matplotlib support for most cases.

matplotlib.pyplot provides a familiar interface for those who are accustomed to the Matlab plot interface. This way you can utilize similar functions to Matlab to visualize your data. Check the following cheat sheet to explore the features

Matplotlib cheat sheet from https://matplotlib.org/

One downside is that if you need to modify the look and feel down to the last detail, the notation can be quite verbose.

Matplotlib visualization example

import matplotlib.pyplot as plt

plt.scatter(df['x'], df['y'], s=df['size'])
plt.show()

Seaborn

Seaborn is particularly focused on plotting statistical data. It is built on top of Matplotlib. So it shares most of the features provided by Matplotlib, such, as, great support from all the necessary tools for data scientists. Unsurprisingly, the statistics and data frame library Pandas has good support for Seaborn.

Seaborn is best suite for visualization statistical data. Thus, it is great accompanion for every data scientist.

One of the nice things about Seaborn is that it treats Pandas data frames as first-class citizens, and has direct support for them. This means that you don't usually have to extract the data from specific columns, but rather just pass the whole data frame to the plotting method.  

Check the following tutorial for a more hands-on feel on the Seaborn library.

User guide and tutorial — seaborn 0.11.2 documentation

Seaborn visualization example

import seaborn as sns

sns.scatterplot(data=df, x='x', y='y', size='size')

Plotly

Plotly is a language-agnostic visualization, which is great if you are working on your data in multiple languages, such as Python, JavaScript, Julia, Matlab and R. Then, you can use a common visualization library.

Plotly is developed by a company and, thus, it has built extensive support for enterprise applications.

Even though it is backed by a company it is still an open-source tool. If you are working in big companies like Nvidia or Tesla, you might see Plotly as the goto-tool in corporate data illustrations.  

There are many good qualities of corporate-backed software: documentation tends to be up-to-date and you can get better support.

In Jupyter, Plotly provides interactive tools to explore the data. These can become surprisingly handy when you are not fully sure what is the best way to illustrate big data sets.  

Check the Plotly fundamentals below for further examples

Fundamentals
Plotly’s

Plotly visualization example

import plotly.express as px
fig = px.scatter(df, x='x', y='y', size='size')
fig.show()

Bokeh

Bokeh is more focused on interactive visualization. You can build functional data visualization with a graphical user interface to control varying parameters in your data.

Bokeh is steered towards functional visualization including interactive tools, dashboards, and streaming data sets.

If your work needs interaction or real-time data illustration, Bokeh is the tool for you.

Check the Bokeh demos site for interactive examples.

Bokeh Demos
Examples of static and interactive Bokeh visualizations.

Bokeh visualization example

import bokeh 

from bokeh.plotting import figure, output_file, show

bokeh.io.output_notebook()

p = figure(width=400, height=400)

# add a circle renderer with a size, color, and alpha
p.circle(df['x'], df['y'], size=df['size'], color="navy")

show(p)

Summary

All of these libraries have their pros and cons. The best match really depends on what you are currently working on. Our approach is usually the following. If Matplotlib provides all you need use it. This is because it has the best support across the frameworks and environments. Otherwise, go with the library that provides the tools that you need.

Further reading

Check the following great resources for further reading material.

Matplotlib

Data Visualization using Matplotlib – A Beginner’s Guide
The article covers the concepts of Data Visualization using Matplotlib through examples and codes, and the steps involves in the process.

Seaborn

Visualization with Seaborn | Python Data Science Handbook
Data Visualization Using Seaborn For Beginners - Analytics Vidhya
Seaborn is a Python data visualization library based on Matplotlib. In this article, we’ll see how to make interesting plots using Seaborn.

Plotly

Data Visualization Using Seaborn For Beginners - Analytics Vidhya
Seaborn is a Python data visualization library based on Matplotlib. In this article, we’ll see how to make interesting plots using Seaborn.

Bokeh

Interactive Data Visualization in Python With Bokeh – Real Python
This Python tutorial will get you up and running with Bokeh, using examples and a real-world dataset. You’ll learn how to visualize your data, customize and organize your visualizations, and add interactivity.

Repositories

GitHub - matplotlib/matplotlib: matplotlib: plotting with Python
matplotlib: plotting with Python. Contribute to matplotlib/matplotlib development by creating an account on GitHub.
GitHub - mwaskom/seaborn: Statistical data visualization in Python
Statistical data visualization in Python. Contribute to mwaskom/seaborn development by creating an account on GitHub.
GitHub - plotly/plotly.py: The interactive graphing library for Python (includes Plotly Express)
The interactive graphing library for Python (includes Plotly Express) :sparkles: - GitHub - plotly/plotly.py: The interactive graphing library for Python (includes Plotly Express)
GitHub - bokeh/bokeh: Interactive Data Visualization in the browser, from Python
Interactive Data Visualization in the browser, from Python - GitHub - bokeh/bokeh: Interactive Data Visualization in the browser, from Python

Books