09 Jul

Data Visualization using Python

Python has many libraries for data visualization. Today we will explore two of them:

1. Matplotlib

-  The most widely used library for plotting in the Python community

-  It was designed to closely resemble MATLAB, a proprietary programming language developed in the 1980s

- Because Matplotlib was the first Python data visualization library, many other libraries are built on top of it or designed to work in tandem with it during analysis.

- One of Matplotlib’s most important features is its ability to play well with many operating systems and graphics backends. Matplotlib supports dozens of backends and output types, which means you can count on it to work regardless of which operating system you are using or which output format you wish. This cross-platform, everything-to-everyone approach has been one of the great strengths of Matplotlib. It has led to a large user base, which in turn has led to an active developer base and Matplotlib’s powerful tools and ubiquity within the scientific Python world.

General Matplotlib Tips

Before we dive into the details of creating visualizations with Matplotlib, there are a few useful things you should know about using the package.

1. Importing Matplotlib

import matplotlib as mpl
import matplotlib.pyplot as plt

 

2. Setting Styles

plt.style.use('classic')


3. show() or No show()? How to Display Your Plots

Plotting from a script: plt.show()


4. Saving Figures to File

fig.savefig('my_figure.png')


- For more details refer here
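The four tips above can be combined into one short script. This is only a minimal sketch: the sine and cosine data, the `out_path` variable, and the choice of the non-interactive `Agg` backend (so the script also runs on a machine without a display) are assumptions added for illustration, not part of the original tips.

```python
import os
import tempfile

import matplotlib as mpl
mpl.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Tip 2: pick a style before creating any figure
plt.style.use("classic")

# Illustrative data (not from the original post)
x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "-", label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.legend()

# Tip 4: save the figure; the format is inferred from the file extension
out_path = os.path.join(tempfile.gettempdir(), "my_figure.png")
fig.savefig(out_path)

# Tip 3: in a script with an interactive backend you would call
# plt.show() once, at the very end. It is left commented out here
# because the Agg backend has no window to open.
# plt.show()
```

Note that `fig` has to exist before `fig.savefig(...)` can be called, which is why the sketch creates it with `plt.subplots()` first.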

A few types of plots using matplotlib.pyplot:

Line Plot

Multiple subplots in one figure

Histograms

Bar charts


Pie charts


Tables


Scatter plots


Log plots


Polar plots

GUI widgets

And many more. Please refer to the details on my GitHub.
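To make a few of the plot types listed above concrete, here is a small sketch that draws a line plot, a histogram, a bar chart, and a scatter plot as multiple subplots in one figure. The data is randomly generated purely for illustration, and the `Agg` backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
samples = rng.normal(size=500)

# Multiple subplots in one figure: a 2 x 2 grid of axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title("Line plot")

axes[0, 1].hist(samples, bins=20)
axes[0, 1].set_title("Histogram")

axes[1, 0].bar(["A", "B", "C"], [3, 7, 5])
axes[1, 0].set_title("Bar chart")

axes[1, 1].scatter(x, x + rng.normal(size=x.size))
axes[1, 1].set_title("Scatter plot")

# Adjust spacing so titles and tick labels do not overlap
fig.tight_layout()
```

Each plot type is a method on an axes object, so switching from one type to another usually means changing a single call.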

Disadvantages:

  • Prior to version 2.0, Matplotlib's defaults were not exactly the best choices. They were based on MATLAB circa 1999, and this often shows.
  • Matplotlib's API is relatively low level. Doing sophisticated statistical visualization is possible, but often requires a lot of boilerplate code.
  • Matplotlib predates Pandas by several years, and thus was not designed for use with Pandas DataFrames. In order to visualize data from a Pandas DataFrame, you must extract each Series and often concatenate them together into the right format. It would be nicer to have a plotting library that can intelligently use the DataFrame labels in a plot.
  • An answer to these problems is Seaborn. Let's learn more about it.
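The boilerplate point can be illustrated with a small sketch. The DataFrame below, including its column names `height`, `weight`, and `group`, is invented for this example. To get one color per group with plain Matplotlib, the code has to split the DataFrame, pull out each Series, and set the axis labels and legend by hand.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "height": [150, 160, 170, 155, 165, 175],
    "weight": [50, 58, 69, 54, 63, 77],
    "group": ["a", "a", "a", "b", "b", "b"],
})

fig, ax = plt.subplots()

# One scatter call per group: extract the Series for each subset manually
for name, sub in df.groupby("group"):
    ax.scatter(sub["height"], sub["weight"], label=name)

# Labels and legend are not taken from the DataFrame automatically
ax.set_xlabel("height")
ax.set_ylabel("weight")
ax.legend(title="group")
```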

2. Seaborn

-  Its default styles and color palettes are designed to be more aesthetically pleasing and modern.

-  Since Seaborn is built on top of matplotlib, you’ll need to know matplotlib to tweak Seaborn’s defaults.

- Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas DataFrames.

- For more details refer here.
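As a rough illustration of the DataFrame integration described above, the sketch below draws a grouped scatter plot in a single Seaborn call. The DataFrame and its column names are hypothetical, and `sns.set_theme()` assumes a reasonably recent Seaborn release (0.11 or later).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import pandas as pd
import seaborn as sns

# Apply Seaborn's default style and color palette
sns.set_theme()

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "height": [150, 160, 170, 155, 165, 175],
    "weight": [50, 58, 69, 54, 63, 77],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# Column names are passed directly; Seaborn colors points by "group"
# and uses the column names as axis labels
ax = sns.scatterplot(data=df, x="height", y="weight", hue="group")

# The returned object is a regular Matplotlib Axes, so Matplotlib
# methods can still be used to tweak the result
ax.set_title("Weight vs. height")
```

Because the return value is a Matplotlib axes object, knowing Matplotlib remains useful for fine-tuning.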

A few types of plots using Seaborn:

Bar plot

Box plot

Dist plot

Facet grid

Heat map

lvplot

lmplot

Joint plot

Kde plot

Swarm plot

And many more. Please refer to the details on my GitHub.


Follow and Subscribe:

https://twitter.com/vaishalilambe

https://www.youtube.com/user/vaishali17infy/

https://github.com/vaishalilambe