Introduction:
Data visualization is an essential part of a data scientist’s toolkit. Its aim is to understand and present large amounts of complex data in the form of graphs or charts. A wide variety of tools exist for data visualization; the one discussed here is Python’s matplotlib library.
Matplotlib is a comprehensive library for visualizing data as bar charts, scatter plots, line charts and many other plot types. Its flexibility makes it a preferred choice for data analysts and scientists. However, if your goal is to build interactive visualizations for the web, it may not be the right choice. Matplotlib is a third-party Python library, so it must be installed before it can be used. Type the following in a terminal:
pip install matplotlib
OR
python -m pip install matplotlib
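To confirm that the installation worked, a quick check like the following can be run in Python. This is a minimal sketch: it only imports the package and prints the installed version and the backend matplotlib will use to draw figures (the backend depends on your system).

```python
# Check that matplotlib is importable and report what was installed.
import matplotlib

version = matplotlib.__version__      # installed version string
backend = matplotlib.get_backend()    # drawing backend chosen for this system
print("matplotlib", version, "using backend", backend)
```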
pyplot:
Next, import the matplotlib.pyplot module using the alias plt, which is the conventional alias for this submodule. pyplot maintains an internal state in which we can build up a visualization step by step and then either save it using savefig() or display it using show().
Example:
from matplotlib import pyplot as plt
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
# add a title
plt.title("Nominal GDP")
# add a label to the y-axis
plt.ylabel("Billions of $")
plt.show()
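The example above displays the chart with show(). To save the same chart to a file instead, savefig() can be called once the figure is built. The sketch below assumes no window is needed, so it selects the non-interactive "Agg" backend; the file name gdp.png is only a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI window
from matplotlib import pyplot as plt
from pathlib import Path

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# build up the chart exactly as before
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
plt.title("Nominal GDP")
plt.xlabel("Year")
plt.ylabel("Billions of $")

# hypothetical output path; dpi sets the resolution of the PNG
out_path = Path("gdp.png")
plt.savefig(out_path, dpi=100, bbox_inches="tight")
plt.close()  # discard the figure so the next plot starts from a clean state
```

Calling plt.close() after saving matters in scripts that produce many charts, because pyplot otherwise keeps every figure in its internal state.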

Histogram:
A histogram groups numeric data into ranges called bins and draws one bar per bin. The x-axis represents the bin ranges, while the y-axis represents the frequency, i.e. how many values fall into each bin. The hist() function computes and draws a histogram of x.
Syntax:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)
Example:
import matplotlib.pyplot as plt
import pandas as pd
# Reading the tips.csv file
data = pd.read_csv('tips.csv')
# initializing the data
x = data['total_bill']
# plotting the data
plt.hist(x)
# Adding title to the plot
plt.title("Tips Dataset")
# Adding label on the y-axis
plt.ylabel('Frequency')
# Adding label on the x-axis
plt.xlabel('Total Bill')
plt.show()
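hist() does not only draw the bars; it also returns the numbers it computed. The sketch below uses hypothetical random data instead of tips.csv so it runs on its own, and shows the effect of the bins and density parameters from the syntax above. With density=True the bar heights are rescaled so that the total bar area equals 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: 200 values from a normal distribution
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=20, scale=5, size=200)

# hist() returns the bar heights, the bin edges (one more edge than bins)
# and the drawn bar artists
counts, bin_edges, patches = plt.hist(values, bins=10, edgecolor="black")
plt.title("Sample distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")

# On a new figure: density=True rescales heights so bar area sums to 1
plt.figure()
density, density_edges, _ = plt.hist(values, bins=10, density=True)
area = float(np.sum(density * np.diff(density_edges)))

print(counts.sum(), len(bin_edges), round(area, 6))
plt.close("all")
```

Because every value lands in exactly one bin, the counts add up to the number of input values, which is a handy sanity check when choosing a bin count.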

Scatter Plots:
A scatter plot draws one marker for each observation at the coordinates given by two variables, one on the x-axis and one on the y-axis. It is useful for seeing how the two variables relate and where the points cluster. Several groups of points can be drawn on the same axes, each in a different color. The scatter() function of matplotlib.pyplot is used to draw a scatter plot.
Syntax:
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
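Several of these parameters control how each marker looks. The following sketch, on hypothetical random data, sets s (marker area in points squared), c together with cmap (color each point by a value), alpha (transparency) and edgecolors (marker outline), and adds a colorbar to explain the color scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 50 random points with a rough linear trend
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)
sizes = rng.uniform(20, 200, size=50)  # s: one marker area per point

# c=y maps each point's y value to a color from the "viridis" colormap
points = plt.scatter(x, y, s=sizes, c=y, cmap="viridis",
                     alpha=0.7, edgecolors="black")
plt.colorbar(points, label="y value")  # legend for the color scale
plt.title("scatter() parameters")
plt.xlabel("x")
plt.ylabel("y")
plt.close()
```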
Example:
import matplotlib.pyplot as plt
import pandas as pd
# Reading the tips.csv file
data = pd.read_csv('tips.csv')
# initializing the data
x = data['day']
y = data['total_bill']
# plotting the data
plt.scatter(x, y)
# Adding title to the plot
plt.title("Tips Dataset")
# Adding label on the y-axis
plt.ylabel('Total Bill')
# Adding label on the x-axis
plt.xlabel('Day')
plt.show()

These are a few examples of how matplotlib can be used for data visualization. For more, browse the matplotlib Gallery, which gives a good idea of the kinds of plots matplotlib can produce.