Data visualization is an essential part of a data scientist’s toolkit. It aims to make large amounts of complex data understandable by presenting them as graphs or charts. A wide variety of tools exist for data visualization. The one we discuss here is Python’s matplotlib library.

Matplotlib is a comprehensive library for visualizing data as bar charts, scatter plots, line charts and more. Its flexibility makes it a preferred choice for data analysts and scientists. However, if your goal is to produce interactive visualizations for the web, it may not be the right choice. Matplotlib is a third-party Python library, so we need to install it before using it. Type the following in a terminal:

pip install matplotlib

OR

python -m pip install matplotlib
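
One quick way to confirm that the installation worked is to import the package and print its version. This is only a minimal check; the version number printed depends on what pip installed on your machine.

```python
# If this import fails, matplotlib is not installed in the
# Python environment you are currently running.
import matplotlib

# __version__ is a string such as "3.x.y"; the exact value
# depends on your installation.
print(matplotlib.__version__)
```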

Next, we import the matplotlib.pyplot module under the alias plt, the conventional name for this submodule. The module maintains an internal state in which we build up a visualization step by step, and then either save it with savefig or display it with show. The following example draws a simple line chart:

from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')

# add a title
plt.title("Nominal GDP")

# add a label to the y-axis
plt.ylabel("Billions of $")

plt.show()
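
The example above displays the chart with show. As a sketch of the other option mentioned earlier, the same chart can be written to an image file with savefig. The output file name and the choice of the non-interactive "Agg" backend are assumptions made for this illustration, not part of the original example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# build the same line chart as before
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
plt.title("Nominal GDP")
plt.ylabel("Billions of $")

# hypothetical output path, chosen only for this example
output_path = os.path.join(tempfile.gettempdir(), "nominal_gdp.png")

# write the current figure to disk instead of showing it
plt.savefig(output_path, dpi=150, bbox_inches="tight")

# release the figure so later plots start from a clean state
plt.close()
```

The image format is inferred from the file extension, so changing ".png" to ".pdf" or ".svg" should produce a vector file instead.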

A histogram represents data in groups (bins), where each bin is drawn as a bar. The x-axis shows the bin ranges, while the y-axis shows how many values fall into each bin (the frequency). The hist() function computes and draws the histogram of x.

Syntax:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, *, data=None, **kwargs)

import matplotlib.pyplot as plt
import pandas as pd
 
# Reading the tips.csv file
data = pd.read_csv('tips.csv')
 
# initializing the data
x = data['total_bill']
 
# plotting the data
plt.hist(x)
 
# Adding title to the plot
plt.title("Tips Dataset")
 
# Adding label on the y-axis
plt.ylabel('Frequency')
 
# Adding label on the x-axis
plt.xlabel('Total Bill')
 
plt.show()
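
The example above relies on the default binning. As a small sketch of how the bins and range parameters from the syntax line change the result, the following uses a short list of made-up bill amounts (not taken from tips.csv) and inspects the values that hist() returns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# made-up bill amounts standing in for data['total_bill']
bills = [10.5, 12.0, 15.3, 18.2, 21.0, 22.5,
         24.1, 27.8, 31.0, 35.6, 40.2, 48.3]

# four equal-width bins covering 10 to 50: [10, 20), [20, 30), [30, 40), [40, 50]
counts, bin_edges, patches = plt.hist(
    bills, bins=4, range=(10, 50), color='steelblue', edgecolor='black'
)

plt.title("Made-up Bills")
plt.xlabel('Total Bill')
plt.ylabel('Frequency')

# hist() returns the bar heights and the bin boundaries it used
print(counts)     # number of values in each bin
print(bin_edges)  # five edges delimit four bins

plt.close()
```

With these values the four bins hold 4, 4, 2 and 2 bills respectively. In an interactive session, calling plt.show() before plt.close() would display the figure.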

A scatter plot shows the relationship between two variables by drawing one marker per observation at its (x, y) coordinates. Several groups of points can share the same axes, with each group drawn in a different color. The scatter() function in matplotlib.pyplot is used to draw a scatter plot.

Syntax:

matplotlib.pyplot.scatter(x_axis_data, y_axis_data, s=None, c=None, marker=None, cmap=None, vmin=None, vmax=None, alpha=None, linewidths=None, edgecolors=None)

import matplotlib.pyplot as plt
import pandas as pd

# Reading the tips.csv file
data = pd.read_csv('tips.csv')

# initializing the data
x = data['day']
y = data['total_bill']

# plotting the data
plt.scatter(x, y)

# Adding title to the plot
plt.title("Tips Dataset")

# Adding label on the y-axis
plt.ylabel('Total Bill')

# Adding label on the x-axis
plt.xlabel('Day')

plt.show()
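
To illustrate the point about drawing groups in different colors, here is a minimal sketch that calls scatter() once per group and adds a legend. The bill and tip values and the "Lunch"/"Dinner" grouping are made up for this example and are not read from tips.csv.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# made-up (total bill, tip) pairs for two hypothetical groups
lunch_bills = [12.5, 18.0, 22.3, 30.1]
lunch_tips = [2.0, 3.0, 3.5, 5.0]
dinner_bills = [20.4, 28.9, 35.2, 44.0, 50.8]
dinner_tips = [3.0, 4.5, 5.0, 7.5, 9.0]

# one scatter() call per group, each with its own color and legend label
lunch_points = plt.scatter(lunch_bills, lunch_tips, color='tab:blue', label='Lunch')
dinner_points = plt.scatter(dinner_bills, dinner_tips, color='tab:orange', label='Dinner')

plt.title("Made-up Bills and Tips")
plt.xlabel('Total Bill')
plt.ylabel('Tip')

# the legend maps each color back to its group
plt.legend()

plt.close()
```

Calling scatter() once per group keeps the legend simple. In an interactive session, plt.show() before plt.close() would display the figure.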

These are a few examples of how matplotlib can be used for data visualization. To explore further, visit the matplotlib Gallery, which gives a good idea of the sorts of things that can be done with matplotlib.

By admin
