Most of the time we need to look at our data from different angles. Different visualizations represent the data in different ways and help us identify patterns in it. No single chart fits every problem; look at your problem statement and decide which charts will be useful.

Let us look at some common charts and the situations in which each one is useful.

A pie chart is a circular representation of a whole, split into sections that illustrate proportions. The sections always add up to 100%.

Use a pie chart when you want to compare the sections with one another, or when the objective is to understand the proportion of each section.

It is useful when you have categorical data. It is not useful for continuous values; for those, histograms are preferred.
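The proportions that a pie chart draws can also be computed directly. The snippet below is a minimal sketch that assumes a small, made-up `gender` Series (it does not come from `graphical_data.csv`) and uses pandas `value_counts(normalize=True)` to get each category's share.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
gender = pd.Series(
    ["Female", "Male", "Female", "Female", "Male", "Female", "Female", "Male"]
)

# Share of each category as a percentage of all rows.
proportions = gender.value_counts(normalize=True) * 100

print(proportions.round(1))
print("Total:", proportions.sum())  # the shares add up to 100
```

These are the same numbers that `autopct` prints on the slices of a pie chart.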

**Code:**

import pandas as pd

import matplotlib.pyplot as plt

data = pd.read_csv("graphical_data.csv")

data.groupby(['Gender']).size().plot(kind = 'pie', colors=['tomato', 'skyblue'], textprops={'fontsize': 25}, autopct='%.2f')

plt.xlabel("Proportion of Male Vs. Female", size=20)

plt.ylabel("")

plt.tight_layout()

plt.show()

**Output:**

In this dataset, 65.5% of the population is female and only 34.5% is male.

A pie chart has a limitation: if the percentages are not shown explicitly, it becomes hard to tell which section has a larger proportion than the others.

Take a look at the above image. Since the percentages are not shown in the chart, it is difficult to say which course has a higher proportion than the others.

Therefore, in such scenarios pie charts are less preferred and bar charts are used instead.
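To see why close proportions are hard to read from slices, consider counts that differ only slightly. The sketch below assumes hypothetical category names and counts (not taken from the article's dataset) and sorts them so that the ranking is explicit, which is the ordering a bar chart makes visible through bar heights.

```python
import pandas as pd

# Hypothetical counts per category; the values are deliberately close together.
counts = pd.Series(
    {"Graduate": 27, "Post Graduate": 24, "Diploma": 26, "PhD": 23}
)

# Sorting makes the ranking explicit; bar heights show the same ordering.
counts_sorted = counts.sort_values(ascending=False)

print(counts_sorted)
```

Passing `counts_sorted` to `.plot(kind="bar")` would draw the bars in this order, so even a difference of one or two rows is visible as a difference in height.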

**Code:**

import pandas as pd

import matplotlib.pyplot as plt

data = pd.read_csv("graphical_data.csv")

data.groupby(['Education']).size().plot(kind = 'bar', color="tomato")

plt.xlabel("Education", size=20)

plt.xticks(fontsize = 15)

plt.show()

**Output:**

A histogram is a graphical representation that buckets the data into ranges. Histograms look similar to bar charts, but bar charts are mostly suited to categorical data, whereas histograms are suited to continuous values that are grouped together to form buckets.
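The bucketing step can be looked at on its own, without drawing anything. The sketch below assumes a made-up list of salaries (not the article's dataset) and reuses the bin edges from the histogram code; `numpy.histogram` returns how many values fall into each range.

```python
import numpy as np

# Hypothetical salaries, used only for illustration.
salaries = np.array(
    [12000, 14500, 16000, 18000, 21000, 22000, 23500, 27000, 31000, 34000]
)
bin_edges = [10000, 15000, 20000, 25000, 30000, 35000]

# Each bucket includes its left edge and excludes its right edge,
# except the last bucket, which includes both edges.
counts, edges = np.histogram(salaries, bins=bin_edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

The heights of the bars in a histogram are these per-bucket counts.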

**Code:**

import pandas as pd

import matplotlib.pyplot as plt

data = pd.read_csv("graphical_data.csv")

plt.hist(data['Salary'], bins = [10000, 15000, 20000, 25000, 30000, 35000], color='orange', edgecolor = "black")

plt.ylabel("Count", size=20)

plt.yticks(fontsize = 15)

plt.xlabel("Salary", size=20)

plt.xticks(fontsize = 15)

plt.show()

**Output:**

A scatter plot is a graphical visualization used to show the relationship between two variables.

Take a look at the above image. The first chart shows a positive relationship between the two variables: as we move to the right on the X axis, the trend moves upwards on the Y axis.

The second chart shows a negative relationship between the two variables: as we move to the right on the X axis, the trend moves downwards on the Y axis.

The third chart shows no relationship between the two variables: there is neither an increasing nor a decreasing trend in the chart.
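The three cases above can be reproduced with synthetic data. This is a minimal sketch under assumed settings (the slopes, noise level and seed are arbitrary choices, not values from the article); it generates one variable of each kind and reports the correlation coefficient for each.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

x = np.linspace(0, 10, 200)
noise = rng.normal(scale=1.0, size=x.size)

# Three synthetic Y variables: rising with x, falling with x, independent of x.
series = {
    "positive": 2 * x + noise,
    "negative": -2 * x + noise,
    "none": rng.normal(size=x.size),
}

# Pearson correlation of x with each synthetic variable.
correlations = {name: np.corrcoef(x, y)[0, 1] for name, y in series.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

Plotting `x` against each of these arrays would give an upward cloud, a downward cloud and a shapeless cloud respectively.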

**Code:**

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

df = pd.read_csv("graphical_data.csv")

sns.regplot(data = df, x='Age', y='Salary')

plt.ylabel("Salary", size=20)

plt.yticks(fontsize = 15)

plt.xlabel("Age", size=20)

plt.xticks(fontsize = 15)

plt.show()

**Output:**

We have used 'regplot' from the seaborn library. It draws a scatter plot along with a regression line. The chart shows a positive relationship between the Age and Salary variables: the older a person is, the higher the salary is expected to be.

We can also check the relationship between two variables with the correlation function.
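With its default settings, the line that 'regplot' draws is a straight-line least-squares fit. The sketch below assumes hypothetical ages and salaries (not taken from `graphical_data.csv`) and uses `numpy.polyfit` to compute such a line; a positive slope corresponds to the upward trend described above.

```python
import numpy as np

# Hypothetical ages and salaries, used only for illustration.
age = np.array([22, 27, 31, 36, 42, 48, 55])
salary = np.array([11000, 14000, 15500, 19000, 22000, 26500, 30000])

# Degree-1 least-squares fit: salary is approximated by slope * age + intercept.
slope, intercept = np.polyfit(age, salary, deg=1)

print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
```

The slope can be read as the expected change in salary for each additional year of age in this made-up sample.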

**Code:**

df['Age'].corr(df['Salary'])

**Output:**

0.694 # Correlation ranges from -1 to 1; 0.69 indicates a fairly strong positive relationship
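The value returned by `corr` is, by default, the Pearson correlation coefficient. As a rough sketch with a small made-up DataFrame (not the article's dataset), it can be computed by hand from the deviations of each column from its mean and compared with the pandas result.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {
        "Age": [25, 30, 35, 40, 45, 50],
        "Salary": [12000, 15000, 14000, 21000, 24000, 23000],
    }
)

# Deviations of each value from its column mean.
age_dev = df["Age"] - df["Age"].mean()
salary_dev = df["Salary"] - df["Salary"].mean()

# Pearson r: sum of products of deviations, scaled by both spreads.
r_manual = (age_dev * salary_dev).sum() / np.sqrt(
    (age_dev ** 2).sum() * (salary_dev ** 2).sum()
)
r_pandas = df["Age"].corr(df["Salary"])

print(round(r_manual, 3), round(r_pandas, 3))
```

Both numbers should agree, and the scaling in the denominator is what keeps the result between -1 and 1.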


© Padhai Time 2022 | All Rights Reserved
