Data Visualization Tutorial For Beginners | Big Data Analytics Tutorial | Simplilearn

Category: Data Visualization

Tags: Data, Matplotlib, Plots, Python, Visualization

Entities: Jupyter Notebook, Matplotlib, NumPy, Python, SciPy, Seaborn, Simplilearn

Summary

    Data Visualization Fundamentals
    • Data visualization presents data in pictorial or graphical formats, facilitating easier analysis for stakeholders and decision makers.
    • It simplifies complex quantitative information and helps explore big data efficiently.
    • Key considerations include clarity, accuracy, and efficiency in data representation.
    Importance and Benefits
    • Data visualization identifies trends and patterns, aiding in decision-making processes.
    • It highlights areas needing attention and reveals hidden patterns within data.
    Visualization Techniques and Tools
    • Python is a powerful tool for data visualization, with libraries like Matplotlib, Seaborn, and others.
    • Matplotlib is a two-dimensional plotting library that offers high-quality graphics and supports multiple platforms.
    • Libraries such as Matplotlib provide control over plot styles, including line properties, fonts, and axis properties.
    Creating Plots with Matplotlib
    • Creating a plot involves importing libraries, defining data sets, setting plot parameters, and displaying the plot.
    • Subplots can display multiple plots in a single window, allowing for organized data presentation.
    Types of Data Visualizations
    • Common plot types include histograms, scatter plots, heat maps, and pie charts, each serving different analytical purposes.
    • Heat maps offer insights into two-dimensional data, highlighting risk-prone areas and supporting cluster analysis.
    Actionable Takeaways
    • Utilize data visualization to simplify and communicate complex data effectively.
    • Choose appropriate visualization tools and techniques for your data type and analysis needs.
    • Ensure clarity, accuracy, and efficiency in your visualizations to convey the right message.
    • Explore Python libraries like Matplotlib and Seaborn for versatile and high-quality visualizations.
    • Incorporate multiple plot types to gain comprehensive insights into your data.

    Transcript

    00:00

    [Music] let's now start this lesson by defining what data visualization is data visualization is the technique to

    00:16

    present the data in a pictorial or graphical format it enables stakeholders and decision makers to analyze data visually the data in graphical format allows them to identify new trends and patterns easily well you might think why data

    00:32

    visualization is important let's explain with an example you are a sales manager in a leading global organization the organization plans to study the sales details of each product across all regions and countries this is to identify the product which

    00:47

    has the highest sales in a particular region and up the production this research will enable the organization to increase the manufacturing of that product in the particular region the data involved for this research might be huge and complex

    01:03

    the research on this large numeric data is difficult and time consuming when it is performed manually when this numeric data is plotted on a graph or converted to charts it's easy to identify the patterns and predict the result accurately

    01:18

    the main benefits of data visualization are as follows it simplifies the complex quantitative information it helps analyze and explore big data easily it identifies the areas that need attention or improvement it identifies the relationship between

    01:35

    data points and variables it explores new patterns and reveals hidden patterns in the data there are three major considerations for data visualization they are clarity accuracy and efficiency first ensure the data set is complete

    01:53

    and relevant this enables the data scientist to use the new patterns yielded from the data in the relevant places second ensure you use an appropriate graphical representation to convey the right message third use an efficient visualization

    02:08

    technique which highlights all the data points there are some basic factors that one would need to be aware of before visualizing the data visual effect coordinate system data types and scale informative interpretation

    02:25

    visual effect includes the usage of appropriate shapes colors and size to represent the analyzed data the coordinate system helps to organize the data points within the provided coordinates the data types and scale choose the type of data such as numeric or categorical

    02:43

    the informative interpretation helps create visuals in an effective and easily interpretable manner using labels title legends and pointers so far you have learned what data visualization is and how it helps interpret results with large and complex

    02:59

    data with the help of the python programming language you can perform this data visualization you'll learn more about how to visualize data using the python programming language in the subsequent screens

    03:15

    many new python data visualization libraries have been introduced recently such as matplotlib vispy bokeh seaborn pygal folium and networkx matplotlib has emerged as the main data visualization library

    03:32

    let's now learn about matplotlib in detail matplotlib is a python two-dimensional plotting library for data visualization and creating interactive graphics or plots using python's matplotlib the data

    03:50

    visualization of large and complex data becomes easy there are several advantages of using matplotlib to visualize data they are as follows it's a multi-platform data visualization tool built on the numpy and scipy

    04:06

    framework therefore it's fast and efficient it possesses the ability to work well with many operating systems and graphic back-ends it possesses high quality graphics and plots to print and view for a range of graphs such as histograms bar charts pie

    04:24

    charts scatter plots and heat maps with jupyter notebook integration the developers have been free to spend their time implementing features rather than struggling with cross-platform compatibility it has large community support and

    04:40

    cross-platform support as it is an open source tool it has full control over graph or plot styles such as line properties fonts and axis properties let's now try to understand a plot

    04:57

    a plot is a graphical representation of data which shows relationship between two variables or the distribution of data look at the example shown on the screen this is a two-dimensional line plot of the random numbers on the y-axis and the range on the x-axis

    05:15

    the background of the plot is called grid the text first plot denotes the title of the plot and text line one denotes the legend you can create a plot using four simple

    05:30

    steps import the required libraries define or import the required data set set the plot parameters display the created plot let's consider the same example plot

    05:46

    used earlier follow the steps below to obtain this plot the first step is to import the required libraries here we have imported numpy and pyplot and style from matplotlib numpy is used to generate the random

    06:02

    numbers and pyplot which is a module of matplotlib is used to plot the numbers and the style module is used for setting the grid style %matplotlib inline is required to display the plot within jupyter notebook

    06:18

    the second step is to define or import the required data set here we have defined the data set random number using numpy random method note that the range is 10. we have used the print method to view the created random numbers the third

    06:34

    step is to set the plot parameters in this step we set the style of the plot labels of the coordinates title of the plot the legend and the line width in this example we have used ggplot as the plot style

    06:49

    the plot method is used to plot the graph against the random numbers in the plot method the letter g denotes the plot line color as green label denotes the legend label and it's named as line one also the line width is set to two note

    07:06

    that we have labeled the x-axis as range and the y-axis as labels and set the title as first plot the last step is to display the created plot use the legend method to plot the graph based on the set conditions and the show

    07:22

    method to display the created plot let's now learn how to create a two-dimensional plot consider the following example a nutri worldwide firm wants to know how many people visit its website at a

    07:38

    particular time this analysis helps it control and monitor the website traffic this example involves two variables namely users and time therefore this is a two dimensional or 2d plot
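
The four steps described above for the first plot can be sketched roughly as follows. This is a minimal reconstruction, not the code shown on screen: the variable names, the random seed, and the y-axis label "numbers" are assumptions, and it uses Matplotlib's `ax.plot` style rather than bare `plt.plot` calls, which behaves the same way here.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

# Step 1: the libraries are imported above.
# In a Jupyter notebook, also run the magic `%matplotlib inline` once.

# Step 2: define the data set, here 10 random numbers.
# The seed is an assumption, used only so the output is repeatable.
rng = np.random.default_rng(0)
random_numbers = rng.random(10)
print(random_numbers)

# Step 3: set the plot parameters.
style.use("ggplot")  # grid style
fig, ax = plt.subplots()
ax.plot(random_numbers, "g", label="line one", linewidth=2)  # "g" = green
ax.set_xlabel("range")
ax.set_ylabel("numbers")  # assumed label
ax.set_title("first plot")

# Step 4: add the legend and display the plot.
ax.legend()
# In a script, call plt.show() here; a notebook with
# %matplotlib inline renders the figure automatically.
```

Because only y values are passed to `plot`, Matplotlib uses the indices 0 to 9 on the x-axis, which matches the "range" described in the lesson.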

    07:53

    take a look at the program that creates a 2d plot object web customers is a list on the number of users and time hours indicates the time from this we understand that there are 123 customers on the website at 7 am

    08:09

    645 customers on the website at 8 am and so on the ggplot style is used to set the grid style and the plot method is used to plot the website customers against time don't forget to use %matplotlib inline

    08:25

    to display or view the plot on the jupyter notebook the website traffic curve is plotted and the graph is shown on the screen it's also possible to change the line style of the plot to change the line style of the plot define the line

    08:41

    style as dashed in the plot method observe the output graph changes to a dashed line also note that the color is defined as blue using matplotlib it's also possible to set the desired axis to interpret the required result
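
A rough sketch of the website traffic plot with a dashed blue line. Only the first two counts (123 at 7 am and 645 at 8 am) come from the lesson; every other value, the variable names, and the axis labels are hypothetical placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib import style

# Number of users on the website per hour.
# Only 123 and 645 are from the lesson; the rest are made-up values.
web_customers = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]
time_hrs = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

style.use("ggplot")  # grid style
fig, ax = plt.subplots()

# color="b" draws the line in blue, linestyle="--" makes it dashed.
ax.plot(time_hrs, web_customers, color="b", linestyle="--")
ax.set_title("web site traffic")
ax.set_xlabel("hrs")
ax.set_ylabel("number of users")
# plt.show() in a script; %matplotlib inline in a notebook.
```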

    08:57

    use the axis method to set the axis in this example shown on the screen the x-axis is set to range from 6.5 to 17.5 and the y-axis is set to range from 50 to 2000 let's now understand how to set the

    09:14

    transparency level of the line and to annotate a plot alpha is an attribute which controls the transparency of the line the lower the alpha value the more transparent the line here the alpha value is defined as 0.4

    09:29

    the annotate method is used to annotate the graph the syntax for annotate method is shown on the screen the string max is the argument that denotes the annotation text ha indicates the horizontal alignment va indicates the vertical alignment

    09:47

    xytext indicates the text position and xy indicates the arrow position the keyword arrowprops indicates the properties of the arrow in this example the arrow property is defined as the green color
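
The axis limits, the alpha value, and the annotation described above might be combined as in this sketch. The traffic values (apart from the first two), the text position, and the arrow position are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical traffic data; only 123 and 645 come from the lesson.
web_customers = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]
time_hrs = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

style.use("ggplot")
fig, ax = plt.subplots()

# alpha=0.4 makes the line partly transparent (lower = more transparent).
ax.plot(time_hrs, web_customers, color="b", linestyle="--", alpha=0.4)

# axis([xmin, xmax, ymin, ymax]) sets both axis ranges at once.
ax.axis([6.5, 17.5, 50, 2000])

# Annotate the peak: xy is where the arrow points, xytext is where
# the text sits, ha/va align the text, arrowprops styles the arrow.
ax.annotate(
    "Max",
    ha="center",
    va="bottom",
    xytext=(8, 1500),   # assumed text position
    xy=(11, 1630),      # assumed arrow tip (the highest point)
    arrowprops={"facecolor": "green"},
)
```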

    10:02

    the output graph is shown on the screen so far you've learned how to set line width title x-axis and y-axis label title of the plot legend line color and annotate the graph for a single plot the plot we created for website traffic

    10:19

    in the previous screens is for only one day let's now learn how to create multiple plots say for three days using the same example the data set number of users for monday tuesday and wednesday is defined with respect to its time distribution
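
A three-day version could look like the sketch below, with one `plot` call per day, each with its own color and line width. All three data sets are hypothetical, as are the variable names.

```python
import matplotlib.pyplot as plt
from matplotlib import style

time_hrs = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

# Hypothetical user counts for each day.
monday = [123, 645, 950, 1290, 1630, 1450, 1034, 1295, 465, 205, 80]
tuesday = [95, 680, 889, 1145, 1670, 1323, 1119, 1265, 510, 310, 110]
wednesday = [105, 630, 700, 1006, 1520, 1124, 1239, 1380, 580, 610, 230]

style.use("ggplot")
fig, ax = plt.subplots()

# A different color and line width per day keeps the lines distinguishable.
ax.plot(time_hrs, monday, color="r", label="monday", linewidth=1)
ax.plot(time_hrs, tuesday, color="g", label="tuesday", linewidth=1.5)
ax.plot(time_hrs, wednesday, color="b", label="wednesday", linewidth=2)

ax.set_title("web site traffic")
ax.set_xlabel("hrs")
ax.set_ylabel("number of users")
ax.legend()
```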

    10:36

    use a different color and line width for each day to distinguish the plot in this example we have used red for monday green for tuesday and blue for wednesday the output graph is shown on the screen a subplot is used to display

    10:51

    multiple plots in the same window with a subplot you can arrange plots in a regular grid all you need to do is specify the number of rows columns and the plot number the syntax for subplot is shown on the screen it divides the current window into an m

    11:07

    by n grid and creates an axes for a subplot in the position specified by p for example subplot 2 1 1 and subplot 2 1 2 create two subplots which are stacked vertically on a grid if you want to plot four graphs in one

    11:23

    window then the syntax used should be subplot 2 2 p with p running from 1 to 4 layout and spacing adjustment are two important factors to be considered while creating subplots use the plt.subplots_adjust method with the parameters hspace and wspace to adjust

    11:41

    the distances between the subplot and move them around on the grid in this demo you can see how to create two subplots that will display side by side in a single frame two subplots stacked one on top of the

    11:57

    other or vertically split in a single frame and four subplots displayed in a single frame first import matplotlib pyplot and style

    12:18

    type %matplotlib inline to view the plot in jupyter notebook define the parameters such as temperature wind humidity precipitation data and time data

    12:35

    you can see the data being typed here next create two subplots to be displayed side by side in a given frame for 1 2 1 and 1 2 2

    12:53

    specify the figure size subplot space title the color for time and temperature data which is blue here and line style and width

    13:37

    similarly specify the color for wind which is red its line style and width

    13:59

    you can see the temperature and wind subplot charts displayed side by side in a given frame here to create subplots for 2 1 1 and 2 1 2 specify the parameters
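
The side-by-side layout from the demo might be written as below. The weather readings, figure size, and spacing value are all assumptions, since the transcript does not state the numbers typed on screen.

```python
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical readings taken every three hours.
time_data = [0, 3, 6, 9, 12, 15, 18, 21]
temp_data = [14, 13, 15, 19, 23, 24, 20, 16]
wind_data = [5, 7, 6, 9, 12, 14, 10, 8]

style.use("ggplot")
fig = plt.figure(figsize=(10, 4))
plt.subplots_adjust(wspace=0.3)  # horizontal gap between the subplots

# subplot(1, 2, 1): 1 row, 2 columns, first position (left).
ax_temp = plt.subplot(1, 2, 1)
plt.title("temperature")
plt.plot(time_data, temp_data, color="b", linestyle="-", linewidth=1)

# subplot(1, 2, 2): same grid, second position (right).
ax_wind = plt.subplot(1, 2, 2)
plt.title("wind")
plt.plot(time_data, wind_data, color="r", linestyle="-", linewidth=1)
```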

    14:28

    this will create two subplots stacked one on top of the other or vertically split in a given frame let's use humidity and precipitation data to plot the graphs specify the title color line style and

    14:46

    line width for both the graphs

    15:17

    you can see the two subplots stacked one on top of the other with two different colors indicating precipitation and humidity here the two graphs are separate
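
For the vertically stacked pair, only the grid shape changes: two rows and one column. The humidity and precipitation values, colors, and spacing are again hypothetical.

```python
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical readings taken every three hours.
time_data = [0, 3, 6, 9, 12, 15, 18, 21]
hum_data = [82, 85, 80, 70, 58, 55, 64, 75]
prec_data = [0.0, 0.2, 0.5, 0.1, 0.0, 0.0, 0.3, 0.4]

style.use("ggplot")
fig = plt.figure(figsize=(6, 8))
plt.subplots_adjust(hspace=0.4)  # vertical gap between the subplots

# subplot(2, 1, 1): 2 rows, 1 column, first position (top).
ax_hum = plt.subplot(2, 1, 1)
plt.title("humidity")
plt.plot(time_data, hum_data, color="g", linestyle="--", linewidth=2)

# subplot(2, 1, 2): same grid, second position (bottom).
ax_prec = plt.subplot(2, 1, 2)
plt.title("precipitation")
plt.plot(time_data, prec_data, color="c", linestyle="-", linewidth=2)
```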

    15:34

    finally let's draw four subplots for 2 2 1 2 2 2 2 2 3 and 2 2 4 that will display in a given frame

    15:58

    specify the title subplot data color line style and line width for all four subplots
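
The four-panel frame uses a 2 by 2 grid with positions 1 to 4. This sketch loops over the four series instead of repeating the calls; the data, colors, and spacing values are assumptions.

```python
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical readings taken every three hours.
time_data = [0, 3, 6, 9, 12, 15, 18, 21]
series = [
    ("temperature", [14, 13, 15, 19, 23, 24, 20, 16], "b"),
    ("wind", [5, 7, 6, 9, 12, 14, 10, 8], "r"),
    ("humidity", [82, 85, 80, 70, 58, 55, 64, 75], "g"),
    ("precipitation", [0.0, 0.2, 0.5, 0.1, 0.0, 0.0, 0.3, 0.4], "c"),
]

style.use("ggplot")
fig = plt.figure(figsize=(10, 8))
# hspace and wspace control the vertical and horizontal gaps.
plt.subplots_adjust(hspace=0.4, wspace=0.3)

# Positions run left to right, top to bottom: 1, 2, 3, 4.
for p, (title, values, color) in enumerate(series, start=1):
    plt.subplot(2, 2, p)
    plt.title(title)
    plt.plot(time_data, values, color=color, linestyle="-", linewidth=2)
```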

    17:14

    you can see the four subplots displayed in a single frame in this demo you learned how to create subplots displayed side by side vertically split subplots and four subplots displayed in a single frame using matplotlib

    17:30

    you can create different types of plots using matplotlib histogram scatter plot heat map pie chart error bar histograms histograms are graphical representations of a probability distribution in fact a

    17:47

    histogram is a kind of bar chart using matplotlib and its bar chart function you can create histogram charts a histogram chart has several advantages some of them are as follows it displays the number of values within

    18:03

    a specified interval it's suitable for large data sets as they can be grouped within the intervals scatter plots a scatter plot is used to graphically display the relationship between variables a basic plot can be created using the

    18:19

    plot method however if you need more control of a plot it's recommended that you use the scatter method provided by matplotlib it has several advantages it shows the correlation between variables it's suitable for large data sets

    18:35

    it's easy to find clusters it's possible to represent each piece of data as a point on the plot in this demo you'll learn how to generate a histogram and scatter plot using matplotlib let's import a data set called boston

    18:51

    dataset which we will use to create the histogram and scatter plot from the scikit-learn library let's import matplotlib pyplot

    19:15

    type %matplotlib inline to view the plot in jupyter notebook let's use the data in boston real estate data set to create the histogram and scatter plot

    19:31

    load this data you can view this data by using the print command

    19:51

    now define the x-axis for the data which is boston real estate data likewise define the y-axis for the data which is boston real estate data with the target extension

    20:12

    specify the plot style figure style number of bins and labels of the x-axis and y-axis use the show method to display the histogram created by you
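
A histogram along these lines could be sketched as follows. The Boston loader (`load_boston`) was removed from scikit-learn in version 1.2, so this sketch substitutes randomly generated numbers for one feature column; the data, bin count, figure size, and labels are all assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical stand-in for one column of the housing data
# (the demo itself loads the Boston data set from scikit-learn).
rng = np.random.default_rng(42)
x_axis = rng.normal(loc=6.3, scale=0.7, size=500)

style.use("ggplot")
fig, ax = plt.subplots(figsize=(7, 7))

# hist groups the values into `bins` intervals and counts each one.
counts, bin_edges, _ = ax.hist(x_axis, bins=50)
ax.set_xlabel("feature value")
ax.set_ylabel("number of observations")
# plt.show() in a script; %matplotlib inline in a notebook.
```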

    21:01

    specify the style size data sets and labels of the scatter plot that you want to create use the show method to display the scatter plot created by you
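
The scatter plot might look like this sketch. As with the histogram, the feature and target values are randomly generated stand-ins for the Boston data, and the labels and figure size are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

# Hypothetical stand-ins for a feature (x) and the target (y).
rng = np.random.default_rng(7)
x_axis = rng.normal(loc=6.3, scale=0.7, size=200)
y_axis = 9.0 * x_axis - 34.0 + rng.normal(loc=0.0, scale=3.0, size=200)

style.use("ggplot")
fig, ax = plt.subplots(figsize=(7, 7))

# scatter draws one marker per (x, y) pair, which makes
# correlation and clusters visible.
points = ax.scatter(x_axis, y_axis)
ax.set_xlabel("feature value")
ax.set_ylabel("target value")
```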

    21:48

    heat maps a heat map is a better way to visualize two-dimensional data using heat maps you can gain deeper and quicker insight into data than those afforded by other types of plots it has several advantages

    22:03

    it draws attention to the risk-prone area it uses the entire data set to draw bigger and more meaningful insights it's used for cluster analysis and can deal with large data sets

    22:24

    in this demonstration you'll learn how to generate a heat map for a data set using matplotlib let's import the required libraries matplotlib pyplot and seaborn

    22:46

    type %matplotlib inline to view the plot in jupyter notebook let's load the flights data set from the built-in data sets of the seaborn library

    23:06

    use head to view the top five records of the data set we have to arrange the columns to generate the heat map let's use the pivot method to arrange the columns month year and passengers

    23:32

    let's view the flight data set that's now ready to generate the heat map let's use the heat map method and pass flight data as an argument
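
The pivot and heat map steps could be sketched as below. The demo loads the full data set with `sns.load_dataset("flights")`, which downloads it; to stay self-contained, this sketch types in a small table with the same three columns, and its values should be treated as illustrative only. Recent pandas versions require the `pivot` arguments to be passed by keyword, as done here.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small illustrative table in the same long format as the flights data:
# one row per (year, month) with a passenger count.
months = ["Jan", "Feb", "Mar"]
flight_data = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3 + [1951] * 3,
    "month": pd.Categorical(months * 3, categories=months, ordered=True),
    "passengers": [112, 118, 132, 115, 126, 141, 145, 150, 178],
})
print(flight_data.head())  # top five records

# Rearrange into a grid: one row per month, one column per year.
flight_grid = flight_data.pivot(index="month", columns="year", values="passengers")
print(flight_grid)

# Draw the grid as a heat map; darker or lighter cells show the counts.
fig, ax = plt.subplots()
sns.heatmap(flight_grid, ax=ax)
```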

    23:49

    this will generate the heat map which you can see here in this demo you learned how to create and display a heat map pie charts pie charts are typically used to show percentage or proportional data note that usually the percentage represented

    24:06

    by each category is provided next to the corresponding slice of the pie matplotlib provides the pie method to make pie charts it has several advantages it summarizes a large data set in visual form

    24:21

    it displays the relative proportions of multiple classes of data the size of the circle is made proportional to the total quantity in this demonstration you'll learn how to create a pie chart and display it

    24:40

    first import matplotlib pyplot type %matplotlib inline to view the plot in jupyter notebook type the job data within parentheses

    24:57

    using single quotes separated by commas specify the labels as IT finance marketing admin

    25:13

    hr and operations
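
A pie chart with these six labels might be built as in the sketch below. The job figures are hypothetical (the transcript does not give them, and the demo types them as quoted strings while plain numbers are used here). The sketch also pulls the IT slice out with `explode`, which the demo goes on to do; the offset of 0.05 is an assumption.

```python
import matplotlib.pyplot as plt

# Hypothetical share of jobs per department, with IT the largest.
job_data = (40, 20, 17, 8, 5, 10)
labels = ("IT", "Finance", "Marketing", "Admin", "HR", "Operations")

# One offset per slice; only the first slice (IT) is pulled out.
explode = (0.05, 0, 0, 0, 0, 0)

fig, ax = plt.subplots()
# autopct prints each slice's percentage next to it.
ax.pie(job_data, labels=labels, explode=explode, autopct="%1.1f%%")
ax.set_title("jobs by department")
# plt.show() in a script; %matplotlib inline in a notebook.
```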

    25:28

    specify the slice IT to explode use the show method to display the pie

    25:44

    chart you can see the pie chart with the slices labels and IT the largest slice error bars an error bar is used to show the graphical representation of the

    25:59

    variability of data it's used mainly to point out errors it builds confidence about the data analysis by unleashing the statistical differences between the two groups of data it has several advantages it shows the variability in data and

    26:16

    indicates the errors it depicts the precision in the data analysis it demonstrates how well a function and model are used in the data analysis it defines the underlying data seaborn is a python visualization

    26:31

    library based on matplotlib it provides a high level interface for drawing attractive statistical graphics it was originally developed at stanford university and is widely used for plotting and visualizing data there are several advantages

    26:47

    it possesses built-in themes for better visualizations it has built-in statistical functions which reveal hidden patterns in the data set it has functions to visualize matrices of data which become very important when visualizing large data sets

    27:08

    hey want to become an expert in big data then subscribe to the simplilearn channel and click here to watch more such videos to nerd up and get certified in big data click here