Lets try out to plot a Pareto Chart.in Google Sheets.
Why? Well see the below text for Pareto Chart from wikipedia:
A Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.
The left vertical axis is the frequency of occurrence, but it can alternatively represent cost or another important unit of measure. The right vertical axis is the cumulative percentage of the total number of occurrences, total cost, or total of the particular unit of measure. Because the reasons are in decreasing order, the cumulative function is a concave function. To take the example below, in order to lower the amount of late arrivals by 78%, it is sufficient to solve the first three issues.
The purpose of the Pareto chart is to highlight the most important among a (typically large) set of factors. In quality control, it often represents the most common sources of defects, the highest occurring type of defect, or the most frequent reasons for customer complaints, and so on. Wilkinson (2006) devised an algorithm for producing statistically based acceptance limits (similar to confidence intervals) for each bar in the Pareto chart.
These charts can be generated by simple spreadsheet programs, such as Apache OpenOffice/LibreOffice Calc [1] and Microsoft Excel,[2] visualization tools such as Tableau Software,[3] specialized statistical software tools, and online quality charts generators.
The Pareto chart is one of the seven basic tools of quality control.[4]
See, no mention of Google Sheets in the above description. The author(s) has mentioned simple spreadsheet programs, and then goes on the specifically mention the spreadsheet programs. Anyways, we can also use many other spreadsheet programs, but why not try with something every Google user has access to.
I will take inspiration from the sample data provided in this chart:
https://en.wikipedia.org/wiki/File:Pareto.PNG
The sample data is a hypothetical data outlining the causes of employees coming late to office.
First, create a new Google Sheet.
Then insert/collect the data:
Cause | No. of times |
Child Care | 25 |
Public Transportation | 21 |
Weather | 12 |
Overslept | 8 |
Traffic | 65 |
Emergency | 4 |
Total | 135 |
Now sort the 'No. of times' column in the descending order:
To do this, I started my selection from the first
no. of times number to the last
cause.
Now, suppose,
cause is column
D and
no. of times is column
E.
Then, after the selection, on the main menu, click
Data -> Sort range by column E, Z -> A.
The result is as follows:
Cause | No. of times |
Traffic | 65 |
Child Care | 25 |
Public Transportation | 21 |
Weather | 12 |
Overslept | 8 |
Emergency | 4 |
Total | 135 |
Now calculate each cause's contribution in
% times.
To do this calculate the percentage using the total no. of times. For example, for the cause
Traffic, the percentage calculation will be: ((65*100)/135) = 48.15.
Cause | No. of times | % times |
Traffic | 65 | 48.15 |
Child Care | 25 | 18.52 |
Public Transportation | 21 | 15.56 |
Weather | 12 | 8.89 |
Overslept | 8 | 5.93 |
Emergency | 4 | 2.96 |
Total | 135 | 100.00 |
Calculate
cumulative % from % times column.
This is easily done. Just add the
% times. The last number in the
Cumulative % column should be
100.
Cause | No. of times | % times | Cumulative % |
Traffic | 65 | 48.15 | 48.15 |
Child Care | 25 | 18.52 | 66.67 |
Public Transportation | 21 | 15.56 | 82.22 |
Weather | 12 | 8.89 | 91.11 |
Overslept | 8 | 5.93 | 97.04 |
Emergency | 4 | 2.96 | 100.00 |
Total | 135 | 100.00 | |
Edit 20 Jan 2020: The values in the Cumulative % column: The first value is the same as the value in the % times column, '48.15'. The second value is '48.15 + 18.52 = 66.67', the third value is '66.67 + 15.56 = 82.22', the fourth value is '82.22 + 8.89 = 91.11', the fifth value is '91.11 + 5.93 = 97.04', and the sixth value is '97.04 + 2.96 = 100.00'.
Now, we create the chart.
Click in a
blank cell that is in a row and column that have no data. Because, if you select a cell whose row/column has some data, Sheets shows some recommended charts and you are required to then customize/modify it. And, we do not what to do this in this case.
On the main menu, click
Insert -> Chart
The
Chart Editor is shown with the Recommendations tab selected.
In the
Chart Types tab, on the left there are the chart types listed. There are a lot of chart types shown under their headings. The headings are: Line, Area, Column, Bar, Pie, Scatter, Map, and Other.
Now, on the left, there is just one box with a table like icon to its right. The table like icon is called the
Select data range icon. Click this icon.
The
Chart Editor is now hidden and a new dialog box,
What data? is shown.
Move the
What data? dialog box to a place so that you are able to view your data.
Now, select the
Cause column till the last cause (do not select the 'total' text.)
Next, in the
What data? dialog box, click
Add another range (a new input box appears in the dialog box.)
Now, select the
No. of times column till the last number (do not select the total number)
Again, in the
What data? dialog box, click
Add another range and select the
Cumulative % column till '100'.
Now, in the
What data? dialog box, click
OK.
The
Chart Editor is shown.
In the
Chart Editor, under
Line select the
Combo chart (a preview of the chart is shown to the right), and click
Insert.
The chart is inserted. Click and drag the chart to the viewable area.
Now, click in the chart, on the top right a
down arrow icon is shown. Click this
down arrow icon to reveal a menu. In the menu, click
Advanced edit. The
Chart Editor is shown with the
Customization tab enabled.
In this
Customization tab, on the right is the preview of the chart and on the left is the various options.
We have to work in the left side.
First, under
Chart, enter the
Title for the chart. I entered
Pareto Chart of Late Arrivals by Reported Cause (the same as the image I had used to source the data.)
Scroll down, you will first see the main heading named
Axis, leave this for now, we will come to this heading later. Scroll more to reveal the
Series heading.
To the right of
Series is a drop down, which is named
All Data series. Click this drop down list and select
Cumulative %.
Scroll down and select the
Square as the
Point shape.
Right axis for the
Axis.
Now, scroll up to the
Axis heading. By default, the
Horizontal axis is selected. Change its title to
Reported Causes.
Now select the
Left Vertical axis. Change its title to
No. of Times.
Now select the
Right Vertical axis. Change its title to
Cumulative %. Below this title are two input boxes for
Min and
Max. Enter
0 in
Min and
100 in
Max.
In the
Chart Editor click
Update. The chart is now shown with the changes made.
You can now keep this cart as it is or move the chart to its separate sheet. I would prefer to move the chart to its separate sheet. To do this, click in the chart, then click the
down arrow -> Move to own sheet. The chart is now moved to its separate sheet.
If you wish to show the data for the
No. of Times bars: Click in a bar, a context menu is shown. In the context menu, click
Data label -> Value. Now the numbers are shown on the bars.
The Pareto chart is kind of complete. But, if you want to present this you will require a little bit more information on the chart.
To do this, you can either save the chart as an image and do the steps in a different program or you can copy the chart, and paste it into the new/existing drawing/presentation in Google Drive.
Lets copy the chart into a presentation. Just below the main menu is the
Copy chart button, click it.
Now, open the Google Drive presentation file (or create a new presentation.)
Make the
Layout for the presentation page as
Blank. Now, paste the chart. You will be shown a dialog box that will ask you to either link the chart or to not link the chart. If you link the chart, then whenever you change the chart, it will be reflected here. Its you choice.
Resize the chart to the required size.
Now insert a square from the
Cumulative %'s 80 % to the point where the square would touch the line graph. Then enter the two texts as shown in the figure below:
So, we get the Pareto chart using Google Sheets and Google Slides/Google Drawings.