Friday, July 22, 2016

How to Create a Pareto Chart in Google Sheets

Lets try out to plot a Pareto Chart.in Google Sheets.

Why? Well see the below text for Pareto Chart from wikipedia:
A Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.
The left vertical axis is the frequency of occurrence, but it can alternatively represent cost or another important unit of measure. The right vertical axis is the cumulative percentage of the total number of occurrences, total cost, or total of the particular unit of measure. Because the reasons are in decreasing order, the cumulative function is a concave function. To take the example below, in order to lower the amount of late arrivals by 78%, it is sufficient to solve the first three issues.
The purpose of the Pareto chart is to highlight the most important among a (typically large) set of factors. In quality control, it often represents the most common sources of defects, the highest occurring type of defect, or the most frequent reasons for customer complaints, and so on. Wilkinson (2006) devised an algorithm for producing statistically based acceptance limits (similar to confidence intervals) for each bar in the Pareto chart.
These charts can be generated by simple spreadsheet programs, such as Apache OpenOffice/LibreOffice Calc [1] and Microsoft Excel,[2] visualization tools such as Tableau Software,[3] specialized statistical software tools, and online quality charts generators.
The Pareto chart is one of the seven basic tools of quality control.[4]
See, no mention of Google Sheets in the above description. The author(s) has mentioned simple spreadsheet programs, and then goes on the specifically mention the spreadsheet programs. Anyways, we can also use many other spreadsheet programs, but why not try with something every Google user has access to.

I will take inspiration from the sample data provided in this chart:
https://en.wikipedia.org/wiki/File:Pareto.PNG
The sample data is a hypothetical data outlining the causes of employees coming late to office.

First, create a new Google Sheet.

Then insert/collect the data:

CauseNo. of times
Child Care25
Public Transportation21
Weather12
Overslept8
Traffic65
Emergency4
Total135

Now sort the 'No. of times' column in the descending order:
To do this, I started my selection from the first no. of times number to the last cause.
Now, suppose, cause is column D and no. of times is column E.
Then, after the selection, on the main menu, click Data -> Sort range by column E, Z -> A.
The result is as follows:

CauseNo. of times
Traffic65
Child Care25
Public Transportation21
Weather12
Overslept8
Emergency4
Total135

Now calculate each cause's contribution in % times.
To do this calculate the percentage using the total no. of times. For example, for the cause Traffic, the percentage calculation will be: ((65*100)/135) = 48.15.

CauseNo. of times% times
Traffic6548.15
Child Care2518.52
Public Transportation2115.56
Weather128.89
Overslept85.93
Emergency42.96
Total135100.00

Calculate cumulative % from % times column.
This is easily done. Just add the % times. The last number in the Cumulative % column should be 100.

CauseNo. of times% timesCumulative %
Traffic6548.1548.15
Child Care2518.5266.67
Public Transportation2115.5682.22
Weather128.8991.11
Overslept85.9397.04
Emergency42.96100.00
Total135100.00

Edit 20 Jan 2020: The values in the Cumulative % column: The first value is the same as the value in the % times column, '48.15'. The second value is '48.15 + 18.52 = 66.67', the third value is '66.67 + 15.56 = 82.22', the fourth value is '82.22 + 8.89 = 91.11', the fifth value is '91.11 + 5.93 = 97.04', and the sixth value is '97.04 + 2.96 = 100.00'.

Now, we create the chart.

Click in a blank cell that is in a row and column that have no data. Because, if you select a cell whose row/column has some data, Sheets shows some recommended charts and you are required to then customize/modify it. And, we do not what to do this in this case.
On the main menu, click Insert -> Chart
The Chart Editor is shown with the Recommendations tab selected.

In the Chart Types tab, on the left there are the chart types listed. There are a lot of chart types shown under their headings. The headings are: Line, Area, Column, Bar, Pie, Scatter, Map, and Other.

Now, on the left, there is just one box with a table like icon to its right. The table like icon is called the Select data range icon. Click this icon.
The Chart Editor is now hidden and a new dialog box, What data? is shown.
Move the What data? dialog box to a place so that you are able to view your data.
Now, select the Cause column till the last cause (do not select the 'total' text.)
Next, in the What data? dialog box, click Add another range (a new input box appears in the dialog box.)
Now, select the No. of times column till the last number (do not select the total number)
Again, in the What data? dialog box, click Add another range and select the Cumulative % column till '100'.
Now, in the What data? dialog box, click OK.
The Chart Editor is shown.

In the Chart Editor, under Line select the Combo chart (a preview of the chart is shown to the right), and click Insert.
The chart is inserted. Click and drag the chart to the viewable area.

Now, click in the chart, on the top right a down arrow icon is shown. Click this down arrow icon to reveal a menu. In the menu, click Advanced edit. The Chart Editor is shown with the Customization tab enabled.
In this Customization tab, on the right is the preview of the chart and on the left is the various options.
We have to work in the left side.
First, under Chart, enter the Title for the chart. I entered Pareto Chart of Late Arrivals by Reported Cause (the same as the image I had used to source the data.)
Scroll down, you will first see the main heading named Axis, leave this for now, we will come to this heading later. Scroll more to reveal the Series heading.

To the right of Series is a drop down, which is named All Data series. Click this drop down list and select Cumulative %.
Scroll down and select the Square as the Point shape.
Right axis for the Axis.

Now, scroll up to the Axis heading. By default, the Horizontal axis is selected. Change its title to Reported Causes.
Now select the Left Vertical axis. Change its title to No. of Times.
Now select the Right Vertical axis. Change its title to Cumulative %. Below this title are two input boxes for Min and Max. Enter 0 in Min and 100 in Max.

In the Chart Editor click Update. The chart is now shown with the changes made.
You can now keep this cart as it is or move the chart to its separate sheet. I would prefer to move the chart to its separate sheet. To do this, click in the chart, then click the down arrow -> Move to own sheet. The chart is now moved to its separate sheet.

If you wish to show the data for the No. of Times bars: Click in a bar, a context menu is shown. In the context menu, click Data label -> Value. Now the numbers are shown on the bars.

The Pareto chart is kind of complete. But, if you want to present this you will require a little bit more information on the chart.
To do this, you can either save the chart as an image and do the steps in a different program or you can copy the chart, and paste it into the new/existing drawing/presentation in Google Drive.
Lets copy the chart into a presentation. Just below the main menu is the Copy chart button, click it.
Now, open the Google Drive presentation file (or create a new presentation.)
Make the Layout for the presentation page as Blank. Now, paste the chart. You will be shown a dialog box that will ask you to either link the chart or to not link the chart. If you link the chart, then whenever you change the chart, it will be reflected here. Its you choice.
Resize the chart to the required size.
Now insert a square from the Cumulative %'s 80 % to the point where the square would touch the line graph. Then enter the two texts as shown in the figure below:
So, we get the Pareto chart using Google Sheets and Google Slides/Google Drawings.

3 comments:

  1. thank you!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <3333333333 you saved my behind

    ReplyDelete
  2. Replies
    1. Added the text on the values in the 'Cumulative %' column. I hope that this will make the calculation clearer.

      Delete