**Question 1:**

I was wondering if the total defective M&M's were included in the total
column or if I need to add the two columns together to get the total number in the
package.

**Answer 1**:

You do not need to add the two columns together. The "total" columns
collectively account for the entire bag. The counts in the defective columns represent
subsets of the counts in the totals columns. It is good that you asked the question.
In general, it is always important to clarify the meaning of your data when performing
data analysis.

**Question 2**

When you compute the sample mean for the number of candies in the package, do
you include the # of defectives in the total? Thanks.

**Answer 2:**

- The answer to your question is "it depends."

- Specifically, it depends on the question that you are trying to answer through the
analysis. Do you want to know the total number of candies in the package (the numbers
shown in the first 6 data columns)? Or are you interested only in the number of non-defective
candies (the numbers in the first 6 columns minus the numbers of defective candies in the
remaining columns)? The defective candies are still edible, so as a consumer it might not
matter. If you were a quality control person, however, you might want to focus on the
number of defective candies.

- This is for you to decide (and explain in your report). In this case, there is no single
right answer, only justified and explained ones.
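
To make the two choices concrete, here is a minimal Python sketch. The package counts are made up for illustration (they are not the class data), and the layout of one total and one defective count per package is an assumption.

```python
from statistics import mean

# Hypothetical rows for three packages: (total candies, defective candies).
# These numbers are invented for illustration only.
packages = [(56, 3), (58, 1), (54, 2)]

totals = [total for total, defective in packages]
non_defective = [total - defective for total, defective in packages]

# Answers "how many candies are in a package, on average?"
mean_total = mean(totals)
# Answers "how many non-defective candies are in a package, on average?"
mean_non_defective = mean(non_defective)

print(mean_total, mean_non_defective)
```

Either number can be defended; the report should say which one was used and why.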

**Question 3:**

I got the data into excel following the directions on the web, but I don't know
how to make it perform mean calculations and standard deviations. Could you outline Excel
usage on the web page please.

**Answer 3**

Here are some tips:

- Excel can help you with the steps:
  1. Put your cursor in a particular cell.
  2. Select "insert function" from the tools menu.
  3. Choose the function you want (most of the ones we use are under the category
     "statistical"; however, you might want to scroll through the list and see what is there).
  4. Follow the directions Excel provides for specifying the parameters of the function.
  5. Press return. You should see the results of your function call.
- Once you have completed this, you might want to put the cursor back in the cell and
look at the contents of the cell as listed at the top of the Excel window. The cell itself
shows the result of the function call, while the top of the window shows the function call.
Once you get the hang of it, you can simply type the function call directly rather than
going through the "Insert function" process.
- For example, assume that you pasted the function "median" into cell a11, with
the median referring to the data in cell range a1..a10. In cell a11, you will see the
result of the median function: the median of your ten data points. If you put your cursor
in cell a11 and look to the top of the Excel screen, you will see how the function call
is formatted. Specifically, it will look like "=median(a1:a10)".
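
If you want to cross-check the spreadsheet results outside Excel, the same quantities can be sketched in Python. The ten data points below are hypothetical stand-ins for cells a1 through a10, and the comments naming Excel functions are only rough correspondences.

```python
from statistics import mean, median, stdev

# Hypothetical ten data points, standing in for cells a1 through a10.
data = [52, 55, 56, 56, 57, 57, 58, 58, 59, 62]

data_mean = mean(data)      # comparable to Excel's AVERAGE
data_median = median(data)  # comparable to Excel's MEDIAN
data_stdev = stdev(data)    # sample standard deviation (n - 1 denominator), like Excel's STDEV

print("mean:", data_mean)
print("median:", data_median)
print("sample std dev:", round(data_stdev, 3))
```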

**Question 4**

There's an obvious inconsistency in the M&M data for the total number of candies in
the packages, line 70, so do we continue to use this value when graphing the histogram?
(I don't know what values to use for the high and low to look for the number of bins.) The
141 (the inconsistency) kind of influences the bin number drastically. Could you let me
know what I should be doing? Thanks.

**Answer 4**

What do you think you should do? Think back to the discussion of the heights -- what to
do with the 80", 90" and 999". One is clearly not possible and probably
should be thrown out. The other two are possible -- although not probable. There are
measures that you can use that are not sensitive to the extreme values.

As for the histogram specifically, you can determine the number of bins without the
large value and then plot the large value by hand.

In general, this is the type of judgment call that you will need to make as a
practitioner doing statistics with a dataset. Use your best judgment.
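
As a rough illustration of how an extreme value affects different measures, the Python sketch below compares the mean and the median with and without a value of 141. The median is one measure that is not sensitive to extreme values. All of the other numbers are hypothetical, not the class data.

```python
from statistics import mean, median

# Hypothetical package totals; 141 stands in for the suspicious value.
totals = [54, 55, 56, 56, 57, 58, 58, 59, 141]
typical = [t for t in totals if t != 141]

# The mean shifts noticeably when the extreme value is included...
mean_with, mean_without = mean(totals), mean(typical)
# ...while the median barely moves.
median_with, median_without = median(totals), median(typical)

# High and low values for choosing bins, taken from the typical values only.
low, high = min(typical), max(typical)

print("mean:", mean_with, "vs", mean_without)
print("median:", median_with, "vs", median_without)
print("bin range:", low, "to", high)
```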

**Question 5**

Where can I find out how to graph the histogram in Excel? Thanks.

**Answer 5**

There are two ways to create the histogram chart:

1. Starting from the data.

Excel has a histogram function within its "data analysis" functionality.
"Data analysis" should appear under the tools menu; if not, you will need to
get it installed. Either read the help (installation typically does not require disks)
or ask a technical support person.

Go to data analysis, then select histogram. You will be prompted for the range of input
data and the bin range. You create the bin range by creating a column of numbers
representing the endpoints of the bins. For example, if you wanted to have three bins
(0 to 1, 1 to 2, and 2 to 3), you would create a column of numbers with 0, 1, 2, and
3. You would then provide the locations of these numbers as the bin range (e.g., b1..b4).

Once you have provided the histogram function with the data range and the bin range and
pressed enter, you will probably get only a frequency distribution. To get a histogram,
you need to check the box associated with "chart output" on the histogram
screen. If the chart output option is selected, you should get both a frequency
distribution and a histogram.
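
To see what a frequency distribution built from bin endpoints looks like, here is a minimal Python sketch. The function name `frequency_distribution`, the sample data, and the rule for values that land exactly on an endpoint (counted in the lower bin) are assumptions made for this illustration. Check Excel's own output to see how it treats boundary values.

```python
def frequency_distribution(data, endpoints):
    """Count how many values fall in each bin defined by consecutive endpoints.

    A value x is counted in a bin when lower < x <= upper.
    Values outside the overall range are ignored in this sketch.
    """
    counts = [0] * (len(endpoints) - 1)
    for x in data:
        for i in range(len(endpoints) - 1):
            if endpoints[i] < x <= endpoints[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical data, with the bin endpoints 0, 1, 2, 3 from the example above.
data = [0.2, 0.5, 1.0, 1.4, 1.9, 2.5, 3.0]
endpoints = [0, 1, 2, 3]
counts = frequency_distribution(data, endpoints)

for lower, upper, n in zip(endpoints, endpoints[1:], counts):
    print(f"{lower} to {upper}: {n}")
```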

2. Starting from frequency distribution (table of ranges and number of data points in
range):

If you already have the frequency distribution and need to create a histogram, you can
use the graphing features of Excel. Under the "insert" menu is the option
"chart". You want to insert a column chart. Excel should walk you through the
process of setting up the chart.
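
For a quick check of the shape before building the chart, a frequency distribution can also be drawn as text bars. The Python sketch below uses a hypothetical frequency table and a hypothetical helper named `text_bars`. It is not a substitute for the Excel chart.

```python
# Hypothetical frequency distribution: (bin label, count) pairs.
frequency_table = [("0 to 1", 3), ("1 to 2", 2), ("2 to 3", 2)]

def text_bars(table):
    """Return one line per bin, with a bar of '#' characters as long as the count."""
    width = max(len(label) for label, _ in table)
    return [f"{label:<{width}} | {'#' * count} ({count})" for label, count in table]

lines = text_bars(frequency_table)
print("\n".join(lines))
```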

**Question 6:**

How should I work with the excel graphs and figures when creating the
final report? Do I need to integrate them into the text? Can I include them in
an appendix?

**Answer 6**

It would be terrific to get reports that integrate the text and the
figures (histograms, stem and leaf diagrams, box plots, etc.). However,
incorporating the figures and tables into the text can be time consuming.

An alternative, easier, and acceptable solution is to include all the
figures and tables in an appendix and reference them from the text. If this approach
is taken, then the figures and tables should be clearly marked with "Figure
x...." or "Table y. ....". Thus, the figures and tables can be
(and should be) referenced in the report text.

**Question 7**

I have a question regarding calculation of whisker values for box plots. The book
indicates the calculation as follows:

lower whisker = q1-1.5IQR with the whisker extending to the smallest data point within
that range.

upper whisker = q3+1.5IQR with the whisker extending to the largest data point within that
range.

From class I understood that

lower whisker = median-1.5IQR with the whisker extending to the smallest data point
within that range.

upper whisker = median+1.5IQR with the whisker extending to the largest data point within
that range.

Which is correct??

**Answer 7**

Good question! The answer is that the book is right (and I was mistaken). The
equations for the lower and upper whisker boundaries that use q1 and q3
respectively, rather than the median (q2), are correct. This was a mistake in the lecture.

If you have already created box plots using the median-based equations from the lecture,
please do not feel a need to change them. They will also be considered correct.
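
The q1/q3 whisker calculation can be sketched in Python as follows. The function name `whiskers` and the data are hypothetical. Quartile conventions also differ between textbooks and software, so the "inclusive" method used here is an assumption and may not match the book's quartiles exactly.

```python
from statistics import quantiles

def whiskers(data):
    """Return (lower_whisker, upper_whisker) based on the q1 and q3 boundaries."""
    # Quartile method is an assumption; other conventions give slightly different q1 and q3.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Each whisker extends to the most extreme data point still inside the bounds.
    inside = [x for x in data if lower_bound <= x <= upper_bound]
    return min(inside), max(inside)

# Hypothetical package totals, including one extreme value.
totals = [54, 55, 56, 56, 57, 58, 58, 59, 141]
lower, upper = whiskers(totals)
print(lower, upper)
```

With this data the extreme value falls outside the upper boundary, so the upper whisker stops at the largest remaining point and the extreme value would be plotted separately.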

**Question 8**

I was wondering what is meant by "one color information" in "graphical
description of total numbers and one color information". Thanks.

**Answer 8**

It means that there should be a graphical description of the total number of candies
and also a graphical description of one color: blue, red, yellow, green, or brown.
Logically, the best choice would be either blue (since you were required to analyze it)
or the other color you analyzed.