Excel Formula Help


Dravin
 Share

Recommended Posts

Okay, I've got a column with a bunch of values in it and I'm trying to figure out how what percentage of them are within one standard deviation of the mean.

Now all of my values are in column A and the only non-text cells contain relevant data.

I am trying to do this

=countif(A:A, ABS(A:A-AVERAGE(A:A))<STDEV.P(A:A))

Now that should count all cells in the range which meet the criteria of the absolute value of it's value minus the mean is less than the standard deviation. Evaluating the formula has it working as expected and the criteria is calculating correctly, but for some reason it's returning a value of zero. Am I simply expecting to much out of poor countif's criteria function?

If I can get that to work then calculating the percentage is a simple /count(A:A) and formatting the cell to display as a percentage. Anyone know why my formula is broken? And what I can do to fix it or otherwise accomplish my goal?

Edited by Dravin
Link to comment
Share on other sites

Hidden

Now, if I'm understanding this correctly, this is what you are trying to do:

You need one cell to do this: =Avg(A:A) then another cell do this: =STDEVP(A:A)

In the B column, put this: =and(abs(x-y)<=1) on each cell opposite of the values. This logic function this will give you a TRUE or FALSE. Now you can use =COUNTIF(B:B,"=TRUE) in another cell.

I figure it would look like this:

Posted Image

I'm probably way off because I have yet to take a statistics class but I hope it gives you the idea. With Excel, you need to compartmentalize then run your comparisons.

Link to comment

I was afraid I'd have to work it out, what good are formulas if you have to stop and take breaks? *sigh* I ended up just running a column near the original data that calculates the absolute z-score (=ABS(cell-AVERAGE(A:A)/STDEV.P(A:A))) and then did a countif on that. Then to calculate the percentage it was just a simple =countif(C:C,"<1")/count(A:A) to get my percentage (and then repeated it but with <2 and <3 to answer the other two questions concerning the percentage of values within 2 standard deviations and 3 standard deviations).

I wish I could have done it without the intermediary cells though, feels messy for some reason. I feel the same way a couple graphs I have to do, I just wish I could work it from the raw column without intermediaries.

Edited by Dravin
Link to comment
Share on other sites

I think it might be. I only need one cell per value (so a column) to give me the value's absolute z-score using the formula in a prior post. Using that score with countif/count gives me the percentages I need for what lays inside the stated standard deviation ranges. I suppose my other option would have been to use FREQUENCY and Bins situated at intervals of the standard deviation (and then just add the appropriate cells/count). I tried that but I'm still shaky on bins and don't trust myself to create ones of the appropriate interval.

Edited by Dravin
Link to comment
Share on other sites

Guest mormonmusic

I didn't read this in detail, but it sounds like you're trying to create a simple relative frequency table, where you assign values to intervals and calculate the percent falling into each interval.

I would look up the HISTOGRAM function in Excel's Analysis ToolPak. If you don't know how to get the Analysis TookPak functioning in Excel, let me know.

After loading the Analysis TookPak into Excel, you can go to Data -- Data Analysis, and select the Histogram option.

You highlight a given set of ranges you want the data to count into (those are the bins, or the BIN range) and then also highlight the column of data you want counted into those bins. You also specify an OUTPUT range which is simply where you want your table to appear after you execute the command.

Execute the function and bingo, you have the frequencies of data that fall into each "bin". Then you create another column where you convert those frequencies into percentages of the total number of frequencies found in each bin using a manual formula.

I think the bins could be autocalculated in the cells using the standard deviation and mean formulas. Hopefully that's enough for you to visualize and that I'm answering your question.

If this is something you have to do repetitvely, then you could make the whole thing work by using a Macro that is attached to a button on your Excel spreadsheet. I actually created a spreadsheet to post here, but I couldn't see an easy way of autoloading the screen shot of Excel spreadsheet into this textbox, like you did above. Maybe you could tell me how to do that and then I could post my solution for you....

Edited by mormonmusic
Link to comment
Share on other sites

The R Project for Statistical Computing

Do statistics like a real man.

set.seed(1)

sampleData <- rnorm(50, 10, 3) # 50 values from a normal distribution with mean=10, sd=3

mu <- mean(sampleData)

sigma <- sd(sampleData)

sum(abs(sampleData - mu)/sigma <= 1)

38 of the values in sampleData are within 1 standard deviation of the mean.

Link to comment
Share on other sites

The R Project for Statistical Computing

Do statistics like a real man.

Except in the world of statistics I am just a lad :(. One thing is this is for a Data Analysis using Excel class. So I'm limited to the abilities of Excel. I note that while I couldn't do it as one step I essentially did your bottom formula in two steps:

=ABS(cell-AVERAGE(A:A)/STDEV.P(A:A)) is equivalent to abs(sampleData - mu)/sigma

and the second step is a little different though:

=countif(C:C,"<1") is similar to the rest of your formula except it's counting not summing. Does the program you linked to use sum as a counting function?

Interestingly looking over my first formula it is equivalent to your bottom formula minus the counting versus summing (and I used an open interval), you just divide by sigma when I didn't but ABS(value-Average(values)<STDEV.P(values)) = ABS(value-AVERAGE(values))/STDEV.P(values)<1 = ABS(sampleDATA - mu)/sigma <1 Nice to know my brain was in the right place. To bad Excel is a wuss about criteria.

Oooh, a question though. If you were asked about within one standard deviation would you do a closed or open interval? The way I think says open, but I note your formula uses a closed interval and so I'm unsure if my thinking just isn't statistical enough and thus flawed?

I would look up the HISTOGRAM function in Excel's Analysis ToolPak.

I took a quick peak at it but I didn't see how I could get it to immediately output as counts of values within standard deviation intervals from the mean. It was late so I didn't fiddle long. I'm retentive enough that I am going to go back in and fiddle with it to see if there is a more direct way than I did it, I'll take another look at the histogram. The extra column of data I had to create just feels messy.

Link to comment
Share on other sites

Guest mormonmusic

I took a quick peak at it but I didn't see how I could get it to immediately output as counts of values within standard deviation intervals from the mean. It was late so I didn't fiddle long. I'm retentive enough that I am going to go back in and fiddle with it to see if there is a more direct way than I did it, I'll take another look at the histogram. The extra column of data I had to create just feels messy.

YOu have to give the intevals you want.

So, if the mean is 100 and the S.D. is 15,you would put in a column something like this:

45

60

85

100

115

130

145

Then, go to the Histogram function in the Analysis Tookpak. Highlight the 45,60,85,100,115,130 and 145 as the bin range, all your data as the input range, and pick a cell in your spreadsheet as the output range. Hit the OK button and your count is done.

You will get counts in the other bin ranges; and I know you want only 1 standard deviation. I think if you remove the 45, 60, 13, and 145 from the bin range, you get another category for a count of all the values that didn't fit into the 1 standard deviation range..

It should be pretty easy; done it a million times.

how did you get the screen shot of excel into this text box in your earlier post. I would like to post my solution.

Link to comment
Share on other sites

I managed to use the histogram function to give me a frequency count (though couldn't I have just used =FREQUENCY()?). I created the bins by means of AVERAGE-X*STDEV.P and AVERAGE+X*STDEV.P where X was how many steps from the mean. Meant that for 1 standard deviation they were all in one bin, for more steps than that I ended up having to add high and low, but no worries.

Edit: Yay for not having superfluous columns the same size as your data cluttering up the worksheet! I can post a screenshot if anyone is curious, though I doubt anyone is.

Edited by Dravin
Link to comment
Share on other sites

Except in the world of statistics I am just a lad :(. One thing is this is for a Data Analysis using Excel class. So I'm limited to the abilities of Excel.

I know you can't control this, but limiting one's data analysis to the abilities of Excel is like limiting space flight to the abilities of a tape worm.

I note that while I couldn't do it as one step I essentially did your bottom formula in two steps:

=ABS(cell-AVERAGE(A:A)/STDEV.P(A:A)) is equivalent to abs(sampleData - mu)/sigma

and the second step is a little different though:

=countif(C:C,"<1") is similar to the rest of your formula except it's counting not summing. Does the program you linked to use sum as a counting function?

Actually, the two programs are doing it in a very similar manner. COUNTIF will return a vector of TRUE or FALSE values and only count the ones that return TRUE.

In R, the sum() function is being applied to a vector of logicals, which have internal values of 0=FALSE, and 1=TRUE.

Interestingly looking over my first formula it is equivalent to your bottom formula minus the counting versus summing (and I used an open interval), you just divide by sigma when I didn't but ABS(value-Average(values)<STDEV.P(values)) = ABS(value-AVERAGE(values))/STDEV.P(values)<1 = ABS(sampleDATA - mu)/sigma <1 Nice to know my brain was in the right place. To bad Excel is a wuss about criteria.

About the only thing I ever use Excel for these days is to view CSV's. I rarely use actual xls files.

Oooh, a question though. If you were asked about within one standard deviation would you do a closed or open interval? The way I think says open, but I note your formula uses a closed interval and so I'm unsure if my thinking just isn't statistical enough and thus flawed?

This question is simultaneously profound and silly. Statisticians understand that the probability of a random variable being exactly equal to 1 standard deviation above the mean is 0. Specifically, the integral of a function from a to a is a line, which has no area. So technically speaking, it doesn't matter because if you add the closed interval, you add 0; if you remove the closed interval, you subtract 0.

Practically speaking, this doesn't work out so nicely (we humans like to round things), and so as a general rule, when I hear the word "within" I apply a closed interval. I do this to be consistent with the decision rule in hypothesis testing. The critical value is the value at which the area beyond the critical value is 0.05 (or some such number). So we reject the null hypothesis when the test statistic is greater than or equal to the critical value. I think most of my colleagues employ a similar logic.

But to be safe, I'd ask your instructor, since he or she is grading you. But you can rest assured, there is no theoretical difference and it only becomes an issue if you're using crap data.

Link to comment
Share on other sites

Guest mormonmusic

I took a quick peak at it but I didn't see how I could get it to immediately output as counts of values within standard deviation intervals from the mean. It was late so I didn't fiddle long. I'm retentive enough that I am going to go back in and fiddle with it to see if there is a more direct way than I did it, I'll take another look at the histogram. The extra column of data I had to create just feels messy.

YOu have to give the intevals you want.

So, if the mean is 100 and the S.D. is 15,you would put in a column something like this:

45

60

85

100

115

130

145

Then, go to the Histogram function in the Analysis Tookpak. Highlight the 45,60,85,100,115,130 and 145 as the bin range, all your data as the input range, and pick a cell in your spreadsheet as the output range. Hit the OK button and your count is done.

You will get counts in the other bin ranges; and I know you want only 1 standard deviation. I think if you remove the 45, 60, 13, and 145 from the bin range, you get another category for a count of all the values that didn't fit into the 1 standard deviation range..

It should be pretty easy; done it a million times.

how did you get the screen shot of excel into this text box in your earlier post? I would like to post my solution. I tried cutting and pasting, and also importing it use the picture button on the formatting toolbar -- the formatting button was looking for a URL to a website...My graphic is an icon on my computer.

Link to comment
Share on other sites

I know you can't control this, but limiting one's data analysis to the abilities of Excel is like limiting space flight to the abilities of a tape worm.

If it makes you feel any better I'll be taking a full blown statistics course next fall. :D

Link to comment
Share on other sites

how did you get the screen shot of excel into this text box in your earlier post? I would like to post my solution. I tried cutting and pasting, and also importing it use the picture button on the formatting toolbar -- the formatting button was looking for a URL to a website...My graphic is an icon on my computer.

Helpful tip in case you don't know it, but Alt + Printscreen gives you a screenshot of the active window, it saves you the step of cutting. As far as posting it to the boards, I loaded it up onto Picasa (I have an album of just such images) and picasa lets you link directly to an image in your album, then I just paste it and wrap it with the image tags. You can use a different image hosting service to the same effect as long as it lets you link directly to an image (instead of only to pages containing the image, though sometimes you can work around that by just copying the image URL).

My solution:

Posted Image

Versus what I was doing:

Posted Image

The percentages are just adding the appropriate cells for the bins and dividing by the total count. It's a lot neater than having a twin column to the original data. Once again I appreciate suggesting using a Histogram to get my source data. I note I was making an error previously with my percentile calculation for question 8, but it's corrected in the newest one (And I just noticed I screwed up question 9 in the new one... fixing).

Edited by Dravin
Link to comment
Share on other sites

Guest mormonmusic

I know how to CTRL-Alt-Ptrscrn -- I do it all the time. I guess I need to simply post the image on the web somewhere, like on a free bloggin service I have, and then point to it -- that's where things got screwed up for me.

Glad you got the histogram thing working...now you know what I do all day for a living that keeps me ducking into LDS.net for a break...

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
 Share