Understanding Negative Binomial Distribution with Excel...

in this post we are going to understand the Negative Binomial Distribution. According to Wikipedia:

"In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number of failures (denoted r) occur. For example, if we define a "1" as failure, and all non-"1"s as successes, and we throw a die repeatedly until the third time “1” appears (r = three failures), then the probability distribution of the number of non-“1”s that had appeared will be negative binomial."

The distribution can be defined by the following equation:

$$*b(x,k,p)=p^k\cdot q^{x-k}\binom{x-1}{k-1}$$ 

Here $p$ is the prbability of sucess, $k$ is the kth trial and $x$ is the number of trails.

The Excel Function that can be used for calculation of Negative Binomial Distribution is NEGBINOMDIST() that has following syntex (from the Built-in help you can find more on it!)

NEGBINOMDIST(number_f,number_s,probability_s)

..where Number_f   is the number of failures, Number_s   is the threshold number of successes, Probability_s   is the probability of a success.

Now we will take an example as-usual and will see how the manual calculation for this distribution is performed and then will move to the use of alternate excel function.

Example:

In an NBA championship series, the teams which wins four games out of seven will be the winner. Suppose that team A has probability of 0.55 of winning over the team B and both team A & B face each other in the Championship games: 

(a) What is the probability that team A will win the series in Six Games?

(b) What is the probability that team A will win the series?

(c) If both teams faces each other in a regional playoff series and the winner is decided by winning three out o f the five games, what is the probability that team A will win a Playoff?

Solution: Using the negative binomial distribution with $x=6$, $k=4$, and $p=0.55$ we get:

For (a) $*b(6,4,0.55)=\binom{5}{3}(0.55)^4(1-0.55)^2 =0.1853$

For (b) *b(4,4,0.55)+*b(5,4,0.55)+*b(6,4,0.55)+*b(7,4,0.55)

Skipping the mathematical details we can conclude it to:

$=0.0915+0.1647+0.1853+0.1668=0.6083$

For (c) *b(3,3,0.55)+*b(4,3,0.55)+*b(5,3,0.55)=0.5931

Skipping the mathematical details we can conclude it to:

$=0.1664+0.2246+0.2021=0.5931$

Alternate Solution:

Here we will see how these solutions can be obtained using NEGBINOMDIST() function:


(a) =Negbinomdist(2,4,0.55)=0.1853

(b) The probability that team A wins the series is the sum of the probabilities that matches it lost will be between zero to three (if it looses fourth match it can't win the series so the formula would be:

=Negbinomdist(3,4,0.55)+Negbinomdist(2,4,0.55)+Negbinomdist(1,4,0.55)
+Negbinomdist(0,4,0.55)

=0.1667+0.1853+0.1647+0.0915=0.6083

(c) Just like (b) the probability of a playoff for Team A is that is loosses match between 0 & 2 so the the formula will be:

=Negbinomdist(2,3,0.55)+Negbinomdist(1,3,0.55)+Negbinomdist(0,3,0.55)

=0.2021+0.2246+0.1664=0.5931 Ans.

So, that was all from me for this post, i hope you have understood the usage of Excel's built in function to facilitate the calculations. Please keep reading and give feedback to improve the block!! Take care!!

Understanding Binomial Distribution with Excel 2007

In this post we will try to understand how the Excel built-in Binomial distribution function can be used for calculating Binomial probabilities. Traditionally a binomial random variable is described by the following formula: 
$$b(x,n,p)=\binom{n}{x}p^x q^{n-x}$$

Where $p$ is the probability of success and $q$ is the probability of failure and can be related by the relation $p+q=1$, $n$ is the number of trails and $x$ is the value at which we want to evaluate the binomial probability .

The Excel 2007 has a built-in function of BINOMDIST() that takes arguments of $x,n,p$ and logical arguments for cumulative or point density estimate and return the value of Binomial Variable.

The Syntex of the function is :BINOMDIST(number_s,trials,probability_s,cum) 

Now we will see how the calculation that are done manually can be skipped if you use BINOMDIST() function, lets consider this example:

Find the probability of obtaining exactly three 2's if an ordinary dice is tossed 5 times.

Solution: The probability of success is this case is $p=\frac{1}{6},n=5,x=3$ gives:
$$b(3,5,\frac{1}{6})=\binom{5}{3}(\frac{1}{6})^{3}(\frac{5}{6})^{2}=\frac{5!}{3!2!}\cdot\frac{5!}{6!} = 0.032$$
Using Excel: The same can be calculated by  the formula: 

=Binomdist(3,5,0.1667,False)=0.032

Example # 02: The probabilities that a patient recovers from a rare blood disease is 0.4. if 15 people are known to have contracted this disease, what is the chance that (a) at lease 10 will survive (b) from 3 to 8 will survive and (c) exactly 5 will survive?

Solution: 

(a) For Probability $that at lease 10 will survive$:

$$P(x\geq 10) = 1-P(x\leq 10) = 1 - \sum_{x=0}^{9}b(x,15,0.4) = 1-0.9662 = 0.0338$$
(b) For Probability $that from 3 to 8 will survive$
$$P(3\leq x\leq8) = \sum_{x=3}^{8}b(x,15,0.4) = \sum_{x=0}^{8}b(x,15,0.4)-\sum_{x=2}^{8}b(x,15,0.4)$$
$$=0.9050-0.0271 = 0.8779$$

(c) For Probability $that exactly 5 will survive$
$$P(x = 5) = b(x,15,0.4) = \sum_{x=0}^{8}b(x,15,0.4)-\sum_{x=0}^{4}b(x,15,0.4)$$
$$=0.4032-0.02173 = 0.1859$$

Using Excel:

(a) =1-Binomdist(9,15,0.4,TRUE) = 0.0338
(b) =Binomdist(8,15,0.4,TRUE)-Binomdist(2,15,0.4,TRUE) = 0.8779
(c) =Binomdist(5,15,0.4,TRUE)-Binomdist(4,15,0.4,TRUE) = 0.1859

hence we can see how easily we can estimate the binomial probabilities using Excel 2007, I hope that you will like this post. Please give your feedback to improve the blog.

Normal Distribution with Excel - Applications & Examples

In my last post I discussed how to use the normal distribution function already added in MS Excel. In this post I will continue with the examples and we will how we use them. In this post i will not discuss the manual solution of the problem but only the Excel Based solution.

The following problems are taken from Introduction to Statistics By Walpole, 13th Edition Chapter 07, Page 197-200.


I wil elaborate how you can use excel formula to calculate the probabilites.

4. A soft-drink machine is regulated so that it discharges an average of 200 milliliters per cup. If the amount of drink is normally distributed with a standard deviation equal to 15 milliliters: 


(a) What fraction of the cups will contain more than 224 milliliters?
(b) What is the probability that a cup contains between 191 and 209 milliliters?
(c) How many cups will likely overflow if 230-mililiters cups are used for the next 1000 drinks.
(d) Below what value do we get the smallest 25% of the drinks?
 


Solution:

Here $\mu$ = 200 mililiters and $\sigma$ = 15 mililiters

(a) =$1-Normdist(224,200,15,1) = 0.0548
(b) =
Normdist(209,200,15,1)-Normdist(191,200,15,1) = 0.451
(c) =(1-
Normdist(230,200,15,1))*1000 = 22.75
(d) =Norminv(0.25,200,15) = 189.89

9. The heights of 1000 students are normally distributed with a mean of 174.5 cm and a std. deviation of 6.9 cm. Assuming that he heights recorded are recorded to the nearest half of a centimeter, how many of these students would you expect to have heights,

(a) Less than 160.0 cm?
(b) Between 171.5 and 182.0 cm inclusive?
(c) Equal to 175.0 cm?
(d) Greater than or equal to 188.0 cm?


Solution:

Here $\mu$ = 174.5 cm mililiters and $\sigma$ = 6.9 cm

(a) =1000*
Normdist(160,174.5,6.9,1)
(b) =(
Normdist(182.5-0.5,174.5,6.9,1)-Normdist(171.5-0.5,174.5,6.9,1))*1000
(c) =(
Normdist(175+0.5,174.5,6.9,1)-Normdist(175-0.5,174.5,6.9,1))*1000
(d) =(1-
Normdist(188-0.5,174.5,6.9,1))*1000

Note: In (b), (c) and (d) we substracted and added 0.5 from values of $x$ to make them inclusive. We are converting point estimate into an intervel estimate through this operation.

16. The average life of a certain type of small motor in 10 years, with a stad. Deviation of 02 years. The manufacturer replaces free all motors that fail while under guarantee. If he is willing to replace only 3% of the motors that fail, how long a guarantee should he offer? Assume that the lives of the motors follow a normal distribution.


Solution:

Here $\mu$ = 10 years and $\sigma$ = 2 years and $\phi$ = 0.03 

=Norminv(0.03,10,2) = 6.238 years

The statistical concepts are self-explanatory. I hope that you are finding these posts helpful certain way.  Hope to listen your feedback soon. Take Care

Understanding Normal Distribution with Excel 2007

Normal Distribution is most commonly used distribution. In this post we try to understand how the manual calculation of Normal distribution problems can be solved using Excel's statistical functions. Excel has following four function that are related to the distribution. 

The Normal Distribution is defined by the following equation.

$$f(x)=\frac{1}{\sigma\sqrt2\pi}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
 

Excel 2007 Provide us with following four function related to this distribution:

1. NORMSDIST()
2. NORMSINV()
3. NORMDIST()
4. NORMINV()

Firstly we will understand what each of four stand for:

1. NORMSDIST()

This function can be used to find the value of the Z-Variable as we normally see in the Table of Z-Values. For example the value of area under the Z-Cuve for Z=0.51 is 0.6950. The Same can be found using NORMSDIST(0.51). This assume mean (
$\mu$) of zero and standard deviation ($\sigma$) of 1.

2. NORMSINV()

This function is used to find the Z-Value when we have area available to us. we can try with the value of area find in the above paragraph like NORMSINV(0.6950) that will be evaluated to 0.51. This assume mean of zero and standard deviation of 1.

3. NORMDIST()

This function takes the value of mean, st. deviation and an argument for cumulative or mass density function. If you put in one it will calculate cmulative probability and otherwise will give mass density function. 



4. NORMINV()

This function return the value of Z when you have area of the curve given in the data.

Example of Usage:

Following example elaborate the use of function. For $\mu$ = 50, $\sigma$ = 10 find the probability that x will lie between 45 and 62.

Using manual calculation we will do this:
$$Z_1=\frac{x-\mu}\sigma =  \frac{45-50}{10}=-0.5$$ and 


$$Z_2=\frac{x-\mu}\sigma =  \frac{62-50}{10}=1.2$$
Therefore the probability is ...

$$P(45<x<62)=P(-0.5<z<1.2)$$
$$=P(z<1.2)-P(z<-0.5)=0.8849-0.3085=0.5764$$
Now Using Excel 2007 formulas:
 

$$P(45<x<62)=Normdist(62,50,10,1)-Normdist(45,50,10,1)=0.5764$$
Example: Give a normal distribution with $\mu$ = 300 and $\sigma$ = 50 find the probability that $x$ assumes a value greater then 362.

Manual Working:$$Z=\frac{x-\mu}\sigma =  \frac{362-300}{50}=1.24$$ and 


$$P(x>362) = P(z>1.24) = 1-P(z>1.24)= 1-0.8925 = 0.1075$$
The Same result can be obtained using $=1-Normdist(362,300,50,1)$
 

Example No. 02 Given a normal distribution of $\mu$ = 40 and $\sigma$ = 6 find the value of x that has 45% of area below it.

The usual manual calculation for this will involve looking up value of $\phi =0.45$ in the Z-Table and substituting it in formula :$$-0.13=\frac{x-40}{6}=-0.13\times6+40\Rightarrow x=39.22$$ In this question we have been given area of the curve and we have been asked about the Z-Value itself. We will use the last function of the series that is Norminv() to calculate the value
$$P(x<Z)=0.45=Norminv(0.45,40,6)=39.24603$$This ends the tutorial for Normal Distribution, I hope that you will like it. Please give feedback to Improve the blog.

Thanks.

Understanding Hyper-geometric Distribution with excel:

Assume that a box contains 10 Plugs of which 6 are good and the rest defective. An operator picks 5 Plugs at random from the 10, and is interested in the number of good Plugs picked. 

Let X denote the number of good Plugs picked. If in a Population of size N contains S successes and (N - S ) failures, and a random sample of size n is drawn from the pool, the number of successes X in the sample follows then Hyper-geomatirc Random Variable can be defined as:

$$Hypergeomatric (x,N,n,k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$

Lets understand it with an example, for the case described in above paragraph, where N = 10, S = 6 , n = 5 , x = 2.

The Hyper-geomatric Distribution can be use to calculate the chances of getting 02 defective plugs in the sample by this:
 

$$Hypergeomatric (x,N,n,k) = \frac{\binom{5}{2}\binom{10-5}{6-2}}{\binom{10}{6}}$$ 
..This will give you exact probability of 02 Plugs.

For at most 02 Plugs i.e. at max 02 plugs will be defected:

$$Hypergeomatric (x,N,n,k) = \sum_{x=0}^{x=2}\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$ 
 For atleast 02 Plugs i.e. at max 02 plugs will be defected:
 

$$Hypergeomatric (x,N,n,k) = 1-\sum_{x=0}^{x=2}\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$

All this can easily be computed using
HYPGEOMDIST function of Excel. This takes follwoing arguments:   

HYPGEOMDIST(sample_s,number_sample,population_s,number_population)


Now using the Excel's Hypergeomatric Function the computation is easily done. Substituteing Values will give you the same result as you have calculated through the conventional formulas.

Hope you like this post!!

Adding Mathametical Expression with Latex

Alhumdolillah, it is quite easy now to write mathematical equations on this blog. I have successfully added ability to write Mathematical expression on the post body as well as the comments of this blog. I have also added a Tool to create equation readily and then use those codes.

Procedure:

The Procedure is simply:

1. On the Top side of the blog you can see "Create Equation & Equation Launcher.

2. Create Equation using the Symbols and you can redialy see the results there as well.

3. Copy the code (Press Ctrl+A & Ctrl+C)

4. Paste it in you comments here you should note two things:

a. If you want to place the codes within text or sentence you should enclose it within dollar sign.

b. If you want to place it as separate text you should enclose it in double dollar sign like.


Example: Within Line Equation:

This is the equation states that $\sqrt{3x-1}+(1+x)^2$ I needed

Example: Not Within Line Equation:

This is the equation $$\frac{x^n-1}{x-1} = \sum_{k=0}^{n-1}x^k$$

Latex Added to My Blog!!

Well this is unusal as i just added the $\LaTeX\$ to my Blog!!!

$$\frac{x^n-1}{x-1} = \sum_{k=0}^{n-1}x^k$$

$$x^2$$

$$x^21 \ne x^{21}$$

$$\overline{x+\overline{y}} = \overline{x}+y$$

$$
\left(
\begin{array}{ccc}
a_{11}&\cdots&a_{1n}\\
\vdots&\ddots&\vdots\\
a_{m1}&\cdots&a_{mn}
\end{array}
\right)
$$


$$
\begin{eqnarray*}
1+2+\ldots+n &=& \frac{1}{2}((1+2+\ldots+n)+(n+\ldots+2+1))\\
&=& \frac{1}{2}\underbrace{(n+1)+(n+1)+\ldots+(n+1)}_{\mbox{$n$ copies}}\\
&=& \frac{n(n+1)}{2}\\
\end{eqnarray*}
$$

Enjoy

The Google Sheet Assignment - Splitting and Transposing Cell based on Delimiter - Part 1

I recently had opportunity to workout a formula with google sheets, where i realized how powerful google sheet formulas are. The problem ...