Talk:Negative binomial distribution
WikiProject Statistics (Rated B-class, Mid-importance)


Contents
 1 Negative binomial regression
 2 What's up with the pmf?
 3 Example with real-valued r
 4 Beta negative binomial mixture
 5 Wrong p in Sampling and point estimation of p?
 6 German version is better
 7 request for introduction
 8 reversion
 9 etymology
 10 Equivalence?
 11 Major reorganization
 12 Plots?
 13 the mean is wrong
 14 the mgf is wrong
 15 Use of gamma function for a discrete distribution
 16 Expected Value derivation
 17 MLE
 18 overdispersed Poisson
 19 First paragraph
 20 Trials up to rth success
 21 Little match girl
 22 Major Changes
 23 More dumbing down needed?
 24 a question
 25 an idea
 26 Error in definition section equations.
 27 Very serious issues with File:Negbinomial.gif
 28 Concrete outcomes vs subjective values
 29 Correct equation?
 30 Mode does not appear to be correct
 31 Gamma-Poisson mixture parameter confused
 32 An error in PMF, Mean and Variance formulas
 33 Isn't the CDF wrong?
 34 MLE section uses wrong pmf?
 35 Inconsistent introduction
 36 Incorrect Fisher information for the convention used in this article
 37 Polya naming convention - bioinformatics
 38 Median is hard to find
 39 Mixed conventions
 40 Is the gif for the pmf wrong?
Negative binomial regression
There isn't yet an article on negative binomial regression in Wikipedia: maybe I'll write one if I get the time. This application of the distribution uses a reparameterization in terms of the mean and dispersion, so I have added a bullet point to clarify what this form is, and the various terms used, with some additional references. I moved Joe Hilbe's book into the references (updating to the second edition), and so deleted the section on additional reading. Peterwlane (talk) 06:30, 30 May 2013 (UTC)
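As a rough illustration of the mean/dispersion reparameterization mentioned above, here is a minimal sketch. It assumes the common "NB2" form, in which the variance is mu + alpha*mu^2, and it assumes the pmf counts outcomes of probability 1 - p before the r-th outcome of probability p. The function names and the symbol alpha are illustrative only and are not taken from the article.

```python
import math

def nb2_to_r_p(mu, alpha):
    """Map mean mu and dispersion alpha (variance = mu + alpha*mu**2) to (r, p).

    Assumed convention: p is the probability of the outcome that stops the count.
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    return r, p

def nb_pmf(k, r, p):
    """P(K = k) for real r > 0, written with log-gamma for numerical stability."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))

mu, alpha = 10.0, 0.5
r, p = nb2_to_r_p(mu, alpha)

# Recover the mean and variance by direct summation over a long truncated support.
ks = range(2000)
mean = sum(k * nb_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf(k, r, p) for k in ks)
print(r, p, mean, var)
```

With these inputs the summed mean should agree with mu and the summed variance with mu + alpha*mu^2, up to floating-point error.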
What's up with the pmf?
Wolfram MathWorld, as well as the statistics textbooks I've consulted, list the pmf as having k+r−1 choose k−1, but on this page it is consistently choose r. Why is this? I can't find any difference in convention. In all cases p is probability of success and k is the desired number of failures. I would really like an explanation because as it stands I perceive it as an error. http://mathworld.wolfram.com/NegativeBinomialDistribution.html — Preceding unsigned comment added by Doublepluswit (talk • contribs) 18:17, 7 May 2013 (UTC)
 On this page, k is the number of successes, not the number of failures, while on the Wolfram page, r−1 is the number of successes. Notice that k can be zero, but r cannot. This leads to the valid difference. I would be interested in the reasoning behind the convention used on this page, however. Wolfram's/Ross's convention is more natural and convenient from my perspective at least. Machi4velli (talk) 09:31, 1 July 2013 (UTC)
 This is really frustrating! I definitely think the Wolfram version should be mentioned / included in the alternate param sections. Ashmont42 (talk) 18:09, 19 June 2017 (UTC)
Example with real-valued r
In the case of an integer-valued r one may correctly write:
In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs. For example, if one throws a die repeatedly until the third time “1” appears, then the probability distribution of the number of non-“1”s that had appeared will be negative binomial.
How would this example be in the case of a real-valued r?
Beta negative binomial mixture
Wouldn't it be a good idea to also discuss the beta negative binomial mixture (see, among others, Wang, 2011)?
Reference
Wang, Z. (2011). One Mixed Negative Binomial Distribution with Application. Journal of Statistical Planning and Inference, 141, 1153–1160. — Preceding unsigned comment added by Ad van der Ven (talk • contribs) 10:34, 17 August 2011 (UTC)
Wrong p in Sampling and point estimation of p?
The formula in the section "Sampling and point estimation of p" seems to give the probability of failure, which is not the definition we're using. For example, if you observe k=0, you saw no successes and k failures, so the probability of success (p) should be low. But the formula gives p=1. Should it be changed to k / (r + k) ? Martin (talk) 15:03, 28 November 2010 (UTC)
German version is better
I can barely read any German, yet the article here made more sense than this one...74.59.244.25 (talk) 03:26, 18 February 2008 (UTC)
 I'm just noticing this comment now. I'll look at the German version. Michael Hardy (talk) 17:48, 14 April 2010 (UTC)
 The German version also has a different form for the CDF, even though it seems to give the same definition of p (as the probability of success). I believe that the German version of the CDF is accurate. 104.129.196.214 (talk) 14:49, 8 October 2018 (UTC)
request for introduction
This article needs a proper introduction that can help a layman understand what the term means and what it entails when used in text or conversation. Currently this is not feasible: you have to scroll down a long way and start reading the examples to even begin to understand, if you have no previous knowledge of mathematics or statistics at all. I'm putting this at the top, as I think it's a more vital issue than any concerning the mathematical/statistical content of the page. Starting with
In probability and statistics the negative binomial distribution is a discrete probability distribution.
does not explain what the negative binomial distribution is - what separates it from other discrete probability distributions. I personally think this should be attempted as highest priority; obviously I'm not able to do it (or I wouldn't be writing here, eh). Assuming anyone (not to mention everyone) is able to understand mathematical formulas that incorporate Greek letters is IMHO pedagogically unsound. Asherett 12:17, 13 September 2007 (UTC)
 Well, obviously the statement that "In probability and statistics the negative binomial distribution is a discrete probability distribution" does not say WHICH discrete probability distribution it is - that comes later in the article. As for making it clear to someone who knows NOTHING AT ALL about mathematics or statistics: that may not be so easy. Perhaps making it clear to a broader audience can be done, with some effort, though. Michael Hardy 19:15, 13 September 2007 (UTC)
The current introduction is puzzling, first defining the distribution in terms of number of trials before the rth success, then referring to the general case with non-integer r without defining it. It would be good to provide a definition that includes non-integer r before referring to them. Nonstandard (talk) 02:12, 17 May 2016 (UTC)
reversion
I have reverted the most recent edit to negative binomial distribution for the following reason.
 Sometimes one defines the negative binomial distribution to be the distribution of the number of failures before the rth success. In that case, the statement that the expected value is r(1 − p)/p is correct.
 But sometimes, and in particular in the present article, one defines it to be the distribution of the number of trials needed to get r successes. In that case, the statement is wrong.
If you're going to edit one part of the article to be consistent with the former definition, you need to be consistent and change the definition. Michael Hardy 17:40, 7 Jul 2004 (UTC)
etymology
Shouldn't there be a sentence or two saying why this is named negative binomial and what it has to do with the binomial, especially for the layman? —Preceding unsigned comment added by 164.67.59.174 (talk) 19:17, 2 September 2009 (UTC)
Equivalence?
If X_{r} is the number of trials needed to get r successes, and Y_{s} is the number of successes in s trials, then
 $\Pr(X_{r} \le s) = \Pr(Y_{s} \ge r).$
The article went from there to say the following:
 Every question about probabilities of negative binomial variables can be translated into an equivalent one about binomial variables.
I removed it. I tentatively propose this as a counterexample: Suppose W_{r} is the number of failures before the r successes have been achieved. Then W_{r} has a negative binomial distribution according to the second convention in this article, and it is clear that this distribution is just the negative binomial distribution according to the first convention, translated r units to the left. This probability distribution is infinitely divisible, a fact now explained in the article. That means that for any positive integer m, no matter how big, there is some probability distribution F such that if U_{1}, ..., U_{m} are random variables distributed according to F, then U_{1} + ... + U_{m} has the same distribution that W_{r} has.
So how can the question of whether the negative binomial distribution is infinitely divisible be "translated into an equivalent one about binomial variables"? Michael Hardy 01:43, 27 Aug 2004 (UTC)
 Removing the bit about "every question" seems OK to me; the important point is the relation between binomial and negative binomial probabilities. But Mike, it wasn't put in there for the purpose of annoying you. You might consider using the edit summary to say something about the edit rather than your state of mind - how about rm questionable claim about "every question" instead of I am removing a statement that has long irritated me. Wile E. Heresiarch 15:23, 6 Nov 2004 (UTC)
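The relation between binomial and negative binomial probabilities discussed in this thread is usually stated as Pr(X_r ≤ s) = Pr(Y_s ≥ r): the r-th success arrives within s trials exactly when s trials contain at least r successes. A minimal numerical check follows, assuming X_r counts trials up to and including the r-th success and p is the success probability. The helper names are illustrative.

```python
from math import comb

def trials_pmf(n, r, p):
    """P(X_r = n): the r-th success occurs exactly on trial n (n >= r)."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

def binom_pmf(j, s, p):
    """P(Y_s = j): exactly j successes in s trials."""
    return comb(s, j) * p**j * (1 - p)**(s - j)

def prob_rth_success_within(r, s, p):
    """Pr(X_r <= s), summed from the negative binomial side."""
    return sum(trials_pmf(n, r, p) for n in range(r, s + 1))

def prob_at_least_r_successes(r, s, p):
    """Pr(Y_s >= r), summed from the binomial side."""
    return sum(binom_pmf(j, s, p) for j in range(r, s + 1))

r, s, p = 3, 10, 0.3
print(prob_rth_success_within(r, s, p), prob_at_least_r_successes(r, s, p))
```

The two printed numbers should agree up to rounding. This checks one identity between the two families and says nothing about whether every question can be translated.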
Major reorganization
Trying to be bold, I've just committed several major changes. I found the previous version somewhat confusing, since it talked about three slightly different but closely related "conventions" for the negative binomial, and it never became fully clear to me which convention was in use at which point in the subsequent discussion. I've replaced the definition with what I consider to be the most natural version (the previous convention #3). The reason that definition is "natural" is that it arises naturally as the Gamma-Poisson mixture, converges in distribution to the Poisson, etc. The shifted negative binomial (previous convention #1) can still be derived (see the worked example of the candy pusher). Now we have a single, consistent (hopefully!) definition of the negative binomial instead of three similar-yet-different conventions. I'm painfully aware that all of the previous three conventions are in use and sometimes referred to as the negative binomial; but then again, that doesn't even begin to exhaust the variations on this distribution that can be found in the wild, so why not pick one reasonable definition and stick to that here? MarkSweep 12:04, 5 Nov 2004 (UTC)
 Well, if we were writing a textbook, we would certainly want to pick one defn and stick to it. However, we're here to document stuff as it is used by others. If there are multiple defns in common use, I don't see that we have the option to pick and choose. Sometimes multiple defns can be collapsed by saying "#2 is a special case of #1 with A always a blurfle" and then describing only #1. I don't know if that's feasible here. Regards & happy editing, Wile E. Heresiarch 15:09, 6 Nov 2004 (UTC)
 Yes, that was basically the case here. The previous "convention #2" was the Pascal distribution, which is a special case of the general negative binomial (previous "convention #3"). This didn't become fully clear in the previous revision, where the discussion of the Pascal distribution seemed more like an afterthought. The previous "convention #1" appeared to be simply a Pascal distribution shifted by a fixed amount. There is still a discussion of that in the worked example, but that could arguably be moved to the front and made more explicit. MarkSweep 23:25, 6 Nov 2004 (UTC)
 Hi, just found this page and I don't like that the starting point is the more general formula that has r being a strictly positive real. I think that 99% of the time somebody is interested in this distribution, r is going to be an integer. Which isn't to say that we should purge this more complete definition, just that there is a lot to be said for following the way the present article on the Binomial distribution is written (since this is closely related) and because that one is a heck of a lot clearer. I would suggest using one variable where r is an integer and a separate variable where it is a real (to keep them straight). Along the same lines, I also think that starting talking about Bernoulli trials so far down the page is not a good idea - I'd like to see it up top. Is this what you two are talking about? Oh, wait, those dates are 2004! oh well, I'll still wait to see if anyone cares b/c this is a big edit. O^{18} 07:13, 9 November 2005 (UTC)
 I support the previous comment. I am a graduating maths/computer science student, but the first definition was absolutely non-intuitive for me and only the "Occurrence" section made it clear. I doubt whether the generalization is more important than the fact that this distribution is derived from the Pascal distribution. —The preceding unsigned comment was added by 85.206.197.19 (talk) 20:10, 4 May 2007 (UTC).
Plots?
Is it possible to get some plots of what this looks like? I got sent here from the mosquito page, and anyone reading that probably doesn't want to wade through many lines of math, just see a picture of what it means. zandperl 04:10, 30 August 2005 (UTC)
 One year later, exactly the same issue. Remarkably, the mosquito page still links here, but there's no plot. Anyone?
SketchTheFox 23:21, 19 August 2006 (UTC)
The datapoints and datalines in the animated plot are all mistakenly right-shifted by one. The support begins at k=0, not k=1. The bar charts in some of the other languages (Spanish, Arabic, French, Polish, Slovenian, Turkish, Chinese) are ambiguous, since each bar extends a full unit, so perhaps the creator of the animated plot misinterpreted which side of the bars to assign the values to. The correct values are:
μ=10, r=1, p=0.909091: {{0, 0.0909091}, {1, 0.0826446}, {2, 0.0751315}, {3, 0.0683013}, {4, 0.0620921}, {5, 0.0564474}, {6, 0.0513158}, {7, 0.0466507}, {8, 0.0424098}, {9, 0.0385543}, {10, 0.0350494}, {11, 0.0318631}, {12, 0.0289664}, {13, 0.0263331}, {14, 0.0239392}, {15, 0.0217629}, {16, 0.0197845}, {17, 0.0179859}, {18, 0.0163508}, {19, 0.0148644}, {20, 0.0135131}, {21, 0.0122846}, {22, 0.0111678}, {23, 0.0101526}, {24, 0.0092296}}
μ=10, r=2, p=0.833333: {{0, 0.0277778}, {1, 0.0462963}, {2, 0.0578704}, {3, 0.0643004}, {4, 0.0669796}, {5, 0.0669796}, {6, 0.0651191}, {7, 0.0620181}, {8, 0.058142}, {9, 0.0538352}, {10, 0.0493489}, {11, 0.0448627}, {12, 0.040501}, {13, 0.0363471}, {14, 0.0324527}, {15, 0.0288469}, {16, 0.0255415}, {17, 0.0225366}, {18, 0.0198239}, {19, 0.0173894}, {20, 0.0152157}, {21, 0.0132835}, {22, 0.0115728}, {23, 0.0100633}, {24, 0.0087355}}
μ=10, r=3, p=0.769231: {{0, 0.0122895}, {1, 0.0283604}, {2, 0.0436313}, {3, 0.0559376}, {4, 0.0645434}, {5, 0.0695082}, {6, 0.0712905}, {7, 0.0705071}, {8, 0.0677953}, {9, 0.0637391}, {10, 0.0588361}, {11, 0.0534874}, {12, 0.0480015}, {13, 0.0426049}, {14, 0.0374548}, {15, 0.0326529}, {16, 0.0282574}, {17, 0.0242937}, {18, 0.0207638}, {19, 0.0176534}, {20, 0.0149375}, {21, 0.0125847}, {22, 0.0105606}, {23, 0.00882994}, {24, 0.00735829}}
μ=10, r=4, p=0.714286: {{0, 0.00666389}, {1, 0.0190397}, {2, 0.0339994}, {3, 0.0485706}, {4, 0.0607133}, {5, 0.0693866}, {6, 0.0743428}, {7, 0.07586}, {8, 0.0745054}, {9, 0.0709575}, {10, 0.0658891}, {11, 0.0598992}, {12, 0.0534814}, {13, 0.0470166}, {14, 0.0407797}, {15, 0.034954}, {16, 0.0296485}, {17, 0.0249147}, {18, 0.0207623}, {19, 0.0171718}, {20, 0.0141054}, {21, 0.0115146}, {22, 0.00934628}, {23, 0.00754669}, {24, 0.0060643}}
μ=10, r=5, p=0.666667: {{0, 0.00411523}, {1, 0.0137174}, {2, 0.0274348}, {3, 0.0426764}, {4, 0.0569019}, {5, 0.0682823}, {6, 0.0758692}, {7, 0.079482}, {8, 0.079482}, {9, 0.0765382}, {10, 0.0714357}, {11, 0.0649415}, {12, 0.0577258}, {13, 0.0503251}, {14, 0.0431358}, {15, 0.0364258}, {16, 0.0303548}, {17, 0.0249981}, {18, 0.0203688}, {19, 0.016438}, {20, 0.0131504}, {21, 0.0104368}, {22, 0.00822294}, {23, 0.00643535}, {24, 0.00500527}}
μ=10, r=10, p=0.5: {{0, 0.000976562}, {1, 0.00488281}, {2, 0.0134277}, {3, 0.0268555}, {4, 0.0436401}, {5, 0.0610962}, {6, 0.0763702}, {7, 0.0872803}, {8, 0.0927353}, {9, 0.0927353}, {10, 0.0880985}, {11, 0.0800896}, {12, 0.0700784}, {13, 0.0592971}, {14, 0.0487083}, {15, 0.0389667}, {16, 0.0304427}, {17, 0.0232797}, {18, 0.0174598}, {19, 0.0128651}, {20, 0.0093272}, {21, 0.00666229}, {22, 0.00469388}, {23, 0.00326531}, {24, 0.0022449}}
μ=10, r=20, p=0.333333: {{0, 0.000300729}, {1, 0.00200486}, {2, 0.007017}, {3, 0.0171527}, {4, 0.032876}, {5, 0.0526015}, {6, 0.0730577}, {7, 0.0904524}, {8, 0.101759}, {9, 0.105528}, {10, 0.10201}, {11, 0.0927365}, {12, 0.0798564}, {13, 0.0655232}, {14, 0.0514825}, {15, 0.0388979}, {16, 0.0283631}, {17, 0.020021}, {18, 0.0137181}, {19, 0.00914539}, {20, 0.0059445}, {21, 0.00377429}, {22, 0.00234463}, {23, 0.00142717}, {24, 0.000852336}}
μ=10, r=40, p=0.2: {{0, 0.000132923}, {1, 0.00106338}, {2, 0.00435987}, {3, 0.0122076}, {4, 0.0262464}, {5, 0.0461937}, {6, 0.0692905}, {7, 0.0910675}, {8, 0.107004}, {9, 0.114138}, {10, 0.111855}, {11, 0.101687}, {12, 0.0864336}, {13, 0.0691469}, {14, 0.052354}, {15, 0.0376949}, {16, 0.0259153}, {17, 0.0170736}, {18, 0.0108133}, {19, 0.00660178}, {20, 0.00389505}, {21, 0.00222574}, {22, 0.00123428}, {23, 0.000665436}, {24, 0.000349354}}
This is visually confirmed in the last two frames of the animation, where the mean of the lopsided plot is obviously to the right of the intended mean of 10.
Note that the Italian version also uses this animated plot. AndreasWittenstein (talk) 17:30, 4 February 2011 (UTC)
 Fixed. // stpasha » 23:06, 4 February 2011 (UTC)
Wow, that was quick! Thanks, Pasha. AndreasWittenstein (talk) 00:07, 6 February 2011 (UTC)
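The values listed in this thread can be reproduced with a short script. This is a sketch assuming the convention implied by the listed numbers: k counts successes (probability p each) before the r-th failure, so that the mean is μ = pr/(1−p) and hence p = μ/(μ+r). The function name is illustrative.

```python
from math import comb

def nb_pmf(k, r, p):
    """P(K = k): k successes (probability p each) before the r-th failure."""
    return comb(k + r - 1, k) * (1 - p)**r * p**k

mu = 10
for r in (1, 2, 3, 4, 5, 10, 20, 40):
    p = mu / (mu + r)  # chosen so that the mean p*r/(1-p) equals mu
    row = [(k, nb_pmf(k, r, p)) for k in range(25)]
    print(f"mu={mu}, r={r}, p={p:.6f}:")
    print("  " + ", ".join(f"({k}, {v:.6g})" for k, v in row))
```

The support starts at k=0, which is the point made above about the plot being shifted by one.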
the mean is wrong
should be (1−p)r/p, surely
UM According to 'A First Course in Probability' by Sheldon Ross, the mean is r/p
Both are correct. The only difference is in the choice of random variable. If you choose X as the number of trials needed for the rth success, its mean is r/p; if Y denotes the number of failures before the rth success, its mean is rq/p, where q = 1 − p. The two are related by X = Y + r, so by linearity of expectation E(X) = E(Y) + r = rq/p + r = r/p. — Preceding unsigned comment added by 103.37.200.103 (talk) 05:55, 7 June 2018 (UTC)
 Wrong. Look, how many times do so many of us have to keep repeating this? Sheldon Ross's book CORRECTLY gives the mean of what Sheldon Ross's book calls the negative binomial distribution. But there are (as this article explains) at least two conventions concerning WHICH distribution should be called that. Sheesh. Michael Hardy 21:42, 29 November 2006 (UTC)
Correct mean and variance. The mean for the distribution as defined on the page should be r*(1−p)/p, and the variance should be r*(1−p)/p^2. An easy way to verify these are correct is to plot them together with the pmf (using the same values for r and p). —Preceding unsigned comment added by Cstein (talk • contribs) 12:41, 15 June 2010 (UTC)
Please check again! Other sources, e.g., Wolfram Alpha and the German article, also say that the mean is r*(1−p)/p, but they use a different p. If you define
 $\Pr(X = k) = \binom{k+r-1}{k} (1-p)^{r} p^{k}$
then the mean is r*p/(1−p), and the variance r*p/(1−p)^2. GogolDöring (talk) 09:44, 21 July 2010 (UTC)
If p is the positive probability, as the page states, then the mean is r*(1−p)/p. This needs to be fixed! — Preceding unsigned comment added by 71.163.43.88 (talk) 21:53, 13 March 2013 (UTC)
— The book I use is "Statistical Distributions, 2nd Edition" by Evans, Hastings, and Peacock. They define r as the number of successes, and p as P(success). They also define q=(1−p), which shortens all the formulas. They say the mean is rq/p, and the variance is mean/p. This makes sense to me. Suppose "success" is "being a genius". Suppose p is 10^-6, or one in a million. That means if you want r geniuses, you need about r/10^-6 = r × 10^6 = r million people. So the smaller p is, the bigger the mean has to be. And of course, the smaller p is, the less relevant q is, because it's basically one.
I can see that if you say you're looking for r failures, rather than r successes, you could get what this article says.
MikeDunlavey (talk) 14:01, 11 April 2015 (UTC)
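The two means quoted in this thread, rq/p and r/p, can both be checked by direct summation. The sketch below assumes p is the success probability, Y counts failures before the r-th success, and X = Y + r counts trials. The names are illustrative.

```python
from math import comb

def failures_pmf(y, r, p):
    """P(Y = y): y failures before the r-th success, success probability p."""
    return comb(y + r - 1, y) * p**r * (1 - p)**y

r, p = 3, 0.25
q = 1 - p
ys = range(1000)  # truncated support; the neglected tail is negligible here

mean_failures = sum(y * failures_pmf(y, r, p) for y in ys)     # E(Y)
mean_trials = sum((y + r) * failures_pmf(y, r, p) for y in ys)  # E(X) = E(Y + r)

print(mean_failures, r * q / p)  # both close to r*q/p
print(mean_trials, r / p)        # both close to r/p
```

The same pmf gives both answers; only the random variable being averaged changes.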
the mgf is wrong
The numerator should be pe^t instead of p. The following link can support this http://www.math.tntech.edu/ISR/Introduction_to_Probability/Discrete_Distributions/thispage/newnode10.html
The bottom of that page gives the mgf of negative binomial distribution. I verified it. —Preceding unsigned comment added by 136.142.163.158 (talk • contribs)
 WRONG!!! This article has it right, and so does the web page you cited. They're talking about TWO DIFFERENT DISTRIBUTIONS. You did not read carefully. The negative binomial distribution dealt with in this article is supported on the set
 { 0, 1, 2, 3, ... }
 whereas the one on the web page you cite is supported on the set
 { r, r + 1, r + 2, .... }
 Both articles are clear about this. You need to read more carefully. Michael Hardy 19:55, 13 September 2006 (UTC)
 Both articles may be mathematically correct, but using the number of successes as the RV, the number of failures as the goal (one parameter), and the probability of success as the other parameter is to me less intuitive than using the number of failures as the RV, number of successes as the goal parameter, and the probability of success as the other parameter. The introduction to the example of selling candy using the article's current convention seems unnatural and forced. There is more to an article than being mathematically correct. Lovibond (talk) 16:45, 11 October 2015 (UTC)
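The two moment generating functions at issue differ only by the factor e^{rt} that comes from shifting the support by r. A minimal numerical sketch follows, assuming p is the success probability, the variable supported on {0, 1, 2, ...} counts failures before the r-th success, and the variable supported on {r, r+1, ...} counts trials. The closed forms are the standard ones for those two variables; which one matches the article depends on its convention at the time.

```python
from math import comb, exp

def failures_pmf(y, r, p):
    """P(Y = y): y failures before the r-th success, success probability p."""
    return comb(y + r - 1, y) * p**r * (1 - p)**y

def mgf_by_sum(t, r, p, shift, terms=2000):
    """E[exp(t*(Y + shift))] by truncated summation; needs (1-p)*exp(t) < 1."""
    return sum(exp(t * (y + shift)) * failures_pmf(y, r, p) for y in range(terms))

def mgf_failures(t, r, p):
    """Closed form for support {0, 1, 2, ...}: numerator p."""
    return (p / (1 - (1 - p) * exp(t)))**r

def mgf_trials(t, r, p):
    """Closed form for support {r, r+1, ...}: numerator p*e^t."""
    return (p * exp(t) / (1 - (1 - p) * exp(t)))**r

t, r, p = 0.2, 3, 0.4
print(mgf_by_sum(t, r, p, shift=0), mgf_failures(t, r, p))
print(mgf_by_sum(t, r, p, shift=r), mgf_trials(t, r, p))
```

Each pair of printed values should agree, which illustrates that both formulas are right for their own support.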
Use of gamma function for a discrete distribution
Is it the convention among probability literature to represent the negative binomial with the gamma function? In Sheldon Ross's introductory text, the distribution is introduced without it (although that is an alternative representation of the distribution). I am not objecting but as a beginner am curious why this is how it is represented. reddaly
I think either adding this way of writing it: , or specifying that would be beneficial. some people start running when they see the gamma function
 Good idea. It would be easier on the eyes for those who haven't yet discovered how to love the Γ function. Aastrup 22:24, 18 July 2007 (UTC)
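For readers wondering why the gamma function appears at all: for integer r the gamma-function coefficient Γ(k+r)/(Γ(k+1)Γ(r)) equals the ordinary binomial coefficient "k+r−1 choose k", and the gamma form simply keeps working when r is not an integer. A small sketch using only the standard library (the function name is illustrative):

```python
from math import comb, gamma

def coef_gamma(k, r):
    """Gamma-function form of the coefficient; valid for any real r > 0."""
    return gamma(k + r) / (gamma(k + 1) * gamma(r))

# Integer r: the gamma form matches the binomial coefficient.
for k in range(6):
    print(k, coef_gamma(k, 3), comb(k + 3 - 1, k))

# Non-integer r: only the gamma form is defined.
print(coef_gamma(3, 2.5))
```

This relies on Γ(n) = (n−1)! for positive integers n.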
Expected Value derivation
The classic derivation of the mean of the NBD should be on this page, as it is on the binomial distribution page. Vince ^{Talk} 04:44, 12 May 2007 (UTC)
 I agree. Aastrup 22:24, 18 July 2007 (UTC)
MLE
This article lacks Maximum Likelihood, and especially Anscombe's Conjecture (which has been proven). Aastrup 22:24, 18 July 2007 (UTC)
overdispersed Poisson
I recently added a note about how the Poisson distribution with a dispersion parameter is more general than the negative binomial distribution and would make more sense when one is simply looking for a Poisson distribution with a dispersion parameter. I think it's important to realize that the Poisson distribution with a dispersion parameter described by M&N is more general, in that the variance can take any positive value rather than being restricted to values greater than the mean. There certainly are situations where the negative binomial distribution makes sense, but if one is just looking for a Poisson with a dispersion parameter, why beat around the bush with this other distribution and not just go for the real thing? O^{18} (talk) 17:38, 26 January 2008 (UTC)
 There is no such thing as "overdispersed Poisson", because if it is overdispersed, then it is not Poisson. If "the Poisson distribution with a dispersion parameter described by M&N" is important, then go ahead and describe it in some other article, perhaps in a new article. This article is about the negative binomial distribution only. The (positive) binomial distributions have variance < mean, and the Poisson distribution has variance = mean, and the negative binomial distribution has variance > mean. Bo Jacoby (talk) 22:36, 26 January 2008 (UTC).
First paragraph
Among several objections I have to the edits done on April 1st by 128.103.233.11, is this: the rest of the article is about the distribution of the number of failures before the rth success, not about the one that counts the number of trials up to and including the rth success. Thus, in the experiment that that user described, the distribution should have started at 0, not at 2. This matters because (1) we want to include the case where r is not an integer, because (2) we want to be able to see the infinite divisibility of this distribution. Michael Hardy (talk) 16:23, 4 April 2009 (UTC)
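The link between non-integer r and infinite divisibility can be illustrated numerically: if U1 and U2 are independent with parameter r/2 (which need not be an integer), their sum has the distribution with parameter r. The sketch below assumes the support starts at 0 and that p is the success probability; the pmf is written with log-gamma so that real r is allowed. The names are illustrative.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """P(K = k), k = 0, 1, 2, ..., for real r > 0 and success probability p."""
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1 - p))

def convolve_two(k, r, p):
    """P(U1 + U2 = k) for independent U1, U2, each with parameter r/2."""
    return sum(nb_pmf(j, r / 2, p) * nb_pmf(k - j, r / 2, p) for j in range(k + 1))

r, p = 3, 0.4  # r/2 = 1.5 is not an integer
for k in range(6):
    print(k, convolve_two(k, r, p), nb_pmf(k, r, p))
```

The two columns should agree. The same check does not work for the version supported on {r, r+1, ...}, because the summands would need a support starting at the non-integer r/2.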
Trials up to rth success
This page should be updated to include a column in the side table for the version of the negative binomial for "numbers of trials to rth success". This is the most intuitive, if not the most common, version of this distribution. It answers the question "how many batches should I run if I want r successes." I think the page on the geometric distribution handles this nicely, there is no reason the exact analog cannot be done here. Until this is done, I predict endless waves of people claiming that the mean is r/p. As it is, this page is currently unreadable. Formivore (talk) 22:05, 6 April 2009 (UTC)
 Yeah, what situation is the distribution as described on this page useful for? I've only ever encountered the NB distribution that is "numbers of trials to rth success". O^{18} (talk) 05:39, 26 July 2009 (UTC)
 Really, what's so difficult about this? The negative binomial distribution builds upon a sequence of Bernoulli trials. Each trial has a binary outcome: two possibilities. The words “success” and “failure” are just labels we arbitrarily attach to those 2 outcomes. Say, if your trials consist of flipping the coin, would you call Heads the “success” or Tails? If the trial consists of people voting for a democratic or a republican party, which one should be called the success (okay, you might have a personal opinion on this account :)? If the trial is a survey question with answers Yes/No — which one is success? and so on...
 “Numbers of trials to rth failure” is just as valid interpretation as the opposite one. For example, suppose in a hospital a doctor gets fired after the 3rd patient who dies from his error. “A patient dying” we'll call the failure (well it would be awkward to call it a success). So how many patients will the doctor have until he gets fired — that would be our negative binomial distribution? // stpasha » 20:47, 15 April 2010 (UTC)
 stpasha, I think you just highlighted the point. The distribution on the page is the number of successes before the rth failure. But you said, the number of patients. For the distribution on the page, it would be the number of patients that don't die until the doctor gets fired. But that is a much less natural parameterization. Think about a manufacturing process - you want to know how many widgets you have to make before you get, say, three that work. The distribution on the page would be how many bad widgets you have to make before you get 3 good ones, but you really want to know how many total widgets you have to make. 0^{18} (talk) 23:12, 15 April 2010 (UTC)
The definition on this page ("the number of successes in a sequence of Bernoulli trials before a specified (non-random) number of failures") is not one I can recall seeing in any probability and statistics textbook. I have about 30 on my bookshelf; of the six I sampled, five defined NB(r,p) as the number of trials before the rth success, and one defined it as the number of failures before the rth success. None used the definition of this page. One of the "classic" probability texts (Feller, An Introduction to Probability Theory and its Applications, vol. 1, page 165) uses the number-of-failures definition. I too would suggest that this page describe those two competing definitions (similar to what is done on the Geometric distribution page). The current definition should either be scrapped, or (if someone can point out a source that uses that definition) perhaps retained as an alternate definition in a separate section. DarrylNester (talk) 16:30, 18 February 2013 (UTC)
Little match girl
Are we reading the same The Little Match Girl article? The one that is linked is about a girl who, too afraid to return home for fear of being punished, dies in the cold. I really don't see the link between the Andersen story and this article, nor why it is a "classic example" of a negative binomial distribution. Considering what is in the linked article, I see no benefit (and a great deal of confusion) in putting that link there. The story is similar to the example of Pat, but totally non-mathematical, and adds no information about the neg. bin. dist. Just to make things clear, I am not the IP earlier, and I don't think the link should be there. User A1 (talk) 02:57, 25 July 2009 (UTC)
 In the setup of TLMG, Pat must empty her box of matches or face child abuse. In Dr. Evans' example, Pat must empty her box of candy bars or face child abuse. What is the probability that Pat freezes to death? Damian Yerrick (talk | stalk) 13:10, 25 July 2009 (UTC)
 So, in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping (were such an article to exist)? In one sense it is not apropos, in another, it is just part of the canon regardless of how interesting it looks when you don't know the history. O^{18} (talk) 20:12, 25 July 2009 (UTC)
 I think that the link firstly is far too tenuous; both myself and an IP have no idea what you are on about w.r.t. the link. Secondly, I would remind you of WP:EGG (no easter egg links). Finally, should we then link integer to Hansel and Gretel, temperature, porridge and bed to Little Red Riding Hood? Follicle_(anatomy) to Rapunzel? I consider the links no more bizarre than this. User A1 (talk) 01:45, 26 July 2009 (UTC)
 Sorry, still going: in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping. Sure that's fine, as it is a good example of the applicability of the mathematics, but I wouldn't then link that to waltzing matilda, on the pretext that the swagman steeps his tea. User A1 (talk) 01:49, 26 July 2009 (UTC)
 After using Google for a while, it appears that the little match girl / negative binomial link was only on websites that use the text as it used to appear in this article. I don't understand why you couldn't see why the link was related, but given that it is 2 to 1, and the 1 doesn't really care, I say let's just ax it and be done. O^{18} (talk) 05:30, 26 July 2009 (UTC)
Major Changes
I have added the alternate formulation of the negative binomial that describes the probability of k **trials** given r successes to the side table and to the body section describing the pmf. This presentation of both formulations follows e.g. Casella, and I believe is justified both by the record of this talk page as well as by theoretical considerations. While the trials to r successes formulation has some disadvantages (parameter-dependent support) it has the big advantage of actually being the waiting time distribution of a Bernoulli process. The two-column side table was taken from the page on the geometric distribution; in fact the two geometric distributions are just the cases of the two neg. binomials with r=1. If it's worth doing there (where the difference is a factor of (1−p) fer cryin' out loud), it's worth doing here.
I have not modified any other sections. I believe everything else on the page is still valid after this change (since the original pmf is still there). Some of it may now not be needed and could be removed. If anyone has a cleaner way of doing this presentation (which is a bit clunky) go ahead. However, I would appreciate it if this change was not reverted without a good argument against it. Formivore (talk) 23:44, 16 October 2009 (UTC)
 I believe the second formulation (the number of trials before rth success) should be removed as a second column of the infobox. A person who doesn’t know what the NB distribution is and comes to this page will likely get confused by the fact that there seem to be two different(?) distributions with the same name, and will never realize that they differ only by a shift by the constant r. Btw Casella starts with the informal description of what is called “2nd formulation” here, but later on redefines it into our “1st formulation” and says that “unless specified otherwise, later on we will always be understanding this definition when we use term ‘negative binomial’”.
 If we leave only one definition (leaving the other one as a short subsection describing the differences in the alternative formulation), this has the following advantages: (1) the reader will never get confused regarding which definition is used on the page, (2) this definition can be properly generalized to the negative multinomial distribution, (3) this definition is infinitely divisible, arises as a gamma-Poisson mixture, and has the other properties mentioned on this talk page.
 It will also be beneficial to recast the parameter p as the probability of failure rather than of success (or alternatively, to swap around what we consider failures and what are successes here). E.g. we may define the NB distribution as “probability of having k=0,1,2,… successes before a fixed number r of failures occur”? That way the definition sounds more natural, and extends to the multinomial case gracefully. … stpasha » 10:44, 8 December 2009 (UTC)
 Both of these are couched in terms of trials and r appears in the "choose" function, but r is stated to be real. Maybe we should start simple and then get more complicated later? What loss is there in having r be an integer and then having a section that allows for otherwise and then states the pdf with gamma functions (I'm assuming that is what takes the place of the choose)? 0^{18} (talk) 14:50, 8 December 2009 (UTC)
 I have to disagree, Stpasha. If someone comes to this page not knowing what the NB is, there is a good chance they will have the wrong distribution in mind, leading to more confusion, not less. Roughly half of this talk page is taken up with confusions of this sort. Maybe it's not best to have the double side table, but there should at least be a very clear explanation at the top of the article of the two formulations.
 Successes and failures are defined as they are to generalize the geometric distribution. This is a more important analogy than the negative multinomial, which is fairly obscure. That said, I don't see how one way is more natural than the other for the NB. Formivore (talk) 07:42, 12 December 2009 (UTC)
 Well, “success” and “failure” are just arbitrary labels we assign to the two possible outcomes of a Bernoulli trial. Say, if we consider an individual who has a small chance p of dying on each day (so that the lifespan has a geometric distribution), then the event of his death will be called “success”.
 In order to have consistency we might as well reparametrize the geometric distribution as well, so that its pmf is f(k) = p^{k−1}(1−p). This expression actually looks simpler than the f(k) = p(1−p)^{k−1} (although of course they are quite the same). … stpasha » 12:36, 12 December 2009 (UTC)
Looking at this article is looking at social failure. All kinds of flotsam have accumulated. Regardless of whether the side table should have one or two columns, this article should be revised to remove redundancies and sections that are not notable. I'd propose the following changes:
1) Move the "Limiting Case" and "Gamma-Poisson mixture" subsections further down in the article. I don't know if the parameterization used to arrive at the limit is broadly applicable, or if it is only used for this derivation. If the former is true, then this should be explained. Otherwise this should be moved to the "Related Distributions" section. The mixture derivation does not describe a specification at all and should be moved to the "Occurrence" section. This section should just describe what this distribution is, that's it.
2) The "Relation to other distributions" subsection describes, in a derivation involving the incomplete gamma function, the k trials to r successes stuff that the wrangling has been about. A good explication at the beginning of the article will obviate this section.
3) The "Example" at the end of the article is unnecessary and poorly written. There is also a much shorter example in the "Waiting time in a Bernoulli process" subsection that does not involve candy bars. Formivore (talk) 08:59, 12 December 2009 (UTC)
The article titled geometric distribution has two columns: one for the number of trials before the first success, and one of the number of trials including the first success.
It was necessary to do that because before it appeared that way, reckless, irresponsible editors kept coming along saying "I CAN'T BELIEVE THIS ARTICLE MAKES SUCH A CLUMSY MISTAKE!!!! MY TEXTBOOK SAYS....." and then recording information that's correct for one of the two distributions and wrong for the other, and failing to notice that there are two of them, even though the article clearly said so.
We cannot omit the negative binomial distribution of the number of trials before the rth success because
 That's the one that's infinitely divisible;
 That's the one that arises as a compound Poisson distribution;
 That's the one that allows r to be real rather than necessarily an integer.
Michael Hardy (talk) 19:55, 12 December 2009 (UTC)
 OK, it seems that either I or Michael (or both) are confused here. Which only reinforces the point that the entire situation is utterly befuddling. The first column is not the “number of trials before the rth success”, but rather the number of “failures” before the rth success. So the difference between the two columns is not in before/including, but rather whether we count only the failures, or both the failures and the successes. The two definitions differ by a shift by the constant r, so it's no biggie.
 Oh, and I'm not saying we should omit the definition of negative binomial as the number of failures before the rth success, that's the one I'm suggesting to keep, while the other one to scratch out (the one whose support starts from r). … stpasha » 09:36, 13 December 2009 (UTC)
OK, I haven't looked at this discussion for a while. I was hasty with language; what I meant was:
 One distribution is that of the number of trials needed to get a specified number of successes; and
 One distribution is that of the number of failures before a specified number of successes.
The latter allows the "specified number" to be a noninteger, and is infinitely divisible. If we're going to keep only one, it should be that one. Michael Hardy (talk) 03:16, 21 December 2009 (UTC)
 Michael, I think that would be a great idea for a textbook, but I would rather see the page be, well, encyclopedic in its coverage. One thing I think is certain: if we want to state the noninteger case, it should be in another section, not in the bar on the right. 0^{18} (talk) 16:49, 21 December 2009 (UTC)
I never said we should have a "bar" for the specifically noninteger case. But we should have one for the case that's supported on {0, 1, 2, ...}. And it should state a parameter space that includes nonintegers. Somewhere in the text of the article that should be explained (possibly in its own section). Michael Hardy (talk) 20:48, 21 December 2009 (UTC)
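For readers following this thread, the relationship between the two formulations can be checked numerically. The sketch below is only an illustration: the function names and the values r = 3, p = 0.4 are made up for the example, and p is taken to be the success probability in both formulations. It compares the pmf of the number of failures before the rth success with the pmf of the number of trials needed to reach the rth success, shifted by r.

```python
from math import comb

def pmf_failures(k, r, p):
    """P(K = k): k failures before the r-th success (support 0, 1, 2, ...)."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def pmf_trials(n, r, p):
    """P(N = n): n trials needed to reach the r-th success (support r, r+1, ...)."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

# Illustrative values only (not taken from the article).
r, p = 3, 0.4

# Largest difference between P(K = k) and P(N = k + r) over the first 50 values of k.
max_shift_gap = max(abs(pmf_failures(k, r, p) - pmf_trials(k + r, r, p))
                    for k in range(50))

# Sanity check: the failures formulation sums to (essentially) 1.
total_mass = sum(pmf_failures(k, r, p) for k in range(200))

print("max |P(K=k) - P(N=k+r)| =", max_shift_gap)
print("total mass of failures pmf =", total_mass)
```

If the gap prints as zero up to rounding, the two columns describe the same distribution up to the shift N = K + r, which is the point both sides of the discussion agree on.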
More dumbing down needed?
The recent edits by user:24.127.43.26 and by user:Phantomofthesea make me wonder if we need to dumb this down again to rid ourselves of irresponsible editors who edit without paying attention to what they're reading or what they're writing. Michael Hardy (talk) 02:52, 19 February 2010 (UTC)
 Maybe we should reject the “success/failure” terminology altogether, and instead use something more neutral, like “0/1”. That way whenever a person reads this page he/she would have to stop for 3 seconds and think how our 0/1 maps to his/her textbook’s success/failure. // stpasha » 20:51, 15 April 2010 (UTC)
a question
I don't want to mess with the entry, but according to Casella & Berger, the pmf listed here is incorrect. The p and (1−p) are switched. It should be (p^r)(1−p)^k. I haven't looked through to see how that mistake affects the rest of the article, if at all, so I'll leave it to someone with more knowledge of this article than me to correct. —Preceding unsigned comment added by 128.186.4.160 (talk • contribs)
 Sigh............. not this comment again. The article says:
 Different texts adopt slightly different definitions for the negative binomial distribution.
 OK? You need to read what it says! Michael Hardy (talk) 21:25, 13 April 2010 (UTC)
Ok, I read it more carefully and I concede that what is written here is technically correct. However why not just stick with the Casella Berger definition on here? I'd argue that the Casella/Berger book is the most widely used of its kind, so defining the pmf this way is just confusing to most people. —Preceding unsigned comment added by 68.42.50.243 (talk • contribs) 21:00, 13 April 2010
I don't see where this decision was discussed above. In the first instance that I see of such a discussion Michael Hardy is saying it has happened before, so I guess there must be an unlinked archive? 0^{18} (talk) 02:58, 14 April 2010 (UTC)
 Sorry, I've now looked at this talk page, and it looks to me like MH pointed out, in the section titled "reversion", that any change would require the entire page to be changed. Since then, in the section "Trials up to rth success", Formivore correctly predicts endless waves of people correcting it because of the more intuitive interpretation of the alternative specification. There has also been a lengthier discussion in the section titled "Major Changes", where Formivore tries to update the article and describes it as being in disarray and confusing. Formivore appears to have given up. In the end MH likes that one can make a (somehow useful?) change of support for one of the parameters for the less intuitive parameterization. 0^{18} (talk) 04:47, 14 April 2010 (UTC)
 To focus my ramble into a question, what is the value of being able to treat r as real and not just integer valued? Why do we care? Also, even if we do want this, might it make more sense to give that formulation a separate section that starts by reparameterizing, showing the new cdf/pdf and then explaining why it is useful? 0^{18} (talk) 17:02, 14 April 2010 (UTC)
We want to treat r as real because it shows that this is an infinitely divisible distribution and that there's a corresponding Levy process. Michael Hardy (talk) 17:46, 14 April 2010 (UTC)
 Okay, so (1) why is there no mention of "Levy process", and (2) why does this trump the overall understandability of the article? Would you agree that this parameterization could be moved to its own section? 0^{18} (talk) 18:09, 14 April 2010 (UTC)
an idea
We could put the whole box in a transcluded page to make it somewhat more difficult to quickly edit it. It is a bit extreme, but there have been many well-intentioned (if somewhat inattentive) incorrect edits to it. 0^{18} (talk) 17:07, 29 April 2010 (UTC)
 Sounds great. // stpasha » 18:02, 29 April 2010 (UTC)
 And we need to do the same thing with dice/die... // stpasha »
Error in definition section equations.
In the summary on the right hand side we have:
r is the number of failures
pmf: c \times (1−p)^r p^k
which makes sense.
Then in the definition section we have: success has probability p, failure has probability (1−p), but then the pmf is given as
c \times (1−p)^k p^r
which is the wrong way round: this gives the probability of k failures in r+k trials. This error continues throughout the definition section. However, in the related distributions section, when the more common version of the pmf is written using \lambda (more commonly \theta in my experience), we have (1−p)^r p^k, which is as it should be following the textual definitions given previously. —Preceding unsigned comment added by 193.63.46.63 (talk) 10:24, 4 October 2010 (UTC)
Very serious issues with File:Negbinomial.gif
I see there are several very serious issues with the article's picture Negbinomial.gif:
I am surprised no one has caught this after all the time this picture (or its previous incarnations) has been shown on the title page.
We know that the mean of the distribution is pr/(1−p). To have a constant mean value of 10, p and r have to be related as p = 10/(r+10), which is the wrong thing to do, as p and r should be completely independent of one another. For example, for r=10, p=1/2. But if r=20, then p=1/3. Since p and r are the exogenous variables of the distribution, we should show a picture of a distribution that keeps one or the other constant. We should NOT show a picture that varies both simultaneously, since this does not show the true behavior of the function.
Also, if p and r are set this way, then there is no way that the standard deviation will be constant. Since we know that the variance is pr/(1−p)^2, we find that the standard deviation is \sqrt{10(r+10)/r}, a nonconstant function of r. And it is odd that the author of the picture chose to show the standard deviation as a horizontal segment. It should be in the same domain as the mean (i.e., a vertical line).
This picture has so many issues that I must recommend that it not be shown and that a new one (hopefully a correct one) be developed. In an earlier post, someone mentioned that the German language version of the article is a better one. Like that person, I don't read German either, but I can tell that the sample picture used there is a correct one. Perhaps an equivalent picture for the English language article can be developed to replace Negbinomial.gif. If no one comes up with one in the next few days, I'll just bite the bullet and make my own and upload it. Bruno talk 15:58, 25 May 2011 (UTC)
 This is not an error, just an expositional bit you don't like. I very much prefer the explanation on the page to one where the mean is changing, though it might be worth adding p to the graph as well as r to make it clear that both parameters are changing. One could, of course, reparameterize the NB so that the graphs shown were not only changing one parameter, in some sense it is arbitrary. In any case, the pmf graph should probably have p labeled on it even if it were not changing. 0^{18} (talk) 16:33, 17 August 2011 (UTC)
 Yes, while it may not be an error, I surely don't like it as you point out, and neither should you. I frankly do not see how presenting a function where its main parameters vary simultaneously contributes to clarity. While the animation looks pretty, the reader cannot get a clear understanding as to how the function actually works. Either you vary one parameter at a time and do the animation that way, or you show a static picture like in the German language article. Yes, it is not wrong (save for the standard deviation point), but it is not right either.
 There is already a lot of fodder for confusion by presenting the material in the article in a way that deviates from the standard texts. The picture contributes to this. Again, it is not wrong either, but it is certainly not right, in that readers are left wondering what is going on. We see evidence of this elsewhere in this Talk page, where less-than-careful readers get into pitfalls, and other writers feel compelled to point out the deviations. I don't see why the article should be rife with these problems, as there are much better ways to present the material. This is a substandard article, starting from the picture. I'd offer to rewrite the whole thing but I know I will certainly run into the same resistance I am running into by pointing out deficiencies in the graphic that could easily be corrected. Expositional bit, my foot! Bruno talk 13:47, 18 August 2011 (UTC)
I think the figure here does a poor job of providing a visual idiom for the negative binomial distribution that distinguishes it from monotone distributions like the geometric with a single parameter. — Preceding unsigned comment added by 65.127.74.2 (talk) 15:41, 19 May 2013 (UTC)
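The coupling between p and r discussed in this section can be made concrete with a few lines of arithmetic. This is only a sketch under the article's convention (k successes before r failures, p the success probability), for which the mean is pr/(1−p) and the variance is pr/(1−p)^2; the helper names are made up for the illustration.

```python
from math import sqrt

def p_for_mean(r, mean=10.0):
    """Solve p*r/(1 - p) = mean for p, i.e. p = mean / (r + mean)."""
    return mean / (r + mean)

def nb_mean(r, p):
    return p * r / (1 - p)

def nb_sd(r, p):
    """Standard deviation, sqrt(p*r) / (1 - p)."""
    return sqrt(p * r) / (1 - p)

# Rows of (r, p, mean, sd) along the constant-mean curve used by the animation.
rows = []
for r in (1, 5, 10, 20, 40):
    p = p_for_mean(r)
    rows.append((r, p, nb_mean(r, p), nb_sd(r, p)))

for r, p, m, sd in rows:
    print(f"r={r:>2}  p={p:.4f}  mean={m:.4f}  sd={sd:.4f}")
```

The mean column stays at 10 while p and the standard deviation both change with r, which is the behaviour being debated above.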
Concrete outcomes vs subjective values
The outcomes "success" and "failure" are concrete and can be mapped to a set {1,0} by an indicator function. The values "good" and "bad" are subjective values based on social constructions, experiences and personal preferences (concepts that may not even exist in concrete form). The comparison here between "success"/"failure" and "good"/"bad" does not make any sense: "When applied to real-world situations, the words success and failure need not necessarily be associated with outcomes which we see as good or bad." The point of an experiment is *not* to have subjective biases, so why would the experimenter see the outcomes as good or bad? Runestone1 (talk) 00:44, 2 June 2011 (UTC)
 The quote agrees with you... — Preceding unsigned comment added by Machi4velli (talk • contribs) 09:43, 1 July 2013 (UTC)
Correct equation?
In the "Extension to real-valued r" section, I see a denominator of "x! gamma(x)". Is that right, or should it be "x gamma(x)"?
 It's correct. Note that if x were a positive integer, we would have gamma(x) = (x−1)!, and you'll see that it reconstructs the binomial coefficient in the integer case. Machi4velli (talk) 10:00, 1 July 2013 (UTC)
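The answer above can be checked with a short calculation. The sketch below assumes the coefficient has the form Gamma(k + r) / (k! Gamma(r)) and uses the article's convention for the pmf (p^k (1−p)^r); the function names and test values are illustrative only.

```python
from math import comb, exp, lgamma

def coeff_gamma(k, r):
    """Gamma(k + r) / (k! * Gamma(r)), defined for any real r > 0."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r))

def pmf_real_r(k, r, p):
    """pmf with real-valued r: coeff_gamma(k, r) * p^k * (1 - p)^r."""
    return coeff_gamma(k, r) * p**k * (1 - p)**r

# Integer r: the gamma form reduces to the binomial coefficient C(k + r - 1, k).
integer_gap = max(abs(coeff_gamma(k, 3) - comb(k + 3 - 1, k)) for k in range(20))

# Non-integer r: the pmf is still a proper distribution on 0, 1, 2, ...
mass_real_r = sum(pmf_real_r(k, 2.5, 0.3) for k in range(400))

print("max gap to binomial coefficient (r = 3):", integer_gap)
print("total mass for r = 2.5, p = 0.3:", mass_real_r)
```

Working with lgamma rather than gamma avoids overflow for large k; that is a convenience of the sketch, not something the article prescribes.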
Mode does not appear to be correct
The current mode given doesn't seem to match other sources for negative binomial mode. I've checked under the multiple parameterizations and haven't come up with that formula. Can someone else confirm or deny this? — Preceding unsigned comment added by 108.246.235.64 (talk) 23:40, 19 July 2013 (UTC)
Gamma-Poisson mixture parameter confused
I changed the second sentence of the section "Gamma–Poisson mixture", which had suggested that negativebinomial(r,p) ~ poisson(gamma(shape = r, scale = p/(1 − p))). Instead, it is the rate parameter that should have that value. Here's some R code that makes it obvious:
 many <- 10000
 r <- 15    # or e.g. trunc(runif(1, 2, 20))
 p <- 0.56  # or e.g. runif(1)
 # reference sample drawn directly from the negative binomial (green curve)
 x <- rnbinom(many, size = r, prob = p)
 # hypothesis 1 (blue): negativebinomial(r,p) ~ poisson(gamma(shape = r, rate = p/(1 - p)))
 lambda <- rgamma(many, shape = r, rate = p/(1 - p))
 y <- rpois(many, lambda)
 # hypothesis 2 (red): negativebinomial(r,p) ~ poisson(gamma(shape = r, scale = p/(1 - p)))
 lambda <- rgamma(many, shape = r, scale = p/(1 - p))
 z <- rpois(many, lambda)
 # empirical CDFs: the curve that overlays the green one is the correct mixture
 plot(sort(x), 1:many/many, xlim = range(c(x, y)), ylim = c(0, 1), col = 'green', lwd = 3, type = 'l')
 lines(sort(y), 1:many/many, col = 'blue')
 lines(sort(z), 1:many/many, col = 'red')
Scwarebang (talk) 00:24, 12 October 2013 (UTC)
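A deterministic version of the same comparison, without random sampling, is sketched below. It assumes the convention of R's rnbinom(size = r, prob = p), that is, failures before the rth success with p the success probability, and integrates the Poisson pmf against the gamma density with a plain midpoint rule. The function names, the integration limits and the chosen values of k are all illustrative.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """Same convention as R's dnbinom(k, size=r, prob=p)."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1 - p))

def mixture_pmf(k, r, rate, upper=100.0, steps=20_000):
    """Integral of Poisson(k; lam) * Gamma(lam; shape=r, rate=rate) over lam (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_gamma_pdf = r * log(rate) - lgamma(r) + (r - 1) * log(lam) - rate * lam
        log_poisson = k * log(lam) - lam - lgamma(k + 1)
        total += exp(log_gamma_pdf + log_poisson)
    return total * h

r, p = 15, 0.56            # same values as the R snippet above
beta = p / (1 - p)
ks = (0, 5, 12, 30)

# beta used as the gamma RATE parameter
gap_rate = max(abs(mixture_pmf(k, r, rate=beta) - nb_pmf(k, r, p)) for k in ks)
# beta used as the gamma SCALE parameter (so the rate is 1/beta)
gap_scale = max(abs(mixture_pmf(k, r, rate=1.0 / beta) - nb_pmf(k, r, p)) for k in ks)

print("max gap with rate  = p/(1-p):", gap_rate)
print("max gap with scale = p/(1-p):", gap_scale)
```

Under the article's other convention (p and 1−p swapped) the roles of rate and scale swap as well, so the result is only meaningful once the convention is fixed.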
An error in PMF, Mean and Variance formulas
I always have assumed that the formulas given in Wikipedia were correct. I will not do that anymore... I have no idea why such a basic mistake has prevailed for so long. Maybe because the mistake is in the sidebar, which is not straightforward to edit (I have no idea how to do it). Here are the correct entries:
Mean = r(1−p)/p
Variance = r(1−p)/p^2
The results were checked manually with Wolfram's Mathematica, and then checked again to rule out an error on Mathematica's side. — Preceding unsigned comment added by Adam Ryczkowski (talk • contribs) 10:21, 23 October 2013 (UTC)
 You are right when p is the probability of success and r is the number of successes until the experiment is stopped. This is because in this case, X is a sum of r geometric random variables with probability p. The mean of each such variable is (1−p)/p, so the mean of their sum is r(1−p)/p.
 However, the article defined r as the number of failures until the experiment is stopped. Then, X is a sum of r geometric random variables with probability 1−p. The mean of each such variable is p/(1−p), so the mean of their sum is pr/(1−p), as in the side bar. Erel Segal (talk) 08:50, 8 December 2015 (UTC)
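The two means in this exchange can be confirmed by brute-force summation of k times the pmf. The sketch below is illustrative only: the function names and the values r = 5, p = 0.3 are arbitrary, and p denotes the success probability in both cases.

```python
from math import comb

def pmf_successes_before_r_failures(k, r, p):
    """Article's convention: k successes before the r-th failure."""
    return comb(k + r - 1, k) * p**k * (1 - p)**r

def pmf_failures_before_r_successes(k, r, p):
    """Alternative convention: k failures before the r-th success."""
    return comb(k + r - 1, k) * (1 - p)**k * p**r

def mean_by_summation(pmf, r, p, terms=500):
    """Truncated sum of k * P(X = k); the tail beyond `terms` is negligible here."""
    return sum(k * pmf(k, r, p) for k in range(terms))

r, p = 5, 0.3
mean_article = mean_by_summation(pmf_successes_before_r_failures, r, p)
mean_alternative = mean_by_summation(pmf_failures_before_r_successes, r, p)

print("article convention:    ", mean_article, "vs p*r/(1-p) =", p * r / (1 - p))
print("alternative convention:", mean_alternative, "vs r*(1-p)/p =", r * (1 - p) / p)
```

Both formulas are right for their own convention, which is why the sidebar and the poster's Mathematica result disagree without either containing an arithmetic error.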
Isn't the CDF wrong?
CDF: , the regularized incomplete beta function.
According to the regularized incomplete beta function .
So, . But we know that .
 I agree that the CDF is wrong. Based on the way p is defined (as the probability of success), I would have expected or . I've been going in circles for a while trying to figure out why values calculated using this (incorrectly) presented CDF don't match R's pnbinom function. Fwiw, the German version of this page presents the CDF as , which matches numerical calculations. — Preceding unsigned comment added by 104.129.196.214 (talk) 14:55, 8 October 2018 (UTC)
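Since the formulas in this thread did not survive, here is one way to test any candidate closed form: sum the pmf directly and compare. The sketch below rests on two assumptions that should be checked against a reference before relying on them: that under the article's convention (k successes before r failures, p the success probability) the CDF equals I_{1−p}(r, k+1), which is the same as 1 − I_p(k+1, r), and that under the convention of R's pnbinom (failures before the rth success) it equals I_p(r, k+1). The regularized incomplete beta function is computed here by a simple Simpson rule, which is a stand-in for a library routine; all names are illustrative.

```python
from math import comb, exp, lgamma

def reg_inc_beta(x, a, b, steps=20_000):
    """I_x(a, b) via composite Simpson's rule; adequate for a, b >= 1."""
    h = x / steps

    def f(t):
        return t**(a - 1) * (1 - t)**(b - 1)

    s = f(0.0) + f(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    beta_ab = exp(lgamma(a) + lgamma(b) - lgamma(a + b))
    return (s * h / 3) / beta_ab

def cdf_article(k, r, p):
    """P(X <= k), X = successes before the r-th failure, by direct summation."""
    return sum(comb(j + r - 1, j) * p**j * (1 - p)**r for j in range(k + 1))

def cdf_r_style(k, r, p):
    """P(X <= k), X = failures before the r-th success (R's pnbinom convention)."""
    return sum(comb(j + r - 1, j) * p**r * (1 - p)**j for j in range(k + 1))

r, p, k = 4, 0.3, 6  # illustrative values
gap_article = abs(cdf_article(k, r, p) - (1 - reg_inc_beta(p, k + 1, r)))
gap_r_style = abs(cdf_r_style(k, r, p) - reg_inc_beta(p, r, k + 1))

print("article convention vs 1 - I_p(k+1, r):", gap_article)
print("R convention vs I_p(r, k+1):          ", gap_r_style)
```

If both gaps are at rounding level, a mismatch with pnbinom would come from mixing the two conventions rather than from an error in either closed form.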
MLE section uses wrong pmf?
Doesn't the MLE section use the wrong pmf (where r is the number of successes instead of failures)? This is inconsistent with the rest of the article. — Preceding unsigned comment added by Nicole.wp (talk • contribs) 23:35, 27 May 2014 (UTC)
I thought so, and changed it. Apologies if I should have discussed first. Clay Spence (talk) 20:45, 8 October 2014 (UTC)
Inconsistent introduction
The case with r real contradicts the initial definition "The number of successes before r failures in independent Bernoulli trials". It should be made clear at the outset that the integer case is a special case.
The reason for putting this case first seems to have been that it's the most common in practice. I doubt this, although it might, perhaps, be the most common in elementary texts (and is all that the Wolfram MathWorld article discusses).
The real case occurs often when the assumptions for the Poisson distribution are not quite met.
For example, fatal road accidents tend to follow the Poisson distribution, while deaths do not because more than one may occur in a single accident. We get the negative binomial if the number of deaths per accident follows a logarithmic series distribution.
Similarly, people can have different rates for nonfatal accidents, so that repeated accidents to the same person are not Poisson distributed.
I remember the second case from my first statistics course in my first year at university (1964/65). Greenwood and Yule (1920) studied accidents to women manufacturing high explosive shells in World War I. Letting the Poisson parameters for individual workers (representing accident proneness) be gamma distributed gives the negative binomial.
(In our course we started with the negative binomial and an accident proneness variable with an arbitrary distribution. This then gives rise to an integral equation for this distribution which the gamma fitted. High school calculus was a prerequisite for B.Sc. and the course discussed the beta and gamma distributions, but we would not have been able to solve the integral equation except by trying all the continuous distributions we knew.)
I suggest starting with a description such as "the negative binomial is a distribution on the integers 0, 1, 2, 3, ... with two parameters often denoted by p and r where p is between 0 and 1 and r is positive."
Then continue with "The special case when r is an integer is known as the Pascal distribution and represents the number of successes in independent Bernoulli trials before obtaining r failures." I also think the different definitions in the Pascal case should be mentioned here and not relegated to a side bar.
Continue "The more general case (also known as the Polya distribution) occurs in at least two ways." Then list those ways as I described above.
We should also mention in the article (but not in the introduction) that alternative parametrizations for the Polya case are often more useful, for example the mean pr/(1−p) together with r, or the odds p/(1−p) together with r.
TerryMre (talk) 00:30, 6 October 2014 (UTC)
Incorrect Fisher information for the convention used in this article
The negative binomial distribution as defined in this article represents the number of successes (k) before r failures. The Fisher information given is correct for the formulation representing the number of failures (k) before r successes. See the following WolframAlpha calculation:
Basically, p and (1−p) need to be switched.
 I agree. I have calculated it myself but I don't know how to change the original.
 Note that this only applies to known r (which usually means that r is an integer). When r and p are both unknown, the Fisher information becomes a matrix which doesn't have a closed form.
 TerryMre (talk) 21:01, 3 December 2014 (UTC)
I also calculated it myself and it is correct. I have edited it now.
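The swap of p and (1−p) can be checked numerically for known r by computing the expected squared score directly. The sketch below assumes the closed forms r/(p(1−p)^2) for the article's convention (k successes before r failures) and r/(p^2(1−p)) for the other convention (k failures before r successes), with p the success probability in both; the convention labels and function name are invented for the example.

```python
from math import comb

def fisher_info_p(r, p, convention, terms=600):
    """E[(d/dp log f(K; r, p))^2] for known r, by truncated summation over k."""
    total = 0.0
    for k in range(terms):
        c = comb(k + r - 1, k)
        if convention == "successes_before_r_failures":   # article's convention
            pmf = c * p**k * (1 - p)**r
            score = k / p - r / (1 - p)
        else:                                              # failures before r successes
            pmf = c * p**r * (1 - p)**k
            score = r / p - k / (1 - p)
        total += pmf * score**2
    return total

r, p = 5, 0.3  # illustrative values
info_article = fisher_info_p(r, p, "successes_before_r_failures")
info_alternative = fisher_info_p(r, p, "failures_before_r_successes")

print("article convention:    ", info_article, "vs r/(p(1-p)^2) =", r / (p * (1 - p)**2))
print("alternative convention:", info_alternative, "vs r/(p^2(1-p)) =", r / (p**2 * (1 - p)))
```

The two values differ exactly by exchanging p and 1−p, in line with the correction described in this section.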
Polya naming convention - bioinformatics
The article head includes the statement, "There is a convention among engineers, climatologists, and others to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case." I'm wondering whether people think it would be useful to note that this convention is not followed in bioinformatics. Specifically, the negative binomial distribution plays a central role in interpreting the counts of Next Generation Sequencing reads as overdispersed Poisson variates. See the DESeq package homepage http://bioconductor.org/packages/release/bioc/html/DESeq.html . Note that this is among the top 5% of downloads in Bioconductor, Bioconductor being something of an industry-standard distribution of analysis packages (here's a citation for that claim: http://www.nature.com/nmeth/journal/v12/n2/abs/nmeth.3252.html ).
I might edit the sentence as follows: "There is a convention in fields such as engineering and climatology (notably excluding bioinformatics) to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case." or perhaps "There is a convention among engineers, climatologists, and others to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case (notably, this convention is not observed in bioinformatics)."
So my question is, is the loss in clarity due to inserting a parenthetical element like that worth the added information? Having written this, I'm not so sure, because it seems like if you know enough to understand (or even wonder about) the relevance of the NBD to bioinformatics, the naming convention issue won't confuse you. And besides, is bioinformatics so important that the exception is worth putting in the head of the article? I mean, it's my field, so it's hard for me to judge its notability. Flies 1 (talk) 00:04, 31 January 2016 (UTC)
Median is hard to find
I realize the median of this distribution is hard to find (papers exist on the bounds), but that fact should at least be mentioned on the page. Barry.carter (talk) 12:26, 24 December 2016 (UTC)
Mixed conventions
I don't have the stats background or time to go through and figure this out, but my stats professor was looking at this article this morning and told us in class today that, as warned about in §Alternative formulations, the article is using subtly different definitions of the NB distribution in different sections and is accordingly {{self-contradictory}}. This has evidently been controversial here in the past, but since there's now a stable set of standard and alternate formulations, it would be great if someone went through to check that everything is internally consistent with the given definition. FourViolas (talk) 20:27, 17 October 2018 (UTC)
 I agree this is a very big issue. What we may need is to rewrite the entire page using a more common formulation. I haven't seen any other source use the formulation that this wiki page uses. Every time it is fixed, someone always comes and breaks it again. The problem is that there are many well-intentioned vandals who will come and make very subtle changes to the page, like simply adding a "1", or swapping "r and k", or whatever. They don't realize that whatever source they are using has a different formulation than what this page uses, and they are unknowingly breaking the equations and making the page internally inconsistent.
 Even though I drafted the alternative formulations section, I'm not confident I could rewrite the page in a different, more common formulation. I suggest perhaps a formulation where X is counting r failures given k successes, even though I don't personally find this formulation the most intuitive. If someone wants to do this, I fully support them. Ajnosek (talk) 19:29, 17 April 2019 (UTC)
 Yes, this ("well-intended vandals") is a fundamental problem of math on Wikipedia. I really do not know how it could be solved. See also here. Boris Tsirelson (talk) 04:14, 18 April 2019 (UTC)
A diversity of conventions has seemingly always haunted the topic of the negative binomial distribution, and people who learned about the NB distribution in one course often don't know that the version they learned isn't the only one. Michael Hardy (talk) 17:18, 18 April 2019 (UTC)
Is the gif for the pmf wrong?
Is the gif for the pmf wrong? As a higher number of successes becomes necessary, the random variable should also increase, and thus the mean too. It will start looking more normal but also shift right, correct? Plus, when r=40, there is no way that k could be so low. That doesn't make sense to me. SwagmanJ (talk) 16:14, 19 June 2019 (UTC)