Originally Posted by smurgerburger
I think the proper solution may involve using stat software to fit a curve to the data or something like that.

What I would try is to use the most stable data point, the % of tickets that never dupe, and work backwards from that.

So if 70% of the (drawn and returned) tickets never duped over 100 trials, that would imply (ignoring variance) that the chance of not duping a given ticket in 100 draws is 70%.

0.7^(1/100) would be the chance of not duping in 1 draw, which is 0.9964. Then 1 - 0.9964 gives the chance of duping in 1 draw = 0.00356, which is about 279:1, for roughly 280 total tickets.

This approach ignores all the extra information you have and is still going to be subject to variance (because that 70% figure is subject to variance).

I think the way you use the extra information of double dupes and triple dupes etc to mitigate variance is curve fitting. But idk.
Yeah, this might be as good an estimate as anything. It would definitely be the most straightforward.
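Spelled out in Python, that back-of-envelope calc is just this (the 0.70 and 100 are the example figures from the quote; the variable names are mine):

[code]
# Back-of-envelope estimate from the quoted post:
# 70% of drawn tickets never duped over 100 draws.
p_no_dupe_100 = 0.70                            # observed fraction that never duped
n_draws = 100

p_no_dupe_1 = p_no_dupe_100 ** (1 / n_draws)    # ~0.9964, chance of not duping in one draw
p_dupe_1 = 1 - p_no_dupe_1                      # ~0.00356, chance of duping in one draw
pool_estimate = 1 / p_dupe_1                    # ~281, i.e. the ~280 ballpark from the quote

print(f"chance of duping per draw: {p_dupe_1:.5f}")
print(f"estimated pool size: {pool_estimate:.0f}")
[/code]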

My solution was just to sim a ton of them and then find the ones with the same results. So if I simmed with 50, I'd print out results like 71,15,8,6 for each trial (the number of tickets that never duped, the number with one duplicate, the number with three of the same, and so on), then just count the exact matches of each simmed pool size with our one sample. It would at least remove the variance on the testing side, but there will always be variance in the observed duplicates.
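Roughly what I mean, as a Python sketch (the function names, the 200-400 candidate range, and the 280 "true" size I use to fake an observed sample are all just placeholders for illustration):

[code]
import random
from collections import Counter

def dupe_profile(pool_size, n_draws):
    """Summarize one trial as (tickets seen once, seen twice, seen 3x, ...)."""
    draws = [random.randrange(pool_size) for _ in range(n_draws)]
    per_ticket = Counter(draws)                     # how many times each ticket came up
    by_multiplicity = Counter(per_ticket.values())  # how many tickets came up k times
    top = max(by_multiplicity)
    return tuple(by_multiplicity.get(k, 0) for k in range(1, top + 1))

def match_counts(observed, candidate_sizes, n_draws, trials=5_000):
    """For each candidate pool size, count trials whose profile exactly matches the sample."""
    return {size: sum(dupe_profile(size, n_draws) == observed for _ in range(trials))
            for size in candidate_sizes}

# Fake an "observed" sample from a known size just to show the idea;
# in practice you'd plug in the real sample's profile here.
random.seed(1)
observed = dupe_profile(pool_size=280, n_draws=100)
print(observed)
print(match_counts(observed, candidate_sizes=range(200, 401, 20), n_draws=100))
[/code]

Whichever candidate size racks up the most exact matches is the pick, though exact matching throws away near-misses, which I assume is where the curve fitting would do better.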

I'd have to think more about your method. It sorta makes sense, but I'm not sure how good an approximation it would be. However, writing a sim to test the one size estimated by the approach above would actually be quite easy. Hmm, maybe when I have an hour or so free.
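It really would just be something like this (assuming the ~280 estimate and 100 draws from above, and counting the never-duped fraction out of the 100 draws the same way the 70% figure is used; the 5-point window is arbitrary):

[code]
import random
from collections import Counter

def pct_never_duped(pool_size, n_draws):
    """Fraction of the draws that hit a ticket never drawn again in that trial."""
    counts = Counter(random.randrange(pool_size) for _ in range(n_draws))
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / n_draws

trials = 10_000
results = [pct_never_duped(280, 100) for _ in range(trials)]
print("mean never-duped fraction:", sum(results) / trials)   # ~0.70 if 280 is about right
print("trials within 5 points of 70%:",
      sum(0.65 <= r <= 0.75 for r in results) / trials)
[/code]

If 70% never duped sits well inside that distribution, a ~280 pool is at least consistent with the sample.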

The curve-fitting approach also seems correct to me, but that's over my head. Maybe fresh out of college I'd have had some clue...