There are $\binom{1000}{2} = 499500$ pairs of sampled individuals, and each pair has a
$1/1000000$ chance of being the same person. So we can estimate
the "rate of occurrence" of a pair being the same person as
$\binom{1000}{2} \cdot \frac{1}{1000000} = 0.4995 \approx 1/2$.
Therefore, the number of pairs in the sample that are the same person can be
approximated by Pois(1/2).
Then the probability that there is at least one pair in the sample that are the same person is approximately $1 - e^{-1/2} \approx .3935$. This can be verified as a close approximation in R: the probability that every individual in the sample is unique is the last value resulting from the command cumprod(1-(0:999)/1000000), which is .6067. Subtracting this from 1 gives .3933, the actual probability that some two sampled individuals are the same person, which is very close to our Poisson approximation.
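The comparison can be reproduced with a short R sketch. It assumes the setup implied by the command above (1000 individuals sampled with replacement from a population of 1,000,000). The variable names n, N, lambda, p_poisson, p_unique and p_exact are chosen here for illustration and are not from the original.

```r
# Assumed setup, inferred from cumprod(1-(0:999)/1000000):
# n individuals sampled with replacement from a population of size N.
n <- 1000
N <- 1000000

# Poisson approximation: rate = (number of pairs) * P(a given pair matches).
lambda <- choose(n, 2) / N
p_poisson <- 1 - exp(-lambda)

# Exact calculation: P(all sampled individuals are distinct) is the product
# of (1 - k/N) for k = 0, ..., n - 1, i.e. the last value of the cumprod.
p_unique <- prod(1 - (0:(n - 1)) / N)
p_exact <- 1 - p_unique

print(c(lambda = lambda, poisson = p_poisson, exact = p_exact))
```

Using prod returns only the final product, which is the single value needed here; cumprod additionally gives the probability of no repeats for every smaller sample size. The sketch uses the unrounded rate 0.4995 instead of 1/2, so its Poisson value differs slightly in the fourth decimal place from the one computed with rate 1/2.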