Human rights stats, part 1

I follow @simplystats on Twitter, and on March 1 they had a post that linked to an article in Foreign Policy about a guy who has the coolest job in applied stats. He works here.

The original piece described a quick algorithm for estimating the number of human rights violations, using a technique first devised for counting fish in a pond. The gist of it is this: catch and release fish over two days. Tag the fish caught on the first day. Count each day's catch and the number of fish caught on both days; that is the overlap. To estimate the number of fish in the pond, multiply the two days' catches and divide the product by the overlap.
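
Written out as a formula, with my own labels for the quantities (the notation is an assumption of this sketch, not something taken from the article): n_1 and n_2 are the two days' catches, m is the overlap, and \hat{N} is the estimate. This two-sample form is commonly known as the Lincoln-Petersen estimator.

```latex
\hat{N} = \frac{n_1 \, n_2}{m}, \qquad m > 0
```

The logic is that the tagged share of the second catch, m / n_2, should be about the same as the tagged share of the whole pond, n_1 / N. For a made-up pond: catching 50 fish on day 1 and 40 on day 2, 10 of them already tagged, gives an estimate of 50 × 40 / 10 = 200 fish.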

I had a data set of insurance claims in Stata's memory at the time of my reading, with observations uniquely identified by a variable named claim_id.

I decided to use it as a model of a pond with as many fish in it as there are observations in my data set, so I wrote a little fishing program. It takes one argument: a round upper bound on the number of fish I might catch in a day. I'll call it n. It can be 100, or it can be 1,000. Here:


// try MSE
capture prog drop guessObservations
program guessObservations

args n // upper bound of a day's catch.

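qui {
   // Each day's catch is a random integer from 0 to n-1.
   local day1fishcount=int(runiform()*`n')
   local day2fishcount=int(runiform()*`n')

   // Go fishing twice: draw a random sample without replacement
   // and save the IDs of the fish caught that day.
   forvalues i=1/2 {
      preserve
      tempfile day`i'fishlist
      sample `day`i'fishcount', count
      keep claim_id
      save "`day`i'fishlist'", replace
      restore
   }

   // The overlap is the number of IDs that appear in both catches.
   preserve
   drop _all
   use "`day1fishlist'"
   merge 1:1 claim_id using "`day2fishlist'"
   count if _merge==3
   local overlap=r(N)
   restore

   // Estimate: product of the two catches divided by the overlap.
   // With zero overlap the ratio is undefined, so the bare product is reported.
   local totalfish=`day1fishcount'*`day2fishcount'
   if `overlap'>0 {
      local totalfish=`totalfish'/`overlap'
   }
   // True population size, for comparison.
   count
   local truect=r(N)
}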

local fmt _col(30) %10.0fc
di ""
di "Fish caught on day 1:" `fmt' `day1fishcount'
di "Fish caught on day 2:" `fmt' `day2fishcount'
di "Overlap:"              `fmt' `overlap'
di "Estimate:"             `fmt' `totalfish'
di "True count:"           `fmt' `truect'

end
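
To try the program without an insurance data set, a stand-in pond can be generated first. This is only a sketch: the seed and the 150,000 observations are arbitrary choices of mine, not the original data, and the only thing the program needs is a variable named claim_id that uniquely identifies each observation. It assumes guessObservations has already been defined as above and that the lines are run from a do-file.

```stata
// Stand-in pond (assumed values, not the original claims data)
clear
set seed 12345                 // arbitrary seed, only for reproducibility
set obs 150000                 // hypothetical pond size
generate long claim_id = _n    // one unique ID per "fish"

// Go fishing with an upper bound of 10,000 fish per day
guessObservations 10000
```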

My data set has some 150,000 observations. Choosing a small n, say guessObservations 100, all but guarantees an overlap of zero, and in that case the two catches multiplied together (at most 99 × 99 = 9,801) won't come close to the true size of the population. This is a technique for counting hungry fish in a small pond, not in an ocean. The daily catch has to be large enough relative to the population to produce a decent overlap.

Setting n=1,000 keeps it small enough relative to the total population that it's still possible to have zero overlap, but n is now large enough to overshoot wildly in that case. If I catch 900 fish each day with zero overlap, I will guess that there are 810,000 fish there. However, an overlap as small as 5 will get me pretty close to the true population.
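
A rough way to see why n matters so much is to look at the overlap I should expect. This is a sketch under two assumptions of mine: each day's catch is a simple random sample drawn without replacement (which is what Stata's sample command does), and the population N is rounded to 150,000. Here n_1 and n_2 are the two catches and m is the overlap.

```latex
\begin{align*}
E[m] &= \frac{n_1 \, n_2}{N} \\
n_1 = n_2 = 50:  \quad E[m] &= \frac{2{,}500}{150{,}000} \approx 0.017 \\
n_1 = n_2 = 500: \quad E[m] &= \frac{250{,}000}{150{,}000} \approx 1.7 \\
n_1 = n_2 = 900: \quad E[m] &= \frac{810{,}000}{150{,}000} = 5.4
\end{align*}
```

So with n=100, where a typical catch is about 50, the overlap is almost always zero. With n=1,000 a typical pair of catches shares one or two fish. Two 900-fish days are expected to share about five, and at that point 900 × 900 / 5 = 162,000 is already in the right neighborhood.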

Setting n=10,000 performs much better. I may still have a day when the fish won't bite, and get this:


. guessObservations 10000

Fish caught on day 1:                49
Fish caught on day 2:             4,182
Overlap:                              3
Estimate:                        68,306
True count:                     157,638

But with any luck, I will get something like this:


. guessObservations 10000

Fish caught on day 1:             9,662
Fish caught on day 2:             3,220
Overlap:                            220
Estimate:                       141,417
True count:                     157,638
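
Both estimates can be reproduced by hand from the printed catches and overlaps. Writing \hat{N} for the estimate (my notation), with the ratios to the true count added as my own rough comparison:

```latex
\begin{align*}
\hat{N}_{\text{unlucky}} &= \frac{49 \times 4{,}182}{3} = 68{,}306 \approx 0.43 \times 157{,}638 \\
\hat{N}_{\text{lucky}} &= \frac{9{,}662 \times 3{,}220}{220} \approx 141{,}417 \approx 0.90 \times 157{,}638
\end{align*}
```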

The larger n, the larger the overlap, and the better the precision. That makes sense: in the limit, where each day's catch is the entire pond, the true number times itself divided by itself yields the true number.

But does n have to be very large relative to the size of the population? And does my guess -- or the uncertainty surrounding it -- depend on what probability distribution function I assume for the daily catch? Next time I'll be doing some simulations.
