Data Sampling

At times you may want to sample your data feeds, trading some data accuracy for lower costs and faster query speeds. Sampling means sending only a percentage of your data, then extrapolating the values to get an estimate. The records that are sent should be chosen at random so that the results are not skewed or biased. Please note that tracking uniques is not recommended if you choose to sample.

If you choose to sample, you should thereafter treat metrics as indicative of trends, not as a full representation of the truth. In particular, sampling is not recommended if you are dealing with money or with key metrics that your own customers care about (impressions, conversions, etc.).
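To make the trade-off concrete, here is a minimal Python sketch of random sampling followed by extrapolation. The function names, the seed, and the 100,000 stand-in events are assumptions for illustration only; this is not how Metamarkets implements sampling.

```python
import random

def sample(records, rate, rng):
    """Keep each record independently with probability `rate`."""
    return [r for r in records if rng.random() < rate]

def estimate_total(sampled_count, rate):
    """Extrapolate a sampled count back to an estimated full count."""
    return sampled_count / rate

rng = random.Random(7)          # seeded only so the sketch is repeatable
records = list(range(100_000))  # stand-ins for 100,000 real events
kept = sample(records, 0.01, rng)
estimate = estimate_total(len(kept), 0.01)

# The estimate lands near the true total of 100,000 but is rarely exact.
print(len(kept), estimate)
```

Because each record is kept or dropped at random, the extrapolated total varies from run to run, which is why sampled metrics are best read as trends.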

How to Sample

There are two methods Metamarkets can support to start sampling your data stream. The first, and preferred, method is for you to include a "Sampling Factor" in each data record that signifies the sampling rate used. An example of this for a 1% stream would look like:

     {
       "timestamp": "2016-04-11T01:02:03Z",
       "id": 1234,
       "app_id": "54321",
       "app_name": "iOS Test App",
       "impression_count": 1,
       "sampling_factor": 0.01
     }

From this point, Metamarkets will extrapolate the sampled values using that factor in the query engine. On the dashboard, the formula (hidden from you) would be [impressions (sampled) = impression_count * (1 / sampling_factor)]. For the 1% record above, that is 1 * (1 / 0.01) = 100 estimated impressions. This method allows you to adjust the ratio at any time without needing assistance from Metamarkets.
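As a rough illustration of that formula, the Python sketch below applies it record by record. The `extrapolate` helper is hypothetical and only mirrors the formula quoted above; it is not the query engine's actual code. The field names come from the sample record.

```python
def extrapolate(records, metric="impression_count"):
    """Estimate the unsampled total: sum of metric * (1 / sampling_factor)."""
    return sum(r[metric] * (1 / r["sampling_factor"]) for r in records)

# Two records sent while sampling at 1%, one sent after switching to 50%.
records = [
    {"impression_count": 1, "sampling_factor": 0.01},
    {"impression_count": 1, "sampling_factor": 0.01},
    {"impression_count": 1, "sampling_factor": 0.5},
]

# Each 1% record stands for about 100 impressions, the 50% record for 2.
estimate = extrapolate(records)
print(estimate)
```

Because every record carries its own factor, records sampled at different rates can be mixed in one stream and still be weighted correctly, which is what makes it possible to change the rate without a coordinated pipeline update.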

If you're unable to do this, the other method is to coordinate a change with your Data Engineer. This involves scheduling a time for the change, shutting off the data processing, having Metamarkets push a code update with the new sampling ratio, and restarting the data pipeline. Both methods work, but the second is more work for everyone involved.

What to Sample

In general, Metamarkets does not recommend sampling impression or post-impression records, or any records that require a join or are tied to revenue.

DSPs are the most common case where we see sampling used. We recommend sampling bid requests that did not result in a response, which will give you an estimate of available inventory.

Exchanges can sample Auctions that have no winning bid. This will allow you to estimate available inventory, fill rates, and buyer win rates. Metamarkets does not recommend sampling completed Auctions, as it becomes difficult to properly account for Impression and Click events. For example, let's say we have 100 Requests & Responses, 20 Impressions, and 20 Clicks. If the Requests are randomly sampled at a rate of 10%, it is entirely possible (and, for any individual event, very likely) that the Impressions & Clicks passed will not have a Request to join to and will be discarded. In the worst case, the dashboard would show a Fill Rate of 0% instead of the expected ~20%. To work around this issue, you may need to develop custom logic that partitions your data and samples each partition at a different rate, such as not sampling impressions, sampling bid losses at one rate, and sampling no-bids at another rate.
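One possible shape for that custom partitioning logic is sketched below in Python. The record `type` values and the rates in `RATES` are hypothetical placeholders, not Metamarkets recommendations; the only idea taken from the text is that each partition gets its own rate and each kept record is tagged with the rate that was applied to it.

```python
import random

# Hypothetical per-partition rates; real values depend on your volumes.
RATES = {"impression": 1.0, "click": 1.0, "bid_loss": 0.10, "no_bid": 0.01}

def partitioned_sample(records, rates=RATES, rng=None):
    """Sample each record at the rate for its type and tag it with that rate."""
    rng = rng or random.Random()
    kept = []
    for record in records:
        rate = rates[record["type"]]
        # random() is in [0, 1), so a rate of 1.0 keeps every record.
        if rng.random() < rate:
            kept.append({**record, "sampling_factor": rate})
    return kept

def estimated_count(records, record_type):
    """Extrapolate how many records of one type existed before sampling."""
    return sum(1 / r["sampling_factor"] for r in records if r["type"] == record_type)
```

With rates like these, impressions and clicks are never dropped, so they are always available for joins and revenue reporting, while the high-volume bid losses and no-bids are thinned out and later scaled back up by their own factors.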