This presentation discusses the performance of sampling plans
for various distributions of organisms.
My name is Marcel Zwietering.
I am professor of food microbiology at Wageningen University, and I have been a member of ICMSF since 2005.
For a homogeneous contamination, we see here the probability of accepting a batch for
various numbers of samples and for various defective rates.
With higher numbers of samples, batches with a higher defective rate are generally rejected.
Batches free of the pathogen are accepted, so that is all good.
But batches with a low defective rate have a high probability of being accepted, even with
large numbers of samples.
You can also present this table as a graph: here you see the probability of accepting
a lot at various defective rates and for various numbers of samples. For a given defective rate,
if you take only 1 sample (the yellow line), 5 samples (the violet line), or 30 samples (the grey line),
the acceptance probability goes down considerably.
In this case, 5 times more samples gives a factor of 7.7 lower acceptance probability.
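For a plan that rejects on any positive (c = 0), this is a one-line calculation: the probability of accepting a batch is (1 − p)^n. A minimal Python sketch; the 40% defective rate is an assumed illustration, which happens to reproduce the factor of 7.7 just mentioned:

```python
# Acceptance probability for a homogeneous contamination, c = 0:
# the batch is accepted only if all n samples are negative.
def p_accept(p_defective, n):
    return (1.0 - p_defective) ** n

p = 0.40  # assumed defective rate, for illustration only
for n in (1, 5, 30):
    print(f"n = {n:2d}: P(accept) = {p_accept(p, n):.3g}")

print(p_accept(p, 1) / p_accept(p, 5))  # 5x more samples: factor ~7.7 lower
```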
And this also shows a misconception:
"Using a realistic sampling plan, it is possible to test for absence of a pathogen in a batch of food".
You only test for absence in the samples, and unless you test the entire batch, you can never
prove absence in the batch.
At the very left part of the graph, the organism is present but at very low
levels, and even sampling plans with large numbers of samples still let this go undetected.
We have also seen that sampling plans can include a number c:
the number of samples that are allowed to be positive.
We can also derive an operating characteristic curve for a sampling plan with 10 samples
and determine what the effect of the c value is.
This type of sampling plan is not used for serious or severe pathogens, since we do not
want them in the samples at all, but it can be used, for example, for hygiene indicators.
In this case, again, the operating characteristic or OC curve depends on the proportion defective
and on the number of samples, but also on the c value,
and these curves can be determined with the binomial distribution, as sketched below.
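As a sketch of that calculation for n = 10 (the use of scipy here is my choice, not part of the presentation): the probability of acceptance is the binomial probability of at most c positive samples.

```python
from scipy.stats import binom

def oc_value(p_defective, n=10, c=0):
    # P(accept) = P(at most c of n samples are positive)
    return binom.cdf(c, n, p_defective)

for c in (0, 1, 2):
    print(f"c = {c}: P(accept | 10% defective) = {oc_value(0.10, c=c):.3f}")
# A larger c shifts the OC curve to the right: more lots are accepted.
```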
And this leads to another misconception, namely that
"current sampling plans assume that microorganisms follow the binomial distribution".
This is not what is assumed.
The microorganisms here are homogeneously, that is randomly, distributed.
But the result of the 10 samples follows the binomial distribution, since we take 10 samples,
each with a certain probability of being defective, and we allow c of them to be positive; that
process simply follows binomial theory. So it is not the microorganisms that are assumed
to be binomially distributed, but the outcome of the stochastic sampling process.
And that leads directly to yet another misconception: that sampling plans assume
microorganisms are homogeneously distributed in a batch.
Until now indeed I have made that assumption for the first step of the explanation, but the
explanation goes further.
In reality, the performance of sampling plans is often investigated for non-homogeneous distributions.
If we could assume a homogeneous distribution, we would not take 30 samples of 10 g each;
rather, we could simply take one 300 g sample, since the detection probability would be equal
for a homogeneous distribution.
That would be much easier.
But since we know that organisms are often NOT homogeneously distributed,
we take more, smaller samples.
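A quick sketch of why the two schemes are equivalent under homogeneity, assuming a random (Poisson) contamination and an illustrative concentration:

```python
import math

conc = 0.01  # assumed concentration in organisms per g (illustration only)

# P(no organism in w grams) under a random (Poisson) contamination = exp(-conc * w)
p_accept_30x10g = math.exp(-conc * 10) ** 30  # 30 samples of 10 g, all negative
p_accept_1x300g = math.exp(-conc * 300)       # one 300 g sample, negative

print(p_accept_30x10g, p_accept_1x300g)  # both exp(-3), about 0.0498
```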
Let's look at this now for a heterogeneous high-level contamination
that can be quantified or counted.
Suppose we have a distribution of microorganisms in a batch of food, described by a probability
distribution of the concentration.
Look, for example, at the red curve here, which has a certain mean log and spread. We
also have a microbiological limit, which now lies exactly in the middle of the red curve, at 2 logs;
think, for example, of Listeria monocytogenes.
If you took one sample from that red curve, you would have a 50% probability that
it is below the limit and a 50% probability that it is above.
Therefore, in the right-hand graph of the probability of a defective sample unit, we place
a point at 50% for the red curve's mean log concentration of 2.
If, on the other hand, we have the blue distribution of the microorganisms with a mean log of 1.5,
the probability that a sample will be above m is lower, at about 10%; for
the pink curve it is higher, at about 90%.
These points we then also place in the right graph.
This then gives the probability that ONE sample is above the m value as a function
of the mean log concentration in the batch.
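A minimal sketch of this step; the standard deviation of 0.4 log units is my assumption, chosen so that the 10% and 90% points mentioned above are reproduced:

```python
from scipy.stats import norm

m = 2.0      # microbiological limit, log10 cfu/g
sigma = 0.4  # assumed spread of the log10 concentrations

# Probability that ONE sample from N(mean_log, sigma) exceeds the limit m
for mean_log in (1.5, 2.0, 2.5):
    p_defective = norm.sf(m, loc=mean_log, scale=sigma)  # upper tail
    print(f"mean log = {mean_log}: P(defective) = {p_defective:.2f}")
# about 0.11, 0.50 and 0.89: the 10%, 50% and 90% points described above
```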
But again, in the sampling plan we take several samples, so we can use this curve, now
placed on the left, to determine the OC curve for a higher number of samples, again using
the binomial function with our n and our c value.
Then we can see which concentration levels will most probably be tested
as acceptable and which will most probably result in rejection.
So, the procedure consists of three steps: first, the description of the distribution
of the concentration; second, the probability that ONE sample from that distribution results
in a defective sample; and third, the acceptance probability of the sampling plan, the OC curve.
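Putting the three steps together in a short sketch (n, c, m and sigma below are illustrative assumptions, not values from the presentation):

```python
from scipy.stats import binom, norm

def oc_quantitative(mean_log, sigma=0.4, m=2.0, n=10, c=0):
    # Step 1: log10 concentrations are N(mean_log, sigma)
    # Step 2: P(ONE sample defective) = P(sample result > m)
    p_defective = norm.sf(m, loc=mean_log, scale=sigma)
    # Step 3: P(accept) = P(at most c of n samples defective)
    return binom.cdf(c, n, p_defective)

for mu in (1.0, 1.5, 2.0):
    print(f"mean log = {mu}: P(accept) = {oc_quantitative(mu):.3f}")
```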
We can now determine the OC curve for various
distributions of microorganisms.
If we have, for example, batches with equal mean log value but different spread we can
determine the effect on the OC-curve.
The yellow curve, for a process that is very well under control, results in a very steep OC curve;
conversely, for the brown curve, which has a very wide distribution, we already reject batches
at much lower mean log concentrations.
You have a lower probability of accepting such a batch, which is a good thing,
since it is less under control.
All these calculations are based on the mean log concentration,
which equals the log of the geometric mean concentration.
But we can also determine what this looks like if we plot it as a function of the log
arithmetic mean concentration.
The geometric mean of all the curves on the left is equal.
But the overall number of organisms in the brown curve is much higher, and that is better
represented by the arithmetic mean, which is much higher for the brown curve
than for the red or the yellow curve.
If we first take the mean and then take the logarithm, we get another type of OC curve,
and the performance for the various distributions becomes much closer, showing less effect of
the spread of the distribution.
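The conversion between the two scales is a one-line formula; a sketch, assuming a base-10 lognormal distribution:

```python
import math

def log10_arithmetic_mean(mean_log, sigma):
    # log10 of the arithmetic mean of a distribution whose
    # log10 values are N(mean_log, sigma)
    return mean_log + sigma ** 2 * math.log(10) / 2

# Equal geometric means (mean log = 2) but different spreads:
for sigma in (0.4, 0.8, 1.2):
    print(f"sigma = {sigma}: log10(arithmetic mean) = "
          f"{log10_arithmetic_mean(2.0, sigma):.2f}")
# The wider the distribution, the higher the arithmetic mean,
# which is why the OC curves shift when replotted on this scale.
```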
These are graphs for quantitative sampling plans.
But let's now also look at sampling plans where the organisms are again not homogeneously distributed
and are present at very low levels -- situation c.
This is relevant for presence-absence testing for Salmonella or Cronobacter.
Here we must consider not only the distribution of the organisms, which is lognormal,
but also the probability that, given a certain concentration of the pathogen, there is actually
an organism present in the sample unit.
And this last part is a Poisson process.
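A sketch of that Poisson step (the 25 g sample size is an illustrative assumption): given a concentration C in cfu/g, the probability that a w-gram sample actually contains at least one organism is 1 − exp(−C·w).

```python
import math

def p_at_least_one(conc_per_g, sample_g=25.0):
    # Poisson probability that the sample contains >= 1 organism
    return 1.0 - math.exp(-conc_per_g * sample_g)

# At very low levels, most samples contain no organism at all:
for log_conc in (-3.0, -2.0, -1.0):  # log10 cfu/g
    print(f"log conc = {log_conc}: P(>= 1 organism in 25 g) = "
          f"{p_at_least_one(10 ** log_conc):.3f}")
```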
In that case, we can again calculate the OC curve for various batches with varying spread
of the contamination and we see that the performance on the arithmetic scale is even closer for
the various standard deviations than for the quantitative case.
And that is very good news, because often we do not know this standard deviation.
Often a value of 0.8 is used, with 0.4 for well-mixed food products
and 1.2 for very heterogeneous material. But these are all estimates,
and we see here that the choice does not matter that much for the OC curve.
So, I have explained three relevant stochastic phenomena.
First, the actual spatial distribution of the organism in the batch of food, governed
by microbial processes or by processing, and that results in a specific statistical distribution.
The second is the stochastic process of taking one sample and whether or not that sample is defective.
The third part is then the acceptance of a lot based on taking n samples, of which c may be positive,
each sample having a probability of being defective; this can be calculated with the binomial equation.
This makes the statistics a little bit complex.
But in reality these three phenomena exist and need to be taken into account.
Let's look at an example. When the organisms are lognormally
distributed in the batch of food, and the whole batch is sampled and tested, the frequency
distribution of the number of organisms in the batch yields a normal distribution for
the log number of organisms.
If we then take a sample, which is a Poisson process, the probability of a defective sample
follows the Poisson(lognormal) distribution.
But if I then use a sampling plan with 30 samples, of which zero, two, or five are allowed
to be positive, I can calculate the probability of acceptance of the sampling plan with the
binomial(Poisson(lognormal)) distribution.
The lognormal describes the actual distribution, the Poisson describes taking a sample, and
the binomial describes the overall sampling plan.
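A compact sketch of the full calculation, numerically averaging the Poisson detection probability over the lognormal distribution of concentrations (all parameter values are illustrative assumptions):

```python
import math
from scipy.integrate import quad
from scipy.stats import binom, norm

def oc_presence_absence(mean_log, sigma=0.8, sample_g=25.0, n=30, c=0):
    # P(ONE sample defective): average the Poisson detection probability
    # 1 - exp(-10^x * w) over the normal distribution of log10 concentrations x
    def integrand(x):
        return (1 - math.exp(-(10 ** x) * sample_g)) * norm.pdf(x, mean_log, sigma)
    p_defective, _ = quad(integrand, mean_log - 6 * sigma, mean_log + 6 * sigma)
    # binomial(Poisson(lognormal)): at most c of n samples positive
    return binom.cdf(c, n, p_defective)

for mu in (-4.0, -3.0, -2.0):  # mean log10 cfu/g
    print(f"mean log = {mu}: P(accept) = {oc_presence_absence(mu):.3f}")
```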
So, these three separate aspects are described by their appropriate statistical function.
While this is complicated, fortunately tools can help you calculate these probabilities,
though it is of course good to understand the basis of these calculations.
One of these tools is the ICMSF sampling plan tool, that can be found on the ICMSF website,
www.icmsf.org.
Another tool was developed by FAO/WHO, and yet another by the University of
Cordoba in the Baseline project.
And all these tools have similar approaches to handle the phenomena I have described today.
When you input your data in the ICMSF spreadsheet, it will automatically produce an interpretation
sentence about the performance of the sampling plan.
There are sheets for presence-absence tests and for quantitative tests, both for 2-class and
3-class sampling plans.
Furthermore, for presence-absence tests, it is also possible to investigate the effect
of specificity and sensitivity on the performance of the sampling plans.
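As a sketch of how sensitivity and specificity enter the calculation (the standard adjustment is shown below; the spreadsheet's exact implementation may differ): an imperfect method changes the probability that a sample tests positive.

```python
from scipy.stats import binom

def p_test_positive(p_defective, sensitivity=0.95, specificity=0.99):
    # True positives plus false positives (parameter values assumed)
    return sensitivity * p_defective + (1 - specificity) * (1 - p_defective)

# Effect on a 30-sample, c = 0 plan at 5% truly defective units:
print(binom.cdf(0, 30, 0.05))                   # perfect test
print(binom.cdf(0, 30, p_test_positive(0.05)))  # imperfect test
```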
To summarise, sampling and testing play only a limited role in the control of food safety,
but they have a role in verification.
The distribution of organisms can have an effect on the performance of sampling plans.
However, for log normally distributed organisms, use of the arithmetic mean of the number of
organisms to evaluate the performance of the sampling plan
reduces the effect of the spread of the distribution.
Tools exist to perform the calculations, for example at the ICMSF website.
But understanding of the underlying assumptions is necessary to intelligently use these tools.
Further information and in depth elaborations can also be found in the ICMSF Books 7 and 8.
Thank you for your attention.