# Sample size for incidence rate

Step 3: Participation rate

$n''' = n'' \times (100 + (1 - pr))$

- Description:
  - n''' = required sample size correcting for participation rate
  - n'' = previously calculated sample size
  - pr = participation rate
- In most prevalence TB disease surveys a participation rate of 85% seems reasonable ...

Sample size planning aims to select a sufficient number of subjects to keep α and β low without making the study too expensive or difficult. Within each study, the difference between the treatment group and the control group is the sample estimate of the effect size. Some of the magnitude of this discrepancy might be due to a difference between incidence and prevalence, for example if this is a long-term condition, the value of 0.1% for the general population that you cite is truly an incidence rate (say per 100,000 people per year), and the 10% value you have estimated from your retrospective data is a prevalence. A good maximum sample size is usually around 10% of the population, as long as this does not exceed 1000. But for the results to be interpretable in terms of the general population, you would have to document that both the disease cases and the non-disease cases in your "source population" are representative of what's in the general population.

Incidence rate of disease: $\text{incidence rate} = \dfrac{n}{\text{total population at risk}} \times 10^{k}$, where n is the total number of new cases of the specific disease and $10^{k}$ is the reporting base (for example, $10^{5}$ for a rate per 100,000 people). Since the population size is always larger than the sample size, then the sample statistic. Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In this paper we derive methods for determining sample sizes for cross-sectional surveys to estimate incidence with sufficient precision. A random sample is one in which every member of a population has an equal chance of being selected.
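The incidence rate formula above can be sketched in a few lines of Python. The function name `incidence_rate`, the `exponent` parameter and the example counts are hypothetical, chosen only for illustration; `exponent` selects the reporting base (5 gives a rate per 100,000 people at risk).

```python
def incidence_rate(new_cases: int, population_at_risk: int, exponent: int = 5) -> float:
    """Return new cases per 10**exponent people at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    if new_cases < 0:
        raise ValueError("new_cases cannot be negative")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return new_cases * 10 ** exponent / population_at_risk


# Hypothetical numbers: 30 new cases among 30,000 people at risk in one year.
rate = incidence_rate(30, 30_000, exponent=5)
print(rate)  # 100.0, i.e. 100 new cases per 100,000 people at risk
```

The same counts expressed as a plain proportion are 0.1%, which shows why the reporting base has to be stated alongside any incidence figure.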
I also saw that I had missed that the retrospective rate cited by the OP was probably a prevalence rather than an incidence. So if you wish to make any statements about the general population rather than just the "source population" that underlies your retrospective data, you must take the difference between the populations into account. In the data I have for the retrospective research, however, it is around 10%, due to the way the data for the research was collected. In general, capital letters refer to population attributes (i.e., parameters), and lower-case letters refer to sample attributes (i.e., statistics). I suspect that what you have estimated from your retrospective data is "prevalence," not "incidence." My response was mostly based on my experience/frustration with working on retrospective clinical databases, which has occupied much of my attention for several years. If that group of patients is your source population, then you should use the characteristics of those patients as your guide to study design. The sample size (n) can be calculated using the following formula: $n = z^2 \, p(1 - p) / e^2$, where z = 1.645 for a confidence level of 90% (α = 0.10), p = proportion (expressed as a decimal), and e = margin of error; for example, z = 1.645, p = 0.5, e = 0.04. I can get a fixed (quite low) number of samples, which practically forces me to oversample the disease cases. Understanding HIV incidence, the rate at which new infections occur in populations, is critical for tracking and surveillance of the epidemic. As stated previously, we normally approximate 1.96 by 2. As the above paper notes on page 395: ... some prevalence studies may involve sampling on exposure status, just as some incidence studies may involve such sampling. Among other things, you then need to see whether there have been changes over time in incidence/prevalence or in the characteristics/risk factors of the retrospective-patient "source population."
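As a minimal sketch, the sample size formula for a proportion quoted above can be evaluated with the example values z = 1.645, p = 0.5 and e = 0.04. The function name `sample_size_proportion` is hypothetical, and rounding up to the next whole subject is an assumption of this sketch rather than something the text specifies.

```python
import math


def sample_size_proportion(z: float, p: float, e: float) -> int:
    """Sample size n = z^2 * p * (1 - p) / e^2, rounded up (assumed convention)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if e <= 0:
        raise ValueError("e (margin of error) must be positive")
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# Values quoted in the text: 90% confidence, p = 0.5, margin of error 0.04.
n = sample_size_proportion(z=1.645, p=0.5, e=0.04)
print(n)  # 423 (the unrounded value is about 422.8)
```

Because p(1 - p) peaks at p = 0.5, using p = 0.5 gives the most conservative (largest) sample size when the true proportion is unknown.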
You cite a 100-fold difference in "incidence" between the population from which you are sampling and the general population. We therefore want $\sqrt{\dfrac{p_1(1 - p_1) + p_2(1 - p_2)}{n}} \approx 0.02/2 = 0.01$. To work out the required sample size, we usually take $p_1 = p_2 =$ the value closer to 0.5, since this would give rise to a larger standard error and therefore a larger sample size. Enrolling too many patients can be unnecessarily costly or time-consuming. Given the apparently large difference in prevalence/incidence that you note and my experience with analysis of retrospective clinical data, my guess is that the characteristics of the non-disease cases in your data will be a good deal different from the general population and that you will have to take that difference into account in your study. Sample size is a frequently used term in statistics and market research, and one that inevitably comes up whenever you're surveying a large population of respondents. In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population. Before a study is conducted, investigators need to determine how many subjects should be included. When comparing groups in your data, you can have either independent or dependent samples. The problem you face, as noted in a comment on your question, is extrapolation to the general population. The uncertainty in a given random sample (namely, that the proportion estimate p̂ is expected to be a good, but not perfect, approximation for the true proportion p) can be summarized by saying that the estimate p̂ is approximately normally distributed with mean p and variance p(1-p)/n. Maybe it would be wiser to approach it as a case-control study and aim for an odds ratio instead of a risk ratio.
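The standard-error condition for the difference of two proportions can be solved for n directly. The sketch below assumes the same n in each group, as the formula implies, and takes the margin of 0.02 and the multiplier of 2 (approximating 1.96) from the text. The function name `n_per_group` and the small tolerance used before rounding up are choices made for this illustration.

```python
import math


def n_per_group(p1: float, p2: float, margin: float, z: float = 2.0) -> int:
    """Solve sqrt((p1(1-p1) + p2(1-p2)) / n) = margin / z for n, rounded up."""
    if margin <= 0 or z <= 0:
        raise ValueError("margin and z must be positive")
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n_exact = variance_sum * (z / margin) ** 2
    # Subtract a tiny tolerance so floating-point noise does not add a subject.
    return math.ceil(n_exact - 1e-9)


# Worst case p1 = p2 = 0.5 with a target standard error of 0.02 / 2 = 0.01.
print(n_per_group(0.5, 0.5, margin=0.02))  # 5000
```

Moving either proportion away from 0.5 shrinks the variance term, so the required n can only go down from this worst case.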
Clinical databases (in the US at least, where there is no common medical-record system) typically represent people who have presented to a specific clinical practice or hospital for treatment. They thus might not represent the broader population well in many critical respects. Because the rate of outcome is usually smaller than the prevalence of the exposure, cohort studies typically require larger sample sizes to have the same power as a case-control study. That convention refers to a different situation: the usual minimum sample size required for the Central Limit Theorem to apply. This calculator uses a number of different equations to determine the minimum number of subjects that need to be enrolled in a study in order to have sufficient statistical power to detect a treatment effect. One study cohort will be compared to a known value published in previous literature. With this information, I am asked to inflate the sample size to accommodate the incidence rate, reachable rate, and response rate anticipated. Example: In a hospital, there are 3 new cases of a specific disease and the total population at risk is 2. In order to use statistics to learn things about the population, the sample must be random. Generally speaking, statistical power is determined by the following variables: To calculate the post-hoc statistical power of an existing trial, please visit the post-hoc power analysis calculator.
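One way to read the request above, to inflate the sample size for the anticipated incidence rate, reachable rate and response rate, is to divide the required number of completed responses by the product of the three rates. That reading is an assumption of this sketch, not a rule stated in the text, and the function name `contacts_needed` and all example numbers are hypothetical.

```python
import math


def contacts_needed(completes: int, incidence: float,
                    reachable: float, response: float) -> int:
    """Number of people to approach, assuming the three rates act independently."""
    for name, rate in (("incidence", incidence),
                       ("reachable", reachable),
                       ("response", response)):
        if not 0 < rate <= 1:
            raise ValueError(f"{name} rate must be in (0, 1]")
    # Each rate filters the pool in turn, so the yield is their product.
    return math.ceil(completes / (incidence * reachable * response))


# Hypothetical plan: 400 completed responses wanted; 10% of people qualify,
# 80% of those can be reached, and half of those reached respond.
print(contacts_needed(400, incidence=0.10, reachable=0.80, response=0.50))

# A pre-qualified, fully reachable list only needs the response-rate inflation.
print(contacts_needed(100, incidence=1.0, reachable=1.0, response=0.5))  # 200
```

Treating the rates as independent filters is the simplest possible model; if reachability or response differs between qualifying and non-qualifying people, the product will misstate the yield.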
Population sample size: $n = \dfrac{Z^2 \, P(1 - P)}{e^2}$, where Z = Z score of the confidence level, P = expected proportion, e = desired precision, and N = population size. For small populations, n can be adjusted so that $n_{\text{adj}} = \dfrac{N \times n}{N + n}$. This sampling scheme does not change the basic study type; rather, it redefines the population that is being studied (from the entire group of workers in the factory to the newly defined subgroup). Although it might be possible to use retrospective data to examine incidence, if you simply collect retrospective data on a set of patients and determine the fraction of them that had the condition, you are examining prevalence, not incidence. By enrolling too few subjects, a study may not have enough statistical power to detect a difference (type II error). In single-institution retrospective analysis, trying to get a larger sample size generally means going back farther in time for more cases. Because the population is pre-qualified, the incidence rate is 100%. The estimated effects in both studies can represent either a real effect or random sample error. This formula can be used when you know and want to determine the sample size necessary to establish, with a confidence of , the mean value to within . ... all epidemiological studies are (or should be) based on a particular population (the 'source population') followed over a particular period of time (the 'risk period'). X refers to a set of population elements, and x to a set of sample elements. Confidence level is closely related to confidence interval (margin of error). With this sample we will be 95 percent confident that the sample mean will be within 1 minute of the true population mean of Internet usage.
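The small-population adjustment given above, n(adj) = (N × n) / (N + n), is easy to check numerically. In the sketch below the function name `adjusted_sample_size`, the rounding-up convention and the population size of 2,000 are assumptions made for illustration; the unadjusted n of 423 is the value the proportion formula gives for z = 1.645, p = 0.5 and e = 0.04.

```python
import math


def adjusted_sample_size(n: int, population: int) -> int:
    """Apply n_adj = (N * n) / (N + n), rounded up to a whole subject."""
    if n <= 0 or population <= 0:
        raise ValueError("n and population must be positive")
    return math.ceil(population * n / (population + n))


# Hypothetical population of 2,000 with an unadjusted sample size of 423.
print(adjusted_sample_size(423, 2_000))      # 350
# For a very large population the adjustment barely changes n.
print(adjusted_sample_size(423, 1_000_000))  # 423
```

The adjustment only matters when the sample is a noticeable fraction of the population, which is why the second call leaves n unchanged.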
Most statisticians agree that the minimum sample size to get any kind of meaningful result is 100. It is important to note, however, that a larger total sample size will be required the further the sampling ratio is from 1. The known (previous research) incidence rate in the general population is very low, 0.1%. You might think about your situation as over-sampling the disease cases, similar to what's described in the preceding quote. The mathematics of probability proves that the size of the population is irrelevant unless the size of the sample exceeds a few percent of the total population you are examining. It requires that every possible sample of the selected size has an equal chance of being used. If your population is less than 100, then you really need to survey all of them.