Tutorial 4: Sampling Design
Interactive Guide to Research Sampling Methodology
Prakash Ukhalkar
Welcome to the Interactive Guide to Sampling
This guide is designed to help you, as a student researcher, understand one of the most fundamental concepts in research: sampling. In research, we want to know things about a large group of people or items (a "population"), but it's often impossible or too expensive to study every single one.
Sampling is the process of selecting a smaller group (a "sample") to represent the larger one. The way we choose this sample is critical to whether we can trust our research findings.
How to Use This Guide
Work through the sections in order; detailed examples accompany each concept. The guide covers:
- The basic difference between a Census and a Sample (Section 1).
- The five-step process for designing a robust sample (Section 3).
- Detailed, real-world applications and mechanisms for all major sampling methods (Section 5).
- An overview of the sample size formula and key variables (Section 6).
1. Census vs. Sample Survey (Visualized)
The core relationship is that the sample is always contained within the population. A census targets every unit in the population, while a sample targets only a small, selected subset.
The Concepts
Population (N) - The Whole
The entire group of people or objects you are interested in. A census attempts to study this whole group.
Sample (n) - The Part
A small, carefully selected subset of the population. Sampling aims to make inferences about the whole based on this part.
Key Takeaway:
A census eliminates sampling error but is slow and costly. A sample is fast and cheap but introduces a margin of error that must be minimized through good design.
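To make the takeaway concrete, here is a minimal sketch that compares a census with a sample on synthetic data. The population of 10,000 "satisfaction scores" and the sample size of 400 are hypothetical values chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 synthetic satisfaction scores.
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Census: measure every unit, so there is no sampling error.
census_mean = statistics.mean(population)

# Sample: measure only 400 units chosen at random.
sample = rng.sample(population, 400)
sample_mean = statistics.mean(sample)

# Sampling error is the gap between the sample estimate and the census value.
sampling_error = sample_mean - census_mean

print(f"census mean    = {census_mean:.2f}")
print(f"sample mean    = {sample_mean:.2f}")
print(f"sampling error = {sampling_error:+.2f}")
```

The sample reads only 4% of the units, yet its mean should land close to the census mean. The remaining gap is the sampling error that a good design tries to keep small.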
2. Implications of a Sample Design
Your sampling design is the blueprint for your research. Its implications are massive. A poor sampling design can invalidate your entire study, no matter how well you analyze the data.
Implications of a Sample Design
- Statistical Reliability: A strong design allows you to calculate the confidence interval and margin of error, which are essential for statistical inference.
- Generalizability: The ability to confidently project findings from the sample back to the population is entirely dependent on the method used.
- Control of Bias: A poor design introduces selection bias or non-response bias, leading to inaccurate conclusions that don't reflect the true population.
- Resource Allocation: A good plan ensures you spend your time and budget collecting data efficiently, rather than wasting resources on a sample that is too large or flawed.
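As a rough illustration of the statistical reliability point, the sketch below computes a margin of error and confidence interval for a sample proportion. It assumes a probability sample and the usual normal approximation, e = Z × sqrt(p × (1 - p) / n); the function names are hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Interval p_hat +/- margin of error, clipped to the range [0, 1]."""
    e = margin_of_error(p_hat, n, z)
    return (max(0.0, p_hat - e), min(1.0, p_hat + e))

# Hypothetical survey: 601 respondents, using the conservative p = 0.5.
e = margin_of_error(0.5, 601)
low, high = confidence_interval(0.5, 601)
print(f"margin of error: {e:.4f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```

These calculations are only meaningful when selection was random; for a non-probability sample the same arithmetic runs, but the result has no statistical justification.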
Characteristics of a Good Sample Design
- Representativeness: The sample must accurately mirror the characteristics of the population in key areas (e.g., demographics, behavior).
- Validity & Feasibility: The chosen method must be logically sound and practically executable given real-world constraints like budget, time, and access to data.
- Small Sampling Error: The design should minimize the chance difference between the sample result and the population parameter (i.e., aim for a low margin of error).
- Objective Selection: The selection process should be systematic and quantifiable, relying on random chance (for probability methods) rather than the subjective judgment of the researcher.
3. Steps in Sampling Design: A 5-Step Flow
The five steps below give the logical sequence researchers follow when creating a sampling plan, moving from abstract definition to physical execution.
Step 1: Define Your Population
Clearly establish the target population (who or what is being studied) and the sampling unit (the item being selected, e.g., an individual, a household, a transaction).
Step 2: Identify the Sampling Frame
Obtain the actual list or database (the sampling frame) from which the sample members will be drawn. If no perfect frame exists, acknowledge its limitations.
Step 3: Determine Sample Size (n)
Calculate the required number of units based on the desired level of confidence and margin of error.
Step 4: Select the Sampling Method
Choose between Probability (random) or Non-Probability (non-random) methods based on cost, frame availability, and research objective.
Step 5: Execute and Validate the Sample
Apply the chosen selection rules and collect the data. Validate the sample's demographics against known population data if possible.
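The validation in Step 5 can be sketched as a simple comparison of sample shares against known population shares. The categories, the shares, and the 5-point warning threshold below are all hypothetical.

```python
from collections import Counter

def share_gaps(sample_values, population_shares):
    """Return (sample share - population share) for each category."""
    counts = Counter(sample_values)
    n = len(sample_values)
    return {
        category: counts.get(category, 0) / n - share
        for category, share in population_shares.items()
    }

# Hypothetical known population shares (e.g., from enrollment records).
population_shares = {"undergraduate": 0.80, "graduate": 0.20}

# Hypothetical achieved sample of 100 respondents.
sample_values = ["undergraduate"] * 70 + ["graduate"] * 30

gaps = share_gaps(sample_values, population_shares)
for category, gap in gaps.items():
    flag = "check" if abs(gap) > 0.05 else "ok"  # assumed threshold
    print(f"{category}: gap {gap:+.1%} ({flag})")
```

A large gap does not prove the sample is biased on the variable of interest, but it is a signal to investigate or to consider weighting.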
4. Criteria for Selecting a Sampling Procedure
Choosing the right procedure is critical. The decision is never based on a single factor, but rather a trade-off between statistical rigor and practical feasibility.
1. Research Objective & Need for Inference
If the goal is to conduct a quantitative study and make verifiable claims about a large population, Probability Sampling is mandatory. For qualitative exploration, Non-Probability methods may suffice.
2. Resources (Time and Cost)
Probability sampling is typically more demanding and expensive. Non-Probability methods are fast and cheap, used when resources are limited.
3. Availability of a Sampling Frame
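Simple Random Sampling requires a complete population list, and Systematic Sampling requires either such a list or an ordered flow of units (for example, people passing a fixed point). If neither is available, use Non-Probability methods like Snowball sampling.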
5. Types of Sample Designs
Sampling methods fall into two categories: Probability (random selection, which supports statistical inference) and Non-Probability (non-random selection, suited to exploratory work).
Detailed Examples
Simple Random Sampling (SRS)
Examples (Random Selection):
- Customer Feedback: An e-commerce company uses a computer program to select 1% of all customer email addresses to receive a satisfaction survey.
- Quality Control: An auditor assigns a unique number to every financial transaction for the year and uses a random number generator to select 300 transactions for review.
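- Student Study: A professor draws a two-digit number at random (say, '25') and invites all students whose ID numbers end in those digits to participate in a campus study. This approximates a simple random sample only if the final ID digits are assigned without any pattern.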
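A minimal sketch of simple random sampling, along the lines of the auditor example above. The frame of 12,000 transaction numbers is a hypothetical stand-in for a real list, and `simple_random_sample` is an illustrative helper name.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: transactions numbered 1 to 12,000.
transactions = list(range(1, 12_001))

selected = simple_random_sample(transactions, 300, seed=7)
print(f"selected {len(selected)} transactions for review")
```

Passing a seed makes the draw repeatable, which is useful when an audit trail must show exactly how the units were chosen.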
Systematic Sampling
Examples (Interval Selection):
- Public Transit: To estimate daily ridership, a research team selects a random starting time and then interviews every 10th person exiting a subway station.
- Inventory Check: A warehouse manager needs to physically count 5% of 2,000 pallets (100 pallets). They set their sampling interval at k = 2,000 / 100 = 20, pick a random starting pallet among the first 20, and count every 20th pallet thereafter.
- Web Traffic: A website logging system is set to capture the data for every 50th visitor who loads the homepage to measure average session duration.
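Systematic selection can be sketched as follows, assuming an ordered frame is available. The interval k is the frame size divided by the desired sample size; the frame of 1,000 numbered units and the sample of 50 are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(frame) // n             # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start in positions 0 .. k-1
    return frame[start::k][:n]      # trim in case the frame size is not a multiple of n

# Hypothetical ordered frame: units numbered 1 to 1,000.
frame = list(range(1, 1_001))

picked = systematic_sample(frame, 50, seed=5)
print(f"interval k = {len(frame) // 50}, first picks: {picked[:3]}")
```

If the frame has a periodic pattern that lines up with k, a systematic sample can be biased, so it is worth checking the ordering of the list first.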
Stratified Sampling
Examples (Subgroup Randomization):
- Political Polling: Voters are stratified by political affiliation (Democrat, Republican, Independent). A proportionate random sample is taken from each group to ensure accurate representation of the total electorate.
- Media Consumption: Researchers stratify their population by primary media source (Newspaper, TV, Social Media). They then randomly select 200 people from each stratum to compare the influence of different media types.
- Financial Analysis: A bank stratifies its customers by account type (Checking, Savings, Investment) and randomly selects 5% from each stratum to study cross-selling opportunities.
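The bank example can be sketched as proportionate stratified sampling: group the frame by stratum, then draw the same fraction at random from each group. The customer counts per account type below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Hypothetical frame of (customer_id, account_type) pairs.
customers = (
    [(i, "Checking") for i in range(6_000)]
    + [(i, "Savings") for i in range(6_000, 9_000)]
    + [(i, "Investment") for i in range(9_000, 10_000)]
)

picked = stratified_sample(customers, lambda c: c[1], 0.05, seed=1)
for name, members in picked.items():
    print(f"{name}: {len(members)} selected")
```

For the equal-allocation design in the media example, the fraction would be replaced by a fixed count per stratum.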
Cluster Sampling
Examples (Group Selection):
- Healthcare Study: To study vaccination rates in a large county, researchers randomly select 10 health clinics (clusters) and survey all registered patients at those selected clinics.
- Educational Performance: A state education department randomly selects 50 public high schools (clusters) and tests all 10th-grade students in those 50 schools.
- Manufacturing Audit: A company with 20 production lines divides each line's monthly output into weekly batches (clusters). They randomly select 4 of these batches and test every product in those 4 batches.
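A minimal sketch of one-stage cluster sampling, in the spirit of the clinic example: whole clusters are chosen at random, and every unit inside a chosen cluster is included. The 40 clinics with 50 patients each are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 40 clinics, each with 50 registered patients.
clinics = {
    f"clinic_{i:02d}": [f"patient_{i:02d}_{j:02d}" for j in range(50)]
    for i in range(40)
}

surveyed = cluster_sample(clinics, 10, seed=3)
total = sum(len(patients) for patients in surveyed.values())
print(f"{len(surveyed)} clinics selected, {total} patients surveyed")
```

Only a list of clusters is needed up front, not a list of every individual, which is the main practical appeal of this design.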
Convenience Sampling
Examples (Ease of Access):
- Usability Testing: A UX researcher asks colleagues in their office to test a new app feature because they are readily available, even though they aren't the target users.
- Campus Research: A student stands outside the main library and interviews the first 50 people who agree to talk to them about campus meal plans.
- Online Poll: A blogger puts up a poll on their website asking their regular readers what topic they want next.
Judgmental (Purposive)
Examples (Expert Selection):
- Financial Forensics: An investigative team needs to find the root cause of fraud and selects only three high-risk accounts and the corresponding three account managers they suspect are involved.
- Product Development: A company creating a new tool for specialized mechanics deliberately selects a panel of 15 mechanics who are known industry leaders and highly critical of existing tools.
- Historical Research: A historian chooses five specific diaries and four specific letters from a huge archive because those documents cover the exact time period and region they are studying.
Quota Sampling
Examples (Filling Predetermined Slots):
- Exit Polling: An interviewer is told to survey 50 women under 40, 50 women over 40, 50 men under 40, and 50 men over 40. The interviewer approaches people until each quota is filled, often choosing them by convenience.
- Retail Study: A research firm is told to recruit 40 participants, ensuring 20 are frequent shoppers (weekly visits) and 20 are infrequent shoppers (monthly visits).
- Media Survey: A news outlet sets a quota to get feedback from 30 participants from the East Coast and 30 participants from the West Coast, using online ads until the quotas are filled.
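Quota filling can be sketched as accepting people in arrival order until each cell is full. The stream of passers-by is synthetic, and treating a 40-year-old as "40 and over" is an assumption made only for this illustration.

```python
import random

def quota_sample(stream, cell_of, quotas):
    """Accept people in arrival order until every quota cell is full."""
    filled = {cell: [] for cell in quotas}
    for person in stream:
        cell = cell_of(person)
        if cell in filled and len(filled[cell]) < quotas[cell]:
            filled[cell].append(person)
        if all(len(filled[c]) >= quotas[c] for c in quotas):
            break  # all quotas met, stop interviewing
    return filled

def cell_of(person):
    """Map a (sex, age) pair to its quota cell (age cut-off is an assumption)."""
    sex, age = person
    return (sex, "under 40" if age < 40 else "40 and over")

# Synthetic stream of passers-by as (sex, age) pairs.
rng = random.Random(11)
passers_by = [(rng.choice(["woman", "man"]), rng.randint(18, 80)) for _ in range(2_000)]

quotas = {(s, a): 50 for s in ("woman", "man") for a in ("under 40", "40 and over")}
result = quota_sample(passers_by, cell_of, quotas)
for cell, people in result.items():
    print(cell, len(people))
```

Nothing in this procedure is random: who gets included depends on who happens to walk by first, which is why quota samples do not support a margin of error.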
Snowball Sampling
Examples (Referral Chain):
- Rare Disorder: A study on a very rare genetic disorder finds one patient, who then refers the researcher to a small network of other patients they know.
- Niche Professionals: A researcher studying the career trajectory of retired deep-sea welders starts with one contact, who then provides links to their entire professional network.
- Closed Groups: A sociological study of an exclusive or private social club gains access by interviewing one founding member, who then introduces the researcher to the next level of membership.
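A referral chain can be sketched as a breadth-first walk over a "who refers whom" map, starting from one or more seed participants. The referral map below is hypothetical, and the cap on sample size is an assumed stopping rule.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit seeds first, then the people they refer, wave by wave."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:   # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral map: each participant names the contacts they know.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
}

participants = snowball_sample(referrals, ["A"], max_size=5)
print(participants)  # → ['A', 'B', 'C', 'D', 'E']
```

Because everyone is reached through the seed's network, the sample inherits that network's characteristics, so the choice of seeds matters a great deal.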
6. Determining Sample Size
The sample size calculation ties together two choices you make up front: how precise the estimate must be (margin of error) and how sure you want to be of it (confidence level).
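Sample Size Formula (For Proportions)
n = (Z² × p × (1 - p)) / e²
n = Required sample size
Z = Z-score for the chosen confidence level
p = Expected population proportion (use 0.5 when unknown; it gives maximum variability and therefore the largest, most conservative sample size)
e = Margin of error (e.g., 0.03 for 3%)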
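The calculation can be sketched as a small function, assuming the standard formula for proportions, n = Z² × p × (1 - p) / e², which matches the worked figures later in this section. The function name is hypothetical, and the result is rounded up because a fraction of a respondent cannot be surveyed.

```python
import math

def sample_size_for_proportion(z: float, p: float, e: float) -> int:
    """Required sample size for estimating a proportion, rounded up."""
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    # Round away floating-point noise first so an exact result such as
    # 2401 is not pushed up to 2402 by a stray 0.0000000001.
    return math.ceil(round(n, 6))

# 95% confidence (Z = 1.96), p = 0.5, 4% margin of error.
print(sample_size_for_proportion(1.96, 0.5, 0.04))  # → 601
```

Using p = 0.5 is the cautious default: any other value of p makes p × (1 - p) smaller and so lowers the required n.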
Z-Score and Confidence
Higher confidence requires a larger Z-score, which increases the required sample size.
Common Z-scores: 1.645 for 90% confidence, 1.960 for 95%, and 2.576 for 99%.
A typical national poll uses 95% confidence and a 3% margin of error, which requires a sample of roughly 1,067.
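The Z-score for any two-sided confidence level can be looked up from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    # Split the leftover probability equally between the two tails.
    upper_tail_point = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_point)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_score(level):.3f}")
```

The familiar table values (such as 1.96 for 95%) are these results rounded to two or three decimals.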
Practical Example: University Student Satisfaction Survey
Scenario:
A university wants to conduct a student satisfaction survey to determine what percentage of students are satisfied with campus facilities. The university has 15,000 enrolled students.
Step-by-Step Calculation:
Given Information:
- Population (N): 15,000 students
- Confidence Level: 95%
- Margin of Error (e): ±4%
- Population Proportion (p): 0.5 (maximum variability)
Formula Values:
- Z-score: 1.96 (for 95% confidence)
- p: 0.5
- e: 0.04 (4% as decimal)
- (1 - p): 0.5
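Calculation Process:
n = (Z² × p × (1 - p)) / e²
n = (1.96² × 0.5 × 0.5) / 0.04²
n = 0.9604 / 0.0016 = 600.25, rounded up to 601 students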
Interpretation:
To achieve a 4% margin of error with 95% confidence, the university needs to survey at least 601 students.
This means if the survey shows 70% satisfaction, we can be 95% confident that the true satisfaction rate for all 15,000 students falls between 66% and 74% (70% ± 4%).
What if we want higher precision?
±2% margin of error:
n = (1.96² × 0.5 × 0.5) / 0.02² = 2,401 students
Higher precision requires a much larger sample: halving the margin of error quadruples the required sample size.
What if we accept lower confidence?
90% confidence (±4% margin):
n = (1.645² × 0.5 × 0.5) / 0.04² ≈ 422.8, rounded up to 423 students
Lower confidence allows a smaller sample.
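The trade-offs above can be explored together with a small grid over confidence levels and margins of error. This sketch assumes the same proportion formula with p = 0.5 and computes Z from the normal distribution rather than a rounded table value, so intermediate figures differ slightly from hand calculations; `required_n` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_n(confidence: float, e: float, p: float = 0.5) -> int:
    """Sample size for a proportion, rounded up to a whole respondent."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Every combination of three confidence levels and three margins of error.
grid = {
    (c, e): required_n(c, e)
    for c in (0.90, 0.95, 0.99)
    for e in (0.02, 0.03, 0.04)
}

for (c, e), n in grid.items():
    print(f"{c:.0%} confidence, ±{e:.0%} margin: n = {n}")
```

Reading across the grid shows that tightening the margin of error drives the sample size up much faster than raising the confidence level does.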