Animal Science Research Methods: A Practical Guide

Animal science research operates under constraints that don’t exist in physics or chemistry. Your subjects have preferences. They get stressed in unfamiliar environments. They bite. They die during longitudinal studies. And unlike a chemical reaction, an animal’s behavior on Tuesday might be completely different from its behavior on Wednesday for reasons you’ll never identify.

This makes methodology — how you design studies, collect data, and analyze results — the difference between publishable science and expensive noise. Graduate students learn this the hard way, usually by collecting three months of unusable data before their advisor explains what they should have done differently.

This guide covers the methods used across animal science subdisciplines, from behavioral ecology to veterinary clinical research.

Study Design Types

Observational Studies

Observational studies record what animals do without experimental manipulation. You watch, you record, you analyze patterns. These are the foundation of ethology and behavioral ecology.

Focal animal sampling is the most common approach. You select one individual and record all behaviors for a defined period (a focal bout). Typical focal bouts run 5-30 minutes depending on the species and research question. The method works well for primates, ungulates, and social carnivores where individuals are identifiable.

Scan sampling records what every visible individual is doing at predetermined intervals — every 30 seconds, every 5 minutes. It’s efficient for questions about group-level behavior: What proportion of the herd is grazing at any given time? It’s poor for rare behaviors that might happen between scan intervals.
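Here is a minimal Python sketch of how scan samples are usually summarized. It assumes each scan is stored as a mapping from individual ID to the behavior seen at that instant. The layout, IDs, and values are hypothetical. The proportion is computed over visible individuals only, because animals out of view at a scan contribute no information.

```python
from collections import Counter

def proportion_per_scan(scans, behavior):
    """For each scan, the fraction of visible individuals doing `behavior`.

    `scans` is a list of dicts mapping individual ID -> behavior code
    recorded at that scan instant (a hypothetical layout).
    """
    proportions = []
    for scan in scans:
        visible = len(scan)
        if visible == 0:
            proportions.append(None)  # nobody visible: no estimate for this scan
            continue
        counts = Counter(scan.values())
        proportions.append(counts[behavior] / visible)
    return proportions

# Three scans taken 5 minutes apart (illustrative values only).
scans = [
    {"cow01": "graze", "cow02": "graze", "cow03": "rest", "cow04": "graze"},
    {"cow01": "graze", "cow02": "rest", "cow03": "rest"},  # cow04 out of view
    {},                                                     # herd out of view
]
print(proportion_per_scan(scans, "graze"))  # [0.75, 0.3333333333333333, None]
```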

Ad libitum sampling records any notable behavior whenever it occurs, without systematic timing. It’s appropriate for rare events (predation, mating, intergroup encounters) but introduces bias because conspicuous behaviors get over-represented. Use it for generating hypotheses, not for testing them.

Continuous recording captures every behavior and its duration throughout an observation period. It is the most data-rich approach but also the most demanding. Video recording has made continuous recording practical for more studies, especially when combined with behavioral coding software such as BORIS.
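Continuous records are usually reduced to time budgets. The sketch below assumes a simplified layout of (onset in seconds, state) pairs, which is not BORIS's actual export format. Each state is treated as lasting until the next onset.

```python
from collections import defaultdict

def time_budget(state_changes, end_time):
    """Seconds and share of observation time spent in each state.

    `state_changes` is a chronologically sorted list of (onset_seconds, state)
    pairs (a hypothetical layout). Each state lasts until the next onset, and
    the last one lasts until `end_time`.
    """
    if not state_changes:
        return {}
    total = end_time - state_changes[0][0]
    if total <= 0:
        raise ValueError("end_time must be after the first onset")
    durations = defaultdict(float)
    stops = [onset for onset, _ in state_changes[1:]] + [end_time]
    for (start, state), stop in zip(state_changes, stops):
        if stop < start:
            raise ValueError("state changes must be sorted and precede end_time")
        durations[state] += stop - start
    return {state: (secs, secs / total) for state, secs in durations.items()}

# A 10-minute (600 s) focal bout coded from video (illustrative values).
record = [(0, "rest"), (120, "forage"), (420, "groom"), (480, "rest")]
for state, (secs, share) in time_budget(record, 600).items():
    print(f"{state}: {secs:.0f} s ({share:.0%})")
```

For this record the sketch reports 240 s (40%) resting, 300 s (50%) foraging, and 60 s (10%) grooming.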

Experimental Studies

Experiments manipulate one or more variables to test causal hypotheses. In animal science, this includes feeding trials, drug efficacy studies, housing system comparisons, training method evaluations, and enrichment interventions.

The gold standard is the randomized controlled trial (RCT). Animals are randomly assigned to treatment and control groups. The treatment group receives the intervention; the control group doesn’t. Outcomes are compared statistically.
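Scripting the randomization makes the allocation reproducible and auditable. This is a sketch, not a complete allocation protocol. The animal IDs and seed are hypothetical, and in practice you may prefer to randomize within blocks such as litters, sexes, or pens.

```python
import random

def randomize(animal_ids, groups=("control", "treatment"), seed=None):
    """Simple randomization into groups of (near-)equal size.

    Shuffles a copy of the IDs and deals animals to groups in turn,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # recording the seed makes the allocation reproducible
    ids = list(animal_ids)
    rng.shuffle(ids)
    allocation = {g: [] for g in groups}
    for i, animal in enumerate(ids):
        allocation[groups[i % len(groups)]].append(animal)
    return allocation

animals = [f"pig{n:02d}" for n in range(1, 13)]
print(randomize(animals, seed=2026))
```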

True RCTs require enough animals per group to achieve adequate statistical power. A power analysis before you start tells you how many subjects you need. Running an underpowered study is a waste of time, money, and animal lives. Use G*Power (free software) to calculate required sample sizes from your expected effect size, chosen significance level, and desired power.
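G*Power computes sample size from the exact noncentral t distribution. For a two-sided, two-sample t-test, a rough cross-check is the normal approximation n ≈ 2(z_(1-α/2) + z_(1-β))² / d², plus a small-sample correction of z_(1-α/2)² / 4. The function name and defaults below are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-sample t-test.

    Normal approximation plus the small-sample correction z_alpha**2 / 4.
    Treat G*Power's exact noncentral-t result as authoritative.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided critical value (about 1.96 for alpha = 0.05)
    z_beta = z(power)           # about 0.84 for power = 0.80
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

For d = 0.2, 0.5, and 0.8, the sketch returns 394, 64, and 26 animals per group.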

Crossover designs use each animal as its own control. Each animal receives Treatment A, then a washout period, then Treatment B (or vice versa, with the order randomized). This controls for individual variation and requires fewer subjects. It’s commonly used in nutrition studies and pain management research. The limitation: carryover effects from Treatment A might influence the response to Treatment B.
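For the two-period AB/BA case, a standard analysis uses each animal's period-1 minus period-2 difference. Half the gap between the two sequence means estimates the treatment effect, and half their sum estimates the period effect. The sketch below assumes no carryover, and the feed-intake values are hypothetical.

```python
from statistics import mean

def crossover_effects(ab_pairs, ba_pairs):
    """Treatment (A - B) and period (1 - 2) effects in an AB/BA crossover.

    Each pair is (period-1 response, period-2 response) for one animal.
    Assumes no carryover; the washout period is meant to make that plausible.
    """
    d_ab = mean(p1 - p2 for p1, p2 in ab_pairs)  # period effect + (A - B)
    d_ba = mean(p1 - p2 for p1, p2 in ba_pairs)  # period effect - (A - B)
    treatment_a_minus_b = (d_ab - d_ba) / 2
    period_1_minus_2 = (d_ab + d_ba) / 2
    return treatment_a_minus_b, period_1_minus_2

# Daily feed intake (kg) under diets A and B (illustrative numbers).
ab = [(5.2, 4.6), (4.9, 4.5), (5.5, 4.8)]  # A first, then B
ba = [(4.4, 5.1), (4.7, 5.0), (4.3, 5.2)]  # B first, then A
print(crossover_effects(ab, ba))
```

A formal test compares the two sets of period differences with a two-sample t-test. If carryover is present, this estimator of the treatment effect is biased.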

Latin square designs extend the crossover concept to multiple treatments and account for temporal effects. They’re efficient but assume no treatment-by-period interaction, which you should verify.
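One simple way to build a Latin square is by cyclic rotation. The sketch below uses hypothetical treatment labels. A cyclic square does not balance first-order carryover, because here A is always followed by B. Williams designs are the usual choice when that matters.

```python
def cyclic_latin_square(treatments):
    """Rows = animals (or sequences), columns = periods.

    Row i is the treatment list rotated by i positions, so every treatment
    appears exactly once in each row and once in each column.
    """
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]

for row in cyclic_latin_square(["A", "B", "C", "D"]):
    print(" ".join(row))
```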

Quasi-Experimental Studies

When randomization isn’t possible — and in field ecology, it often isn’t — quasi-experimental designs make the best of imperfect conditions. Before-after comparisons (how did animal behavior change after a habitat restoration?) and natural experiments (comparing populations exposed to different conditions by geography or circumstance) fall here.

These designs are weaker for causal inference but sometimes they’re all you’ve got. Report their limitations honestly. Reviewers will notice if you don’t.

Building an Ethogram

An ethogram is a catalog of defined behaviors used for systematic observation. Building a good one is both science and craft.

Start with preliminary observations. Watch your study animals for 10-20 hours before defining your ethogram. You’ll see behaviors you didn’t expect and realize some behaviors you planned to distinguish are impossible to differentiate reliably in the field.

Each behavior definition must be:

  • Mutually exclusive — an animal performing behavior A cannot simultaneously be performing behavior B (within the same behavioral category)
  • Exhaustive — every possible behavior fits into a category (include an “other” category as a safety net)
  • Operationally defined — described in physical terms observable by anyone, not interpreted. “Head lowered below withers with ears pinned” is a definition. “Feels threatened” is an interpretation.
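Once observations are coded, the first two properties can be checked mechanically. This hypothetical helper flags two problems: codes missing from the ethogram, and state intervals that overlap. It assumes all intervals belong to one behavioral category and are given as (start, end, code) in seconds.

```python
def check_states(intervals, ethogram_codes):
    """Return a list of problems found in coded (start_s, end_s, code) intervals.

    Assumes every interval belongs to the same behavioral category, so no two
    intervals may overlap (mutual exclusivity), and every code must be defined
    in the ethogram (exhaustiveness; "other" covers anything unanticipated).
    """
    problems = []
    for start, _end, code in intervals:
        if code not in ethogram_codes:
            problems.append(f"undefined code {code!r} at {start}s")
    latest_end, latest_code = None, None
    for start, end, code in sorted(intervals):
        # Overlap: this state starts before the longest-running earlier state ends.
        if latest_end is not None and start < latest_end:
            problems.append(
                f"{code!r} starts at {start}s, before {latest_code!r} ends at {latest_end}s"
            )
        if latest_end is None or end > latest_end:
            latest_end, latest_code = end, code
    return problems

ethogram = {"stand", "lie", "walk", "other"}
coded = [(0, 60, "stand"), (55, 90, "lie"), (90, 120, "trot")]
for problem in check_states(coded, ethogram):
    print(problem)
```

Here the sketch flags the undefined code 'trot' and a 5-second overlap between 'stand' and 'lie'.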

Test your ethogram with inter-observer reliability trials. Two or more observers independently code the same video or observation session. Calculate Cohen’s kappa for each behavior category. Kappa values below 0.60 mean your definitions need work. Below 0.40, the behavior category is essentially unreliable and shouldn’t be included.
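Cohen's kappa compares observed agreement with the agreement expected by chance, based on each observer's code frequencies. The minimal sketch below assumes both observers coded the same sequence of samples. Per-category values are computed by treating each behavior as present or absent, which is one common approach. scikit-learn's `sklearn.metrics.cohen_kappa_score` can serve as a cross-check.

```python
import math
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers who coded the same samples."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement implied by each observer's marginal code frequencies.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if expected == 1:
        return math.nan  # both observers used one identical code: kappa is undefined
    return (observed - expected) / (1 - expected)

def kappa_per_category(codes_a, codes_b):
    """Kappa for each behavior, scored as present vs. absent in each sample."""
    categories = sorted(set(codes_a) | set(codes_b))
    return {
        cat: cohens_kappa([a == cat for a in codes_a], [b == cat for b in codes_b])
        for cat in categories
    }

observer_1 = ["graze", "graze", "rest", "rest", "walk", "graze"]
observer_2 = ["graze", "rest", "rest", "rest", "walk", "graze"]
print(f"overall kappa = {cohens_kappa(observer_1, observer_2):.2f}")  # overall kappa = 0.74
for behavior, k in kappa_per_category(observer_1, observer_2).items():
    print(f"{behavior}: {k:.2f}")
```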

A published ethogram should include: behavior name, operational definition, behavioral category (state vs. event), recording rule (frequency, duration, or both), and any modifiers (intensity levels, recipients of social behaviors).
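Storing the ethogram as structured data keeps these fields consistent across observers and ethogram versions. The sketch below uses illustrative field names. It also enforces one rule of its own: events are recorded only as frequencies. That rule is an assumption of this example, not a universal requirement.

```python
from dataclasses import dataclass, field

@dataclass
class EthogramEntry:
    """One row of an ethogram (field names are illustrative)."""
    name: str
    definition: str       # operational, physical description
    category: str         # "state" (has duration) or "event" (instantaneous)
    recording_rule: str   # "frequency", "duration", or "both"
    modifiers: list[str] = field(default_factory=list)  # e.g. intensity, recipient

    def __post_init__(self):
        if self.category not in {"state", "event"}:
            raise ValueError(f"{self.name}: category must be 'state' or 'event'")
        if self.recording_rule not in {"frequency", "duration", "both"}:
            raise ValueError(f"{self.name}: unknown recording rule")
        if self.category == "event" and self.recording_rule != "frequency":
            # Assumption of this sketch: events are instantaneous, so only counts apply.
            raise ValueError(f"{self.name}: events can only be recorded as frequency")

threat = EthogramEntry(
    name="Head threat",
    definition="Head lowered below withers with ears pinned back",
    category="event",
    recording_rule="frequency",
    modifiers=["recipient ID"],
)
```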

IACUC Protocols: Getting Approval

The Institutional Animal Care and Use Committee (IACUC) reviews and approves all research involving vertebrate animals at U.S. institutions receiving federal funding. This is a legal requirement under the Animal Welfare Act and the Public Health Service Policy on Humane Care and Use of Laboratory Animals.

Your protocol must address:

Justification. Why is this research necessary? Why can’t the question be answered without using live animals? What is the expected benefit relative to the potential harm? These aren’t rhetorical questions. The IACUC evaluates whether the scientific value justifies animal use.

The 3Rs. Replacement (can you use a non-animal model?), Reduction (are you using the minimum number of animals necessary for valid results?), and Refinement (are you minimizing pain, distress, and suffering?). Every protocol must demonstrate consideration of all three.

Pain and distress categories. The USDA classifies animal use into pain categories: Category B (animals held but not used in procedures), Category C (procedures causing no more than momentary pain — vaccinations, blood draws), Category D (painful procedures with appropriate anesthesia/analgesia), and Category E (painful procedures without anesthesia — requires strong scientific justification and is heavily scrutinized).

Veterinary care. The protocol must identify the attending veterinarian and describe provisions for veterinary care, including humane endpoints — the criteria at which an animal will be removed from the study to prevent unnecessary suffering.

Personnel qualifications. Everyone handling animals must be trained. The protocol lists all personnel and their training records.

Plan for the IACUC review to take 4-8 weeks. Revisions are common on first submission. Start the process early — no data collection until you have written approval.

For observational studies of wild animals that don’t involve capture or manipulation, IACUC review may not be required. But check with your institution. Many universities require IACUC review for all vertebrate animal research, including purely observational work, as a matter of institutional policy.

Data Collection and Management

Bad data management has killed more studies than bad statistics. Set up your system before you collect a single data point.

Use structured data sheets. Whether paper or digital, every data point should have a timestamp, observer ID, subject ID, and environmental conditions. Tablets running data collection apps (CyberTracker, KoBoToolbox, or custom forms built with ODK) have largely replaced paper in field research, though paper backup is still wise in wet or remote conditions.
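With digital collection, you can enforce the same required fields in code. The sketch below shows a hypothetical record layout written to CSV with Python's standard library. The field names, and the use of the Beaufort scale for wind, are illustrative choices.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime

@dataclass
class Observation:
    """One data point carrying the fields every record should have."""
    timestamp: str       # ISO 8601, e.g. "2026-02-15T09:30:00"
    observer_id: str
    subject_id: str
    behavior: str
    temperature_c: float
    wind: str            # a defined scale rather than free text

def write_observations(rows, stream):
    """Write observations as CSV with a header row taken from the dataclass."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))

rows = [
    Observation(datetime(2026, 2, 15, 9, 30).isoformat(), "OBS1", "dog07",
                "play", 8.5, "Beaufort 3"),
]
buffer = io.StringIO()  # swap for open("file.csv", "w", newline="") on disk
write_observations(rows, buffer)
print(buffer.getvalue())
```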

Back up daily. A lost field notebook or corrupted SD card can erase months of work. Upload data to cloud storage every day you collect. Multiple backup locations. No exceptions.

Use consistent naming conventions. “Dog_behavior_data_final_v3_REAL_FINAL.xlsx” is a sign of a researcher who didn’t plan ahead. Use a file naming convention from day one: species_project_datetype_YYYYMMDD (e.g., canis_enrichment_behavioral_20260215).
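Generating names programmatically removes the temptation to improvise. The sketch assumes the species_project_datetype_YYYYMMDD convention, with lowercase letters and digits only. The `.csv` extension is also an assumption.

```python
import re
from datetime import date

# species_project_datetype_YYYYMMDD, lowercase letters and digits only
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z0-9]+_[a-z0-9]+_\d{8}$")

def data_filename(species, project, datatype, day, ext="csv"):
    """Build a file name following the convention, or raise ValueError."""
    stem = "_".join(part.strip().lower() for part in (species, project, datatype))
    stem = f"{stem}_{day:%Y%m%d}"
    if not NAME_PATTERN.match(stem):
        raise ValueError(f"name {stem!r} breaks the naming convention")
    return f"{stem}.{ext}"

print(data_filename("Canis", "enrichment", "behavioral", date(2026, 2, 15)))
# canis_enrichment_behavioral_20260215.csv
```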

Record metadata. Who collected the data? What equipment was used? What version of your ethogram was in effect? What were the weather conditions? Metadata seems tedious until you’re trying to interpret results six months later and can’t remember whether “moderate wind” meant 10 mph or 30 mph.

Statistical Analysis

Animal science data is often not normally distributed, frequently violates the assumption of independent observations, and commonly involves repeated measures on the same individuals. Standard parametric tests (t-tests, ANOVA) that assume normal, independent data are therefore often inappropriate.

Common Analysis Approaches

Generalized linear mixed models (GLMMs) are the workhorse of modern animal science statistics. They handle non-normal distributions (Poisson for count data, binomial for yes/no data), include random effects for individual animals and repeated measures, and accommodate nested designs. The lme4 and glmmTMB packages in R are standard tools.

Survival analysis (Kaplan-Meier curves, Cox proportional hazards models) applies whenever your outcome is time-to-event: time to adoption, time to recovery, survival duration. It handles censored data — animals still alive at the end of the study — which standard tests can’t.
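The Kaplan-Meier estimator shows how censoring is handled. A censored animal counts as at risk up to its last follow-up time, then leaves the risk set without lowering the survival estimate. This is a minimal sketch with hypothetical data. In practice, the R `survival` package and Python's `lifelines` are the usual tools.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each animal
    events: 1 if the event (e.g. adoption, recovery, death) was observed,
            0 if the animal was censored (still event-free when follow-up ended)
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # censored animals leave after time t
    return curve

# Weeks to adoption; 0 = still in the shelter when the study ended.
weeks = [1, 2, 2, 3, 4]
adopted = [1, 1, 0, 1, 0]
print([(t, round(s, 3)) for t, s in kaplan_meier(weeks, adopted)])
# [(1, 0.8), (2, 0.6), (3, 0.3)]
```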

Non-parametric tests (Mann-Whitney U, Kruskal-Wallis, Wilcoxon signed-rank) remain useful for small sample sizes where GLMM assumptions can’t be verified.
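For very small samples, you can run the Mann-Whitney U test exactly by enumerating every possible split of the pooled ranks. The sketch below uses hypothetical latency data and midranks for ties. The two-sided p-value is based on distance from the null expectation n_x·n_y/2. For routine work, `scipy.stats.mannwhitneyu` or R's `wilcox.test` are the usual tools.

```python
from itertools import combinations

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks

def mann_whitney_exact(x, y):
    """U statistic for x and an exact two-sided permutation p-value.

    Enumerates every way to split the pooled ranks into groups of size
    len(x) and len(y), so it is only practical for small samples.
    """
    n_x, n_y = len(x), len(y)
    ranks = midranks(list(x) + list(y))

    def u_stat(rank_sum):
        return rank_sum - n_x * (n_x + 1) / 2

    u_obs = u_stat(sum(ranks[:n_x]))
    center = n_x * n_y / 2  # expected U under the null hypothesis
    extreme = total = 0
    for idx in combinations(range(n_x + n_y), n_x):
        u = u_stat(sum(ranks[i] for i in idx))
        total += 1
        if abs(u - center) >= abs(u_obs - center) - 1e-9:
            extreme += 1
    return u_obs, extreme / total

# Latency (s) to approach a novel object, two housing groups (illustrative).
u, p = mann_whitney_exact([12, 15, 9, 20], [25, 30, 18, 27])
print(u, round(p, 3))  # 1.0 0.057
```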

Multivariate methods (PCA, discriminant function analysis, cluster analysis) help when you’re measuring many behaviors or variables simultaneously and need to reduce dimensionality or identify groupings.

Common Mistakes

Pseudoreplication. If you have 10 animals in each of two housing conditions but only one pen per condition, your sample size is 2, not 20. The pen is the experimental unit, not the individual animal. This mistake appears in published literature disturbingly often.
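A practical safeguard is to aggregate to the experimental unit before comparing conditions. This sketch uses hypothetical records of (condition, pen, animal, value) and makes the true sample size visible.

```python
from collections import defaultdict
from statistics import mean

def experimental_unit_means(records):
    """Collapse animal-level measurements to one mean per pen.

    records: iterable of (condition, pen_id, animal_id, value).
    Returns {condition: [pen mean, ...]}; the length of each list is the
    real sample size for comparing conditions.
    """
    by_pen = defaultdict(list)
    for condition, pen, _animal, value in records:
        by_pen[(condition, pen)].append(value)
    by_condition = defaultdict(list)
    for (condition, _pen), values in by_pen.items():
        by_condition[condition].append(mean(values))
    return dict(by_condition)

# 10 animals per condition, but only one pen per condition (illustrative).
records = (
    [("enriched", "pen1", f"a{i}", 10 + i % 3) for i in range(10)]
    + [("barren", "pen2", f"b{i}", 8 + i % 2) for i in range(10)]
)
result = experimental_unit_means(records)
print({c: len(v) for c, v in result.items()})  # {'enriched': 1, 'barren': 1}
```

With several pens per condition, an alternative is to keep animal-level data and include pen as a random effect in a mixed model. With one pen per condition, neither approach can separate the housing effect from the pen effect.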

Multiple comparisons without correction. If you run 20 statistical tests at alpha = 0.05 and every null hypothesis is true, you expect about one significant result by chance alone (20 × 0.05 = 1). Bonferroni correction is conservative but safe. False Discovery Rate control (Benjamini-Hochberg) is less conservative and increasingly accepted.
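Both corrections take only a few lines. This sketch returns reject/retain decisions in the original order of the p-values. `statsmodels.stats.multitest.multipletests` (methods `bonferroni` and `fdr_bh`) and R's `p.adjust` provide the same procedures.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max
    return reject

p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))          # [True, False, False, True]
print(benjamini_hochberg(p))  # [True, True, True, True]
```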

Confusing statistical significance with biological significance. A p-value of 0.03 tells you that data at least as extreme as yours would be unlikely if there were no real effect. It tells you nothing about whether the effect matters biologically. Always report effect sizes alongside p-values. A statistically significant difference of 0.5 beats per minute in heart rate is biologically meaningless.
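Cohen's d is a common effect size for a two-group comparison. The minimal sketch below uses the pooled sample standard deviation and hypothetical heart-rate values. Report the raw difference in biological units alongside d, because a very small variance can make a trivial raw difference look like a large standardized effect.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (a - b) divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Resting heart rate (beats per minute), illustrative values.
handled = [72, 75, 70, 74]
control = [71, 74, 70, 74]
raw = mean(handled) - mean(control)
print(f"difference = {raw:.2f} bpm, Cohen's d = {cohens_d(handled, control):.2f}")
```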

Reporting and Publication

Animal science journals increasingly require adherence to reporting guidelines. The ARRIVE 2.0 guidelines (Animal Research: Reporting of In Vivo Experiments) provide a checklist for transparent reporting of animal research. Many journals now recommend or require ARRIVE compliance.

Key reporting elements:

  • Sample sizes and how they were determined (power analysis)
  • Exact statistical tests used, including software and version
  • All results, including non-significant findings
  • IACUC approval number
  • Full description of housing, husbandry, and animal source
  • Whether blinding and randomization were used, and if not, why not

Pre-registration — publicly recording your hypotheses and analysis plan before collecting data — is gaining traction in animal science. The Open Science Framework (OSF) hosts free pre-registration. It doesn’t prevent exploratory analysis, but it distinguishes confirmatory from exploratory findings, which increases the credibility of your results.

Frequently Asked Questions

Do I need to know programming for animal science research?

Increasingly, yes. R is the de facto standard for statistical analysis in animal science publications. Python is used for data processing, image analysis, and machine learning applications. You don’t need to be a software developer, but functional proficiency in R — importing data, running GLMMs, generating plots — is expected for graduate-level research.

How do I get IACUC approval for observational wildlife studies?

Submit a protocol even if you think it’s exempt. Your institution’s IACUC will make the exemption determination — you don’t get to make it yourself. For purely observational studies with no capture or disturbance, the review is usually expedited. Include a description of your observation methods, potential for disturbance, and how you’ll minimize impact on study subjects.

What sample size do I need for a behavior study?

Run a power analysis using G*Power before you start. Required sample size depends on your expected effect size, chosen significance level (typically 0.05), desired power (typically 0.80), and the variability in your response variable. As a rough guide: for comparing two groups with a medium effect size (d=0.5), you need about 64 animals per group for a standard t-test. Mixed models can work with fewer subjects if you have many observations per subject.
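The design effect, 1 + (m - 1)·ICC, shows why many observations per subject help only partly. Here m is the number of observations per animal and ICC is the intraclass correlation among an animal's repeated observations. This is a rough planning heuristic for comparisons between groups of animals. It is not a substitute for a simulation-based power analysis of the actual mixed model; the R package simr is one option for that. The numbers below are hypothetical.

```python
def effective_sample_size(n_animals, obs_per_animal, icc):
    """Approximate number of independent observations from repeated measures.

    Divides the total number of observations by the design effect
    1 + (m - 1) * ICC, where ICC is the share of total variance lying between
    animals (0 = observations independent, 1 = repeats add no information).
    """
    if not 0 <= icc <= 1:
        raise ValueError("ICC must be between 0 and 1")
    total = n_animals * obs_per_animal
    design_effect = 1 + (obs_per_animal - 1) * icc
    return total / design_effect

for icc in (0.1, 0.5, 0.9):
    print(icc, round(effective_sample_size(20, 10, icc), 1))
```

With 20 animals observed 10 times each, the 200 observations are worth about 105 independent ones at ICC = 0.1, but only about 22 at ICC = 0.9.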

What’s the difference between an ethogram and a behavioral inventory?

An ethogram is a structured catalog with operational definitions designed for systematic data collection. A behavioral inventory is a broader list of behaviors a species is known to perform, compiled from published literature. You use a behavioral inventory as a starting point to build an ethogram specific to your research question. Your ethogram will typically include only a subset of the species’ full behavioral repertoire — the behaviors relevant to your study.

How important is inter-observer reliability testing?

Non-negotiable. If two observers can’t agree on whether a behavior occurred, your data is meaningless regardless of your sample size or statistical methods. Test reliability before data collection begins, and re-test periodically throughout the study. Report reliability statistics (Cohen’s kappa or intraclass correlation coefficient) in your methods section. Reviewers will ask for them, and they should.

Dr. Sarah Mitchell
