Resources for Evaluation Design

Questions and Resources for Grant Applicants to Consider When Developing Their Local Evaluation Plans

This page provides tips and resources for grant applicants to review as they plan local evaluations for Responsible Fatherhood or Healthy Marriage programs. Each item in the list below expands into a full section discussing issues that arise.

In each section:

  • First, consider a set of questions that prompt you to dig deeper;
  • Then, click on the links after the questions, which lead to resources with more information.

The glossary provides definitions of many key terms related to program evaluation.

It is strongly recommended that proposed grantee staff and local evaluators review these items together.

Selecting a research topic of interest. In what ways will your local evaluation be important to you and to your stakeholders? For example, will evaluation results help you plan future programming? Think about your proposed program and consider its multiple parts, such as the format and approach, length of services, types of services provided, populations served, and how a participant finds out about the program, signs up, participates, and completes it. In which issues are stakeholders particularly interested? What part of the program would you like to learn more about? Is there anything innovative in your proposal that you would like to evaluate, such as a new recruitment method or workshop curriculum? Are you interested in whether the program as a whole or a particular component is effective?

Check whether your research area of interest fits into one of the topic areas outlined in the Funding Opportunity Announcement: recruitment and program participation (for example, target population identification/screening, participant recruitment, and participant engagement); programming (for example, program components, program structure, primary workshop participation, cultural competency, and grant-funded participation supports such as legal supports, child support assistance, mental health supports, housing assistance, and health assistance); and overall program effectiveness.

Defining the research questions. What are the research questions you have about the topic of interest? Do you want to know whether the program overall, a particular service, or another component of the program (such as recruitment methods) is effective (an impact evaluation)? Do you want to identify whether participants in the program have changed over time (a descriptive outcomes evaluation)? Do you want to learn more about and document program operations (a descriptive implementation evaluation)?

Defining the “program of interest,” i.e., the part of your program that you want to assess. Be sure your research question clearly states the specific program component(s) you would like to investigate, including any recruitment/participation methods and grant-funded participation supports.

Identifying a research design. A research design should flow from a specific research question, and the outcomes of focus should link to the program logic model. See the section on identifying outcomes, below, for more information on the links between the program logic model, research questions, and outcomes of focus. The figure linked here illustrates some of the high-level decisions you will need to make to choose the appropriate design. Based on information in the figure, do your research questions align with the proposed design?

Use the summary below to double-check your design: do the focus of your evaluation (program implementation or participants) and your evaluation design (descriptive or impact) match what you propose?

Types of evaluation, by focus and design

  • Descriptive evaluations of program implementation - Studies of program implementation features, documenting those features and/or participant experiences in the program or program component
  • Impact evaluations of program implementation - Studies of the impacts of program implementation features, such as the effectiveness of recruitment or retention strategies or sequences of program content
  • Descriptive evaluations of participant outcomes - Studies of outcomes for participants (including outcomes before and after participating in a program or program component)
  • Impact evaluations of participant outcomes - Studies of a program’s impacts on participant outcomes

Of the research designs available to choose from, some are suited to examining whether the program or program component caused changes in participant outcomes, others provide a descriptive view and document changes in outcomes of program participants, and still others focus on the delivery of the program itself (implementation). The design you choose should be driven by the research questions you want to answer and by available resources, such as time, funds, and expertise.

For more questions and resources to consider for specific evaluation designs, click the following links:

  1. Issues in impact evaluations: defining the program group and the comparison group
  2. Issues in randomized-controlled trial (RCT) evaluations: conducting random assignment
  3. Issues in quasi-experimental design (QED) evaluations: selecting a comparison group
  4. Issues in evaluations documenting program implementation
  5. Issues in pre/post design evaluations

Consider registering your evaluation. Evaluation or study registration is becoming increasingly common. To “register” an evaluation, you submit information to a registry about key aspects of your evaluation plans, including the evaluation design, research questions, and plans for analysis. The registry makes this information publicly available. The goal of study registration is to increase the transparency of evaluations and improve the overall quality of evidence in the field.

For more information on evaluation design, see:

  • U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. “The Program Manager’s Guide to Evaluation, Second Edition.” January 2010. This guide explains what program evaluation is, why evaluation is important, how to conduct an evaluation and understand the results, how to report evaluation findings, and how to use evaluation results to improve programs that benefit children and families.
  • Coalition for Evidence-Based Policy. “Which Study Designs Can Produce Rigorous Evidence of Program Effectiveness? A Brief Overview.” January 2006. This brief describes the advantages and disadvantages of random assignment and comparison group designs. Although the brief recommends that programs implement a random assignment evaluation design whenever possible, it also gives tips on how to design a comparison group evaluation when random assignment is not feasible.
  • Community Toolbox. “Selecting an Appropriate Design for the Evaluation.” Work Group for Community Health and Development at the University of Kansas. This online resource gives an overview of different types of experimental and quasi-experimental research designs, outlining the pros and cons of using each type. The site also guides readers on how to select a design based on the research questions your program would like to investigate and your program resources and constraints.
  • FRIENDS. The Evaluation Toolkit is a collection of information and resources about developing an evaluation plan for a descriptive or impact evaluation focused on participant outcomes. It was designed for programs for preventing child maltreatment, but much of the guidance is relevant for other types of program evaluation.
  • James Bell Associates. “Evaluation Brief: Selecting an Evaluation Approach.” September 2009. This brief discusses the steps for developing an evaluation approach, including defining your objectives and considering the designs and methods that would help you achieve those objectives.
  • Kirby, Gretchen, and Emily Sama-Miller. “Types of Evaluation: A Basic Training.” Mathematica Policy Research. April 2014. This presentation provides a basic overview of different evaluation approaches, such as descriptive, impact, and implementation evaluations. The presentation provides general logic model and evaluation design templates, and it poses six crucial questions that programs should consider when designing an evaluation.

Identifying outcomes to measure. For evaluations that will focus on outcomes for participants, what are the intended outcomes of the program? That is, what short- and long-term changes do you seek for program participants? What are the outcomes you will focus on in your evaluation?

Focal outcomes for the evaluation should flow directly from the program’s logic model and evaluation research questions. Revisit your research questions and logic model to ensure that the focal outcomes are closely linked to both the logic model and research questions.

For more information on identifying outcomes, see the resources listed in the Research Questions and Design of the Evaluation section, above. In addition, see:

  • Corporation for National and Community Service. “Collecting High Quality Outcome Data Part 1.” 2012. The first part of this two-part PowerPoint slideshow offers a primer on the benefits of collecting high quality outcome data, including how to use a theory of change to think about measurement.
  • Fatherhood Research and Practice Network. “Measuring Outcomes for Fatherhood Programs.” March 2014. This brief discusses how to ensure that your evaluation is measuring appropriate outcomes and the connections between the program logic model and the expected outcomes for program participants.

Program group. If you will compare two groups, what services will the “program group” receive? Are there core components of the program that all members of the program group will receive? Are there other components that only some will receive?

In considering your evaluation design, a critical first step is to carefully define the program of interest. Decide what program services you will evaluate, and define who will receive these services. For example, a core component intended for all clients might be a relationship skills workshop. In contrast, case management might be a program component intended only for clients with specified needs or characteristics. You might want to design your evaluation to assess either the impacts of the entire program (in this example, the relationship skills workshop and available case management) or one component, such as the relationship skills workshop. From an evaluation perspective, it is best to have uniform services for your program group.

Comparison group. If you are designing an RCT or QED evaluation, what services can the control or comparison group receive? Will the control or comparison group be able to receive any services from your program/agency?

The greater the contrast between services that the program group receives versus the services that the comparison group receives, the greater your likelihood of detecting an effect of your services on the program group. Considering the conditions of your community and target population, think about whether the control or comparison group will access any services from your program and how receiving that service might affect or even dilute the contrast between the program and control or comparison groups.

Assessing the appropriate contrast. What is the difference that you will be evaluating when comparing the services the program group received to those the control or comparison group received?

An impact evaluation evaluates the contrast between services. A common contrast is assigning the program group to receive the program services offered by the grantee and assigning the control or comparison group to not receive the grantee’s program services. (Members of this group may choose to receive any other service in the community or no services; you cannot prevent someone from receiving services offered by agencies other than your own.) The greater the contrast in services—for example, if your program services are more comprehensive or intensive than other available services—the greater the likelihood of finding an effect or impact of the program.

For more information on defining the program group and the comparison group, see:

  • U.S. Department of Health and Human Services, Administration for Children and Families, Children’s Bureau. “What’s the Difference? Constructing Meaningful Comparison Groups.” This brief video is targeted to programs serving children and their families. The video provides useful general information on the advantages of random assignment and other impact evaluation designs. It also uses easy-to-understand language and visuals to talk through options for evaluation design.
  • U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. “The Program Manager’s Guide to Evaluation, Second Edition.” January 2010. The section “Evaluating participant outcome objectives” (pages 53–57) discusses defining the program group and comparison group.
  • Strengthening Families Evidence Review. This review identified and assessed the research on programs for fathers and couples. Studies are rated on the strength of their design for detecting impacts of the program. The study ratings page has standards for impact research designs that applicants should consider, including evaluation rating criteria for high and moderate ratings.
  • James Bell Associates. “Evaluation Brief: Utilizing a Comparison Group in Evaluation.” September 2007. This brief explains how a comparison group can help identify program impacts. It also discusses ways to identify a comparison group and what to do if a comparison group is not available.

Considering an RCT design. Is there excessive demand for program services (that is, are there more people who want services than the program can serve)? Are you able to randomly assign potential participants to a program group, which receives the services of interest, or a control group, which does not receive the services being evaluated but may receive other services? Are there enough potential participants to form both a program and a control group?

An RCT, or random assignment evaluation, is an impact evaluation designed to assess the impacts of a program. In an RCT, people are assigned by chance to receive or not receive the program of interest. Those assigned to receive the program are known as the program group or treatment group. Those assigned not to receive the program of interest are known as the control group. It is important to note that people in the control group may receive other services from your program, or other services in the community. For example, you may test the effectiveness of one component of your program. You could offer the program group the entire program plus that component, and you could offer the control group the entire program without that additional component.

If well designed and implemented, an RCT is one of the best designs to definitively show whether your program caused changes in participants’ behaviors, beliefs, or intentions. It is often considered the gold standard of effectiveness research. Random assignment ensures that both groups are the same at the beginning of the evaluation (on average) so that any differences in outcomes following the program can be confidently attributed to the program. However, it is not always feasible to conduct an RCT. For example, if you do not have enough potential participants to create both a program and a control group, then an RCT is not practical. An RCT also might not address the research question you are most interested in.

Programs almost never have enough resources to deliver services to everyone in the community who is in need. In most cases, grantees accept participants (adults, couples, or youth) on a rolling basis and turn them away once they reach capacity. Random assignment, which works like a lottery, ensures that all potential participants have an equal chance of receiving services, regardless of when they enroll. For this reason, random assignment is sometimes used even if an evaluation is not conducted, for instance, in assigning students to schools (for example, see "Applying to Denver Public Schools") and awarding housing vouchers (for example, see "What is the CHA's waitlist lottery?").

Unit of random assignment. What is the unit of random assignment? Will individuals, couples, or families be assigned to each group?

If it is important that particular individuals (such as members of a couple) have the same random assignment result to answer your research questions, consider “grouping” the unit of random assignment (for example, randomly assigning a couple rather than each partner). When estimating the effects of the program, you will need information about which individuals were grouped together to conduct appropriate statistical tests.

Random assignment process. For programs conducting an RCT impact evaluation, how will you conduct random assignment (for example, drawing results from a hat, random number generation in Excel, odd- or even-numbered birthdays of a participant)? What will the probabilities of being assigned to the program or control group be (typically 50/50, though in some cases it is helpful to assign different proportions to the program and control group, such as 60 percent to the program group and 40 percent to the control group)? When will random assignment be conducted? Who will conduct it? How will you notify participants of the results?
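
For illustration only, the sketch below shows one way the random assignment step could be scripted in Python instead of using Excel or drawing from a hat. The 60/40 split, the couple-level unit of assignment, and the identifiers are assumptions made for the example, not requirements.

```python
import random

rng = random.Random(20240101)  # fixed seed so the assignment log can be reproduced

def assign(unit_ids, program_share=0.6):
    """Assign each unit of random assignment to the program or control group.

    Pass one ID per unit (adult, couple, or family). If couples are assigned
    together, pass one ID per couple so both partners share the same result.
    """
    return {uid: ("program" if rng.random() < program_share else "control")
            for uid in unit_ids}

# Example: four couples enrolled this week, each with a 60 percent chance
# of being assigned to the program group.
print(assign(["couple_01", "couple_02", "couple_03", "couple_04"]))
```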

Maintaining random assignment. What mechanisms will you put in place to ensure the random assignment is truly random? How will you make sure that participants who are randomly assigned to the control group do not receive services from the program you are evaluating?

Make sure there are not ways that the random assignment process could be “gamed” or tampered with. That is, no one should be able to influence whether someone ends up in the program or control group. Random assignment can be difficult if staff are not adequately prepared, which can lead to intentional or unintentional actions that go against random assignment. For example, sometimes staff think it is important to place a particular person in the program and do not want that person to have any chance of ending up in the control group. Such situations are why it is important to develop procedures beforehand to ensure true random assignment.

If someone in the control group receives program services, we call this contamination or cross-over. Contamination weakens the contrast between the program and control groups and makes it harder to detect the effects of the program. Before beginning the evaluation, implement procedures to make sure that control group members do not receive program services. Also, be sure to have a way to document instances of contamination or cross-over if they occur.

Mandatory participation. Are there any potential participants who must receive program services (for example, someone who was ordered to participate in a program you offer or face jail time)? In a sense, these participants are mandated to attend your program, even though Healthy Marriage and Responsible Fatherhood services must be voluntary.

To allow for any situations of mandatory participation, consider “wild cards.” These are reserved for cases (that is, adults, couples, or youth) who cannot go through random assignment because they must be in the program group. You may serve these wild cards, but they should not be included in the evaluation data.

For more information on conducting random assignment, see:

  • U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. “The Program Manager’s Guide to Evaluation, Second Edition.” January 2010. Chapter 6 (pages 48–62) describes how to create an evaluation plan, including the advantages of and considerations for a random assignment evaluation.
  • Coalition for Evidence-Based Policy. “Key Items to Get Right When Conducting a Randomized Controlled Trial in Education.” December 2005. Although this brief focuses on evaluation for educational programs, its discussion about ensuring and maintaining rigor is applicable to most random assignment evaluations. The brief walks through key points to keep in mind while planning the evaluation, during the random assignment process, while measuring outcomes of the evaluation sample, and when analyzing results.
  • Fatherhood Research and Practice Network. “Randomized Controlled Studies.” March 2014. This brief describes key issues in conducting RCT evaluations and provides examples of such designs that have been used to evaluate fatherhood programs.
  • James Bell Associates. “Evaluation Brief: Commonly Asked Questions about Random Assignment.” November 2007. This brief answers frequent questions about random assignment, including whether it is fair and ethical. It also includes key steps in implementing an RCT evaluation.
  • Slavin, Robert. “Educational Research in an Age of Accountability.” Chapter 2: Randomized Experimental Designs. 2007. This chapter uses accessible, illustrative examples to explain different options for how to conduct random assignment, and units for random assignment. Because the target audience for the chapter is educational researchers, the text discusses random assignment of classes, schools, or teachers, but the lessons are applicable to other types of programs.

Considering QED. If you do not have an adequate number of potential participants or cannot randomly assign potential participants, can you identify or create a well-matched comparison group? Can you assess characteristics of the program group and comparison group before the start of the program?

In a QED evaluation, there is a program group and a comparison group, but individuals are not randomly assigned to groups. Rather, a comparison group is deliberately selected. The comparison group should be chosen to match, as closely as possible, the background characteristics of the program group (for example, fathers or couples from a neighboring community without a program such as yours). For a credible test of program impact, carefully assess whether the program and comparison groups are similar on a range of characteristics before the program begins (including, if possible, statistical tests of equivalence on the outcome variable at baseline), and statistically adjust for any differences in background characteristics, which can remain even after you have matched the populations.
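
As one illustration of how matching might be carried out, the hedged sketch below estimates a propensity score (a predicted probability of being in the program group) from baseline characteristics and pairs each program group member with the closest comparison candidate. The column names, covariates, and one-to-one matching rule are assumptions for the example; an evaluator may well choose a different matching approach.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Assumed layout: one row per person, baseline characteristics measured before services,
# and 'in_program' coded 1 for program enrollees and 0 for candidate comparison members.
covariates = ["age", "num_children", "employed", "earnings"]  # illustrative names

def propensity_match(df):
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df["in_program"])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    program = df[df["in_program"] == 1]
    candidates = df[df["in_program"] == 0]

    # Pair each program member with the nearest comparison candidate (with replacement).
    nn = NearestNeighbors(n_neighbors=1).fit(candidates[["pscore"]])
    _, idx = nn.kneighbors(program[["pscore"]])
    matched_comparison = candidates.iloc[idx.ravel()]
    return program, matched_comparison
```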

Identifying an appropriate group for comparison. For a QED, how will you select and define the comparison group? Once selected and defined, double-check: does the comparison group closely match the program group on characteristics and behaviors?

The comparison group represents what would have happened to those in the program group had the program not existed. For this reason, it is essential that the program and comparison groups be similar at the start of the evaluation, before the program group receives any program services. It is particularly important that the program and comparison groups be similar in characteristics that are expected to influence the outcomes of interest. If the groups are different at the beginning of the evaluation, there is no way to know whether differences in post-program outcomes between the comparison and program groups were caused by the program or were simply a result of initial differences between the groups. Non-random program and comparison group designs are less rigorous than random assignment because you cannot rule out all initial differences (for example, the personality traits that led the program group to volunteer for the program might differ from the traits of people in the comparison group).

The way in which you create the program and comparison groups will make underlying initial differences more or less likely. For example, it is better to create a comparison group that does not have the option to enroll in the program, rather than a group of those who decline to enroll or participate. If your comparison group consists of people who expressed interest in your program but never enrolled, they might in turn be less motivated to change their behaviors than those who did enroll. The evaluation would not be able to distinguish whether any behavioral effects are from differences in motivation or from the program.

Assessing similarities between the program and comparison groups. Can you assess similarities between the program and comparison groups before program initiation? On what characteristics will you compare participants and comparison groups? How will you know whether differences are significant or meaningful?

One approach to identifying additional key characteristics is to ask whether a typical reader or practitioner, drawing on common sense, experience, and practice-based knowledge, would consider the two groups similar. Consider, for example:

  • Characteristics that affect participant engagement and outcomes, such as motivation to change (that is, why someone engaged with the program). Keep in mind, these might be difficult to assess.
  • Demographic characteristics, such as race and ethnicity, age, relationship status, and number of children
  • Socioeconomic characteristics, such as education, earnings, public assistance receipt, and stable housing
  • Initial levels of the outcome variable(s) you will be assessing, such as relationship or parenting quality, conflict management, and contact with children
  • Other characteristics that you think are important for selecting a group similar to those you serve, such as incarceration history, substance use disorders, and physical and mental health

It is important to assess the similarities of the program and comparison groups before the program group begins receiving program services. Assess the magnitude of the difference in average levels of key variables across groups with a statistical test (such as a t-test) to determine whether the difference is statistically significant.
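
A minimal sketch of such a baseline check appears below; it compares group means on each variable using Welch's t-test from scipy. The variable names are illustrative assumptions, and your evaluator may prefer different variables or tests.

```python
import pandas as pd
from scipy import stats

def baseline_balance(baseline, variables=("age", "earnings", "parenting_quality")):
    """Compare baseline means for the program and comparison groups.

    Assumes 'baseline' has one row per person, a 'group' column coded
    'program' or 'comparison', and the listed (illustrative) variables.
    """
    rows = []
    for var in variables:
        prog = baseline.loc[baseline["group"] == "program", var].dropna()
        comp = baseline.loc[baseline["group"] == "comparison", var].dropna()
        t_stat, p_value = stats.ttest_ind(prog, comp, equal_var=False)  # Welch's t-test
        rows.append({"variable": var,
                     "program_mean": prog.mean(),
                     "comparison_mean": comp.mean(),
                     "p_value": p_value})
    return pd.DataFrame(rows)
```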

Statistically controlling for non-matched characteristics. Are there baseline characteristics on which the program and comparison groups are not equivalent? If so, how will you account for those differences when you analyze the final data?

If the program and comparison groups are very well matched on most key characteristics, you can statistically adjust for any remaining differences in baseline characteristics during your analysis phase. This is possible through statistical techniques such as regression adjustment.

If the two groups are not well matched on most characteristics, regression adjustment is likely to be insufficient to adjust for the differences.
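
As a rough sketch of what regression adjustment can look like in practice, the example below estimates the program-comparison difference in a follow-up outcome while controlling for baseline characteristics, using statsmodels. The synthetic data and column names are assumptions made only so the example runs; in practice you would use your evaluation data file.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative synthetic data: one row per respondent, a follow-up 'outcome',
# a 'program' indicator (1 = program group, 0 = comparison group), and baseline covariates.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "program": rng.integers(0, 2, n),
    "baseline_outcome": rng.normal(3.0, 1.0, n),
    "age": rng.integers(18, 55, n),
})
df["outcome"] = df["baseline_outcome"] + 0.3 * df["program"] + rng.normal(0.0, 1.0, n)

# Regression adjustment: the coefficient on 'program' estimates the group difference
# after accounting for the baseline characteristics included in the model.
model = smf.ols("outcome ~ program + baseline_outcome + age", data=df)
results = model.fit(cov_type="HC1")  # heteroskedasticity-robust standard errors
print(results.summary().tables[1])
```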

For more information on selecting comparison groups, see:

  • U.S. Department of Health and Human Services, Administration for Children and Families, Children’s Bureau. “What’s the Difference? Constructing Meaningful Comparison Groups.” This brief video is targeted to programs serving children and their families. The video provides useful general information on the advantages of random assignment and other impact evaluation designs. It also uses easy-to-understand language and visuals to talk through options for evaluation design.
  • U.S. Department of Health and Human Services, Office of Family Assistance. “Evaluation Resource Guide for Responsible Fatherhood Programs.” July 2010. This resource guide includes a chapter titled, “How Do We Measure Change?,” which covers types of evaluation designs to consider, including QEDs.
  • Child Trends. “Quasi-Experimental Evaluations.” January 2008. This brief describes the information that QED evaluations can provide and the various types of QED evaluations.
  • Coalition for Evidence-Based Policy. “Which Study Designs Can Produce Rigorous Evidence of Program Effectiveness? A Brief Overview.” January 2006. This brief describes the advantages and disadvantages of different comparison group designs. Although the brief recommends that programs implement a random assignment evaluation design whenever possible, it also gives tips on how to design a comparison group evaluation when random assignment is not feasible.
  • Fatherhood Research and Practice Network. “Non-Random Research Designs.” March 2014. This brief describes issues to consider when developing a non-random research design, such as who and what you will evaluate.

Considering a descriptive implementation design. Do you want to know more about how your program is operating? If you are using a proven curriculum or specific model, do you want to know if it is being delivered with fidelity? Do you want to explore why some participants are dropping out of the program and others are completing it? Do you want to know the average dosage of services that participants receive (such as the number of meetings with a case manager or hours of curriculum delivered)?

Evaluations that are often called “implementation” or “process” evaluations assess the elements of how the program operates. This type of evaluation might examine whether the curriculum is being delivered with fidelity and which program components are functioning smoothly or facing difficulties. Implementation descriptive evaluations can also help you understand the reasons behind program successes or challenges by highlighting factors such as whether and why participants are initiating and completing the program. Implementation impact evaluations can test different strategies or procedures that might improve program operations. Typically, implementation evaluations assess the program activities rather than the participants’ outcomes.

Identifying evaluation focus. What aspects of program operations are of most interest? This could include any service or activity, staffing, management structure, or fidelity to the intended model. For example, if the program is implementing a curriculum for the first time, you might be interested in how closely staff adhere to the intended model (that is, fidelity), their perceptions of the curriculum, and participants’ satisfaction with the new material.

An implementation descriptive evaluation enables you to learn more about how the program operates and identify what seems to be working well and what you could improve. It can involve data from a wide range of sources and respondents (see questions below).

Possibility of random assignment. Not all implementation evaluations use descriptive evaluation designs. You can also use impact evaluation designs to assess program implementation. If you are interested in two (or more) approaches to program implementation, would a random assignment design be possible? For example, you could randomly assign one-half of program participants to receive reminder calls about upcoming classes and the other half to receive text messages. You could then examine whether there are differences between the groups in class attendance. If you are considering a randomized controlled trial, see the section on conducting random assignment for more questions and resources to consider.

Random assignment does not have to be limited to whether an adult/couple/youth receives your program or other services in the community. You could rigorously evaluate program operations (such as curriculum A versus curriculum B) or approaches (such as offering a $25 or $50 incentive).

Inputs and outputs. What aspect of program operations do you want to measure?

Program inputs are the aspects of program design and implementation that might contribute to program outputs. Inputs include the type and quantity of services offered, staff qualifications and experience, fidelity to a specified model, curriculum content, strategies to promote enrollment and participation, service delivery approach, and other factors. Outputs include measures of enrollment, participation, retention, and dosage (the amount of services participants receive on average), as well as participants’ satisfaction with program services. Outputs differ from outcomes, which are measures of participants’ behavior or attitudes.

Mode of data collection. How will you collect data? Will this include interviews, observations, surveys, or focus groups? Will you use data from a management information system (for example, to measure number of activities provided or average program participation)? Will you combine different modes of data collection?

The best mode of data collection depends on the type of information you hope to learn. For example, to learn about organizational culture or group norms, consider focus groups or observations. To collect detailed information on a topic, consider interviews. To understand how many people have a certain opinion or had a certain experience, consider surveys.

Match the mode of data collection to the study research questions and consider using multiple modes of data collection. For example, you might want to know how adults/couples/youth respond to the program. This could include looking at measures of participation, conducting focus groups (to obtain more in-depth information from a smaller number of people), administering a brief survey (to collect broader information from a larger number of people), and interviewing staff about what they have heard and seen.

Data collection instruments. Will you develop data collection instruments, such as questions or topics to cover, for all modes of data collection (for example, interviews, surveys, focus groups)? If conducting observations, will you have forms for the observers to complete or checklists to focus their attention on what is of most interest?

Developing protocols helps ensure consistent data collection, for example, between different interviews or focus groups. A detailed protocol also is a tool to identify and document all the questions of interest and make sure information needed to address each question is consistently covered in the planned data collection.

Respondents. From whom will you collect data? Who is most knowledgeable or has the perspective you are interested in? How will you identify the appropriate respondents? For example, will you interview all staff or only some staff? If only some, how will you pick them? Do you want a range of characteristics (such as length of time with the program), and if so, how will you ensure that range is represented across respondents? If you want program participants to respond, how will you invite them? Whom will you invite (for example, will you include those who never attend the program or attend only sporadically, and if so, how will you encourage them to participate in the evaluation)?

Data collectors. Who will collect the data? Does that person have a vested interest in certain outcomes? Will having that person collect the data potentially affect the way people respond?

Ideally, the data collector is an objective person outside of the organization. If you use internal staff to collect data, think carefully about how you will facilitate an open environment and protect respondents’ confidentiality. For example, you would not want supervisors to interview their own staff about their perceptions of the program’s management.

Timing of data collection. When will you collect data? Will this differ by respondent? Will you collect data at multiple times for the same respondent to capture change over time?

Consider when you are most likely to get the information of interest. If you want to know about program start-up or initial performance, then collect data near the beginning. However, if you are interested in ongoing operations, perhaps after some of the “bugs” have been worked out, then consider collecting data after a year or so.

For more information on conducting an implementation evaluation, see:

Considering a descriptive pre-post evaluation design. If you cannot identify a well-matched comparison group, can you assess participants both before and after the program?

A pre-post evaluation assesses participants before and after a program to learn whether their behaviors, beliefs, or well-being changed. This design can provide descriptive information on the people the program serves but cannot determine whether the program caused any change that occurred. Any observed changes might have occurred because of some other factor.

Designing a pre-post evaluation. If you plan on a descriptive pre-post design evaluation, there are many other factors to consider. What outcomes will you measure? How will you identify participants? When will you collect data and how will you ensure data quality? What modes and methods will you use for data collection? What measures will you use to capture changes in participants’ outcomes? How will you encourage participation in the evaluation?

See the sections on outcomes, data collection design and logistics, measures and outcomes, and participation in the evaluation for information to help plan your pre-post design evaluation.

Once you have chosen your evaluation design, plan your data collection activities. The sections below highlight issues to consider as you plan those activities.

Identifying respondents. Given your chosen evaluation design, who are the appropriate respondents for your data collection? In other words, from whom will you collect data? Will you collect data from any or all of the following groups?

  • Those who enrolled
  • Those who enrolled but did not participate in any services
  • Those who enrolled and partially completed services
  • Those who enrolled and completed services
  • Those who did not enroll and did not receive services, but are in your target population
  • Those who were assigned to the control or comparison group

What can you learn from these different groups? If you focus only on one group, what are you potentially missing?

For an RCT or quasi-experimental evaluation, it is important to collect data on all the participants who were assigned to enroll in the program, whether or not they completed services (sometimes called an “intent-to-treat,” or ITT, design). Doing so allows the evaluation to provide information on the effect of offering the program. In contrast, collecting data only from those who participated or finished services might prevent you from understanding the true effect of the program because you would no longer be comparing the entire program group with the control or comparison group. Such an analysis is sometimes described as a “treatment on the treated,” or TOT, analysis and is considered a less rigorous level of evidence than the ITT analysis.
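
The hedged sketch below illustrates the difference in who is included in the two analyses; the column names ('assigned', 'completed', 'outcome') are assumptions for the example, not prescribed variables.

```python
import pandas as pd

# Assumed layout: one row per person randomly assigned, with 'assigned' coded
# 'program' or 'control', 'completed' coded 1/0, and a follow-up 'outcome'.

def itt_difference(df):
    """Intent-to-treat contrast: everyone assigned, whether or not they participated."""
    means = df.groupby("assigned")["outcome"].mean()
    return means["program"] - means["control"]

def completer_difference(df):
    """Less rigorous contrast that drops program-group members who did not complete services."""
    kept = df[(df["assigned"] == "control") | (df["completed"] == 1)]
    means = kept.groupby("assigned")["outcome"].mean()
    return means["program"] - means["control"]
```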

You can learn useful descriptive information by comparing different groups of participants. For example, you could compare initial (baseline) characteristics of those who did and did not participate to learn more about the two groups, such as whether those who do not participate have more (or fewer) challenges in their lives.

Data collection timing. When will you collect data? Will you collect data before program services begin so that you have a baseline? Are there measures that you will want to collect data on only once and/or measures you will want to collect data on repeatedly? Will the timing of your data collection allow enough time for the change you want to see to occur? Note that longer-term follow-ups require substantially more effort (see the sections on logistics of data collection and tracking).

Timing of data collection affects what you can learn from the evaluation. To understand attitudes or behaviors before the program, it is necessary to collect data before services start (often known as baseline data). In impact evaluations, baseline data are also essential for comparing the program and control or comparison groups. To understand possible changes over time, you need follow-up data collection. Consider which outcomes are of interest when making decisions about follow-up timing. For example, if you think participants’ beliefs might change as a result of your program, you might want to measure these right after program completion. However, if you expect change in longer-term outcomes, such as obtaining an educational certificate or employment, it might be necessary to follow sample members for a longer period of time to observe changes.

If you choose to collect follow-up data only from participants who attend the final session of the program, keep in mind that you will not be getting information from those who do not finish the entire program. The results will not be representative of the full group of people who enrolled in the program. A better approach is to collect follow-up data from all participants at a particular point in time, such as around the expected program completion date (for example, two months after the start of services).

Data quality. Who will check data quality? What kinds of data quality checks will you conduct? When will you check data quality? What are the procedures if you identify quality issues?

Data quality checks can include making sure all questions are answered and answers are in the right range (for example, date of birth should not be in the future). Checking this information regularly is important for data reliability and accuracy.

Respondents should be allowed to skip questions; they might be uncomfortable providing certain information. However, check missing data to ensure there are no systematic problems (for example, certain sections are frequently missing), which might occur because of data collector training or formatting of the instruments.
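
A minimal sketch of automated checks along these lines appears below, assuming survey responses are stored in a pandas DataFrame; the column names and the 1-to-5 response scale are illustrative assumptions.

```python
import pandas as pd

def quality_report(df, scale_prefix="q_", scale_range=(1, 5)):
    """Summarize common data quality issues; column names and rules are illustrative."""
    report = {}

    # Missing data: share of blank answers for each question, highest first.
    report["missing_by_question"] = df.isna().mean().sort_values(ascending=False)

    # Range check: dates of birth should not fall in the future.
    if "date_of_birth" in df.columns:
        dob = pd.to_datetime(df["date_of_birth"], errors="coerce")
        report["future_birth_dates"] = int((dob > pd.Timestamp.today()).sum())

    # Range check: scale items should stay within the allowed response categories.
    low, high = scale_range
    for col in [c for c in df.columns if c.startswith(scale_prefix)]:
        out_of_range = df[col].notna() & ~df[col].between(low, high)
        report[f"{col}_out_of_range"] = int(out_of_range.sum())

    return report
```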

For more information on data collection, see:

  • Community Toolbox. “Collecting and Analyzing Data.” Work Group for Community Health and Development at the University of Kansas. This online resource outlines considerations and options for data collection. This page discusses ways in which qualitative and quantitative data can be collected and optimal timing for data collection during program implementation.
  • Corporation for National and Community Service. “Collecting High Quality Outcome Data Part 2.” 2012. The second part of this two-part presentation describes steps in implementing a data collection effort, including developing a schedule, training data collectors, and testing instruments.
  • McNamara, Carter. “Basic Guide to Outcomes-Based Evaluation for Nonprofit Organizations with Very Limited Resources.” This guide gives a basic overview of each step in the data collection process, starting with planning and proceeding through choosing outcomes, selecting indicators, collecting data, and analyzing and reporting results.
  • W.K. Kellogg Foundation. “The Step-by-Step Guide to Evaluation.” November 2017. Chapter 8 of this handbook contains detailed information about how to implement data collection for a project evaluation. The chapter contains information on how to develop questions, determine data collection methods, and collect data.

Mode of data collection. Will you obtain data on the outcomes from self-report by the respondents, from observation, or from administrative data (such as state data on child support)? See also the section on working with an evaluator.

You can obtain data on knowledge and attitudes by having respondents complete a structured questionnaire, among other methods. You can also obtain data on behaviors both by asking respondents questions about their experiences and by observing them. Administrative data on behaviors (such as child support) can improve the accuracy of outcomes, but using these data usually requires special permission and additional steps, such as working with agencies that collect and store the data.

Methods for data collection. If you are collecting self-report data from participants, will you administer the measures through paper-and-pencil or electronically (for example, tablet or laptop)? Will you ask participants to respond to a written survey on their own, or will someone interview the respondents and record the answers? Will you interview respondents face-to-face or over the phone?

Using self-administered instruments (a respondent completes the answer himself or herself) reduces the need for trained, independent data collectors. However, having a trained staff member available can still be beneficial: the staff member can provide instructions and encouragement to respondents, as well as answer their questions.

Having program staff interview the respondents and record the answers can address any literacy issues respondents might have, but you must weigh this advantage against potential issues regarding privacy, interviewer bias, and cost.

Identifying data collectors. Who will collect the data? Depending on mode, this could mean conducting interviews and observations or collecting completed instruments.

Ideally, people who are not affiliated with the program should collect data. If program staff collect data, they could potentially and inadvertently affect respondents’ answers because of their involvement with the program. For example, respondents might be less honest or forthcoming about the program or changes they have experienced when giving answers to program staff. This is especially problematic in an impact evaluation if program staff collect data from the program group and evaluation staff collect data from the control group.

Training. How will you train data collectors? Will ongoing training be available? How will you monitor the quality of their work? Who will supervise them?

To obtain high quality data, train data collectors on appropriate procedures to ensure valid and reliable responses. Ideally, you should supervise and monitor data collectors throughout the evaluation to make sure they maintain appropriate practices and procedures for data collection.

For more information on the logistics of data collection, see:

  • U.S. Department of Health and Human Services, Office of Family Assistance. “Evaluation Resource Guide for Responsible Fatherhood Programs.” July 2010. This resource guide includes a chapter titled “Selecting Your Evaluation Instruments” that might be helpful in logistical planning for data collection within the context of responsible fatherhood programs. The chapter discusses instrument and measure selection and development, gives tips for survey development, and identifies sample survey items that measure common outcomes of responsible fatherhood programs.
  • Fatherhood Research and Practice Network. “FRPN Webinar: Achieving High Response Rates and Dealing with Missing Data in Fatherhood Evaluations.” June 2017. This archived webinar discussed techniques researchers have used to achieve high response rates for follow-up data collection with participants in a responsible fatherhood program.
  • Northwest Center for Public Health Practice. “Data Collection for Program Evaluation.” This toolkit compares and contrasts different evaluation modes and methods, gives examples of evaluation plans and templates for data collection, and lists common data collection items. The examples are framed in a public health context but are widely applicable.
  • U.S. Centers for Disease Control and Prevention. “Program Evaluation: Data Collection and Analysis.” August 2018. This web page hosts several “evaluation briefs” that give overviews and tips on different data collection methods, including focus groups, questionnaires, observations, interviews, and document review. The briefs are targeted to programs on adolescent health, but the tips and other information are relevant to other types of programs.

Selecting measures. How will you measure the intended outcomes of the program? Are there existing measures you can draw on, or will you create your own?

There are pros and cons to creating your own measures for an evaluation. A benefit is that you can develop measures that are highly tailored to your program and evaluation. However, one of the drawbacks of this approach is that, unless the measure is extensively pilot tested, the measure might not provide valid or reliable data. In addition, using measures that are widely accepted in the field might give your study more credence than measures devised by the program. Using an instrument previously developed and field-tested for a similar purpose and population provides some assurance that the measures are appropriate and capture the underlying concepts of interest.

Ensuring quality of measures. How will you choose measures that accurately reflect the outcomes you are trying to measure? How will you ensure that the measures capture the highest-quality data possible? Are there likely to be ceiling effects (wherein almost all respondents select the highest category; for example, most people strongly agree that “being a parent is the most important role to me”) or floor effects (wherein almost all respondents pick the lowest category)? Do the measures have good reliability and validity? Are there concerns about bias in the measures?

There is an art and a science to creating measures that capture behaviors, attitudes, and knowledge as intended. For example, the wording should be easy to understand, and it should not lead a respondent to answer in a certain way. Measurement quality is often defined in terms of reliability, validity, and bias. You might also have to consider problems with social desirability and whether you can minimize this bias through question wording.
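
As an illustration of two such checks, the sketch below computes the share of respondents at the top and bottom response categories (possible ceiling or floor effects) and Cronbach's alpha, a common summary of a multi-item scale's internal-consistency reliability. The 1-to-5 scale and data layout are assumptions for the example.

```python
import pandas as pd

def ceiling_floor(items, low=1, high=5):
    """Share of respondents choosing the top or bottom category for each item."""
    return pd.DataFrame({"ceiling": (items == high).mean(),
                         "floor": (items == low).mean()})

def cronbach_alpha(items):
    """Cronbach's alpha for a scale (one column per item, one row per respondent)."""
    items = items.dropna()
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```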

Instruments for data collection. How many questions will be on the instrument? Will you allow for open-ended answers, closed-ended (multiple-choice or scaled) responses, or both?

Think about the length of the survey, protocol, or other features of the instrument that you will use to collect data. If the instrument is too short, you might not get all of the information you want; if it is too long, you might discourage respondents from participating in or completing the data collection.

Additionally, think about the type of answers you want to collect. Open-ended answers might allow you to collect rich qualitative data, but closed-ended responses will allow you to compare averages or conduct other numerical, quantitative analyses.

Finally, consider how the visual layout and language of a written survey will affect respondents. Crowded surveys might be confusing or unappealing. Language in surveys and protocols should be simple, clear, easy to understand, and appropriate to your target population.

Pilot testing instruments. Will you pre-test your instruments and measures?

Evaluators sometimes pilot test their data collection instruments, such as written surveys and protocols, to ensure they will collect reliable, valid, and unbiased data. Pilot testing involves asking volunteers to complete the instrument so that you understand how well it works and can make any needed adjustments before beginning data collection with study participants. During pilot testing, you can also ask respondents to explain their thinking about questions to identify any problems with the wording. The respondents should be members of your target population, but not people you would want to include in your study—for example, alumni of your program or similar programs might be good candidates for pilot testing.

For more information on collecting high quality data, see:

  • Corporation for National and Community Service. “Collecting High Quality Outcome Data Part 1” and “Collecting High Quality Outcome Data Part 2.” 2012. The first part of this two-part PowerPoint slideshow offers a primer on the benefits of collecting high quality outcome data, how to use a theory of change to think about measurement, identifying and evaluating different data sources and instruments, and identifying some data collection methods. Part 2 of this presentation explains different attributes of data quality, including social desirability, reliability, validity, and bias.
  • Fatherhood Research and Practice Network. “FRPN Measurement Resources.” The Fatherhood Research and Practice Network developed a series of new measures related to responsible fatherhood, including an assessment of father-child contact (which focuses on non-resident fathers) and a measure of fathers' challenges. For each measure, you can download a PDF of the items and watch a video that demonstrates how to administer the measure.
  • Fatherhood Research and Practice Network. “Measuring Outcomes for Fatherhood Programs.” March 2014. This brief suggests outcomes to consider for fatherhood programs.
  • Rosinsky, Kristina L., and Mindy E. Scott. “Healthy Marriage and Relationship Education: Considerations for Collecting Outcome Data from Adolescents.” Child Trends, with the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. June 2015. This tip sheet describes issues to consider when deciding methods of collecting outcomes data from youth.
  • Rosinsky, Kristina L., and Mindy E. Scott. “Healthy Marriage and Relationship Education: Considerations for Collecting Outcome Data from Parents in Complex Families.” Child Trends, with the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. June 2015. This tip sheet describes issues to consider when deciding methods of collecting outcomes data from parents in complex families (families in which one or both partners have children from previous relationships).
  • Scott, Mindy E., Kristen A. Moore, Artemis Benedetti, Heather Fish, and Kristina Rosinsky. “Healthy Marriage and Relationship Education: Recommended Outcome Measures for Parents in Complex Families.” Child Trends, with the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. June 2015. This table presents recommended items and measures for assessing a range of outcome domains for healthy marriage programs that serve parents in complex families (families in which one or both partners have children from previous relationships).
  • Scott, Mindy E., Kristen A. Moore, Heather Fish, Artemis Benedetti, and Sage Erikson. “Healthy Marriage and Relationship Education: Recommended Outcome Measures for Adolescents.” Child Trends, with the U.S. Department of Health and Human Services, Administration for Children and Families, Office of Planning, Research, and Evaluation. June 2015. This table presents recommended items and measures for assessing a range of outcome domains for healthy marriage programs that serve high school-age adolescents.

Before you conduct an evaluation that involves people, or “human subjects,” your plans for the evaluation must be approved by an institutional review board (IRB). The IRB is an independent ethics committee that will make sure that you have planned appropriate protections from risk or harm to the people involved in your evaluation.

Maintaining privacy and confidentiality of data. How will you obtain research approval and informed consent from your respondents? What safeguards will you put in place to ensure that your respondents’ data are kept secure and confidential?

To conduct research with people as your subjects, you must obtain approval from an institutional review board (IRB), an independent ethics committee. Before collecting data, you should submit your research design, data collection instruments, and consent forms to an independent IRB and obtain approval.

As part of your research design, you will need to include information on how you will keep respondents’ data secure. Personally identifiable information (PII) is private information that can be used on its own or with other information to identify, locate, or contact a single person. Examples of PII include first and last name, Social Security number, home address, phone number, and any financial information or other personal information. Keep PII secured and confidential. Think about how you will maintain security for all of the data that you collect on individuals. For example, you might want to password-protect certain drives or folders on your organization’s computers or network and limit access to only certain people. You might also want to lock up physical, hard copy files and have a plan for destroying those files when they are no longer needed.

Working with an IRB. Do you have access to an IRB that could approve your research?

Universities typically have IRBs, and independent IRBs will review research evaluations for a fee.

Collecting consent. How will you collect informed consent from participants? Do you have a consent form? When and how will you administer it? Will participants sign the consent form or provide verbal consent? If signed, where will you store the consent forms? If someone does not consent to the evaluation, will you still allow that person to participate in the program?

Respondents should be fully informed about the evaluation, including random assignment procedures, risks, and whom to contact with questions. The consent language should be easy to understand and avoid jargon. The IRB will likely review the consent form as part of the approval process.

For more information about participant protections, see:

  • James Bell Associates. “Understanding the IRB.” January 2008. This brief describes what an IRB is and what it does and offers guidance on whether IRB approval is needed.
  • National Healthy Marriage Resource Center. “Marriage and Relationship Education Program Development and Management Manual.” August 2013. Chapter 12 of this manual (“Evaluation”) briefly discusses the need for IRB approval and the role of IRBs in ensuring participant protections.
  • Social Innovation Fund. “Working with Institutional Review Boards.” This resource includes tips for selecting an IRB, when to consult with one, and suggestions for working together.
  • U.S. Department of Health and Human Services, The Office of Research Integrity. “Handling Information.” This website provides a tutorial and case study examples about maintaining confidentiality of data. The “Handling Information” section of the tutorial briefly discusses important issues such as proper documentation, secure storage, informed consent, and other precautions to take when conducting research with human subjects.

Determining sample size. How many participants do you plan to have in the evaluation? How many do you expect to enroll per month? Per year? How many will you have in your program group (who will be eligible for program services) and in the control or comparison group (who will not be eligible for your program’s services)?

Consider the flow of enrollments into your program when estimating sample size, including any seasonal or other typical fluctuations. Also keep in mind that not everyone may want to participate in the evaluation, so the number of participants in your evaluation might be smaller than the number who are eligible.

The size of your sample will affect whether you can detect differences as statistically significant; the likelihood of detecting a true difference of a given size is known as statistical power. Even a large difference (from the beginning to the end of the program, or between the program and comparison groups) might not be statistically significant if the sample is small. You should work with a professional evaluator to estimate the sample size you will need to detect statistically significant differences of an expected size.
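As an illustration of the kind of calculation an evaluator would run, the Python sketch below uses the statsmodels library to estimate the number of participants needed per group. The effect size, significance level, and power values are placeholder assumptions chosen for illustration, not recommendations for your program.

    # Illustrative power calculation (placeholder values, not recommendations).
    # Assumes the statsmodels package is installed (pip install statsmodels).
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    effect_size = 0.25   # assumed standardized difference (Cohen's d) between groups
    alpha = 0.05         # significance level (5 percent)
    power = 0.80         # 80 percent chance of detecting a true effect of this size

    n_per_group = analysis.solve_power(effect_size=effect_size,
                                       alpha=alpha,
                                       power=power,
                                       ratio=1.0)  # equally sized program and comparison groups
    print(f"About {n_per_group:.0f} participants needed in each group")
    # Under these assumptions, roughly 250 participants per group would be needed;
    # the same difference in a smaller sample might not reach statistical significance.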

For more information about sample size, see:

  • Fatherhood Research and Practice Network. “Sampling, Recruitment, and Retention.” March 2014. This brief discusses the relationship between attrition and sample size and provides links to additional resources on sample sizes in quantitative research.
  • Pennsylvania State University Cooperative Extension. “How to Determine a Sample Size.” This tip sheet explains issues to consider when determining sample size and how to calculate an appropriate sample size. It also includes tables that can help you determine an appropriate sample size for your evaluation, given the number of participants you expect to enroll and other factors.

Encouraging participation in data collection. How will you encourage individuals to participate in data collection? How will you encourage participation among those who do not have contact with the program, including the control or comparison group and any program group members who have dropped out?

Few, if any, evaluations get all participants to respond to a follow-up survey. Unfortunately, some evaluations have response rates of 50 percent or lower. Low response rates reduce our confidence that the findings of the evaluation are accurate.

It may be helpful to provide incentives to your respondents to encourage their participation in the data collection. Examples of incentives include gift cards or bus passes. You may also consider nonmonetary benefits that might be appropriate for your target population, such as free child care.

Staying in touch with those who have little or no contact with the program (for example, by sending a newsletter) can be useful for reminding them about later follow-up data collection.

Tracking respondents over time. How will you track respondents over time for follow-up data collection? What contact information will you collect to help with tracking (such as addresses, phone numbers, or social media information)? Will you ask respondents for contact information from people they know in case their contact information changes? How much time will pass before you reach out to respondents?

If you are following respondents over an extended period of time, it might be helpful to collect contact data from them at several points in time. Many respondents move frequently, and their contact information will become out of date. Collecting good contact information and identifying someone else who might know where the respondent is (such as the respondent’s mother) can help with tracking. Regular check-ins are an opportunity to update contact information even if you are not collecting data.

For more information on participation in an evaluation, see:

  • U.S. Department of Health and Human Services, Office of Adolescent Health. “Using Incentives to Boost Response Rates” and “Increasing Questionnaire Response Rates.” August 2018. These evaluation briefs give practical tips for increasing response rates on surveys. The first brief focuses on using incentives, including information on amounts and other considerations when offering monetary incentives. The second brief offers information on non-monetary tactics for increasing response rates.
  • Fatherhood Research and Practice Network. “Sampling, Recruitment, and Retention.” March 2014. This brief provides suggestions for keeping sample members engaged and participating in the program and the evaluation.

Hiring an external evaluator. Where might you find an external evaluator (for example, at a local university)? What are the evaluator’s qualifications and experience? Has he or she worked on similar evaluations? Does he or she have a background evaluating similar programs?

If you have the resources, an external evaluator can be an asset. An external evaluator can help design the evaluation, including the measures, methods, and instruments; obtain research approval and informed consent from respondents; create data security measures; and collect and analyze data. An external evaluator also does not have the conflicts of interest that program staff might have, which is another safeguard for conducting a sound evaluation.

Working with an external evaluator. What responsibilities will the evaluator have? What responsibilities will program staff have?

It is important to be clear about each party’s responsibilities. For example, a team might decide that an evaluator will conduct the random assignment, but program staff will tell participants to which group they are assigned.
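As a purely illustrative example of what that division of labor might look like, the Python sketch below shows a simple random assignment an evaluator could run to split consented participants evenly between the program and control groups. The participant IDs are hypothetical, and in practice the evaluator would document and secure this procedure as part of the approved evaluation plan.

    # Minimal sketch of simple random assignment (illustrative only).
    import random

    consented_participants = ["P001", "P002", "P003", "P004", "P005", "P006"]  # hypothetical IDs

    random.seed(20250101)  # a fixed seed makes the assignment reproducible and auditable
    shuffled = consented_participants[:]
    random.shuffle(shuffled)

    # First half of the shuffled list goes to the program group, second half to control.
    midpoint = len(shuffled) // 2
    program_group = shuffled[:midpoint]
    control_group = shuffled[midpoint:]

    print("Program group:", sorted(program_group))
    print("Control group:", sorted(control_group))
    # Program staff would then tell each participant which group he or she is assigned to.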

ACF might require grantees conducting local evaluations to develop products such as evaluation design and analysis plans and reports of results. Consider these future products when planning roles and responsibilities.

For more information about using an evaluator, see:

  • Coalition for Evidence-Based Policy. “How to Find a Capable Evaluator to Conduct a Rigorous Evaluation of an Educational Practice or Program.” June 2007. Although this brief focuses on evaluators for educational programs, much of its advice on finding an appropriate evaluator applies to other social programs. The brief focuses on finding evaluators who can conduct rigorous impact evaluations, such as an RCT or QED, and emphasizes choosing evaluators with a proven track record and expertise in the program area being evaluated. It includes tips and a step-by-step vetting process for finding an evaluator.
  • Community Toolbox. “Choosing Evaluators.” Work Group for Community Health and Development at the University of Kansas. This online resource gives tips on why it may be important to hire an outside evaluator and what to look for when selecting an evaluator. The resource also addresses the timeline for selecting an evaluator in the context of program start-up and implementation.
  • James Bell Associates. “Evaluation Brief: Locating and Hiring an Evaluator for Your Grant.” July 2007. This brief describes issues to consider and suggestions for working with an independent evaluator.
  • National Healthy Marriage Resource Center. “Marriage and Relationship Education Program Development and Management Manual.” August 2013. Chapter 12 of this manual (“Evaluation”) gives an overview of evaluation in the context of healthy marriage programs. The chapter discusses selecting an external evaluator, planning and conducting an evaluation while considering outcomes of interest, and analyzing and reporting findings.
  • Social Innovation Fund. The “Evaluator Screening” document includes tips on finding a good evaluator and a worksheet to complete when considering different evaluators.

Estimating and planning for costs. How much will it cost to conduct the evaluation? Have you factored these costs into your budget? What funding sources are available?

Many factors affect the cost of an evaluation, including its design, sample size, the number of sites, type of data being collected, the instruments and measures used, number and length of follow-ups, inclusion of multiple components (such as impact and implementation evaluations), and the external evaluator’s rates.

Please refer to your Funding Opportunity Announcement for details on the range of funds that may be allocated to evaluation.