Programs · Advanced Program Manager · ✓ Tested · Avg score 3.11/10

Outcome Framework from Data

Evaluation framework organizing program metrics into outcomes

The Prompt

You are an evaluation specialist helping a nonprofit program manager create a practical Outcome Framework. Transform the information provided below into a structured, realistic framework aligned with nonprofit evaluation best practices and small-team capacity.

CHOOSE YOUR MODE
- QUICK START (recommended for beginners): If you only have brief notes or limited data, I will produce a lean framework with clear placeholders, a starter indicator set, and a simple 90-day data plan.
- FULL BUILD (advanced): If you have a logic model and some data, I will produce the complete framework with detailed indicators, baselines/targets, and a full data plan.

BEFORE YOU BEGIN: Paste Your Information Below
Required (paste what you have; raw snippets are okay):
- Program name and 2–4 sentence description
- Target population and geography
- Timeframe (program duration and reporting cycles)
- Logic model or theory of change (activities → outputs → outcomes). If none, describe what you do and the changes you expect.
- Available data: surveys, attendance, assessments, admin data, feedback, baseline numbers, funder reports (paste raw data or summaries)

Optional (helpful):
- Funder requirements (indicators, disaggregation, deadlines)
- Staff capacity for data (hours/week, tools in use)
- Data systems (spreadsheets, CRM, case notes)
- Equity priorities (e.g., focus on specific subgroups)

If information is missing: I will flag gaps, propose conservative assumptions, and suggest pragmatic workarounds. No data will be invented; placeholders will be clearly labeled.

EXAMPLES OF ACCEPTABLE INPUT
- “Attendance CSV fields: ID, session_date, hours, site.”
- “Survey Q3: ‘I know 3 ways to find a living-wage job’ (1–5 Likert). Pre: n=42 mean=2.6; Post: n=37 mean=3.9.”
- “Funder asks for quarterly report on #served, % gainful employment within 6 months, and stories.”
- “No baseline on retention; we can pull last year’s rosters.”

OUTPUT REQUIREMENTS (8 sections in this order)
1) Program Snapshot (100–150 words)
- Include: mission/focus, target population, geography, timeframe, brief logic model (activities → outputs → outcomes), and any funder requirements.

2) Outcomes (Short 0–12 mo; Medium 1–3 yrs; Long 3+ yrs)
- Define 2–4 outcomes per tier using concrete, measurable language (avoid vague verbs).
- For each outcome, note attribution vs. contribution.
- List the indicator names (3–5 per outcome). Do not include indicator details here—details go in Section 3.

3) Indicators Detail Table (primary location for indicator specs)
For each indicator listed in Section 2, provide:
- Indicator name
- Operational definition (exact measure, numerator/denominator if applicable)
- Data source (survey, records, observation, interview, admin dataset)
- Collection method (e.g., online form, SMS, file pull)
- Frequency (e.g., per session, monthly, quarterly, pre/post, annually)
- Disaggregation (race/ethnicity, gender, age, location, income, language—adapt to context)
- Responsible role (data owner)
Table format:
| Indicator | Definition | Source | Method | Frequency | Disaggregation | Owner |

4) Outputs (50–100 words)
- List 3–5 core outputs (service volumes) with simple counts to contextualize outcomes.
- Clearly label these as OUTPUTS (activities/throughput), not outcomes.

5) Baselines & Targets (100–150 words)
- Derive baselines from provided data; if missing, state “No current baseline” and describe how to establish one in the first cycle.
- Set realistic annual targets with brief rationale (capacity, prior trends, comparison points).
- Note confidence and data quality limits.

6) Data Collection Plan (150–200 words)
- Instruments: specify concrete tools (e.g., 6-item pre/post, attendance export, brief exit interview guide).
- Low-burden methods: align to staff capacity and participant burden.
- Cadence: when and how often each source is collected.
- Storage/management: where data lives (spreadsheet/CRM), file naming, access, retention.
- Estimated staff time per task (e.g., 1–2 hrs/month data entry; 2 hrs/quarter analysis).
- Gaps & workarounds: note immediate pragmatic steps until ideal systems are in place.

7) Learning Questions, Assumptions, Ethics (100–150 words)
- 2–3 learning/evaluation questions to guide improvement.
- Key assumptions and external factors/risks.
- Data ethics: consent, privacy, cultural relevance, and minimizing participant burden.
- Mini-glossary (3–5 brief definitions for any technical terms used).

8) Review & Use (≈50 words)
- Who reviews, when (recommend annual review), and how findings will inform program decisions, equity checks, and funder reporting.

QUALITY STANDARDS
Prioritize:
- Meaningful, equity-aware indicators over easy-to-count vanity metrics
- Feasibility for small teams; low participant burden
- Tight alignment with the logic model and funder requirements
- Clear attribution vs. contribution statements
Avoid:
- Jargon without definitions; vague verbs without measures
- Overpromising causation where only contribution is plausible
- Collecting data you won’t analyze or use

FORMATTING
- Total length: 900–1,200 words plus one indicators table
- Tone: [SELECT ONE: FORMAL (grant/report-ready, precise, neutral) | WARM (community-friendly, strengths-based) | CASUAL (internal draft, concise bullets)]
- Output format: [SELECT ONE: Markdown headings + one markdown table | Plain text bullets + ASCII table]
- Prepared by: [PROGRAM MANAGER NAME, TITLE]
- Audience: [internal team / board / funders / community]
- Review cycle: Annually each [MONTH]

HANDLING INCOMPLETE INFORMATION
- If logic model is missing: infer a minimal draft based on activities and intended changes; mark as “Draft – validate with team.”
- If baselines are missing: propose a 60–90 day plan to establish them and set provisional targets (e.g., “Maintain then +10% improvement after baseline is established”).
- If funder metrics conflict with meaningful indicators: include both; label funder-required vs. mission-critical.
- Clearly mark all assumptions; suggest specific next steps to replace assumptions with data.

ABBREVIATED EXAMPLE (for reference)
Program Snapshot: Youth Leadership Academy serves 60 low-income high school students in Metro City via weekly workshops and mentoring (Sept–June). Logic model: workshops + mentoring → increased civic knowledge and leadership skills → students take leadership roles → stronger youth voice in local decisions. Funder requires quarterly reporting on participation and skill gains.

Short-term Outcome: Students increase civic knowledge and leadership skills (contribution).
Indicator names: Civic knowledge score (pre/post); Leadership skills self-rating; Qualitative reflections (exit interviews).

Indicators Detail Table (excerpt):
| Indicator | Definition | Source | Method | Frequency | Disaggregation | Owner |
| Civic knowledge score | Mean change on 8-item test (0–8) | Pre/post test | In-session paper form | Pre & Post | Race/ethnicity, gender, grade, school | Program Coordinator |
| Leadership skills self-rating | Mean change on 5-item 1–5 scale | Survey | Mobile-friendly form | Pre & Post | Same as above | Data Fellow |
| Qualitative reflections | Thematic codes of leadership growth | Exit interviews | 10-min semi-structured | End-of-program | Same as above | Volunteer Lead |
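The "mean change" definitions in the table above are simple arithmetic, and the disaggregation column implies computing that change per subgroup. A minimal standard-library sketch, assuming matched pre/post records and a hypothetical "group" field (replace with your actual disaggregation variable):

```python
from collections import defaultdict
from statistics import mean

def mean_change_by_group(records):
    """records: matched pre/post pairs as dicts with keys 'group', 'pre', 'post'.

    Returns {group: mean change (post - pre)}, rounded to 2 decimals.
    """
    diffs = defaultdict(list)
    for r in records:
        diffs[r["group"]].append(r["post"] - r["pre"])
    return {g: round(mean(v), 2) for g, v in diffs.items()}

# Hypothetical matched pairs for two grade-level subgroups
students = [
    {"group": "9th grade", "pre": 3.0, "post": 6.0},
    {"group": "9th grade", "pre": 4.0, "post": 5.0},
    {"group": "10th grade", "pre": 5.0, "post": 7.0},
]
print(mean_change_by_group(students))  # {'9th grade': 2.0, '10th grade': 2.0}
```

Note that this assumes matched pairs (the same participant measured pre and post); comparing unmatched group means, as in the Survey Q3 snippet, is weaker evidence and worth flagging in Section 5's data quality notes.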

NOW: Paste your program information below (or say “Quick Start” with brief notes), and I will produce your tailored Outcome Framework.

How to Customize

  1. Replace all [BRACKETED] fields with your specific information
  2. Adjust tone and length as needed for your audience
  3. Review and personalize before using

Pro Tips

  1. Test this prompt with your preferred AI tool before using in production
  2. Always review AI output for accuracy and appropriateness
  3. Customize outputs to match your organization’s voice and brand


Example Outputs

Compare scenarios: We tested this prompt with 3 different nonprofit contexts. Each scenario shows outputs from GPT-5, Claude, and Gemini.

Neighbor Roots Food Collective

Small, volunteer-led community nonprofit in rural Southeast Ohio focused on food access. Operates a mobile market, SNAP-matching, and cooking classes; donors include local churches, county United Way, and farm-to-food bank supporters.

Scenario details:
[Tone]: WARM
[Output format]: Plain text bullets + ASCII table
[PROGRAM MANAGER NAME, TITLE]: Aaliyah Bennett, Program Manager
[Audience]: community
[MONTH]: October

Great—ready when you are. To build your Outcome Framework, please choose your mode and paste the info below. If you’re short on time, use Quick Start and drop brief notes; I’ll add clear placeholders and a 90-day data plan.

Please choose:
- QUICK START (recommended) or FULL BUILD (advanced)

Then paste what you have (raw notes are fine):

Required
- Program name and 2–4 sentence description
- Target population and geography
- Timeframe (program duration and reporting cycles)
- Logic model or theory of change (activities → outputs → outcomes). If none, describe what you do and the changes you expect.
- Available data (surveys, attendance, assessments, admin data, feedback, baseline numbers, funder reports). Raw snippets okay.

Optional (helpful)
- Funder requirements (indicators, disaggregation, deadlines)
- Staff capacity for data (hours/week, who does it, tools)
- Data systems (spreadsheets, CRM, case notes)
- Equity priorities (e.g., focus on specific subgroups, language access)

Format/Tone preferences (pick one each)
- Tone: FORMAL | WARM | CASUAL
- Output format: Markdown headings + one markdown table | Plain text bullets + ASCII table
- Audience: internal team / board / funders / community
- Review cycle default is Annually each October; change if needed.

Quick-start intake template you can copy/paste and fill in:

- Program name:
- Description (2–4 sentences):
- Target population + geography:
- Timeframe (program length + reporting cycle):
- Logic model (if none, list activities and expected changes):
  Activities:
  Outputs:
  Outcomes (short/medium/long if you know them):
- Available data (list sources + any numbers you already have):
- Funder requirements (what, how often, any required breakdowns):
- Staff/data capacity (who, hours/week, tools in use):
- Data systems (spreadsheets/CRM/case notes; where files live):
- Equity priorities (subgroups, languages, accessibility):
- Anything else I should know (risks, partnerships, constraints):

Examples of acceptable snippets
- Attendance CSV fields: ID, session_date, hours, site.
- Survey Q3: “I know 3 ways to find a living-wage job” (1–5). Pre: n=42 mean=2.6; Post: n=37 mean=3.9.
- Funder asks for quarterly #served, % employed within 6 months, and 2 stories.
- No baseline on retention; we can pull last year’s rosters.

Once you share this, I’ll return your 8-section Outcome Framework tailored to your program, with clear assumptions labeled and pragmatic next steps.

Generated: Nov 2, 2025 · 2,410 tokens · $0.0165 cost
AI Evaluation Details (Score: 2/10)
Reasoning:

The AI completely failed to execute the task. Instead of generating an Outcome Framework using the provided scenario data, it simply repeated the intake instructions and asked the user to provide information that was already supplied. This represents a fundamental misunderstanding of the assignment.

Strengths:
  • The repeated instructions are technically accurate and match the original prompt structure
  • The quick-start template format is clear and well-organized
Weaknesses:
  • Complete failure to process the scenario: The AI was given specific program details (Neighbor Roots Food Collective, rural Southeast Ohio, mobile market/SNAP-matching/cooking classes, volunteer-led) plus formatting preferences (WARM tone, plain text + ASCII, community audience, October review), but ignored all of it
  • Zero deliverable content: No Program Snapshot, no Outcomes, no Indicators Table, no Data Collection Plan—none of the 8 required sections were produced
  • Useless to the end user: A nonprofit program manager using this would receive nothing actionable and would have to start over completely
  • Demonstrates poor instruction-following: The scenario clearly provided both the context AND the values/preferences, yet the AI treated it as an empty prompt
  • No authenticity: This is a generic 'waiting for input' response rather than the tailored 900-1,200 word framework that was requested

MetroBridge Youth Tech Pathways

Mid-size professional nonprofit in urban Detroit providing IT help desk training and coding bootcamps for low-income young adults. Funded by corporate tech philanthropy, the city workforce board, and regional foundations.

Scenario details:
[Tone]: FORMAL
[Output format]: Markdown headings + one markdown table
[PROGRAM MANAGER NAME, TITLE]: Diego Martinez, Director of Workforce Programs
[Audience]: funders
[MONTH]: March

Great—ready to build your Outcome Framework. Please choose your mode and paste the info below. If you’re not sure, I recommend Quick Start.

Choose your mode
- QUICK START (recommended): Lean framework with placeholders, starter indicators, and a simple 90-day data plan.
- FULL BUILD: Detailed indicators with baselines/targets and a full data plan.

Copy-paste intake template (fill what you can; raw snippets are fine)

Required
1) Program name + 2–4 sentence description:
- What the program does, who it serves, where/how it operates.

2) Target population + geography:
- Eligibility, age/grade, income or other criteria, neighborhoods/county/region.

3) Timeframe:
- Program duration (e.g., Sept–June, rolling), key cycles, reporting frequency (monthly/quarterly/annual).

4) Logic model or expected changes:
- Activities: (e.g., workshops, coaching, placements)
- Outputs: (counts you track)
- Outcomes: (changes you expect in knowledge, skills, behavior, status)

5) Available data (paste what exists; examples below):
- Attendance/rosters (fields, sample counts)
- Surveys or assessments (item texts, scales, pre/post numbers)
- Case notes/CRM fields
- Admin data (e.g., employment, housing, education)
- Funder reports or past summaries/baselines

Optional (very helpful)
6) Funder requirements:
- Required indicators, disaggregation (e.g., by race/ethnicity, age), deadlines, definitions.

7) Staff capacity for data:
- Who can help, hours/week, skills, tools in use (Google Sheets, Excel, Airtable, Salesforce, SurveyMonkey, etc.).

8) Data systems:
- Where your data lives now, exports available, file naming conventions, data quality issues.

9) Equity priorities:
- Priority subgroups, language access, neighborhoods, accessibility needs, how you want to use data for equity checks.

10) Scale/context:
- Approx. # participants per year, # sessions, # staff, sites, incentives, transportation/childcare support.

11) Audience, tone, and format:
- Audience: internal team / board / funders / community
- Tone: FORMAL (grant/report-ready) OR WARM (community-friendly) OR CASUAL (internal draft)
- Output format: Markdown headings + one table OR Plain text + ASCII table
- Review cycle: Confirm Annually each March, or suggest another.
- “Prepared by” line: Confirm Diego Martinez, Director of Workforce Programs, or provide another name/title.

Examples of acceptable input
- Attendance CSV fields: ID, session_date, hours, site. FY24: 82 participants; median 9 sessions.
- Survey Q3: “I know 3 ways to find a living-wage job” (1–5 Likert). Pre: n=42 mean=2.6; Post: n=37 mean=3.9.
- Funder asks for quarterly report on # served, % employed within 6 months, and one story per quarter.
- Coaching case notes include goals set/completed, barriers, referrals. Completion = 2+ goals met in 90 days.
- No baseline on 6-month retention; we can pull last year’s rosters.
- Admin match: State wage records accessible 2x/year via MOU; lag ~6 months.

If anything is missing, I’ll flag gaps, add clearly labeled placeholders, and propose pragmatic workarounds aligned to small-team capacity.

Generated: Nov 2, 2025 · 2,971 tokens · $0.0249 cost
AI Evaluation Details (Score: 3.2/10)
Reasoning:

The AI completely failed to execute the task. Instead of generating an Outcome Framework using the provided scenario data, it simply repeated the intake template and asked the user to fill it out again. This represents a fundamental misunderstanding of the instruction or a critical processing error.

Strengths:
  • The intake template itself is well-structured and comprehensive
  • Examples of acceptable input are clear and practical
  • The tone of the request is appropriately professional
Weaknesses:
  • CRITICAL FAILURE: Did not produce any of the 8 required framework sections (Program Snapshot, Outcomes, Indicators Table, Outputs, Baselines & Targets, Data Collection Plan, Learning Questions, Review & Use)
  • Ignored the provided scenario entirely—all the necessary information about MetroBridge Youth Tech Pathways was already given in the scenario context
  • Produced 0% of the deliverable; this output is completely unusable for the stated purpose
  • Shows no evidence of processing the scenario data (Detroit, IT training, Diego Martinez, funder audience, March review cycle, etc.)
  • Would frustrate a real user who already provided the required information
  • The 900-1,200 word framework with indicators table was not delivered at all

Healthy Futures Alliance

Large established public health nonprofit serving a multi-county region in the Pacific Northwest. Runs chronic disease prevention, community health worker, and screening programs; major funding from the state health department, CDC grants, and health plans.

Scenario details:
[Tone]: FORMAL
[Output format]: Plain text bullets + ASCII table
[PROGRAM MANAGER NAME, TITLE]: Mei Chen, Senior Program Manager
[Audience]: board
[MONTH]: July

Great—ready to build your Outcome Framework. First, please choose a mode and share your program info.

Choose your mode
- QUICK START (recommended): Lean framework with placeholders, starter indicators, and a 90-day data plan.
- FULL BUILD (advanced): Complete framework with detailed indicators, baselines/targets, and a full data plan.

Reply with: “Quick Start” or “Full Build”

Then paste what you have (raw snippets are fine). Use this quick template:

Required
- Program name and 2–4 sentence description:
- Target population and geography:
- Timeframe (program duration and reporting cycles):
- Logic model or theory of change (activities → outputs → outcomes). If none, describe what you do and the changes you expect:
- Available data (surveys, attendance, assessments, admin data, feedback, baseline numbers, funder reports). Paste snippets or summaries:

Optional (helpful)
- Funder requirements (indicators, disaggregation, deadlines):
- Staff capacity for data (hours/week, who does it, tools in use):
- Data systems (spreadsheets, CRM, case notes):
- Equity priorities (e.g., focus on specific subgroups, language access):
- Known risks/external factors (e.g., policy shifts, staffing, school calendars):

Format preferences
- Tone (choose one): FORMAL | WARM | CASUAL
- Output format (choose one): Markdown headings + one markdown table | Plain text bullets + ASCII table
- Audience: internal team / board / funders / community
- Any specific indicators you must include (if any):

Examples of acceptable input snippets
- “Attendance CSV fields: ID, session_date, hours, site.”
- “Survey Q3: ‘I know 3 ways to find a living-wage job’ (1–5 Likert). Pre: n=42 mean=2.6; Post: n=37 mean=3.9.”
- “Funder asks for quarterly report on #served, % employed within 6 months, and one story per quarter.”
- “No baseline on retention; we can pull last year’s rosters.”

If you’re short on time, just paste:
- Program name
- 2–4 sentence description
- Target population + geography
- What you do weekly/monthly
- Any data you currently track (even a sentence)

I’ll flag gaps, make conservative assumptions, and propose practical workarounds—no data will be invented, and placeholders will be clearly labeled.

Generated: Nov 2, 2025 · 2,861 tokens · $0.0233 cost
AI Evaluation Details (Score: 3/10)
Reasoning:

The AI completely failed to deliver the requested output. Instead of producing an Outcome Framework for Healthy Futures Alliance, it simply repeated the original prompt instructions back to the user, asking them to provide information. This represents a fundamental misunderstanding of the task—the scenario already contained the context needed to generate output.

Strengths:
  • The repeated instructions are technically accurate and well-organized
  • The examples provided would be helpful if this were actually the first interaction
  • Format and structure of the prompt repetition is clear
Weaknesses:
  • CRITICAL FAILURE: Did not produce any of the 8 required framework sections (Program Snapshot, Outcomes, Indicators Table, Outputs, Baselines & Targets, Data Collection Plan, Learning Questions, Review & Use)
  • Ignored the scenario context entirely—treated this as a blank slate rather than recognizing Healthy Futures Alliance as the subject
  • Completely unusable for the stated audience (board) and purpose (creating an Outcome Framework)
  • Zero word count toward the required 900-1,200 words of framework content
  • Did not apply any of the specified formatting preferences (FORMAL tone, plain text bullets + ASCII table, prepared by Mei Chen, etc.)
  • Would waste the time of a Senior Program Manager who expected a deliverable, not a request for information they may have already provided

Test Summary: Generated Nov 2, 2025 · 3 scenarios • 9 total outputs • Average quality score: 3.11/10 • Total validation cost: $0.1101