- Use AI to draft plain-language rubric explanations for staff to review and approve.
- Use AI to generate calibration debrief questions based on program criteria.
- Use AI to draft the FAQ section of the onboarding packet from approved source text.
- Select and anonymize anchor examples ourselves; AI does not choose what gets used in calibration.
- Have program director approve all rubric explanation language before distribution to reviewers.
- Include an equity watch note for each scoring criterion flagging common privilege-marker inflation risks.
- Do not use AI to score any application, including anchor examples used in calibration.
- Do not include applicant names, school names, or demographic data in any AI prompt.
- Do not distribute AI-drafted rubric language to reviewers without director sign-off.
- Do not use AI to resolve disagreements between reviewers during the review period (this is a program director function).
- Do not imply to reviewers that AI was used to create any final policy or decision guidance.
Planning: Variance Audit, Scope, and Anchor Example Selection
Pull individual reviewer scores by criterion from last cycle. For each criterion, calculate the spread between the highest and lowest score that different reviewers gave the same application, then average that spread across applications. The criteria with the widest average spread are your onboarding targets.
| Criterion | Score Range (Last Cycle) | Avg. Spread | Common Reviewer Confusion | Onboarding Priority |
|---|---|---|---|---|
| Community impact / leadership | Fill in | Fill in | Distinguishing "participation" from "leadership"; privilege-marker inflation | High |
| Academic achievement | Fill in | Fill in | Weighted vs. unweighted GPA; rigor context | Medium |
| Essay quality / written expression | Fill in | Fill in | Language fluency vs. idea quality; editing access | High |
| Financial need / context | Fill in | Fill in | Interpreting narrative vs. documentation | Medium |
| Future goals / mission alignment | Fill in | Fill in | Vague goals vs. specific vision; writing polish | Medium |
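The spread calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the row format, the anonymized IDs (`A01`, `R1`), and the function name `spread_by_criterion` are assumptions made for the example, and the scores are invented sample values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical export: one row per (application, reviewer, criterion).
# IDs are anonymized codes; no applicant or school names appear here.
scores = [
    {"application": "A01", "reviewer": "R1", "criterion": "leadership", "score": 5},
    {"application": "A01", "reviewer": "R2", "criterion": "leadership", "score": 2},
    {"application": "A01", "reviewer": "R3", "criterion": "leadership", "score": 4},
    {"application": "A02", "reviewer": "R1", "criterion": "leadership", "score": 4},
    {"application": "A02", "reviewer": "R2", "criterion": "leadership", "score": 3},
    {"application": "A01", "reviewer": "R1", "criterion": "academics", "score": 4},
    {"application": "A01", "reviewer": "R2", "criterion": "academics", "score": 4},
    {"application": "A01", "reviewer": "R3", "criterion": "academics", "score": 5},
    {"application": "A02", "reviewer": "R1", "criterion": "academics", "score": 3},
    {"application": "A02", "reviewer": "R2", "criterion": "academics", "score": 3},
]


def spread_by_criterion(rows):
    """Return {criterion: {"avg_spread": ..., "max_spread": ...}}."""
    # Collect every reviewer's score for each (criterion, application) pair.
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["criterion"], row["application"])].append(row["score"])

    # Spread = highest score minus lowest score for the same application.
    spreads = defaultdict(list)
    for (criterion, _application), values in grouped.items():
        if len(values) >= 2:  # a single score has no spread
            spreads[criterion].append(max(values) - min(values))

    return {
        criterion: {"avg_spread": mean(vals), "max_spread": max(vals)}
        for criterion, vals in spreads.items()
    }


result = spread_by_criterion(scores)
for criterion, stats in sorted(result.items(), key=lambda kv: -kv[1]["avg_spread"]):
    print(criterion, stats)
```

Sorting by average spread puts the likely high-priority criteria first; the values can then be copied into the Score Range and Avg. Spread columns of the table.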
Staff (not AI) select three anchor examples from last cycle. These represent the scoring range and become the calibration exercise material. Complete all anonymization steps before any AI prompt work begins.
- Pull essays/applications for the two high-priority criteria from last cycle's reviewer score data.
- Select three: one scored consistently HIGH (strong agreement), one consistently MID, and one that generated the most reviewer disagreement (EDGE).
- Remove all identifying information: applicant name, school name, city, specific program or club names, demographic details. Replace with generic placeholders.
- Have two returning reviewers confirm the anonymization is complete. Document their initials and the date.
- Program director approves the three anchor examples for use in calibration. Record approval in writing.
- Store in shared drive: Calibration_Anchors_[YEAR]_Anonymized. Access: pilot team only.
- Decide: Live calibration session (30–45 min) with facilitated debrief: recommended for first-time use or new reviewer cohort.
- Decide: Asynchronous calibration (reviewers score independently, share via form): works for experienced cohorts when live sessions are impractical.
- Note: If asynchronous, build an explicit debrief step: share anonymized score distributions before the review period opens.
- Set: Onboarding packet distributed at least 5 business days before review period opens.
- Set: Calibration completed before any live applications are scored.
- Set: Debrief results shared with all reviewers within 24 hours of calibration close.
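To turn the "at least 5 business days" rule into a calendar date, a small helper can count back weekdays from the review opening date. This is a sketch under stated assumptions: it treats Monday through Friday as business days and ignores holidays, and both the function name `subtract_business_days` and the opening date are hypothetical.

```python
from datetime import date, timedelta


def subtract_business_days(start: date, days: int) -> date:
    """Step back `days` weekdays from `start` (Mon-Fri only; holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


review_opens = date(2025, 3, 17)  # hypothetical opening date (a Monday)
packet_deadline = subtract_business_days(review_opens, 5)
print("Distribute onboarding packet no later than:", packet_deadline.isoformat())
```

If the program observes holidays in that window, move the deadline earlier by hand or extend the helper with a list of excluded dates.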
Building: Rubric Explanations, FAQ, and Calibration Exercise
The calibration exercise uses the three approved anchor examples. Reviewers score them independently before seeing any discussion. The exercise is designed to surface scoring differences before live applications are assigned.
- Distribute the three anchor examples (anonymized) to all reviewers with the scoring rubric but without annotations or expected scores.
- Ask reviewers to score each anchor on the two high-priority criteria only (highest variance from the audit).
- Collect scores before the calibration session. Do not share individual scores until the session debrief begins.
- In the debrief, share the distribution of scores (not individual reviewer names). Use AI-generated debrief questions to guide discussion.
- For each criterion where variance remains above the threshold after discussion, program director clarifies the rubric interpretation. Document the clarification; add it to the FAQ.
- After calibration, share a one-page score distribution summary with all reviewers so they can see where they landed relative to the group.
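The score distribution summary and the threshold check in the steps above can be sketched as follows. The data layout, the function name `summarize`, and the `SPREAD_THRESHOLD` value are all assumptions for illustration; the program sets its own threshold, and the scores shown are invented. Reviewer names are deliberately absent from the input.

```python
from collections import Counter

SPREAD_THRESHOLD = 2  # hypothetical value; the program defines the real one

# Hypothetical calibration scores: {anchor: {criterion: [scores only]}}.
calibration = {
    "HIGH": {"leadership": [5, 5, 4, 5], "essay": [4, 5, 4, 4]},
    "MID": {"leadership": [3, 3, 2, 3], "essay": [3, 4, 3, 3]},
    "EDGE": {"leadership": [1, 4, 2, 5], "essay": [3, 3, 4, 2]},
}


def summarize(data, threshold):
    """Build one summary row per anchor and criterion."""
    rows = []
    for anchor, by_criterion in data.items():
        for criterion, values in by_criterion.items():
            spread = max(values) - min(values)
            rows.append({
                "anchor": anchor,
                "criterion": criterion,
                # How many reviewers gave each score, in score order.
                "distribution": dict(sorted(Counter(values).items())),
                "spread": spread,
                # Candidate for a director clarification if still this wide.
                "needs_clarification": spread > threshold,
            })
    return rows


summary = summarize(calibration, SPREAD_THRESHOLD)
for row in summary:
    print(row)
```

Rows flagged `needs_clarification` are only a prompt for the debrief; the program director still decides whether a rubric clarification is needed and what it says.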
Review & Wrap-Up: Measure Variance, Update FAQ, and Document for Next Cycle
- Measure: Pull criterion-level scoring data from the current cycle. Calculate spread between reviewers on the two high-priority criteria.
- Compare: Compare to last cycle's variance baseline. Did the spread narrow? By how much?
- Survey: Post-review confidence survey: ask reviewers how confident they felt applying each criterion.
- Review: Did the calibration anchor examples generate useful debrief discussion, or were they too easy? Note for next cycle.
- Audit: What questions did reviewers ask during the review period that the FAQ should have covered? Add them now while they are fresh.
- Log: Which AI-drafted rubric explanation sections required the most staff editing? Consider whether human-written versions serve better for those sections.
- Update: Any rubric clarifications made during the calibration debrief must be formalized in the official rubric document, not just in the FAQ.
- Note: Review the bias awareness one-pager for relevance. Update examples if they did not resonate with this cycle's reviewers.
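The baseline comparison in the first two wrap-up steps comes down to simple arithmetic, sketched below. The function name `compare_spread` and the average-spread figures are hypothetical; substitute the values from your own variance audit and current-cycle data.

```python
def compare_spread(baseline, current):
    """Return per-criterion change in average spread (negative = narrower)."""
    report = {}
    for criterion, before in baseline.items():
        if criterion not in current:
            continue  # criterion was not measured this cycle
        after = current[criterion]
        change = after - before
        # Percent change is undefined when the baseline spread was zero.
        pct = round(change / before * 100, 1) if before else None
        report[criterion] = {
            "before": before,
            "after": after,
            "change": round(change, 2),
            "pct_change": pct,
        }
    return report


# Hypothetical average spreads for the two high-priority criteria.
baseline = {"leadership": 2.0, "essay": 1.5}
current = {"leadership": 1.5, "essay": 1.5}

report = compare_spread(baseline, current)
for criterion, stats in report.items():
    print(criterion, stats)
```

A negative `change` means reviewers scored closer together than last cycle. With small reviewer pools, a shift of this size can also be noise, so read it alongside the confidence survey rather than on its own.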
You are a training writer helping a nonprofit scholarship program create plain-language rubric explanations for volunteer reviewers.
Your task: Explain the following scoring criterion in clear, accessible language that a first-time reviewer can apply without ambiguity.
Criterion name: [CRITERION NAME, e.g., Community Leadership]
Official rubric language: [PASTE RUBRIC TEXT FOR THIS CRITERION]
Score scale: [e.g., 1–5 / 1–10 / Exemplary / Proficient / Developing]
Write a plain-language explanation that includes:
1. What this criterion is measuring in plain terms (1 short paragraph)
2. What a HIGH score looks like (2–3 bullet points using "A high score shows..." sentence starters)
3. What a LOW score looks like (2–3 bullet points using "A low score shows..." sentence starters)
4. One common mistake reviewers make when applying this criterion
Do not invent scoring examples or applicant scenarios. Use only the rubric language provided above.
You are a training writer helping a nonprofit scholarship program create reviewer onboarding materials.
Your task: Produce a complete rubric explanation card for the criterion below, including equity watch notes.
Criterion name: [CRITERION NAME]
Official rubric language: [PASTE RUBRIC TEXT]
Score scale: [e.g., 1–5]
Program context: [BRIEF DESCRIPTION, e.g., "This scholarship serves first-generation college students from rural communities"]
Produce the following sections:
1. PLAIN-LANGUAGE DEFINITION (1 paragraph): What is this criterion actually measuring?
2. SCORE LEVEL GUIDE:
- High score looks like: (2–3 bullets)
- Mid score looks like: (2–3 bullets)
- Low score looks like: (2–3 bullets)
3. COMMON ERRORS (2–3 bullets): Where do reviewers typically go wrong on this criterion?
4. EQUITY WATCH (2–3 bullets): What privilege markers or socioeconomic factors might unfairly inflate or deflate scores on this criterion? What should reviewers watch for?
5. SELF-CHECK QUESTION: One question a reviewer should ask themselves before finalizing their score on this criterion.
Cite only the rubric language provided. Do not invent applicant scenarios or scoring examples.
You are helping a nonprofit scholarship program facilitate a reviewer calibration session. This is a governed workflow. Follow all steps in order.
STEP 1: Acknowledge scope
Confirm: (a) no applicant PII is present in this prompt, (b) you are generating facilitation questions only and are not scoring or evaluating any application, (c) you will flag any uncertainty rather than speculate.
STEP 2: Generate calibration debrief questions using only this context
Criterion being calibrated: [CRITERION NAME]
Rubric language for this criterion: [PASTE RUBRIC TEXT]
Score scale: [e.g., 1–5]
Typical sources of reviewer disagreement on this criterion: [PASTE FROM YOUR VARIANCE AUDIT, or write "unknown"]
Program context: [BRIEF DESCRIPTION]
Generate the following:
A. OPENING QUESTION (1): A warm-up question that invites reviewers to share their score and initial reasoning without defensiveness.
B. DIVERGENCE QUESTIONS (2–3): Questions for when scores are spread across the range. Focus on helping reviewers articulate what evidence they were looking for.
C. EQUITY PROBE QUESTIONS (2): Questions that prompt reviewers to examine whether privilege markers (unpaid internships, travel, private school resources, polished writing) may have influenced their score.
D. CONSENSUS CLOSE (1): A closing question that helps the group arrive at a shared interpretation of the criterion for this review cycle.
STEP 3: Self-audit
- [ ] Do any questions suggest a "correct" score? (If yes, revise to be neutral.)
- [ ] Do equity probe questions name specific applicants or schools? (They must not.)
- [ ] Is any question based on information not in the rubric text I provided? (If yes, flag it.)
STEP 4: Output format
A. Self-audit results
B. Debrief questions (labeled A through D as above)
C. Facilitator notes: one practical tip for each question section
This is what an AI-drafted, staff-reviewed rubric explanation card should look like. Use as the benchmark for evaluating AI output quality. All program-specific examples are added by staff, not AI.
- Unpaid internships, international service trips, and private-school leadership roles may reflect family resources rather than individual initiative. Score the evidence of action, not the prestige of the opportunity.
- Students from rural or under-resourced communities may describe leadership in informal settings (caring for siblings, translating for family, supporting a faith community). These count.
- If the essay is highly polished and describes prominent activities, ask: "Would a student with fewer resources describe this same quality of engagement?" Score the engagement, not the presentation.
Three anchor examples are used. Reviewers score independently before any debrief begins. Scoring forms are collected before the session opens to prevent anchoring.
What reviewers receive: Anonymized essay excerpt (150–200 words) describing community engagement. No school name, applicant name, or identifying details.
Staff annotator notes (not shared with reviewers until debrief):
- This applicant organized a recurring tutoring program that continued after their graduation.
- Leadership is evidenced by described action, not by title or organization prestige.
- No privilege markers present; activity was self-initiated in a public school setting.
What reviewers receive: Anonymized essay excerpt describing consistent volunteering without a described leadership role.
Staff annotator notes (not shared until debrief):
- Reviewer disagreement on this anchor is expected and intentional. The debrief should surface what "contribution" means vs. "leadership."
- Watch for reviewers who scored high because the writing is polished (essay quality artifact).
What reviewers receive: Anonymized essay excerpt describing informal caregiving and family support responsibilities in a non-institutional setting.
Staff annotator notes (not shared until debrief):
- Some reviewers will score this low because no "organization" is named. The debrief should challenge whether the rubric requires institutional affiliation.
- This is the equity probe anchor. Listen for language that privileges formal leadership structures over informal community roles.
- If the rubric does not clearly address informal community roles, the director must clarify before the review period opens.
The complete onboarding packet contains five sections. AI drafts sections 1, 3, and 4. Staff write sections 2 and 5. Director approves the full packet before distribution.