Experimenting with AI-assisted triaging

Ever since I first started working in the Gutenberg GitHub repository, my organizational brain has cringed at how unwieldy it is. I’ve tried countless methods to do my part in keeping relevant issues open and actionable, mostly involving various recurring to-do items. As we’ve gone deeper into the AI age, I’ve started experimenting with how I can approach this differently. I have a few goals in writing this post: to see if others will join me in using AI to triage more intelligently and efficiently, to get feedback on my methods, and to be transparent about what I’m trying.

Tools

I’m using Claude Code inside Cursor. I’ve used both the dedicated Claude chat and Cursor’s chat set to use Claude as the agent. I originally used ChatGPT to brainstorm and to come up with instructions for a Claude agent and for downloading GitHub issues. This all requires money! I am lucky enough to work for Automattic, where I am encouraged and able to expense these tools for experimentation and work. I’m still getting away with a free OpenAI account for now but am paying $20/month each for Claude and Cursor. I use the latter for other things but, for the sake of this post, let’s pretend I’m only using it for triaging.

Total cost: $40/month.

Process

At a high level, this is the process I followed:

  1. Download all open issues using Claude and save them as markdown files.
  2. Double check the downloaded issues to ensure they are correct.
  3. Create an agent with specific rules and ideas to help me triage.
  4. Invoke the agent to present me with issues to then investigate.
  5. Report back on the actions I took (closed or kept open, and why) so it can keep track of what I did, both to learn and to avoid presenting me with the same issues again.

Let’s dig into each of these steps in more detail. After getting advice from Riad, I got Claude to download all of the GitHub issues into a local folder as markdown files. This took a lot of trial and error, as I ran into GitHub API limits and made some mistakes by not being specific enough about which issues to download. Mainly, at first, it downloaded issues regardless of whether they were open or closed. I later asked for each file to be named with the issue number followed by the issue title so I could easily find them as I fact-checked.
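A minimal sketch of such a download script follows, assuming GitHub’s REST "list repository issues" endpoint and a personal access token. It isn’t necessarily what Claude produced, and the helper names (`issue_filename`, `is_open_issue`, `fetch_open_issues`, `save_issue`) and the markdown layout are hypothetical. Requesting `state=open` up front and skipping items that carry a `pull_request` key (that endpoint returns pull requests too) avoids the closed-issue mix-up, and reading the rate-limit headers lets the script pause instead of failing.

```python
import json
import re
import time
import urllib.request
from pathlib import Path

# GitHub REST endpoint for listing a repository's issues; helper names below are hypothetical.
ISSUES_URL = "https://api.github.com/repos/{repo}/issues?state=open&per_page=100&page={page}"


def issue_filename(number: int, title: str) -> str:
    """Name files '<number>-<slugified title>.md' so they are easy to find."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{number}-{slug[:80].rstrip('-')}.md"


def is_open_issue(item: dict) -> bool:
    """The issues endpoint also returns pull requests, so filter those out too."""
    return item.get("state") == "open" and "pull_request" not in item


def fetch_open_issues(repo: str, token: str):
    """Yield open issues page by page, pausing when the rate limit runs out."""
    page = 1
    while True:
        request = urllib.request.Request(
            ISSUES_URL.format(repo=repo, page=page),
            headers={
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {token}",
            },
        )
        with urllib.request.urlopen(request) as response:
            items = json.load(response)
            remaining = int(response.headers.get("X-RateLimit-Remaining", "1"))
            reset_at = int(response.headers.get("X-RateLimit-Reset", "0"))
        if not items:
            return
        yield from (item for item in items if is_open_issue(item))
        if remaining == 0:
            # Sleep until the reset time (a Unix timestamp), plus a second of slack.
            time.sleep(max(0.0, reset_at - time.time()) + 1)
        page += 1


def save_issue(item: dict, out_dir: Path) -> Path:
    """Write one issue as markdown, recording its state so it can be spot-checked later."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / issue_filename(item["number"], item["title"])
    body = item.get("body") or ""
    path.write_text(
        f"# {item['title']}\n\nstate: {item['state']}\nurl: {item['html_url']}\n\n{body}\n",
        encoding="utf-8",
    )
    return path
```

Calling `save_issue(issue, Path("issues"))` for each item yielded by `fetch_open_issues("WordPress/gutenberg", token)` would fill a local `issues/` folder with files such as `24168-new-icons-a-small-selection-of-the-concerns.md`.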

During and after the download, I double-checked 10-15 of the files to make sure this was done appropriately. This helped me notice that it kept downloading closed issues. It also helped me think through how best to name each markdown file.
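Part of that spot check can be scripted. This sketch assumes each file is named `<number>-<slug>.md` and contains a `state:` line, a hypothetical layout rather than necessarily the one Claude wrote; `check_file` and `spot_check` are illustrative names. It samples files at random and reports anything that isn’t open or doesn’t follow the naming convention, which complements reading a handful of files by hand rather than replacing it.

```python
import random
import re
from pathlib import Path

# Assumed conventions: "<number>-<slug>.md" file names and a "state: open" line in each file.
FILENAME = re.compile(r"^\d+-.+\.md$")
STATE_LINE = re.compile(r"^state:\s*(\w+)", re.MULTILINE)


def check_file(name: str, text: str) -> list:
    """Return a list of problems found in one downloaded issue file."""
    problems = []
    if not FILENAME.match(name):
        problems.append("name does not start with an issue number")
    match = STATE_LINE.search(text)
    if match is None:
        problems.append("no state line")
    elif match.group(1) != "open":
        problems.append(f"state is {match.group(1)}")
    return problems


def spot_check(folder: Path, sample_size: int = 15, seed=None) -> dict:
    """Randomly sample up to `sample_size` files and map each problem file to its issues."""
    files = sorted(folder.glob("*.md"))
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    report = {}
    for path in sample:
        problems = check_file(path.name, path.read_text(encoding="utf-8"))
        if problems:
            report[path.name] = problems
    return report
```

An empty report from `spot_check(Path("issues"))` means every sampled file looked like an open issue with a correctly formatted name.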

I then dug into known agents out in the wild, like this great repo, and used ChatGPT to create my own. I later updated this agent after getting advice from colleagues. Now it keeps track of what it previously presented to me to review, as well as what my responses were.
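The bookkeeping behind this is simple. Below is a hypothetical `record_outcome` helper that applies a reported outcome to entries loaded from the tracking file, using the field names from the tracking schema in the agent definition. The agent does this itself from its instructions; the sketch only makes those rules concrete, and it assumes the YAML has already been parsed into Python dicts (for example with a YAML library, not shown).

```python
from datetime import datetime, timezone


def record_outcome(entries, issue_number, status, note=None, now=None):
    """Update a tracked entry once the user reports what happened to a flagged issue.

    `entries` is the list loaded from the tracking file; `status` is e.g.
    "closed" or "left-open". `now` is assumed to be a UTC datetime.
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    for entry in entries:
        if entry["issue_number"] != issue_number:
            continue
        recommended = entry.get("recommended_action") or ""
        if recommended.startswith("close") and status != "closed":
            # The agent rules ask for a note when the outcome differs from the recommendation.
            prefix = f"Differs from recommendation ({recommended})."
            note = f"{prefix} {note}" if note else prefix
        entry["status"] = status
        entry["status_updated_by"] = "user"
        entry["status_updated_date"] = stamp
        entry["status_update_note"] = note
        return entry
    raise KeyError(f"#{issue_number} is not in the tracking file")
```

Raising on an unknown issue number, rather than silently adding an entry, keeps the tracking file limited to issues the agent actually flagged.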

Here are the full details of the agent:
name: github-issue-triager
description: >
  Use this agent when you need to analyze and prioritize GitHub issues in a repository backlog.
  The agent now maintains a persistent triage record in a separate repository folder so you can
  see which issues it previously flagged, what happened to them (closed / left open) based on
  maintainer input, and a 1–10 confidence rating for any recommendation to close an issue.
examples:
  - context: "User is maintaining an open source project with 50+ open issues and needs help organizing them."
    user: "I have too many issues piling up in my repo and I'm not sure which ones to focus on first. Can you help me triage them?"
    assistant: "I'll use the github-issue-triager agent to analyze your issues and provide prioritization recommendations."
  - context: "User notices old issues that might be stale or duplicates."
    user: "I think some of these issues from 6 months ago might be outdated or duplicates. Should I close them?"
    assistant: "Let me use the github-issue-triager agent to evaluate those older issues for relevance and potential duplicates."
model: sonnet
color: blue

# New tracking configuration:
tracking_folder: "triage"           # relative to repo root (recommended). Could also be ".triage".
tracked_files:
  - "triage/flagged_issues.yml"    # machine-friendly canonical record
  - "triage/flagged_issues_log.md" # human-readable changelog (optional)
tracking_policy: >
  The agent must maintain and consult the canonical tracking file before flagging an issue,
  avoid re-flagging issues already tracked, and update the record when the user tells the agent
  what happened (closed, left open, other). The agent must never make destructive repo changes
  (e.g., close issues) without an explicit user command; it only records and recommends unless
  instructed to act.

# Tracking schema (fields the agent must populate for each flagged issue)
tracking_schema:
  - issue_number: int
  - title: string
  - url: string
  - flagged_date: ISO8601 date-time
  - agent_version: string
  - recommended_action: string      # close / close-as-duplicate / label / assign / leave-open / etc.
  - initial_confidence_in_closure: int  # 1-10 (only relevant for recommendations that close)
  - confidence_rationale: string
  - status: string                   # open | closed | left-open | unknown
  - status_updated_by: string        # "user", "maintainer", "agent", or "external"
  - status_updated_date: ISO8601 date-time | null
  - status_update_note: string       # user's note (if provided) or agent observation
  - related_issues: [ints]           # duplicates or related issue numbers
  - labels_suggested: [strings]
  - milestone_suggested: string | null
  - last_reviewed_date: ISO8601 date-time

# Behavior / Responsibilities (additions and clarifications)
You are an expert GitHub Issue Triager with deep experience in open source maintenance and community management. Your primary mission remains to provide clear, actionable triage recommendations. In addition to prior behavior, you will:

1. **Persistent tracking**
   - Before flagging an issue, consult `triage/flagged_issues.yml`. If the issue already exists in that file, do not re-flag it; instead, update its `last_reviewed_date` and add a brief note.
   - When you **flag** an issue, create a new entry in `triage/flagged_issues.yml` using the required schema and append a human-readable line in `triage/flagged_issues_log.md`.
   - Use ISO8601 timestamps and include the `agent_version` so maintainers can see when recommendations were made.

2. **Marking outcomes (based on user input)**
   - The agent will **not** close issues automatically. When the user tells the agent what happened (e.g., "I closed #42", "I left #99 open", "I merged a PR that fixes #7"), the agent will:
     - Update the `status`, `status_updated_by` (set to "user"), `status_updated_date`, and `status_update_note` fields for that issue.
     - If the action differs from the agent's original recommendation, add a short `status_update_note` explaining the difference.
     - If the user provides a reason, include that verbatim in `status_update_note`.
   - If the user asks the agent to perform a repo action (close/label/assign), the agent should request explicit confirmation and execute it only if the user grants permission; after acting, it must update the tracking file with `status_updated_by: agent` and the timestamp.

3. **Confidence in closing (1–10)**
   - For any recommendation that involves *closing* an issue (close, close-as-duplicate, close-stale), the agent must assign an **Initial Confidence in Closure** integer between 1 (not at all confident) and 10 (extremely confident).
   - The agent must also provide a **confidence_rationale** (one or two concise sentences) explaining the primary evidence for that score (e.g., "duplicate of #123 with the same reproduction steps and a merged PR that fixes the symptom (confidence 9)", or "report lacks reproduction steps and reporter is unresponsive (confidence 4)").

4. **Confidence scoring rubric (guidance for consistent scoring)**
   - 1–2: Very low confidence — insufficient evidence, flaky reports, unclear reproduction, no reproduced test, or requires more info.
   - 3–4: Low confidence — some signals (age, inactivity) but missing key evidence.
   - 5–6: Moderate confidence — plausible to close (stale/duplicate likely) but not definitive.
   - 7–8: High confidence — clear duplicate, fix merged, or clear incompatibility with supported versions.
   - 9–10: Very high / extreme — confirmed fix merged that references issue, repro confirmed and PR merged, or issue clearly off-scope and inactive with community agreement.

5. **Avoid re-flagging**
   - If an issue is already tracked, annotate `last_reviewed_date` and do not create duplicate flagged entries. If a new reason justifies re-flagging (e.g., newly identified duplicate), append the new rationale to the existing entry rather than creating a duplicate entry.

6. **Output & reporting**
   - For every issue analysis the agent returns to the user, include:
     1. **Priority Level:** Critical/High/Medium/Low with brief justification
     2. **Recommended Action:** Specific next step (close, label, ask for more info, etc.)
     3. **Reasoning:** 1–2 sentences explaining your assessment
     4. **Labels/Categories:** Suggested organizational tags
     5. **Tracking update preview:** The YAML snippet that will be added/updated in `triage/flagged_issues.yml` (including `initial_confidence_in_closure` and `confidence_rationale` when applicable)
     6. **File path:** Where the agent will record this (`triage/flagged_issues.yml`)
   - Example: When recommending closure, the agent should present the confidence number and a one-line rationale immediately after the recommended action.

7. **Quality & evidence**
   - Base confidence and recommendations on concrete evidence from the issue: reproduction steps, links to commits or PRs, comments from reporter, number of +1 reactions, timestamps, and related issues.
   - Provide issue numbers for duplicates; provide links/PR references when claiming a fix exists.

# Procedural examples

## Example tracked entry (placed into `triage/flagged_issues.yml`)
```yaml
- issue_number: 42
  title: "App crashes when saving settings"
  url: "https://github.com/org/repo/issues/42"
  flagged_date: "2025-09-29T14:05:00Z"
  agent_version: "github-issue-triager@v2.1"
  recommended_action: "close-as-duplicate"
  initial_confidence_in_closure: 9
  confidence_rationale: "Exact reproduction steps match #17 and #17 has a merged PR that fixes the failing call."
  status: "open"
  status_updated_by: null
  status_updated_date: null
  status_update_note: null
  related_issues: [17]
  labels_suggested: ["bug", "duplicate"]
  milestone_suggested: "v1.2.0"
  last_reviewed_date: "2025-09-29T14:05:00Z"
```

From there, I invoke the agent in the Cursor chat to present me with however many issues I think I can review that week. Right now, I am keeping it simple with 5-10 per week.
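Choosing a week’s batch amounts to dropping anything already in the tracking file and taking the top few candidates. A minimal sketch, assuming each candidate already has a closure score where higher means a better candidate; `pick_weekly_batch` is a hypothetical name.

```python
def pick_weekly_batch(candidates, tracked_numbers, batch_size=5):
    """Pick the highest-scoring issues that have not already been flagged.

    candidates: iterable of (issue_number, score) pairs, higher score = better closure candidate.
    tracked_numbers: issue numbers already present in the tracking file.
    """
    fresh = [(number, score) for number, score in candidates if number not in tracked_numbers]
    # Highest score first; ties go to the lower (older) issue number.
    fresh.sort(key=lambda pair: (-pair[1], pair[0]))
    return [number for number, _ in fresh[:batch_size]]
```

With `tracked_numbers = {entry["issue_number"] for entry in entries}` built from the tracking file, a call such as `pick_weekly_batch(scored_issues, tracked_numbers, batch_size=5)` returns at most five issue numbers, none of them previously flagged.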

What have the results been thus far?

This last week, it presented me with the following (leaving out some of the unnecessary fluff it added in):

What the Agent Does:

The triage agent I created is a sophisticated Python script that evaluates GitHub issues based on multiple criteria:

  1. Age Score (25% weight) – Older issues are often better candidates for closure
  2. Activity Score (20% weight) – Issues with no recent activity may be stale
  3. Complexity Score (15% weight) – Simpler issues are easier to close
  4. Label Score (15% weight) – Certain labels indicate closure suitability
  5. Reproducibility Score (10% weight) – Well-documented issues are easier to evaluate
  6. Priority Score (10% weight) – Lower priority issues are better closure candidates
  7. Community Score (5% weight) – Less community engagement suggests lower importance
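Whatever is actually running under the hood, the scoring the agent describes boils down to a weighted sum. In this minimal sketch only the weights come from the agent’s output; the component names, the assumption that each component lies between 0 and 1, and the `age_score` normalization (capped at five years) are illustrative assumptions. The report’s scores of 0.79-0.80 can’t be reproduced from this alone, since the per-criterion values aren’t shown.

```python
# Weights exactly as the agent listed them; they sum to 1.0.
WEIGHTS = {
    "age": 0.25,
    "activity": 0.20,
    "complexity": 0.15,
    "label": 0.15,
    "reproducibility": 0.10,
    "priority": 0.10,
    "community": 0.05,
}


def age_score(days_open: int, cap_days: int = 5 * 365) -> float:
    """Hypothetical normalization: 0 for a brand-new issue, 1 once it reaches the cap."""
    return min(days_open / cap_days, 1.0)


def closure_score(components: dict) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Because the weights sum to 1, the overall score stays between 0 and 1 as long as every component does.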

Top 5 Issues Recommended for Closure:

Issue #24168 – “New icons – a small selection of the concerns”

  • Score: 0.80 | Created: 2020-07-23
  • Very old (1,901 days), no recent activity (781 days), enhancement label

Issue #16584 – “TinyMCE fullscreen mode (DFW) not working properly in meta boxes”

  • Score: 0.80 | Created: 2019-07-15
  • Very old (2,275 days), no recent activity (783 days), plugin interoperability

Issue #14976 – “Media+Text block – Need Edit HTML option for the Image”

  • Score: 0.79 | Created: 2019-04-13
  • Very old (2,368 days), no recent activity (843 days), enhancement label

Issue #49744 – “Add small tag to table cell”

  • Score: 0.79 | Created: 2023-04-12
  • Old (908 days), no recent activity, needs decision label

Issue #19975 – “Missing text-align for image caption”

  • Score: 0.79 | Created: 2020-01-30
  • Very old (2,076 days), no recent activity (1,215 days), enhancement label

Key Insights:

  • All recommended issues are very old (ranging from 908 to 2,368 days)
  • Most have no recent activity (ranging from 781 to 1,215 days since last update)
  • Most are labeled as enhancements rather than critical bugs
  • They appear to be relatively simple issues that may no longer be relevant
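The ages in the report are easy to sanity-check from the creation dates. The post doesn’t say exactly when the report was generated; the sketch assumes a reference date of 2025-10-06, the date against which all five ages work out.

```python
from datetime import date

# Creation dates as listed in the agent's report.
CREATED = {
    24168: "2020-07-23",
    16584: "2019-07-15",
    14976: "2019-04-13",
    49744: "2023-04-12",
    19975: "2020-01-30",
}

# Assumed reference date (not stated in the report); every listed age matches it.
AS_OF = date(2025, 10, 6)


def age_in_days(created: str, as_of: date = AS_OF) -> int:
    """Whole days between an ISO creation date and the reference date."""
    return (as_of - date.fromisoformat(created)).days


ages = {number: age_in_days(created) for number, created in CREATED.items()}
```

`min(ages.values())` and `max(ages.values())` give 908 and 2,368, matching the range under Key Insights. The "no recent activity" figures can’t be checked this way, since the report doesn’t include last-updated dates.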

Amazingly, all of these were candidates for closure for various reasons. That never happened with my earlier prompts, so I think that, a few weeks in, it’s starting to learn. We’ll see if that holds next week, though.

Important note

Right now, my aim is to find stale issues to close, but that’s just one part of triage. Triage also includes updating titles, ensuring correct labels, reproducing bugs against later releases to make sure the problem is still relevant, deciding whether to pursue something or not, and more. In the future, I will likely use AI to help surface issues for other factors but, given my current position and knowledge of the project, I recognize that I’m uniquely positioned to close issues in particular.

What am I missing? What should I try next?

One idea on my mind is to have a “strict” version of the triager agent purely to get a broader view of what might make sense to review. I’m also tempted to create an agent that seeks out controversial issues as a way to surface them and push for a decision either way. In writing this post and re-reading my initial instructions for the agent, which I hadn’t reviewed in a while, I see a few ways I can get it to stick more closely to what I want. Beyond that, I plan to keep experimenting with what it shares and giving feedback along the way.

P.S. It took me longer to generate an AI image for this post than to manually write it.

Discover more from agm
