Veterans’ Affairs Launches Public Beta of AI Search While Testing Claims Extraction Tool

The Department of Veterans’ Affairs (DVA) has flipped the switch on a public beta of AI-powered website search, becoming the first federal agency to do so, and has disclosed parallel experiments with Microsoft Copilot and a proof-of-concept claims tool built inside the government’s secure GovAI sandbox. The July 2025 launch puts generative AI directly in front of veterans navigating a notoriously complex benefits ecosystem, while the department moves cautiously to test whether the technology can speed claims triage without touching personal data.

DVA assistant secretary of communications Ingrid Nagy revealed that roughly 90 percent of users start their search for DVA information on Google, bypassing the department’s own site. “Our preference was to create something that would lead people to come to our website and stay here,” she said at an AI showcase in Canberra. The new tool returns plain-English summaries, suggests follow-up prompts, and cites source links pulled from public DVA content—including Open Arms, the Anzac Portal, and the Veteran Employment Program—with a thumbs-up/down feedback mechanism to refine the experience.

Crucially, the tool does not access internal records, store personal data, or make decisions on claims. Andrew Powrie, DVA’s AI lead, said a toxicity and trust layer filters abusive or unsafe language and enforces professional standards, though neither the vendor nor the full filter stack is publicly documented in the agency’s transparency statement. Independent reporting indicates the search uses a large language model from OpenAI, which would be consistent with GovAI’s onshore Azure OpenAI instances, but DVA’s official materials stop at “AI-enhanced search” without naming a vendor. That gap leaves open questions about model training, telemetry, and data-retention defaults that privacy-conscious users and advocates will want resolved.

Behind the public-facing search tool, DVA is running two internal pilots that show how the agency thinks about generative AI’s operational potential. A small-scale Microsoft Copilot trial, disclosed in the department’s updated AI transparency statement, mirrors similar productivity experiments across government; the statement frames Copilot as an employee augmentation tool, not a decision-making engine. More ambitious is MyClaims, a proof-of-concept tool developed in GovAI that extracts structured medical details (body systems, body parts, dates) from claims-related documents to shorten manual triage. DVA first built a synthetic dataset and a redaction capability, then invited staff to volunteer their own records as pilot inputs with explicit consent. That staged pathway, from synthetic data to redacted volunteer data to a controlled pilot, reflects a deliberate privacy-first design that other agencies could replicate.
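The first stage of that pathway, stripping identifiers before any volunteered record reaches a model, can be illustrated with a minimal regex-based redaction pass. This is a sketch under stated assumptions, not DVA’s redaction capability: the pattern set and the `redact` function are hypothetical, and real claims documents would also need coverage for names, addresses, service numbers, and scanned or handwritten content that regexes cannot see.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration only. A real redaction capability would
# also need names, addresses, service numbers and OCR'd or handwritten text.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    # Australian numbers such as "0412 345 678" or "+61 2 9876 5432".
    "PHONE": re.compile(r"(?<!\d)(?:\+61\s?|0)[2-478](?:[\s-]?\d){8}(?!\d)"),
    "DATE_OF_BIRTH": re.compile(r"\bDOB:\s*\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a category placeholder and count replacements."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Returning per-category counts gives reviewers a cheap sanity check: a volunteered record that yields zero redactions is worth a human look before it enters the pilot.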

The technology surface merits careful inspection. GovAI, the whole-of-government sandbox, provides an Azure-hosted, APS-only environment with platform guardrails that initially restrict deployments to synthetic or public data. It offers access to onshore Azure OpenAI instances and to open-source models via Azure Machine Learning. DVA’s prototypes live inside that perimeter, which reduces the risk of ungoverned data egress but concentrates dependence on a single cloud provider. Procurement teams should answer that lock-in risk with portability provisions and back it with contractual guarantees on data residency, non-use of government data for training, and breach notification. Public statements emphasise that no veteran personal data flows through GovAI today, but as pilots progress toward operational use, the privacy and consent architecture must scale.

Why does any of this matter beyond Canberra’s IT circles? The DVA serves a population that is older than average, with higher rates of disability and lower digital literacy. An on-site AI that summarises complex policy into plain English could reduce contact-centre calls and help veterans find entitlements faster. But if the tool hallucinates, misstates deadlines, or buries source links, it could mislead users in ways that have real financial and legal consequences. DVA’s initial design includes source attribution and a brief explanation of how answers are generated, but the department must make verification effortless (no scrolling, no jargon) for users who may be vision-impaired, non-native speakers, or navigating trauma.
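One way to make verification cheap is to refuse to display an answer whose citations are missing or point outside the department’s own public sites. The sketch below assumes an allowlist containing `dva.gov.au` (which would also cover any `*.dva.gov.au` subdomain) and `openarms.gov.au`; the list and the `check_citations` helper are illustrative assumptions, not DVA’s published configuration.

```python
from urllib.parse import urlparse

# Assumed allowlist for illustration; not an official DVA configuration.
APPROVED_DOMAINS = {"dva.gov.au", "openarms.gov.au"}


def is_approved(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Exact match or a true subdomain ("www.dva.gov.au"), never a lookalike ("notdva.gov.au").
    return any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS)


def check_citations(source_urls: list[str]) -> list[str]:
    """Return the problems found; an empty list means the answer may be shown."""
    problems: list[str] = []
    if not source_urls:
        problems.append("answer has no source links")
    for url in source_urls:
        if urlparse(url).scheme != "https":
            problems.append(f"non-HTTPS source: {url}")
        elif not is_approved(url):
            problems.append(f"source outside approved domains: {url}")
    return problems
```

Matching on the exact domain or a dot-prefixed suffix, rather than a plain `endswith`, stops lookalike hosts such as `notdva.gov.au` from passing.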

Governance architecture provides some reassurance. DVA published an AI transparency statement that names an accountable official, describes current AI activities, and commits to an external AI Advisory Board. That board gives the veteran community a seat at the table as risks and benefits emerge. The statement also tethers the department to the Australian Government’s Policy for the Responsible Use of AI in Government, which emphasises human oversight, contestability, and transparency. Applied to the search tool, those principles would mean every AI-generated answer carries a prominent “informational, not authoritative” label, and high-stakes queries (anything touching benefit eligibility or claims status) route to a human reviewer by default. DVA has not detailed such routing rules publicly, leaving implementation fidelity as a key metric for advocates to track.
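A default-to-human routing rule of that kind could start as simply as a keyword gate. The trigger terms, the `route` function, and the destination names are hypothetical; DVA has not published its rules, and a production system would more likely use a classifier tuned and reviewed with the veteran community, with keyword matching kept as a conservative backstop.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger terms; DVA has not published routing rules.
HIGH_STAKES = re.compile(
    r"\b(eligib\w*|entitle\w*|my claim|claim status|compensation|pension|appeal\w*|deadline\w*)\b",
    re.IGNORECASE,
)

DISCLAIMER = "Informational, not authoritative. Check the linked source or contact DVA."


@dataclass(frozen=True)
class Routing:
    destination: str  # "ai_answer" or "human_review"
    label: str        # shown with every answer, whichever path it takes


def route(query: str) -> Routing:
    """Send anything touching eligibility or claims to a person by default."""
    if HIGH_STAKES.search(query):
        return Routing("human_review", DISCLAIMER)
    return Routing("ai_answer", DISCLAIMER)
```

Erring toward false positives is deliberate: sending a harmless query to a person costs staff time, while letting an eligibility question through to an unreviewed model answer risks real financial harm.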

Hallucination risk is not hypothetical. Generative models can produce confident-sounding errors that cite non-existent policies or misinterpret source text. For a government service, even a 1% error rate adds up quickly: on a page fielding, say, 30,000 queries a day, it would misinform 300 users daily. DVA’s built-in source links and thumbs-up/down feedback are necessary but not sufficient; the department should log every query and response, maintain immutable audit trails, and publish regular accuracy reports. The toxicity filter adds another layer, useful for a public-facing service but itself susceptible to overblocking legitimate queries or underblocking coded abuse. Independent testing and disclosure of filter performance would build trust faster than opaque assertions.
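Immutability is usually enforced by the storage layer (write-once buckets, for example), but a hash chain makes tampering detectable at the application level. In this minimal sketch, each log entry commits to the hash of the one before it, so editing or deleting any earlier record breaks `verify()`. The `AuditLog` class and the model-version string are illustrative assumptions, and a real deployment would also have to decide how much raw query text it is safe to retain.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, query: str, response: str, model_version: str) -> dict:
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "query": query,
            "response": response,
            "model_version": model_version,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edit, deletion or reordering returns False."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Publishing the latest hash periodically (in an accuracy report, for instance) would let outsiders confirm that earlier records were not rewritten after the fact.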

The MyClaims proof-of-concept highlights both promise and peril. Extracting structured metadata from long medical PDFs is a grinding clerical task that AI can accelerate dramatically, freeing staff for higher-value judgment calls. DVA staff reportedly fed the tool synthetic data and used a redaction utility to strip identifiers before analysis. That is a textbook privacy-preserving approach for early development. However, medical documents contain nuances—handwritten notes, abbreviations, contradictory assessments—that generative models may misinterpret or flatten. If downstream decisions about compensation rely on AI-extracted summaries without a human cross-check against the original record, errors could compound. DVA’s leadership has not suggested AI will replace professional assessment, but the line between triage and decision-support is easily blurred once deployment accelerates. Safeguards must mandate that every extracted data point traces back to its source document and that final determinations remain in human hands.
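The traceability requirement can be made checkable: if every extracted value carries the document ID, character offsets, and verbatim evidence text the model relied on, a validator can reject any field whose evidence does not actually appear at the stated location. The `ExtractedField` schema and `verify_provenance` helper are hypothetical and say nothing about how MyClaims itself is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    field: str     # e.g. "body_system", "body_part", "date"
    value: str     # normalised value proposed by the model
    doc_id: str    # identifier of the source document
    start: int     # character offset where the supporting evidence begins
    end: int       # character offset where it ends (exclusive)
    evidence: str  # exact text the model claims supports the value


def verify_provenance(
    fields: list[ExtractedField], documents: dict[str, str]
) -> list[ExtractedField]:
    """Return fields whose evidence is not found verbatim at the stated offsets."""
    failures = []
    for f in fields:
        text = documents.get(f.doc_id)
        if text is None or text[f.start:f.end] != f.evidence:
            failures.append(f)
    return failures
```

A passing check only confirms that the quoted evidence exists where the model says it does; it cannot confirm the model interpreted it correctly, so the human cross-check against the original record still matters.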

Broader implications for Australian government IT are substantial. The DVA’s approach—public transparency, GovAI sandboxing, synthetic data, explicit human oversight—offers a replicable template for agencies from Home Affairs to Services Australia. IT teams now face a new operational checklist: prompt-engineering governance, model version control, post-generation verification workflows, and DLP controls that prevent sensitive information from leaking to external training corpora. Security architects must design telemetry that logs model interactions without itself becoming a privacy risk, and procurement officers must negotiate contracts that forbid vendor reuse of Australian government data for model improvement. These are not hypotheticals; they are the practical preconditions for scaling AI beyond beta.
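Telemetry that is useful for debugging without becoming a privacy liability often relies on keyed pseudonymisation: a keyed hash (HMAC) gives each user a stable token for analytics that cannot be linked back to the original ID without the key. The sketch also masks email addresses in logged queries. The key handling, field names, and single email pattern are assumptions for illustration; a real pipeline would reuse the department’s full redaction capability and keep the key in a managed secrets store.

```python
import hashlib
import hmac
import re

# Assumption: in production this key lives in a secrets manager and is rotated.
PSEUDONYM_KEY = b"replace-with-managed-secret"

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymise(user_id: str) -> str:
    # Stable per user (so usage can be counted) but not linkable without the key.
    return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]


def telemetry_record(user_id: str, query: str, model_version: str, latency_ms: int) -> dict:
    """Build a log record that keeps operational signal and drops direct identifiers."""
    return {
        "user": pseudonymise(user_id),
        "query": EMAIL.sub("[EMAIL]", query),
        "query_chars": len(query),
        "model_version": model_version,
        "latency_ms": latency_ms,
    }
```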

Several unanswered questions linger. The precise model vendor for the public search remains unconfirmed in official documentation, leaving users to guess whether Microsoft, OpenAI, or a combined stack is under the hood. That matters because different providers have different terms on data usage, retention, and liability. DVA’s transparency statement is a strong step forward, but the next iteration should name the model, document its training cutoff, and disclose any system prompt constraints that shape answers. The toxicity layer’s exact filter stack and false-positive rate are also unreported. Veterans and their advocates have a right to know whether the department’s safety net is catching problems or creating new ones.
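Disclosing filter performance mostly comes down to two numbers measured on a labelled test set: the false-positive rate (legitimate queries blocked, for example a veteran asking how to reach crisis counselling) and the false-negative rate (abusive queries let through). The helper below is a generic sketch; its input format is an assumption, and the hard part in practice is building a representative labelled set.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class FilterMetrics:
    false_positive_rate: float  # share of legitimate queries wrongly blocked (overblocking)
    false_negative_rate: float  # share of abusive queries wrongly allowed (underblocking)


def evaluate_filter(results: Iterable[tuple[bool, bool]]) -> FilterMetrics:
    """results: (is_abusive_label, was_blocked) pairs from a labelled test set."""
    fp = fn = legit = abusive = 0
    for is_abusive, blocked in results:
        if is_abusive:
            abusive += 1
            fn += int(not blocked)
        else:
            legit += 1
            fp += int(blocked)
    return FilterMetrics(
        false_positive_rate=fp / legit if legit else 0.0,
        false_negative_rate=fn / abusive if abusive else 0.0,
    )
```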

On the Copilot front, DVA’s small-scale trial suggests a measured, learn-then-scale philosophy. But Copilot’s core functionality (summarising documents, drafting emails, generating meeting notes) pulls data from Microsoft Graph, which can include sensitive personnel and claims files depending on access controls. DVA must ensure that data-loss-prevention policies and sensitivity labels are rigorously applied before any wider rollout, and that staff understand what Copilot can and cannot see. The agency’s advisory board should review Copilot access logs periodically for anomalies.

What does success look like? DVA should set measurable targets: reduction in contact-centre inquiries for common topics, first-contact resolution rates, accuracy of medical metadata extraction, user satisfaction scores segmented by age and disability status, and demonstrable improvements in site-search retention. Publishing these metrics, even in aggregate, would give the veteran community concrete evidence that AI is serving people rather than just cutting costs. The department’s transparency commitment is welcome; delivering on it requires moving from statements to action.
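Segmented satisfaction scores can be computed directly from thumbs-up/down events, with small segments suppressed so published figures cannot single out individuals. Because the search tool does not store personal data, segment labels such as age band or disability status would have to come from an opt-in survey; the segment names and the default `min_count` threshold of 30 are assumptions.

```python
from collections import defaultdict
from collections.abc import Iterable


def satisfaction_by_segment(
    feedback: Iterable[tuple[str, bool]], min_count: int = 30
) -> dict[str, float]:
    """Share of thumbs-up per segment, omitting segments with too few responses."""
    ups: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for segment, thumbs_up in feedback:
        totals[segment] += 1
        ups[segment] += int(thumbs_up)
    return {s: ups[s] / totals[s] for s in totals if totals[s] >= min_count}
```

Small-cell suppression trades some detail for privacy; a published report would say which segments were withheld and why.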

The road ahead will test whether a government agency can marry the speed of generative AI with the deliberative care that vulnerable groups deserve. DVA has built a solid foundation—sandboxed experimentation, community governance, public feedback loops. Now it must execute on the details: vendor transparency, hallucination safeguards, accessibility testing, and an auditable record of every AI-assisted interaction. The veterans who rely on this department have earned nothing less.