Case Studies

How can responsible AI be delivered at the scale and speed the world now demands?

Stakeholders

United Nations International Computing Centre (UNICC), the United Nations' strategic partner for cybersecurity, cloud, data, and AI services across more than 100 international organizations.
New York University, School of Professional Studies (NYU SPS)

Scope

20 cross-functional teams • 80+ practitioners • 6 UN official languages • 3 media formats

Context

The engagement began as a strategic partnership between UNICC and NYU SPS within a graduate STEM education program.

The objective was to deliver a production-ready, responsible AI solution within a tight delivery cycle, addressing an urgent global need while building applied capacity in the next generation of technology professionals through hands-on, project-based collaboration.

The initiative was positioned within the global AI-driven content moderation market, projected to exceed $20 billion by 2032, with rising demand for multilingual, multimodal, and ethically grounded solutions.

$20B+

Projected global AI content moderation market by 2032

The Challenge

UNICC needed an AI solution that could detect harmful narratives, xenophobic language, and misinformation across all six UN official languages and three media formats: text, audio, and video.

Three constraints shaped the engagement.

Tight timeline. The project required rapid development and deployment within one academic cycle without compromising responsible AI principles or technical quality.

Responsible AI as a baseline standard. AI solutions deployed at this scale and level of visibility must be designed around fairness, bias mitigation, cultural sensitivity, and human oversight from day one. Responsible AI was treated as a core architectural principle — not as a layer added at the end.

High coordination complexity. Delivery required aligning cross-functional teams, two institutional partners, and multiple stakeholder groups within a fast-paced, competitive, and time-constrained environment.

The Approach

I led the design and execution of the AI solution, structured around four operating principles.

Responsible AI embedded into the architecture. Fairness assessment, bias mitigation, and transparent design documentation were established as sprint-level deliverables. Ethical reflection became an ongoing engineering discipline, not a final compliance checkpoint.

Agile as the operating model. A unified product roadmap was established across all teams, organized through themes, epics, and user stories. OKRs aligned execution with UNICC objectives, while KPIs tracked both technical progress and team performance. Biweekly sprints, daily stand-ups, retrospectives, and recurring stakeholder reviews with UNICC leadership created a continuous feedback-driven delivery rhythm. Iterative and incremental development enabled rapid adaptation throughout the project lifecycle.

Competition as a delivery accelerator. Multiple teams pursued capability areas in parallel, including xenophobia detection, sentiment analysis, topic-based harm classification, and dashboard integration. This created a portfolio of validated approaches while reducing single-track delivery risk.

Embedded stakeholder coordination. I served as the primary coordination point between UNICC, NYU faculty, and delivery teams, translating strategic objectives into execution discipline, maintaining Agile rigor across teams, and ensuring responsible AI requirements were upheld throughout delivery.

Execution

The program was delivered in two integrated phases.

Phase 1: Proof of Concept. Four competing teams developed a text-based prototype focused on xenophobia detection, fact and language validation, and topic-based narrative analysis, unified through an interactive dashboard and API-backed architecture.

Phase 2: Multilingual and Multimodal Expansion. Sixteen competing teams scaled the solution across six languages and three media modalities. This phase introduced LLM-driven multimodal processing and an integrated fact-checking layer designed to address reliability and validation challenges associated with generative AI outputs.

Recurring stakeholder reviews with UNICC representatives informed each release cycle, while retrospectives translated sprint-level insights into continuous delivery improvements.

Impact

Validated Technical Performance

Independent testing against a 159,000-entry labeled dataset produced:

91% precision
87% recall
89% F1 score
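
The three figures above can be checked against each other, since F1 is the harmonic mean of precision and recall. A minimal sketch of that check follows; the function name `f1_score` is illustrative and not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall from the labeled test set.
precision, recall = 0.91, 0.87
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.890
```

The result rounds to the reported 89%.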

98.29%

Real-world accuracy

Phase 2 validation on real-world public content achieved 98.29% accuracy across text, audio, and video modalities.

Measurable Delivery Outcomes

Comparative analysis across 111 projects over four years demonstrated statistically significant improvements when the integrated AI-and-Agile delivery model was applied:

Top-tier client satisfaction increased from 29.8% to 45.3%, a 52% relative improvement (p = 0.0008)
Publication-grade outcomes increased from 6.4% to 26.6%, more than a fourfold increase (p = 0.013)
Projects with embedded coordination and delivery leadership outperformed those without by 22 percentage points in client satisfaction (p = 0.013)
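
The relative figures quoted above follow directly from the before-and-after percentages. The sketch below reproduces that arithmetic only; the p-values depend on underlying project counts that are not listed here, so they are not recomputed. Function names are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


def fold_increase(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


# Top-tier client satisfaction: 29.8% -> 45.3%
print(f"{relative_change(29.8, 45.3):.0%}")  # 52%

# Publication-grade outcomes: 6.4% -> 26.6%
print(f"{fold_increase(6.4, 26.6):.2f}x")  # 4.16x
```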

+52%

Increase in top-tier client satisfaction

4x

Growth in publication-grade outcomes

+22pp

Lift from embedded coordinating leadership

Market and Positioning

The prototype is positioned to address the global AI-driven content moderation market, projected to exceed $20 billion by 2032. Its modular architecture enables future adaptation across additional UN programmatic areas, including human rights monitoring, crisis response, gender and social inclusion, and conflict early-warning systems.

The engagement was documented in the co-authored UNICC–NYU SPS white paper, "Responsible AI Innovation for Social Impact: A Case Study in Multilingual Media Moderation" (September 2025), as well as in related peer-reviewed academic research titled "Integrating AI and Agile in STEM Education: A Case Study of Project-Based Learning in Technology Management."

How can a mission-driven organization scale rapidly without losing impact?

Stakeholder

A mission-driven technology company developing digital learning solutions for students with severe disabilities and complex learning needs, serving educators and specialists across more than 32,000 schools nationwide.

32,000+

Schools nationwide served by the organization

Scope

AI Strategy Lead · Operational Transformation · Product & Workflow Design · Cross-Functional Execution · Multi-Team Coordination · Process Standardization · Scalable Delivery Operations

Context

The organization develops digital learning tools to support learners with complex learning profiles, including students with autism, intellectual disabilities, and developmental differences. Its solutions serve learners from Pre-K through adulthood.

As the company scaled and expanded its digital product portfolio and cross-functional initiatives, the informal operational systems that had supported earlier growth were no longer sufficient. Product, Design, Engineering, QA, and Marketing and Sales teams began experiencing increasing execution friction, including misaligned timelines, unclear ownership structures, fragmented communication, and inconsistent delivery processes.

The engagement addressed the growing operational gap between the organization's mission and its ability to execute that mission consistently, collaboratively, and at scale.

The Challenge

Scaling a mission-driven product organization introduced a distinct set of operational tensions. Development required tight coordination across research, curriculum, product, engineering, QA, and go-to-market functions, a delivery chain that became increasingly difficult to manage as the organization and product portfolio expanded.

Three constraints shaped the engagement.

Research-to-product fidelity. Translating curriculum features into digital learning products required structured coordination across academic research, curriculum design, product, and engineering teams. Without deliberate operational alignment, execution inconsistencies began emerging across workflows, approvals, and delivery processes, creating risk to both product quality and learner experience.

Cross-functional alignment gaps. Product, Design, Engineering, QA, and Marketing teams lacked a unified operational model for moving initiatives from concept through release. Handoffs remained largely informal, approval structures were inconsistent, and the absence of defined decision gates created rework loops that compounded across multiple simultaneous product lines.

Scalability without a repeatable execution model. As several products and digital initiatives scaled in parallel, the organization required a standardized operational framework capable of supporting growth consistently across teams and initiatives, rather than rebuilding execution processes independently for each launch.

The Approach

I led the operational transformation through the deployment of the LOOP Execution Framework™, a continuous, feedback-driven operating model designed to translate strategy into structured, scalable execution. The engagement was organized around four operating principles.

Structured phase gates with defined ownership. Each stage of the product lifecycle, from requirements definition through release, was mapped with explicit team responsibilities, entry criteria, approval checkpoints, and delivery expectations. This replaced ambiguous handoffs with a standardized execution structure and established a shared definition of readiness at every phase.
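
One way to picture a phase gate is as a small record of owner, entry criteria, and required approvals, with a readiness check that lists whatever is still outstanding. This is a hypothetical sketch: the class, field names, and example criteria are assumptions for illustration and are not the framework's actual artifacts.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseGate:
    """A lifecycle stage with an owner, entry criteria, and approvals."""
    name: str
    owner: str
    entry_criteria: list[str]
    approvals_required: list[str]
    met: set[str] = field(default_factory=set)
    approved_by: set[str] = field(default_factory=set)

    def blockers(self) -> list[str]:
        """Everything still standing between the team and this phase."""
        unmet = [c for c in self.entry_criteria if c not in self.met]
        pending = [
            f"approval: {a}"
            for a in self.approvals_required
            if a not in self.approved_by
        ]
        return unmet + pending

    def is_ready(self) -> bool:
        return not self.blockers()


# Example gate with made-up criteria.
design_gate = PhaseGate(
    name="Design",
    owner="UI/UX",
    entry_criteria=["requirements approved", "assets in central repository"],
    approvals_required=["Product", "Engineering"],
)
design_gate.met.add("requirements approved")
design_gate.approved_by.add("Product")

print(design_gate.is_ready())
print(design_gate.blockers())
```

Listing blockers explicitly, rather than returning only a yes or no, gives every team the same definition of what "ready" still requires.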

Embedded feedback loops instead of sequential reviews. Rather than concentrating feedback at the end of delivery cycles, the LOOP Framework integrated structured cross-functional input throughout execution. Product, Design, Engineering, QA, Marketing, and Leadership participated in recurring review cycles at defined stages, enabling teams to identify misalignment early before it evolved into costly downstream rework.

A Technical Program Manager as the orchestration layer. A dedicated Technical Program Manager function was established as the operational coordination layer across teams, maintaining execution visibility, documentation integrity, delivery alignment, dependency management, and communication continuity from Product through Release. The role also served as the central escalation and decision-management point across simultaneous initiatives.

AI-augmented execution embedded into operational workflows. AI capabilities were integrated directly into the execution lifecycle to improve delivery speed, consistency, and operational quality without disrupting existing workflows or requiring new platforms. This included AI-assisted requirements documentation, curriculum content support, accessibility pre-validation, QA preparation workflows, release coordination artifacts, and cross-functional communication support operating within the organization's existing tooling ecosystem.

Execution

The transformation was delivered through structured discovery, phased implementation, and continuous operational iteration.

Phase 1: Product. The Product team defined requirements, feature scope, delivery priorities, and workflow specifications through structured documentation processes. Cross-functional feedback loops brought Design, UI/UX, Engineering, QA, and Marketing into the discussion early, while accessibility alignment and instructional consistency were incorporated from the outset. The phase concluded with the establishment of a centralized source of truth, a consolidated documentation and asset repository containing all approved materials before design or engineering execution began.

Phase 2: Design. UI/UX and Graphic Design teams translated approved product requirements into Figma-based workflows, interaction models, visual systems, and learning-flow specifications. Structured review cycles validated functional logic, edge cases, accessibility considerations, and instructional accuracy in collaboration with Product, Engineering, and Marketing stakeholders. The phase concluded with a fully approved design system and finalized stakeholder sign-offs before implementation handoff.

Phase 3: Engineering. Engineering teams developed features against approved requirements, workflows, and design specifications within a structured delivery model. QA participation was embedded early in the phase, enabling test planning, integration review, and issue identification during development rather than after completion. Continuous feedback loops supported rapid resolution of technical dependencies, UI refinements, accessibility adjustments, and functional questions throughout implementation. The phase delivered validated, demo-ready builds ahead of Marketing and release readiness timelines.

Phase 4: Testing and Release. QA conducted comprehensive validation across functionality, accessibility, performance, compatibility, and instructional user-flow accuracy. Structured review cycles involving Product, Design, Engineering, QA, and Marketing teams resolved defects, UI corrections, final refinements, and release readiness requirements collaboratively. The release phase closed only after formal cross-functional stakeholder sign-off and verification that all operational, product, and launch requirements had been completed.

Impact

The deployment of the LOOP Execution Framework™ delivered measurable operational improvements across the organization's product development lifecycle, cross-functional coordination model, and delivery culture.

Delivery discipline at scale. Structured phase gates, defined ownership models, and formal approval checkpoints eliminated many of the ambiguous handoffs that had previously driven rework cycles and delivery misalignment. Teams operated with shared execution expectations across all phases, improving coordination and reducing late-stage delivery risk across multiple simultaneous product lines.

Faster and more predictable release cycles. Early QA integration, centralized documentation workflows, and AI-assisted preparation processes improved testing readiness and compressed delivery timelines. Product teams were able to move from development through validation with greater predictability while consistently supporting Marketing and release-readiness timelines across a growing digital portfolio.

Cross-functional alignment embedded into operational culture. The Technical Program Manager (TPM) function and recurring feedback loops transformed Product, Design, Engineering, QA, and Marketing from largely sequential handoff teams into a coordinated delivery ecosystem. Over time, the LOOP methodology evolved from a project framework into an operational discipline embedded within day-to-day execution practices.

Research-to-product fidelity maintained at scale. Embedding accessibility validation, curriculum oversight, and structured instructional review throughout the lifecycle ensured that evidence-based literacy methodologies remained consistently reflected across product features, learning flows, and digital experiences, preserving the organization's research-driven quality standards as delivery scaled.

A repeatable operational model for portfolio growth. By establishing a standardized execution framework, the organization created a durable operational capability that can support continued portfolio expansion, parallel product delivery, and future digital initiatives with greater consistency, scalability, and execution confidence.

How can organizations reduce costly hiring failures by making cultural alignment measurable, scalable, and data-driven?

An AI-powered recruitment platform built to bring objectivity and structure to one of the costliest problems in hiring.

Stakeholder

Google – "AI for Collaboration" Challenge, a competitive program built for the Google Workspace ecosystem using Gemini.

Scope

AI Strategy Lead • Product & Workflow Design • Cross-Functional Execution • Google Workspace Integration • Multi-Time-Zone Team Coordination

Context

Gemini HR was developed as part of the Google "AI for Collaboration" Challenge, where the team was competitively selected from more than 400 participants to design AI-driven solutions for the Google Workspace ecosystem.

The initiative addressed a persistent and costly operational challenge: organizations continue to make hiring decisions based heavily on resumes and subjective judgment, often overlooking the cultural alignment that determines long-term team performance and retention.

The platform was designed for the Google Workspace environment, which serves more than 12 million small and medium-sized businesses worldwide. These organizations face the same retention and hiring challenges as large enterprises, but often without dedicated assessment infrastructure.

12M+

Small and medium-sized businesses in the Google Workspace ecosystem

The engagement operated within a market where voluntary turnover costs U.S. businesses an estimated $1 trillion annually, with replacement costs ranging from 50% to 200% of an employee's annual salary.

The Challenge

Hiring the right people has become one of the most significant operational and strategic challenges organizations face today. While most hiring systems are optimized to assess technical qualifications and resume matching, they often struggle to evaluate the behavioral and cultural factors that determine long-term retention and team performance.

Hiring failures rarely come from a lack of skill. A Leadership IQ study of 20,000 hiring managers found that 46% of new hires fail within their first 18 months, and that 89% of those failures are driven by attitude and cultural misalignment, not technical ability. SHRM separately reports that a poor cultural fit can cost an organization up to 60% of that person's annual salary.

89%

of hiring failures driven by cultural misalignment, not skill (Leadership IQ)

46%

of new hires fail within their first 18 months (Leadership IQ)

60%

of annual salary: upper-end cost of a poor cultural fit (SHRM)

Gallup estimates that voluntary departures cost U.S. businesses nearly $1 trillion annually. At the same time, many organizations continue to rely on traditional Applicant Tracking Systems (ATS), primarily designed for resume filtering, keyword matching, and workflow management, not for evaluating organizational fit, communication patterns, or long-term team alignment.

$1T

annual cost of voluntary turnover to U.S. businesses (Gallup)

50–200%

of annual salary — cost to replace an employee (SHRM)

75%

of voluntary turnover is considered preventable (Work Institute, 2025)

Three constraints shaped the engagement.

Subjective evaluation. Cultural fit assessment relies heavily on gut feeling rather than structured, measurable criteria.

Inconsistent hiring standards. Evaluation varies across teams, interviewers, and departments, making outcomes difficult to compare or improve.

Limited scalability. As organizations grow, maintaining consistent, objective evaluation becomes increasingly difficult — particularly for small and medium-sized businesses that lack dedicated assessment infrastructure.

The challenge was to make cultural-fit evaluation objective, structured, and scalable without removing human judgment from the hiring decision.

The Approach

I led the AI strategy, workflow design, and cross-functional execution for the platform, structured around four operating principles.

Benchmark definition as the foundation. HR teams upload company mission statements, values, and culture documentation into Google Docs. Gemini HR analyzes these materials and helps translate qualitative organizational values into measurable hiring criteria using a structured five-point evaluation framework. This benchmark becomes the reference point against which candidates are assessed.

Automated, objective assessments. Based on the organizational benchmark, the system automatically generates customized candidate questionnaires through Google Forms, designed to evaluate both technical capabilities and cultural alignment against predefined criteria.

AI-driven scoring through natural language processing. As candidate responses are collected, Gemini HR applies natural language processing (NLP) techniques to analyze communication patterns and behavioral alignment. Responses are converted into vector representations and compared against organizational benchmarks across dimensions such as teamwork, communication, and collaboration using cosine similarity scoring.
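
Cosine similarity scoring can be illustrated with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the three-dimensional vectors are made-up toy values, whereas real embeddings would come from an embedding model and have far more dimensions. The dimension names are taken from the text above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as carrying no signal
    return dot / (norm_a * norm_b)


# Hypothetical benchmark and candidate embeddings per dimension.
benchmark = {
    "teamwork": [0.8, 0.1, 0.3],
    "communication": [0.2, 0.9, 0.1],
    "collaboration": [0.5, 0.5, 0.5],
}
candidate = {
    "teamwork": [0.7, 0.2, 0.4],
    "communication": [0.6, 0.3, 0.2],
    "collaboration": [0.4, 0.5, 0.6],
}

scores = {
    dim: cosine_similarity(benchmark[dim], candidate[dim])
    for dim in benchmark
}
for dim, score in scores.items():
    print(f"{dim}: {score:.2f}")
```

Because cosine similarity depends on direction rather than magnitude, a long answer and a short answer with the same emphasis score alike. How such scores would map onto the five-point framework is not specified here and is left open.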

Workflow integration over disruption. Rather than introducing a separate platform, Gemini HR was designed to integrate within existing Google Workspace tools, including Google Docs, Forms, Sheets, and Drive, enabling HR, leadership, and hiring teams to collaboratively review candidate reports, rankings, and evaluation insights within familiar workflows.

Throughout the engagement, the objective was to support human decision-making rather than replace it. The platform surfaces structured, data-informed insights, while hiring decisions remain guided by human judgment.

Execution

The platform was developed through structured stakeholder discovery, Agile collaboration processes, and cross-functional coordination.

400+

participants in the Google AI for Collaboration Challenge

100+

stakeholder interviews conducted in one month

3

time zones aligned across product, engineering, and business

Stakeholder discovery and requirements analysis. More than 100 stakeholder interviews were conducted in one month to define operational requirements, identify risks, validate assumptions, and shape the product direction based on real recruitment and organizational challenges.

Agile and iterative execution. The engagement followed an iterative and incremental Agile approach, enabling rapid prototyping, continuous feedback loops, and ongoing refinement throughout development. Cross-functional collaboration between engineering, product, and business stakeholders allowed the team to quickly validate ideas, adapt priorities, and improve workflows based on stakeholder input and evolving requirements.

Cross-functional coordination across time zones. Priorities were coordinated across engineering, product, and business functions operating across three time zones, translating a broad market problem into a focused and buildable solution while maintaining alignment across distributed teams.

Concept demonstration and workflow validation. The team demonstrated the platform through AI-generated avatars, walking through the complete workflow, from defining organizational cultural benchmarks and generating assessments to analyzing candidate responses through NLP and producing fit reports, all integrated within the Google Workspace environment.

Impact

Validated Market Challenge

The platform addressed a verified, high-cost operational challenge supported by current industry research:

89% of hiring failures are driven by attitude and cultural misalignment, not technical ability (Leadership IQ)
46% of new hires fail within their first 18 months (Leadership IQ)
Up to 60% of annual salary: the cost of a poor cultural fit (SHRM)
50%–200% of annual salary: the total cost to replace an employee, depending on role and seniority (SHRM)
$1 trillion: the estimated annual cost of voluntary turnover to U.S. businesses (Gallup)
75% of voluntary turnover is considered preventable (Work Institute, 2025)

At the same time, many organizations continue to rely on traditional hiring systems optimized primarily for resume filtering and workflow management rather than evaluating long-term organizational and cultural alignment.

Functional Platform Outcome

Gemini HR demonstrated a viable end-to-end approach to cultural-fit hiring, helping move candidate evaluation from subjective judgment toward more structured and data-informed analysis within the Google Workspace ecosystem.

The platform combined:

Organizational benchmark definition
AI-generated candidate assessments
NLP-driven response analysis
Structured candidate comparison and reporting
Collaborative hiring workflows across HR and leadership teams

Market Positioning and Scalability

The platform was designed for the Google Workspace ecosystem, which serves more than 12 million small and medium-sized businesses worldwide. These organizations often face enterprise-level hiring and retention challenges without dedicated assessment infrastructure.

Its modular design enables future adaptation across:

Talent acquisition
Internal mobility
Workforce alignment
Organizational culture assessment
Team effectiveness analysis

The concept was developed and presented as part of the Google "AI for Collaboration" Challenge, where the team was competitively selected from more than 400 participants.

Note: "Gemini HR" was a prototype developed by the team as part of the Google "AI for Collaboration" Challenge and is not a Google product.

How can governments accelerate digital transformation when public infrastructure was never designed for digital service delivery?

A national GovTech digital transformation initiative that modernized disability assessment and public-assistance delivery for civilians and veterans.

Stakeholder

A post-Soviet government undergoing a national digital transformation of social services for veterans and civilians with disabilities following the 2021 regional conflict.

Scope

Government Digital Transformation (GovTech) · Digital Public Infrastructure (DPI) · Data Governance & Interoperability · API-First Integration · Citizen-Centered Service Delivery

Context

A post-Soviet country's history of regional conflict and geopolitical tension created an urgent need to modernize public support systems for veterans and civilians affected by the conflict.

In the aftermath of the war, the government prioritized a GovTech SaaS platform designed to assess disability benefits eligibility using the World Health Organization's International Classification of Functioning, Disability and Health (ICF) standards. The platform would determine qualification for targeted government financial and medical assistance.

The engagement represented one of the country's first large-scale government digitalization initiatives, transforming a traditionally manual and fragmented public-service process into a centralized digital platform.

The initiative operated at the intersection of two difficult realities: an urgent need to deliver support services quickly to a vulnerable post-war population, and a national data infrastructure that had never been built for modern digital delivery. Much of the country's historical population data remained paper-based or stored across fragmented, ad-hoc systems, creating significant challenges for large-scale digital transformation and service modernization.

The Challenge

The objective was to launch a centralized national digital platform capable of processing disability assessments accurately, securely, and efficiently, despite decades of fragmented, low-quality, and incomplete legacy data.

Three constraints shaped the engagement.

Poor and fragmented legacy data. Years of paper-based and ad-hoc record-keeping meant the historical data supporting the new system was incomplete, inconsistent, and often unreliable, creating direct risk to assessment accuracy and citizen outcomes.

Migration into a modern digital infrastructure. Restructuring and migrating legacy records into a centralized platform across incompatible data formats became one of the project's primary technical and operational challenges, with high stakes for both data integrity and security.

National-scale urgency under crisis conditions. As the platform was intended to become the primary pathway for citizens applying for disability assessment and government support, the system needed to launch on time and operate reliably from day one, with minimal tolerance for service disruption, migration errors, or processing delays affecting a vulnerable post-war population.

The Approach

I led the end-to-end lifecycle of the digital transformation initiative from strategy through execution, structured around four operating principles.

Research-led scoping over full migration. Rather than assuming the entire historical dataset needed to be migrated, I began with user research and identified that nearly 80% of target users (individuals with disabilities following the war) were under the age of 30. With applicants this young, much of the decades-old archive was unlikely to be needed for their cases. This shifted the migration strategy significantly: a full historical migration was unnecessary, and a more focused approach would reduce both operational risk and implementation complexity.

Live, on-demand data migration with middleware. Instead of migrating all historical records upfront, I introduced a live migration model in which data would migrate dynamically as each applicant's case was processed. Middleware was developed to restructure incompatible legacy data formats in real time before insertion into the new system, reducing the risks associated with large-scale bulk migration.
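
The on-demand pattern described above can be sketched as a lookup that migrates a record the first time its case is requested. This is an illustrative outline only: the class, the field names, and the `normalize` rule are assumptions, and in-memory dictionaries stand in for the real legacy sources and the new platform's database.

```python
from typing import Callable, Optional

Record = dict[str, str]


class OnDemandMigrator:
    """Migrates a legacy record only when its case is first requested."""

    def __init__(self, legacy: dict[str, Record],
                 transform: Callable[[Record], Record]):
        self.legacy = legacy        # stand-in for the legacy data source
        self.transform = transform  # middleware step: legacy shape -> new schema
        self.new_store: dict[str, Record] = {}

    def get_case(self, citizen_id: str) -> Optional[Record]:
        # Already migrated: serve from the new system.
        if citizen_id in self.new_store:
            return self.new_store[citizen_id]
        raw = self.legacy.get(citizen_id)
        if raw is None:
            return None  # no legacy history; the case starts fresh
        migrated = self.transform(raw)
        self.new_store[citizen_id] = migrated
        return migrated


def normalize(raw: Record) -> Record:
    """Hypothetical middleware rule: rename fields and tidy values."""
    return {
        "full_name": raw.get("NAME", "").strip().title(),
        "birth_date": raw.get("DOB", "").strip(),
    }


legacy_records = {"A-001": {"NAME": "  jane DOE ", "DOB": "1998-04-12"}}
migrator = OnDemandMigrator(legacy_records, normalize)

print(migrator.get_case("A-001"))
print(len(migrator.new_store))
```

Only records that are actually requested ever pass through the transform, which is what keeps the migration surface small compared with a bulk transfer.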

Interoperability and the "once-only" principle. To improve citizen experience and reduce repetitive data collection, I applied the "once-only" principle — not requesting information from citizens that already existed within government systems. RESTful API integrations enabled data aggregation from other government databases, improving efficiency, reducing manual errors, and streamlining application processing.
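
In code, the "once-only" idea amounts to pre-filling an application from existing registries and asking the citizen only for what remains. The sketch below uses plain functions as stand-ins for the REST calls; the registry names, field names, and sample data are hypothetical.

```python
from typing import Callable

Registry = Callable[[str], dict[str, str]]

# Hypothetical list of fields an application needs.
REQUIRED_FIELDS = ["full_name", "birth_date", "address", "functional_limitations"]


def prefill_application(
    citizen_id: str, registries: list[Registry]
) -> tuple[dict[str, str], list[str]]:
    """Collect known fields from registries; return (prefilled, still_missing)."""
    prefilled: dict[str, str] = {}
    for lookup in registries:
        for name, value in lookup(citizen_id).items():
            # The first registry to supply a field wins; later ones do not overwrite it.
            if name in REQUIRED_FIELDS and name not in prefilled:
                prefilled[name] = value
    missing = [name for name in REQUIRED_FIELDS if name not in prefilled]
    return prefilled, missing


# Stubs standing in for REST calls to other agencies.
def civil_registry(citizen_id: str) -> dict[str, str]:
    return {"full_name": "Jane Doe", "birth_date": "1998-04-12"}


def address_registry(citizen_id: str) -> dict[str, str]:
    return {"address": "12 Example Street"}


prefilled, missing = prefill_application("A-001", [civil_registry, address_registry])
print(missing)  # ['functional_limitations']
```

The citizen is asked only for the fields in `missing`; everything else arrives from systems that already hold it.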

Data governance through a single source of truth. By mapping data ownership and defining accountability across agencies, I introduced a "single source of truth" model that clarified responsibility for maintaining data accuracy and integrity. This improved both data quality and security across the broader government ecosystem, not only within the platform itself.
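
A single-source-of-truth model can be expressed as a map from each data field to the one agency accountable for it, with updates from anyone else rejected. The following is a hypothetical sketch: the agency names, field names, and `OwnershipError` type are illustrative and do not describe the actual governance tooling.

```python
# Hypothetical ownership map: exactly one accountable agency per field.
FIELD_OWNERS = {
    "birth_date": "civil_registry",
    "address": "address_registry",
    "disability_score": "assessment_platform",
}


class OwnershipError(Exception):
    """Raised when an agency tries to edit a field it does not own."""


def apply_update(record: dict, field_name: str, value: str, agency: str) -> dict:
    """Return an updated copy of `record` if `agency` owns `field_name`."""
    owner = FIELD_OWNERS.get(field_name)
    if owner is None:
        raise OwnershipError(f"no owner defined for field '{field_name}'")
    if agency != owner:
        raise OwnershipError(
            f"'{agency}' may not edit '{field_name}'; owner is '{owner}'"
        )
    updated = dict(record)
    updated[field_name] = value
    return updated


record = {"birth_date": "1998-04-12", "address": "12 Example Street"}
record = apply_update(record, "address", "7 New Road", agency="address_registry")
print(record["address"])

try:
    apply_update(record, "birth_date", "1999-01-01", agency="address_registry")
except OwnershipError as exc:
    print(f"rejected: {exc}")
```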

Throughout the engagement, the objective was to modernize critical public services around the citizen, reducing operational burden on a vulnerable population while strengthening the long-term reliability of government digital infrastructure.

Execution

The platform was delivered through research-led planning, targeted technical implementation, and cross-agency coordination.

Discovery and demographic research. User research established that the majority of applicants were under the age of 30, significantly reshaping the migration strategy away from full historical transfer toward a targeted, demand-driven approach.

Targeted data architecture. A live, on-demand migration model and custom middleware were implemented to ingest, restructure, and migrate legacy records into the new system dynamically, as individual applications were processed.

Cross-agency interoperability. RESTful API integrations aggregated verified information from other government databases, operationalizing the "once-only" principle and reducing redundant citizen data collection across agencies.

Governance and accountability model. Data mapping and clearly defined ownership structures established a single-source-of-truth model, improving accountability for data quality, strengthening security practices, and creating clearer governance across participating institutions.

Impact

86% → 21%

Data collection errors reduced via on-demand migration

+96%

Faster citizen service delivery, no in-person visits

3 months

Delivered against a one-year scope, ~75% compression

Validated Operational Outcomes

The targeted modernization approach delivered measurable improvements compared to a traditional full-migration model:

Data collection errors were reduced from 86% to 21% through live, on-demand migration rather than full historical migration
Citizen service-delivery speed increased by 96%, eliminating the need for citizens to repeatedly visit government agencies in person
The platform was delivered in three months against an original one-year timeline, a roughly 75% compression, without compromising data integrity or launch reliability, despite the operational complexity and urgency of the post-war environment
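
The timeline and error figures above reduce to simple arithmetic, reproduced here as a quick check. The 96% speed figure is not recomputed, since its baseline is not stated. Function names are illustrative.

```python
def compression(planned_months: float, actual_months: float) -> float:
    """Share of the planned timeline that was saved."""
    return 1 - actual_months / planned_months


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Drop from `before_pct` to `after_pct`, as a fraction of `before_pct`."""
    return (before_pct - after_pct) / before_pct


# Three months delivered against a twelve-month plan.
print(f"{compression(12, 3):.0%}")  # 75%

# Data collection errors: 86% -> 21%
print(86 - 21)  # 65 (percentage points)
print(f"{relative_reduction(86, 21):.0%}")  # 76%
```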

National-Scale Service Transformation

The platform is now live and serves as the centralized national system through which citizens apply for disability assessment scores, enabling the government to provide targeted financial and medical assistance to veterans and civilians affected by the conflict.

Durable Data Governance

By establishing clear ownership structures and a single-source-of-truth model across agencies, the initiative improved not only the platform's own data quality and security but also the accountability and integrity of the broader government data ecosystem, creating a reusable foundation for future public-sector digitalization initiatives.

Each of these case studies reflects the work behind the results: partnerships where strategy met execution, and the frameworks, systems, and people that turned complex challenges into measurable outcomes.
