Shiksha Saathi: AI Recap Agent

Shiksha Saathi is a voice-based AI learning companion that helps students learn in their regional language by allowing them to talk to it like a teacher, with responses in their native tongue. It aims to reduce language barriers, support teachers in managing classrooms, and provide students with a personalized, accessible way to reinforce learning without depending on English-only content.

Authors: Soham Mondal, Farmaan Elahi, Soumodeep Maity, Ankit Dhapke, Karthik Padmanabhan, S Swathi Dharshna
Project Demo: https://shiksha-saathi.intelligentagents.co/
GitHub Repository: https://github.com/intelligentagentsco/shiksha-sathi
License: MIT
Solution partners: Google For Developers, Google For Education, Hope Foundation

1. Problem Statement

Students in government schools face two major learning challenges. First, most educational content and AI tools are only available in English, making it difficult for Kannada-speaking students to learn effectively. These students understand concepts much better when explained in their mother tongue, but they're stuck with English materials that create barriers instead of removing them.

Second, teachers are overwhelmed with repetitive tasks and struggle to give individual attention to all students in large classrooms. They spend significant time answering the same questions repeatedly, leaving less time for creative teaching and personalized support.

Students don't know about AI models or technical prompts - they just want to learn without language barriers holding them back.

1.1 The Solution:

We created Shiksha Saathi as a simple web-based solution where students talk naturally in Kannada and get immediate help with their studies. No technical knowledge is needed: just speak and learn.

For Students:

  • Ask questions by voice in Kannada about any lesson

  • Get clear explanations in their native language

  • Build confidence to ask more questions without English barriers

  • Learn at their own pace with a patient AI tutor

For Teachers:

  • AI handles repetitive questions, freeing up classroom time

  • Focus on creative teaching and complex student interactions

  • Support diverse learning needs without being overwhelmed

  • Use technology without needing technical expertise

The system works seamlessly: students speak their questions in Kannada, and an intelligent agent processes their natural language and responds with clear explanations in their preferred language.

1.2 Technical Challenges

System Components

The Shiksha Saathi AI Recap Agent consists of four interconnected parts: a voice input processing component that handles speech-to-text conversion, an intelligent agent powered by Gemini 2.5 Pro that generates responses, a text output stage that delivers clear answers, and an intuitive web interface designed for easy access.

Key Technical Challenges:

Voice Processing:

One of our primary technical hurdles involves accurate Kannada speech recognition that can handle the diverse regional accent variations found across Karnataka. Different districts have distinct pronunciation patterns and colloquial expressions, requiring our system to adapt dynamically to these linguistic nuances without losing comprehension accuracy. Additionally, we needed to achieve real-time processing capabilities that enable natural conversation flow, ensuring students don't experience frustrating delays that would break the interactive learning experience. The system must process voice input, convert it to text, analyze the educational content, generate appropriate responses, and deliver them back to students within seconds to maintain the feel of a natural classroom discussion.

AI Agent Complexity:

Educational AI Model Selection: Gemini 2.5 Pro + LearnLM
We chose the combination of Gemini 2.5 Pro and LearnLM for specific educational advantages that address the unique challenges of multilingual learning environments.

Why Gemini 2.5 Pro:

We selected Gemini 2.5 Pro as our primary AI engine because it excels in multilingual educational environments where students need seamless communication in their native language while accessing complex academic content.

  • Superior multilingual capabilities for accurate Kannada language processing

  • Advanced context retention across extended conversations

  • Robust natural language understanding for educational queries

  • High-performance reasoning for complex concept explanations

Why LearnLM Integration:

We integrated LearnLM as our specialized educational layer because traditional AI models, while powerful, aren't designed with classroom dynamics and teaching principles in mind. LearnLM bridges this gap by bringing education-specific intelligence that understands how students actually learn, ensuring our platform delivers content that resonates with both students and educators in meaningful ways.

  • Specifically fine-tuned for educational content delivery and teaching methods

  • Understands learning progression and age-appropriate explanations

  • Optimized for different learning styles and educational frameworks

  • Built-in safety measures for student interactions

Combined Benefits:

The integration creates a powerful synergy where Gemini 2.5 Pro handles the conversational intelligence and complex language processing requirements, while LearnLM ensures that every response follows educational best practices and maintains curriculum alignment. Together, they generate responses that are not only linguistically accurate and culturally appropriate but also delivered in ways that actually help students learn better.

Additional Complexity Challenges:

Creating context-aware educational responses while maintaining conversation continuity presents significant technical challenges. The system must adapt in real-time to different student comprehension levels, recognizing when a student needs simpler explanations or more detailed examples. Ensuring curriculum-appropriate and age-suitable content delivery requires constant calibration, as the AI must understand not just what information is correct, but what information is appropriate for each grade level. Perhaps most challenging is balancing technical accuracy with simplified explanations, ensuring that complex scientific or mathematical concepts remain factually correct while being accessible to young learners.

Infrastructure:

The system must handle scaling challenges for multiple simultaneous users across different schools, each potentially having dozens of students asking questions concurrently. Additionally, ensuring reliable performance with limited rural internet connectivity requires careful optimization of data transmission and local caching strategies to maintain functionality even when bandwidth is constrained.

2. Solution Architecture

2.1 System Overview

The AI Recap Agent employs a microservices architecture built on Google Cloud Platform, leveraging multiple AI services for comprehensive educational support.

Shikshasaathi architecture: From query to learning support

System Architecture Overview

The diagram illustrates the complete flow of the Shiksha Saathi AI Recap Agent, showing how student voice interactions are processed through multiple AI agents and integrated services.

User Interaction Flow

The system begins when a user interacts through the web-based UI, which handles voice input, chat interface functionality, and session management. Students can speak naturally in Kannada, and the system maintains context across their entire learning conversation.

Voice Processing Pipeline

When students ask questions verbally, their voice input is captured and sent to Google's Speech Recognition API, which converts the spoken Kannada into text format. This text is then processed by the system to understand the student's educational query.
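The pipeline described above can be sketched as a single function that takes raw audio, transcribes it, and passes the text to the agent. This is a minimal illustration, not the project's code: `handle_voice_turn`, `Turn`, and `RETRY_MESSAGE` are hypothetical names, and the transcription and agent calls are passed in as plain callables so that no particular speech recognition or Vertex AI client signature is assumed.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical fallback shown when nothing intelligible was transcribed.
RETRY_MESSAGE = "Sorry, I could not hear that. Please ask again."


@dataclass
class Turn:
    transcript: str  # text recognised from the student's speech
    reply: str       # text shown back to the student


def handle_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    answer: Callable[[str], str],
) -> Turn:
    """Run one voice turn: speech-to-text, then the agent's reply."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Skip the model call when recognition produced no text.
        return Turn(transcript="", reply=RETRY_MESSAGE)
    return Turn(transcript=transcript, reply=answer(transcript))


# Stand-ins for the real services, used only to exercise the flow.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_answer(question: str) -> str:
    return f"Explanation for: {question}"


turn = handle_voice_turn(b"What is photosynthesis?", fake_transcribe, fake_answer)
silent = handle_voice_turn(b"   ", fake_transcribe, fake_answer)
```

Keeping the two services behind callables makes it easy to test the flow offline and to swap the recognizer for another language later.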

Dual Agent Architecture

The system employs two specialized AI agents running within the Vertex AI Agent Engine using Google's Agent Development Kit (ADK):

The title_summarizer_agent handles conversation management by generating appropriate titles for learning sessions based on the message content. This agent uses the Gemini 2.5 Pro model to create meaningful summaries that help organize and recall previous conversations.

The recap_agent serves as the primary educational assistant, processing student questions about topics that have been taught by their subject teachers. This agent also runs on Gemini 2.5 Pro but adds a tool integration with Google Sheets to access curriculum data.
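One way the two agents might cooperate within a conversation is sketched below. It is an illustration only and does not use the ADK API: `Conversation` and `run_turn` are hypothetical names, both agents are represented by plain callables, and generating the title once from the first message is an assumption (the text only says titles are based on the message content).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Conversation:
    title: Optional[str] = None
    messages: List[str] = field(default_factory=list)


def run_turn(
    conv: Conversation,
    question: str,
    recap: Callable[[str], str],
    summarize_title: Callable[[str], str],
) -> str:
    """Answer one question and title the conversation if it has no title yet."""
    if conv.title is None:
        # Assumed policy: the secondary agent titles the session once.
        conv.title = summarize_title(question)
    reply = recap(question)  # the primary agent produces the answer
    conv.messages.extend([question, reply])
    return reply


# Stand-in agents used only to exercise the control flow.
def fake_recap(question: str) -> str:
    return f"Recap: {question}"


def fake_title(message: str) -> str:
    return message[:20]


conv = Conversation()
first_reply = run_turn(conv, "Explain fractions with an example", fake_recap, fake_title)
run_turn(conv, "Give me one more example", fake_recap, fake_title)
```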

Curriculum Integration

The recap_agent connects directly to Google Sheets containing structured educational data about what topics have been covered in each class, which subjects are being taught, and which teachers are responsible for different content areas. This ensures the AI only helps students with material that has actually been taught in their classroom.
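The scoping rule described above (answer only what has been taught) can be sketched as a lookup over the sheet's rows. This is a minimal sketch under stated assumptions: the column names, the sample rows, and the helpers `covered_topics` and `is_in_scope` are hypothetical, and whether a partially taught topic counts as covered is not specified in the text, so it is exposed as the `require_completed` flag.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical rows mirroring the teacher-maintained sheet; column names are assumed.
SHEET_ROWS: List[Row] = [
    {"Class": "7", "Subject": "Science", "Topic": "Photosynthesis", "Completed": "Yes"},
    {"Class": "7", "Subject": "Science", "Topic": "Respiration", "Completed": "No"},
    {"Class": "8", "Subject": "Maths", "Topic": "Linear Equations", "Completed": "Yes"},
]


def _norm(value: str) -> str:
    """Normalise a cell for case- and whitespace-insensitive comparison."""
    return value.strip().casefold()


def covered_topics(
    rows: List[Row], class_name: str, subject: str, require_completed: bool = False
) -> List[str]:
    """Return the topics listed for one class and subject, in sheet order."""
    topics = []
    for row in rows:
        if _norm(row["Class"]) != _norm(class_name):
            continue
        if _norm(row["Subject"]) != _norm(subject):
            continue
        if require_completed and _norm(row["Completed"]) != "yes":
            continue
        topics.append(row["Topic"])
    return topics


def is_in_scope(
    rows: List[Row], class_name: str, subject: str, topic: str,
    require_completed: bool = False,
) -> bool:
    """Check whether a topic has been recorded as taught for this class."""
    wanted = _norm(topic)
    return any(
        _norm(t) == wanted
        for t in covered_topics(rows, class_name, subject, require_completed)
    )
```

An agent tool could call `is_in_scope` before answering and politely decline, or point the student to their teacher, when the topic is not listed.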

Session Management

The system maintains conversation continuity through Vertex AI's managed session store, which preserves context from past conversations. This allows students to build upon previous questions and enables the AI to provide more personalized, contextual responses over time.

Intelligent Context Handling

The architecture shows how the system retrieves past conversation history to maintain educational context, ensuring that the AI understands not just the current question but the student's ongoing learning journey and previous interactions with the system.

2.2 Core Components

Core Language Models

Gemini 2.5 Pro drives the agent's primary intelligence, enabling context-aware responses and advanced natural language understanding capabilities that allow students to communicate naturally in their preferred language.

LearnLM has been custom-tuned for grade- and subject-specific educational delivery, enhancing personalization by understanding the unique requirements of different academic levels and learning contexts.

Prompt Engineering Framework

The PARTS Framework provides structured prompt design to ensure consistency across all interactions, example-based guidance that helps the AI understand educational contexts, and behavioral alignment that keeps responses appropriate for student learning environments.
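A structured prompt of this kind could be assembled as in the sketch below. The text does not spell out the acronym; the expansion used here (Persona, Aim, Recipients, Theme, Structure) is an assumption, and `PartsPrompt`, its field values, and `render` are hypothetical names chosen for illustration rather than the project's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class PartsPrompt:
    # Assumed expansion of PARTS; the source does not define the letters.
    persona: str
    aim: str
    recipients: str
    theme: str
    structure: str

    def render(self) -> str:
        """Join the five parts into one instruction block, one part per line."""
        return "\n".join([
            f"Persona: {self.persona}",
            f"Aim: {self.aim}",
            f"Recipients: {self.recipients}",
            f"Theme: {self.theme}",
            f"Structure: {self.structure}",
        ])


prompt = PartsPrompt(
    persona="A patient teacher who explains in Kannada",
    aim="Help the student revise a topic already taught in class",
    recipients="Government school students",
    theme="Clear, age-appropriate explanations with everyday examples",
    structure="A short explanation, one example, then one check question",
)
instruction = prompt.render()
```

Fixing the field order in one place keeps every agent instruction consistent, which is the stated purpose of the framework.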

Agent Orchestration Layer
Primary Agent: Handles user input, context management, and response generation
Secondary Agent: Summarizes conversations and auto-generates titles for recall/searchability

Google Cloud Integration

Gemini 2.5 Pro is accessed through Vertex AI, Google's unified machine learning platform. This provides enterprise-grade security, scalability, and seamless integration with other Google Cloud services. Vertex AI handles model deployment, version management, and automatic scaling based on usage patterns.

Google ADK serves as the bridge connecting our AI agent with Google services including Docs, Sheets, and other educational tools that teachers already use in their daily workflows. OAuth 2.0 manages secure authentication for resource access, ensuring that student data and educational content remain protected while allowing seamless integration with school systems.
For example, our system integrates directly with Google Sheets containing structured data about topics, subjects, and other educational metadata that teachers maintain as part of their curriculum planning.

Session & Data Management

Session Logic:
A session represents a continuous conversation between a student and the AI agent. Using Vertex AI's session management, the system maintains context across multiple interactions, remembering previous questions and building upon them. Sessions are automatically updated through Vertex AI's managed service, which handles state persistence, conversation history, and context window management. This ensures students can have natural, flowing conversations without repeating context.
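The idea of a session that keeps recent history within a bounded context window can be illustrated with a small in-memory stand-in. This is not Vertex AI's session API: `SessionStore`, its methods, and the `max_messages` limit are hypothetical, and the managed service handles persistence and windowing itself.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)


class SessionStore:
    """In-memory stand-in that keeps the most recent messages per session."""

    def __init__(self, max_messages: int = 6) -> None:
        self._max = max_messages
        self._sessions: Dict[str, Deque[Message]] = {}

    def append(self, session_id: str, role: str, text: str) -> None:
        # A bounded deque drops the oldest message once the window is full.
        history = self._sessions.setdefault(session_id, deque(maxlen=self._max))
        history.append((role, text))

    def context(self, session_id: str) -> List[Message]:
        """Return the retained history, oldest first; empty for unknown sessions."""
        return list(self._sessions.get(session_id, ()))


store = SessionStore(max_messages=2)
store.append("s1", "student", "What is a fraction?")
store.append("s1", "agent", "A fraction is a part of a whole.")
store.append("s1", "student", "Can you give an example?")
```

Because the follow-up question arrives with the earlier exchange still in context, the student does not have to repeat what "an example" refers to.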

Google Sheets Integration: Centrally manages academic metadata (topics, subjects, teachers, and class mappings).
Shiksha Sathi Data Template

Shikshasaathi - Academic metadata sheet

How Teachers Use This Template

This Google Sheet template helps teachers keep track of what they've taught in class. Teachers simply fill in five columns: the class their students are from, the subject they taught, the specific topic covered, their name, and whether they finished teaching that topic completely (Yes/No). This way, the AI system knows exactly what has been covered in each class and can help students with those specific topics.
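The five columns map naturally onto a small record type. The sketch below parses a CSV export of such a sheet; the header names, the sample rows, and the names `TaughtTopic` and `parse_sheet` are assumptions for illustration, since the actual template headers are not listed in the text.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaughtTopic:
    class_name: str
    subject: str
    topic: str
    teacher: str
    completed: bool  # True when the teacher marked the topic "Yes"


# Hypothetical export of the template; header names are assumed.
SAMPLE_CSV = """Class,Subject,Topic,Teacher,Completed
7,Science,Photosynthesis,Teacher A,Yes
7,Maths,Fractions,Teacher B,No
"""


def parse_sheet(csv_text: str) -> List[TaughtTopic]:
    """Convert the exported sheet into typed records, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        TaughtTopic(
            class_name=row["Class"].strip(),
            subject=row["Subject"].strip(),
            topic=row["Topic"].strip(),
            teacher=row["Teacher"].strip(),
            completed=row["Completed"].strip().lower() == "yes",
        )
        for row in reader
    ]


topics = parse_sheet(SAMPLE_CSV)
```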

3. Educational Impact Case Study

3.1 Early Observations

Our initial deployment revealed high student engagement with the voice-based learning interface, particularly among students who previously struggled with English-only educational tools. Students showed a strong preference for receiving explanations in Kannada, with many saying that concepts became clearer when delivered in their native language. Technical performance remained stable throughout testing, with consistently high voice recognition accuracy and minimal latency issues. Students gave positive feedback about concept clarity, noting that the AI's explanations felt more natural and easier to understand than traditional English-based learning materials.

3.2 Research Hypotheses

Primary Hypothesis: Multilingual AI-powered voice agents will significantly improve learning outcomes for regional language speakers by reducing language barriers and increasing engagement.

Secondary Hypotheses:

  • Students will demonstrate higher retention rates when concepts are explained in their native language.

  • Shy or introverted students will participate more actively through voice interactions

  • Teachers will observe improved classroom dynamics and discussion quality

4. Open Source Initiative

GitHub Link: https://github.com/intelligentagentsco/shiksha-sathi
License: MIT

4.1 Why We Chose Open Source

Our open-source approach is designed to make AI-powered education available to everyone around the world. We believe technology can bridge learning gaps, but only if it's freely available and adaptable to local needs. By making Shiksha Saathi open source, we're ensuring that any community, anywhere in the world, can take this foundation and build upon it.

4.2 What We Have Open Sourced

We have made our complete voice processing pipeline available, including all the specialized components needed for regional language support. The AI agent framework integrates seamlessly with both Gemini 2.5 Pro and LearnLM, and we've included our entire session management and conversation handling system. Our Google Sheets integration for curriculum tracking allows teachers to easily manage and update educational content, while the web interface and user experience components ensure students can access the system effortlessly. We've also provided comprehensive documentation and deployment guides to help communities implement and customize the system for their specific needs.

4.3 What We Believe In:

  • Education Equity: Every child deserves access to quality learning tools, regardless of language barriers

  • Community Innovation: Local developers understand their community's needs better than anyone else

  • Collaborative Learning: When one community improves the system, everyone benefits from those enhancements

  • Transparency: Open source ensures the technology can be trusted and understood by educators and institutions


5. Conclusion

The AI Recap Agent represents a significant step forward in democratizing AI-powered education for regional language speakers. By combining Google's advanced AI capabilities with thoughtful educational design, we've created a system that not only bridges the technology gap but also enhances learning outcomes.

5.1 Key Achievements:

  • Successful deployment of voice-enabled AI in Kannada

  • Encouraging early educational impact observed during initial deployment

  • Scalable architecture ready for expansion

  • Strong foundation for open-source community development

5.2 Future Impact:

  • Potential to serve millions of regional language speakers

  • Template for similar solutions in other languages

  • Contribution to global educational equity

  • Advancement of multilingual AI research

5.3 Future Possibilities:

The system's architecture provides a strong foundation for implementing assessment-while-learning capabilities, where the AI evaluates student understanding during the conversation.

Expanded language support represents another significant opportunity, as the current Kannada-focused system can be adapted to serve students in other regional languages across India and globally, making quality AI-powered education accessible to millions more students who currently face similar language barriers in their learning journey.

The project's open-source release will enable broader adoption and continuous improvement, fostering a global community dedicated to making quality education accessible to all, regardless of language barriers.

