Vantage: Assessing Future-Ready Skills Through AI Simulations
The article discusses Vantage, a research experiment for assessing future-ready skills by leveraging generative AI to create conversations in simulated environments. It emphasizes the importance of durable human competencies such as critical thinking, collaboration, and creative thinking—skills that remain valuable regardless of technological shifts or automation. Developed in partnership with pedagogy experts and researchers from New York University, Vantage offers high school and college students a sandbox environment for practice and validated assessment, built with the same systematic methodology traditionally used for core academic subjects like math or science. Vantage is now available in English for sign-up on Google Labs.
At the heart of any effective learning process is feedback and assessment. In global education systems, what is measured is often what is taught. Future-ready skills, however, are notoriously hard to measure. Typical tests are too rigid to capture people's thought processes and interactions, and they are far removed from how these skills are used in the real world. Assessing these skills through real human interactions would be ideal, but doing so is resource-intensive and hard to standardize and grade consistently across many students.
The experimental setup in Vantage places learners in dynamic, multi-party conversations with AI avatars working together to complete tasks. An Executive LLM uses a provided assessment rubric to steer the AI avatars toward an effective assessment, constantly analyzing the state of the conversation to dynamically introduce specific challenges—such as pushing back on an idea or introducing a conflict—providing the learner with targeted opportunities to demonstrate their skills. Upon completion, an AI Evaluator analyzes the conversation transcript against the same assessment rubric to identify and measure specific evidence of skill application. The learner then receives a detailed skill map, consisting of a visual score and qualitative feedback specific to the skills demonstrated during the conversation.
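The two-stage setup described above can be pictured with a minimal sketch. The snippet below is not Google's implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, and the rubric entries and task are invented examples. It only illustrates the two roles, an Executive model steering avatar turns against a rubric and an Evaluator model scoring the resulting transcript against the same rubric.

```python
# Minimal sketch (assumptions: `call_llm` is a placeholder for any chat-completion
# client; the rubric and task are invented examples, not Vantage's actual rubric).
from dataclasses import dataclass, field

RUBRIC = {
    "conflict_resolution": "Acknowledges disagreement and proposes a concrete compromise.",
    "project_management": "Breaks the task into steps, assigns owners, and tracks progress.",
}

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion client."""
    return "(model response)"

@dataclass
class Simulation:
    task: str
    transcript: list[str] = field(default_factory=list)

    def executive_turn(self) -> str:
        # The Executive analyzes the conversation state and picks the next
        # avatar move, e.g. pushing back on the learner's idea, to create a
        # targeted opportunity to demonstrate a rubric skill.
        prompt = (
            f"Task: {self.task}\nRubric: {RUBRIC}\n"
            "Transcript so far:\n" + "\n".join(self.transcript) +
            "\nChoose the avatar move that best elicits a rubric skill."
        )
        move = call_llm(prompt)
        self.transcript.append(f"AVATAR: {move}")
        return move

    def learner_turn(self, utterance: str) -> None:
        self.transcript.append(f"LEARNER: {utterance}")

def evaluate(transcript: list[str]) -> dict[str, str]:
    # The Evaluator scores the finished transcript against the same rubric,
    # asking for a score plus quoted evidence for each skill.
    return {
        skill: call_llm(
            f"Criterion: {desc}\nTranscript:\n" + "\n".join(transcript) +
            "\nReturn a 1-5 score and the quoted evidence."
        )
        for skill, desc in RUBRIC.items()
    }

sim = Simulation(task="Plan a school fundraiser as a team")
sim.executive_turn()
sim.learner_turn("I hear the budget concern; let's cut venue costs first.")
print(evaluate(sim.transcript))
```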
To ensure academic and pedagogical rigor, a research partnership was established with New York University. A joint study with 188 testers aged 18–25 from the US assessed sample collaboration skills: conflict resolution and project management. The study evaluated whether the Executive LLM could successfully steer conversations to produce high-density information about assessed skills, and whether the AI Evaluator’s scores matched those of human experts. Results showed high agreement between the AI Evaluator and human raters, comparable to inter-human rater agreement. A second collaboration with OpenMic tested the AI Evaluator on creativity and English language arts, analyzing 180 students’ work on creative multimedia tasks; a high correlation (Pearson’s r = 0.88) was observed between AI and human expert scores.
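As a small illustration of how agreement between AI and human raters can be quantified, the sketch below computes Pearson's r along with quadratically weighted Cohen's kappa, a common agreement statistic for ordinal rubric scores. The score arrays are toy values for demonstration only, not data from either study.

```python
# Illustrative agreement check; the scores below are toy values, not study data.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human_scores = np.array([3, 4, 2, 5, 4, 3, 1, 4, 5, 2])  # expert rubric scores (1-5)
ai_scores    = np.array([3, 4, 3, 5, 4, 3, 2, 4, 5, 2])  # AI Evaluator scores (1-5)

r = np.corrcoef(human_scores, ai_scores)[0, 1]            # Pearson correlation
kappa = cohen_kappa_score(human_scores, ai_scores, weights="quadratic")

print(f"Pearson's r: {r:.2f}")
print(f"Quadratic-weighted kappa: {kappa:.2f}")
```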
Looking ahead, this kind of simulated environment could enable a measurable “skills layer” atop existing curricula, integrated into academic tasks. For example, students could debate social science topics with AI avatars or take on the role of a team lead planning a laboratory experiment, receiving feedback on both subject-matter understanding and skills such as collaboration and critical thinking. The research aims to transform future-ready skills from hard-to-measure to measurable at scale, supporting broader ecosystem research on how different pedagogical interventions shape human competencies over time.