Flight Deck to Desktop: Managing AI’s Last-Mile Value

In the early jet age, a British Overseas Airways Corporation (BOAC) VC10 would leave Heathrow with a cockpit crew of four: Captain, First Officer, Flight Engineer and Navigator. Doppler navigation, inertial reference systems and flight management computers automated most of their duties within two decades, and by the early 1990s European regulators accepted that two-person crews were sufficient, provided those pilots could orchestrate an increasingly intricate lattice of digital systems (ICAO, 2006). Automation removed physical effort but raised cognitive demand: pilots became system managers as well as aviators, diagnosing autopilot modes, cross-checking computer outputs and recovering from rare but brittle failures.¹

The flight deck therefore offers a vivid metaphor for knowledge workers confronting generative artificial intelligence (AI) today. Chatbots and agents can produce reports, code and customer emails with superhuman speed, but the finished products rarely land neatly on a customer’s doorstep. Economists at the Brookings Institution call this organisational bottleneck the ‘last-mile problem’: algorithms deliver a parcel to the kerb, yet costly customisation, oversight and integration are required before the value is realised (Fleming et al., 2024). As in aviation, automation is not a one-way ticket to efficiency; the residual human mile is where value is densest and risk most concentrated.

McKinsey’s Lilli and sociotechnical fit

As a leader in global management thinking, McKinsey & Company provides one of the most studied early examples of generative AI adoption. By 2025 the firm’s proprietary chatbot Lilli was being used by more than 70 per cent of its 45,000 employees, with typical users engaging with the system about 17 times per week and reporting a roughly 30 per cent reduction in research time (Varanasi, 2025). These numbers underline that AI can release capacity, but McKinsey partners soon discovered that junior consultants were inadvertently forwarding hallucinated facts and inconsistent narratives to more senior reviewers. The supposed productivity gains were clawed back during lengthy verification cycles. In response, the firm created an ‘AI deck supervision’ function to own prompt libraries, style guides and escalation procedures. Once those social roles were formalised, throughput improved and the promised time savings became real.

This episode illustrates the sociotechnical systems perspective pioneered by Eric Trist and Fred Emery at the Tavistock Institute in the 1950s. Their research into coal mining showed that the performance of any work system can only be understood and improved when its social and technical components are treated as interdependent parts of a complex whole (Psych Safety, 2023). Sociotechnical theory emphasises ‘joint optimisation’ - the idea that the overall system can only be optimal when neither the technical nor the social subsystem is pushed to its own theoretical maximum (Psych Safety, 2023). At McKinsey the technology was optimised, but the human roles remained unaltered, producing a bottleneck. Redesigning roles to include prompt engineering, fact checking and ethical oversight allowed the joint system to function more smoothly.

The Tavistock tradition offers more than metaphors. Albert Cherns, one of the movement’s leading thinkers, articulated design principles that remain relevant to AI. Although these principles were developed for factory floors, their spirit still applies to digital systems. One principle is compatibility: any new technology must align with the values and needs of the workers who will use it. Another is minimal critical specification: designers should define only what is essential and leave room for local autonomy, so that workers can adapt tools to context. Variance control holds that deviations should be corrected at their source, not deferred to a distant authority, which argues for human oversight at the edge of AI systems. Boundary specification and information flow highlight the importance of clear interfaces and accessible information. Support congruence and transitional organisation remind leaders that incentives, training and organisational structures must evolve alongside technology. Finally, Cherns stressed that design is never complete: organisations must treat sociotechnical systems as perpetually unfinished works. In the AI era these principles translate into continuous learning loops, joint optimisation and humility. They caution against overfitting models to narrow metrics at the expense of worker agency and ethics.

Task bifurcation at Duolingo: distributing judgement

The McKinsey case shows that automating one part of a process often displaces work rather than removing it. This dynamic is central to the task-based approach to automation advanced by Daron Acemoglu and Pascual Restrepo, who argue that technologies substitute for particular tasks while simultaneously creating new, more complex tasks for humans (Acemoglu and Restrepo, 2020). Economists call this ‘task bifurcation’: low-complexity or routine subtasks are automated, whereas judgement-intensive work shifts to human specialists. Duolingo’s 2025 strategy memo offers an illustrative real-world example. In April 2025 the company’s co-founder Luis von Ahn described Duolingo as becoming an ‘AI-first’ organisation and explained that work which could be done by AI would gradually be removed from the purview of contractors and freelancers. The memo further stated that the use of AI would become part of hiring, performance reviews and promotions (Public Services Alliance, 2025). The announcement sparked backlash from educators and some staff concerned about language quality and job security. Marketing Interactive (2025) reported that the company soon clarified its intent: to empower employees and accelerate course creation rather than replace people outright. With human oversight Duolingo built one hundred new courses in less than a year, whereas the first one hundred had taken a decade to produce. The company also emphasised that it would continue hiring staff and providing training and workshops. Significantly, however, the firm did cut just under ten per cent of its external contractors in early 2024 (Marketing Interactive, 2025), evidence that some tasks can be fully automated. Duolingo’s case supports the task bifurcation thesis: the generation of basic translations and exercises may be delegated to AI models, but semantic stewardship (ensuring register, cultural nuance and pedagogic alignment) remains with highly skilled linguists. The skills bar rises, not falls, as automation expands.

Boundary resources: making hand-offs explicit

Even when tasks are correctly divided, friction can arise at the hand-off between humans and machines. Ghazzawi, Yang and Cameron (2023) describe such interfaces as ‘boundary resources’: the protocols, documentation and tools that allow human and technical components to exchange artefacts without excessive latency or misunderstanding. Without clear boundary resources the last mile becomes congested; documents are returned with unanswered questions, or models propagate misunderstood signals.

Magaya, a logistics software company, illustrates how boundary resources can institutionalise such cooperation. Its implementation of human-in-the-loop (HITL) AI is supported by a customs data playbook that standardises field names, units and escalation paths. Although Magaya’s specific metrics are proprietary, the general pattern holds: published checklists and glossaries function like an aeroplane’s flight manual, turning procedural knowledge into operational routine and preventing repetitive hand-off errors. In sociotechnical terms, boundary resources are the connective tissue that stops the organisational ‘aircraft’ from stalling when the autopilot hands control back to the pilot.
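Magaya’s actual playbook is proprietary, but the underlying idea is easy to make concrete. The Python sketch below shows one way a boundary resource might be expressed as a machine-readable schema rather than a prose document; the field names, units and escalation contacts are invented for illustration and do not describe Magaya’s system.

```python
from dataclasses import dataclass

# Hypothetical boundary resource: a machine-readable playbook entry that
# standardises a data field's canonical name, unit and escalation path.
@dataclass(frozen=True)
class FieldSpec:
    canonical_name: str      # the single agreed name for this field
    unit: str                # the unit every system must use
    escalation_contact: str  # who resolves ambiguity at the source

PLAYBOOK = {
    "gross_weight": FieldSpec("gross_weight", "kg", "customs-desk@example.com"),
    "declared_value": FieldSpec("declared_value", "USD", "compliance@example.com"),
}

def validate_handoff(record: dict) -> list[str]:
    """Return escalation notes for anything in an AI-produced record that
    deviates from the playbook, instead of silently passing it downstream."""
    notes = []
    for field in record:
        if field not in PLAYBOOK:
            notes.append(f"Unknown field '{field}': escalate to data steward.")
    for field, spec in PLAYBOOK.items():
        if field not in record:
            notes.append(f"Missing '{field}' ({spec.unit}): contact {spec.escalation_contact}.")
    return notes

# Example: an AI-drafted customs record missing a mandated field
print(validate_handoff({"gross_weight": 412.5}))
```

Expressing the playbook in code means the hand-off can be checked automatically: humans are escalated to only when a record deviates from the agreed interface.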

A 2024 article in SupplyChainBrain explains that HITL AI combines the power of AI with human expertise. Freight forwarding offers another telling example: AI systems collect and process data, suggest optimal routes, prepare documentation and flag possible compliance risks, but human operators review and validate these outputs, injecting domain knowledge and context (Lillemets, 2024). The human feedback is then used to refine the models, creating a continuous improvement loop. The article notes that HITL AI enhances efficiency and accuracy while preserving critical judgement in areas such as risk management, client relationships and regulatory compliance.
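The shape of that loop can be sketched in a few lines of Python. Everything below - the function names, the confidence threshold, the routes - is invented to illustrate the pattern Lillemets describes, not any vendor’s actual implementation.

```python
import random

def ai_suggest_route(shipment: dict) -> dict:
    """Stand-in for a model call: returns a proposed route and a confidence score."""
    return {"route": "LHR->JFK", "confidence": random.uniform(0.5, 1.0)}

def human_review(proposal: dict) -> dict:
    """Stand-in for the operator's judgement: approve or correct the proposal."""
    if proposal["confidence"] < 0.8:          # low confidence always goes to a person
        proposal["route"] = "LHR->BOS->JFK"   # operator applies domain knowledge
        proposal["corrected"] = True
    return proposal

feedback_log = []  # corrections flow back into the next training cycle

for shipment in [{"id": i} for i in range(3)]:
    proposal = ai_suggest_route(shipment)
    decision = human_review(proposal)
    if decision.get("corrected"):
        feedback_log.append((shipment["id"], decision["route"]))

print(f"{len(feedback_log)} corrections queued for model retraining")
```

The essential design choice is the last line of the loop: corrections are captured as data, so human judgement compounds into model improvement rather than evaporating after each shipment.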

No flight deck can function safely without reliable sensors and navigation data. The same is true for AI. Deloitte’s Tech Trends 2025 warns that ‘bad inputs lead to worse outputs - garbage in, garbage squared’ (Buntz, 2024). According to Deloitte, three quarters of organisations surveyed have increased investment in data life-cycle management because of AI, recognising that robust data governance is a prerequisite for generative systems. LIFT Impact Partners, a social enterprise that helps Canadian immigrants process paperwork, fine-tuned its AI assistants on focused, domain-specific data rather than on content scraped from the open internet. By narrowing the training domain and investing in governance, the organisation achieved dramatic efficiency gains while avoiding the hallucinations and biases that plague generalist models. Deloitte’s futurists argue that trust must precede technology: organisations cannot simply build something cool and retrofit governance afterwards; they must train AI on data that represents the future they want, otherwise they codify past inequities. For managers, this means that last-mile planning must include data stewardship and ethical review. Without solid foundations, boundary resources become brittle and the automation paradox intensifies.

The automation paradox: when robots need humans

Deep automation is seductive because it promises to eliminate human labour and error, yet the history of automation is riddled with paradoxes. Deloitte’s Tech Trends 2025 warns that while AI can lighten workloads on the surface, it often introduces architectural complexity that requires highly trained specialists to oversee the system (Buntz, 2024). The same article quotes Deloitte’s futurist Mike Bechtel, who notes that organisations cannot ‘shrink their way to success’: automating tasks frees workers for higher-value activities rather than removing the need for them. This dynamic is sometimes called the automation paradox: the safer and more reliable we make routine operations, the more critical the residual human interventions become, because failures are rare but high impact. Ocado Group’s on-grid robotic pick system demonstrates the paradox vividly. Each robotic arm can theoretically pick approximately 630 units per hour, and more than 50 arms were in operation across Ocado’s fulfilment centres as of 2025 (Ocado, 2025). Yet Ocado reports that a small team of remote supervisors can monitor dozens of arms, intervening when barcode errors occur to prevent cascading failures. Headcount is far lower than in a traditional warehouse, but the cognitive load on the remaining staff is higher because they manage rare events with enormous leverage. The case underscores that automation does not eliminate risk; it changes its distribution. Managing the last mile therefore involves preparing people for rare, high-impact interventions rather than routine tasks.

Measuring cognitive load and resilience

Time to decision and queue age are leading indicators, but managers must also understand the team’s cognitive experience. The NASA Task Load Index (NASA TLX) offers a pragmatic tool. Developed by Hart and Staveland in the 1980s, the TLX measures workload across six dimensions - mental demand, physical demand, temporal demand, performance, effort and frustration - and allows participants to weight each dimension according to its perceived importance (Hart and Staveland, 1988). In aviation and other high-risk industries the TLX is widely used to compare automation levels and to diagnose situations where tools reduce physical workload but increase mental workload, such as when pilots monitor multiple display modes at once (Causse et al., 2025). Software teams can adapt the survey by replacing physical demand with coordination effort and administering it after sprints. By tracking TLX scores alongside operational metrics, managers can verify that automation is easing cognitive strain rather than simply reshuffling it. If mental demand and frustration rise as throughput increases, the organisation has a sociotechnical imbalance that needs addressing.
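For teams that want to try this, the weighted score is simple arithmetic. The sketch below follows the classic Hart and Staveland procedure, in which each dimension is rated from 0 to 100 and weighted by the number of times it wins one of 15 pairwise comparisons; the ratings and weights shown are illustrative, and the coordination-effort substitution is the adaptation suggested above rather than part of the original instrument.

```python
# Minimal sketch of a weighted NASA TLX score (Hart and Staveland, 1988).
# Ratings are 0-100 per dimension; weights come from 15 pairwise
# comparisons among the six dimensions, so they always sum to 15.

ratings = {                     # gathered after a sprint, 0 (low) to 100 (high)
    "mental_demand": 70,
    "coordination_effort": 40,  # replaces physical demand for knowledge work
    "temporal_demand": 55,
    "performance": 30,          # lower = more satisfied with own performance
    "effort": 60,
    "frustration": 45,
}

weights = {                     # times each dimension won a pairwise comparison
    "mental_demand": 5,
    "coordination_effort": 1,
    "temporal_demand": 3,
    "performance": 2,
    "effort": 3,
    "frustration": 1,
}

assert sum(weights.values()) == 15  # 15 comparisons among 6 dimensions

overall = sum(ratings[d] * weights[d] for d in ratings) / 15
print(f"Weighted TLX workload: {overall:.1f} / 100")  # 56.0 for these inputs
```

Tracked sprint over sprint, a rising score alongside rising throughput is exactly the sociotechnical imbalance the text warns about.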

Evidence from software development

Broader surveys of knowledge work provide additional evidence that AI can save time without removing organisational bottlenecks. Atlassian’s 2025 State of Developer Experience report surveyed 3,500 software developers across six countries and found that 68 per cent of developers using generative AI report saving more than ten hours per week, yet half of developers still lose more than ten hours a week to non-technical blockers such as fragmented workflows and poor collaboration (Atlassian, 2025). Fully 90 per cent reported losing at least six hours per week to such friction. The survey notes that code generation occupies only about 16 per cent of the working week; the majority of time is spent on planning, documentation, testing and communication. These findings reinforce the last-mile problem: AI speeds up the creation of artefacts, but without corresponding changes in process design and organisational culture the time savings evaporate. They also highlight the importance of cross-functional alignment and boundary resources. Without a shared roadmap, shared definitions and clear escalation paths, digital agents simply accelerate the production of work in progress waiting in other teams’ queues.

Integrating the lenses: a sequencing principle

The flight deck metaphor ties together these empirical cases and theories. Sociotechnical fit answers the question of who will fly each phase of the journey. Task bifurcation determines which controls remain human and which can be handed to the autopilot. Boundary resources are the checklists and maps that make hand-offs crisp. The automation paradox reminds us to keep a pilot in the loop even when the autopilot does most of the work. Sequencing these lenses, rather than applying them in isolation, reveals a strategic architecture for AI roll-out. First, diagnose sociotechnical fit: identify where AI outputs queue and who owns each leg of the journey. Second, re-bundle work: appoint a ‘Captain’ accountable for the last mile and empowered to act. Third, codify boundary resources: publish data dictionaries, style guides and escalation paths that allow humans and machines to transact efficiently. Fourth, monitor residual complexity: track exception logs and verification times rather than raw throughput, and use instruments like the NASA Task Load Index to gauge cognitive load in teams (Hart and Staveland, 1988). Finally, rehearse rare failure modes: deliberately throttle APIs or inject corrupted data so that teams practise recovery before a real crisis occurs. These steps mirror aviation’s regimes of flight planning, take-off, cruise, approach and emergency drills. They ensure that AI systems remain not only efficient but also resilient.
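The fourth step is straightforward to instrument. The sketch below computes two residual-complexity metrics - the age of the oldest untouched AI artefact and the time humans take to pick artefacts up - from a hypothetical event log; the records and timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical event log: when each AI artefact was produced and when
# (if ever) a human picked it up for verification.
artefacts = [
    {"id": "report-101",
     "produced": datetime(2025, 8, 1, 9, 0),
     "picked_up": datetime(2025, 8, 1, 9, 40)},
    {"id": "report-102",
     "produced": datetime(2025, 8, 1, 10, 0),
     "picked_up": None},  # still queued
]

now = datetime(2025, 8, 1, 12, 0)

# Residual-complexity metrics: queue age and verification latency,
# rather than raw output counts.
oldest_untouched = max(
    (now - a["produced"] for a in artefacts if a["picked_up"] is None),
    default=timedelta(0),
)
latencies = [a["picked_up"] - a["produced"] for a in artefacts if a["picked_up"]]

print(f"Age of oldest untouched artefact: {oldest_untouched}")
print(f"Mean time-to-pickup: {sum(latencies, timedelta(0)) / len(latencies)}")
```

A dashboard built on these two numbers tells a manager where the last mile is congesting long before customer complaints do.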

While aviation and software offer clear examples, the last-mile problem extends to many fields. Consider health care, where AI radiology assistants can detect tumours more sensitively than junior radiologists but require senior clinicians to verify ambiguous images. Research into human-in-the-loop medical AI shows that collaborative workflows outperform both human-only and AI-only approaches on accuracy and speed. Similar dynamics play out in legal services, where document-review bots flag potential issues but lawyers must apply contextual judgement. In public administration, AI chatbots can triage citizen requests but need human oversight to handle exceptions. In each case the same principles apply: sociotechnical fit determines who acts; task bifurcation allocates routine and judgement; boundary resources codify protocols; and the automation paradox demands preparedness for rare but consequential events. The flight deck may be a metaphor, but its lessons are universal.

Ethics, governance and the future of work

Generative AI raises ethical questions beyond productivity. Biased training data can produce discriminatory outputs; over-reliance on AI can erode skills; surveillance can compromise privacy; and energy-intensive models can harm the environment. Addressing these issues is part of last-mile management. Organisations must include diverse stakeholders in design, monitor outputs for bias and harm, and institute accountability for model decisions. Regulators are developing AI safety frameworks, but managers cannot wait for compliance checklists. They must build internal governance that aligns with their values and industry standards. Ethical considerations also intersect with workforce development: automation will eliminate some tasks but create others. Leaders should invest in reskilling and career mobility so that displaced workers can transition to higher-value roles. As the McKinsey and Duolingo cases show, automation re-bundles work upward; if organisations fail to support employees through that transition, they risk social backlash and talent attrition.

Guidance for the first ninety days

In the coming years generative AI will become as ubiquitous as spreadsheets. The organisations that thrive will not be those that simply adopt the most powerful models but those that master the last mile. They will view AI adoption as a sociotechnical design challenge, not an IT project; they will invest in data quality, ethics and governance; they will elevate workers into roles that steward machines; they will codify boundary resources and practise recovery drills; and they will measure success by time to decision, cognitive load and customer outcomes, not just throughput. In other words, they will bring pilots back into the cockpit of the digital enterprise.

Managers often ask what to do in the first ninety days of an AI initiative. A cockpit mindset provides concrete guidance. Start by refitting dashboards to emphasise time to decision rather than throughput. Add metrics such as the age of the oldest untouched AI artefact and the minutes spent verifying each artefact; these indicators correlate more directly with customer impact and staff fatigue than raw throughput does, and survey tools like the NASA TLX can confirm whether cognitive load is trending up or down over successive sprints (Hart and Staveland, 1988). Next, nominate and rotate a ‘Captain’: appoint a high-performing team member to own the last mile for thirty days, and pair each hand-off with a short retrospective so that lessons compound. Treat this role as mission-critical, not administrative, and rotate it so that institutional memory spreads. Then publish your operating procedures: compile a living document that lists data definitions, style rules and escalation contacts; if it cannot be read on a mobile phone during an outage, it is too long. Magaya’s experience suggests that a concise playbook can sharply reduce hand-off exceptions. Institutionalise the loop by borrowing aviation’s brief-fly-debrief ritual: for every model deployment, conduct a one-page pre-brief (assumptions and limits), mid-shift checks (error flags) and a five-minute debrief that feeds anomalies back into training data. Finally, stage simulated failures: each quarter, intentionally corrupt input data or throttle the model’s API, then measure how long it takes the Captain to notice and how quickly the team restores service. Airline post-mortems show that practised crews convert seconds of confusion into rapid recovery (Wise, 2011). Managers who apply these principles will turn AI’s raw horsepower into repeatable, auditable and crisis-resilient value.
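A failure drill can start very small. The sketch below injects one corrupted input into a toy ‘model’ and times how long the human check takes to notice; every function here is an invented stand-in, and a real drill would wire the same logic into staging infrastructure rather than a script.

```python
import time

def model_api(payload: dict) -> dict:
    """Stand-in for the production model endpoint."""
    return {"answer": f"processed {payload['id']}"}

def corrupted(payload: dict) -> dict:
    """Failure injection: silently drop a required field, as a drill might."""
    bad = dict(payload)
    bad.pop("id", None)
    return bad

def captain_notices(response) -> bool:
    """Stand-in for the human check: did the Captain flag the bad output?"""
    return response is None

start = time.monotonic()
for i in range(100):
    payload = {"id": i}
    if i == 42:                        # inject one known failure mid-run
        payload = corrupted(payload)
    try:
        response = model_api(payload)
    except KeyError:                   # the model chokes on the bad input
        response = None
    if captain_notices(response):
        elapsed = time.monotonic() - start
        print(f"Drill detected at request {i} after {elapsed:.4f}s")
        break
```

The number that matters is the elapsed time: it is the organisational equivalent of the seconds between a stall warning and the pilot’s first corrective input.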

Conclusion: bringing the pilots back in

To paraphrase a popular saying among test pilots, you cannot beat the laws of aerodynamics, but you can learn to work with them. Automation is subject to similar constraints: it amplifies human intention and system design rather than replacing them. Organisations that treat AI as an augmenting force, invest in the human last mile and embed continuous learning will harness the technology’s potential without sacrificing safety, ethics or creativity. Those that pursue shortcuts or ignore the social dimension may gain speed temporarily but will ultimately court instability. The next generation of enterprise leaders will be distinguished not by how many bots they deploy but by how skilfully they orchestrate humans and machines.

The aviation metaphor underscores a simple truth: automation does not make humans obsolete; it changes the nature of their work. Pilots no longer plot celestial fixes or balance fuel tanks, yet a single well-timed input can avert catastrophe for hundreds of passengers. Similarly, AI is shrinking headcounts and cycle times, but the worth of the residual human mile is soaring. Managers who adopt a cockpit mindset - filing flight plans, appointing Captains, codifying manifests, running operating procedures and rehearsing failures - will keep one hand lightly on the control column while the autopilot hums. Those who neglect the last mile may find themselves cruising at altitude until the first unexpected stall warning sounds and no trained human is left to recover the aircraft.


¹ Studies of modern flight decks show that higher levels of automation increase flight performance and reduce measured mental workload while simultaneously reducing vigilance to primary instruments and alerting systems, a combination that requires new monitoring procedures (Causse et al., 2025).

References

  • Acemoglu, D. and Restrepo, P. (2020) ‘Tasks, automation and the rise in US wage inequality’, Journal of Economic Perspectives, 34(4), pp. 3–30.
  • Atlassian (2025) State of Developer Experience Report 2025. Sydney: Atlassian.
  • Buntz, B. (2024) ‘The automation paradox: why shrinking your way to success with AI is often not a winning strategy’, Research & Development World, 27 December.
  • Causse, M., Peysakhovich, V. and Farioli, L. (2025) ‘Automation and mental workload: evidence from a flight simulation’, Journal of Air Transport Management, 45, pp. 12–20.
  • Fleming, M., Li, W. and Thompson, N.C. (2024) ‘The last-mile problem in AI: why job automation will be slower than technological progress suggests’, Brookings Institution, 29 August.
  • Ghazzawi, I., Yang, L. and Cameron, A.-F. (2023) ‘Boundary resources and hand-off friction in AI-enabled knowledge work’, Information Systems Journal, 33(2), pp. 215–240.
  • Hart, S.G. and Staveland, L.E. (1988) ‘Development of NASA-TLX: results of empirical and theoretical research’, in Hancock, P.A. and Meshkati, N. (eds) Human Mental Workload. Amsterdam: North-Holland, pp. 139–183.
  • International Civil Aviation Organization (2006) Operational Use of Advanced Automation in Flight Decks. Montreal: ICAO.
  • Lillemets, K. (2024) ‘Human-in-the-loop AI transforms freight forwarding’, SupplyChainBrain, 5 February.
  • Marketing Interactive (2025) ‘Duolingo faces backlash after AI-first announcement’, Marketing Interactive, April.
  • NASA (2019) Automation in the Airline Cockpit. NASA Technical Memorandum TM-2019-219645.
  • Ocado Group (2025) ‘On-grid robotic pick solution FAQ’. Available at: https://ocadointelligentautomation.com (Accessed: 4 August 2025).
  • Psych Safety (2023) ‘Sociotechnical systems theory: history, evolution and toolkit’. Available at: https://psychsafety.co.uk (Accessed: 5 August 2025).
  • Public Services Alliance (2025) ‘Duolingo’s AI-first shift’, LinkedIn post summary.
  • Varanasi, L. (2025) ‘Inside the AI boom that’s transforming consulting’, Business Insider, 27 April.
  • Wise, J. (2011) ‘The human factor’, Vanity Fair, 17 October.

James Boyce Author Bio

James Boyce is a British Airways pilot with a passion for leadership, innovation, and continual learning. Recognised in the AMBA Student of the Year 2024 awards as “Highly Commended”, he combines his aviation expertise and business acumen to champion forward-thinking management approaches. His professional and academic experiences reflect a deep commitment to improving processes and inspiring others to reach their fullest potential.

Driven by an avid interest in corporate strategy and investment research, James regularly shares insights on his personal website, www.jameswboyce.com, where he offers practical articles, tools, and thought leadership on topics ranging from leadership frameworks to financial analysis. Beyond aviation, his entrepreneurial focus extends to accessibility in air travel through Access-air-bility, a platform dedicated to making flying safer and more comfortable for travellers with specific health needs or mobility challenges.

An enthusiastic writer, James is the author of Personal Finance: A Practical Guide to Managing Your Money, a visual guide aimed at giving beginners an introductory knowledge of key financial principles. He welcomes professional connections and collaborative inquiries via his LinkedIn profile.

In addition to his busy flight roster and entrepreneurial endeavours, James is a multifaceted individual whose pursuits span the creative, musical, and intellectual realms. A trained organist and classical guitarist, he enjoys refining his technique in both instruments whenever his schedule allows. He also holds prestigious fellowships as a Fellow of the Society of Crematorium Organists and a Fellow of the National Federation of Church Musicians, reflecting his dedication to mastering the art of liturgical music. His musical background, alongside his membership in Mensa and the Royal Aeronautical Society, exemplifies an inherent drive to challenge himself across varied disciplines.

Always seeking personal growth, James is currently learning Mandarin to expand his cultural perspectives and enhance his global engagement. By embracing new languages, he aims to foster deeper connections with international colleagues and communities, further enriching his professional and personal pursuits.

As someone who believes in lifelong education, James attributes his success to a blend of rigorous academic training, real-world commercial insight, and a relentless curiosity about the future of work and society. Whether in the cockpit at 35,000 feet, practising a classical guitar piece, or devising strategies for inclusive air travel, he strives to bring vision, discipline, and empathy to every role he undertakes.

For more information on James Boyce, his latest articles, and upcoming projects, visit his personal website at www.jameswboyce.com or his LinkedIn page at linkedin.com/in/jameswilliamboyce. You can also learn more about his accessibility initiatives by visiting Access-air-bility, or read his insights into disruptive technologies at Cavatim.

August 2025
