The Rise of Desk-Side AI: Why Employee-Scale Agentic Computing Is Inevitable
Where we are, and what needs to be solved (beyond just saying "security")
The vision of local AI has morphed and exploded in popularity with agentic AI and tools like OpenClaw. At STH, we have been working on a range of local AI setups, including consumer GPUs, professional GPUs, high-end GPU servers, 128GB LPDDR5X unified memory systems, Apple Silicon systems, server CPU-based systems, and more. What started as a somewhat neat science experiment has turned into something wildly useful over the past few months. As everyone, from executives to individuals on their own time, realizes that agentic AI is doing much more than powering a chatbot, the discussion is shifting into three mega-tiers of AI inference compute:
Large-scale AI Factory compute - These are the well-known AI providers and clusters. Large-scale computing facilities in some industries (e.g., drug discovery) can operate in a similar tier
Local or Employee Scale Agentic AI - This, I will argue, encompasses everything from what sits at a person’s desk or cubicle, up to small clusters
On-device AI - This is the domain of physical AI that often has hard constraints to achieve a mission with limited connectivity and power envelopes
The first tier, frontier model hosting and high-end AI factories, makes sense. In large-scale systems, the cost of delivering useful tokens can be driven down at an exceptional rate, and folks are hyper-focused on that cost lever of scaling out AI factories with new infrastructure generations. What perhaps garners insufficient attention is the successful completion of tasks and the compute and tokens required to achieve it. If you have ever seen a task fail after burning a million or several million tokens, you know the exact challenge. Higher success rates mean workloads transition to agents more quickly.
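The economics behind that point can be made concrete with simple expected-value arithmetic: if each attempt burns roughly T tokens and succeeds with probability p, and failed attempts are retried, the expected token spend per completed task is T/p. A minimal sketch, where the token counts and success rates are purely illustrative assumptions rather than measured figures:

```python
# Expected token cost per successfully completed agentic task.
# All numbers below are illustrative assumptions, not measurements.

def expected_tokens_per_success(tokens_per_attempt: float, success_rate: float) -> float:
    """With independent retries, the expected number of attempts is
    1 / success_rate, so expected tokens per completed task is
    tokens_per_attempt / success_rate."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return tokens_per_attempt / success_rate

# Two hypothetical agent stacks attempting the same task:
baseline = expected_tokens_per_success(1_000_000, 0.50)  # 2,000,000 tokens expected
improved = expected_tokens_per_success(1_000_000, 0.90)  # ~1,111,111 tokens expected

print(f"baseline: {baseline:,.0f} tokens per completed task")
print(f"improved: {improved:,.0f} tokens per completed task")
```

Under these assumptions, raising the success rate from 50% to 90% nearly halves the expected token spend per completed task, which is why task success rates matter as much as raw cost per token.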
Large-scale AI factories are absolutely necessary. The simple reason is this: agentic workflows need to scale out, and at speed. When agents are talking to agents, accuracy and speed metrics will dominate. Years ago, I worked on a project for a large enterprise IT company whose primary goal was simply to beat a bigger competitor's quote-to-order time, because the company found it was losing a large number of deals (a double-digit percentage of revenue) simply by being slower. Partners and customers would frequently pick the other vendor just because it was faster. The same will happen with agent-to-agent communication.
On-device AI has its own constraints. If you do not have strong connectivity and are operating on battery power, that naturally limits how much compute you can have, which makes this its own mega-tier.
Still, at some point, individuals and organizations will gravitate toward a mix of both, using AI factories alongside some sort of local or employee-scale AI infrastructure. Controlling costs and ensuring that data movement is well understood are reasons for this, but there is a more practical one: it fits into the existing employment paradigm.
First, we will quickly recap the common platforms for this local or employee-scale AI. Then, I want to make a case for how these platforms fit.


