A2A architecture on WhatsApp Business: patterns and antipatterns

The WhatsApp Business API is currently the highest-penetration channel for B2B communication in LATAM: open rates above 95% and average response times measured in minutes. Yet most of the "AI agent on WhatsApp" implementations we have audited over the last 12 months make the same structural mistakes.

This article documents five architecture patterns that work in production and five antipatterns that we have seen fail repeatedly, based on real deployments on the WhatsApp Business API (the Enterprise tier, not Meta's Cloud API).

The most critical pattern that works is a strict separation between the qualification agent and the quotation agent. When a single agent tries to do both, the result is what we call "context drift": the model loses coherence between the profile it built during qualification and the parameters it uses to quote. Splitting the work into two agents with independent contexts, and an explicit handoff of typed variables between them, reduces the quotation error rate by 67% on average.
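
A minimal Python sketch of such a typed handoff follows. Every name in it (QualifiedLead, Segment, build_handoff, quote, the per-seat prices) is hypothetical and chosen only for illustration; the point is that the quotation side receives a validated, immutable payload rather than the qualification transcript.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    SMB = "smb"
    MID_MARKET = "mid_market"
    ENTERPRISE = "enterprise"


@dataclass(frozen=True)
class QualifiedLead:
    """Typed handoff payload: the only data that crosses the agent boundary."""
    lead_id: str
    segment: Segment
    seats: int
    currency: str

    def __post_init__(self) -> None:
        # Reject malformed values at the boundary, before any quote is computed.
        if self.seats <= 0:
            raise ValueError("seats must be positive")
        if len(self.currency) != 3 or not self.currency.isupper():
            raise ValueError("currency must be a 3-letter uppercase code")


def build_handoff(raw: dict) -> QualifiedLead:
    """Convert the fields extracted by the qualification agent into the typed payload."""
    return QualifiedLead(
        lead_id=str(raw["lead_id"]),
        segment=Segment(raw["segment"]),  # raises ValueError on an unknown segment
        seats=int(raw["seats"]),
        currency=str(raw["currency"]),
    )


# Hypothetical per-seat prices, used only to make the example concrete.
PRICE_PER_SEAT = {Segment.SMB: 12, Segment.MID_MARKET: 10, Segment.ENTERPRISE: 8}


def quote(lead: QualifiedLead) -> int:
    """Quotation side: depends only on the typed payload, never on chat history."""
    return PRICE_PER_SEAT[lead.segment] * lead.seats
```

Because the dataclass is frozen and validated on construction, a malformed profile fails at the handoff rather than surfacing later as a wrong quote.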

The most destructive antipattern we have seen is trying to manage conversational state inside the LLM context, that is, trusting the model to remember what the user said five turns ago. Current LLMs, even those with long context windows, degrade in conversations that span multiple sessions. The correct solution is to externalize state to a persistence layer (Redis, DynamoDB) and rebuild the relevant context on every call.
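
The following Python sketch illustrates the idea under stated assumptions. InMemoryStore is a stand-in for Redis or DynamoDB so that the example runs without external services, and the key format, the ConversationState fields, and the four-turn limit are all hypothetical choices.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

MAX_RECENT_TURNS = 4  # assumption: how many raw messages to replay per call


@dataclass
class ConversationState:
    user_id: str
    facts: dict = field(default_factory=dict)         # durable facts about the user
    recent_turns: list = field(default_factory=list)  # only the latest messages


class InMemoryStore:
    """Stand-in for Redis or DynamoDB: string keys, JSON string values."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def load_state(store: InMemoryStore, user_id: str) -> ConversationState:
    raw = store.get(f"conv:{user_id}")
    if raw is None:
        return ConversationState(user_id=user_id)
    return ConversationState(**json.loads(raw))


def save_state(store: InMemoryStore, state: ConversationState) -> None:
    store.set(f"conv:{state.user_id}", json.dumps(asdict(state)))


def record_turn(store: InMemoryStore, user_id: str, message: str,
                new_facts: Optional[dict] = None) -> ConversationState:
    """Persist one incoming message and any facts extracted from it."""
    state = load_state(store, user_id)
    state.facts.update(new_facts or {})
    state.recent_turns = (state.recent_turns + [message])[-MAX_RECENT_TURNS:]
    save_state(store, state)
    return state


def build_context(state: ConversationState) -> str:
    """Rebuild the prompt context from stored state on every call."""
    facts = "\n".join(f"- {k}: {v}" for k, v in sorted(state.facts.items()))
    turns = "\n".join(state.recent_turns)
    return f"Known facts:\n{facts}\n\nRecent messages:\n{turns}"
```

In this sketch the durable facts survive indefinitely while raw messages are trimmed, so the context passed to the model stays small no matter how many sessions the conversation spans.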

Another pattern that works: intelligent per-user throttling. The WhatsApp Business API enforces rate limits that vary by tier. More importantly, users react negatively to instantaneous replies in conversations that are supposed to feel like talking to a human. An artificial delay of between 800 ms and 2.5 seconds, calibrated to the length of the message, raises the conversion rate in our deployments by 23% on average.
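
One way to calibrate the delay is sketched below in Python. The 0.8 s and 2.5 s bounds come from the text; the linear scaling, the 400-character saturation point, and the function names are assumptions, and send is a hypothetical coroutine that delivers the message.

```python
import asyncio
from typing import Awaitable, Callable

MIN_DELAY_S = 0.8       # lower bound from the text
MAX_DELAY_S = 2.5       # upper bound from the text
FULL_DELAY_CHARS = 400  # assumption: replies this long or longer get the maximum delay


def reply_delay(message: str) -> float:
    """Scale the delay linearly with reply length, clamped to the configured bounds."""
    fraction = min(len(message) / FULL_DELAY_CHARS, 1.0)
    return MIN_DELAY_S + fraction * (MAX_DELAY_S - MIN_DELAY_S)


async def send_with_delay(send: Callable[[str], Awaitable[None]], message: str) -> None:
    """Wait for the calibrated delay, then hand the message to the sender."""
    await asyncio.sleep(reply_delay(message))
    await send(message)
```

Using asyncio.sleep rather than a blocking sleep keeps one worker free to serve other users while a reply is being held back.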

The "generalist agent" antipattern deserves special mention. The temptation to build a single agent that handles sales, support, collections, and onboarding in one flow is strong because it looks simpler. In production, it causes routing errors that are practically impossible to debug and produces inconsistent user experiences. The correct architecture is a set of specialized agents behind an orchestrator that decides which one to invoke.
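
A minimal Python sketch of that orchestrator shape follows. The agents are stub functions, and the keyword classifier is a deliberately naive placeholder (a real deployment would likely use a model or a rules engine); all names and keywords are hypothetical.

```python
from typing import Callable, Dict, Tuple

Agent = Callable[[str], str]


def sales_agent(message: str) -> str:
    return "sales: preparing pricing information"


def support_agent(message: str) -> str:
    return "support: opening a ticket"


def collections_agent(message: str) -> str:
    return "collections: looking up the invoice"


# Hypothetical keyword table standing in for a real intent classifier.
KEYWORDS = {
    "price": "sales",
    "quote": "sales",
    "invoice": "collections",
    "payment": "collections",
    "error": "support",
}


def keyword_classifier(message: str) -> str:
    lowered = message.lower()
    for keyword, intent in KEYWORDS.items():
        if keyword in lowered:
            return intent
    return "unknown"


class Orchestrator:
    """Decides which specialized agent handles a message; holds no domain logic."""

    def __init__(self, agents: Dict[str, Agent],
                 classify: Callable[[str], str], fallback: str) -> None:
        if fallback not in agents:
            raise ValueError("fallback must name a registered agent")
        self._agents = agents
        self._classify = classify
        self._fallback = fallback

    def route(self, message: str) -> Tuple[str, str]:
        intent = self._classify(message)
        name = intent if intent in self._agents else self._fallback
        # Returning the chosen agent name makes each routing decision loggable.
        return name, self._agents[name](message)
```

Keeping the routing decision in one place, and returning it alongside the reply, means a misrouted conversation can be traced to a single classification step instead of being buried inside one large prompt.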

Finally, the metric that matters most is neither CSAT nor response time. It is "conversational throughput per lead": how many complete, qualified conversations the system can handle per hour per WhatsApp number. This number determines whether the system scales or collapses under peak demand.
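
As a rough illustration, the metric could be computed from conversation logs as in the Python sketch below. The Conversation record and its fields are hypothetical; the calculation is simply the count of completed, qualified conversations per WhatsApp number divided by the length of the observation window in hours.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class Conversation:
    wa_number: str          # WhatsApp number that handled the conversation
    completed_at: datetime  # when the conversation finished
    qualified: bool         # whether it ended as a complete, qualified conversation


def throughput_per_hour(conversations: Iterable[Conversation],
                        window_start: datetime,
                        window_end: datetime) -> Dict[str, float]:
    """Qualified conversations completed per hour, keyed by WhatsApp number."""
    hours = (window_end - window_start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("window_end must be after window_start")
    counts = Counter(
        c.wa_number
        for c in conversations
        if c.qualified and window_start <= c.completed_at < window_end
    )
    return {number: count / hours for number, count in counts.items()}
```

Measuring this over the busiest window of the day, rather than as a daily average, is what shows whether a number is close to saturating under peak demand.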
