Before this system existed, Divar’s engineering teams had no standard way to ship features to a subset of users. Every new capability was either fully on or fully off for everyone. That made releases riskier than they needed to be: there was no safe path to roll something out to 1% of users, watch it for a week, and expand from there. A/B tests were run informally, if at all. As a result, teams either shipped fast and crossed their fingers, or shipped slowly and lost the ability to measure whether a change was actually better.
I went to the CTO with a proposal for a centralised service. The argument was about leverage: a shared flag system that every engineering team could adopt would shift A/B testing from “something a few careful teams do” to the default way features get shipped. I wrote the design doc, walked through the architecture, and got it approved.
The service had two parts: a control plane and thin client SDKs. The control plane was a Python backend with an admin API for defining flags, setting targeting rules, and writing to an audit log. Every flag change was logged with who made it, when, and what the previous value was. This mattered for debugging, but also for the product and legal teams, who occasionally needed to trace when a feature had been enabled for a specific user. Targeting rules supported percentage rollouts (e.g. assigning 20% of users to variant B), user-segment targeting, and explicit allow/deny lists for QA and early-access programs. The client SDKs, in Python and Go, pulled flag state into a local cache, so services evaluated flags in-process instead of calling the control plane on every request. State changes propagated via pub/sub, so a flag change in the admin UI reached production within a second or two rather than at the next poll interval.
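The targeting rules above can be sketched as a single evaluation function. This is a minimal illustration, not the service’s actual code: the `Flag` fields, the two-variant "A"/"B" model, the evaluation order (deny list, then allow list, then segments, then percentage) and the SHA-256 bucketing scheme are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    # Hypothetical flag definition; field names are illustrative, not Divar's schema.
    key: str
    rollout_percent: float  # share of eligible users who get variant "B", 0-100
    segments: set[str] = field(default_factory=set)  # if non-empty, user must be in one
    allow: set[str] = field(default_factory=set)  # user IDs forced into "B" (QA, early access)
    deny: set[str] = field(default_factory=set)  # user IDs forced into "A"


def bucket(flag_key: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 10000) using a hash, not randomness."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def evaluate(flag: Flag, user_id: str, user_segments: set[str]) -> str:
    """Return "A" (control) or "B" (treatment) for this user."""
    # Explicit lists win over everything else; deny is checked first.
    if user_id in flag.deny:
        return "A"
    if user_id in flag.allow:
        return "B"
    # Segment targeting: users outside every targeted segment stay on control.
    if flag.segments and not (flag.segments & user_segments):
        return "A"
    # Percentage rollout: 20.0% means buckets 0..1999 get "B".
    return "B" if bucket(flag.key, user_id) < flag.rollout_percent * 100 else "A"
```

Hashing the flag key together with the user ID keeps each user’s assignment stable across requests and independent between flags. Because a user’s bucket never changes and the threshold only grows, raising `rollout_percent` from 1 to 5 adds users to variant B without moving anyone back out.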
The harder design decision was the SDK’s communication model: poll versus push. Pull-based polling is simpler to implement and easier to reason about; push-based pub/sub gives lower propagation latency but adds failure modes (what happens if the SDK doesn’t receive the push?). We chose push with a polling fallback: the SDK subscribed to flag-change events for low latency, but also polled on a longer interval as a safety net. The safe default during any outage was to use the locally cached state — never drop to a hard-coded default, never block on a remote call. That made the SDK safe to put on the hot path.
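A minimal sketch of that client-side model follows, assuming a hypothetical `fetch` callable (standing in for the gRPC call to the control plane) that returns a snapshot tagged with a monotonically increasing version number. The class name, method names and the version check are illustrative, not the SDK’s real API.

```python
import threading
from typing import Callable, Dict, Tuple

# Hypothetical snapshot shape: (version, {flag_key: flag_state}).
# Assumes the control plane stamps each snapshot with a monotonically increasing version.
Snapshot = Tuple[int, Dict[str, object]]


class FlagCache:
    """SDK-side cache sketch: push for low latency, slow poll as a safety net."""

    def __init__(self, fetch: Callable[[], Snapshot], poll_interval_s: float = 60.0):
        self._fetch = fetch  # hypothetical remote call to the control plane
        self._poll_interval_s = poll_interval_s
        self._lock = threading.Lock()
        self._version = -1
        self._flags: Dict[str, object] = {}

    def _apply(self, version: int, flags: Dict[str, object]) -> bool:
        with self._lock:
            if version <= self._version:
                return False  # stale or duplicate snapshot: keep what we have
            self._version = version
            self._flags = dict(flags)
            return True

    def on_push(self, version: int, flags: Dict[str, object]) -> bool:
        """Called by the pub/sub subscriber when a flag-change event arrives."""
        return self._apply(version, flags)

    def poll_once(self) -> bool:
        """Safety-net refresh; any failure leaves the cached state untouched."""
        try:
            version, flags = self._fetch()
        except Exception:
            return False  # control plane unreachable: keep serving the cache
        return self._apply(version, flags)

    def run_poller(self, stop: threading.Event) -> None:
        """Background loop: poll every poll_interval_s until `stop` is set."""
        while not stop.wait(self._poll_interval_s):
            self.poll_once()

    def get(self, key: str, default: object = None) -> object:
        """Read from local memory only; never blocks on a remote call.

        `default` is only returned for a key the cache has never received.
        """
        with self._lock:
            return self._flags.get(key, default)
```

In a service, `on_push` would be wired to the pub/sub subscriber callback and `run_poller` started on a daemon thread. The version check is what makes the two paths safe to combine: a missed push is picked up by the next poll, and a slow poll that returns an older snapshot cannot overwrite a newer pushed one.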
The non-obvious hard part wasn’t the pub/sub. It was schema evolution for rule definitions. Flag targeting rules had to support new targeting dimensions as they were added (device type, account tier, geographic region) without breaking existing flags or requiring a coordinated deploy across all services. The solution was to treat rule definitions as versioned documents with forward-compatible parsing: older SDK versions ignored targeting fields they didn’t recognise, while newer ones evaluated them. The approach held up through the year in which we kept adding new dimensions.
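A rough sketch of forward-compatible parsing, assuming rule documents are JSON with a `schema_version` field and a list of `conditions`. The document shape, the `KNOWN_DIMENSIONS` set and the helper names are hypothetical; the point is only that an older SDK skips dimensions it does not recognise rather than rejecting the whole rule.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Targeting dimensions this (hypothetical) SDK build knows how to evaluate.
KNOWN_DIMENSIONS = {"segment", "device_type", "account_tier"}


@dataclass(frozen=True)
class Condition:
    dimension: str
    values: frozenset


def parse_rule(doc: str) -> Tuple[int, List[Condition]]:
    """Parse a versioned rule document, skipping anything this build doesn't know."""
    data = json.loads(doc)
    schema_version = data.get("schema_version", 1)
    conditions = []
    for raw in data.get("conditions", []):
        dimension = raw.get("dimension")
        if dimension not in KNOWN_DIMENSIONS:
            # Added after this SDK shipped (e.g. "region"): ignore instead of failing.
            continue
        conditions.append(Condition(dimension, frozenset(raw.get("values", []))))
    # Unknown top-level keys are never read, so they are ignored too.
    return schema_version, conditions


def matches(conditions: List[Condition], user_attrs: Dict[str, str]) -> bool:
    """All recognised conditions must hold (AND semantics)."""
    return all(user_attrs.get(c.dimension) in c.values for c in conditions)
```

One consequence of skipping unknown conditions in an AND-combined rule is that an older SDK matches a rule more broadly than a newer one does, so a newly added dimension only fully takes effect once services have upgraded their SDK.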
The system was adopted by all engineering teams at Divar within a few months of launch. A/B testing became a standard part of how features shipped, not a one-off effort. Kill-switches for live incidents went from “deploy a code change” to “flip a flag.” The service was still in use after I moved on, which is the most useful kind of artefact: infrastructure that outlasts the person who built it.
Stack
Python (control plane) and Go (client SDKs), Redis for pub/sub-based cache invalidation, PostgreSQL for flag definitions and the audit log, gRPC between SDK and control plane.