Google Assistant Actions on Nest

Someone in the living room says, “Lock the front door,” or “Set it to seventy.” A few seconds later, a bolt moves or a thermostat ticks over. The distance between those words and that action is where this work lived.

I was the product designer embedded with the data-integration engineering team, shaping the interaction layer between Assistant and Nest. My focus was the point where a request on any surface becomes an action, where clarity and safety outweigh speed.

We treated Assistant as a decision system anchored to what the home already knows: a composite of shared memory and the deterministic and implicit states of the connected devices. Internally, we thought of this as a kind of Frankenstein’s monster—stitched together from logs, health checks, and schedules—but its behavior had to feel simple. Simple requests can be handled as straightforward if/then statements. Others are more nuanced. If a device suffers a fault and goes offline, its peers can pass along its most recent state so a short request still leads to the right action or a clear next step.
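
A rough sketch of that decision layer, in TypeScript with invented names (HomeGraph, resolveAction, and the rest are illustrations, not the production services), might read like this: an online device takes the direct path, and an offline one falls back to the state its peers relayed.

// Hypothetical types; the real schema is internal to Assistant and Nest.
type DeviceState = { online: boolean; lastKnown?: Record<string, unknown> };

interface HomeGraph {
  state(deviceId: string): DeviceState;
  peersLastReport(deviceId: string): Record<string, unknown> | undefined;
}

type Outcome =
  | { kind: "execute"; command: string }
  | { kind: "reply"; text: string };

function resolveAction(home: HomeGraph, deviceId: string, intent: string): Outcome {
  const state = home.state(deviceId);
  if (state.online) {
    // Simple case: the request maps straight to a device capability.
    return { kind: "execute", command: intent };
  }
  // Offline: peers may have relayed the device's most recent state,
  // enough to answer correctly or to point at the right next step.
  const lastKnown = home.peersLastReport(deviceId);
  return lastKnown
    ? { kind: "reply", text: `Device is offline; last known state: ${JSON.stringify(lastKnown)}. Check Connect.` }
    : { kind: "reply", text: "Device is offline. Check Connect." };
}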

The stakes of these calculations are clearest in products like the Nest × Yale Lock. When someone says, “Lock the front door,” and identity and policy allow it, the device acts and Assistant confirms in a single line: “Locked Front Door.” When something stops the action, the system tells the person why and follows up with next steps: “Can’t lock. Door is obstructed. Try again, or check the bolt.” For riskier requests like unlocking, the bar is higher. The assistant asks for explicit confirmation, keeps the window brief, and—if identity is uncertain—declines with a concrete next step. Each successful action writes to history in the same one-line format the app uses so a shared home can see who did what, when.
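
Sketched in the same illustrative TypeScript, with hypothetical types rather than the shipped lock API, the flow looks roughly like this:

// Invented names for illustration; not the Nest × Yale or Assistant interfaces.
type LockRequest = { action: "lock" | "unlock"; speakerVerified: boolean; confirmed: boolean };

interface LockDevice {
  bolt(): "extended" | "retracted" | "obstructed";
  setBolt(position: "extended" | "retracted"): boolean;
}

interface ActionHistory {
  append(line: string): void; // same one-line format the app shows
}

function handleLockRequest(req: LockRequest, device: LockDevice, history: ActionHistory): string {
  // Riskier direction: require verified identity and an explicit, short-lived confirmation.
  if (req.action === "unlock") {
    if (!req.speakerVerified) return "I can't unlock by voice for this speaker. Use the app or the keypad.";
    if (!req.confirmed) return "Do you want me to unlock Front Door?";
  }
  if (device.bolt() === "obstructed") {
    // Failure tells the truth and points at recovery.
    return "Can't lock. Door is obstructed. Try again, or check the bolt.";
  }
  const ok = device.setBolt(req.action === "lock" ? "extended" : "retracted");
  if (!ok) return "Can't reach the lock right now. Check Connect.";
  const line = req.action === "lock" ? "Locked Front Door" : "Unlocked Front Door";
  history.append(line);
  return line;
}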

Thermostats follow the same logic without the safety layer. Someone in the living room says, “Set it to seventy.” Room context resolves the target, the system applies the change, and the feedback reflects the device’s actual state. When the target is ambiguous, the follow-up is short and specific: “Which thermostat, Living Room or Bedroom?” If the person walks away mid-question, we cancel so nothing fires later. The aim stays modest on purpose: recognize intent, match it to a capability, and keep the path from request to result short and legible.
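
The disambiguation step can be sketched the same way; the room-matching rule and the timeout value here are assumptions for illustration, not the shipped behavior.

interface Thermostat { id: string; room: string; setTargetF(degrees: number): void }

type Resolution =
  | { kind: "set"; device: Thermostat }
  | { kind: "ask"; question: string };

function resolveThermostat(speakerRoom: string | undefined, thermostats: Thermostat[]): Resolution {
  // Room context resolves the target when the speaker's room has exactly one thermostat.
  const inRoom = thermostats.filter(t => t.room === speakerRoom);
  if (inRoom.length === 1) return { kind: "set", device: inRoom[0] };
  if (thermostats.length === 1) return { kind: "set", device: thermostats[0] };
  // Ambiguous: ask a short, specific follow-up.
  return { kind: "ask", question: `Which thermostat, ${thermostats.map(t => t.room).join(" or ")}?` };
}

// If no answer arrives inside a short window, the pending change is dropped
// so nothing fires after the person has walked away.
function followUpExpired(askedAtMs: number, nowMs: number, windowMs = 8_000): boolean {
  return nowMs - askedAtMs > windowMs;
}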

// We introduced a typed grammar for third-party actions
// with manual and induced rules for controlling Nest devices

// Example grammar rule
(rule $NestAction (change temperature to $Number degree)
      (= ((mode CHANGE_TEMP) (absolute_degree $1))))

// Self-generated example
(instance "change temperature to 20 degree"
          (= ([nlp_semantic_parsing.models.third_party_actions.Nest.nest]
              ((mode CHANGE_TEMP)
               (absolute_degree ((content 20)))))))

Language had to read the same wherever you met it. Success fits in one line you can absorb in passing. Failure states tell the truth and point to recovery with a shared vocabulary across surfaces, for example: “Device is offline. Check Connect.” Voice, app, and notifications share the same terms so moving between them feels continuous rather than like switching systems.
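
One way to picture that consistency is a single phrase catalog every surface reads from; the keys and strings below are examples for illustration, not the shipped copy or the real localization system.

const messages = {
  lock_success:    "Locked Front Door",
  lock_obstructed: "Can't lock. Door is obstructed. Try again, or check the bolt.",
  device_offline:  "Device is offline. Check Connect.",
} as const;

type Surface = "voice" | "app" | "notification";

// Every surface renders from the same catalog, so the wording stays identical
// whether it is spoken, shown in the app, or pushed as a notification.
function render(surface: Surface, key: keyof typeof messages): { surface: Surface; text: string } {
  return { surface, text: messages[key] };
}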

Some requests do not finish in a single turn; this is where shared memory earns its keep. Assistant uses what the home knows—device health, recent errors, who is speaking—and pairs it with what the cloud knows about schedules, help paths, and account context. Instead of returning a generic question, it prepares the next step that fits the situation. If cooling has failed and recent compressor faults are logged, it skips a generic checklist and offers the step that serves this home. If a handoff to support is needed, it sends a short transcript and pertinent device context so the conversation does not restart from zero. On nearby screens, we surface targeted follow-ups and plausible alternatives during the exchange, then collect a brief rating so the model learns which paths helped.
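
A hypothetical shape for that handoff, with invented field names, might look like the following: a short transcript, the device context that matters, and the step that fits this home rather than a generic checklist.

interface HandoffPayload {
  transcript: { speaker: "user" | "assistant"; text: string }[]; // only the recent turns
  device: { id: string; model: string; online: boolean; recentFaults: string[] };
  suggestedStep?: string; // the step that fits this home, not a generic checklist
}

function buildHandoff(
  turns: HandoffPayload["transcript"],
  device: HandoffPayload["device"],
): HandoffPayload {
  // Trim to the last few turns so support starts with context, not a full log.
  const transcript = turns.slice(-6);
  const suggestedStep = device.recentFaults.includes("compressor_fault")
    ? "Recent compressor faults logged; start there instead of the cooling checklist."
    : undefined;
  return { transcript, device, suggestedStep };
}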

My scope was the interaction model and its measurement. I translated model outputs, triage signals, and device health into legible states and controls; set patterns for confirmation, decline, and disambiguation; and kept language consistent across app, notifications, and spoken feedback. With data science, I defined the event schema and success signals—intent recognized, action completed, time to resolve. With platform, mobile, and firmware, I aligned execution and confirmation so the surface reported what actually occurred.
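
The event schema itself was internal; a simplified stand-in with illustrative fields might look like this, with the success signals rolling up from the raw events.

interface VoiceActionEvent {
  requestId: string;
  surface: "voice" | "app" | "notification";
  intentRecognized: boolean;
  actionCompleted: boolean;
  declinedForSafety: boolean;
  msToResolve: number; // utterance end to confirmed device state
}

function completionRate(events: VoiceActionEvent[]): number {
  const completed = events.filter(e => e.intentRecognized && e.actionCompleted).length;
  return events.length > 0 ? completed / events.length : 0;
}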

Because voice can fail quietly, we measured what mattered. We logged key UI states to trace each step and watched guardrails: denial rates on safety actions, false-positive identity triggers, and support contacts tied to voice flows. In early internal use, logs showed fewer second attempts on the lock and shorter detours after common errors. When friction appeared, we tuned the question rather than add decoration.
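
As a sketch, with invented event kinds rather than the real logging pipeline, the guardrail rollup might read like this:

interface GuardrailEvent {
  kind: "safety_denial" | "identity_false_positive" | "support_contact" | "action";
  flow: "lock" | "thermostat";
  retryOfRequestId?: string; // set when someone had to ask a second time
}

function guardrailReport(events: GuardrailEvent[]) {
  const count = (kind: GuardrailEvent["kind"]) => events.filter(e => e.kind === kind).length;
  const retries = events.filter(e => e.retryOfRequestId !== undefined).length;
  return {
    safetyDenials: count("safety_denial"),
    identityFalsePositives: count("identity_false_positive"),
    supportContacts: count("support_contact"),
    secondAttemptRate: events.length > 0 ? retries / events.length : 0,
  };
}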

What shipped stays out of the way: the right action, clear words, and a trace you can live with. The assistant uses what the home already knows, asks only when needed, and settles the same way in voice, app, and notifications. ♦

