How to Build an Ability
Learn to build and integrate custom Abilities in OpenHome using CapabilityWorker, with example use cases.
Introduction
Custom Abilities are the cornerstone of extending OpenHome’s functionality. They allow developers to:
- Add personalized features to AI agents.
- Integrate third-party APIs for dynamic interactions.
- Customize logic for enhanced user engagement.
This guide walks you through:
- Structuring and registering an Ability.
- Using CapabilityWorker for seamless I/O management.
- Examples showcasing how to create powerful custom Abilities.
Adding an Ability
File Structure
Each Ability resides in its own folder and requires a main.py file to define the logic.
Example File: main.py
Here’s a basic template for building a new Ability:
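A minimal sketch of the shape such a template could take, assuming an Ability is a class that exposes a register_capability class method and a call method. The `MatchingCapability` base class below is a hypothetical stand-in for whatever the OpenHome runtime provides, and `GreeterCapability`, its hotwords, and the `worker.log` call are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MatchingCapability:
    """Hypothetical stand-in for the base class supplied by the runtime."""

    unique_name: str
    matching_hotwords: list = field(default_factory=list)


class GreeterCapability(MatchingCapability):
    """Illustrative Ability; the name and hotwords are invented for this sketch."""

    @classmethod
    def register_capability(cls) -> "GreeterCapability":
        # Tell the platform how to identify this Ability and which
        # spoken phrases should trigger it.
        return cls(
            unique_name="greeter",
            matching_hotwords=["say hello", "greet me"],
        )

    def call(self, worker) -> None:
        # Entry point executed when a hotword matches. `worker` stands in
        # for whatever agent object the runtime passes in (an assumption).
        worker.log(f"{self.unique_name} triggered")
```

In a real Ability, the body of call would typically hand control to CapabilityWorker for speaking and listening.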
Key Components
- register_capability: Defines the Ability’s unique_name and matching_hotwords.
- call: Executes the Ability’s logic when triggered.
Understanding CapabilityWorker
The CapabilityWorker class simplifies I/O interactions, enabling:
- Speech synthesis: Using text-to-speech (TTS).
- Listening for user input: Capturing and processing responses.
- Running interaction loops: Supporting conversational flows.
Speak Function
The speak function converts text into speech and streams it to the user via WebSocket.
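As a rough illustration of calling speak from an Ability, the sketch below uses a stub in place of the real CapabilityWorker so that it runs on its own. It assumes speak is awaitable and takes the text as its only argument; `StubCapabilityWorker` and `greet` are hypothetical names.

```python
import asyncio


class StubCapabilityWorker:
    """Stand-in for CapabilityWorker: records text instead of streaming audio."""

    def __init__(self):
        self.spoken = []

    async def speak(self, text: str) -> None:
        # The real method would synthesize speech and stream it over
        # WebSocket; the stub only remembers what was said.
        self.spoken.append(text)


async def greet(worker) -> None:
    # Inside an Ability this would be the CapabilityWorker instance.
    await worker.speak("Hi! I'm your daily life advisor.")


worker = StubCapabilityWorker()
asyncio.run(greet(worker))
```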
User Response Function
This function listens for user input asynchronously, ideal for capturing dynamic user responses.
Run I/O Loop
The run_io_loop function orchestrates speaking a prompt and listening for the response in a single flow.
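One way to picture run_io_loop is as speak followed by user_response. The stub below sketches that relationship under the assumption that all three are awaitable and that run_io_loop returns the user's reply as a string; the scripted replies replace real transcription.

```python
import asyncio


class StubCapabilityWorker:
    """Stand-in for CapabilityWorker driven by scripted user replies."""

    def __init__(self, replies):
        self.spoken = []
        self._replies = list(replies)

    async def speak(self, text: str) -> None:
        self.spoken.append(text)

    async def user_response(self) -> str:
        # The real method would wait for transcribed user input;
        # the stub returns the next scripted reply.
        return self._replies.pop(0)

    async def run_io_loop(self, text: str) -> str:
        # Speak the prompt, then wait for the reply, in one call.
        await self.speak(text)
        return await self.user_response()


async def ask_name(worker) -> str:
    return await worker.run_io_loop("What is your name?")


worker = StubCapabilityWorker(["Ada"])
name = asyncio.run(ask_name(worker))
```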
Confirmation Loop
This specialized loop handles confirmation prompts like “yes” or “no” interactions.
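The idea behind a confirmation loop can be sketched independently of the platform: ask, classify the reply as yes, no, or unclear, and ask again when unclear. The helper names, word lists, and retry limit below are illustrative assumptions, not the built-in implementation; `ask` is any awaitable that speaks a prompt and returns the reply, such as run_io_loop.

```python
import asyncio
import re

YES_WORDS = {"yes", "yeah", "yep", "sure", "correct"}
NO_WORDS = {"no", "nope", "nah", "cancel"}


def parse_confirmation(reply: str):
    """Return True for yes, False for no, None when the reply is unclear."""
    words = set(re.findall(r"[a-z']+", reply.lower()))
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes and not said_no:
        return True
    if said_no and not said_yes:
        return False
    return None


async def confirm(ask, prompt: str, max_attempts: int = 3) -> bool:
    """Ask until a clear yes/no arrives; treat persistent ambiguity as no."""
    for _ in range(max_attempts):
        answer = parse_confirmation(await ask(prompt))
        if answer is not None:
            return answer
    return False
```

Defaulting to False after repeated unclear answers is a conservative choice: the Ability does nothing rather than acting on a guess.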
Wait for Complete Transcription
This function waits until the user has completely finished speaking before processing the input.
Returns:
- msg (str): The final transcribed user input after the user has finished speaking.
Send Devkit Action
This function sends an action event over WebSocket, which lets an Ability trigger specific actions in the Devkit system.
Parameters:
- action (str): The action to be sent to the Devkit.
Send Data Over WebSocket
This function transmits structured data over WebSocket, ensuring seamless communication with external systems.
Parameters:
- type (str): The type of data being sent.
- data (dict): The structured data to be transmitted.
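The sketch below illustrates how the two WebSocket helpers might be called. Everything platform-specific here is an assumption: the method names `send_data_over_websocket` and `send_devkit_action` are inferred from the section titles, the JSON envelope is a guess at a plausible wire format, and the action and payload values are invented. The stub stores frames in a list instead of sending them.

```python
import asyncio
import json


class StubCapabilityWorker:
    """Stand-in for CapabilityWorker that stores frames instead of sending them."""

    def __init__(self):
        self.frames = []

    async def send_data_over_websocket(self, type: str, data: dict) -> None:
        # Assumed envelope; the real wire format may differ.
        self.frames.append(json.dumps({"type": type, "data": data}))

    async def send_devkit_action(self, action: str) -> None:
        # Assumed here to be a thin wrapper that sends the action as an event.
        await self.send_data_over_websocket("devkit_action", {"action": action})


async def demo(worker) -> None:
    await worker.send_devkit_action("led_on")  # "led_on" is a made-up action
    await worker.send_data_over_websocket("timer", {"seconds": 90})


worker = StubCapabilityWorker()
asyncio.run(demo(worker))
```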
Using Specific Voice IDs for Text-to-Speech
The CapabilityWorker class supports the use of specific Voice IDs for text-to-speech (TTS) functionality. This allows you to customize the voice used for speech synthesis by specifying a Voice ID from the provided list.
Available Voice IDs
You can use any of the following Voice IDs for TTS:
text_to_speech Function
The text_to_speech function converts the provided text into speech using the specified Voice ID and streams it to the user via WebSocket.
Parameters
- text (str): The text to be converted into speech.
- voice_id (str): The Voice ID to be used for speech synthesis.
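A short sketch of a call with an explicit voice, assuming text_to_speech is awaitable and takes the two parameters listed above in that order. The stub replaces the real CapabilityWorker, and `YOUR_VOICE_ID` is a placeholder to be swapped for an entry from the Available Voice IDs list.

```python
import asyncio

# Placeholder only: substitute a real ID from the Available Voice IDs list.
PLACEHOLDER_VOICE_ID = "YOUR_VOICE_ID"


class StubCapabilityWorker:
    """Stand-in for CapabilityWorker: records (voice_id, text) pairs."""

    def __init__(self):
        self.utterances = []

    async def text_to_speech(self, text: str, voice_id: str) -> None:
        # The real method would synthesize audio with the chosen voice
        # and stream it to the user.
        self.utterances.append((voice_id, text))


async def announce(worker) -> None:
    await worker.text_to_speech("Your timer is done.", PLACEHOLDER_VOICE_ID)


worker = StubCapabilityWorker()
asyncio.run(announce(worker))
```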
Example 1: Basic Capability
This Ability creates a daily life advisor that:
- Asks the user for a problem: Initiates a conversation to gather user input.
- Provides advice: Offers a solution based on user input.
- Collects feedback: Captures user satisfaction with the advice.
Code
Key Functions
- speak: Introduces the advisor and provides the solution.
- user_response: Captures user input (e.g., their problem).
- run_io_loop: Combines speaking the solution and listening for feedback.
- resume_normal_flow: Resumes the agent’s default workflow after interaction.
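The flow built from these four functions can be sketched as follows. This is not the Ability's actual code: the stub replaces CapabilityWorker, `make_advice` is a hypothetical placeholder for however the real Ability produces its solution, and resume_normal_flow is assumed to be a plain synchronous call.

```python
import asyncio


class StubCapabilityWorker:
    """Stand-in for CapabilityWorker driven by scripted user replies."""

    def __init__(self, replies):
        self.spoken = []
        self.resumed = False
        self._replies = list(replies)

    async def speak(self, text: str) -> None:
        self.spoken.append(text)

    async def user_response(self) -> str:
        return self._replies.pop(0)

    async def run_io_loop(self, text: str) -> str:
        await self.speak(text)
        return await self.user_response()

    def resume_normal_flow(self) -> None:
        self.resumed = True


def make_advice(problem: str) -> str:
    # Placeholder logic; a real Ability would generate a tailored answer.
    return f"For '{problem}', try one small step you can take today."


async def daily_life_advisor(worker) -> str:
    # 1. Introduce the advisor and ask for a problem.
    await worker.speak("Hi, I'm your daily life advisor. What's on your mind?")
    problem = await worker.user_response()
    # 2. Give advice and 3. collect feedback in a single I/O loop.
    feedback = await worker.run_io_loop(
        make_advice(problem) + " Are you satisfied with this advice?"
    )
    # 4. Hand control back to the agent's default workflow.
    worker.resume_normal_flow()
    return feedback


worker = StubCapabilityWorker(["I can't sleep", "yes"])
feedback = asyncio.run(daily_life_advisor(worker))
```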
Example 2: Weather Capability
This Ability integrates a weather API to fetch and share weather updates based on user-provided locations.
Code
Key Features
- External API Call: Fetches real-time weather data.
- Geolocation: Validates and processes user-provided locations.
- Error Handling: Provides meaningful feedback for invalid inputs.
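The three features above can be sketched in one function. The weather service used by the original Ability is not named here, so this sketch assumes the public Open-Meteo geocoding and forecast endpoints; `weather_report` and the injectable `get_json` parameter are hypothetical names. The HTTP call goes through requests, imported lazily so the function can be exercised with a fake fetcher.

```python
GEOCODE_URL = "https://geocoding-api.open-meteo.com/v1/search"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def _requests_get_json(url: str, params: dict) -> dict:
    import requests  # lazy import: only needed for real network calls

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def weather_report(location: str, get_json=_requests_get_json) -> str:
    """Return a sentence an Ability could speak for the given location."""
    location = location.strip()
    if not location:
        return "I didn't catch a location. Please try again."
    try:
        # Geolocation: resolve the place name to coordinates.
        places = get_json(GEOCODE_URL, {"name": location, "count": 1}).get("results")
        if not places:
            return f"Sorry, I couldn't find a place called {location}."
        place = places[0]
        # External API call: fetch current conditions for those coordinates.
        forecast = get_json(
            FORECAST_URL,
            {
                "latitude": place["latitude"],
                "longitude": place["longitude"],
                "current_weather": "true",
            },
        )
        current = forecast["current_weather"]
    except Exception:
        # Error handling: network failures and unexpected payloads end up here.
        return "Sorry, the weather service is unavailable right now."
    return (
        f"It is {current['temperature']} degrees Celsius in {place['name']} "
        f"with wind at {current['windspeed']} km/h."
    )
```

Inside an Ability, the location would typically come from user_response and the returned sentence would be passed to speak.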
Conclusion
Building Abilities in OpenHome empowers developers to create custom functionality for AI agents. With examples like the Basic Advisor and Weather Capability, you can:
- Understand core functions like speak and run_io_loop, as well as API integration.
- Extend the platform to fit unique needs, from integrating external APIs to creating dynamic conversational flows.
Start creating innovative Abilities and push the boundaries of voice AI with OpenHome! 🎉
Note: Use the requests module to call third-party APIs and avoid other libraries. If a special case needs another library, you can ask us to add it.