How to Build an Ability

Introduction

Custom Abilities are the cornerstone of extending OpenHome’s functionality. They allow developers to:

Add personalized features to AI agents.
Integrate third-party APIs for dynamic interactions.
Customize logic for enhanced user engagement.

This guide walks you through:

Structuring and registering an Ability.
Using CapabilityWorker for seamless I/O management.
Examples showcasing how to create powerful custom Abilities.

Adding an Ability

File Structure

Each Ability resides in its own folder and requires a main.py file that defines its logic. The template below also reads a config.json file from the same folder.

<YourAbilityFolder>
│
└── main.py

Example File: `main.py`

Here’s a basic template for building a new Ability:

import json
import os

# MatchingCapability and AgentWorker are provided by the OpenHome platform;
# import them as your environment requires.


class YourCapability(MatchingCapability):
    @classmethod
    def register_capability(cls) -> "MatchingCapability":
        # Load the Ability's metadata from config.json next to this file.
        with open(
            os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.json")
        ) as file:
            data = json.load(file)
        return cls(
            unique_name=data["unique_name"],
            matching_hotwords=data["matching_hotwords"],
        )

    def call(
        self,
        worker: AgentWorker,
    ):
        # Your capability logic here
        return "Your Capability Called!"

Key Components

register_capability: Defines the Ability’s unique_name and matching_hotwords.
call: Executes the Ability’s logic when triggered.
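
The template reads unique_name and matching_hotwords from config.json. The sketch below shows what that file might contain and how its contents could be checked before registration. The example values, the assumption that matching_hotwords is a list of strings, and the validate_config helper are all hypothetical and not part of the OpenHome SDK.

```python
import json

# Hypothetical config.json contents; only the two key names
# (unique_name, matching_hotwords) come from the template above.
EXAMPLE_CONFIG = """
{
  "unique_name": "daily_advisor",
  "matching_hotwords": ["give me advice", "life advisor"]
}
"""


def validate_config(data: dict) -> dict:
    """Check that the keys read by register_capability are present and well-typed."""
    name = data.get("unique_name")
    hotwords = data.get("matching_hotwords")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("unique_name must be a non-empty string")
    # Assumption: hotwords are a non-empty list of strings.
    if not isinstance(hotwords, list) or not hotwords:
        raise ValueError("matching_hotwords must be a non-empty list")
    if not all(isinstance(word, str) and word.strip() for word in hotwords):
        raise ValueError("every hotword must be a non-empty string")
    return {"unique_name": name, "matching_hotwords": hotwords}


config = validate_config(json.loads(EXAMPLE_CONFIG))
```

Failing early with a clear message is usually easier to debug than a KeyError raised during registration.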

Understanding `CapabilityWorker`

The CapabilityWorker class simplifies I/O interactions, enabling:

Speech synthesis: Using text-to-speech (TTS).
Listening for user input: Capturing and processing responses.
Running interaction loops: Supporting conversational flows.

Speak Function

The speak function converts text into speech and streams it to the user via WebSocket.

async def speak(self, tokens: str):

User Response Function

This function asynchronously waits for the user's next input and returns it, which makes it suited to capturing free-form replies.

async def user_response(self):

Run I/O Loop

The run_io_loop function performs a single “ask-then-listen” interaction: it speaks the provided text to the user, waits for the next user reply, and returns that reply. Use it when you need a simple “ask once, get one reply” interaction.

async def run_io_loop(self, tokens: str):
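
The ask-then-listen flow can be tried outside the platform with a stand-in worker. In this sketch, FakeCapabilityWorker and ask_city are hypothetical. The stand-in only mimics the behavior described above and is not the real CapabilityWorker.

```python
import asyncio


class FakeCapabilityWorker:
    """Stand-in for CapabilityWorker so the flow can run without the platform."""

    def __init__(self, scripted_replies):
        self.spoken = []                       # everything passed to speak()
        self._replies = list(scripted_replies)

    async def speak(self, tokens: str):
        self.spoken.append(tokens)

    async def user_response(self) -> str:
        return self._replies.pop(0)

    async def run_io_loop(self, tokens: str) -> str:
        # Ask once, then wait for exactly one reply, as described above.
        await self.speak(tokens)
        return await self.user_response()


async def ask_city(worker) -> str:
    reply = await worker.run_io_loop("Which city are you in?")
    return reply.strip()


# asyncio.run is used only for this local harness; inside an Ability,
# schedule coroutines through worker.session_tasks instead.
worker = FakeCapabilityWorker(["  Lisbon "])
city = asyncio.run(ask_city(worker))
```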

Confirmation Loop

This specialized loop handles confirmation prompts like “yes” or “no” interactions.

async def run_confirmation_loop(self, tokens: str):
    prompt = tokens + ", Please respond with 'yes' or 'no'."
    while True:
        await self.speak(prompt)
        response = await self.user_response()
        if "yes" in response.lower():
            return True
        elif "no" in response.lower():
            return False
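
The loop above uses substring checks, so a reply such as "I don't know" contains "no" and is treated as a refusal. When stricter matching matters, a reply can be split into words first. The sketch below is one way to do that; the word lists and the parse_yes_no helper are assumptions, not part of CapabilityWorker.

```python
import re
from typing import Optional

# Assumed vocab; extend to suit your Ability.
YES_WORDS = {"yes", "yeah", "yep", "sure"}
NO_WORDS = {"no", "nope", "nah"}


def parse_yes_no(response: str) -> Optional[bool]:
    """Return True/False for a clear yes/no, or None when the reply is ambiguous."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes == said_no:  # neither word was said, or both were
        return None
    return said_yes
```

A caller can keep prompting while the result is None, which preserves the retry behavior of the original loop.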

Wait for Complete Transcription

This function waits until the user has completely finished speaking before processing the input.

async def wait_for_complete_transcription(self):

Returns:

msg (str): The final transcribed user input after the user has finished speaking.

Send Devkit Action

This function sends an action event over WebSocket to trigger a specific action in the Devkit system.

async def send_devkit_action(self, action: str = ""):

Parameters:

action (str): The action to be sent to the Devkit.

Send Data Over WebSocket

This function transmits structured data over WebSocket, ensuring seamless communication with external systems.

async def send_data_over_websocket(self, data_type: str = "", data: dict = {}):

Parameters:

data_type (str): The type of data being sent.
data (dict): The structured data to be transmitted.
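
The signature above declares data with a default of {}. In Python, a default dict is created once and shared across calls. Whether the real method ever mutates it is not stated here, so a cautious caller can simply pass a fresh dict each time. The sketch below demonstrates the language behavior with a stand-in function; record_event is hypothetical and is not the OpenHome method.

```python
def record_event(data_type: str = "", data: dict = {}):
    """Stand-in with the same parameter shape; NOT the real OpenHome method."""
    data.setdefault("events", []).append(data_type)  # mutates its argument
    return data


first = record_event("ping")
second = record_event("pong")   # reuses the same default dict
shared = first is second        # state leaked between the two calls

# Passing a fresh dict on every call avoids relying on the shared default.
safe_a = record_event("ping", {})
safe_b = record_event("pong", {})
```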

Text-to-Text Response

This function generates text responses using the configured text-to-text client with optional history and system prompts.

def text_to_text_response(self, prompt_text: str, history: list = [], system_prompt: str = ""):

Parameters:

prompt_text (str): The prompt text to generate a response for.
history (list): Optional conversation history for context.
system_prompt (str): Optional system prompt to guide the response.

Returns:

str: The generated text response.

Play Audio

This function plays audio directly from raw bytes or a file-like object.

async def play_audio(self, file_content):

Parameters:

file_content: The audio file content to play (bytes or file-like object).

Play from Audio File

This function plays audio from a file stored in the capability’s directory.

async def play_from_audio_file(self, file_name: str = None):

Parameters:

file_name (str): The name of the audio file to play from the capability directory.

Stream Init

This function initializes audio streaming and sets up the WebSocket connection.

async def stream_init(self):

Stream End

This function ends the audio streaming session and cleans up resources.

async def stream_end(self):

Send Audio Data in Stream

This function streams audio data in chunks over WebSocket, with automatic audio processing (mono conversion, resampling).

async def send_audio_data_in_stream(self, file_content: bytes | IO[bytes] | httpx.Response, chunk_size=4096):

Parameters:

file_content: The audio content to stream (bytes, file-like object, or HTTP response).
chunk_size (int): The size of each chunk to send (default: 4096 bytes).
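
The three streaming functions suggest an init, send, end lifecycle. The sketch below models it with a stand-in worker and shows how a 4096-byte chunk_size splits a payload. FakeStreamWorker, iter_chunks, and stream_clip are hypothetical. It is also an assumption that the caller brackets send_audio_data_in_stream with stream_init and stream_end, since the source does not say whether the real method does this itself.

```python
import asyncio


def iter_chunks(data: bytes, chunk_size: int = 4096):
    """Yield consecutive slices of at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


class FakeStreamWorker:
    """Hypothetical stand-in that records calls instead of using a WebSocket."""

    def __init__(self):
        self.calls = []

    async def stream_init(self):
        self.calls.append("init")

    async def send_audio_data_in_stream(self, file_content: bytes, chunk_size=4096):
        for chunk in iter_chunks(file_content, chunk_size):
            self.calls.append(("chunk", len(chunk)))

    async def stream_end(self):
        self.calls.append("end")


async def stream_clip(worker, audio: bytes):
    await worker.stream_init()
    try:
        await worker.send_audio_data_in_stream(audio, chunk_size=4096)
    finally:
        # Always close the session, even if sending fails midway.
        await worker.stream_end()


# Local harness only; inside an Ability use worker.session_tasks.
worker = FakeStreamWorker()
asyncio.run(stream_clip(worker, b"\x00" * 10000))
```

The try/finally keeps the session from being left open when an exception interrupts the send.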

Using Specific Voice IDs for Text-to-Speech

The CapabilityWorker class supports the use of specific Voice IDs for text-to-speech (TTS) functionality. This allows you to customize the voice used for speech synthesis by specifying a Voice ID from the provided list.

Available Voice IDs

You can use any of the following Voice IDs for TTS:

{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "29vD33N1CtxCmqQRPOHJ",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "2EiwWnXFnvU5JabPnv8n",
      "labels": {
        "accent": "american",
        "description": "war veteran",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "5Q0t7uMcjvnagumLfvZi",
      "labels": {
        "accent": "american",
        "description": "ground reporter",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "CYw3kZ02Hs0563khs1Fj",
      "labels": {
        "accent": "british-essex",
        "description": "conversational",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "D38z5RcWu1voky8WS1ja",
      "labels": {
        "accent": "irish",
        "description": "sailor",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "labels": {
        "accent": "american",
        "description": "soft",
        "age": "young",
        "gender": "female",
        "use case": "news"
      }
    },
    {
      "voice_id": "ErXwobaYiN019PkySvjV",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "GBv7mTt0atIp3Br8iCZE",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "male",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "IKne3meq5aSn9XLyUdCD",
      "labels": {
        "accent": "australian",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "JBFqnCBsd6RMkjVDRZzb",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "LcfcDJNUP1GQjkzn1xUU",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "MF3mGyEYCl7XYWbV9V6O",
      "labels": {
        "accent": "american",
        "description": "emotional",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "N2lVS1w4EtoT3dr4eOWO",
      "labels": {
        "accent": "american",
        "description": "hoarse",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "ODq5zmih8GrVes37Dizd",
      "labels": {
        "accent": "american",
        "description": "shouty",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "SOYHLrjzK2X1ezoPC6cr",
      "labels": {
        "accent": "american",
        "description": "anxious",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "TX3LPaxmHKxFdv7VOQHJ",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "narration",
        "description": "neutral"
      }
    },
    {
      "voice_id": "ThT5KcBeYPX3keUQqHPh",
      "labels": {
        "accent": "british",
        "description": "pleasant",
        "age": "young",
        "gender": "female",
        "use case": "children's stories"
      }
    },
    {
      "voice_id": "TxGEqnHWrfWFTfGW9XjX",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "VR6AewLTigWG4xSOukaG",
      "labels": {
        "accent": "american",
        "description": "crisp",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "XB0fDUnXU5powFXDhCwa",
      "labels": {
        "accent": "english-swedish",
        "description": "seductive",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "Xb7hH8MSUJpSbSDYk0k2",
      "labels": {
        "accent": "british",
        "description": "confident",
        "age": "middle aged",
        "gender": "female",
        "featured": "new",
        "use case": "news"
      }
    },
    {
      "voice_id": "XrExE9yKIg1WjnnlVkGX",
      "labels": {
        "accent": "american",
        "description": "warm",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "ZQe5CZNOzWyzPSCn5a3c",
      "labels": {
        "accent": "australian",
        "description": "calm",
        "age": "old",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "Zlb1dXrM653N07WRdFW3",
      "labels": {
        "accent": "british",
        "age": "middle aged",
        "gender": "male",
        "use case": "news",
        "description": "ground reporter"
      }
    },
    {
      "voice_id": "bVMeCyTHy58xNoL34h3p",
      "labels": {
        "accent": "american-irish",
        "description": "excited",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "flq6f7yk4E4fJM5XTYuZ",
      "labels": {
        "accent": "american",
        "age": "old",
        "gender": "male",
        "use case": "audiobook",
        "description": "orotund"
      }
    },
    {
      "voice_id": "g5CIjZEefAph4nQFvHAz",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "ASMR",
        "description": "whisper"
      }
    },
    {
      "voice_id": "iP95p4xoKVk53GoZ742B",
      "labels": {
        "accent": "american",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "jBpfuIE2acCO8z3wKNLl",
      "labels": {
        "accent": "american",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    },
    {
      "voice_id": "jsCqWAovK2LkecY7zXl4",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "description": "overhyped",
        "use case": "video games"
      }
    },
    {
      "voice_id": "nPczCjzI2devNBz1zQrb",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "narration"
      }
    },
    {
      "voice_id": "oWAxZDx7w5VEj9dCyTzz",
      "labels": {
        "accent": "american-southern",
        "age": "young",
        "gender": "female",
        "use case": "audiobook",
        "description": "gentle"
      }
    },
    {
      "voice_id": "onwK4e9ZLuTAKqWW03F9",
      "labels": {
        "accent": "british",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "news presenter"
      }
    },
    {
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "pMsXgVXv3BLzUgSXRplE",
      "labels": {
        "accent": "american",
        "description": "pleasant",
        "age": "middle aged",
        "gender": "female",
        "use case": "interactive"
      }
    },
    {
      "voice_id": "pNInz6obpgDQGcFmaJgB",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "piTKgcLEGmPE4e6mEKli",
      "labels": {
        "accent": "american",
        "description": "whisper",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "pqHfZKP75CvOlQylNhV4",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "middle aged",
        "gender": "male",
        "use case": "documentary"
      }
    },
    {
      "voice_id": "t0jbNlBVZ17f02VDIeMI",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "yoZ06aMxZJJ28mfd3POQ",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "z9fAnlkpzviPz146aGWa",
      "labels": {
        "accent": "american",
        "description": "witch",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "zcAOhNBS3c14rBihAFp1",
      "labels": {
        "accent": "english-italian",
        "description": "foreigner",
        "age": "young",
        "gender": "male",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "zrHiDhphv9ZnVXBqCLjz",
      "labels": {
        "accent": "english-swedish",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    }
  ]
}
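
Picking a voice by its labels can be done with a small filter over the list above. The sketch below uses three entries copied from that list. The find_voice_ids helper is hypothetical and not part of CapabilityWorker, and it strips whitespace from label keys and values defensively.

```python
VOICES = [
    {"voice_id": "21m00Tcm4TlvDq8ikWAM",
     "labels": {"accent": "american", "description": "calm", "age": "young",
                "gender": "female", "use case": "narration"}},
    {"voice_id": "JBFqnCBsd6RMkjVDRZzb",
     "labels": {"accent": "british", "description": "raspy", "age": "middle aged",
                "gender": "male", "use case": "narration"}},
    {"voice_id": "GBv7mTt0atIp3Br8iCZE",
     "labels": {"accent": "american", "description": "calm", "age": "young",
                "gender": "male", "use case": "meditation"}},
]


def find_voice_ids(voices, **wanted):
    """Return voice_ids whose labels match every requested key/value pair."""
    matches = []
    for voice in voices:
        # Strip stray whitespace so "calm " and "calm" compare equal.
        labels = {k.strip(): v.strip() for k, v in voice["labels"].items()}
        # Keyword names use underscores, label keys use spaces ("use case").
        if all(labels.get(key.replace("_", " ")) == value
               for key, value in wanted.items()):
            matches.append(voice["voice_id"])
    return matches


calm_voices = find_voice_ids(VOICES, description="calm")
british_narrators = find_voice_ids(VOICES, accent="british", use_case="narration")
```

Any returned ID can then be passed as the voice_id argument of text_to_speech.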

text_to_speech Function

The text_to_speech function converts the provided text into speech using the specified Voice ID and streams it to the user via WebSocket.

async def text_to_speech(self, text: str, voice_id: str):

Parameters

text (str): The text to be converted into speech.
voice_id (str): The Voice ID to be used for speech synthesis.

Advanced CapabilityWorker Functions

Audio Processing Functions

The CapabilityWorker provides comprehensive audio handling capabilities:

play_audio: Play audio directly from bytes or a file-like object
play_from_audio_file: Play audio files stored in the capability directory
send_audio_data_in_stream: Stream processed audio data over WebSocket

Text Generation Functions

Text generation is handled by:

text_to_text_response: Standard text generation with history and system prompts

Streaming and Communication

Advanced communication features:

stream_init and stream_end: Manage audio streaming sessions
send_data_over_websocket: Send custom data over WebSocket

Session Task Utilities (replace raw asyncio usage)

To ensure Abilities run within the agent’s managed lifecycle, avoid using raw asyncio helpers directly.

Use self.worker.session_tasks.sleep(seconds: float) instead of asyncio.sleep(...):

async def some_task(self):
    await self.worker.session_tasks.sleep(1.5)
    await self.capability_worker.speak("Thanks for waiting!")

These helpers ensure proper cancellation, cleanup, and session scoping.
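
To see why session-scoped helpers matter, it helps to model what a managed task registry might do when a session ends. The sketch below is a simplified illustration only. FakeSessionTasks and its cancel_all method are hypothetical and do not reflect the real session_tasks implementation; only the create and sleep method names appear in this guide.

```python
import asyncio


class FakeSessionTasks:
    """Hypothetical model of a session-scoped task registry (not the real class)."""

    def __init__(self):
        self._tasks = set()

    def create(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def sleep(self, seconds: float):
        await asyncio.sleep(seconds)

    async def cancel_all(self):
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        # Wait for cancellations to finish so cleanup code gets to run.
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    session_tasks = FakeSessionTasks()
    log = []

    async def long_running():
        try:
            await session_tasks.sleep(60)
            log.append("finished")
        finally:
            log.append("cleaned up")

    task = session_tasks.create(long_running())
    await asyncio.sleep(0)             # let the task start
    await session_tasks.cancel_all()   # simulate the session ending
    return task.cancelled(), log


was_cancelled, log = asyncio.run(demo())
```

Because every task is registered, ending the session can cancel all of them and let their finally blocks run. A task started with raw asyncio would be invisible to such a registry.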

Example 1: Basic Capability

This Ability creates a daily life advisor that:

Asks the user for a problem: Initiates a conversation to gather user input.
Provides advice: Offers a solution based on user input.
Collects feedback: Captures user satisfaction with the advice.

Code

class BasicCapability(MatchingCapability):
    async def give_advice(self):
        # Greet the user and wait for them to describe their problem.
        await self.capability_worker.speak("Hi! I'm your daily life advisor. Tell me your problem.")
        user_problem = await self.capability_worker.user_response()

        # Generate advice with the configured text-to-text client.
        solution_prompt = f"Provide a solution for: {user_problem}"
        solution = self.capability_worker.text_to_text_response(solution_prompt)

        # Speak the advice and capture one reply as feedback.
        user_feedback = await self.capability_worker.run_io_loop(
            solution + " Are you satisfied with the advice?"
        )
        await self.capability_worker.speak("Thank you for using the advisor.")
        # Hand control back to the agent's default flow.
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        # Run the conversation as a session-managed task.
        self.worker.session_tasks.create(self.give_advice())

Key Functions

speak: Introduces the advisor and delivers the closing message.
user_response: Captures user input (e.g., their problem).
text_to_text_response: Generates the advice from the user's problem.
run_io_loop: Combines speaking the solution and listening for feedback.
resume_normal_flow: Resumes the agent’s default workflow after interaction.

Example 2: Weather Capability

This Ability integrates a weather API to fetch and share weather updates based on user-provided locations.

Code

import requests
from geopy.geocoders import Nominatim


class WeatherDocsCapability(MatchingCapability):
    async def first_setup(self, location: str):
        # Ask for a location if none was supplied.
        if not location:
            await self.capability_worker.speak("Which location?")
            location = await self.capability_worker.user_response()

        # Resolve the place name to coordinates.
        geolocator = Nominatim(user_agent="weather_agent")
        loc = geolocator.geocode(location)

        if loc:
            # A timeout keeps a slow API from stalling the conversation.
            result = requests.get(
                f"https://api.open-meteo.com/v1/forecast?latitude={loc.latitude}&longitude={loc.longitude}&current=temperature_2m",
                timeout=10,
            ).json()
            weather_report = f"The temperature in {location} is {result['current']['temperature_2m']}°C."
            await self.capability_worker.speak(weather_report)
        else:
            await self.capability_worker.speak("Invalid location. Try again.")

        # Hand control back to the agent's default flow.
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        self.worker.session_tasks.create(self.first_setup(""))

Key Features

External API Call: Fetches real-time weather data.
Geolocation: Validates and processes user-provided locations.
Error Handling: Provides meaningful feedback for invalid inputs.
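
The request building and response parsing in this example can be separated into small pure functions, which makes them easy to test without calling the API. In the sketch below, build_forecast_params and format_weather_report are hypothetical helpers, and the sample payload contains made-up values shaped like the fields the example reads.

```python
def build_forecast_params(latitude: float, longitude: float) -> dict:
    """Query parameters for the Open-Meteo forecast endpoint used above."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m",
    }


def format_weather_report(location: str, payload: dict) -> str:
    """Turn a forecast payload into a sentence, tolerating missing fields."""
    temperature = payload.get("current", {}).get("temperature_2m")
    if temperature is None:
        return f"Sorry, I couldn't read the temperature for {location}."
    return f"The temperature in {location} is {temperature}°C."


# Sample payload with made-up values; only the key names come from the example.
sample = {"current": {"temperature_2m": 18.4}}
report = format_weather_report("Lisbon", sample)
fallback = format_weather_report("Lisbon", {})
```

The parameter dict can be passed to requests.get through its params argument instead of being formatted into the URL by hand.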

Allowed/Disallowed Libraries and Patterns

The following imports, keywords, and patterns are not allowed in Abilities. Use the safe alternatives.

Blocked Imports and Keywords

Name	Why not allowed
redis	Direct datastore coupling and security concerns; not portable across deployments.
RedisHandler (`src.utils.db_handler`)	Bypasses platform abstractions; risks data integrity and sandbox boundaries.
connection_manager	Central system control; direct use from Abilities breaks isolation and multi-tenant safety.
user_config	Raw config access can leak or mutate global state; use provided APIs on `CapabilityWorker`/`worker`.
print	Bypasses structured logging; noisy and untraceable in production; use `editor_logging_handler`.
open (raw)	Unmanaged filesystem access; security and portability risks; prefer approved helpers/per-user storage.

Guidance:

Avoid direct storage/infra access. Use platform-provided helpers within CapabilityWorker/worker or request an API if needed.
Use the provided logging (editor_logging_handler) instead of prints.
For files, prefer platform abstractions and per-user capability folders; ask for an approved helper if you need persistent storage.

Security Guidance

Avoid insecure or unsafe patterns such as runtime assert checks, exec() of dynamic code, binding servers to all interfaces, hardcoded secrets, swallowing exceptions, insecure deserialization (pickle/dill/shelve/marshal), weak hashes like MD5, or weak cipher modes (e.g., ECB). If you have a special case, request approval and an approved wrapper/utility.
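
A few of these patterns have straightforward standard-library alternatives. The sketch below illustrates three of them under assumed use cases. The helper names are hypothetical, and the platform's approved wrappers, if any, take precedence.

```python
import hashlib
import json


def parse_age(raw: str) -> int:
    """Validate with explicit exceptions; assert statements are skipped under `python -O`."""
    text = raw.strip()
    if not text.isdecimal():
        raise ValueError(f"not a valid age: {raw!r}")
    return int(text)


def fingerprint(data: bytes) -> str:
    """Use SHA-256 rather than MD5 for content hashing."""
    return hashlib.sha256(data).hexdigest()


def load_settings(text: str) -> dict:
    """Use JSON rather than pickle: parsing JSON does not execute code."""
    settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError("settings must be a JSON object")
    return settings
```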

Conclusion

Building Abilities in OpenHome lets developers add custom functionality to AI agents. With examples like the Basic Advisor and Weather Capability as a starting point, you can:

Core Communication: Use speak, run_io_loop, and user_response for basic interactions.
Advanced Audio: Play custom audio files and stream audio data.
Text Generation: Use text_to_text_response with optional history and system prompts.
Voice Customization: Use specific voice IDs for varied and engaging responses.
External APIs: Integrate third-party services for dynamic functionality.

The examples cover basic conversational flows and third-party API calls, while the CapabilityWorker functions described above add audio playback, streaming, and Devkit actions. Together they provide the tools needed to create sophisticated, interactive Abilities.

Start creating innovative Abilities and push the boundaries of voice AI with OpenHome! 🎉

Note: Use the requests module to call third-party APIs where possible and avoid other libraries. If a special case needs another library, you can ask us to add it.
