Introduction

Custom Abilities are the cornerstone of extending OpenHome’s functionality. They allow developers to:

  • Add personalized features to AI agents.
  • Integrate third-party APIs for dynamic interactions.
  • Customize logic for enhanced user engagement.

This guide walks you through:

  • Structuring and registering an Ability.
  • Using CapabilityWorker for seamless I/O management.
  • Examples showcasing how to create powerful custom Abilities.

Adding an Ability

File Structure

Each Ability resides in its folder and requires a main.py file to define the logic.

<YourAbilityFolder>

└── main.py

Example File: main.py

Here’s a basic template for building a new Ability:

class YourCapability(MatchingCapability):
    @classmethod
    def register_capability(cls) -> "MatchingCapability":
        with open(
            os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.json")
        ) as file:
            data = json.load(file)
        return cls(
            unique_name=data["unique_name"],
            matching_hotwords=data["matching_hotwords"],
        )

    def call(
        self,
        worker: AgentWorker,
    ):
        # Your capability logic here
        return "Your Capability Called!"

Key Components

  • register_capability: Defines the Ability’s unique_name and matching_hotwords.
  • call: Executes the Ability’s logic when triggered.

Understanding CapabilityWorker

The CapabilityWorker class simplifies I/O interactions, enabling:

  • Speech synthesis: Using text-to-speech (TTS).
  • Listening for user input: Capturing and processing responses.
  • Running interaction loops: Supporting conversational flows.

Speak Function

The speak function converts text into speech and streams it to the user via WebSocket.

async def speak(self, tokens: str):

User Response Function

This function listens for user input asynchronously, ideal for capturing dynamic user responses.

async def user_response(self):

Run I/O Loop

The run_io_loop orchestrates speaking and listening for responses in a single flow.

async def run_io_loop(self, tokens: str):

Confirmation Loop

This specialized loop handles confirmation prompts like “yes” or “no” interactions.

async def run_confirmation_loop(self, tokens: str):
    prompt = tokens + ", Please respond with 'yes' or 'no'."
    while True:
        await self.speak(prompt)
        response = await self.user_response()
        if "yes" in response.lower():
            return True
        elif "no" in response.lower():
            return False

Wait for Complete Transcription

This function waits until the user has completely finished speaking before processing the input.

async def wait_for_complete_transcription(self):

Returns:

  • msg (str): The final transcribed user input after the user has finished speaking.

Send Devkit Action

This function sends an action event over WebSocket, allowing integration with the Devkit system. It is useful for triggering specific actions in Devkit systems.

async def send_devkit_action(self, action: str = ""):

Parameters:

  • action (str): The action to be sent to the Devkit.

Send Data Over WebSocket

This function transmits structured data over WebSocket, ensuring seamless communication with external systems.

async def send_data_over_websocket(self, type: str = "", data: dict = {}):

Parameters:

  • type (str): The type of data being sent.
  • data (dict): The structured data to be transmitted.

Using Specific Voice IDs for Text-to-Speech

The CapabilityWorker class supports the use of specific Voice IDs for text-to-speech (TTS) functionality. This allows you to customize the voice used for speech synthesis by specifying a Voice ID from the provided list.

Available Voice IDs

You can use any of the following Voice IDs for TTS:

{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "29vD33N1CtxCmqQRPOHJ",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "2EiwWnXFnvU5JabPnv8n",
      "labels": {
        "accent": "american",
        "description": "war veteran",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "5Q0t7uMcjvnagumLfvZi",
      "labels": {
        "accent": "american",
        "description": "ground reporter",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "CYw3kZ02Hs0563khs1Fj",
      "labels": {
        "accent": "british-essex",
        "description": "conversational",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "D38z5RcWu1voky8WS1ja",
      "labels": {
        "accent": "irish",
        "description": "sailor",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "labels": {
        "accent": "american",
        "description": "soft",
        "age": "young",
        "gender": "female",
        "use case": "news"
      }
    },
    {
      "voice_id": "ErXwobaYiN019PkySvjV",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "GBv7mTt0atIp3Br8iCZE",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "male",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "IKne3meq5aSn9XLyUdCD",
      "labels": {
        "accent": "australian",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "JBFqnCBsd6RMkjVDRZzb",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "LcfcDJNUP1GQjkzn1xUU",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "MF3mGyEYCl7XYWbV9V6O",
      "labels": {
        "accent": "american",
        "description": "emotional",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "N2lVS1w4EtoT3dr4eOWO",
      "labels": {
        "accent": "american",
        "description": "hoarse",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "ODq5zmih8GrVes37Dizd",
      "labels": {
        "accent": "american",
        "description": "shouty",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "SOYHLrjzK2X1ezoPC6cr",
      "labels": {
        "accent": "american",
        "description": "anxious",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "TX3LPaxmHKxFdv7VOQHJ",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "narration",
        "description ": "neutral"
      }
    },
    {
      "voice_id": "ThT5KcBeYPX3keUQqHPh",
      "labels": {
        "accent": "british",
        "description": "pleasant",
        "age": "young",
        "gender": "female",
        "use case": "children's stories"
      }
    },
    {
      "voice_id": "TxGEqnHWrfWFTfGW9XjX",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "VR6AewLTigWG4xSOukaG",
      "labels": {
        "accent": "american",
        "description": "crisp",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "XB0fDUnXU5powFXDhCwa",
      "labels": {
        "accent": "english-swedish",
        "description": "seductive",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "Xb7hH8MSUJpSbSDYk0k2",
      "labels": {
        "accent": "british",
        "description": "confident",
        "age": "middle aged",
        "gender": "female",
        "featured": "new",
        "use case": "news"
      }
    },
    {
      "voice_id": "XrExE9yKIg1WjnnlVkGX",
      "labels": {
        "accent": "american",
        "description": "warm",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "ZQe5CZNOzWyzPSCn5a3c",
      "labels": {
        "accent": "australian",
        "description": "calm ",
        "age": "old",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "Zlb1dXrM653N07WRdFW3",
      "labels": {
        "accent": "british",
        "age": "middle aged",
        "gender": "male",
        "use case": "news",
        "description ": "ground reporter "
      }
    },
    {
      "voice_id": "bVMeCyTHy58xNoL34h3p",
      "labels": {
        "accent": "american-irish",
        "description": "excited",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "flq6f7yk4E4fJM5XTYuZ",
      "labels": {
        "accent": "american",
        "age": "old",
        "gender": "male",
        "use case": "audiobook",
        "description ": "orotund"
      }
    },
    {
      "voice_id": "g5CIjZEefAph4nQFvHAz",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "ASMR",
        "description ": "whisper"
      }
    },
    {
      "voice_id": "iP95p4xoKVk53GoZ742B",
      "labels": {
        "accent": "american",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "jBpfuIE2acCO8z3wKNLl",
      "labels": {
        "accent": "american",
        "description": "childlish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    },
    {
      "voice_id": "jsCqWAovK2LkecY7zXl4",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "description ": "overhyped",
        "usecase": "video games"
      }
    },
    {
      "voice_id": "nPczCjzI2devNBz1zQrb",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "narration"
      }
    },
    {
      "voice_id": "oWAxZDx7w5VEj9dCyTzz",
      "labels": {
        "accent": "american-southern",
        "age": "young",
        "gender": "female",
        "use case": "audiobook ",
        "description ": "gentle"
      }
    },
    {
      "voice_id": "onwK4e9ZLuTAKqWW03F9",
      "labels": {
        "accent": "british",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "news presenter"
      }
    },
    {
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "pMsXgVXv3BLzUgSXRplE",
      "labels": {
        "accent": "american",
        "description": "pleasant",
        "age": "middle aged",
        "gender": "female",
        "use case": "interactive"
      }
    },
    {
      "voice_id": "pNInz6obpgDQGcFmaJgB",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "piTKgcLEGmPE4e6mEKli",
      "labels": {
        "accent": "american",
        "description": "whisper",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "pqHfZKP75CvOlQylNhV4",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "middle aged",
        "gender": "male",
        "use case": "documentary"
      }
    },
    {
      "voice_id": "t0jbNlBVZ17f02VDIeMI",
      "labels": {
        "accent": "american",
        "description": "raspy ",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "yoZ06aMxZJJ28mfd3POQ",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "z9fAnlkpzviPz146aGWa",
      "labels": {
        "accent": "american",
        "description": "witch",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "zcAOhNBS3c14rBihAFp1",
      "labels": {
        "accent": "english-italian",
        "description": "foreigner",
        "age": "young",
        "gender": "male",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "zrHiDhphv9ZnVXBqCLjz",
      "labels": {
        "accent": "english-swedish",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    }
  ]
}

text_to_speech Function

The text_to_speech function converts the provided text into speech using the specified Voice ID and streams it to the user via WebSocket.

async def text_to_speech(self, text: str, voice_id: str):

Parameters

  • text (str): The text to be converted into speech.
  • voice_id (str): The Voice ID to be used for speech synthesis.

Example 1: Basic Capability

This Ability creates a daily life advisor that:

  1. Asks the user for a problem: Initiates a conversation to gather user input.
  2. Provides advice: Offers a solution based on user input.
  3. Collects feedback: Captures user satisfaction with the advice.

Code

class BasicCapability(MatchingCapability):
    async def give_advice(self):
        await self.capability_worker.speak("Hi! I'm your daily life advisor. Tell me your problem.")
        user_problem = await self.capability_worker.user_response()

        solution_prompt = f"Provide a solution for: {user_problem}"
        solution = self.capability_worker.text_to_text_response(solution_prompt)

        user_feedback = await self.capability_worker.run_io_loop(
            solution + " Are you satisfied with the advice?"
        )
        await self.capability_worker.speak("Thank you for using the advisor.")
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        asyncio.create_task(self.give_advice())

Key Functions

  • speak: Introduces the advisor and provides the solution.
  • user_response: Captures user input (e.g., their problem).
  • run_io_loop: Combines speaking the solution and listening for feedback.
  • resume_normal_flow: Resumes the agent’s default workflow after interaction.

Example 2: Weather Capability

This Ability integrates a weather API to fetch and share weather updates based on user-provided locations.


Code

class WeatherDocsCapability(MatchingCapability):
    async def first_setup(self, location: str):
        if not location:
            await self.capability_worker.speak("Which location?")
            location = await self.capability_worker.user_response()

        geolocator = Nominatim(user_agent="weather_agent")
        loc = geolocator.geocode(location)

        if loc:
            result = requests.get(
                f"https://api.open-meteo.com/v1/forecast?latitude={loc.latitude}&longitude={loc.longitude}&current=temperature_2m"
            ).json()
            weather_report = f"The temperature in {location} is {result['current']['temperature_2m']}°C."
            await self.capability_worker.speak(weather_report)
        else:
            await self.capability_worker.speak("Invalid location. Try again.")
        
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        asyncio.create_task(self.first_setup(""))

Key Features

  • External API Call: Fetches real-time weather data.
  • Geolocation: Validates and processes user-provided locations.
  • Error Handling: Provides meaningful feedback for invalid inputs.

Conclusion

Building Abilities in OpenHome empowers developers to create custom functionalities for AI agents. With examples like the Basic Advisor and Weather Capability:

  • Understand core functions like speak, run_io_loop, and API integration.
  • Extend the platform to fit unique needs, from integrating external APIs to creating dynamic conversational flows.

Start creating innovative Abilities and push the boundaries of voice AI with OpenHome! 🎉

Note: It is recommended to use the requests module to call third-party APIs and avoid using other libraries. If any other library is needed for a special case, you can request us to add it.