Sending media messages through my chat bot

Posted on Feb 7, 2022
tl;dr: How I juggled media blobs through different contexts until I landed in the correct one

Background

If your first question after reading that was why, or even how, I ended up here, I don’t blame you. Just under 5 years ago I resumed work on a chatbot I had built for a popular messaging service. The first version had stopped working because the third party library that reverse engineered the service’s API was no longer being maintained. I did not want that kind of hard dependency this time, so I started looking at how I could go about doing it myself.

Ultimately, I landed on using the messaging service’s web client to send and receive messages through their JavaScript implementation. This way I would not have to reverse engineer and maintain their protocol; I only had to hook into the necessary places in the client.

This version of the chatbot was built as a Chrome Extension. At the time it was the easiest way for me to get up and running, and a lot of the logic from the old version was in JavaScript as well. So I started refactoring it and building out the features from the ground up.

Very quickly I ran into my first challenge: sending media messages through the web interface. Given the isolation built into web browsers, there was a limit to what could be done programmatically through JavaScript. Automating the file selection dialog was not an option at this stage, as I had not yet started using Selenium. Many breakpoints and code steps later, I was able to identify which functions prepared the media file, and I knew what I needed to get started: a File object, essentially a Blob with a filename, a file type and a last modified date. I also knew I could get a Blob of any media file by setting the responseType of an XMLHttpRequest to "blob".

Content Security Policy (CSP)

Browsers implement a security control called Content Security Policy (CSP). The web server can set a CSP HTTP header that dictates which resources the client (the browser) can load. In my case, the connect-src directive was causing the issue. It restricts the URLs that client-side JavaScript can connect to through APIs such as XMLHttpRequest and fetch. If you try to access a resource that is not whitelisted, you will get the following error:

[Image: CSP Error]
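To make the connect-src rule concrete, here is a simplified sketch that parses a CSP header and checks whether a URL would be allowed for XMLHttpRequest or fetch. The header value and origins are made-up examples, not the messaging service's real policy. The matcher is deliberately simplified:

- It handles 'self', '*', scheme sources such as blob:, and host sources (optionally with a *. prefix).
- It ignores ports and paths.
- It treats '*' as covering only http(s) and ws(s) URLs.

The browser's real matching algorithm has more rules.

```javascript
// Simplified sketch of CSP connect-src checking; not the browser's full algorithm.
const NETWORK_SCHEMES = ["http:", "https:", "ws:", "wss:"];

// Turn "a x y; b z" into { a: ["x", "y"], b: ["z"] }
function parseCsp(header) {
  const directives = Object.create(null);
  for (const part of header.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) continue;
    const name = tokens[0].toLowerCase();
    // Only the first occurrence of a directive is used
    if (!(name in directives)) directives[name] = tokens.slice(1);
  }
  return directives;
}

function sourceMatches(source, url, pageOrigin) {
  const src = source.toLowerCase();
  if (src === "'self'") return url.origin === pageOrigin;
  // Simplification: "*" is treated as covering network schemes only
  if (src === "*") return NETWORK_SCHEMES.includes(url.protocol);
  // Scheme source, e.g. "blob:" or "https:"
  if (/^[a-z][a-z0-9+.-]*:$/.test(src)) return url.protocol === src;
  // Host source with an optional scheme and an optional "*." prefix
  let scheme = null;
  let host = src;
  const sep = host.indexOf("://");
  if (sep !== -1) {
    scheme = host.slice(0, sep + 1); // keep the ":" to compare with url.protocol
    host = host.slice(sep + 3);
  }
  host = host.split("/")[0].split(":")[0]; // drop any path or port
  const schemeOk = scheme ? url.protocol === scheme : NETWORK_SCHEMES.includes(url.protocol);
  if (!schemeOk || host === "") return false;
  if (host.startsWith("*.")) return url.hostname.endsWith(host.slice(1));
  return url.hostname === host;
}

function isConnectAllowed(header, target, pageOrigin) {
  const directives = parseCsp(header);
  // connect-src falls back to default-src when it is not present
  const sources = directives["connect-src"] || directives["default-src"];
  if (!sources) return true; // nothing restricts connections
  const url = new URL(target);
  return sources.some((source) => sourceMatches(source, url, pageOrigin));
}
```

With a made-up policy such as `connect-src 'self' https://*.example.com blob:`, the check passes for the page's own origin, for subdomains of example.com and for blob: URLs, and fails for any other host. Note that a bare `*` does not cover blob: URLs, so a page has to list blob: explicitly to allow them.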

Chrome Extension Script Types

When developing a Chrome Extension, you have the option to run your JavaScript as one of three types:

  1. Content Scripts
  2. Injected Scripts
  3. Background Scripts

If you have more than one script running in these different contexts, a message passing API is exposed to facilitate communication between them.

These script types differ by which context they run in and what they can and cannot access.

Content Scripts

These scripts run inside the web page, in their own isolated world, and can read and write the DOM. However, they cannot interact with the JavaScript variables and functions defined by the page itself. Additionally, they respect the Content Security Policy set by the web server, meaning they cannot access resources that are not whitelisted.

Injected Scripts

These scripts are injected by the Content Script (essentially by modifying the DOM and adding a script element to the page). They share the same page and DOM as the Content Script, but they run in the page’s own JavaScript context instead of the content script’s isolated world. This gives them the added benefit of being able to interact with the JavaScript running on the page.

For my chatbot use case, this is where I needed to inject the code that will interact with the messaging service’s JavaScript implementation to send and receive messages.

Example Script Injection

// Content Script: injects variables and scripts into the web page
function injectVariables(variables, tag) {
	var node = document.getElementsByTagName(tag)[0];
	var script = document.createElement("script");
	script.setAttribute("type", "text/javascript");
	console.log("[Info]: Injecting variables");
	// Join all declarations into one script; assigning textContent inside a
	// loop would keep only the last variable
	script.textContent = variables
		.map(function (variable) {
			return "var " + variable.name + " = " + JSON.stringify(variable.value) + ";";
		})
		.join("\n");
	node.appendChild(script);
}
function injectScript(file_path, tag) {
	var node = document.getElementsByTagName(tag)[0];
	var script = document.createElement("script");
	script.setAttribute("type", "text/javascript");
	script.setAttribute("src", file_path);
	node.appendChild(script);
}

injectVariables([{name: "extensionID", value: chrome.runtime.id}], "body");
// chrome.runtime.getURL replaces the deprecated chrome.extension.getURL;
// web/inject.js must be listed in web_accessible_resources in manifest.json
injectScript(chrome.runtime.getURL("web/inject.js"), "body");

Background Scripts

These scripts run in their own context, but do not have direct access to the web page. They can access any external resource that is whitelisted in the manifest.json, and the only way to interact with the Content / Injected Scripts is through message passing. This is where the core logic of the chatbot was implemented as it was a separate context and not tied to the web page itself.
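Since message passing is the only bridge between these contexts, it helps to route messages by a type field, the way the chatbot's messages (send_media, media_url, media_success) are tagged. Below is a minimal sketch, assuming handlers keyed by message type. createDispatcher and the handler names are hypothetical. The (message, sender, sendResponse) signature and the rule that returning true lets you respond asynchronously come from chrome.runtime.onMessage.

```javascript
// Hypothetical helper: route extension messages by their "type" field.
// The returned function matches the chrome.runtime.onMessage listener signature.
function createDispatcher(handlers) {
  return function (message, sender, sendResponse) {
    const type = message && message.type;
    if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
      return false; // not handled here; no response will be sent
    }
    // Handlers may return a plain value or a Promise
    Promise.resolve()
      .then(() => handlers[type](message, sender))
      .then(sendResponse, (err) => sendResponse({ type: "error", error: String(err) }));
    // true keeps the channel open so sendResponse can be called asynchronously
    return true;
  };
}

// Possible wiring in an extension script (not executed here):
// chrome.runtime.onMessage.addListener(createDispatcher({
//   media_url: (message) => relayMedia(message.url), // relayMedia is hypothetical
// }));
```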

The Challenge

The Content Security Policy defined for the messaging service’s web page was quite strict and meant that I could not simply download and send the media directly from my injected script. The idea was then to download the media within the context of the background script and somehow pass it back to the injected script to process and send.

Attempt 1

// Initial code to download the media to be sent.
// link and content_type come from the bot's command handling.
var file;
var xhr = new XMLHttpRequest();
xhr.open("GET", link, true);
xhr.responseType = "blob";
xhr.onload = function (e) {
    if (this.status === 200) {
        // 5 random base-36 characters (skipping the leading "0.")
        var random_name = Math.random()
            .toString(36)
            .substring(2, 7);
        // Prefer an explicit content type, else use the server's
        var type = content_type
            ? content_type
            : xhr.getResponseHeader("Content-Type");
        file = new File([this.response], random_name, {
            type: type,
            lastModified: Date.now()
        });
        /* 

        The file above is what I can process and send through the bot

        */
    }
};
xhr.send();

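The recommendation for sending this binary data through the message passing API was to use the URL.createObjectURL function: messages are serialized as JSON, so a Blob cannot be passed directly, but a string URL pointing to it can. So the updated approach was now: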

  1. Download the media blob in the background script
  2. Call the createObjectURL function to generate the blob URL and pass it to the injected script.
  3. Download and process the blob file again from the injected script to send it.
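Why pass a URL instead of the Blob itself? Chrome's extension messaging expects JSON-serializable messages, so a rough way to see what survives a hop is a JSON round trip. This is a simplified model of the messaging layer, and the blob URL string below is made up.

```javascript
// Simplified model: one message hop treated as a JSON round trip
function simulateMessageHop(message) {
  return JSON.parse(JSON.stringify(message));
}

const blob = new Blob(["fake image bytes"], { type: "image/png" });
// Made-up blob URL string; in the extension it would come from URL.createObjectURL
const blobUrl = "blob:chrome-extension://example-id/1234";
const received = simulateMessageHop({ type: "send_media", blob: blob, url: blobUrl });

console.log(received.blob instanceof Blob); // false: the Blob arrives as an empty object
console.log(received.url === blobUrl);      // true: the string URL survives
```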

Attempt 2

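// Background Script
// link comes from the bot's command handling; api is the Port the injected
// script opened with chrome.runtime.connect (received via onConnectExternal)
var xhr = new XMLHttpRequest();
xhr.open("GET", link, true);
xhr.responseType = "blob";
xhr.onload = function (e) {
    if (this.status === 200) {
        // Blobs cannot travel through message passing, so send a blob URL
        var url = window.URL.createObjectURL(this.response);
        var message = {
            type: "send_media",
            url: url
        };
        // chrome.tabs.sendMessage only reaches content scripts, so the
        // injected script is messaged over its Port
        api.postMessage(message);
    }
};
xhr.send();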


// Injected Script
// extensionID was injected by the content script; connecting from the page
// requires its URL to be listed under externally_connectable in manifest.json
var api = chrome.runtime.connect(extensionID, {
    name: "api"
});
api.onMessage.addListener((msg) => {
    if (msg.type === "send_media") {
        var xhr = new XMLHttpRequest();
        // Download the blob URL sent by the background script
        xhr.open("GET", msg.url, true);
        xhr.responseType = "blob";
        xhr.onload = function (e) {
            if (this.status === 200) {
                var random_name = Math.random()
                    .toString(36)
                    .substring(2, 7);
                var type = xhr.getResponseHeader("Content-Type");
                var file = new File([this.response], random_name, {
                    type: type,
                    lastModified: Date.now()
                });
                /* 

                The file above is what I can process and send through the bot

                */
            }
        };
        xhr.send();
    }
});

Unfortunately, this approach did not work: the injected script could not download the blob from the object URL because of the Content Security Policy. After several Google searches and a read through the Chrome Extension documentation, I could see that what I wanted to do was theoretically possible, but it would involve juggling the blob around. This is where the content script comes into play.

Attempt 3

Up until this point, the content script served only one purpose: injecting my JavaScript into the web page. However, it would soon take on another role because of how it interacts with the Chrome extension context. Both content and injected scripts respect the CSP directives set by the web server, but the content script can access resources from the extension context as well. This meant I could do the following:

  1. Download the media blob in the background script
  2. Call the createObjectURL function to generate the blob URL and pass it to the content script.
  3. Download the blob again, this time inside the content script, and call the createObjectURL function to generate a new blob URL. This time the URL belongs to the web page’s context, since that is where the content script is running.
  4. Pass this new blob URL back to the background script so that it can be sent to the injected script to download, process and send.

// Background Script
// link, content_type, caption, jid and message_id come from the bot's
// command handling; api is the Port opened by the injected script
var xhr = new XMLHttpRequest();
var msg_id = message_id ? message_id : null;
xhr.open("GET", link, true);
xhr.responseType = "blob";
xhr.onload = function (e) {
    if (this.status === 200) {
        // Step 2: create a blob URL in the extension context
        var url = window.URL.createObjectURL(this.response);
        var message = {
            type: "media_url",
            url: url
        };
        chrome.tabs.query({
            url: "https://***********"
        },
            (tabs) => {
                chrome.tabs.sendMessage(tabs[0].id, message, function (response) {
                    // response is undefined if the content script did not reply
                    if (response && response.type === "media_success") {
                        // Step 4: forward the page-context blob URL to the injected script
                        api.postMessage({
                            type: "send_media",
                            to: jid,
                            text: caption,
                            url: response.url,
                            msg_id: msg_id,
                            content_type: content_type
                        });
                    } else {
                        console.log("[ERROR]: Media Error " + JSON.stringify(response));
                    }
                });
            }
        );
    }
};
xhr.send();

// Content Script
// (injectVariables() and injectScript() are defined and called here exactly
// as in the script injection example above)

chrome.runtime.onMessage.addListener(function(message, sender, sendResponse) {
	if (message.type === "media_url") {
		// Step 3: download the extension's blob URL, then create a new
		// blob URL in the web page's context
		var x = new XMLHttpRequest();
		x.open("GET", message.url, true);
		x.responseType = "blob";
		x.onload = function() {
			if (this.status === 200) {
				var myurl = window.URL.createObjectURL(this.response);
				sendResponse({
					type: "media_success",
					url: myurl
				});
			} else {
				sendResponse({
					type: "media_fail"
				});
			}
		};
		// Report network errors too, so the background always gets a reply
		x.onerror = function() {
			sendResponse({
				type: "media_fail"
			});
		};
		x.send();
	}
	// Keep the channel open so sendResponse can be called after the download
	return true;
});

// Injected Script
// extensionID was injected by the content script
var api = chrome.runtime.connect(extensionID, {
    name: "api"
});
api.onMessage.addListener((msg) => {
    if (msg.type === "send_media") {
        var xhr = new XMLHttpRequest();
        // Download the page-context blob URL created by the content script
        xhr.open("GET", msg.url, true);
        xhr.responseType = "blob";
        xhr.onload = function (e) {
            if (this.status === 200) {
                var random_name = Math.random()
                    .toString(36)
                    .substring(2, 7);
                // Prefer the content type sent by the background script
                var type = msg.content_type
                    ? msg.content_type
                    : xhr.getResponseHeader("Content-Type");
                var file = new File([this.response], random_name, {
                    type: type,
                    lastModified: Date.now()
                });
                /* 

                The file above is what I can process and send through the bot

                */
            }
        };
        xhr.send();
    }
});
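The content script's part of the juggle (step 3) could also be written with promises. The sketch below uses fetch in place of the XMLHttpRequest shown above. relayBlobUrl is a hypothetical name, and fetchImpl and createUrl are parameters only so the step can be exercised outside the browser. It resolves to the same media_success / media_fail objects the content script sends back.

```javascript
// Hypothetical helper: step 3 with fetch and promises instead of XMLHttpRequest.
function relayBlobUrl(url, fetchImpl = fetch, createUrl = (blob) => URL.createObjectURL(blob)) {
  return fetchImpl(url)
    .then((res) => {
      if (!res.ok) throw new Error("HTTP " + res.status);
      return res.blob();
    })
    // A blob URL created here belongs to the context this code runs in
    .then((blob) => ({ type: "media_success", url: createUrl(blob) }))
    // Network errors, HTTP errors and URL creation failures all map to media_fail
    .catch(() => ({ type: "media_fail" }));
}

// Possible wiring inside the content script (not executed here):
// chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
//   if (message.type !== "media_url") return false;
//   relayBlobUrl(message.url).then(sendResponse);
//   return true; // sendResponse is called asynchronously
// });
```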

I had implemented a !pingm command in the chatbot that would select a random image from a predefined list and send it along with a “Pong” message. With all this in place, I opened the app on my phone, sent the command to the bot, and after a couple of seconds got that Pong response. I could finally put the juggling balls down.

Closing Thoughts

About a year ago I wrote a POC that eliminated the Chrome Extension requirement using Python and Selenium. The script was injected directly, along with a form I could use to supply the File object needed to send media. Theoretically, I could refactor the chatbot to work with this method, but I decided to take a different approach: I began decoupling certain parts of the bot into smaller microservices and kept the Chrome Extension component as the primary interface to the messaging service’s web client.

The first version of the bot ran for just about a year before the third party library stopped working with the latest version of the messaging service. The code was a nightmare, but I learned a lot writing it. The current version will have been running for 5 years this May (fingers crossed) and was essentially rewritten based on what I learned the first time around. I am proud of where it is today. It has gone from a Chrome window that I had to set up manually to a collection of Docker containers that I can start with a single command, thanks to projects like Selenium.