
Analytic Integrations in Composable: Algorithmia

Micah Lee

A core capability of Composable is your ability to integrate a variety of analytics together no matter how they’re built or where they’re located. We have many analytics tools available directly within Composable, but you also have the flexibility to bring in analytics from anywhere.

One growing source of analytics and algorithms is the online service Algorithmia. Algorithmia brings together many open source and commercial algorithms under a pay-per-use model. While these algorithms are available to execute, you still typically need to handle data preparation and analytic coordination yourself. With Composable, you can easily and visually design the orchestration of analytic processing, using Algorithmia alongside your own analytics!

Algorithmia offers a free usage tier, which you can use to try each of these examples yourself in the hosted Composable application, or in your locally installed instance of Composable. A link to the final Composable Dataflow example we will build is available at the end of the article.

Getting Started

The first step is to create an account with Algorithmia; this gives you access to the library of analytics and, more importantly, to your API authorization key. You need the API key in order to execute these analytics from Composable.

After registering your account and logging in, you can access your API key by going to https://algorithmia.com/getting-started. Your key will be displayed at the top of the page.

You can browse through all of the algorithms available at https://algorithmia.com/algorithms. In our example, we are going to focus on performing text analysis of documents to summarize them, automatically extract tags or topics, and determine the overall sentiment of the author. Wikipedia will be our document source, and we will use Composable to orchestrate each of the analytic steps and provide a single result from our process.

Extracting the Document Text

The first step in our process will be to extract just the text of the document for analysis. The technique for this will vary based on the type of document. Since we are using Wikipedia, we will start with an Algorithmia tool that takes a URL as input and returns just the extracted text from the web page: https://algorithmia.com/algorithms/util/Html2Text.

In this first stage of a Composable Dataflow, we’re setting up the basics for processing documents and connecting to Algorithmia. The primary module is a WebClient module, which will be responsible for making the call to the algorithm hosted by Algorithmia. The inputs are set as follows:

  • Uri
    The algorithm endpoint URL that can be found on the landing page in Algorithmia. Here it is https://api.algorithmia.com/v1/algo/util/Html2Text/0.1.4.
  • Method and Content Type
    The method for Algorithmia is always POST and the content type is always application/json.
  • Headers
    We need to add a header for the API key. We use a Key Value Pair module for this. The key should be “Authorization” and the value is “Simple {YOUR API KEY FROM BEFORE}”.
  • Input
    The input format for algorithms varies, but can always be found in the algorithm’s documentation on Algorithmia. In this case, it is just the URL to extract text from. This must be in JSON format, so we quickly format it with the Convert to JSON module before connecting it to the input.
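For reference, the module configuration above is equivalent to a plain HTTP request. Here is a minimal Python sketch of the same call; the Wikipedia URL is an arbitrary example, and the API key is a placeholder you would replace with your own:

```python
import json

API_KEY = "YOUR_API_KEY"  # placeholder: your Algorithmia API key
ENDPOINT = "https://api.algorithmia.com/v1/algo/util/Html2Text/0.1.4"

# Headers: the Authorization key-value pair plus the JSON content type
headers = {
    "Authorization": f"Simple {API_KEY}",
    "Content-Type": "application/json",
}

# Input: just the URL to extract text from, quoted as a JSON string
# (the equivalent of the Convert to JSON module)
payload = json.dumps("https://en.wikipedia.org/wiki/Main_Page")

# To actually execute the call (requires a valid key):
# import urllib.request
# req = urllib.request.Request(ENDPOINT, payload.encode(), headers, method="POST")
# response = json.loads(urllib.request.urlopen(req).read())
```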

When we run this data flow, we see the output is a simple JSON document containing the document text extracted from the HTML markup, along with some metadata about the algorithm execution. Now we are ready to start doing some analysis of this text.

Adding the First Analytic

The first analytic step we’ll add is generating a summary of the document text for review. We will use a Natural Language Processing (NLP) tool in Algorithmia for this: https://algorithmia.com/algorithms/nlp/Summarizer.

We need to grab just the extracted text from the previous module result. We will use the JSONPath Query module for this, and because we need it quoted to be proper JSON for the next input, we’ll again use the Convert to JSON module on the string of text.
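In plain code, this extract-and-requote step looks like the following Python sketch. The response content here is made up for illustration; Algorithmia responses place the algorithm output under a "result" key alongside execution metadata:

```python
import json

# Illustrative response from the Html2Text call: the extracted text
# sits under "result", with metadata about the execution beside it
response = {
    "result": "Example extracted page text.",
    "metadata": {"content_type": "text", "duration": 0.02},
}

# Equivalent of the JSONPath Query module: pull out just the text
text = response["result"]

# Equivalent of the Convert to JSON module: quote the string so it is
# valid JSON input for the next algorithm
next_input = json.dumps(text)
```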

Next, we will add another WebClient module, configure it with the new algorithm’s endpoint URL, and connect the same Authorization header we created before.

When we execute this updated dataflow, we can review the summary produced by the algorithm.

All the Analytics!

Let’s go ahead and add a couple more analytics: one for generating suggested tags or topics from the document, and one for determining its overall sentiment.

These work the same as the other algorithms we’ve added so far. The one distinction is that the sentiment analysis algorithm takes a JSON document with a single “document” entry, rather than just the text string itself.
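The difference in input shape can be sketched as follows; the sample text is arbitrary, and you should check each algorithm’s documentation on Algorithmia for its exact input schema:

```python
import json

text = "The extracted document text."

# Summarizer-style algorithms take the bare JSON-quoted string:
string_payload = json.dumps(text)

# The sentiment algorithm instead expects a JSON object with a
# single "document" entry wrapping the text:
document_payload = json.dumps({"document": text})
```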

Mixing and Matching

One thing you may notice about the sentiment analysis result is that it is a decimal number (e.g. 0.01828676). From the algorithm’s documentation, we learn that the result is a number from -1.0 (most negative sentiment) to +1.0 (most positive sentiment). In our dataflow, we want to post-process this result into a human-readable description of the sentiment. We can easily add this step with an inline Code Module in Composable.

We’ll add a simple code statement that will take this decimal input and turn it into a text description of the sentiment using five sentiment levels.
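A minimal sketch of such a code statement is shown below. The five labels and the threshold values are illustrative choices, not taken from the article; tune them to your own needs:

```python
def describe_sentiment(score: float) -> str:
    """Map a sentiment score in [-1.0, +1.0] to one of five levels.

    Thresholds are illustrative and can be adjusted.
    """
    if score <= -0.6:
        return "very negative"
    elif score <= -0.2:
        return "negative"
    elif score < 0.2:
        return "neutral"
    elif score < 0.6:
        return "positive"
    return "very positive"

label = describe_sentiment(0.01828676)  # the example score from above
```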

Putting It All Together

Being able to connect and orchestrate all of these algorithms is great, but how do we make all of this useful as a single dataflow? To make this analytic composition reusable by others, we will put all of the analytic outcomes into a single data result.

To do this, we will use the JSON processing and construction modules in Composable to assemble a single JSON document with all of the analytic processing results.
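The assembled result might be sketched like this in Python; the field names and sample values are hypothetical, and in Composable you would build the same structure with the JSON construction modules:

```python
import json

# Hypothetical outputs collected from the three analytic steps
summary = "A short summary of the document."
tags = ["analytics", "orchestration"]
sentiment_score = 0.01828676
sentiment_label = "neutral"

# Assemble one JSON document holding every analytic result
combined = json.dumps(
    {
        "summary": summary,
        "tags": tags,
        "sentiment": {"score": sentiment_score, "label": sentiment_label},
    },
    indent=2,
)
```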

Now we have the results in a format that we can more easily review and use in further downstream processing, such as:

  • Tracking sentiment of a document over time, and potentially alerting someone if the sentiment drops too low.
  • Producing a nicely formatted PDF report and emailing it to decision makers in your company.
  • Saving the results in a new document catalog database for your enterprise.

Or many other possibilities using the capabilities and integrations available in Composable.

You can try out this final data flow for yourself in Composable and use your imagination to explore other possibilities in Composable and with Algorithmia. Come back and let us know what you discover!

Micah Lee

Micah is an experienced, hands-on software architect involved with large-scale distributed software systems and web-based applications for data analysis. Micah comes from MIT Lincoln Laboratory, where he was a core contributor and leader for many complex software design and development efforts across a variety of homeland security and disaster response programs, including the Composable Analytics platform.