ML in production: Build your own API from scratch

5th November, 2018

Serving ML models using a custom Flask API

Scikit-learn is the most commonly used library in Python to train and validate predictive models using a uniform syntax. Because training is often time consuming, trained models can be persisted (saved) for later use, so that they do not have to be retrained every time they are needed. Many predictive modelling efforts fail to deliver on ‘the promise of analytics’ despite excellent modelling and performance, because the model (or its outcomes) is difficult to integrate into the broader business. In that case, an API (Application Programming Interface) that makes your model available to a wider audience may suit your needs very well.

APIs, or simply web services, are the building blocks of many (large scale) web applications and allow pieces of computer code to exchange information. This may sound very abstract, but in the context of this article you can think of an API as nothing more than a web address that takes information as input, executes some code using those inputs and (optionally) returns information as output. We can take our trained machine learning model and serve it as part of an API, so that users can send it the input data they would like predictions for and receive the predicted values in the response. The massive advantage of serving models in an API is the flexibility it brings. Serving the model as a web service makes it available at all times in a completely ‘implementation-agnostic’ way, so you can integrate it in any business process. For instance, a web service can be called from Excel using some simple VBA, but it can just as easily be integrated into any other web based application, mobile app or visualisation tool such as Tableau.

A Flask application is one of many ways you can productionize your ML models as an API, but the flexibility of Flask as a general purpose web application framework makes it an all-inclusive option that is worth considering for many problem sets. For ML models that are run periodically on a batch, the requirements may be completely different. Think of an application where you have trained a sophisticated text categorization model to classify calls to your customer service department as compliant / non-compliant, and every day you want to classify the calls from that day. Your path to production will look very different from the request-response service you might need to approve or decline a mortgage (in principle) based on salary details and financial history, which has to be (publicly) available at all times, ready to be queried.

There is no better way to get to grips with serving ML models as part of a Flask application than by example. We will first create two predictive models using the German Credit dataset and then deploy them as part of a custom web service in Flask. Please keep in mind that the trained models below are terrible by any performance measure you might choose, but they are trained models nevertheless; the focus is on serving them as a web service.

First we complete the necessary imports and acquire the data.

For this particular application we build our predictive model using only three predictors: ‘duration’, ‘amount’ and ‘age’.
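As a minimal sketch of this step, assuming the data sits in a CSV file with one column per attribute, selecting the three predictors might look as follows. The target column name ‘credit_risk’ and the handful of made-up rows are assumptions for illustration only; they are not the real dataset.

```python
import io

import pandas as pd

PREDICTORS = ["duration", "amount", "age"]
TARGET = "credit_risk"  # hypothetical column name, adjust to your copy of the data

# Made-up rows standing in for the real file. In practice you would point
# pd.read_csv at wherever your copy of the German Credit data lives.
SAMPLE_CSV = """duration,amount,age,purpose,credit_risk
12,2500,35,car,1
48,9000,23,furniture,0
6,800,61,education,1
24,4100,44,car,0
"""


def split_features_and_target(frame):
    """Keep only the three predictors and separate out the target."""
    return frame[PREDICTORS], frame[TARGET]


data = pd.read_csv(io.StringIO(SAMPLE_CSV))
X, y = split_features_and_target(data)
print(X.shape)  # (4, 3)
```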

We train both a logistic regression model and a random forest model. We don’t bother generating training and test datasets, nor do we compute any performance metrics; we are just interested in the trained models themselves.
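A rough sketch of the training step could look like this. The randomly generated frame is only a stand-in for the three German Credit predictors, and the hyperparameters (‘max_iter’, ‘n_estimators’) and the dictionary keys are our own assumptions rather than settings taken from the original script.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def train_models(X, y):
    """Fit both classifiers on the full data, with no train/test split."""
    logistic = LogisticRegression(max_iter=1000).fit(X, y)
    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    return {"logistic": logistic, "forest": forest}


# Random stand-in data with the same three column names (not the real dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "duration": rng.integers(6, 60, size=200),
    "amount": rng.integers(500, 10000, size=200),
    "age": rng.integers(19, 75, size=200),
})
y = rng.integers(0, 2, size=200)

models = train_models(X, y)
```

Both fitted objects expose the same ‘predict’ and ‘predict_proba’ methods, which is the uniform syntax that makes it easy to swap one model for the other later on.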

Using joblib we can save the trained models to the local disk. This allows us to use these models later. This can be very useful in general, beyond API deployments. We could load the model into another Python script and feed it data, for whatever purpose.
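The round trip through joblib can be sketched as below. The file name ‘logreg.joblib’ and the temporary directory are hypothetical choices, and the small model is fitted on random numbers purely so the example runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit a small model on random data, just to have something to persist.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Save the fitted model to disk (hypothetical location and file name).
model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "logreg.joblib")
joblib.dump(model, model_path)

# Later, possibly in a different script, load it back and use it directly.
restored = joblib.load(model_path)
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

Keep in mind that a persisted model is a pickle under the hood, so it should be loaded with a scikit-learn version compatible with the one used for training, and only from sources you trust.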

Flask is a general purpose microframework to write web services and allows entire applications to be built around generating predictions. We will build a very simple Flask application with only one endpoint that uses both the trained and stored models from the training script.

First we import the necessary libraries, initialise the Flask application and make the models available to our application by reading them from disk.
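One possible shape for this setup is sketched below. The file names in ‘MODEL_FILES’, the ‘create_app’ factory and the use of ‘app.config’ are our own assumptions; loading the models into module-level variables works just as well. Dummy classifiers are saved to a temporary directory here only so the sketch runs without the real training script.

```python
import os
import tempfile

import joblib
from flask import Flask
from sklearn.dummy import DummyClassifier

# Hypothetical file names; use whatever the training script wrote to disk.
MODEL_FILES = {"logistic": "logreg.joblib", "forest": "forest.joblib"}


def load_models(model_dir):
    """Read every persisted model from disk once, at start-up."""
    return {
        name: joblib.load(os.path.join(model_dir, file_name))
        for name, file_name in MODEL_FILES.items()
    }


def create_app(model_dir):
    """Create the Flask application and attach the loaded models to it."""
    app = Flask(__name__)
    app.config["MODELS"] = load_models(model_dir)
    return app


# Stand-in models so the example is self-contained.
demo_dir = tempfile.mkdtemp()
stand_in = DummyClassifier(strategy="most_frequent").fit([[0, 0, 0], [1, 1, 1]], [0, 1])
for file_name in MODEL_FILES.values():
    joblib.dump(stand_in, os.path.join(demo_dir, file_name))

app = create_app(demo_dir)
```

Loading the models once at start-up, rather than inside each request, keeps the response time of the endpoint independent of the size of the model files.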

We define several helper functions that, respectively, put together the input as a dataframe, generate predictions and format the output.
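The three helpers could be sketched along these lines. The function names, the output keys and the choice to report the highest class probability are all assumptions made for illustration; a dummy classifier stands in for the real models.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

FEATURES = ["duration", "amount", "age"]


def build_input(duration, amount, age):
    """Put the three raw inputs into a one-row dataframe the model accepts."""
    return pd.DataFrame({"duration": [duration], "amount": [amount], "age": [age]})


def predict_one(model, frame):
    """Return the predicted class and the highest class probability."""
    label = int(model.predict(frame)[0])
    probability = float(model.predict_proba(frame)[0].max())
    return label, probability


def format_output(model_name, label, probability):
    """Shape the result as a plain dict that can be serialised to JSON."""
    return {"model": model_name, "prediction": label, "probability": round(probability, 4)}


# Stand-in model fitted on made-up rows, only to exercise the helpers.
train = pd.DataFrame({"duration": [6, 48, 24], "amount": [1000, 6000, 3000], "age": [60, 22, 35]})
stand_in = DummyClassifier(strategy="prior").fit(train[FEATURES], [0, 1, 1])

frame = build_input(12, 2500, 35)
result = format_output("logistic", *predict_one(stand_in, frame))
```

Converting the NumPy values to plain ‘int’ and ‘float’ before returning them avoids serialisation errors when the result is turned into JSON.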

We define our first and only endpoint, which takes four parameters: the three inputs our models expect and a string that we use to decide which of the two predictive models to query. This immediately shows the flexibility of a custom API, where we can code any logic to be part of an endpoint.
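A hedged sketch of such an endpoint is shown below. The route ‘/predict’, the query parameter names and the error messages are hypothetical, and two dummy classifiers take the place of the stored logistic regression and random forest so the example can run on its own.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.dummy import DummyClassifier

FEATURES = ["duration", "amount", "age"]
app = Flask(__name__)

# Stand-ins for the two persisted models (made-up training rows).
_train = pd.DataFrame({"duration": [6, 48, 24], "amount": [1000, 6000, 3000], "age": [60, 22, 35]})
MODELS = {
    "logistic": DummyClassifier(strategy="most_frequent").fit(_train, [1, 0, 1]),
    "forest": DummyClassifier(strategy="most_frequent").fit(_train, [0, 0, 1]),
}


@app.route("/predict", methods=["GET"])
def predict():
    # Fourth parameter: which of the two models should answer the query.
    model_name = request.args.get("model", default="logistic")
    if model_name not in MODELS:
        return jsonify(error=f"unknown model '{model_name}'"), 400

    # The three model inputs arrive as strings and must be converted.
    try:
        values = {name: [float(request.args.get(name))] for name in FEATURES}
    except (TypeError, ValueError):
        return jsonify(error="duration, amount and age must be numeric"), 400

    frame = pd.DataFrame(values, columns=FEATURES)
    label = int(MODELS[model_name].predict(frame)[0])
    return jsonify(model=model_name, prediction=label)


# Flask's built-in test client lets us call the endpoint without a server.
client = app.test_client()
response = client.get("/predict?duration=12&amount=2500&age=35&model=forest")
```

Validating the inputs inside the endpoint and answering with a 400 status means a malformed request produces a readable error instead of a stack trace.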

When you run the web service, it is available on the web address ‘’, which is your own machine. Naturally this does not make the model available to a wider audience. For that you’d need to take the next steps and serve the model on a publicly accessible server. When you do that, security, authentication, and performance become relevant. A commonly used stack, which may or may not be coupled with Docker, is Nginx, Gunicorn and Supervisor. The web application we have written above would sit in the Flask box in the diagram, and the training script sits outside the diagram as it is only run to generate the trained models.

Making sure ML models are widely accessible is a bottleneck that limits the potential impact of Data Science projects. Depending on the nature of the problem the model is trying to solve or assist with, it may be necessary for the model to be available for queries ‘at all times’. If so, a request-response web application is the answer in many cases. These come in many shapes and sizes, and a custom-built Flask application is certainly a powerful one.

The flexibility of Flask also allows you to build slightly more extensive applications, beyond just serving the trained models. You can think of an endpoint that re-trains a particular model, an endpoint that takes a .csv file as input and predicts all instances, an endpoint that logs queries to a database so you can inspect usage of the model, or an endpoint that provides some model performance statistics the user may be interested in.
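To illustrate one of these ideas, the .csv endpoint might be sketched as follows. The route ‘/predict_batch’, the form field name ‘file’ and the response layout are hypothetical, and a dummy classifier again stands in for a real trained model.

```python
import io

import pandas as pd
from flask import Flask, jsonify, request
from sklearn.dummy import DummyClassifier

FEATURES = ["duration", "amount", "age"]
app = Flask(__name__)

# Stand-in model fitted on made-up rows so the sketch runs on its own.
_train = pd.DataFrame({"duration": [6, 48, 24], "amount": [1000, 6000, 3000], "age": [60, 22, 35]})
model = DummyClassifier(strategy="most_frequent").fit(_train, [1, 0, 1])


@app.route("/predict_batch", methods=["POST"])
def predict_batch():
    # The client uploads the file as multipart form data.
    uploaded = request.files.get("file")
    if uploaded is None:
        return jsonify(error="attach a .csv file under the form field 'file'"), 400

    frame = pd.read_csv(io.BytesIO(uploaded.read()))
    missing = [name for name in FEATURES if name not in frame.columns]
    if missing:
        return jsonify(error=f"missing columns: {missing}"), 400

    # One prediction per row, in the order the rows were uploaded.
    predictions = model.predict(frame[FEATURES])
    return jsonify(predictions=[int(p) for p in predictions])


# Exercise the endpoint with Flask's test client and a two-row upload.
client = app.test_client()
payload = {"file": (io.BytesIO(b"duration,amount,age\n12,2500,35\n36,7000,29\n"), "batch.csv")}
response = client.post("/predict_batch", data=payload, content_type="multipart/form-data")
```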

As a very quick demo case, we have written simple, native VBA code to integrate our API in Excel. When a user clicks the button, it picks up the four input fields from the Excel sheet, sends them as input to our API endpoint, and displays the (JSON) response as a message box in Excel. Remember that the endpoint we have created generates single predictions, but we could equally have written VBA code to send an entire ‘table’ to be predicted and adjusted our endpoint accordingly.
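The same client logic can be sketched in Python using only the standard library, which shows that any caller only needs to build a URL and read the JSON that comes back. This is not the VBA code from the demo; the ‘/predict’ path and the parameter names are assumptions, and the base address is left as an argument because it depends on where the service runs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(base_url, duration, amount, age, model):
    """Encode the four input fields as a query string on the endpoint URL."""
    query = urlencode({"duration": duration, "amount": amount, "age": age, "model": model})
    return f"{base_url}/predict?{query}"


def query_service(url, timeout=10):
    """Call the web service and decode its JSON response into a dict."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no running server; the host below is a placeholder.
url = build_request_url("http://example.invalid", 12, 2500, 35, "forest")
```

Calling ‘query_service(url)’ against a running instance of the service would then return the prediction as a Python dictionary.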

