Upload to Dataiku DSS in a Webapp#

Out of the box, Dataiku DSS offers datasets and managed folders in which users can upload files from their local machines, and then build their Flow on this uploaded data. But for users with no access to the Flow, or for automated tasks, manually uploading to a dataset or a managed folder in the UI can be out of the question.

Webapps can fill this gap by letting users upload files from their machine to Dataiku DSS, and trigger actions from the webapp.

Shiny offers a fileInput widget, but the Python APIs for DSS managed folders are more convenient to use. Bokeh, for its part, requires coding the upload widget yourself. The simplest option is therefore the simplest webapp type, the standard webapp, which combines HTML/JS for the frontend with Flask for the backend.

Basic Upload#

The simplest version is an HTML form with a <input type="file" /> field, one Ajax call to send the data, and a route in the Flask app to forward the data to a managed folder via the Python API.

In a standard webapp with the jQuery, Dataiku API, and Bootstrap JavaScript libraries enabled (see the Settings tab), the frontend is:

<form style="margin: 20px;">
    <div class="form-group" id="fileGroup">
        <label for="newFile">Select file</label>
        <input class="form-control" id="newFile" type="file" />
    </div>
    <button id="uploadButton" class="btn btn-primary">Upload to DSS</button>
</form>

$('#uploadButton').click(function (e) {
    e.preventDefault();
    let newFile = $('#newFile')[0].files[0];
    let form = new FormData();
    form.append('file', newFile);
    $.ajax({
        type: 'post',
        url: getWebAppBackendUrl('/upload-to-dss'),
        processData: false,
        contentType: false,
        data: form,
        success: function (data) { console.log(data); },
        error: function (jqXHR, status, errorThrown) { console.error(jqXHR.responseText); }
    });
});

The Ajax call targets a route /upload-to-dss in the Flask backend, whose code is:

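import json

import dataiku
from flask import request

# `app` is the Flask application that DSS makes available in the webapp backend
@app.route('/upload-to-dss', methods=['POST'])
def upload_to_dss():
    f = request.files.get('file')
    if f is None:
        # the form was submitted without a file
        return json.dumps({"status": "error", "message": "no file provided"}), 400
    mf = dataiku.Folder('box')  # name of the managed folder in the Flow
    target_path = '/%s' % f.filename
    mf.upload_stream(target_path, f)
    return json.dumps({"status": "ok"})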

This produces a UI like:

../../../../_images/Screenshot-2020-03-25-at-16.04.38.png

Select a file, hit the upload button, and voilà: the file is now in the managed folder (named box), ready to be used in a Flow or by the webapp itself!
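The route above builds the target path directly from the filename sent by the browser. As a minimal sketch, assuming you want to keep only the last path component and a conservative character set, the name could be normalized first with the standard library. The helper name `safe_target_path` and the fallback name `upload.bin` are hypothetical and not part of the Dataiku API:

```python
import posixpath
import re


def safe_target_path(filename, fallback="upload.bin"):
    """Return a '/name' path built from the last component of `filename`."""
    # Some clients send Windows-style paths, so normalize separators first
    name = posixpath.basename((filename or "").replace("\\", "/"))
    # Keep letters, digits, dot, dash and underscore; replace anything else
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    # Fall back to a fixed name when nothing usable is left
    return "/" + (name or fallback)
```

In the route, `target_path = safe_target_path(f.filename)` could then stand in for the string formatting line.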

Adding Parameters#

Just sending a file to the Python backend is often not enough, and additional parameters might be needed. To add a field to the form and retrieve its value in the backend, add:

...
<div class="form-group" id="paramGroup">
    <label for="someParam">Some param</label>
    <input class="form-control" id="someParam" type="text" />
</div>
...

...
let form = new FormData();
form.append('file', newFile);
form.append('extra', $('#someParam').val());
...

...
extra_param = request.form.get('extra', '')
f = request.files.get('file')
...
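What the backend does with the extra value depends on the webapp. Purely as an illustration, the sketch below assumes the parameter names an optional subfolder inside the managed folder. The helper `target_path_for` is hypothetical, and the tutorial itself does not prescribe any particular use of the parameter:

```python
import posixpath


def target_path_for(filename, extra=""):
    """Build a managed-folder path, using `extra` as an optional subfolder."""
    # Drop surrounding whitespace and slashes so the value is one path segment
    subfolder = extra.strip().strip("/")
    if "/" in subfolder or subfolder in (".", ".."):
        raise ValueError("extra must be a single folder name")
    # An empty subfolder leaves the file at the root of the folder
    return posixpath.join("/", subfolder, filename)
```

With an empty parameter the result is the same `/filename` path as in the basic route, so the form field can stay optional.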

Simple UI Improvements#

To make the upload a bit more pleasant, you can tweak the HTML/JS to add drag & drop on the form field:

// prevent the browser's default handling while dragging over the input
$('#newFile').on('dragover', function(e) {
    e.preventDefault();
    e.stopPropagation();
});
// dim the input while a file is dragged over it
$('#newFile').on('dragenter', function(e) {
    e.preventDefault();
    e.stopPropagation();
    $("#newFile").css("opacity", "0.5");
});
// restore the input when the drag leaves the form group
$('#fileGroup').on('dragleave', function(e) {
    e.preventDefault();
    e.stopPropagation();
    $("#newFile").css("opacity", "");
});
// on drop, hand the dropped files over to the file input
$('#newFile').on('drop', function(e){
    $("#newFile").css("opacity", "");
    if(e.originalEvent.dataTransfer && e.originalEvent.dataTransfer.files.length) {
        e.preventDefault();
        e.stopPropagation();
        $("#newFile")[0].files = e.originalEvent.dataTransfer.files;
    }
});

If the files to upload can be large, it’s a good idea to show the progress of the upload, so that users don’t get the impression that nothing is happening. A simple progress bar would be:

// remove the progress bar, if any
let stopUpload = function() {
    $("#progress").remove();
};

// (re)create an empty progress bar under the file input
let startUpload = function() {
    stopUpload();
    let progress = $('<div id="progress"/>').css("height", "10px").css("margin", "10px 0").css("background","lightblue").css("width", "0");
    $('#fileGroup').append(progress);
};
...
$.ajax({
    type: 'post',
    ...
    xhr: function() {
        startUpload();
        var ret = new window.XMLHttpRequest();
        ret.upload.addEventListener("progress", function(evt) {
            if (evt.lengthComputable) {
                // Math.floor avoids parseInt misreading very small ratios such as 5e-7
                var pct = Math.floor(evt.loaded / evt.total * 100);
                $('#progress').css("width", pct + "%");
            }
        }, false);
        return ret;
    },
    complete: function() { stopUpload(); }
});

Further improvements could include feedback once the upload completes, styling, and so on.
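For post-upload feedback, the backend would need to return something the frontend can display. A minimal sketch, assuming a small JSON summary is enough: the helper name `upload_response` and the field names `filename` and `size_bytes` are illustrative choices, not a format defined by DSS:

```python
import json


def upload_response(filename, size_bytes):
    """Serialize a small summary of the upload for the frontend to display."""
    return json.dumps({
        "status": "ok",
        "filename": filename,
        "size_bytes": size_bytes,
    })
```

The route could return this string in place of the fixed status payload, and the `success` callback could show its contents next to the form.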

What’s Next?#

To learn more about HTML/JS, Python Bokeh, and R Shiny webapps in Dataiku DSS, visit the Dataiku Academy for tutorials and examples.

Here are the complete versions of the code presented in this tutorial:

HTML Code
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-GLhlTQ8iRABdZLl6O3oVMWSktQOp6b7In1Zl3/Jr59b6EGGoI1aFkw7cmDA6j6gD" crossorigin="anonymous"></link>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/js/bootstrap.bundle.min.js" integrity="sha384-w76AqPfDkMBDXo30jS1Sgez6pr3x5MlQ1ZAGC+nuZB+EYdgRZgiwxhTBTkF7CXvN" crossorigin="anonymous"></script>

<form style="margin: 20px;">
    <div class="form-group" id="fileGroup">
        <label for="newFile">Select file</label>
        <input class="form-control" id="newFile" type="file" />
    </div>
    <div class="form-group" id="paramGroup">
        <label for="someParam">Some param</label>
        <input class="form-control" id="someParam" type="text" />
    </div>

    <button id="uploadButton" class="btn btn-primary">Upload to DSS</button>
</form>

JS Code
$('#uploadButton').click(function (e) {
    e.preventDefault();
    let newFile = $('#newFile')[0].files[0];
    let form = new FormData();
    form.append('file', newFile);
    form.append('extra', $('#someParam').val())
    $.ajax({
        type: 'post',
        url: getWebAppBackendUrl('/upload-to-dss'),
        processData: false,
        contentType: false,
        data: form,
        success: function (data) { console.log(data); },
        error: function (jqXHR, status, errorThrown) { console.error(jqXHR.responseText); },
        xhr: function() {
            startUpload();
            var ret = new window.XMLHttpRequest();
            ret.upload.addEventListener("progress", function(evt) {
                if (evt.lengthComputable) {
                    // Math.floor avoids parseInt misreading very small ratios such as 5e-7
                    var pct = Math.floor(evt.loaded / evt.total * 100);
                    $('#progress').css("width", pct + "%");
                }
            }, false);
            return ret;
        },
        complete: function() { stopUpload();}
    });
});

$('#newFile').on('dragover', function(e) {
    e.preventDefault();
    e.stopPropagation();
});
$('#newFile').on('dragenter', function(e) {
    e.preventDefault();
    e.stopPropagation();
    $("#newFile").css("opacity", "0.5")
});
$('#fileGroup').on('dragleave', function(e) {
    e.preventDefault();
    e.stopPropagation();
    $("#newFile").css("opacity", "")
});
$('#newFile').on('drop', function(e){
    $("#newFile").css("opacity", "")
    if(e.originalEvent.dataTransfer && e.originalEvent.dataTransfer.files.length) {
        e.preventDefault();
        e.stopPropagation();
        $("#newFile")[0].files = e.originalEvent.dataTransfer.files;
    }
});

let stopUpload = function() {
    $("#progress").remove();
};

let startUpload = function() {
    stopUpload();
    let progress = $('<div id="progress"/>').css("height", "10px").css("margin", "10px 0").css("background","lightblue");
    $('#fileGroup').append(progress);
};

Python Code
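import json

import dataiku
from flask import request

# `app` is the Flask application that DSS makes available in the webapp backend
@app.route('/upload-to-dss', methods=['POST'])
def upload_to_dss():
    extra_param = request.form.get('extra', '')
    f = request.files.get('file')
    if f is None:
        # the form was submitted without a file
        return json.dumps({"status": "error", "message": "no file provided"}), 400
    mf = dataiku.Folder('box')  # name of the managed folder in the Flow
    target_path = '/%s' % f.filename
    mf.upload_stream(target_path, f)
    return json.dumps({"status": "ok"})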