Data Pipes

Shareable Simple Data Transformations

Data Pipes is an online service for simple transformations on tabular data (deleting rows and columns, find and replace, filtering, viewing as HTML). These transformations can be connected together, Unix-pipes style, to build more complex ones.

Plus: because it is an online service, you can share your pipeline (and its output data) by copying and pasting the URL.

Quick start

Example

To illustrate, here's an example that shows the power of Data Pipes: cleaning up and displaying a raw spending data CSV file from the Greater London Authority.

http://datapipes.okfnlabs.org/csv/head -n 50/cut 0/delete 1:7/grep -i London/html?url=https://raw.github.com/okfn/datapipes/master/test/data/gla.csv

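This parses the source file as CSV and pipes it through five operations in order: head -n 50, cut 0, delete 1:7, grep -i London and html.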


API

The basic API is of the form:

/csv/{transform} {args}/?url={source-url}

For example, here is a head operation, which shows the first n rows of a file (with no arguments it shows the first 10 rows):

/csv/head/?url={source-url}

With arguments (showing first 20 rows):

/csv/head -n 20/?url={source-url}
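As a rough illustration of the URL form above, here is a minimal Python sketch that assembles a single-transform request URL. The helper name `single_transform_url` is hypothetical, the host is taken from the example URL earlier in this document, and the sketch assumes the service accepts percent-encoded spaces in the path (which is what browsers send).

```python
from urllib.parse import quote, urlencode

# Host taken from the example URL in this document.
BASE = "http://datapipes.okfnlabs.org"


def single_transform_url(transform, source_url, args=""):
    """Build /csv/{transform} {args}/?url={source-url} (hypothetical helper)."""
    # Join the transform name and its arguments, e.g. "head -n 20".
    step = f"{transform} {args}".strip()
    # Percent-encode the step so spaces are safe inside a URL path
    # (assumption: the service decodes them back).
    path = "/csv/" + quote(step, safe="") + "/"
    # The source URL goes in the query string and must be encoded as well.
    return BASE + path + "?" + urlencode({"url": source_url})


if __name__ == "__main__":
    print(single_transform_url("head", "https://example.com/data.csv", "-n 20"))
```

The source URL `https://example.com/data.csv` is a placeholder, not a real dataset.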

Piping

You can also do piping, that is, pass the output of one transformation as the input to another:

/csv/{trans1} {args}/{trans2} {args}/.../?url={source-url}
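The piped form can be sketched the same way: each transformation becomes one path segment, in order. The helper name `pipeline_url` is hypothetical, the default host is the one from the example URL in this document, and percent-encoding of each segment is an assumption about what the service accepts.

```python
from urllib.parse import quote, urlencode


def pipeline_url(steps, source_url, base="http://datapipes.okfnlabs.org"):
    """Build /csv/{trans1} {args}/{trans2} {args}/.../?url={source-url}.

    `steps` is a list of strings such as ["head -n 50", "cut 0"].
    Hypothetical helper; it only builds the URL and sends no request.
    """
    # Encode each step separately so the "/" between steps stays a
    # path separator while spaces and ":" inside a step are escaped.
    path = "/".join(quote(step, safe="") for step in steps)
    return f"{base}/csv/{path}/?{urlencode({'url': source_url})}"


if __name__ == "__main__":
    steps = ["head -n 50", "cut 0", "delete 1:7", "grep -i London", "html"]
    print(pipeline_url(steps, "https://example.com/data.csv"))
```

Keeping the steps as a list makes it easy to add, remove or reorder transformations before sharing the resulting URL.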

Input Formats

At present we only support CSV but we are considering support for JSON, plain text and RSS.

If you are interested in JSON support, vote here.

Query string substitution

Some characters can't be used in a URL path because of URL syntax restrictions. If this is a limitation (for instance, if you need backslashes in your grep regex), you can define variables in the query string and substitute them into the path. For example:

/csv/grep $dt/html/?dt=\d{2}-\d{2}-\d{4}&url={source-url}
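A minimal sketch of building such a URL from Python follows. The helper name `grep_url` is hypothetical, and the sketch assumes the service decodes a percent-encoded query value before substituting it for the `$` variable.

```python
from urllib.parse import urlencode


def grep_url(pattern, source_url, var="dt", base="http://datapipes.okfnlabs.org"):
    """Build /csv/grep ${var}/html/?{var}={pattern}&url={source-url}.

    Hypothetical helper: the regex travels in the query string, and the
    path only carries the $-prefixed variable name.
    """
    # urlencode escapes backslashes and braces, which are not valid
    # unescaped in a URL.
    query = urlencode({var: pattern, "url": source_url})
    # "%20" is the encoded space between "grep" and the variable.
    return f"{base}/csv/grep%20${var}/html/?{query}"


if __name__ == "__main__":
    # Raw string so the backslashes reach the URL unchanged.
    print(grep_url(r"\d{2}-\d{2}-\d{4}", "https://example.com/data.csv"))
```

Passing the pattern as a raw string avoids Python interpreting `\d` as an escape sequence before it is encoded.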

CORS and JS web apps

CORS is supported so you can use this from pure JS web apps.

Transform Operations

The basic operations are inspired by Unix-style commands such as head, cut, grep and sed, but really anything a map function can do could be supported. (Suggest new operations here.)

Contributing

Under the hood, Data Pipes is a simple open-source Node.js web app living here on GitHub.

It's super easy to contribute and here are some of the current issues.