We’ve updated the PetaJakarta website with a video explaining how to add Tweets to the flood map.

Originally posted on The Big Boulder Initiative:

Twitter #DataGrants offer academics access to social data with the intention of changing the world. At today’s panel, three researchers spoke about how they plan to use Twitter data to answer big questions around health, disaster response, and sentiment analysis. They also discussed the best ways for the social data industry to work with academia at large to encourage new ideas and collaboration, and how to train the next generation of scientists to use social data effectively.

John Brownstein of Boston Children’s Hospital / Harvard Medical School plans to use Twitter data to track foodborne illness, which generally goes unreported due to its fleeting presence. Tomas Holderness of the University of Wollongong will use Twitter data to track and test disaster response and decision making during annual flooding in Jakarta, Indonesia, so that future flood damage can be mitigated in real time. Finally, Mehrdad Yazdani of UCSD is using machine learning and artificial…

I recently published a short essay in the IEEE Technology & Society Magazine about the opportunities and research challenges that social media present as a data source for understanding complex urban systems in informal settlements. The post below is a synopsis of the article, posted on IdeaPod.

Social media, driven by the explosive uptake in mobile computing, has caused a systematic shift in personal communications on a global scale. From the Arab Spring to the Occupy Movement it is apparent that social media is becoming an integrated part of our global communication infrastructure. Critically, much of this information is underpinned by geographical content such as mobile GPS coordinates, which enable the user to tie their media to a specific location on the Earth’s surface. In this new paradigm, social media are effectively forming a human-powered sensor network.

As world populations continue to grow, and we face the social, climatic and economic challenges of the 21st century, how can we leverage the potential of this new global network of intelligent sensors? How can we use this data to inform us about the urban system and adapt to global change?

Article originally published in IEEE Technology & Society Magazine Spring 2014: http://ro.uow.edu.au/smartpapers/119/

Previously, in this blog post, I discussed the ways in which we’re tackling the infrastructure challenges in developing nations using open data. Below are the slides I presented at the first International Symposium for Next Generation Infrastructure. The work presented is a proof-of-concept model using data from Map Kibera to optimise a road-based sewage network. The great thing about using this data is that for the first time we can glean an insight into infrastructure provision in informal urban settlements, and examine methods to improve it.

The world is becoming more urban, so how do we ensure our infrastructure meets the social, economic and climatic challenges of the 21st century? This was the theme of the recent International Symposium for Next Generation Infrastructure (ISNGI) hosted by the SMART Infrastructure Facility at the University of Wollongong. The symposium highlighted some of the great research going on in Australia and around the world to understand how we can make our cities and their infrastructure sustainable for future generations.

IWA Poster

Sanitation Network Modelling Poster

But what about developing nations? How do we model infrastructure when there’s no data? Or when the system is changing so fast that traditional data collection techniques become obsolete? How do we quantify the infrastructure requirements of a slum when its population fluctuates by 800,000 people annually? More importantly, how do you engage with that community to understand their needs?

One solution is to use open data and open tools. The world is becoming more connected, and crowd-sourced data offer, for the first time, an insight into infrastructure in some of the world’s poorest cities and informal settlements, many of which have never been mapped before. The Map Kibera project is a great example of this. In collaboration with colleagues from the UK, we built a prototype model to demonstrate the utility of data from Map Kibera and OpenStreetMap for spatio-topological network modelling, to optimise road-based sanitation for Kibera. I presented this work at ISNGI, and Ruth recently presented a poster of this work at the International Water Association Congress and Exhibition in Nairobi. We’ve demonstrated it’s possible; the challenge now is to make it work in the real world.
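As a rough illustration of one building block that road-based network modelling relies on, here is a multi-source Dijkstra shortest-path sketch: given a road network as a weighted graph, it finds the road distance from every node to its nearest collection or treatment point. This is not the prototype model described above; the function name, the graph, the node names and the facility locations are all hypothetical toy data.

```python
import heapq


def distance_to_nearest_facility(graph, facilities):
    """Multi-source Dijkstra over a road graph.

    graph: {node: [(neighbour, length), ...]} with non-negative edge lengths.
    facilities: nodes where waste can be collected or treated (hypothetical).
    Returns {node: (distance, facility)} for every reachable node.
    """
    best = {}
    heap = [(0.0, f, f) for f in facilities]  # every facility starts at distance 0
    heapq.heapify(heap)
    while heap:
        dist, node, source = heapq.heappop(heap)
        if node in best:
            continue  # already settled via a shorter route
        best[node] = (dist, source)
        for neighbour, length in graph.get(node, []):
            if neighbour not in best:
                heapq.heappush(heap, (dist + length, neighbour, source))
    return best


# Toy road network: undirected edges listed in both directions (lengths in metres).
roads = {
    "A": [("B", 100.0)],
    "B": [("A", 100.0), ("C", 50.0), ("D", 200.0)],
    "C": [("B", 50.0), ("E", 80.0)],
    "D": [("B", 200.0), ("E", 60.0)],
    "E": [("C", 80.0), ("D", 60.0)],
}
result = distance_to_nearest_facility(roads, ["A", "E"])
```

Running all facilities from a single priority queue gives each node its nearest facility in one pass, rather than one shortest-path search per facility.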

I recently discovered the GeoAlchemy2 project – a replacement for the original GeoAlchemy package, focused on providing PostGIS support for SQLAlchemy. The SQLAlchemy package is a “Python SQL Toolkit and Object Relational Mapper”. In a nutshell this means you can write Python classes and map them to PostgreSQL tables without the need to write SQL statements – pretty cool!

PostGIS is great for doing spatial stuff, but if you’re using it as the back-end for a Python app you can spend a lot of time writing Python wrappers around SQL statements, and even with the excellent Psycopg2 package this can be tricky. This is especially true if you’re using the OGR Python bindings to handle PostGIS reads and writes.

Enter GeoAlchemy2. I’ve been experimenting with it for a week, and in that time I’ve learnt this:

For developing geospatial Python apps with PostGIS, GeoAlchemy2 is nothing short of revolutionary.

You can call PostGIS functions in Python, which means you can use them (and the data) directly within your Python application logic. Here’s an example. The SQL statement below uses PostGIS to create a new line geometry between a point and the closest point on the nearest line.
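To make the geometry concrete: in PostGIS, functions such as ST_ClosestPoint and ST_MakeLine handle this kind of calculation in SQL. The sketch below is a plain-Python illustration of the same idea on planar (x, y) coordinates with no database: find the line nearest to a point, find the closest point on that line, and join the two. The function names and the toy `roads` data are hypothetical.

```python
import math


def closest_point_on_segment(p, a, b):
    """Project point p onto segment a-b, clamped to the segment's end points."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return a  # degenerate segment: both end points coincide
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # stay within the segment
    return (ax + t * dx, ay + t * dy)


def closest_point_on_line(p, line):
    """Closest point on a polyline (list of vertices) to p, plus its distance."""
    best, best_d = None, math.inf
    for a, b in zip(line, line[1:]):
        c = closest_point_on_segment(p, a, b)
        d = math.dist(p, c)
        if d < best_d:
            best, best_d = c, d
    return best, best_d


def link_to_nearest_line(p, lines):
    """Return the link line [p, closest point] to the nearest of several polylines."""
    closest, _ = min((closest_point_on_line(p, ln) for ln in lines), key=lambda r: r[1])
    return [p, closest]


roads = [
    [(0.0, 0.0), (10.0, 0.0)],              # straight line along y = 0
    [(0.0, 5.0), (5.0, 5.0), (5.0, 10.0)],  # L-shaped line
]
link = link_to_nearest_line((3.0, 4.0), roads)
```

A spatial database does the same job far faster on large datasets by using a spatial index to shortlist candidate lines instead of checking every one.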

Now here’s a snippet from a Python script that performs the same process using GeoAlchemy2.

You’ll notice here that we’re actually calling our own Python function “make_link_line” during the query to create the new geometry. This exemplifies how we can move PostGIS objects around inside the script. Once the query runs we can access the returned data in our application from the row variable. Below is the complete script.
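The post doesn't show what `make_link_line` does internally, so the following is only a hypothetical illustration of how such a helper could pass geometry between Python and PostGIS: it takes two points as Well-Known Text (WKT) and returns a two-vertex WKT LINESTRING, a format PostGIS can read back with ST_GeomFromText (GeoAlchemy2's WKTElement class is one way to wrap such a string). The parsing handles only simple 2D `POINT(x y)` strings.

```python
def parse_wkt_point(wkt):
    """Parse a simple 2D WKT 'POINT(x y)' string into an (x, y) tuple of floats."""
    body = wkt.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError("expected a WKT POINT, got %r" % wkt)
    inner = body[body.index("(") + 1 : body.rindex(")")]
    x, y = inner.split()
    return float(x), float(y)


def make_link_line(start_wkt, end_wkt):
    """Hypothetical helper: join two WKT points with a two-vertex WKT LINESTRING."""
    (x1, y1), (x2, y2) = parse_wkt_point(start_wkt), parse_wkt_point(end_wkt)
    # %r keeps full float precision, which matters for projected coordinates
    return "LINESTRING(%r %r, %r %r)" % (x1, y1, x2, y2)


line_wkt = make_link_line("POINT(3 4)", "POINT(3 5)")
```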

Nearest neighbour PostGIS and GeoAlchemy2 script:

The script above is just a simple example, but it shows how powerful GeoAlchemy2 is for embedding PostGIS objects and methods inside Python. I’m really looking forward to digging deeper into the functionality of GeoAlchemy2 and SQLAlchemy to integrate them within my own projects. Check out the official tutorials for more examples: https://geoalchemy-2.readthedocs.org/en/latest/#tutorials

As part of my PhD I had to produce a land cover map for the Greater London area. I derived a simple land cover classification using the UKMap Basemap (which I previously used to generate the 3D London map). Click on the image below to see a larger version (1.8Mb at 300dpi).

London land cover map

Land cover in the Greater London area

I created the layers using a separate PostGIS table for each land cover type, based on the Basemap’s Feature Type Code (FTC), which classifies land use according to the National Land Use Database. Using separate tables also significantly improved rendering performance in Quantum GIS (QGIS), which I used for the cartography. I was impressed by QGIS’ ability to process and render such a detailed dataset (the Basemap contains ~11 million polygons for London).
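A minimal sketch of the one-table-per-class approach: generate a CREATE TABLE ... AS SELECT statement, plus a GiST spatial index, for each land cover class from a lookup of FTC values. The FTC codes, table names and column names here are placeholders made up for illustration, not the actual UKMap codes or the schema used for the map; the generated SQL could then be run with psql or a database driver.

```python
# Hypothetical mapping from land cover class to Feature Type Codes (FTCs).
# These codes are placeholders, NOT real UKMap Basemap values.
LAND_COVER_CLASSES = {
    "water": [1001, 1002],
    "vegetation": [2001, 2002, 2003],
    "buildings": [3001],
}


def make_layer_sql(source_table, ftc_column, geom_column, classes):
    """Build CREATE TABLE ... AS SELECT and CREATE INDEX statements per class.

    Table and column names are interpolated directly, so they must be trusted
    identifiers; the FTC values are forced to integers.
    """
    statements = []
    for name, codes in sorted(classes.items()):
        code_list = ", ".join(str(int(c)) for c in codes)
        table = "landcover_%s" % name
        statements.append(
            "CREATE TABLE %s AS SELECT * FROM %s WHERE %s IN (%s);"
            % (table, source_table, ftc_column, code_list)
        )
        # A spatial index helps a GIS client fetch only the features in view
        statements.append(
            "CREATE INDEX %s_geom_idx ON %s USING GIST (%s);" % (table, table, geom_column)
        )
    return statements


stmts = make_layer_sql("basemap", "ftc", "geom", LAND_COVER_CLASSES)
```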


As I had a lot of spare time on my hands before Christmas, I decided to build a bookcase as a present for my girlfriend (an avid book reader and collector). The design is based on ‘nomad style’ furniture, which doesn’t require nails or glue; I really liked the idea that it can be dismantled and easily transported. I got my inspiration from Instructables.


The finished product

The bookcase measures 90 × 65 × 21 cm, based on my measurements of our existing bookshelves. The biggest challenge was finding a timber merchant who stocked planed square edged (PSE) wood deeper than 19 cm; I think anything less is a bit shallow for most books.

Apart from drilling holes to help chisel some of the joints, I managed to build it without any other power tools. The lid sits on four dowel pegs, and I finished the whole thing with a couple of coats of boiled linseed oil, which provided a lovely finish. I’m happy to say it was well received on Christmas day!

Here are some more photos I took during construction.

Searching for knot-free PSE.

Making the first cut

Is it straight?

Work in progress

Cutting the first shelf tenon joint

The finished shelves

Cutting the holes in the uprights

Testing the shelves for fit

A surveyor's dream.

Plotting confidence intervals of linear regression in Python

After a friendly tweet from @tomstafford, who mentioned that this script was useful, I’ve re-posted it here in preparation for the removal of my Newcastle University pages.

This script calculates and plots 95% confidence intervals around a linear regression fit, evaluated at a series of new x-values. As I couldn’t find anything similar on the internet, I developed my own implementation based on Statistics in Geography by David Ebdon (ISBN: 978-0631136880).
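For reference, the interval evaluated at each new x-value $x_0$ is the standard confidence interval for the mean response of a simple least-squares line $\hat{y} = b_0 + b_1 x$ (as given in the references listed in the script):

$$
\hat{y}_0 \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

The denominator can equivalently be written $\sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. With the $n = 8$ example observations there are $n - 2 = 6$ degrees of freedom, so $t_{0.975,\,6} \approx 2.447$. For a prediction interval for a single new observation, a $1 +$ term is added under the square root, which makes the band wider.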

Linear regression plot

Plot of linear regression with confidence intervals

# linfit.py - example of confidence limit calculation for linear regression fitting.

# References:
# - Statistics in Geography by David Ebdon (ISBN: 978-0631136880)
# - Reliability Engineering Resource Website:
# - http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm
# - University of Glasgow, Department of Statistics:
# - http://www.stats.gla.ac.uk/steps/glossary/confidence_intervals.html#conflim

import numpy as np
import matplotlib.pyplot as plt

# example data
x = np.array([4.0,2.5,3.2,5.8,7.4,4.4,8.3,8.5])
y = np.array([2.1,4.0,1.5,6.3,5.0,5.8,8.1,7.1])

# fit a curve to the data using a least squares 1st order polynomial fit
z = np.polyfit(x,y,1)
p = np.poly1d(z)
fit = p(x)

# get the coordinates for the fit line (evaluating p at the x extremes
# also gives the right end points when the slope is negative)
c_x = [np.min(x),np.max(x)]
c_y = p(c_x)

# predict y values of original data using the fit (z[0] = slope, z[1] = intercept)
p_y = z[0] * x + z[1]

# calculate the y-error (residuals)
y_err = y - p_y

# create series of new test x-values to predict for
p_x = np.arange(np.min(x),np.max(x)+1,1)

# now calculate confidence intervals for new test x-series
mean_x = np.mean(x)			# mean of x
n = len(x)				# number of samples in original fit
t = 2.447				# t value for a two-tailed 95% interval with n-2 = 6 degrees of freedom
s_err = np.sum(np.power(y_err,2))	# sum of the squares of the residuals

# half-width of the interval: t * sqrt(s^2 * (1/n + (x0 - mean_x)^2 / Sxx)),
# where Sxx = sum(x^2) - n*mean_x^2
confs = t * np.sqrt((s_err/(n-2))*(1.0/n + (np.power((p_x-mean_x),2)/
			((np.sum(np.power(x,2)))-n*(np.power(mean_x,2))))))

# now predict y based on test x-values (slope * x + intercept)
p_y = z[0]*p_x+z[1]

# get lower and upper confidence limits based on predicted y and confidence intervals
lower = p_y - abs(confs)
upper = p_y + abs(confs)

# set-up the plot
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Linear regression and confidence limits')

# plot sample data
plt.plot(x,y,'bo',label='Sample observations')

# plot line of best fit
plt.plot(c_x,c_y,'r-',label='Regression line')

# plot confidence limits
plt.plot(p_x,lower,'b--',label='Lower confidence limit (95%)')
plt.plot(p_x,upper,'b--',label='Upper confidence limit (95%)')

# set coordinate limits
plt.xlim(np.min(x)-1,np.max(x)+1)

# configure legend (it must be created before get_legend() can return it)
plt.legend(loc='best')
leg = plt.gca().get_legend()
ltext = leg.get_texts()
plt.setp(ltext, fontsize=10)

# show the plot
plt.show()
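If you want to reuse the calculation without the plotting, it can be wrapped in a function. The sketch below is one way to do that; the function name `regression_band` and its `prediction` flag are my own additions, not part of linfit.py. The t value is still supplied by hand, and setting `prediction=True` adds the extra term for a prediction interval for a single new observation.

```python
import numpy as np


def regression_band(x, y, x_new, t_value, prediction=False):
    """Return (y_hat, lower, upper) for a least-squares line evaluated at x_new.

    prediction=False: confidence interval for the mean response.
    prediction=True:  prediction interval for a single new observation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(x)

    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    s2 = np.sum(residuals ** 2) / (n - 2)   # residual variance
    sxx = np.sum((x - x.mean()) ** 2)        # spread of x about its mean

    extra = 1.0 if prediction else 0.0
    half_width = t_value * np.sqrt(s2 * (extra + 1.0 / n + (x_new - x.mean()) ** 2 / sxx))
    y_hat = slope * x_new + intercept
    return y_hat, y_hat - half_width, y_hat + half_width


x = np.array([4.0, 2.5, 3.2, 5.8, 7.4, 4.4, 8.3, 8.5])
y = np.array([2.1, 4.0, 1.5, 6.3, 5.0, 5.8, 8.1, 7.1])
x_new = np.arange(np.min(x), np.max(x) + 1, 1)

y_hat, ci_lo, ci_hi = regression_band(x, y, x_new, t_value=2.447)
_, pi_lo, pi_hi = regression_band(x, y, x_new, t_value=2.447, prediction=True)
```

Both bands are narrowest near the mean of x and widen towards the edges of the data, because the $(x_0 - \bar{x})^2$ term grows with distance from the mean.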

Using NodeJS to develop a temperature server with the Raspberry Pi

The Raspbian Linux distribution for the Raspberry Pi includes some useful kernel drivers for accessing devices connected to the Pi’s GPIO pins. Based on the tutorial from the University of Cambridge Computer Laboratory, I’ve been playing around with the DS18B20 digital thermometer on the Pi. I’ve connected the sensor to the Pi using a standard electronics breadboard and the handy Adafruit Pi Cobbler breakout connector kit (kudos to Mills for her excellent soldering skills). When the required kernel modules (w1-gpio and w1-therm) are loaded, the driver exposes a file under the /sys/bus directory that contains the current thermometer reading (see the tutorial linked above for more).

Raspberry Pi & DS18B20

Raspberry Pi & DS18B20 digital thermometer

Developing with NodeJS

I originally wrote a Python CGI script as part of the CPC Pi Hack event to parse the sensor file and display the temperature on a web page (although we didn’t submit our hack in the end), but I’d also been looking for a project to try out NodeJS for a while. The result is a prototype JavaScript server/client app that serves the temperature from the Pi as a JSON string and displays a graph of the current temperature on the client.

This was my first NodeJS app and I was impressed with the speed of development and the readability of the documentation and examples, so much so that I managed to write the bulk of the server on my netbook during a flight from Leeds to London! The biggest disadvantage of using NodeJS on the Pi is the time it takes to compile (1 hour 58 minutes, with the Pi CPU clocked at 950MHz). While this may put some developers off, the build process was painless, and the wait is more than made up for by the efficient asynchronous nature of NodeJS once you get it running (first-pass testing shows no noticeable increase in memory/CPU use when the server is under load, although I’ve not investigated this thoroughly).

The server code is divided into two parts. The first is a dynamic response, called when a request for the “temperature.json” URL is received: the server reads the sensor file, parses the data and returns the temperature with a time-stamp (milliseconds since the Unix epoch) in JSON notation. Here’s a snippet from the server code that parses and returns the temperature data:

// `buffer` holds the raw contents of the sensor file, and `response` is the
// HTTP response object for the current request.
// Convert the buffer to a string (fast ASCII decoding) and split it on spaces.
var data = buffer.toString('ascii').split(" ");

// The last token is "t=<millidegrees>": take the number and divide by 1000 to give Celsius
var temp = parseFloat(data[data.length-1].split("=")[1])/1000.0;

// Round to one decimal place
temp = Math.round(temp * 10) / 10;

// Pair the temperature with the current time (milliseconds since the Unix epoch)
var jsonData = [Date.now(), temp];

// Return JSON data
response.writeHead(200, { "Content-type": "application/json" });
response.end(JSON.stringify(jsonData), "ascii");
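Since the first version of this project was a Python CGI script, here's a rough Python equivalent of the parsing step for comparison. It assumes the usual two-line output of the w1-therm driver: the first line ends in YES when the CRC check passes, and the second ends in t= followed by the temperature in thousandths of a degree Celsius. Unlike the JavaScript snippet, it also rejects readings whose CRC check failed. The function name and sample text are illustrative, not taken from the original code.

```python
import time


def parse_ds18b20(text, now_ms=None):
    """Parse w1_slave-style text into [timestamp_ms, celsius], or None on a bad read."""
    lines = text.strip().splitlines()
    if len(lines) < 2 or not lines[0].strip().endswith("YES"):
        return None  # incomplete read or failed CRC: the caller can retry
    # Take the value after 't=' and convert millidegrees Celsius to degrees
    millidegrees = float(lines[1].split("t=")[-1])
    temp = round(millidegrees / 1000.0, 1)  # one decimal place, as in the Node server
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the Unix epoch, like Date.now()
    return [now_ms, temp]


# Illustrative sample in the two-line format (not a real capture)
sample = (
    "72 01 4b 46 7f ff 0e 10 57 : crc=57 YES\n"
    "72 01 4b 46 7f ff 0e 10 57 t=23125\n"
)
reading = parse_ds18b20(sample, now_ms=0)  # → [0, 23.1]
```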

The second section of the server uses the node-static module to serve a client-side page, which performs an AJAX call for “temperature.json” and plots the current temperature. The plot uses the Highcharts JavaScript package to draw a dynamic graph that moves along the x-axis over time (check out the Highcharts demo page to get a better idea of dynamic charts). One thing to note is that the sensor accuracy is ±0.5 °C, so although the temperature data is rounded to one decimal place, the default Highcharts y-axis precision may be a bit misleading. Overall, though, the Highcharts package is pretty slick, and the plot looks great on my new Nexus 7!

Temperature Plot

Raspberry Pi Temperature Sensor Plot

I’ve pushed the code to GitHub as it may be useful to others who are also new to the Pi and NodeJS. One of the features of NodeJS I like most is the ease of testing and deployment: you can run the server right in the terminal window without super-user permissions and get debugging info straight away. Furthermore, given the ease of creating web apps and the small resource footprint (obviously depending on what you’re doing), I’ll definitely be looking to use NodeJS/JavaScript as a development platform for the Pi in the future.

