These slides: http://surf.leeds.ac.uk/presentations.html
Urban Analytics - very broad!
Focus on population flows and the ambient population
Background - smart cities, the data deluge and the ambient population
Machine learning approaches
Tracking people (HABITS)
Towards a real-time city simulation ...
Data assimilation
Agent-Based modelling
Abundance of data about individuals and their environment
"Big data revolution" (Mayer-Schonberger and Cukier, 2013)
"Data deluge" (Kitchin, 2013a)
Smart cities
cities that "are increasingly composed of and monitored by pervasive and ubiquitous computing" (Kitchin, 2013a)
This talk presents some LIDA projects that aim to make use of these data
We need to quantify the ambient population and better understand urban flows:
Crime – how many possible victims?
Pollution – who is being exposed? Where are the hotspots?
Economy – can we attract more people to our city centre?
Health - can we encourage more active travel?
Aim: to better understand footfall in Leeds city centre, using computer models and footfall camera data
Methods
Data analysis & visualisation
Machine learning
Outcomes
Analysis of changes in footfall patterns over time
A model of footfall, able to quantify the success of events
Indications of most important drivers
8 cameras installed between 2007 and 2009
Track movement of people through their field of vision
Provide counts of number of passers-by per hour
Cover a relatively small area of the city centre, so not good for capturing wider dynamics.
Saw a change in the proportion of people using the city
Daytime (8am to 6pm) -> Evening (6pm to 9pm)
Night time (9pm to 8am) unchanged
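The daypart comparison above can be sketched as follows. This is a minimal illustration, assuming the camera feed has been reduced to (hour of day, count) pairs; the function names and the example counts are hypothetical and are not taken from the project.

```python
from collections import defaultdict

def daypart(hour):
    """Map an hour of day (0-23) to the dayparts used on the slide."""
    if 8 <= hour < 18:
        return "daytime"   # 8am to 6pm
    if 18 <= hour < 21:
        return "evening"   # 6pm to 9pm
    return "night"         # 9pm to 8am

def daypart_shares(hourly_counts):
    """Return each daypart's share of total footfall.

    hourly_counts: iterable of (hour_of_day, count) pairs.
    """
    totals = defaultdict(int)
    for hour, count in hourly_counts:
        totals[daypart(hour)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {part: total / grand_total for part, total in totals.items()}

# Made-up counts for illustration only (not camera data).
example = [(9, 500), (13, 900), (19, 400), (23, 200)]
shares = daypart_shares(example)
```

Computing these shares separately for each year would show whether the daytime share is falling while the evening share rises.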
Bank Holidays
What type of bank holiday?
School and University Holidays
The weather
Mean temperature, wind speed and rainfall
Day of week
Many others (not tested yet...)
Train prices, car parking availability, business opening times, etc.
Attempted a number of different methods
Mainly neural networks and random forests
Random forest was most accurate
Variable | Relative Importance |
---|---|
Mean daily temperature | 1142 |
Mean daily rainfall | 383 |
Monday | 131 |
2013 | 131 |
Saturday | 130 |
2016 | 130 |
After Trinity opened | 123 |
Thursday | 122 |
Tuesday | 116 |
School holiday | 115 |
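Several rows in the importance table (Monday, 2013, After Trinity opened, School holiday) read like 0/1 indicator variables. A minimal sketch of how one day might be encoded that way is given below. The function name, the feature names and the `years` argument are assumptions made for illustration; the Trinity date is the one listed in the events table in these slides, and the project's actual encoding may differ.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
TRINITY_OPENING = date(2013, 3, 21)  # from the events table in these slides

def encode_day(day, mean_temp, mean_rain, school_holiday, years):
    """Build one feature row (a dict) for a given date.

    All names here are hypothetical; only the variable types mirror the table.
    """
    features = {
        "mean_daily_temperature": mean_temp,
        "mean_daily_rainfall": mean_rain,
        "school_holiday": int(school_holiday),
        "after_trinity_opened": int(day >= TRINITY_OPENING),
    }
    # One indicator per day of week (date.weekday() returns 0 for Monday).
    for index, name in enumerate(DAY_NAMES):
        features[name] = int(day.weekday() == index)
    # One indicator per year covered by the data.
    for year in years:
        features[f"year_{year}"] = int(day.year == year)
    return features

# Weather values are made up for illustration.
row = encode_day(date(2014, 7, 5), 18.0, 0.0, False, range(2012, 2017))
```

Rows like this, one per day, could then be passed to any tree-based learner.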
Event | Date | Real Footfall | Prediction | Difference (%) |
---|---|---|---|---|
Tour de France | 5-Jul-14 | 346,180 | 217,277 | -37 |
Trinity Opening | 21-Mar-13 | 279,473 | 187,381 | -33 |
Xmas lights 2013 | 07-Nov-13 | 193,441 | 153,750 | -21 |
Xmas lights 2015 | 12-Nov-15 | 175,126 | 160,105 | -9 |
Light Night | 06-Oct-16 | 225,660 | 198,025 | -12 |
??? | 30-Dec-12 | 106,423 | 157,842 | +48 |
??? | 12-Apr-16 | 134,808 | 173,734 | +29 |
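The Difference (%) column appears to be the prediction relative to the real count, i.e. (prediction - real) / real. A short check of that reading against the table values (the function name is hypothetical):

```python
def percent_difference(real, predicted):
    """Signed difference of the prediction relative to the real count, in %."""
    return 100.0 * (predicted - real) / real

# (event, real footfall, predicted footfall) copied from the table above.
events = [
    ("Tour de France", 346180, 217277),
    ("Trinity Opening", 279473, 187381),
    ("Xmas lights 2013", 193441, 153750),
    ("Xmas lights 2015", 175126, 160105),
    ("Light Night", 225660, 198025),
    ("??? (30-Dec-12)", 106423, 157842),
    ("??? (12-Apr-16)", 134808, 173734),
]

differences = [round(percent_difference(real, pred)) for _, real, pred in events]
for (name, _, _), diff in zip(events, differences):
    print(f"{name}: {diff:+d}%")
```

Under this convention a negative value means the model under-predicted, so the event drew more people than the usual drivers would explain.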
Still needs some refinement, but otherwise a useful tool
Potential for prospective analysis as well as forecasting
Not useful for more nuanced analysis
How should roads be re-configured to encourage pedestrians?
Where are most visitors coming from?
How have patterns of use in the city changed?
For this we need more detail about individual movements...
Aim: use new 'Track and Trace' (T&T) data generated from mobile phones to support new policies that:
Reduce the disease burden of pollution
Encourage active travel
Led by the Institute for Transport Studies, in collaboration with Newcastle City Council and funded by the ESRC
A more nuanced measure of population flows?
Smart-phone app built in collaboration with Newcastle City Council
Tracks people's journeys
Detects when the user is moving and estimates mode of travel
Rewards for using active / sustainable modes of travel
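The slides do not say how the app estimates travel mode. One simple possibility, shown purely as an illustration, is to classify by the speed between consecutive location fixes. The speed thresholds, function names and fix format below are all assumptions, not the app's method.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def guess_mode(speed_ms):
    """Very rough mode guess from speed; thresholds are illustrative only."""
    if speed_ms < 0.5:
        return "stationary"
    if speed_ms < 2.5:
        return "walking"
    if speed_ms < 7.0:
        return "cycling"
    return "motorised"

def classify_segment(fix_a, fix_b):
    """Each fix is a (lat, lon, unix_seconds) tuple; fix_b must be later."""
    elapsed = fix_b[2] - fix_a[2]
    if elapsed <= 0:
        raise ValueError("fixes must be in increasing time order")
    distance = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    return guess_mode(distance / elapsed)
```

A real classifier would likely also need accelerometer data and smoothing, since raw GPS speed is noisy.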
Aim: Demonstrate how the linking of high-resolution location data and other databases / models can support better policy making
Collaborating with the Newcastle Urban Observatory who are sensing the urban environment
Aim: use T&T data to model urban flows and identify the most serious pollution hotspots.
T&T data are
High resolution (spatio-temporal)
Abundant
BUT:
How representative of the wider population?
Abundant enough?
Large population coverage
Private, unknown methodology, privacy concerns, coarse resolution (?)
Smart-phone apps that capture movement / location are becoming ubiquitous
Great potential for understanding (some) urban dynamics
Skewness
Prolific users distort patterns
Representation
Online & public ≠ offline & private
Spatial accuracy
Bias
Participation inequality and the digital divide
Complicated!!
Messy, and "too big for Excel"
Volume
Potential for large sub-samples
Velocity
Streaming / regularly updated
Potential for dynamic models
Autonomous, interacting 'agents'
Model phenomena from the 'bottom-up'
Advantages:
Modelling complexity, non-linearity, emergence
Natural description of a system
Bridge between verbal theories and mathematical models
History of the evolution of the system
1. Computationally Expensive
Not amenable to machine-led calibration
2. Data hungry
Need fine-grained information about individual actions and behaviours
3. Divergent
Models usually represent complex systems
Projections / forecasts quickly diverge from reality
Complex systems
One-shot calibration
Nonlinear models predict near future well, but diverge over time.
Drawback with the 'typical' model development process
Waterfall-style approach is common
Calibrate until fitness is reasonable, then make predictions
But we can do better:
Better computers
More (streaming) data
Methodological gap
Used in meteorology and hydrology to constrain models closer to reality.
Try to improve estimates of the true system state by combining:
Noisy, real-world observations
Model estimates of the system state
Should be more accurate than data / observations in isolation.
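For a single scalar quantity with Gaussian errors, the combination can be written out explicitly. The symbols are introduced here for illustration: $x_f$ is the model forecast with variance $\sigma_f^2$, and $y$ is the observation with variance $\sigma_o^2$.

```latex
\begin{aligned}
K &= \frac{\sigma_f^2}{\sigma_f^2 + \sigma_o^2}, \\
x_a &= x_f + K\,(y - x_f), \\
\sigma_a^2 &= (1 - K)\,\sigma_f^2 = \frac{\sigma_f^2\,\sigma_o^2}{\sigma_f^2 + \sigma_o^2}.
\end{aligned}
```

Since $\sigma_a^2$ is smaller than both $\sigma_f^2$ and $\sigma_o^2$, the combined estimate $x_a$ is, under these assumptions, more certain than either the model or the observation alone.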
1. Forecast.
Run an ensemble of models (ABMs) forward in time.
Calculate ensemble mean and variance
2. Analysis.
New 'real' data are available
Integrate these data with the model forecasts to create an updated estimate of the model parameter(s)
Impact of new observations depends on their accuracy
3. Repeat
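The forecast / analysis cycle above can be sketched with a scalar ensemble Kalman filter (the perturbed-observation variant). This is a toy under stated assumptions: the state is a single number, the "model" is a hypothetical random walk standing in for an ABM, and all names and noise levels are invented for illustration.

```python
import random
import statistics

def enkf_step(ensemble, step_model, observation, obs_var, rng):
    """One forecast + analysis cycle for a scalar state.

    ensemble:   list of state estimates (at least two members)
    step_model: function (state, rng) -> next state; stands in for the ABM
    obs_var:    variance of the observation error
    """
    # 1. Forecast: run every ensemble member forward in time.
    forecast = [step_model(x, rng) for x in ensemble]
    forecast_var = statistics.variance(forecast)

    # 2. Analysis: weight the observation by its accuracy relative to the forecast.
    total_var = forecast_var + obs_var
    gain = forecast_var / total_var if total_var > 0 else 0.0
    obs_sd = obs_var ** 0.5
    analysis = [
        x + gain * (observation + rng.gauss(0.0, obs_sd) - x)
        for x in forecast
    ]
    return forecast, analysis, gain

def random_walk(x, rng):
    """Hypothetical stand-in for a model: the state drifts randomly."""
    return x + rng.gauss(0.0, 1.0)

rng = random.Random(42)
truth = 10.0
ensemble = [rng.gauss(0.0, 5.0) for _ in range(50)]

# 3. Repeat as each new observation arrives.
for _ in range(20):
    truth = random_walk(truth, rng)
    observation = truth + rng.gauss(0.0, 1.0)
    forecast, ensemble, gain = enkf_step(ensemble, random_walk, observation, 1.0, rng)
```

An accurate observation (small `obs_var`) pushes the gain towards 1, so the analysis follows the data; a noisy one leaves the forecast largely unchanged. Parameters can be estimated in the same way by appending them to the state that the filter updates.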
Very simple ABM
People walking along a street
Every hour, x people begin at point A
CCTV Cameras at either end count footfall
Some people can leave before they reach the end (bleedout rate)
Aim: Estimate the number of people who will pass camera B
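A minimal sketch of such a street model is given below. It assumes the street is split into a fixed number of steps and that the bleedout rate is the probability of leaving at each step; both are assumptions for illustration, as are the function and parameter names.

```python
import random

def simulate_hour(n_entering, bleedout_rate, n_steps, rng):
    """Return how many people who start at point A go on to pass camera B.

    n_entering:    people counted at camera A this hour (x on the slide)
    bleedout_rate: assumed per-step probability that a person leaves the street
    n_steps:       assumed number of steps between camera A and camera B
    """
    reached_b = 0
    for _ in range(n_entering):
        for _ in range(n_steps):
            if rng.random() < bleedout_rate:
                break          # this person left before reaching camera B
        else:
            reached_b += 1     # walked every step without leaving
    return reached_b

rng = random.Random(1)
count_at_b = simulate_hour(n_entering=1000, bleedout_rate=0.05, n_steps=10, rng=rng)
```

Under these assumptions the expected count at camera B is `n_entering * (1 - bleedout_rate) ** n_steps`, so individual runs scatter around that value.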
Use the model to first generate a hypothetical reality
Forecast and analysis are barely distinguishable
Virtual observations are closer to 'truth' than the analysis :-(
This is probably due to the degree of randomness in the model
EnKF estimates the model parameter (bleedout rate) accurately :-)
Aim: Create an agent-based model capable of representing the human flows in a real city.
Calibrated dynamically using streaming data
Hoping for European Research Council funding to continue the work
A sample of LIDA Urban Analytics projects
Machine learning approaches
Tracking people (HABITS)
Agent-Based modelling
Towards a real-time city simulation (surf) ...