Why? and How?
E-scooters started to clutter the streets, and unfortunately also the sidewalks, of Lisbon. For a test ride after a late dinner, I installed the application of Circ (formerly Flash). The disappointment was huge after the scooter did not pick up more speed than I could go myself on foot. Something was certainly off with the battery, engine or power circuit and I was not happy about the 1 Euro spent for unlocking it.
The next day I decided to have a closer look at their iOS application. I used mitmproxy to intercept the traffic between my phone and the Circ API while using the app. One of the most interesting requests happened when I scrolled on the map to see scooters to rent around me.
GET https://api.goflash.com/api/Mobile/Scooters
The request requires the following query parameters:
- userLatitude and userLongitude define the user location.
- latitude and longitude give the center around which the scooters are searched.
- latitudeDelta and longitudeDelta define the search area around the center.
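As a minimal sketch of how such a request URL could be assembled from the parameters above: the helper name build_scooters_url and the example coordinates are made up for illustration, and only the endpoint and the parameter names come from the intercepted traffic. The sketch builds the URL but does not send anything.

```python
from urllib.parse import urlencode

# Endpoint observed in the intercepted traffic.
BASE_URL = "https://api.goflash.com/api/Mobile/Scooters"


def build_scooters_url(user_lat, user_lon, lat, lon, lat_delta, lon_delta):
    """Hypothetical helper: assemble the query string for the Scooters endpoint."""
    params = {
        "userLatitude": user_lat,      # where the user is
        "userLongitude": user_lon,
        "latitude": lat,               # center of the search area
        "longitude": lon,
        "latitudeDelta": lat_delta,    # extent of the search area
        "longitudeDelta": lon_delta,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example coordinates are illustrative values in central Lisbon.
url = build_scooters_url(38.7029, -9.1619, 38.7029, -9.1619, 0.05, 0.05)
```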
A typical response (shortened to the important information) looks like the following:
{
  "Result": "OK",
  "Data": {
    "Scooters": [
      {
        "idScooter": 15171,
        "idCity": "LIS",
        "ScooterCode": "549858",
        "idScooterState": "DEPLOYED_FOR_RENTAL",
        "PowerPercent": "40%",
        "RemainderRange": "15 km",
        "ScooterModel": "Circ B1",
        "txtRentalPrice": "1€ to unlock + 0.25€ per minute",
        "Locked": true,
        "location": {
          "latitude": 38.702908,
          "longitude": -9.161877
        }
      }
    ]
  }
}
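To illustrate which fields matter for tracking, here is a small sketch that pulls the scooter ID and position out of a response shaped like the one above. The function name extract_positions is hypothetical, and the sample payload is a further shortened copy of the response shown, not additional data.

```python
import json

# Shortened copy of the response shown above.
SAMPLE_RESPONSE = """
{
  "Result": "OK",
  "Data": {
    "Scooters": [
      {
        "idScooter": 15171,
        "idScooterState": "DEPLOYED_FOR_RENTAL",
        "location": {"latitude": 38.702908, "longitude": -9.161877}
      }
    ]
  }
}
"""


def extract_positions(payload):
    """Return a list of (idScooter, latitude, longitude) tuples."""
    data = json.loads(payload)
    if data.get("Result") != "OK":
        return []  # treat anything other than "OK" as an empty result
    return [
        (s["idScooter"], s["location"]["latitude"], s["location"]["longitude"])
        for s in data["Data"]["Scooters"]
    ]


positions = extract_positions(SAMPLE_RESPONSE)
```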
With this information it is possible to follow the GPS coordinates of a given scooter via idScooter throughout the day and see where it was going. The scooter only has to be found! To avoid the rate limit of the Scooters endpoint (15 requests per minute), an authentication header with a bearer token can be sent with the request. The bearer token can be obtained from the endpoint
GET https://api.goflash.com/api/Mobile/UserHello
with the query parameter deviceKey, which can easily be obtained with mitmproxy while using the app (more details here).
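A sketch of how the bearer token could be attached to a request, using only the Python standard library. The token value and the helper name authorized_request are placeholders. Reading the token out of the UserHello response is left out because the format of that response is not described here. The request object is only constructed, not sent.

```python
from urllib.request import Request

SCOOTERS_URL = "https://api.goflash.com/api/Mobile/Scooters"


def authorized_request(url, token):
    """Hypothetical helper: build a GET request carrying a bearer token."""
    # Standard HTTP bearer scheme: "Authorization: Bearer <token>".
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# "my-token" is a placeholder, not a real credential.
req = authorized_request(SCOOTERS_URL, "my-token")
```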
To retrieve all scooters in Lisbon, a box of approximately 5km x 5km is created and discretized into 10 x 10 grid points, which are used for latitude and longitude in the API request. The grid spacing is relatively small since I did not know the maximum value allowed for latitudeDelta and longitudeDelta. The value for both coordinate deltas was set to 0.05, which corresponds to the whole grid size. If the API allows such a big search area, scooters around several grid points are returned in one request, which means duplicates need to be taken care of.
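The grid and the duplicate handling can be sketched roughly as follows. This is not the code from the repository: the function names, the example center and the assumption that the box spans 0.05 degrees in both directions are mine. Duplicates are collapsed by keying on idScooter.

```python
def make_grid(center_lat, center_lon, span_deg=0.05, n=10):
    """Return n * n (lat, lon) points spread evenly over a square box.

    span_deg is the assumed side length of the box in degrees.
    """
    step = span_deg / (n - 1)
    lat0 = center_lat - span_deg / 2
    lon0 = center_lon - span_deg / 2
    return [(lat0 + i * step, lon0 + j * step) for i in range(n) for j in range(n)]


def merge_unique(batches):
    """Merge scooter lists from several requests, one entry per idScooter."""
    unique = {}
    for batch in batches:
        for scooter in batch:
            unique[scooter["idScooter"]] = scooter  # later duplicates overwrite
    return list(unique.values())


# Illustrative center somewhere in Lisbon.
grid = make_grid(38.72, -9.15)
merged = merge_unique([
    [{"idScooter": 1}],
    [{"idScooter": 1}, {"idScooter": 2}],  # scooter 1 seen from two grid points
])
```

Each grid point would then be used as latitude and longitude in one request, with the results of all requests passed through merge_unique.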
To collect the data I created a Django application with Celery to record the positions of all scooters every 20 minutes. Why Django? Because it provides out of the box an admin interface to display the created models and their data. By adding a few lines of JavaScript to the admin panel, I was able to verify the logged scooter locations on a map.
The repository with the full Django project can be found here.
Note: I also tried to retrieve similar information for Lime scooters. However, the Lime API does not return a unique scooter ID. For every request the returned scooter IDs are randomized (and internally mapped to their real IDs), which makes it impossible to track individual scooters.
Results of analyzing the data
The data was collected every 20 minutes for 111 days, starting on the 18th of August. It was processed and analyzed using a Jupyter notebook and the libraries pandas, matplotlib and folium.
Finding the GPS accuracy.
First, the data was analyzed by removing all scooter trips that were <20m in travel distance, due to the GPS accuracy of IoT devices and scooters being pushed around for fun. The results were rather odd:
The number of trips in October increases ~4x compared to the period before. On the other hand, the kilometers per day do not increase by the same factor. That means that most of the additional trips are very short. I reran the same analysis, this time removing all trips with distances shorter than 100m. The relation between the number of trips and kilometers per day looks more coherent now. I cannot say for certain why the number of very short trips (<100m) increased so strongly from October on. I can only assume that the GPS positioning of the IoT boards became less accurate (updates? power-saving?) or that Circ started a marketing campaign that gave away free unlocks.
In the following, I will continue with trips of less than 100m distance traveled removed from the data, since the ratio trips/kilometers should stay constant over time.
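The filtering step can be sketched like this, assuming that a trip is the straight-line movement of one scooter between two consecutive snapshots and that distances are computed with the haversine formula. Both are my assumptions for this illustration, and the function names are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def extract_trips(positions, min_distance_m=100):
    """Distances between consecutive snapshots, dropping short movements."""
    trips = []
    for (lat1, lon1), (lat2, lon2) in zip(positions, positions[1:]):
        distance = haversine_m(lat1, lon1, lat2, lon2)
        if distance >= min_distance_m:  # below threshold: GPS jitter or a push
            trips.append(distance)
    return trips


# Three snapshots of one scooter: a ~11 m jitter, then a ~1.1 km ride.
snapshots = [(38.7000, -9.1600), (38.7001, -9.1600), (38.7100, -9.1600)]
trips = extract_trips(snapshots)
```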
The remaining data includes:
- 1793 scooters
- 34671 scooter trips recorded
- 63322 km of scooter movement
The trip distance distribution in figure 3 shows that about half of the trips (~50%) are shorter than 1km. Less than 30% of all trips in the data set are longer than 2km.
The distribution in figure 4 shows that 70% of the recorded scooters traveled no more than 45km over the recording period.
From the number of new scooters added in figure 5, it can be estimated how many scooters need to be replaced due to technical problems and/or vandalism. The spike of new scooters added just before November can be explained by the start of the WebSummit on the 4th of November.
The final result is a heatmap to show the most common places where trips with scooters are started or finished, accumulated for all recorded days.
The heatmap is also available showing the start and end location of the scooters day by day. This figure visualizes how the number of scooters in the center is getting bigger over time.
Final words
This was just a superficial analysis of the recorded data. Certain spikes in trips/kilometers and new scooters could be analyzed further and related to local events. The influence of weather could be analyzed as well. This data could also be very useful for city halls to understand the impact and usage of e-scooters in their city and adjust urban planning accordingly. By assuming an average speed for a trip, the total usage time of the scooters could be calculated to estimate the revenue of the rental companies.
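The revenue idea can be written down as a back-of-the-envelope formula. The sketch below uses the pricing from the API response (1€ to unlock + 0.25€ per minute) and the totals reported above. The average speed of 15 km/h is purely an assumed value for illustration, the function name is made up, and rounding to whole minutes is ignored.

```python
def estimate_revenue_eur(n_trips, total_km, avg_speed_kmh,
                         unlock_eur=1.0, per_minute_eur=0.25):
    """Rough revenue estimate: unlock fees plus per-minute fees."""
    # Total riding time follows from distance and the assumed average speed.
    minutes = total_km / avg_speed_kmh * 60
    return n_trips * unlock_eur + minutes * per_minute_eur


# 34671 trips and 63322 km come from the data set; 15 km/h is an assumption.
revenue = estimate_revenue_eur(n_trips=34671, total_km=63322, avg_speed_kmh=15.0)
```

Since the distances are measured between snapshots taken 20 minutes apart, the real riding time is likely longer than this estimate suggests.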
What to improve
The time resolution of the data is low. For a second attempt, I would retrieve data from the API every minute. Also, the scooter “displacement” overnight was not taken into account since I can’t distinguish it from a ride. I could probably clean up the data by removing all trips during the night that were longer than usual. It would also be interesting to monitor different cities and compare the results.