Can we predict how full the buses will be?
Because of the covid-19 pandemic, many transport companies have come to see value in telling their customers how full a bus will be before they board. This information helps curb contagion and avoid crowding: if passengers know in advance how full the vehicles will be, they can choose to travel on the emptiest ones.
Since bus companies usually keep historical data on the trips made by each vehicle, the problem can be approached with data-driven artificial intelligence techniques. Specifically, it can be approached with supervised learning, in which a predictive model is built and trained on historical data that has been labeled in some way. From that data the model detects recurring patterns, for example that the eight-thirty bus is usually full on weekdays, and it can then use them to make predictions about the future.
Combining historical occupancy data and ticket validations with bus line and calendar information already provides the data needed to model passenger behavior.
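As a rough sketch of what "bringing the data together" could look like, the snippet below joins occupancy records with calendar information to produce labeled examples (features plus the passenger count to predict). The field names, the records, and the holiday set are all made up for illustration; the article does not describe the actual data schema.

```python
from datetime import date

# Hypothetical historical records; the real schema is not given in the article.
occupancy_records = [
    {"line": "L1", "stop": "A", "day": date(2021, 3, 1), "hour": 8, "passengers": 34},
    {"line": "L1", "stop": "A", "day": date(2021, 3, 6), "hour": 8, "passengers": 9},
]

# Hypothetical calendar information.
holidays = {date(2021, 3, 19)}


def build_examples(records, holidays):
    """Turn raw records into (features, label) pairs for supervised learning."""
    examples = []
    for record in records:
        features = {
            "line": record["line"],
            "stop": record["stop"],
            "hour": record["hour"],
            "weekday": record["day"].weekday(),  # 0 = Monday ... 6 = Sunday
            "is_holiday": record["day"] in holidays,
        }
        label = record["passengers"]  # the value the model should learn to predict
        examples.append((features, label))
    return examples


examples = build_examples(occupancy_records, holidays)
```

Each pair holds the inputs the model would see and the occupancy it should learn to reproduce.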
What does it mean to train a model?
When we talk about training a neural network, what we really mean is adjusting the parameters of a function (the neural network) so that it learns a specific task, that is, so that it learns to produce certain outputs from certain inputs.
This is done by feeding the model large amounts of data during the training phase and adjusting its internal parameters so that its prediction error is minimized. The training process relies mainly on an ingenious application of the chain rule, the same rule taught in calculus for differentiating composite functions. It lets us calculate how each parameter of the network should be changed to reduce the prediction error step by step.
For the bus problem, the goal is for the model to learn to calculate the occupancy of the buses at each stop along their route. To do this, the model receives as input a numerical representation (vectors) of the information about the bus, the stop, the timetable and so on, and it returns the number of passengers expected on board, as shown in the following image. A few months of historical records are already enough to obtain a fairly good prediction.
The appeal of neural networks is that a single model can make predictions for any bus line at any of its stops, without the need to build a specialized model or algorithm for each line. This also makes it easier to apply the approach in other cities, where the lines and stops are completely different.
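To make the idea concrete, here is a minimal sketch of that training loop on the simplest possible "network": a line with two parameters, `w` and `b`. The data is invented for the example and the learning rate and number of steps are arbitrary choices; a real neural network applies the same chain-rule reasoning across many layers and parameters.

```python
# Toy data, made up for illustration: generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            prediction = w * x + b
            error = prediction - y
            # Loss for one example is error**2. By the chain rule:
            #   d(loss)/dw = 2 * error * d(prediction)/dw = 2 * error * x
            #   d(loss)/db = 2 * error * d(prediction)/db = 2 * error
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

After training, `w` and `b` should be very close to the values used to generate the data, which is what "minimizing the prediction error" means in practice.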
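One possible way to build such a numerical representation is sketched below. The line and stop identifiers are hypothetical, and the choice of one-hot vectors plus a sine/cosine encoding of the hour is an assumption made for illustration; the article does not say which encoding the authors use.

```python
import math

LINES = ["L1", "L2", "L3"]      # hypothetical line identifiers
STOPS = ["A", "B", "C", "D"]    # hypothetical stop identifiers


def one_hot(value, vocabulary):
    """Return a vector with a 1.0 at the position of `value` and 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_trip(line, stop, hour, weekday):
    """Encode one (line, stop, time) query as a flat list of numbers.

    `hour` is a float in [0, 24); `weekday` is 0 (Monday) to 6 (Sunday).
    """
    # Sine and cosine keep 23:59 and 00:00 close together in the encoding.
    angle = 2.0 * math.pi * hour / 24.0
    time_features = [math.sin(angle), math.cos(angle)]
    weekend_flag = [1.0 if weekday >= 5 else 0.0]
    return one_hot(line, LINES) + one_hot(stop, STOPS) + time_features + weekend_flag


# The eight-thirty bus on line L2 at stop C on a Tuesday.
vector = encode_trip("L2", "C", hour=8.5, weekday=1)
```

The resulting list of ten numbers is what would be handed to the model as input.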
Time series: recurrent neural networks
However, we can do better. The model just described does not take into account the occupancy at the previous stops, which is highly relevant information: knowing it can strongly condition the occupancy at the next stop.
For problems of this kind, which deal with sequences, there is a family of neural networks called recurrent neural networks. They allow us to work with time series, taking previous occupancy values into account to predict the occupancy at the next stop. In short, these networks encode the information from the previous stops compactly, step by step, and end up with a single vector that summarizes the whole sequence. In this way, the model can better capture temporal changes and patterns and thus obtain better results.
How good are the predictions?
An example of what the model predicts for different buses can be seen in the image shown below: the x-axis is time and the y-axis is bus occupancy. The occupancy of each vehicle starts at zero at the beginning of the trip, rises, and falls back to zero at the end of the journey, and the pattern repeats trip after trip, forming the "hills" visible on the graph. The blue line is the real occupancy (according to historical data) and the orange line is the neural network's prediction for future trips. The predictions track the real occupancy of those trips quite closely, with an average error of two people per stop.
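The sketch below illustrates that compression idea with the simplest recurrent cell (an Elman-style update with a tanh activation). The weights are random and untrained, and the sizes and occupancy values are invented; it only shows how a sequence of any length is folded into a fixed-size vector, not the architecture the authors actually used.

```python
import math
import random


def make_cell(input_size, hidden_size, seed=0):
    """Create a recurrent cell with small random (untrained) weights."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_x": matrix(hidden_size, input_size),   # input -> hidden
        "W_h": matrix(hidden_size, hidden_size),  # previous hidden -> hidden
        "b": [0.0] * hidden_size,
    }


def step(cell, x, h_prev):
    """One recurrent update: h_new = tanh(W_x x + W_h h_prev + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = cell["b"][i]
        total += sum(w * v for w, v in zip(cell["W_x"][i], x))
        total += sum(w * v for w, v in zip(cell["W_h"][i], h_prev))
        h_new.append(math.tanh(total))
    return h_new


def encode_sequence(cell, sequence):
    """Fold a whole sequence into one fixed-size hidden vector."""
    h = [0.0] * len(cell["b"])
    for x in sequence:
        h = step(cell, x, h)
    return h


cell = make_cell(input_size=1, hidden_size=4)
# Hypothetical normalized occupancies at the stops visited so far.
short_trip = [[0.0], [0.2], [0.5]]
long_trip = [[0.0], [0.2], [0.5], [0.7], [0.4], [0.1]]
summary_short = encode_sequence(cell, short_trip)
summary_long = encode_sequence(cell, long_trip)
```

Both summaries have the same length (four numbers here) even though the trips have different numbers of stops; in a trained model, a final layer would map that summary to the predicted occupancy at the next stop.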
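An "average error of two people per stop" is most naturally read as a mean absolute error, although the article does not name the metric, so that reading is an assumption. A minimal sketch of how such a figure could be computed, using invented passenger counts:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all stops."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one stop is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up passenger counts at six consecutive stops of one trip.
actual = [0, 12, 25, 31, 18, 0]
predicted = [1, 10, 27, 28, 20, 0]

mae = mean_absolute_error(actual, predicted)
print(f"Mean absolute error: {mae:.2f} passengers per stop")
```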
Models based on neural networks are very versatile and allow us to solve a wide variety of problems in a very efficient and precise way if we have the necessary data. With these techniques, inLab FIB, the innovation laboratory of the Faculty of Informatics of Barcelona (FIB) of the Universitat Politècnica de Catalunya (UPC), together with the smart mobility and ticketing engineering company Ityneri by Geoactio, has developed a solution that is already informing users in some Spanish cities 24 hours a day, seven days a week about the occupancy predictions in the bus fleet.
Albert Obiols is responsible for the area of data science and big data at inLab FIB, at the Universitat Politècnica de Catalunya (UPC).
Gonzalo Recio is a senior data scientist at inLab FIB of the UPC.
Chronicles of the Intangible is a space for dissemination of computer science, coordinated by the academic society SISTEDES (Society of Software Engineering and Software Development Technologies). The intangible is the non-material part of computer systems (that is, the software), and its history and future are recounted here. The authors are professors from Spanish universities, coordinated by Ricardo Peña Marí (Professor at the Complutense University of Madrid) and Macario Polo Usaola (Professor at the University of Castilla-La Mancha).