Database of Public Transport Connections – its Creation and Use

David Fojtík

Igor Ivan, Jiří Horák

Department of Control Systems and Instrumentation VSB - Technical University of Ostrava Ostrava, Czech Republic [email protected]

Institute of Geoinformatics VSB - Technical University of Ostrava Ostrava, Czech Republic [email protected] [email protected]

Abstract: Unemployment is one of the key social and economic issues of the current world. The Integrated Portal of the Ministry of Labor and Social Affairs of the Czech Republic helps unemployed people find a suitable job using various tools. Any user can search for a job according to various criteria. One of these criteria is the existence of a public transport connection between the municipality of residence and the municipality of the potential new job. Data for this option are provided and guaranteed by VSB – Technical University of Ostrava. Two types of this database are developed according to the spatial level: municipality and municipality district level. To provide updated versions of these databases three times per year, an automatic creation process has been developed and put into use. This paper describes the software solution, in which the whole process is automatically distributed to many powerful workstations for parallel processing of all public transport connection combinations, and the current use of this database.

Keywords: public transport; time tables; parallel processing; client-server; distributed system

I. INTRODUCTION

Although the share of public transport is continuously decreasing all over the world [7], a significant part of unemployed people cannot commute to a potential job by their own car and must rely on public transport. The role of public transport for commuting therefore remains very important [1]. The Ministry of Labor and Social Affairs of the Czech Republic (MLSA hereafter) is aware of this fact. The Integrated Portal of MLSA is the central (and official) site for searching for vacant jobs. People who cannot find an appropriate job in their municipality of residence can look for jobs in surrounding municipalities. For this reason, the search interface has included a special option for commuting conditions since 2007. A job applicant can specify the distance or time of the commute by public transport from the municipality of residence (see Fig. 1). The output list of suitable job offers is created based on the applicant’s requirements and contains prices, connection durations, numbers of changes and a detailed description of one recommended connection for 5 time intervals (shifts beginning at 6, 7, 8, 14 and 22 o’clock) (see Fig. 2).

978-1-61284-361-2/11/$26.00 © 2011 IEEE

Figure 1. Transport searching options of the Portal

Online searching of public transport connections, together with analysis and listing of the parameters of the best connection, is impossible for various technical reasons. Preprocessing all possible commuting options is therefore the only way to implement this extension. The results of such an exhaustive approach must be stored in a special large database. The complete process of building the database, which is based on a client-server architecture, places heavy demands on the number and performance of the clients and on the performance of the server.

Figure 2. Example of a searched vacant job


II. THE DATABASE OF PUBLIC TRANSPORT CONNECTIONS

The database of public transport connections (Database hereafter) contains a description of public transport connections between all municipalities in the Czech Republic. Accessibility is calculated from thousands of retrieved connection possibilities, which have to fulfill the following parameters:
• the Euclidean distance between municipalities is less than 100 km,
• the duration of the journey is less than 90 minutes,
• the maximal number of changes is 5,
• the time of arrival cannot be earlier than 60 minutes before the shift should start,
• the time of departure from the commuter’s residence cannot be earlier than 120 minutes before the shift should start.

Commuting time is limited to several selected intervals. Appropriate return connections are also searched and their parameters are stored in the Database. These return connections should depart soon (ideally within 15 minutes) after the end of the shift (typically after 8.5 hours of work). All other conditions for return connections are identical to the parameters above. The existence of a return connection is quite often a limiting factor for commuting. The availability of a return connection was shown to depend strongly on time and place (see Tab. I).

The portal must be updated regularly to provide up-to-date information. Generally, providers update the time tables significantly twice a year (June and December). A completely new Database must therefore be generated right after new time tables are published. Another Database update is completed in March because of the frequent changes that usually occur at the beginning of the year. However, providers can make local changes to time tables at any time, so ideally the Database should be updated with every change of time tables. Updating means finding public transport connections between almost all municipalities in the Czech Republic and carrying out a sophisticated analysis (i.e. comparison and selection of the optimal transport connections). The whole process is very time-consuming and requires parallel processing.

III. BUILDING OF THE DATABASE

The building process of the Database consists of two phases:
1. Searching phase, including massive searching of various possible public transport connections.
2. Analysis of the found data, including weighted evaluation and selection of the optimal transport connection for each combination and time interval.
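The connection criteria listed for the Database (100 km, 90 minutes, 5 changes, arrival and departure windows) can be sketched as a simple filter. This is only an illustration: the `Connection` type, the field names and the function `is_admissible` are hypothetical, times are assumed to be minutes on a common time axis, and the requirement to arrive no later than the shift start is an assumption that the text implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """One candidate connection; times are minutes on a common time axis."""
    departure_min: int   # departure from the commuter's residence
    arrival_min: int     # arrival at the destination
    changes: int         # number of changes on the way


def is_admissible(conn: Connection, distance_km: float, shift_start_min: int) -> bool:
    """Apply the criteria from the text to one outbound connection."""
    duration = conn.arrival_min - conn.departure_min
    return (
        distance_km < 100                                  # Euclidean distance limit
        and 0 < duration < 90                              # journey shorter than 90 minutes
        and conn.changes <= 5                              # at most 5 changes
        and conn.arrival_min >= shift_start_min - 60       # arrive at most 60 min early
        and conn.arrival_min <= shift_start_min            # assumption: arrive by shift start
        and conn.departure_min >= shift_start_min - 120    # leave at most 120 min ahead
    )


shift_6am = 6 * 60
ok = Connection(departure_min=270, arrival_min=330, changes=1)         # 4:30 -> 5:30
too_early = Connection(departure_min=230, arrival_min=310, changes=0)  # 3:50 -> 5:10
print(is_admissible(ok, 42.0, shift_6am))         # True
print(is_admissible(too_early, 42.0, shift_6am))  # False: departs more than 120 min ahead
```

Return connections would need an analogous check relative to the end of the shift, which is omitted here.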

Figure 3. Principle of distributing method

During the first phase of Database creation it is necessary to find all public transport connections for all requested pairs of municipalities. Each pair meeting the defined time criteria must be delivered individually to a searching module for processing. This means processing almost 13 million combinations of municipalities within 100 kilometers. As written above, two types of database are prepared according to the spatial level: municipality and municipality district level. At the municipality district level, the number of records to process is more than 73 million combinations. This part of the processing is very demanding because of the large number of combinations. To shorten the connection searching it is necessary to distribute the whole process over a set of workstations using client-server technology. Particular data segments are processed in parallel (see Fig. 3). The server part is an MS SQL Server 2008 database which contains all combinations of municipalities (municipality districts) to search and also stores all found connections. The client part consists of circa 50 computers with special software for transport connection searching called TRAM. All these workstations are equipped with multi-core processors, which makes it possible to increase the total number of clients. The searching algorithms are optimized for launching the TRAM application multiple times to use the multi-core processors. This improvement maximizes the performance of all clients because the application can be launched more than 200 times in total, so the situation is the same as if the client part consisted of 200 computers. This robust solution ensures the creation of a new up-to-date version of both databases within 5 days of processing (90 million combinations).
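The figures quoted above imply a rough time budget per pair of municipalities. The following back-of-the-envelope calculation is only a sketch: it assumes that exactly 200 TRAM instances run continuously for the full 5 days with evenly spread load, takes the municipality count from Table I, rounds the district-level count down to 73 million, and uses the batch size of 700 records reported in the text.

```python
# Figures quoted in the paper (the district-level count is only given as "more than 73 million")
municipality_pairs = 12_565_836      # municipality level (Table I)
district_pairs = 73_000_000          # municipality district level, rounded down
instances = 200                      # parallel TRAM instances (assumed exactly 200)
days = 5                             # quoted processing time
batch_size = 700                     # pairs per server request

total_pairs = municipality_pairs + district_pairs
seconds_available = days * 24 * 3600
pairs_per_instance = total_pairs / instances
seconds_per_pair = seconds_available / pairs_per_instance
batches = -(-total_pairs // batch_size)   # ceiling division

print(f"total pairs: {total_pairs:,}")
print(f"pairs per instance: {pairs_per_instance:,.0f}")
print(f"time budget per pair: {seconds_per_pair:.2f} s")
print(f"batches of {batch_size}: {batches:,}")
```

Under these assumptions each instance has roughly one second per pair, within which both directions and all five time intervals have to be searched.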

2011 12th International Carpathian Control Conference (ICCC)

Figure 4. Client application during the searching process (on a dual-core processor)

The searching phase is started by the client’s data request. The client calls a stored procedure on the server, and the server answers the request by sending a group of municipality combinations. Practical tests showed that the optimal size of one batch is 700 records. However, the client receives only pairs of points and searches for a connection for each pair in both directions, so the number of search requests doubles (usually 1400 connections). After the client has received the data, the link between server and client is closed and the client starts the searching process in these steps:
1. The first pair of municipalities (one record) is chosen.
2. Transport connections are found for this combination in both directions and for all defined time intervals.
3. Optimal transport connections are selected from the group of all found connections for particular time intervals. The defined weights are summed, and transport connections with a low weight are filtered out for each time interval and the first direction. The weights are set according to price, distance, duration, number of changes and differences from the requested time of arrival and departure.
4. All selected optimal transport connections are saved in a local version of the database. Details are saved, in XML format, only for these optimal connections. This step significantly reduces the size of the database.
5. Steps 2-4 are repeated for the return direction of the same combination.
6. The client starts to search for connections for another combination (step 1).

When all combinations are marked as found, the link with the server is created again and data from the local database (found connections) are transferred to the server database. The server procedure records which data were sent, to whom and when. When the results are retrieved, the procedure marks these records as completed. This information is necessary in case a client fails. Then the client requests a new group of combinations from those not yet searched. This starts a new searching cycle.
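The weighted selection in step 3 can be illustrated with a small sketch. The weights and field names below are hypothetical, since the paper does not publish the real ones; the score is defined as the negated weighted sum of the penalties, so that connections with a low score are the ones filtered out, which is one possible reading of the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    price_czk: float      # ticket price
    distance_km: float    # travelled distance
    duration_min: float   # journey time
    changes: int          # number of changes
    wait_min: float       # gap between arrival and the requested time


# Hypothetical weights; the real values are not given in the paper.
WEIGHTS = {"price_czk": 1.0, "distance_km": 0.5, "duration_min": 2.0,
           "changes": 15.0, "wait_min": 1.0}


def score(c: Candidate) -> float:
    """Higher is better: the negated weighted sum of the penalties."""
    return -sum(w * getattr(c, name) for name, w in WEIGHTS.items())


def select_optimal(candidates: list) -> Optional[Candidate]:
    """Return the best-scoring candidate for one pair, direction and interval."""
    return max(candidates, key=score, default=None)


found = [
    Candidate(price_czk=40, distance_km=30, duration_min=55, changes=1, wait_min=20),
    Candidate(price_czk=36, distance_km=30, duration_min=80, changes=3, wait_min=5),
]
best = select_optimal(found)
print(best)   # the first candidate: cheaper alternatives lose on duration and changes
```

Only the selected candidate would then be stored locally with its details, in line with step 4.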

Figure 5. Searching process

This evaluation is not made continuously but as a final step of the whole processing. In practice it takes place when a client requests a new group of data but all requests have already been sent to clients. If the client receives an empty set, it asks the server to select requests that were sent more than 30 minutes ago but have not been processed. If there are any, they are marked as not sent. The server procedure returns the number of these records to the client. The client then either repeats the request for a new group of data or (if the number is zero) asks for the number of requests which were sent but are still waiting to be processed. If the server returns zero, all requests have been processed and the client ends its work. Otherwise the client waits 30 minutes and then repeats the whole checking process. The whole process is very robust and accounts for every possibility: a crash or late response during data saving, a late reaction within the limit etc.

The client software is based on the .NET platform and written in MS Visual Basic 2008. As written above, a client disconnects immediately after it gets the data from the server, which saves system resources of the server. The client searches for connections between municipalities while disconnected. After the searching process is finished, the client connects to the server to send the results. So the potential of the .NET platform is fully exploited. The physical transport connection searching is performed by a module from CHAPS, Ltd. The client accesses the valid time table through this module, and also uses it to search for connections, compute prices etc.
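The end-of-run check described above behaves like a lease with a 30-minute timeout. The sketch below models it in memory to make the decision logic explicit; it is not the authors' implementation, which uses stored procedures on MS SQL Server. The class `BatchDispatcher`, its methods and the function `next_action` are hypothetical names, and time is passed in explicitly as minutes.

```python
LEASE_MINUTES = 30  # from the text: requests older than 30 minutes are re-issued


class BatchDispatcher:
    """In-memory stand-in for the server-side bookkeeping (hypothetical API)."""

    def __init__(self, batch_ids):
        self.not_sent = list(batch_ids)   # batches nobody has asked for yet
        self.sent_at = {}                 # batch id -> minute it was handed out
        self.completed = set()            # batches whose results were stored

    def request_batch(self, now):
        """Hand out the next batch, or None when nothing is left to send."""
        if not self.not_sent:
            return None
        batch = self.not_sent.pop(0)
        self.sent_at[batch] = now
        return batch

    def submit(self, batch):
        """Mark a batch as completed, even if its results arrive late."""
        self.sent_at.pop(batch, None)
        if batch in self.not_sent:        # it was re-queued in the meantime
            self.not_sent.remove(batch)
        self.completed.add(batch)

    def requeue_expired(self, now):
        """Mark sent-but-unfinished batches older than the lease as not sent."""
        expired = [b for b, t in self.sent_at.items() if now - t > LEASE_MINUTES]
        for b in expired:
            del self.sent_at[b]
            self.not_sent.append(b)
        return len(expired)

    def outstanding(self):
        """Number of batches sent but not yet processed."""
        return len(self.sent_at)


def next_action(server, now):
    """Decision a client makes after receiving an empty set."""
    if server.requeue_expired(now) > 0:
        return "request"      # expired work was re-queued: ask again
    if server.outstanding() == 0:
        return "end"          # everything is processed
    return "wait"             # other clients are still working: wait 30 minutes


server = BatchDispatcher(["A", "B"])
server.request_batch(now=0)    # client 1 takes A
server.request_batch(now=1)    # client 2 takes B, then crashes
server.submit("A")
print(next_action(server, now=20))   # "wait": B is still within its lease
print(next_action(server, now=40))   # "request": B expired and was re-queued
```

Because completion is recorded per batch, a result that arrives after its batch was re-queued is still accepted and the duplicate request is withdrawn.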


Figure 6. End process (flowchart: the client waits 30 minutes between checks; the server re-selects data that were sent more than 30 minutes ago and are still not processed, and returns the sum of records that have been sent and not yet processed)

IV. TWO VARIANTS OF THE DATABASE

The existence of two variants of the database has been mentioned several times in this paper; the variants differ in spatial level (municipality and municipality district level). The main purpose of this solution is to provide a more detailed description of the situation in larger municipalities and cities. Basically, the module from CHAPS, Ltd. defines a concrete public transport stop for each municipality or municipality district. At the municipality level the search may use any public transport stop within the given municipality; the municipality district level works analogously. The third possible solution is to define the origin or destination directly by a stop. In large cities with many stops, stops near the city outskirts were often selected and used for searching. Obviously, the journey from such a stop to the city center can take even longer than the journey from the residence to that stop at the city outskirts.

Figure 7. Service areas around three biggest cities according to spatial level (municipality, municipality district)


Figure 8. Service areas around three biggest cities according to spatial level (municipality, terminal stops)

Fig. 7 shows the differences between two service areas around the three biggest cities (Prague, Brno and Ostrava). These service areas differ in the definition of the destination. The larger area consists of municipalities from which it is possible to commute to any stop within these three cities. The second area consists of municipalities from which it is possible to commute to any or a defined stop within the municipality districts of these three cities. The differences between these two service areas are very significant (mainly in the case of Prague). This solution has problems too, because a commuter would rather use a terminal station in the city than a peripheral stop in the municipality district where the job is located. Fig. 8 compares the same service area as in Fig. 7 with the service area containing municipalities from which it is possible to commute to selected terminal stops (more about urban transport problems in [2]). This service area describes the most realistic situation and should be used for further versions of the database.

V. ANOTHER POSSIBLE USE OF THE DATABASE

The main purpose of this database is to serve the Integrated Portal of MLSA and the search for vacancies within a defined distance and time. But the database could be used for other purposes too, and analyses of public transport services and accessibility are certainly among them. This kind of database has been produced by the Technical University of Ostrava since 2006, and older time tables can also be used to create even longer time series describing changes and the general development of public transport accessibility in the Czech Republic. The final databases are very large, with the size depending on the spatial level. Table I gives the numbers of combinations with existing connections. Evidently only 5-6% of all combinations of municipalities have a one-way transport connection under the input conditions. Without these limiting factors the number of existing connections would be higher. Another fact is the slow increase in combinations with an existing public transport connection, which may indicate small progress in public transport accessibility. This is even more surprising because some local railway and bus links were canceled before December 2010. Only about 50% of all combinations with an existing transport connection offer the possibility to return after the shift (an existing return connection). This number depends significantly on time. The possibility to return is highest (85%) after the night shift (after 6 a.m.), followed by the shift starting at 6 a.m. (67%). The worst situation is the afternoon shift, where a return is possible for only 13% of combinations. The percentage of combinations with an existing connection is even slightly smaller at the municipality district level.

TABLE I. NUMBER OF FOUND CONNECTIONS (MUNICIPALITY LEVEL, CZECHIA)

Date      Number of combinations   Combinations with one-way connection   Combinations with two-way connection
3/2010    12,565,836               721,682                                379,001
6/2010    12,565,836               726,625                                383,470
12/2010   12,565,836               743,121                                397,721
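The percentages quoted in the text can be recomputed from the rows of Table I. The short script below is only a check of the arithmetic; the variable and function names are illustrative.

```python
# Rows of Table I: (date, all combinations, one-way, two-way)
TABLE_I = [
    ("3/2010", 12_565_836, 721_682, 379_001),
    ("6/2010", 12_565_836, 726_625, 383_470),
    ("12/2010", 12_565_836, 743_121, 397_721),
]


def shares(total, one_way, two_way):
    """Return (one-way share of all pairs, two-way share of one-way pairs) in %."""
    return 100 * one_way / total, 100 * two_way / one_way


for date, total, one_way, two_way in TABLE_I:
    connected, with_return = shares(total, one_way, two_way)
    print(f"{date:>7}: {connected:.2f} % connected, "
          f"{with_return:.1f} % of those with a return connection")
```

The one-way share comes out at roughly 5.7 to 5.9 % and the share of connected pairs with a return connection at roughly 52.5 to 53.5 %, consistent with the "5-6%" and "about 50%" figures given in the text.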

This database makes it possible to analyze regional differences and specifics, because significant regional differences can be hidden within the general development of public transport services (more in [4] or [5]). An example is depicted in Fig. 9. A significant decline of public transport services is evident in two NUTS3 regions, in another 4 regions the decline is much smaller, and positive progress is obvious in the rest of the regions in the Czech Republic (similar in [3]).

Data from the database of public transport connections are also used by the staff of labor offices. They appreciate the possibility to construct service areas, accessible by public transport, around particular municipalities in their administrative area (similar in [6]).

VI. CONCLUSIONS

Building the database of public transport connections is a very time-consuming and difficult task. Transport connections for almost 86 million pairs of combinations and 5 time intervals are searched. The solution exploits the biggest advantage of the client-server architecture: independence of the number of clients (computers) running in parallel. The processing time can therefore be reduced by adding more clients. This possible increase in performance is very important given the need for more frequent updates. The database can be used not only for finding the parameters of a commuting request, but also for evaluating transport accessibility. The concept of accessibility of geographical objects has been studied since the 1950s. The practical provision of accessibility analysis is demonstrated e.g. in Bracken (1994) or Burrough et al. (1998). The results serve regional authorities for monitoring and for taking appropriate measures in their transport policy.

Figure 9. Number of connections (7-8 am). Change between 2006 and 2007

ACKNOWLEDGMENT

The data and the transport searching engine are provided by courtesy of CHAPS Ltd. The project is supported by the Ministry of Labor and Social Affairs of the Czech Republic.

REFERENCES

[1] J. Farrington, D. Gray, S. Martin, D. Roberts, “Car dependence in rural Scotland: challenges and policies”, Edinburgh; workpaper for The Scottish Executive, 1998, 246 pp.
[2] L. Janosikova, M. Blaton, D. Teichmann, “Design of urban public transport lines as a multiple criteria optimisation problem”, in A. Pratelli, C.A. Brebbia, “Urban transport XVI - Urban transport and the environment in the 21st Century”, vol. 111, 2010, pp. 137-146.
[3] S. Kraft, M. Vančura, “Transport hierarchy of Czech settlement centres and its changes in the transformation period: Geographical analysis”, Moravian Geographical Reports, vol. 17, No. 3, 2009, pp. 41-52.
[4] M. Marada, “Transport and geographic organization of society: Case study of Czechia”, Geografie, vol. 113, No. 3, 2008, pp. 285-301.
[5] M. Marada, V. Květoň, “Differences in the availability of transport possibilities in Czech municipalities and socio-geographical microregions”, Geografie, vol. 115, No. 1, 2010, pp. 21-43.
[6] M. J. Moseley, “Accessibility: the rural challenge”, London: Methuen, 1979, 204 pp.
[7] P. White, “Public Transport”, Spon Press, London, 2001, 219 pp.
