Scientific Conference June, 9. - 13. 2014
Bayesian Network methodology for the assessment of the factors influencing the location of Dry Ports

Samir Awad Núñez
Transportation Department, Universidad Politécnica de Madrid, Avd. del Profesor Aranguren s/n, 28040 Madrid, Spain
[email protected]

Nicoletta González Cancelas
Transportation Department, Universidad Politécnica de Madrid, Avd. del Profesor Aranguren s/n, 28040 Madrid, Spain
[email protected]

Francisco Soler Flores
Transportation Department, Universidad Politécnica de Madrid, Avd. del Profesor Aranguren s/n, 28040 Madrid, Spain
[email protected]
Abstract: Increasing road congestion, the lack of open space in port installations and the significant environmental impact of seaports can be addressed by integrating Dry Ports into the supply chain. In addition, these infrastructures are an opportunity to strengthen intermodal solutions as part of an integrated and more sustainable transport chain for moving goods by rail. The choice of their location is a decisive strategic element: it determines the intermodal integration needed to achieve an optimum and sustainable use of resources, which requires a modal shift that increases the share of freight carried by rail. In this paper, we present an assessment of the main factors influencing the location of Dry Ports. The aim of the study is to use Bayesian Networks to determine the relationships between the territorial planning factors that drive the location of Dry Ports. First, we present a brief literature review and the results of a DELPHI questionnaire in which we set the weighting of each factor influencing the location of Dry Ports. Then, using a Bayesian Network methodology, we establish the relationships within the whole group of variables in order to understand how this modal shift could be catalyzed.
Keywords: Logistics; Intermodal Transport; Sustainability; Dry Ports; DELPHI; Bayesian Networks
I.
INTRODUCTION
Shipping has become the most suitable and cheapest way to meet the needs generated by the mobility of goods over long distances [1]. Ports are therefore nodes of capital importance in the logistics chain, as the link between two transport systems, sea and land [[2] and [3]]. However, the complexity of the transport sector and the increased volume of transported goods have produced problems that must be solved.
Dry Ports are inland terminals directly connected by rail or road to a seaport [1]. They are designed as a solution to the increasing congestion of transport routes, the lack of open space in port installations and the significant environmental impact of seaports [[1], [4] and [5]]. There is also a widely held view that rail is the most sustainable land transport mode and that its use should be increased [[2], [6], [7], [8], [9], [10], [11], [12], [13] and [14]]. Thus, Dry Ports are also presented as an opportunity to strengthen intermodal solutions as part of an integrated transport chain [[6], [15], [16], [17] and [18]]. Consequently, the choice of their location is a decisive strategic element which determines the success of the logistics function.
The 2nd international virtual Scientific Conference http://www.scieconf.com
II.
METHODOLOGY
This paper continues the research presented in [19], which contains a wide literature review on industrial location and on the selected factors affecting the location of Dry Ports.
The trade-off between purely technical considerations and project costs has traditionally been resolved by means of Cost-Benefit Analysis [20]. Since the last decades of the 20th century, the environment has gained importance as a variable in the planning and construction of transport infrastructure [21]. An Environmental Impact Assessment was added to the Cost-Benefit Analysis, which contributed decisively to the formalization of a decision-making system based on multi-criteria analysis.
There are also some newer techniques that attempt to solve location problems. The most important ones that have been implemented are: cluster analysis, classification and decision trees, Future Scenario Analysis (simulation), DELPHI, Expert Systems (Bayesian Networks and Neural Networks) and Geographic Information Systems.
This research uses the DELPHI method to set the weights of each of the factors that influence the decision on the location of a Dry Port. Then, using the data collected in the DELPHI, we developed a model based on Bayesian Networks in order to establish the relationships within the whole group of variables.
Following the literature review summarized in [22], it is possible to determine the factors that influence the location of Dry Ports in terms of their characteristics and to assess the constraints faced by location planning. The factors used in location problems may respond to the "carrying capacity" or to the "use restriction" of the location. In this paper we only consider factors related to the use restriction, which corresponds to the phases of exclusion and definition [23].
SECTION Transport and Logistics
- 524 -
These factors are presented in Table I. Their assessment depends on the considerations described after the table.

TABLE I. FACTORS RELATED TO THE USE RESTRICTION.
No. | Factor name | Observations | BN factor
* | Environmental protection | Binary variable; protected areas are automatically discarded | -
1 | Noise on natural environment | Noise level measured in dB(A) on the natural environment | ruido_mn
2 | Noise on urban environment | Noise level measured in dB(A) on the urban environment | ruido_mu
3 | Hydrology | Presence of vulnerable areas such as rivers, streams or lakes | hidro
4 | Land price | Measurement of the investment to be made | precio_suelo
5 | Hosting municipality range | Considers the size of the municipality, the future development of urban centers and nearby centers, and the demographic and economic potential | municipio
6 | Accessibility to the rail network | Accessibility to freight and passenger transport networks | acc_ffcc
7 | Accessibility to high capacity roads | Accessibility to high-capacity motorway networks | acc_carreteras
8 | Accessibility to airports | Accessibility to air cargo terminals | acc_aeropuertos
9 | Accessibility to seaports | Connection with one or more seaports | acc_puertos
10 | Accessibility of supplies and services | Accessibility to communication networks, the electrical grid and any other necessary utilities such as water, sanitation, etc. | acc_dotacional
11 | Weather | Appropriateness of the climate for the activities on the greatest number of days per year | clima
12 | Orography | Topography of the land on which the facility is located | orografia
13 | Geology | Mechanical characteristics of the land on which the facility is located | gelologia
14 | Distance to other logistics platforms | Overlap between hinterlands and the agglomeration of industries according to the principle of Spatial Justice [24] | distancia_otras
Source: Based on information gathered by the authors.

Noise scores are better for longer distances to protected natural areas and urban areas. However, in this study we do not establish a minimum distance, because the assessment is a comparison between the distances of different Dry Ports.
Regarding hydrological conditions, high scores are obtained for locations far away from surface water courses, with no aquifers in the surroundings and with no potential risk of flooding.
Locations with a moderate land price are of special interest because they reduce investment costs. However, cheaper land is often less accessible to other infrastructure. The choice is therefore a trade-off: rather than the cheapest ground, we look for the location where the combined cost of land and construction of access facilities is minimized.
In the case of the hosting municipality range, a large population is valued positively because it is an advantage for the demand for the goods stored in the Dry Port and for the availability of manpower. However, population density can also count against a Dry Port neighbourhood, because a facility in a high-density area is assumed to affect more people.
To assess rail accessibility we took into account several measures: type of access, number of tracks in the access yard, population of the municipalities that are reached directly by rail, proximity to an important railway junction, whether bulk distribution shares the infrastructure with passenger transport, whether the track is electrified and whether the track is double or single. The best scores were obtained for locations with direct access, closer to railway junctions and with the best track features. Accessibility to roads was also taken into account: type of road access (conventional, turnpike, highway), distance to the nearest highway, number of lanes of the access road, Average Daily Traffic (ADT) and Level of Service (LOS) of the route. The best locations have direct access to routes with good infrastructure conditions and a level of service that allows heavy vehicles to circulate efficiently. Accessibility to airports is of interest because of the synergies that can occur with air cargo terminals. It is measured by the distance to the nearest airport, with shorter distances scoring higher. Accessibility to seaports was measured by the distance to the nearest seaport and by the number of ports at a distance between 200 km and 400 km.
In assessing the locations of Dry Ports that have already been built, we find that several of the examples have rail, road and other service networks that were developed in parallel with the construction of the facility.
To evaluate climate the following factors are taken into account: climatic characterization, average rainfall, temperature deviation from 20 °C, days with snowfall and average wind speed. Topography is related to slope. Locations with very gentle slopes are of particular interest because of the requirements of rail freight transport. A flat location that requires very little ground preparation also makes construction work cheaper. Geological factors must be taken into account as well, which involves evaluating three aspects: the nature of the material forming the ground of the area, excavability and compression strength.
The distance to other logistics platforms makes it possible to assess the relationship between the Dry Port and the rest of the logistics system in the country. For Dry Ports, the competitive hinterland is the area where a terminal competes with other terminals. Pure competition can produce a disordered geographic pattern of locations that prioritizes economic management over other factors. In this sense, it is better to move towards a collaborative-competitive model that generates economies of density, focusing on spatial coverage and proximity. A core issue is the benefit derived from market density, so that the same customer base can be reached (or serviced) over shorter distances and thus with fewer facilities. In such circumstances, location strategies are based on the relation with existing facilities, even if this implies the selection of sub-optimal locations.
III.
DELPHI QUESTIONNAIRE
The DELPHI questionnaire consists of two rounds. A wide range of experts was selected from the different disciplines that come together in this research: logistics, sustainability, environmental impact, transport planning and geography. The first round consisted of a table with the selected factors, which the experts were asked to order from most to least important and to weight. After analyzing the information provided in the first round by all the experts on the DELPHI panel, a second round was performed. In the second round, the experts were asked to review their first-round weightings in light of the differences between their responses and those of the other experts. The weighting could be the same as the one each expert proposed in the first round, or different if their opinion changed in view of the findings contained in the annex to the questionnaire. The aim of this second questionnaire is to try to achieve a consensus among the experts and to highlight the convergence of views. Although the theoretical formulation of the DELPHI method comprises several successive stages of sending, tabulating and analysing questionnaires, this study is limited to two stages, which does not affect the quality of the results, as shown by the findings of similar studies [25].
We used the mean and the median simultaneously to exploit the potential of each. The arithmetic mean is very intuitive for the expert group, being the most widely used measure of central tendency. However, it is very sensitive to the presence of outliers. Hence, the median was used for the rigorous statistical analysis of the data. The interquartile range was selected as the measure of variability. It is used to measure consensus because it gives a very intuitive picture of how the data deviate from the median.
The descriptive analysis of the data is done using box and whisker diagrams, which provide an overview of the symmetry of the distribution of the data and make it possible to locate outliers.
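The summary statistics described above can be sketched in a few lines of Python. This is only an illustration: the weights in `round_one` are hypothetical values for a single factor, not data from the questionnaire, and the 1.5 × IQR whisker rule used to flag outliers is the usual box-plot convention, which the paper does not state explicitly.

```python
from statistics import mean, median, quantiles


def summarize(weights):
    """Return mean, median, interquartile range and box-plot outliers."""
    # quantiles() sorts the data and returns the three quartile cut points
    q1, _, q3 = quantiles(weights, n=4)
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(weights),
        "median": median(weights),
        "iqr": iqr,
        "outliers": [w for w in weights if w < low or w > high],
    }


# Hypothetical first-round weights given by eight experts to one factor
round_one = [10, 12, 12, 13, 14, 15, 15, 30]
summary = summarize(round_one)
print(summary)
```

In this toy panel a single extreme answer pulls the mean above the median, which is the reason given above for basing the consensus analysis on the median and the interquartile range.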
The box and whisker diagrams obtained after the first and the second rounds of the questionnaire are shown in Fig. 1 and Fig. 2. The factors rated as most important are related to the accessibility of the facilities, and they present a near-consensus. In particular, the most important factors according to the expert panel are: accessibility to the rail network, accessibility to high-capacity main roads and accessibility to seaports. Accessibility to airports, however, was given hardly any importance, and it is also the factor on which the experts reached the least agreement after the second round. A detailed analysis of the DELPHI questionnaire results can be found in [22].
IV.
BAYESIAN NETWORK MODEL
The variables selected for the study are shown in Table I.
Probabilistic graphical models are graphs in which nodes represent random variables and the arcs (or the lack of them) represent conditional independence assumptions. Hence they provide a compact representation of joint probability distributions. Undirected graphical models, also called Markov Random Fields (MRFs) or Markov networks, have a simple definition of independence: two (sets of) nodes A and B are conditionally independent given a third set, C, if all paths between the nodes in A and B are separated by a node in C. By contrast, directed graphical models, also called Bayesian Networks or Belief Networks (BNs), have a more complicated notion of independence, which takes into account the directionality of the arcs. Undirected graphical models are more popular with the physics and vision communities, and directed models are more popular with the AI and statistics communities. (It is possible to have a model with both directed and undirected arcs, which is called a chain graph.) For a careful study of the relationship between directed and undirected graphical models, see [26], [27] and [28].
Although directed models have a more complicated notion of independence than undirected models, they do have several advantages. The most important is that one can regard an arc from A to B as indicating that A “causes” B. This can be used as a guide to construct the graph structure. In addition, directed models can encode deterministic relationships and are easier to learn (fit to data).
In addition to the graph structure, it is necessary to specify the parameters of the model. For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node. If the variables are discrete, this can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents.

A. Inference
A graphical model specifies a complete joint probability distribution (JPD) over all the variables. Given the JPD, we can answer all possible inference queries by marginalization (summing out over irrelevant variables). However, the JPD has size O(2^n), where n is the number of nodes and each node is assumed to have 2 states. Hence summing out the JPD takes exponential time.
Because a Bayesian Network is a complete model for the variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when one wants to choose values for the variable subset which minimize some expected loss function, for instance the probability of decision error. A Bayesian Network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.
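As a minimal sketch of inference by marginalization, the Python fragment below builds the joint distribution of a three-node chain from its CPTs and computes a posterior by summing out the unobserved node. The chain ruido_mn → hidro → acc_ffcc reuses variable names from Table I, but the binary coding of the states and every probability value are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical CPTs for binary nodes (1 = "high", 0 = "low")
p_ruido_mn = {1: 0.3, 0: 0.7}                          # P(ruido_mn)
p_hidro = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.7, 0: 0.3}}   # P(hidro | ruido_mn)
p_acc = {0: {1: 0.1, 0: 0.9}, 1: {1: 0.6, 0: 0.4}}     # P(acc_ffcc | hidro)


def joint(ruido_mn, hidro, acc_ffcc):
    """Joint probability as the product of one CPT entry per node."""
    return p_ruido_mn[ruido_mn] * p_hidro[ruido_mn][hidro] * p_acc[hidro][acc_ffcc]


def posterior_ruido_mn(acc_observed):
    """P(ruido_mn | acc_ffcc = acc_observed), summing out hidro."""
    unnormalized = {
        r: sum(joint(r, h, acc_observed) for h in (0, 1)) for r in (0, 1)
    }
    evidence_prob = sum(unnormalized.values())
    return {r: value / evidence_prob for r, value in unnormalized.items()}


total = sum(joint(*state) for state in product((0, 1), repeat=3))
posterior = posterior_ruido_mn(1)
print(total, posterior)
```

Enumerating all states is only feasible for very small networks, since the number of states grows as 2^n for binary nodes.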
The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow for a space-time tradeoff and match the efficiency of variable elimination when enough space is used. All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are importance sampling, stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation, generalized belief propagation and variational methods.

B. Parameter learning
In order to fully specify the Bayesian Network and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution for X conditional upon X's parents. The distribution of X conditional upon its parents may have any form. It is common to work with discrete or Gaussian distributions since that simplifies calculations. Sometimes only constraints on a distribution are known; one can then use the principle of maximum entropy to determine a single distribution, the one with the greatest entropy given the constraints. (Analogously, in the specific context of a dynamic Bayesian Network, one commonly specifies the conditional distribution for the hidden state's temporal evolution to maximize the entropy rate of the implied stochastic process.)
Often these conditional distributions include parameters which are unknown and must be estimated from data, sometimes using the maximum likelihood approach. Direct maximization of the likelihood (or of the posterior probability) is often complex when there are unobserved variables. A classical approach to this problem is the expectation-maximization algorithm, which alternates computing expected values of the unobserved variables conditional on observed data with maximizing the complete likelihood (or posterior), assuming that the previously computed expected values are correct. Under mild regularity conditions this process converges on maximum likelihood (or maximum posterior) values for the parameters.
A more fully Bayesian approach to parameters is to treat them as additional unobserved variables, to compute a full posterior distribution over all nodes conditional upon the observed data, and then to integrate out the parameters. This approach can be expensive and lead to models of large dimension, so in practice classical parameter-setting approaches are more common.

C. Learning
One needs to specify two things to describe a BN: the graph topology (structure) and the parameters of each Conditional Probability Distribution (CPD). It is possible to learn both of these from data. However, learning structure is much harder than learning parameters. Also, learning when some of the nodes are hidden, or when data are missing, is much harder than when everything is observed.
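When every variable is discrete and fully observed, the maximum likelihood estimate of a CPT reduces to relative frequencies: for each parent configuration, count how often each child value occurs. The sketch below illustrates this simplest case only (no hidden variables, no expectation-maximization). The records in `rows` are hypothetical, and the helper name `fit_cpt` is introduced here for illustration.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, child, parents):
    """Maximum likelihood CPT: relative frequency of each child value
    within every observed configuration of the parents."""
    counts = defaultdict(Counter)
    for row in rows:
        parent_config = tuple(row[p] for p in parents)
        counts[parent_config][row[child]] += 1
    return {
        config: {value: n / sum(c.values()) for value, n in c.items()}
        for config, c in counts.items()
    }


# Hypothetical, fully observed records for two of the Table I variables
rows = (
    [{"hidro": "low", "acc_ffcc": "high"}] * 3
    + [{"hidro": "low", "acc_ffcc": "low"}]
    + [{"hidro": "high", "acc_ffcc": "low"}] * 2
)
cpt = fit_cpt(rows, child="acc_ffcc", parents=["hidro"])
print(cpt)
```

Parent configurations that never occur in the data get no row in the resulting table; in practice a prior or pseudo-counts are often added to avoid zero probabilities, which this sketch does not do.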
Figure 1. Box and whisker diagrams obtained after the first round of the questionnaire. Source: Based on information gathered by the authors.
Figure 2. Box and whisker diagrams obtained after the second round of the questionnaire. Source: Based on information gathered by the authors.
Figure 3. Bayesian Network, K2 algorithm. Source: Based on information gathered by the authors.
In many practical settings the BN is unknown and one needs to learn it from the data. This is known as the BN learning problem, which can be stated informally as follows: given training data and prior information (e.g., expert knowledge, causal relationships), estimate the graph topology (network structure) and the parameters of the joint probability distribution (JPD) in the BN. Learning the BN structure is considered a harder problem than learning the BN parameters. Moreover, another obstacle arises in situations of partial observability, when nodes are hidden or when data are missing.
In the simplest case, a Bayesian Network is specified by an expert and is then used to perform inference. In other applications the task of defining the network is too complex for humans. In this case the network structure and the parameters of the local distributions must be learned from data. Automatically learning the graph structure of a Bayesian Network is a challenge pursued within machine learning.
V.
RESULTS
The network obtained with the K2 algorithm is shown in Fig. 3. Defining the topology requires the following:
Identify the factors that are relevant.
Determine how those factors are causally related to each other.
An arc from cause to effect means that the cause is a factor involved in producing the effect.
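The paper does not describe how K2 was configured (node ordering, maximum number of parents, discretization of the weights), so the following is only a generic sketch of the K2 search of Cooper and Herskovits, not the authors' implementation. Given a node ordering, each node greedily takes as parents the predecessors that most increase the K2 score, computed here in log form with `math.lgamma`. The columns A, B and C and the data are hypothetical.

```python
from collections import Counter, defaultdict
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 score of `child` given a candidate parent set."""
    r = arity[child]
    counts = defaultdict(Counter)  # parent configuration -> child value counts
    for row in data:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    score = 0.0
    for child_counts in counts.values():  # unobserved configurations add 0
        n_ij = sum(child_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)  # log[(r-1)! / (N_ij + r - 1)!]
        score += sum(lgamma(n + 1) for n in child_counts.values())  # log N_ijk!
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search; only nodes earlier in `order` can become parents."""
    parents = {node: [] for node in order}
    for i, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], arity)
        candidates = set(order[:i])
        while candidates and len(parents[node]) < max_parents:
            scored = [
                (k2_log_score(data, node, parents[node] + [c], arity), c)
                for c in sorted(candidates)
            ]
            new_best, choice = max(scored)
            if new_best <= best:  # no candidate improves the score: stop
                break
            parents[node].append(choice)
            candidates.remove(choice)
            best = new_best
    return parents


# Hypothetical data: B copies A, C is independent of both
rows = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 5
learned = k2(rows, order=["A", "B", "C"], arity={"A": 2, "B": 2, "C": 2})
print(learned)
```

The result depends on the ordering supplied to the search, which is why expert knowledge about plausible causal directions is normally used to fix that ordering before running K2.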
As examples, with respect to the weightings of each of the factors: the node ruido_mn is the parent of the variables distancia_otras, ruido_mu, hidro and acc_carreteras.
The children of ruido_mn are precio_suelo, municipio and hidro. The children of hidro are municipio, acc_aeropuertos and acc_ffcc. The parents of the variable clima are acc_carreteras and municipio, and the child of clima is orografia.
VI.
CONCLUSIONS
Although the results of the DELPHI questionnaire give greater importance to the aspects considered in the classical theories of industrial location when searching for the location of a Dry Port, we should not lose sight of the other aspects, since no weighted factor is so unimportant that it can be excluded. In this sense, the results help to achieve sustainable development, as argued by [23].
An effect that has two or more ingoing arcs from other vertices is a common effect of those causes. A cause that has two or more outgoing arcs to other vertices is a common cause (factor) of those effects. The effects of a common cause are usually observable.
Following the BN independence assumption, several independence statements can be observed in this case with respect to the weights of each of the factors. For example, the variables noise on urban environment and hydrology are marginally independent, but when hosting municipality range is given they are conditionally dependent. This relation is often called explaining away. When noise on natural environment is given, noise on urban environment, distance to other logistics platforms, hydrology and accessibility to high capacity roads are conditionally independent. When hydrology is given, accessibility to the rail network is conditionally independent of its ancestors, noise on natural environment and noise on urban environment.
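The explaining away relation mentioned above can be illustrated numerically. The sketch below assumes a collider in which two independent binary causes (labelled `noise` and `hydro`) share one common effect (`muni`), mirroring the statement about noise on urban environment, hydrology and hosting municipality range. All probabilities are hypothetical and are not taken from the fitted network.

```python
from itertools import product

# Hypothetical parameters of the collider noise -> muni <- hydro
P_NOISE = 0.3  # P(noise = 1)
P_HYDRO = 0.3  # P(hydro = 1)
P_MUNI = {(0, 0): 0.05, (1, 0): 0.8, (0, 1): 0.8, (1, 1): 0.95}  # P(muni = 1 | noise, hydro)


def joint(noise, hydro, muni):
    """Joint probability: the two causes are independent a priori."""
    p_causes = (P_NOISE if noise else 1 - P_NOISE) * (P_HYDRO if hydro else 1 - P_HYDRO)
    p_effect = P_MUNI[(noise, hydro)]
    return p_causes * (p_effect if muni else 1 - p_effect)


def prob_noise(**evidence):
    """P(noise = 1 | evidence), by enumerating the joint distribution."""
    numerator = denominator = 0.0
    for noise, hydro, muni in product((0, 1), repeat=3):
        world = {"noise": noise, "hydro": hydro, "muni": muni}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(noise, hydro, muni)
        denominator += p
        numerator += p * noise
    return numerator / denominator


print(prob_noise(hydro=1))          # unchanged from the prior: marginal independence
print(prob_noise(muni=1))           # the observed effect raises belief in the cause
print(prob_noise(muni=1, hydro=1))  # the other cause explains the effect away
```

Observing the common effect makes the two causes dependent: once `hydro` is known to be present, it accounts for the effect and the probability of `noise` falls back towards its prior.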
Causal graph:
Accessibility to high capacity roads is a common cause of two effects: accessibility to seaports and weather.
Orography is an indirect effect of hosting municipality range and accessibility to high capacity roads, acting through weather.
Land price and orography are two alternative causes of geology (but may enhance each other).
The conditional independence statements of the BN provide a compact factorization of the JPD, instead of factorizing the joint distribution of all the variables by the chain rule. According to the BN, the weight of road accessibility is greater than that of the other accessibility factors. Indeed, to date, road accessibility has decided the location of logistics facilities.
Thus, achieving an intermodal integration that makes optimum and sustainable use of resources, through a modal shift that increases the share of freight carried by rail, requires prioritizing accessibility to the railway network. These results confirm the possibility offered by Dry Ports to increase the efficiency of transport modes, both individually and in the context of such an intermodal integration, as set out in [12]. However, the weightings of accessibility to the rail network and to airports are affected by the hydrology and noise factors, and the weighting of accessibility to seaports is affected by road accessibility and the noise factors. We therefore conclude that noise conditions will limit the modal shift.
Based on the weightings obtained through the DELPHI methodology and the relations observed with the Bayesian Network, future research will develop an Artificial Neural Network to provide more accurate weights. These final weightings will be integrated into a Geographic Information System in order to select the most suitable locations using a Transparencies Method, providing a planning methodology to support decision-making based on technicians' criteria.

REFERENCES
[1] A. Camarero and N. González, Cadenas Integradas de Transporte. Fundación Agustín de Betancourt. Ministerio de Fomento, Spain, 2005.
[2] M. Hesse and J.P. Rodrigue, Global production networks and the role of logistics and transportation, Growth and Change, 37(4), pp. 499-509, 2006.
[3] N. González, F. Soler and A. Camarero, Motorways of the sea quality index. Case of Spain, In Proc. of ARSA, ISBN: 978-80-554-0606-0, ISSN: 1338-9831, 2012.
[4] A. Camarero and N. González, Logística y transporte de contenedores. Fundación Agustín de Betancourt. Ministerio de Fomento, Spain, 2007.
[5] J.P. Rodrigue, Transportation and the geographical and functional integration of global production networks, Growth and Change, 37(4), pp. 510-525, 2006.
[6] V. Roso, J. Woxenius and K. Lumsden, The dry port concept: Connecting container seaports with the hinterland. Journal of Transport Geography, 27(5), pp. 338–345, 2009.
[7] S. Hanaoka and M. B. Regmi, Promoting intermodal freight transport through the development of dry ports in Asia: An environmental perspective. IATSS Research, 35(1), pp. 16-23, ISSN 0386-1112, 2011.
[8] N. González, F. Soler and A. Camarero, Motorways of the sea quality index. Case of Spain, In Proc. of ARSA, ISBN: 978-80-554-0606-0, ISSN: 1338-9831, 2012.
[9] B.J.C.M. Rutten, The design of a terminal network for intermodal transport. Transport Logistics, 1(4), pp. 279-298, 1998.
[10] J. Woxenius, Development of small-scale intermodal freight transportation in a systems context. Chalmers University of Technology, Göteborg, Sweden, 1998.
[11] A. Ballis and J. Golias, Comparative evaluation of existing and innovative rail-road freight transport terminals, Transportation Research Part A: Policy and Practice, 36(7), pp. 593-611, 2002.
[12] V. Roso, Evaluation of the dry port concept from an environmental perspective: A note, Transportation Research Part D: Transport and Environment, 12(7), pp. 523-527, 2007.
[13] V. Roso, Factors influencing implementation of a dry port, International Journal of Physical Distribution & Logistics Management, 38(10), pp. 782-798, 2008.
[14] J. P. Rodrigue, C. Comtois and B. Slack, The geography of transport systems. Second Edition, New York: Routledge, 352 pages. ISBN 978-0-415-48324-7, 2009.
[15] R. J. Mc Calla, Factors influencing the landward movement of containers: the cases of Halifax and Vancouver. In: Wang, J., Olivier, D., Notteboom, T., Slack, B. (Eds.), Ports, Cities and Global Supply Chain, first ed. Ashgate, pp. 121–137, 2007.
[16] European Commission, IQ – Intermodal Quality. Final Report, Transport RTD Programme of the 4th Framework Programme – Integrated Transport Chain, 2000.
[17] European Commission, European Transport Policy for 2010: Time to decide. Office for official publications of the European Communities, Luxemburg. White Paper, 2001.
[18] European Commission, Expert group 1 «methodology for TEN-T planning (2010). Proposal on TEN-T Network Planning, 2010.
[19] S. Awad, N. González and A. Camarero, Aplicación de un modelo basado en el uso de metodología DELPHI y Análisis Multicriterio para la evaluación de la calidad de la localización de los Puertos Secos españoles, In Proc. of XVIII Congreso Panamericano de Ingeniería de Transito Transporte y Logística, Santander (Spain), 2014.
[20] Á. Aparicio, La toma de decisiones en la política española de transporte: aportación y limitaciones de la evaluación de proyectos. Cuadernos económicos de ICE, n. 80 (dic. 2010), pp. 115-147, 2010.
[21] C. Daniele, J.F. Mereb, A. Frassetto and J. Pérez, Estado actual de institucionalización y regulación de la evaluación y gestión ambiental de las obras de transporte en Argentina. Revista Transporte y Territorio, Nº 6, Universidad de Buenos Aires, pp. 52-83, 2012.
[22] S. Awad, N. González and A. Camarero, Setting of weighting factors influencing the determination of the location of Dry Ports using a DELPHI methodology. Proceedings in SCIECONF2013, ISBN: 978-80-554-0726-5, ISSN: 1339-3561, vol. 1, issue 1, pp. 505-510, 2013.
[23] W. Plata, M. Gómez and J. Bosque, Desarrollo de modelos de crecimiento urbano óptimo para la Comunidad de Madrid. GeoFocus (Artículos), nº 10, pp. 103-134. ISSN: 1578-5157, 2010.
[24] J. Bosque and R. C. García, El uso de los sistemas de información geográfica en la planificación territorial. Anales de Geografía de la Universidad Complutense, n. 20, pp. 49-67, 2000.
[25] G. Rowe and G. Wright, Expert opinions in forecasting: The role of the Delphi technique. International Series in Operations Research and Management Science, pp. 125-144, 2001.
[26] J. Pearl, The Solution for the Branching Factor of the Alpha-Beta Pruning Algorithm and its Optimality. Communications of the ACM, vol. 25, no. 8, pp. 559-564, 1982.
[27] E. Castillo, J. M. Gutiérrez and A. S. Hadi, Expert Systems and Probabilistic Network Models. Springer Verlag, 1997.
[28] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, NY Wiley, 2000.