World Grid Square Codes: Definition and an example of world grid square data

Aki-Hiro Sato
Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Kyoto 606-8501 Japan
Japan Science and Technology Agency, PRESTO, 4-1-8 Honcho, Kawaguchi 332-0012 Japan
Email: [email protected]

Shoki Nishimura
National Statistics Center, 19-1 Wakamatsu-cho, Shinjuku-Ku, Tokyo 162-8668 Japan
E-mail: [email protected]

Hiroe Tsubaki
National Statistics Center, 19-1 Wakamatsu-cho, Shinjuku-Ku, Tokyo 162-8668 Japan
Institute of Statistical Mathematics, 10-3 Midori-Cho, Tachikawa, Tokyo 190-8562 Japan
E-mail: [email protected]
Abstract—Grid square statistics allow us to analyze socioeconomic activities while protecting privacy, and they reduce the amount of data compared with the original point data. Different grid square statistics can be compared and merged, and new grid square statistics can be recalculated from them, without raising privacy issues. Japan has a standard that defines grid square codes for generating grid square statistics, the Japanese Industrial Standard JIS X0410, which is used by both industry and government. This paper proposes a novel procedure to define global grid square codes at six levels by hierarchically modifying the Japanese grid square codes. We show a procedure to extend JIS X0410 to grid square codes for worldwide usage, which we call world grid square codes, and present an example of grid square data on administrative areas for 252 countries and regions. We explain three case studies that employ grid square statistics and discuss how several anonymization models can be applied when generating grid square statistics.

Keywords-Gridded data; Standards; JIS X0410; Grid square statistics; Global administrative areas
I. INTRODUCTION

The Japanese Industrial Standard (JIS) for grid square codes (JIS X0410 [1]) was established in 1976. JIS X0410 defines a Japanese procedure to generate grid square statistics for both Japanese government statistics and industrial applications. In Japan, the Statistics Bureau, Ministry of Internal Affairs and Communications, and the Ministry of Land, Infrastructure, Transport and Tourism provide grid square data for Japanese statistical surveys [2], [3], such as the population census, the economic census, national land numerical information, facilities, the natural environment, and land usage. JIS X0410 enables us to both protect privacy and analyze different grid square statistics. It can also be used to recalculate new grid square statistics from several different grid square statistics. There are other applications as well, such as disaster risk assessment, estimation of total sales for a new business launch, and demand estimation for public investment.

The definition of grid square codes enables us to identify each grid square as a unique location from latitude and longitude. Thus, it is designed based on a conformal projection. However, it is bounded from 100° to 180° in longitude and from 0° to 66.66...° in latitude. Therefore, JIS X0410 covers only part of the globe and cannot be applied for worldwide usage. We need an extension of the definition of JIS X0410. This is the motivation of this paper.

In this paper, we propose a compatible extension of the grid square code defined in JIS X0410 for worldwide usage in a simple manner and call it the World Grid Square Code. Although grid squares in JIS X0410 are not based on any equal-area projection, they have conformality and are so simple that we can easily handle them and intuitively understand grid square statistics in an actual environment. It is also practically useful to know the area of each grid square, since we can then compute statistics per unit area and evaluate errors of statistics. The area estimation of each grid square can compensate for the shortcoming of their non-equal-area property.

This paper is organized as follows. In Section II, we mention some existing studies on grid coding systems. In Section III, we propose a procedure to construct the world grid square coding system by extending the Japanese grid square coding system (JIS X0410). In Section IV, we show example data generated by using the proposed procedure. In Section V, we explain case studies of the proposed procedure. In Section VI, we discuss how to apply several anonymization procedures to generating grid square statistics. Section VII is devoted to conclusions.

II. EXISTING STUDIES

The first grid square coding system was established in the United Kingdom (UK) in 1938 [4]. This is called the British National Grid (BNG) and has been widely used in the UK to generate grid square statistics from national statistical surveys. These statistics are calculated from latitude and longitude based on the Universal Transverse Mercator (UTM) coordinate system, which sets 49° N and 2° W as the true origin. The BNG, therefore, has a limited range in which it can be used as a grid reference.

More recently, the European Union (EU) Commission has proposed European Reference Grids [5] and has used them to generate grid square statistics about the population census in EU member states [6]. This is known as GEOSTAT 1A and provides a census population prototype dataset for 2006. The European Reference Grids are calculated from latitude and longitude based on the ETRS89 Lambert Azimuthal Equal-Area Projection Coordinate Reference System (ETRS89/ETRS-LAEA), which uses a projection point set at 52° N and 10° E. There is a hierarchical structure of grid resolutions, e.g., 1 m, 10 m, 100 m, 1,000 m, 10,000 m and 100,000 m.

Since the BNG and the European Reference Grids are based on projected coordinate reference systems with limited valid ranges, they cannot be extended for worldwide usage due to their projected bounds. However, the Japanese grid square coding system can be extended for worldwide usage due to its simple projection. Furthermore, the OGC Discrete Global Grid System (DGGS) Standard has been developed based on ellipsoidal polygons constituting the boundary of a DGGS cell, which possess a hierarchical structure obtained by dividing a parent cell into several child cells [7]. This procedure has been widely studied in geoscience.
III. AN EXTENSION OF JIS X0410 TO THE WORLD GRID SQUARE CODING SYSTEM

Figure 1 shows a conceptual illustration of an extension of JIS X0410 for worldwide usage. Preserving the structure of JIS X0410, we introduced an upper grid square level consisting of eight areas, which we express as one digit called the 0th-level grid square code. Table I shows characteristics of the world grid square codes. The first-level code (six numeric digits) provides a grid square with a 40′ north-to-south span and 1° west-to-east span. A first-level grid square consists of 64 second-level grid squares with a 5′ north-to-south span and 7.5′ west-to-east span. The second-level grid square code is expressed as eight numeric digits. A second-level grid square consists of 100 third-level grid squares with a 30″ north-to-south span and 45″ west-to-east span. The third-level grid square code is expressed as ten numeric digits. A third-level grid square consists of four fourth-level grid squares with a 15″ north-to-south span and 22.5″ west-to-east span. The fourth-level grid square code is expressed as eleven numeric digits. A fourth-level grid square consists of four fifth-level grid squares with a 7.5″ north-to-south span and 11.25″ west-to-east span. A fifth-level grid square code is expressed as twelve numeric digits. A fifth-level grid square consists of four sixth-level grid squares with a 3.75″ north-to-south span and 5.625″ west-to-east span. The sixth-level grid square code is expressed as thirteen numeric digits. The hierarchical structure of the same grid square code as JIS X0410 is included in the world grid square code from level 1 to level 6. We provide open libraries to compute grid square codes in four computer languages (R, JavaScript, PHP, and Python) from the web page of the Research Institute for World Grid Squares (RIWGS) [8] (http://www.fttsus.jp/worldgrids/en/library).

Consider three binary variables x, y, and z separating the earth into eight areas based on latitude and longitude, which construct the 0th-level grid square code. The three binary variables x, y, and z are given as follows:

1) Variable x = 0 if latitude is positive, otherwise x = 1.
2) Variable y = 0 if longitude is positive, otherwise y = 1.
3) Variable z = 0 for |longitude| < 100°, otherwise z = 1.

Finally, we define the 0th-level grid square code as

$$ o = 2^2 x + 2y + z + 1. \tag{1} $$

From the 0th-level grid square code o, we obtain x, y, and z as

$$ z = (o - 1) \bmod 2, \tag{2} $$
$$ y = ((o - z - 1) \div 2) \bmod 2, \tag{3} $$
$$ x = (o - 2 \times y - z - 1) \div 4. \tag{4} $$

Therefore, from point data given as latitude and longitude, we can calculate the grid square codes that include the position at all six levels. Assume that p, q, r, u, v, w, s_2, s_4, and s_8 are integers. In the following equations, the right-hand sides denote concatenation of digits, so that p is zero-padded to three digits and u to two digits. The 1st-level grid square code can be computed as

$$
\text{1st-level grid square code} =
\begin{cases}
o00p0u & (p < 10,\ u < 10)\\
o0p0u & (10 \le p < 100,\ u < 10)\\
op0u & (p \ge 100,\ u < 10)\\
o00pu & (p < 10,\ u \ge 10)\\
o0pu & (10 \le p < 100,\ u \ge 10)\\
opu & (p \ge 100,\ u \ge 10)
\end{cases}. \tag{5}
$$

The 2nd-level grid square code can be described as

$$
\text{2nd-level grid square code} =
\begin{cases}
o00p0uqv & (p < 10,\ u < 10)\\
o0p0uqv & (10 \le p < 100,\ u < 10)\\
op0uqv & (p \ge 100,\ u < 10)\\
o00puqv & (p < 10,\ u \ge 10)\\
o0puqv & (10 \le p < 100,\ u \ge 10)\\
opuqv & (p \ge 100,\ u \ge 10)
\end{cases}. \tag{6}
$$
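As an illustration of Eqs. (1) to (5), the following minimal Python sketch computes the 0th-level code from a point, recovers x, y, and z from it, and builds the zero-padded 1st-level code string. The function names are hypothetical and are not taken from the libraries in [8]; treating a latitude or longitude of exactly 0 as non-negative is an assumption made here for definiteness.

```python
def zeroth_level_code(lat, lon):
    """0th-level code o of Eq. (1) for a point given in degrees."""
    x = 0 if lat >= 0 else 1          # assumption: latitude 0 is treated as north
    y = 0 if lon >= 0 else 1          # assumption: longitude 0 is treated as east
    z = 0 if abs(lon) < 100 else 1
    return 4 * x + 2 * y + z + 1


def split_zeroth_level_code(o):
    """Recover (x, y, z) from o, following Eqs. (2) to (4)."""
    z = (o - 1) % 2
    y = ((o - z - 1) // 2) % 2
    x = (o - 2 * y - z - 1) // 4
    return x, y, z


def first_level_code(o, p, u):
    """Eq. (5): concatenate o, p (padded to 3 digits) and u (padded to 2 digits)."""
    return f"{o}{p:03d}{u:02d}"


if __name__ == "__main__":
    o = zeroth_level_code(34.945, 135.67)   # a point in Kyoto, Japan
    print(o, split_zeroth_level_code(o), first_level_code(o, 52, 35))
```

Zero-padding with a format string covers all six cases of Eq. (5) at once, which is why no explicit case analysis on p and u appears in the sketch.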
Figure 1. Conceptual illustration of world grid squares given by an extension of JIS X0410.

Table I
WORLD GRID SQUARE CODES WHICH ARE A COMPATIBLE EXTENSION OF GRID SQUARE CODES DEFINED IN JIS X0410

Layer type         Length of sequences   Latitude length   Longitude length   Definition
1st level          6 numeric digits      40′               1°                 the 1st-level grid square consists of 64 2nd-level grid squares (8 × 8)
2nd level          8 numeric digits      5′                7.5′               the 2nd-level grid square consists of 100 3rd-level grid squares (10 × 10)
3rd level (basic)  10 numeric digits     30″               45″                the 3rd-level grid square consists of 4 4th-level grid squares (2 × 2)
4th level          11 numeric digits     15″               22.5″              the 4th-level grid square consists of 4 5th-level grid squares (2 × 2)
5th level          12 numeric digits     7.5″              11.25″             the 5th-level grid square consists of 4 6th-level grid squares (2 × 2)
6th level          13 numeric digits     3.75″             5.625″

The 3rd-level grid square code can be described as

$$
\text{3rd-level grid square code} =
\begin{cases}
o00p0uqvrw & (p < 10,\ u < 10)\\
o0p0uqvrw & (10 \le p < 100,\ u < 10)\\
op0uqvrw & (p \ge 100,\ u < 10)\\
o00puqvrw & (p < 10,\ u \ge 10)\\
o0puqvrw & (10 \le p < 100,\ u \ge 10)\\
opuqvrw & (p \ge 100,\ u \ge 10)
\end{cases}. \tag{7}
$$

The 4th-level grid square code is computed from

$$
\text{4th-level grid square code} =
\begin{cases}
o00p0uqvrws_2 & (p < 10,\ u < 10)\\
o0p0uqvrws_2 & (10 \le p < 100,\ u < 10)\\
op0uqvrws_2 & (p \ge 100,\ u < 10)\\
o00puqvrws_2 & (p < 10,\ u \ge 10)\\
o0puqvrws_2 & (10 \le p < 100,\ u \ge 10)\\
opuqvrws_2 & (p \ge 100,\ u \ge 10)
\end{cases}. \tag{8}
$$

The 5th-level grid square code is computed from

$$
\text{5th-level grid square code} =
\begin{cases}
o00p0uqvrws_2 s_4 & (p < 10,\ u < 10)\\
o0p0uqvrws_2 s_4 & (10 \le p < 100,\ u < 10)\\
op0uqvrws_2 s_4 & (p \ge 100,\ u < 10)\\
o00puqvrws_2 s_4 & (p < 10,\ u \ge 10)\\
o0puqvrws_2 s_4 & (10 \le p < 100,\ u \ge 10)\\
opuqvrws_2 s_4 & (p \ge 100,\ u \ge 10)
\end{cases}, \tag{9}
$$

and the 6th-level grid square code is computed from

$$
\text{6th-level grid square code} =
\begin{cases}
o00p0uqvrws_2 s_4 s_8 & (p < 10,\ u < 10)\\
o0p0uqvrws_2 s_4 s_8 & (10 \le p < 100,\ u < 10)\\
op0uqvrws_2 s_4 s_8 & (p \ge 100,\ u < 10)\\
o00puqvrws_2 s_4 s_8 & (p < 10,\ u \ge 10)\\
o0puqvrws_2 s_4 s_8 & (10 \le p < 100,\ u \ge 10)\\
opuqvrws_2 s_4 s_8 & (p \ge 100,\ u \ge 10)
\end{cases}. \tag{10}
$$
In the above equations, all integers p, q, r, u, v, w, s_2, s_4, and s_8 were calculated from latitude and longitude as follows:

$$
\begin{aligned}
p &:= \lfloor (1 - 2x)\,\mathrm{latitude} \times 60 \div 40 \rfloor && (p \text{ is two or three digits}),\\
a &:= \{(1 - 2x)\,\mathrm{latitude} \times 60 \div 40 - p\} \times 40,\\
q &:= \lfloor a \div 5 \rfloor && (q \text{ is one digit}),\\
b &:= (a \div 5 - q) \times 5,\\
r &:= \lfloor b \times 60 \div 30 \rfloor && (r \text{ is one digit}),\\
c &:= (b \times 60 \div 30 - r) \times 30,\\
s_{2u} &:= \lfloor c / 15 \rfloor && (s_{2u} \text{ is one digit}),\\
d &:= (c / 15 - s_{2u}) \times 15,\\
s_{4u} &:= \lfloor d / 7.5 \rfloor && (s_{4u} \text{ is one digit}),\\
e &:= (d / 7.5 - s_{4u}) \times 7.5,\\
s_{8u} &:= \lfloor e / 3.75 \rfloor && (s_{8u} \text{ is one digit}),\\
u &:= \lfloor (1 - 2y)\,\mathrm{longitude} - 100z \rfloor && (u \text{ is one or two digits}),\\
f &:= (1 - 2y)\,\mathrm{longitude} - 100z - u,\\
v &:= \lfloor f \times 60 \div 7.5 \rfloor && (v \text{ is one digit}),\\
g &:= (f \times 60 \div 7.5 - v) \times 7.5,\\
w &:= \lfloor g \times 60 \div 45 \rfloor && (w \text{ is one digit}),\\
h &:= (g \times 60 \div 45 - w) \times 45,\\
s_{2l} &:= \lfloor h / 22.5 \rfloor && (s_{2l} \text{ is one digit}),\\
i &:= (h / 22.5 - s_{2l}) \times 22.5,\\
s_{4l} &:= \lfloor i / 11.25 \rfloor && (s_{4l} \text{ is one digit}),\\
j &:= (i / 11.25 - s_{4l}) \times 11.25,\\
s_{8l} &:= \lfloor j / 5.625 \rfloor && (s_{8l} \text{ is one digit}),\\
s_2 &:= s_{2u} \times 2 + s_{2l} + 1 && (s_2 \text{ is one digit}),\\
s_4 &:= s_{4u} \times 2 + s_{4l} + 1 && (s_4 \text{ is one digit}),\\
s_8 &:= s_{8u} \times 2 + s_{8l} + 1 && (s_8 \text{ is one digit}).
\end{aligned} \tag{11}
$$

Conversely, we also need to determine the position of a grid square from its grid square code. Such a transformation can be described as follows. If we have the 1st-level grid square code opu (o (1 digit), p (3 digits), and u (2 digits)), then latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\{(p - x + 1) \times 40 \div 60\}, \tag{12} $$
$$ \mathrm{longitude} = (1 - 2y)(100 \times z + u + y). \tag{13} $$

If we have the 2nd-level grid square code opuqv (o (1 digit), p (3 digits), u (2 digits), q (1 digit), and v (1 digit)), then latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\{p \times 40 \div 60 + (q - x + 1) \times 5 \div 60\}, \tag{14} $$
$$ \mathrm{longitude} = (1 - 2y)(100 \times z + u + (v + y) \times 7.5 \div 60). \tag{15} $$

When the 3rd-level grid square code is given as opuqvrw (o (1 digit), p (3 digits), u (2 digits), q (1 digit), v (1 digit), r (1 digit), and w (1 digit)), latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\big(p \times 40 \div 60 + q \times 5 \div 60 + (r - x + 1) \times 30 \div 3600\big), \tag{16} $$
$$ \mathrm{longitude} = (1 - 2y)\{100 \times z + u + v \times 7.5 \div 60 + (w + y) \times 45 \div 3600\}. \tag{17} $$

When the 4th-level grid square code is given as opuqvrws_2 (o (1 digit), p (3 digits), u (2 digits), q (1 digit), v (1 digit), r (1 digit), w (1 digit), and s_2 (1 digit)), latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\big(p \times 40 \div 60 + q \times 5 \div 60 + r \times 30 \div 3600 + ((s_2 - x) \bmod 2) \times 15 \div 3600\big), \tag{18} $$
$$ \mathrm{longitude} = (1 - 2y)\big(100 \times z + u + v \times 7.5 \div 60 + w \times 45 \div 3600 + \lfloor (s_2 + y - 1)/2 \rfloor \times 22.5 \div 3600\big). \tag{19} $$

When the 5th-level grid square code is given as opuqvrws_2 s_4 (o (1 digit), p (3 digits), u (2 digits), q (1 digit), v (1 digit), r (1 digit), w (1 digit), s_2 (1 digit), and s_4 (1 digit)), latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\big(p \times 40 \div 60 + q \times 5 \div 60 + r \times 30 \div 3600 + ((s_2 - 1) \bmod 2) \times 15 \div 3600 + ((s_4 - x) \bmod 2) \times 7.5 \div 3600\big), \tag{20} $$
$$ \mathrm{longitude} = (1 - 2y)\big(100 \times z + u + v \times 7.5 \div 60 + w \times 45 \div 3600 + \lfloor (s_2 - 1)/2 \rfloor \times 22.5 \div 3600 + \lfloor (s_4 + y - 1)/2 \rfloor \times 11.25 \div 3600\big). \tag{21} $$

When the 6th-level grid square code is given as opuqvrws_2 s_4 s_8 (o (1 digit), p (3 digits), u (2 digits), q (1 digit), v (1 digit), r (1 digit), w (1 digit), s_2 (1 digit), s_4 (1 digit), and s_8 (1 digit)), latitude and longitude at its northwestern corner can be computed as

$$ \mathrm{latitude} = (1 - 2x)\big(p \times 40 \div 60 + q \times 5 \div 60 + r \times 30 \div 3600 + ((s_2 - 1) \bmod 2) \times 15 \div 3600 + ((s_4 - 1) \bmod 2) \times 7.5 \div 3600 + ((s_8 - x) \bmod 2) \times 3.75 \div 3600\big), \tag{22} $$
$$ \mathrm{longitude} = (1 - 2y)\big(100 \times z + u + v \times 7.5 \div 60 + w \times 45 \div 3600 + \lfloor (s_2 - 1)/2 \rfloor \times 22.5 \div 3600 + \lfloor (s_4 - 1)/2 \rfloor \times 11.25 \div 3600 + \lfloor (s_8 + y - 1)/2 \rfloor \times 5.625 \div 3600\big). \tag{23} $$
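The forward and inverse transformations for levels 1 to 3 can be sketched in Python as follows. This is only an illustration of Eqs. (1), (11), and (12) to (17) under stated assumptions, not the implementation distributed in [8]: the function names `encode` and `northwest_corner` are hypothetical, coordinates of exactly 0 are treated as non-negative, and levels 4 to 6 are omitted for brevity.

```python
import math


def encode(lat, lon, level=3):
    """World grid square code (level 1, 2 or 3) of a point, per Eqs. (1) and (11)."""
    x = 0 if lat >= 0 else 1            # assumption: latitude 0 is treated as north
    y = 0 if lon >= 0 else 1            # assumption: longitude 0 is treated as east
    z = 0 if abs(lon) < 100 else 1
    o = 4 * x + 2 * y + z + 1
    alat, alon = abs(lat), abs(lon)     # equal to (1-2x)*latitude and (1-2y)*longitude
    p = math.floor(alat * 60 / 40)
    a = (alat * 60 / 40 - p) * 40       # remainder in arc minutes
    q = math.floor(a / 5)
    b = (a / 5 - q) * 5
    r = math.floor(b * 60 / 30)
    u = math.floor(alon - 100 * z)
    f = alon - 100 * z - u              # remainder in degrees
    v = math.floor(f * 60 / 7.5)
    g = (f * 60 / 7.5 - v) * 7.5
    w = math.floor(g * 60 / 45)
    code = f"{o}{p:03d}{u:02d}"
    if level >= 2:
        code += f"{q}{v}"
    if level >= 3:
        code += f"{r}{w}"
    return code


def northwest_corner(code):
    """(latitude, longitude) of the northwestern corner, per Eqs. (12) to (17)."""
    o, p, u = int(code[0]), int(code[1:4]), int(code[4:6])
    z = (o - 1) % 2
    y = ((o - z - 1) // 2) % 2
    x = (o - 2 * y - z - 1) // 4
    if len(code) == 6:
        lat = (1 - 2 * x) * (p - x + 1) * 40 / 60
        lon = (1 - 2 * y) * (100 * z + u + y)
    elif len(code) == 8:
        q, v = int(code[6]), int(code[7])
        lat = (1 - 2 * x) * (p * 40 / 60 + (q - x + 1) * 5 / 60)
        lon = (1 - 2 * y) * (100 * z + u + (v + y) * 7.5 / 60)
    elif len(code) == 10:
        q, v, r, w = (int(ch) for ch in code[6:10])
        lat = (1 - 2 * x) * (p * 40 / 60 + q * 5 / 60 + (r - x + 1) * 30 / 3600)
        lon = (1 - 2 * y) * (100 * z + u + v * 7.5 / 60 + (w + y) * 45 / 3600)
    else:
        raise ValueError("only 1st- to 3rd-level codes are handled in this sketch")
    return lat, lon


if __name__ == "__main__":
    code = encode(34.945, 135.67)
    print(code, northwest_corner(code))
```

For the point (34.945° N, 135.67° E) the sketch yields the 3rd-level code 2052353533, whose northwestern corner is at 34.95° N and 135.6625° E.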
Table II
DEFINITION OF 0TH-LEVEL GRID SQUARE CODES

x: Latitude         y: Longitude         z: Range of longitude    x|y|z    Decimal expression
0: latitude > 0°    0: longitude > 0°    0: |longitude| < 100°    0|0|0    1
0: latitude > 0°    0: longitude > 0°    1: |longitude| ≥ 100°    0|0|1    2
0: latitude > 0°    1: longitude < 0°    0: |longitude| < 100°    0|1|0    3
0: latitude > 0°    1: longitude < 0°    1: |longitude| ≥ 100°    0|1|1    4
1: latitude < 0°    0: longitude > 0°    0: |longitude| < 100°    1|0|0    5
1: latitude < 0°    0: longitude > 0°    1: |longitude| ≥ 100°    1|0|1    6
1: latitude < 0°    1: longitude < 0°    0: |longitude| < 100°    1|1|0    7
1: latitude < 0°    1: longitude < 0°    1: |longitude| ≥ 100°    1|1|1    8

The total number of 1st-level grid squares over the world is 360 × 180 × 3/2 = 97,200. The total number of 2nd-level grid squares is 64 times larger than the total number of 1st-level grid squares (97,200 × 64 = 6,220,800). The total number of 3rd-level grid squares is 100 times larger than the total number of 2nd-level grid squares. The total number of 4th-level grid squares is 4 times larger than the total number of 3rd-level grid squares. The total number of 5th-level grid squares is 4 times larger than the total number of 4th-level grid squares. The total number of 6th-level grid squares is 4 times larger than the total number of 5th-level grid squares. We can also approximate the total number of grid squares on land, since Earth's total land mass is 29.1998% of its total surface. Thus, the total number of grid squares at each level over the world can be estimated as in Table III.

Table III
THE TOTAL NUMBER OF EACH LEVEL GRID SQUARES OVER THE WORLD.

Layer type    # of grid squares    # of grid squares in land
1st level     97,200               28,383
2nd level     6,220,800            1,816,462
3rd level     622,080,000          181,646,111
4th level     2,488,320,000        726,584,445
5th level     9,953,280,000        2,906,337,778
6th level     39,813,120,000       11,625,351,113

The shape of a grid square is not an exact square but a trapezoid with some slight curvature. The northern west-to-east span normally differs from the southern west-to-east span. Denoting the northern west-to-east span as W1, the southern west-to-east span as W2, and the north-to-south span as H, we can approximate the area of a given grid square as A = (W1 + W2)H/2. The area of a grid square at each level depends on its latitude. For example, the area of the third-level grid square is about 1.28 km² at the equator and decreases toward the north and south poles as the absolute latitude increases. Since the area of a grid square is independent of longitude but dependent on latitude, we are interested in the dependence of the area on latitude. Moreover, the area approximation is sensitive to the selection of a geodetic datum. Several geodetic systems such as OSGB36, GRS80, WGS 84, and EGM2008 have been proposed. Since the global WGS 84 datum is widely adopted, we
estimated the area of grid squares at each level as a function of latitude. Figure 2 shows the relationships between latitude and the area of grid squares at each level. The maximum area of grid squares at each level is attained on the equator (latitude = 0). The maximum area is 8191.83 km² at the 1st level, 128.00 km² at the 2nd level, 1.28 km² at the 3rd level, 0.32 km² at the 4th level, 0.08 km² at the 5th level, and 0.02 km² at the 6th level.

IV. TRIAL OF WORLD GRID SQUARE STATISTIC DATA

We computed world grid square data of administrative areas for 252 countries and regions based on administrative boundaries downloaded from the Global Administrative Areas (GADM) database [9]. The algorithm to compute the grid square data is as follows:

1) Extract one boundary included in the GADM and express it as D.
2) Create a rectangular area covering D and express it as K.
3) Define a set S of world grid squares of a given level included in K.
4) If s_i ∈ S has a common area with D, then output the grid square code of s_i and the administrative area name of D.
5) Repeat step 4 for every s_i included in S.

Table IV shows an example of grid square data for administrative areas of Japan. These data contain country code (ISO 3166-1 alpha-3), country name, level1 name, level1 local name, level1 type, level2 name, level2 local name, level2 type, level3 name, level3 local name, level3 type, level4 name, level4 local name, level4 type, level5 name, level5 local name, level5 type, world grid square code, north side latitude, west side longitude, south side latitude, east side longitude, northern west-to-east span, southern west-to-east span, north-to-south span, and area. In the case of Japan, level1 name is a prefecture name in English, level1 local name is a prefecture name in Japanese, and level1 type is 'Prefecture'. Level2 name is a local government (city, town and village) name in English, level2 local name is a local government name in Japanese, and level2 type is a local government type. Japanese administrative areas are defined only down to the second level. Thus, the fields for levels 3 to 5 are empty.
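Steps 2 to 5 of the algorithm can be sketched in Python for 3rd-level grid squares as follows. This is a simplified illustration rather than the procedure actually used to produce the published data: the boundary D is replaced by a hypothetical predicate `has_common_area` that the caller supplies, the example uses a rectangular D so that the predicate is a plain rectangle-overlap test, and the helper names are assumptions. Coordinates of exactly 0 are treated as non-negative.

```python
import math

DLAT = 30 / 3600   # north-to-south span of a 3rd-level grid square in degrees
DLON = 45 / 3600   # west-to-east span of a 3rd-level grid square in degrees


def encode3(lat, lon):
    """3rd-level world grid square code of a point (Eqs. (1) and (11))."""
    x = 0 if lat >= 0 else 1
    y = 0 if lon >= 0 else 1
    z = 0 if abs(lon) < 100 else 1
    o = 4 * x + 2 * y + z + 1
    alat, alon = abs(lat), abs(lon)
    p = math.floor(alat * 60 / 40)
    a = (alat * 60 / 40 - p) * 40
    q = math.floor(a / 5)
    r = math.floor((a / 5 - q) * 5 * 60 / 30)
    u = math.floor(alon - 100 * z)
    f = alon - 100 * z - u
    v = math.floor(f * 60 / 7.5)
    w = math.floor((f * 60 / 7.5 - v) * 7.5 * 60 / 45)
    return f"{o}{p:03d}{u:02d}{q}{v}{r}{w}"


def squares_for_area(bbox, has_common_area):
    """bbox = (south, west, north, east) is the rectangle K covering D (step 2).

    Every 3rd-level square inside K (step 3) is tested with the caller-supplied
    predicate (step 4), and the codes of the overlapping squares are returned.
    """
    south, west, north, east = bbox
    codes = []
    for i in range(math.floor(south / DLAT), math.floor(north / DLAT) + 1):
        for j in range(math.floor(west / DLON), math.floor(east / DLON) + 1):
            cell = (i * DLAT, j * DLON, (i + 1) * DLAT, (j + 1) * DLON)
            if has_common_area(cell):
                # encode the cell centre so rounding cannot push it into a neighbour
                codes.append(encode3((i + 0.5) * DLAT, (j + 0.5) * DLON))
    return sorted(codes)


def rectangle_overlap(d):
    """Predicate for a rectangular D = (south, west, north, east)."""
    ds, dw, dn, de = d
    return lambda c: c[0] < dn and c[2] > ds and c[1] < de and c[3] > dw


if __name__ == "__main__":
    d = (34.942, 135.67, 34.949, 135.68)   # illustrative rectangle, not GADM data
    print(squares_for_area(d, rectangle_overlap(d)))
```

For a real GADM boundary, the predicate would have to test a polygon against each cell, for instance with a geometry library; that part is deliberately left out of this sketch.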
Figure 2. Relationships between latitude and area of grid squares at each level: (a) 1st level, (b) 2nd level, (c) 3rd level, (d) 4th level, (e) 5th level, and (f) 6th level. [Each panel plots area [km²] against latitude [degree].]
Table IV
EXAMPLE OF GRID SQUARE DATA FOR ADMINISTRATIVE AREAS OF JAPAN

field name                            entity
ISO 3166-1 alpha-3                    JPN
country name                          Japan
level 1 administrative area name      Kyoto
level 1 original name                 京都府
level 1 type                          Prefecture
level 2 administrative area name      Kyoto
level 2 original name                 京都市
level 2 type                          City
level 3 administrative area name      (empty)
level 3 original name                 (empty)
level 3 type                          (empty)
level 4 administrative area name      (empty)
level 4 original name                 (empty)
level 4 type                          (empty)
level 5 administrative area name      (empty)
level 5 original name                 (empty)
level 5 type                          (empty)
world grid square code                2052353533
north side latitude (°)               34.950000
west side longitude (°)               135.662500
south side latitude (°)               34.941667
east side longitude (°)               135.675000
northern west-to-east span (km)       1.141796
southern west-to-east span (km)       1.141911
north-to-south span (km)              0.923458
area (km²)                            1.054454
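The area entry of Table IV can be checked against the trapezoid approximation A = (W1 + W2)H/2 with a few lines of Python. The function name below is hypothetical; the three spans are the values listed in Table IV.

```python
def trapezoid_area(w_north_km, w_south_km, h_km):
    """Approximate grid square area A = (W1 + W2) * H / 2 in km^2."""
    return (w_north_km + w_south_km) * h_km / 2


if __name__ == "__main__":
    # Spans of grid square 2052353533 taken from Table IV
    area = trapezoid_area(1.141796, 1.141911, 0.923458)
    print(f"{area:.6f} km^2")
```

The result agrees with the tabulated area of 1.054454 km² to six decimal places.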
Figure 3 shows a visualization of the results for administrative areas of eight countries. The same color expresses grid squares included in the same administrative area. Table V shows descriptive statistics on the third-level grid square statistics data for administrative areas for eight countries. The data can be downloaded from a web page of the Research Institute for World Grid Squares [8]. The web page provides grid square data for administrative areas for over 252 countries and regions including the eight countries.

Table V
DESCRIPTIVE STATISTICS OF 3RD-LEVEL GRID SQUARE STATISTICS DATA FOR ADMINISTRATIVE AREAS FOR 8 COUNTRIES

Country name    ISO3166-1    # records    File size (MB)
Argentina       ARG          2,757,659    433.952
Australia       AUS          6,862,241    951.824
France          FRA          854,787      176.476
Italy           ITA          424,847      69.716
India           IND          3,129,219    498.368
Japan           JPN          428,242      67.216
South Korea     KOR          109,624      21.412
New Zealand     NZL          370,112      68.080
Figure 3. These maps show level 2 administrative areas. Grid squares with the same administrative area are drawn with the same color. (a) Argentina, (b) Australia, (c) France, (d) Italy, (e) India, (f) Japan, (g) South Korea, and (h) New Zealand.

V. CASE STUDIES

In Japan, grid square statistics have a history of more than 40 years. Various types of grid square statistics are generated by both government bodies [2], [3] and industrial sectors. Several government statistics such as administrative areas, the population census, the economic census, land numerical information, land usage information, and hazards of some natural disasters are provided as grid square statistics, which are part of government open data. We identified that World Grid Square statistics can be used in the following six cases:

1) Data linkage and data processing: We can link different grid square statistics (linkage) and use them in operations among different grid square statistics, synthesizing new grid square statistics from several original data types.
2) Mapping: We can visualize grid square statistics on a map for use in analyzing our focal area.
3) Data creation on given areas: We can generate statistics on a given area by recalculating grid square statistics for that area.
4) Identifying effective areas: As grid squares make it easy to measure distances among grid squares, World Grid Square statistics can be used to calculate demand within a given distance.
5) Defining observation areas: Grid square statistics can be used to define an area for collection of data or samples.
6) Unit for numerical simulation: Grid squares can be used as the unit of numerical simulations such as diffusion processes, percolation models, and migration processes.

To show the usefulness of grid square codes and grid square statistics, we present three case studies: a government accommodation statistics survey, a comparative analysis between job opportunity data collected from an internet job matching site and government statistics, and risk assessment of a natural disaster.

A. Accommodation activity in government statistics

The Accommodation Survey in Japanese Tourism Statistics is a quarterly survey conducted by the Japan Tourism Agency (Kankocho) of the Ministry of Land, Infrastructure, Transport and Tourism (MLIT). The survey collects data concerning accommodations and travel trends from several perspectives across the country and provides a fundamental database that captures the Japanese tourism sector and informs tourism policies. The 3rd-level grid square statistics data can be generated from micro-data of the Accommodation Survey based on Article 33 of the Statistics Act. Such grid square data were generated from micro-data of the Accommodation Survey in Japanese Tourism Statistics [10]. There are four steps to generate grid square statistics from micro-data: (1) data validation, (2) geocoding, (3) encoding, and (4) totaling. The tabulated items are the total and actual number of travelers, the total and actual number of foreign travelers per month, the actual number of used rooms (Question 7), and the number of foreign travelers per nationality. Figure 4 shows the 3rd-level grid square statistics of the total number of foreign travelers per nationality (China and USA). We confirmed that foreign travelers stayed in various areas of Japan, and that there are some areas with high densities. We can recognize a difference in travel patterns between travelers from China and those from the USA. Chinese tourists visited western Japan and Hokkaido more than tourists from the USA did. We can also compare the number of tourists in each grid square.
Figure 4. 3rd level grid square statistics of the total number of foreign travelers per nation in August 2013. (1) China and (2) USA. [Panels titled "H2508 : China" and "H2508 : USA"; each plots # of persons over longitude and latitude.]

B. Comparative analysis between job opportunity data and government statistics

The second example is a comparative analysis between data collected from an internet job matching site and government statistics. We used data on job opportunities gathered from the "from A navi" website of Recruit's job matching service [11]. The data contain job types, duration of calls, wages, geographical positions (latitude and longitude) of job interview places and/or working places, prefecture, the names of firms offering the job opportunities, types of occupations, workplace, and working hours. By using the geographical information of working and/or interview places, we computed the daily number of job opportunities for each grid square. We conducted a regression analysis of the number of job opportunities with respect to three socioeconomic quantities: population (2010) X_p(c), the number of firms (2012) X_f(c), and the number of workers (2012) X_w(c). These government grid square statistics were collected from web pages of the Statistics Bureau of Japan, Ministry of Internal Affairs and Communications [2]. We assumed a power-law relationship between the number of job opportunities Y(c) and each of the three socioeconomic quantities X_i(c) (i ∈ {p, f, w}),

$$ Y(c) = C_i X_i(c)^{\alpha_i}. \tag{24} $$

Figure 5 shows double logarithmic plots between the number of job opportunities and the three types of socioeconomic quantities (the population, the number of firms, and the number of workers). We confirmed that there are power-law relationships. Table VI shows parameter estimates obtained with ordinary least squares (OLS). We confirmed that the value of α_i in each case is statistically significant based on the t-test. Specifically, among the three socioeconomic quantities, the number of workers has the strongest correlation with the number of job opportunities.

Table VI
PARAMETER ESTIMATES FOR LOGARITHMIC FORM OF THE POWER-LAW RELATIONSHIPS BETWEEN THE NUMBER OF JOB OPPORTUNITIES AND SOCIOECONOMIC QUANTITIES SUCH AS POPULATION IN 2010, THE NUMBER OF FIRMS IN 2012, AND THE NUMBER OF WORKERS IN 2012.

                adj R²      α           t-value      p-value
(population)    0.208520    0.388387    43.410256    < 10^-10
(firms)         0.310414    0.488481    56.776629    < 10^-10
(workers)       0.316345    0.457518    57.564345    < 10^-10
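The logarithmic form of Eq. (24), log Y(c) = log C_i + α_i log X_i(c), can be fitted by ordinary least squares as in the following minimal Python sketch. The function name is hypothetical, the example data are synthetic and are not the job opportunity data of this study, and grid squares with zero counts are assumed to have been removed beforehand because their logarithm is undefined.

```python
import math


def fit_power_law(xs, ys):
    """OLS fit of log(y) = log(C) + alpha * log(x); returns (C, alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    alpha = sxy / sxx                      # slope in the log-log plot
    log_c = mean_y - alpha * mean_x        # intercept in the log-log plot
    return math.exp(log_c), alpha


if __name__ == "__main__":
    # Synthetic, noise-free example with C = 3 and alpha = 0.45
    xs = [1, 10, 100, 1000]
    ys = [3 * x ** 0.45 for x in xs]
    print(fit_power_law(xs, ys))
```

The slope of the fitted line in the double logarithmic plot is the exponent α_i, which is the quantity reported in Table VI.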
Figure 5. Double logarithmic plots between the number of job opportunity ads and socioeconomic quantities: (a) the population, (b) the number of firms, and (c) the number of workers. [Panels titled "jobs(populations)", "jobs(firms)", and "jobs(workers)"; each plots # of job opportunities against # of populations, # of firms, and # of workers, respectively.]

C. Risk Assessment of a natural disaster

In general, risk is assessed as the product of hazard, vulnerability, and exposed value. In the case of risk of loss of lives at a grid square c, risk R(c) represents an expected value of loss of lives, defined as

$$ R(c) = F(c) \times Vul(c) \times Pop(c), \tag{25} $$

where in a given grid square c, F(c) [events/year] (hazard) represents the frequency of a natural disaster, Vul(c) is vulnerability, having a value ranging from 0 to 1, and Pop(c) [killed persons/year] expresses the population exposed to the natural disaster. Specifically, the worst case scenario (Vul = 1) is called physical exposure, which is defined as

$$ PhExp(c) = F(c) \times Pop(c). \tag{26} $$

In Japan, grid square statistics of seismic hazards are published by the National Research Institute for Earth Science and Disaster Resilience (NIED) [18] through the Japan Seismic Hazard Information Station (J-SHIS) [19]. Figure 6 shows scatter plots between the 3rd-level grid square statistics of seismic hazards of more than scale 6 published by J-SHIS and the 3rd-level grid square statistics of the population census in 2010 published by the Statistics Bureau of Japan, Ministry of Internal Affairs and Communications. We confirmed that several 3rd-level grid squares carry significant earthquake risk to the population. Moreover, we found that the population in grid squares with high seismic hazard is smaller than that in grid squares with low seismic hazard.

Figure 6. Scatter plots between the 3rd level grid square statistics of seismic hazards more than scale 6 in Japan and 3rd level grid square statistics of Japanese population census in 2010.

VI. DISCUSSION
Generally speaking, data are collected by different persons in different organizations, and each organization has a different data protection policy. This makes it hard for them to communicate with each other and share their data. What they can do is share an industrial standard for generating data that preserves the anonymity of the original data. On the other hand, data users want to know the situations expressed by the data at high temporal and spatial resolution. The users normally want to compare, integrate, and synthesize data collected from different data sources. Doing so normally requires original data that include private information. In data sharing, we therefore need to carefully balance privacy and public utility. For this purpose, the grid square statistics proposed in this paper can be used to share data in anonymized form. If we share grid square codes and their grid squares as a standard, then different organizations can generate grid square statistics independently, without communicating with each other. The users can then compare, integrate, and synthesize data provided by different organizations. Moreover, we can apply several anonymization models, such as k-anonymity [12], l-diversity [13], and t-closeness [14], when generating grid square statistics from original point data. For present purposes, we identified three types of data source: government statistics, satellite imagery, and point data collected from the Internet. For example, Global Soil Information Facilities provides gridded data originating from satellite imagery [15], and OpenStreetMap provides point data about objects [16]. The 3 m × 3 m resolution of the 3-word addresses of What3words [17] can also be embedded. These data can be used as primary data to produce world grid square statistics.
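One simple way to read the k-anonymity idea in this setting is to publish a count for a grid square only when at least k original points fall inside it. The Python sketch below illustrates that count-suppression reading; it is not the procedure of [12] in full, and the function `cell_key` is a hypothetical stand-in that indexes plain latitude/longitude cells rather than computing a world grid square code.

```python
import math
from collections import Counter


def cell_key(lat: float, lon: float, size_deg: float) -> tuple:
    """Hypothetical stand-in for a grid square code: the integer index
    of the size_deg x size_deg cell that contains the point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def k_anonymous_counts(points, k: int, size_deg: float = 1.0) -> dict:
    """Count points per cell and suppress cells with fewer than k points."""
    counts = Counter(cell_key(lat, lon, size_deg) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= k}


# Illustrative points only (latitude, longitude); not real survey data.
points = [
    (35.1, 135.2),
    (35.5, 135.9),
    (35.9, 135.1),
    (36.5, 140.5),  # alone in its cell, so it is suppressed for k = 3
]
published = k_anonymous_counts(points, k=3, size_deg=1.0)
```

Two organizations that agree on the same cell definition and the same k can each run this step independently and still obtain tables that are directly comparable, which is the property the shared standard is meant to provide.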
VII. CONCLUSION
Japan has a standard definition of grid square codes (JIS X0410) that was established in 1976. In this paper, we proposed a novel extension of JIS X0410 for worldwide usage. By using this definition, we can cover the 510 million km² of the earth's surface with six levels of resolution and express each area as a unique sequential numeric code. We implemented the algorithm that computes the grid square code containing a position given by latitude and longitude in R, JavaScript, PHP, and Python. Moreover, we showed examples of third-level world grid square data on administrative areas for 252 countries and regions all over the world based on GADM. These data have been released through a web page of the Research Institute for World Grid Squares [8]. This Japanese standard can thus be extended for worldwide usage. We discussed the usefulness of grid square codes and the corresponding grid squares. Several anonymization models, such as k-anonymity, l-diversity, and t-closeness, can be applied when generating grid square statistics. We showed three case studies: grid square statistics in Japan, a comparative analysis between data collected from an Internet site and government statistics, and an application to risk assessment. If we can share world grid square statistics globally, we can compare and merge different grid square data and create new grid square data from different grid square data for each country. Such activities can be realized with world grid square statistics and enable us to understand our socioeconomic activities on a global scale. This will contribute to deepening our understanding of the human influence on the earth and to stabilizing socioeconomic activities.
ACKNOWLEDGEMENT
This work is supported by Japan Science and Technology Agency (JST) PRESTO Grant Number JPMJPR1504, Japan. This research also used computational resources of the High Performance Computing Infrastructure (HPCI) system provided by the Institute of Statistical Mathematics through the HPCI System Research Project (Project ID: hp160060).
REFERENCES
[1] Statistics Bureau, Ministry of Internal Affairs and Communications, “Overview of Grid Square Statistics.” [Online]. Available: www.stat.go.jp/english/data/mesh/gaiyou.htm. Accessed on: Sep. 19, 2015.
[2] Statistical GIS, Statistics Bureau, Ministry of Internal Affairs and Communications. [Online]. Available: http://www.e-stat.go.jp/SG1/estat/toukeiChiri.do?method=init. Accessed on: Jun. 14, 2014.
[3] National Land Numerical Information download service, National Land Information Division, National Spatial Planning and Regional Policy Bureau, Ministry of Land, Infrastructure, Transport and Tourism in Japan. [Online]. Available: http://nlftp.mlit.go.jp/ksj-e/. Accessed on: Sep. 19, 2015.
[4] Ordnance Survey. [Online]. Available: https://www.ordnancesurvey.co.uk/support/the-national-grid.html. Accessed on: Jun. 21, 2014.
[5] A. Annoni, Ed., “European Reference Grids,” volume EUR 21494 EN. European Commission, Joint Research Centre, 2005. [Online]. Available: http://www.ec-gis.org/sdi/publist/pdfs/annoni2005eurgrids.pdf. Accessed on: Sep. 14, 2015.
[6] GEOSTAT 1A. [Online]. Available: http://www.efgs.info/geostat/1a. Accessed on: Sep. 14, 2015.
[7] OGC Discrete Global Grid System (DGGS) Core Standard (15-104r3). [Online]. Available: http://portal.opengeospatial.org/files/66643. Accessed on: Nov. 10, 2016.
[8] Research Institute for World Grid Squares (RIWGS). [Online]. Available: http://www.fttsus.jp/worldmesh
[9] Global Administrative Data (GADM). [Online]. Available: http://www.gadm.org. Accessed on: Nov. 10, 2016.
[10] Aki-Hiro Sato, “Microdata analysis of the accommodation survey in Japanese tourism statistics,” Big Data (Big Data), 2015 IEEE International Conference on, Oct. 29 2015-Nov. 1, 2015, pp. 2700–2708.
[11] “from A navi” in Recruit Web Service. [Online]. Available: http://www.froma.com/. Accessed on: Oct. 10, 2017.
[12] L. Sweeney, “k-anonymity: a model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 5, pp. 557–570, Oct. 2002.
[13] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, “L-diversity: Privacy Beyond K-anonymity,” ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, 3, Mar. 2007.
[14] N. Li, T. Li, and S. Venkatasubramanian, “t-Closeness: Privacy beyond k-anonymity and l-diversity,” Data Engineering, 2007. IEEE 23rd International Conference on, 15-20 April, 2007.
[15] Global environmental layers. [Online]. Available: http://worldgrids.org/. Accessed on: Oct. 14, 2017.
[16] Open Street Map. [Online]. Available: http://www.openstreetmap.org/. Accessed on: Oct. 14, 2017.
[17] What3Words. [Online]. Available: https://what3words.com. Accessed on: Nov. 12, 2017.
[18] National Research Institute for Earth Science and Disaster Resilience (NIED). [Online]. Available: http://www.bosai.go.jp/e/. Accessed on: Dec. 15, 2016.
[19] Japan Seismic Hazard Information Station (J-SHIS). [Online]. Available: http://www.j-shis.bosai.go.jp/en/. Accessed on: Dec. 12, 2016.