GeoJournal (2012) 77:119–134 DOI 10.1007/s10708-010-9390-6
Using web demographics to model population change of Vietnamese-Americans in Texas between 2000 and 2009 Tzee Kiu Edwin Chow · Yan Lin · Niem Tu Huynh · John M. Davis
Published online: 23 October 2010 © Springer Science+Business Media B.V. 2010
Abstract With respect to the great wealth of information available online, the Internet can be viewed as a gigantic database with diverse resources. One of the pressing issues is to investigate the effectiveness and usefulness of the information available over the Internet. This research modeled the population change of Vietnamese-Americans (VA) in Texas from 2000 to 2009 by obtaining web demographic data from the Internet. The project objective is to pilot a novel approach to conducting an online “census” by using Web 2.0 technologies and to investigate the effectiveness of web data for GIS-based demographic applications. The solicited VA demographics were geocoded at both county and census tract levels and compared with the Census 2000 demographics in a Geographic Information System. Spatial and statistical analyses were used to explore the spatial distribution of VA and to model their population change between 2000 and 2009. The findings of this study include: (1) in general, there are significant differences in the spatial distribution of the VA population between the web demographics and Census 2000 at both county and census tract levels, and (2) the Hoover Indices of VA population in Texas in 2000 and 2009 revealed a trend of deconcentration, which conforms to the general rural-urban-suburban migration among major metropolitan areas in Texas. This study offers new insights into using web demographic data for population predictions and applications to plan services for ethnic groups.
Keywords Web demographics · Vietnamese-Americans · Web 2.0 · Census · Population change · Geographic information systems (GIS)
T. K. E. Chow (corresponding author) · Y. Lin · N. T. Huynh Texas Center for Geographic Information Science, Department of Geography, Texas State University – San Marcos, 601 University Dr., San Marcos, TX 78666, USA
J. M. Davis Department of Psychology, Texas State University – San Marcos, 601 University Dr., San Marcos, TX 78666, USA
Introduction
International immigrants in Texas
Numerous factors, economic and geographical to name a few, have made Texas home to the third largest population of foreign-born immigrants in the US, after California and New York, according to the Census 2000 (Malone et al. 2003). The foreign-born population in Texas consists mainly of Hispanic, African American, Asian, and European immigrants. Major metropolitan areas, including Dallas/Fort Worth, San Antonio, Houston, and El Paso, are attractive to many international immigrants,
notably the Hispanic population, due to their connections to the international border and sea ports (Jordan 1986; Mohl 2003). Thus, the number of immigrants arriving in Texas continues to increase. Among the international immigrants in Texas, Vietnamese-Americans (VA) are unique from a historical, cultural, and demographic perspective. In 2000, Texas had the second largest Vietnamese population after California. Starting in the 1970s, multiple waves of Vietnamese refugees arrived in the United States because of the war and domestic unrest in Vietnam. After their initial establishment, the VA continued their migration until the 1990s. From 1980 to 1990, the VA population grew by more than 135%, the fastest among Asian immigrants (Bruno and Beilke 2005). Unlike other immigrants in the United States, the first generation of VA refugees was scattered throughout the country as a result of the refugee dispersal policy (Do 1996). The US government later revised the dispersal policy and a relocation of the Vietnamese population occurred. In 1990, the VA population was more concentrated in major metropolitan areas than it had been a decade earlier. Cities such as Los Angeles, California and Boston, Massachusetts gained large VA communities. Within Texas, rapid VA population growth occurred from 1980 to 2000 in Harris County. During this period, the core Vietnamese community and its associated businesses shifted from downtown Houston to midtown areas; however, many VA remained in the downtown area. The segmented VA communities reflected the socioeconomic gap among various VA subgroups (Vu 2006). Similar to other ethnic minorities (e.g., Chinese, Filipino), VA immigrants commonly lived in highly concentrated communities and ethnic enclaves, giving them a strong sense of ethnic identity (Walker and Hannan 1989). After a period of transitional settlement, and after having gained higher socioeconomic status, VA immigrants dispersed across the nation and into suburban neighborhoods.
Nevertheless, the story of Vietnamese immigration was far more complicated. Vu (2006) studied the construction of a VA community in Houston, Texas, during 1975–2005 and reported that there were three waves of immigration among VA. The first wave of refugees arrived in 1975 and included professionals who were able to speak fluent English and who possessed higher levels of education; they were intentionally dispersed by the US government into many states. The second wave of
Vietnamese refugees consisted primarily of the “boat people” who arrived during 1978–1982. The “boat people” included a mix of ethnic Vietnamese and Chinese entrepreneurs who had been a minority group in Vietnam. Both groups sought to escape the socialist regime. The third wave, between the late 1980s and the early 1990s, brought prisoners of war, including former South Vietnamese military members and former South Vietnamese politicians. The diverse backgrounds of the three waves of refugees resulted in various migration determinants. Moreover, these diverse backgrounds influenced patterns of acculturation into American society as well as the construction of the various VA communities in Houston (Walker and Hannan 1989; Vu 2006).
Population monitoring
Census is the art and science of collecting demographic and socioeconomic information about a population at a given time (Kish 2004). In the United States, a census is conducted every decade. In addition, during the intervening years, an ongoing survey known as the American Community Survey (ACS) samples selected households using a detailed questionnaire. The census data provide valuable information for policy making, resource allocation, and many scientific studies (Passell 2001). Nevertheless, taking an accurate census over a large, diverse, mobile and growing population is not easy (Wright 2000). There have been growing concerns regarding the completeness of census counts and the accuracy of the information derived from them. These concerns have resulted in numerous criticisms. As a result, the Census Bureau has adopted various new enumeration strategies (Anderson and Fienberg 1999; U.S. Census Bureau 2004). Conventional methods for monitoring the spatial and temporal distribution of a population within a geographic area typically involve various forms of field surveying, such as mail delivery, phone and personal interviews.
These surveying methods, however, are expensive, labor intensive, and time consuming. According to the US General Accounting Office (GAO 2001), the total cost of Census 2000 was close to $6.5 billion, nearly double the cost of Census 1990 ten years before. The most expensive component of the conventional surveying method is the field data collection together with its support system. The cost of this component was approximately $32 per
household in Census 1990 and $56 per household in Census 2000 (GAO 2001). Another major drawback of conventional surveying methods is the nonresponse or unwillingness of individuals to provide their demographics. Although the decennial census is mandated in the US Constitution, many perceive the collection of household demographics as an intrusion on their privacy, as indicated by a decreasing mail response rate since the 1970s (Quarzo 2000). The low response rate and other complicating factors such as a mobile population and group quarters result in the underestimation of certain population subgroups (U.S. Census Bureau 2010). Scientists are exploring alternative methods in order to improve population estimation. Alternative methods include using an independent survey (Wright 2000), satellite imagery (Chen 2002), spatial analyses using Geographic Information Systems (GIS) (Hansen et al. 2007), and spatial interpolation techniques (Lo 2008). While there has been a solid foundation in statistical research to adjust for non-response errors through sampling and statistical inferences (Wright 2000; Nirel and Glickman 2009), application of such techniques is not without legal controversies. For example, the US Supreme Court ruled against the use of sampling for congressional apportionment in Department of Commerce v. United States House of Representatives, 525 US 316, in 1999. The controversial nature and increasing cost of conventional surveying methods remain major obstacles to improving the completeness and accuracy of population monitoring.
Personal level demographics on the internet
On the other hand, a repository of personal information can be harvested from the Internet. Some search engines, such as WhitePages, provide easy access to individual records based on simple search criteria such as last name and zip code.
These people search engines gather personal information from publicly available and third party data suppliers, including phone directories, birth, property, marriage, death, and court records, vehicle registration, criminal history, social networking websites, service subscription, and credit card history. The emergence of people search engine websites is part of a social phenomenon that is often referred to as Web 2.0. The term Web 2.0 was coined by O’Reilly (2007) and has gained
popularity in recent years. Web 2.0 is a web culture that allows users to personalize and contribute contents to web pages, such as wiki-projects (Sui 2008), social networking sites, the “mash-up” of web services/data from multiple sources (Goodchild 2008), and the overflow of Volunteered Geographic Information (Goodchild 2007). Web 2.0 has gained notoriety since its inception in 2003. It represents an emergence of diverse web technologies, such as aggregators, web scraping, and Application Programming Interface (API). These technologies power the interconnectivity and interactivity of dynamic personal content over the Internet. The advent of Web 2.0 technologies presents an alternative way to collect large quantities of personal information on the Internet (hereafter referred to as web demographics) as supplementary data. The web demographics available through these people search engines are very different from official census data, which are collected to describe a population at a specific time. The people search engines obtain personal level data, including full name, street level address, phone number, household member(s), and other information from the public domain and these data are updated regularly. For example, WhitePages offers two primary search modes for updates of their listing (2010). These are near-time (monthly) and real-time (daily). The attributes of such web demographics may be subject to errors, such as typographical errors in name, incorrect age, or even outdated address. This type of error is inevitable and is possible at the point of data entry, even for Census statistics. However, the presence of such a record is likely to be the result of some socioeconomic activities (e.g. driving records and credit history), in other words, a trace of an individual in the digital age. 
For counting purposes, the presence or absence of individuals recorded in web demographics is more accurate and reliable than the associated demographic attributes (e.g. household information). Therefore, the population estimate based on web demographics can potentially be useful for population monitoring. Many investigations of the World Wide Web (WWW) for scientific studies have focused on the role of the Internet as a platform for online surveys (Pettit 1999; Reips 2001; Pealer et al. 2001). Pettit (1999) and Reips (2001) endorsed the use of a web-based questionnaire or virtual experimental laboratory to collect psychological data. Pealer et al. (2001)
conducted a study to examine the reliability of health data collected from undergraduate students through a web-based survey. They found that there was no significant difference between the data from conventional surveying methods (such as mail surveys) and from the web-based survey in terms of the demographics, response rate, item completion and error. What is less well understood from these studies is the accuracy and reliability of existing web data. In fact, Herring (2001, p. 213) reported that faculty in higher education are generally satisfied with the web as a research tool, though “they question the accuracy and reliability of much Web-based information and the sufficiency of Web Resources for research.” Linberger and White (1998) suggested that geographic data gathered over the web provide a substantial amount of useful information for GIS and market research. In addition to the wealth of information available on the web, existing Web 2.0 technologies also provide tools and resources, such as Application Programming Interfaces (API), to develop powerful Internet GIS applications (Chow 2008). However, more research is needed to examine the accuracy and potential of collecting spatial data over the web (Goodchild 2008). Despite concerns regarding the accuracy of web demographics, no prior scientific study has substantiated such concerns or systematically examined the usefulness of web demographics for modeling population change.
The purpose of the present study was to evaluate the usefulness of web demographics to model population change of VA in Texas between 2000 and 2009. This study was designed to answer the following research questions: (1) Is there any significant difference between web demographics and Census 2000 in VA population count? (2) Was there a change in the VA population distribution in Texas from 2000 to 2009? (3) In which geographic areas were there significant changes in the VA population between 2000 and 2009?
Methods
Study area
According to the Census 2000, Texas has the second largest Vietnamese population in the United States. The total population of Texas increased 18.8% from 20,851,820 in 2000 to 24,782,302 in 2009. During a
similar period, the VA population in Texas grew by 41.3%, from 134,961 in 2000 to approximately 190,647 according to the American Community Survey (ACS) 2008 data. Among all Texan immigrants, VA have a unique migration history and strong bonds within the ethnic group. In fact, Vietnamese is the third largest language group in Texas, after English and Spanish (U.S. Census Bureau 2000). Approximately 80% of the VA population in Texas is distributed around metropolitan areas with total populations greater than 100,000, such as Dallas, Austin, Houston, and San Antonio (Pfeifer 2001). Clusters of high VA concentration in Texas are also found in some rural areas along the Gulf coast, such as Palacios, the shrimp capital of Texas.
Soliciting web demographics
The present study sampled web demographics from two popular people search engines, (1) Intelius (www.intelius.com) and (2) WhitePages (www.whitepages.com), during December 2009. WhitePages and Intelius are two of the most popular people search engines and power many other similar websites. For example, WhitePages shares its database with other people search engines, including SwitchBoard (www.switchboard.com), MSN White Pages (msn.whitepages.com), 411 (www.411.com), and Address (www.address.com). Similarly, Intelius powers Spock (www.spock.com) and provides returns analogous to those of PublicRecordsNow (www.publicrecordsnow.com) and USA-people-search (www.usa-peoplesearch.com). The latter websites are restricted to a maximum quota of a hundred records per query. Social networking websites, such as Facebook (www.facebook.com) and MySpace (www.myspace.com), provide limited demographics and city locations, which are optional information requested at sign-up. Elimination of duplicate or fictitious records and verification of attribute accuracy from these social networking websites are not feasible. Thus, data from these networking sites were not included in the present research.
Integrative people search engines, such as Pipl (www.pipl.com), Wink (www.wink.com), and Yahoo! People Search (people.yahoo.com), pull together personal demographics along with pictures, videos, publications, and company profiles from multiple sources, including the social networking websites. The quality of such data varies
and would have required intensive clean up; hence, these sources were also not considered in the present research. A total of 92 Vietnamese family names and an exhaustive list of all Texas zip codes were used to solicit individual records of VA in Texas from the people search engines. The family names adopted in this research were expanded from (1) a list of family names collected from Vietnamese boat people during 1980 and 1981 in United Nations High Commission for Refugees (UNHCR) transit camps in Hong Kong (Davis 1985), and (2) a representative sample of common Vietnamese family names (Hoa 2005). It was assumed that such a comprehensive list of Vietnamese family names would represent many VA in Texas. Because many Chinese have been naturalized as Vietnamese-Chinese through numerous historical interchanges, the list also included common Chinese family names that have a known Vietnamese translation. The demographic data files obtained from the two people search engines were merged, reconciled, and then stored in a database. Erroneous enumerations (i.e. duplicate, fictitious, and incomplete records) were identified and removed by automated screening. A unique enumeration was defined as a person having a unique combination of first name, family name, and any optional demographic attributes, including street-level address, age, and name(s) of other household members. For example, a common name (e.g. Joe Smith) with two different street-level addresses (or ages/names of other household members) would be assumed to be two unique individuals. If no demographic attributes were available other than the person’s first and family name, only one of the enumerations was considered to be unique and all other records with the same name were considered duplicates as a conservative measure. It was assumed that a change of physical address by the same person had been updated in the web demographics in a timely manner.
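The screening rule for unique enumerations described above can be illustrated with a minimal Python sketch. It is not the authors' screening program: the `Record` fields, the `unique_enumerations` function, and the case-insensitive matching are assumptions made for illustration, and the detection of fictitious records (which required semantic inspection) is not covered. A name-only record and a same-name record that carries an address are treated here as distinct, which is one possible reading of the rule.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One hypothetical web demographic record (field names are assumed)."""
    first_name: str
    family_name: str
    address: Optional[str] = None   # street-level address, if supplied
    age: Optional[int] = None
    household: tuple = ()           # names of other household members


def unique_enumerations(records):
    """Keep one record per unique combination of name and optional attributes."""
    seen = set()
    unique = []
    for rec in records:
        name = (rec.first_name.strip().lower(), rec.family_name.strip().lower())
        # Incomplete records (missing first or family name) are discarded.
        if not all(name):
            continue
        address = rec.address.strip().lower() if rec.address else None
        household = tuple(sorted(m.strip().lower() for m in rec.household))
        key = (name, address, rec.age, household)
        # Name-only records share the key (name, None, None, ()), so only the
        # first one is kept, mirroring the conservative measure in the text.
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique
```

With this rule, two records for the same name at different street addresses are both kept, while repeated name-only records collapse into a single enumeration.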
While most family names of VA are unique (e.g. Nguyen) and can be easily distinguished from other ethnic groups, a few family names (e.g. Cao and Lam) share common spelling in Mandarin and Cantonese. Web demographics with these common family names were screened manually to ensure that the returned records represented an accurate counting of VA. The authors, with their personal background
and knowledge of the Vietnamese and Chinese languages, derived general rules for semantic inspection of web demographics for Quality Assurance and Quality Control (QA/QC). The WhitePages search engine returned 81,354 records whereas Intelius provided 20,957 records, representing 42.7% and 11.0%, respectively, of the 190,647 counts reported by ACS. After removing the erroneous enumerations, the combined web demographics totaled 97,600 records, representing a respectable sample of 51.2%. It was noted that individual records from WhitePages included street addresses whereas Intelius only provided city addresses. In addition, some records also supplied age and names of other household members. The Intelius and WhitePages records were later address-matched into separate point layers and spatially joined (a process that summarizes the count of point locations within a polygon) for each county/census tract. Due to the insufficient spatial details in the Intelius records (i.e. city address but not street address), the census tract layer contained only the WhitePages records whereas the county layer contained combined records from both search engines.
Model population change
Since the web demographics represented only a sample of VA in Texas in 2009, the solicited data were normalized to population percentages by dividing by the total number of web demographic records obtained. Similarly, county- and census tract-aggregated VA population data from Census 2000 were converted into population percentages as the baseline population. Hence, the population change of VA between 2000 and 2009 was simply the difference between the 2009 dataset and the 2000 dataset as follows:
$$\Delta P\% = P\%_{2009} - P\%_{2000} \tag{1}$$
where ΔP% is the change in population percentage, P%2000 is the population percentage from Census 2000 and P%2009 is the population percentage of web demographics solicited in 2009. Hence, a positive ΔP% indicated an increasing VA population and a negative ΔP% represented a decreasing VA population between 2000 and 2009. Areas with significant population change were determined by calculating the z-score based on the mean ($\overline{\Delta P\%}$) and standard deviation (s) of the population percentage change as follows:
$$z_i = \frac{\Delta P\%_i - \overline{\Delta P\%}}{s} \tag{2}$$
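Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' GIS workflow: the function names are hypothetical, counts are keyed by a geographic unit identifier, units missing from one dataset are treated as having zero population, and s is taken to be the sample standard deviation.

```python
from statistics import mean, stdev


def to_percentages(counts):
    """Normalize raw counts per geographic unit to percentages of the total."""
    total = sum(counts.values())
    return {unit: 100.0 * n / total for unit, n in counts.items()}


def population_change(census_2000, web_2009):
    """Eq. 1: change in population percentage for each geographic unit."""
    p2000 = to_percentages(census_2000)
    p2009 = to_percentages(web_2009)
    units = set(p2000) | set(p2009)
    # Assumption: a unit absent from one dataset holds 0% of that population.
    return {u: p2009.get(u, 0.0) - p2000.get(u, 0.0) for u in units}


def z_scores(delta):
    """Eq. 2: standardize each change by the mean and sample SD of all changes."""
    values = list(delta.values())
    m, s = mean(values), stdev(values)
    return {u: (d - m) / s for u, d in delta.items()}


def significant_units(z, critical=1.96):
    """Units falling in the upper or lower 2.5% tail (two-sided alpha = 0.05)."""
    return {u: v for u, v in z.items() if abs(v) > critical}
```

Because both sets of percentages sum to 100, the changes sum to zero across all units, so the mean in Eq. 2 is zero up to rounding and the z-score is effectively each unit's change divided by s.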
The upper and lower 2.5% of the z-score distribution were used as the critical regions, with alpha set at the 0.05 level. To examine whether there was a significant difference in the VA distribution between the Census 2000 and the 2009 web demographics, statistical tests were performed at the county and census tract levels. At the county level, a repeated measures one-way Analysis of Variance (ANOVA) was used to compare the mean VA population percentage among Intelius, WhitePages, combined web demographics (i.e. Intelius plus WhitePages), and Census 2000 for each county i. Post-hoc Tukey tests were also used to compare the pair-wise differences within the group. The null hypothesis could be stated as:
$$H_1: P\%_{\text{Intelius 2009},\,i} = P\%_{\text{WhitePages 2009},\,i} = P\%_{\text{Intelius+WhitePages 2009},\,i} = P\%_{\text{Census 2000},\,i}$$
It was noted that the web records reflected the distribution of VA in 2009 while the census demographics portrayed their pattern in 2000. A paired t test (i.e. related samples) was used to compare the means of VA population percentages between WhitePages and Census 2000 for each census tract j. The second null hypothesis was:
$$H_2: P\%_{\text{WhitePages 2009},\,j} = P\%_{\text{Census 2000},\,j}$$
Pearson correlation was also used to evaluate the relationship among these datasets at the county and census tract levels. In addition to modeling change in population percentage, the Hoover index was calculated for the population datasets at both county and census tract levels to detect change in VA population concentration from 2000 to 2009 (Long and Nucci 1997). The Hoover index is defined by Eq. 3:
$$H = 50 \sum_{i=1}^{n} \left| A_i - P_i \right| \tag{3}$$
where H is the Hoover index, Pi is the proportion of the population in area i, Ai is the proportion of the land area in area i, and n is the total number of
geographic units. A perfectly concentrated population would have a high H approaching 100 while a perfectly dispersed population would have a score of 0 (Long and Nucci 1997). The Hoover index is useful to examine the overall population distribution and its change during a time period.
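As a worked illustration of Eq. 3, the following Python sketch computes the Hoover index from raw population counts and land areas per geographic unit. The function name and the dictionary inputs are assumptions for illustration; proportions are derived inside the function so that the factor of 50 yields a score between 0 and 100.

```python
def hoover_index(population, land_area):
    """Eq. 3: H = 50 * sum(|A_i - P_i|) over all geographic units.

    population and land_area map a unit identifier to a raw count and a raw
    area; both are converted to proportions of their respective totals.
    """
    total_pop = sum(population.values())
    total_area = sum(land_area.values())
    return 50.0 * sum(
        abs(land_area[u] / total_area - population.get(u, 0) / total_pop)
        for u in land_area
    )
```

For example, a population spread evenly over two equal-sized units gives H = 0, whereas placing the entire population in a unit that covers 1% of the land gives H = 99, close to the upper bound.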
Results
Distribution of population (percentage)
The VA population data collected from web demographics in 2009 and from Census 2000 were normalized to population percentages for direct comparison. Figure 1 illustrates the spatial distribution of VA population percentage from Intelius, WhitePages, both search engines, and Census 2000 at the county level (n = 254). The descriptive statistics of population percentage from all databases revealed high skewness and kurtosis, and hence the raw data were log-transformed for statistical analyses (Table 1). The repeated measures one-way ANOVA, F(3, 759) = 19.96, revealed a significant within-subject effect among the web demographics and Census 2000 data (p < 0.001). Hence, the first null hypothesis was rejected, indicating a significant difference in the overall spatial distribution of the VA population between 2000 and 2009 at the county level. The Pearson correlation demonstrated moderate positive correlations among the datasets, significant at the 0.001 level (Table 2). The correlation coefficients (r) were higher between Intelius and Census 2000 (r = 0.449) and between WhitePages and Census 2000 (r = 0.425) than between the combined web demographics (i.e. WhitePages and Intelius) and Census 2000 (r = 0.354). The combined web demographics had a high correlation with WhitePages (r = 0.795), which provided more individual records than Intelius. At the census tract level (n = 4,388), only the records from WhitePages were compared with the Census 2000 demographics (Fig. 2). Again, the descriptive statistics of population percentage revealed high skewness and kurtosis, so the raw data were log-transformed for statistical analyses (Table 3). A paired t test revealed a significant t(4,387) value of −12.265 (p < 0.001), indicating a
Fig. 1 Population percentages of Vietnamese-Americans from a Census 2000, b Intelius, c WhitePages, and d Intelius and WhitePages at the county level
significant difference between WhitePages and Census 2000. Hence, the second null hypothesis was rejected. A Pearson correlation showed that the two datasets were positively correlated (r = +0.229, p < 0.001, df = 4,387).
Population change between 2000 and 2009
Population change from 2000 to 2009 (ΔP%i) was calculated by subtracting the Census 2000 VA population percentage (P%Census 2000, i) from the web
demographics collected in 2009 (P%2009, i) for each geographic unit i. At the county level, the combined web demographics from Intelius and WhitePages were compared against the Census 2000 records (Fig. 3a). A cutoff threshold of ±0.005%, approximating 10 people based on the latest ACS data, was used to identify areas of negative or positive population change. It was found that most population change occurred in southeastern Texas and counties with high population concentration. The three counties that experienced the most population percentage decline
Table 1 Descriptive statistics of the population percentage of Vietnamese-Americans from Census 2000 (C2k), Intelius (Int), WhitePages (WP), and both Int and WP (Int + WP) at the county level
              Raw data                                    Log
              C2k        Int        WP         Int + WP   C2k      Int      WP       Int + WP
Mean          0.394      0.394      0.394      0.394      −1.228   −0.937   −1.322   −1.474
SE of mean    0.184      0.166      0.150      0.153      0.078    0.060    0.070    0.069
Median        0.002      0.010      0.006      0.007      −1.162   −0.941   −1.594   −1.733
SD            2.937      2.652      2.386      2.434      1.248    0.955    1.116    1.107
Variance      8.625      7.034      5.693      5.927      1.558    0.911    1.246    1.225
Skewness      11.658     11.133     10.118     10.357     −0.121   −0.013   0.177    0.373
Kurtosis      151.779    140.257    119.492    124.648    −1.469   −1.344   −1.148   −0.948
Minimum       0.000      0.000      0.000      0.000      −3.155   −2.321   −3.000   −3.000
Maximum       41.115     36.351     31.434     32.442     1.614    1.561    1.497    1.511
Table 2 Pearson correlation of the Vietnamese-American population percentage from log-transformed Census 2000 (C2k), Intelius (Int), WhitePages (WP), and both Intelius and WhitePages (Int + WP)
              Log(C2k)   Log(Int)        Log(WP)         Log(Int + WP)
Log(C2k)      1          0.449 (0.000)   0.425 (0.000)   0.354 (0.000)
Log(Int)      –          1               0.328 (0.000)   0.522 (0.000)
Log(WP)       –          –               1               0.795 (0.000)
Fig. 2 Population percentages of Vietnamese-Americans from a Census 2000, b WhitePages at the census tract level
were Harris, Dallas, and Tarrant, while Collin, Fort Bend, and Travis counties had the most growth between 2000 and 2009. The same analysis at the census tract level revealed further details within the counties that
underwent negative/positive population change (Fig. 4). Major metropolitan areas, including Dallas/Fort Worth, Houston, and the corridor between Austin and San Antonio, experienced a mixed population change. In general, the trend revealed an alternating pattern: growth among VA around the city center, a decrease in the outer ring, and an increase in the remote fringe between 2000 and 2009 (Fig. 4a, b, d). It is worth noting that Palacios, Texas, a small fishing village along the Gulf coastline, experienced remarkable population decline during this period (Fig. 4c). Based on the mean and standard deviation of the population change (i.e. ΔP%i), a z-score was calculated for each census tract unit by using Eq. 2 (Fig. 5). As explained in the methodology section, a z-score beyond ±1.96 falls in the lowest or highest 2.5% of the z-score distribution of negative/positive population change of VA, whereas z-scores within ±1 represent ~68% of the sample under a normal distribution. Thus, the z-score map (Fig. 5) emphasized the “hotspots” with the largest population deficit/growth between 2000 and 2009. It was found that significant growth of VA in Texas occurred in the rural suburbs of major metropolitan areas, including Dallas/Fort Worth, Houston, and the Austin/San Antonio corridor. In contrast, areas with significant VA population decline commonly occurred around the city limits of major metropolitan areas and, within the rural landscape, in Palacios.
Table 3 Descriptive statistics of the population percentage of Vietnamese-Americans from Census 2000 (C2k), WhitePages (WP) and their log-transformed counterparts at the census tract level
              C2k       WP        Log(C2k)   Log(WP)
Mean          0.023     0.023     −1.349     −1.592
SE of mean    0.001     0.001     0.017      0.015
Median        0.002     0.006     −1.550     −1.831
SD            0.069     0.054     1.127      0.974
Variance      0.005     0.003     1.270      0.949
Skewness      6.282     5.987     0.001      0.503
Kurtosis      50.403    54.493    −1.482     −0.991
Minimum       0.000     0.000     −3.131     −2.910
Maximum       0.888     0.946     0.000      0.000
Fig. 3 Population change of Vietnamese-Americans between 2000 and 2009 in Texas counties
Hoover index for population concentration
Figures 1 and 2 reveal that large concentrations of the Texas VA population were located in the Dallas/Fort Worth, Austin, San Antonio, Houston, and Palacios areas at both county and census tract levels. The Hoover index, a measurement of concentration, was used to cross-check the pattern observed in the overall population change of VA (Table 4). In general, the Hoover indices were high, indicating a high concentration of VA population in Texas. Comparisons of the Census data with the web demographics revealed that the Hoover index was lower in 2009 than in 2000 (i.e. population deconcentration of VA) at both census tract and county levels. These comparisons also showed that, for the same dataset, the Hoover indices were consistently higher at the census tract level than at the county level.
Discussion
This study examined the usefulness of web demographics to model population change of VA in Texas. Of course, implications drawn from this study rely upon the premise that the data from the web demographics are reliable and accurate. A scientifically accurate assessment of web demographics for an ethnic population would require large-scale field surveying at the regional level to avoid sampling bias, which is beyond the scope of this study. Comparisons among ACS 2008, ACS 2006-2008, and the web demographics were conducted at the state and county level (Tables 5, 6).
Fig. 4 Population change of Vietnamese-Americans between 2000 and 2009 in a Austin-San Antonio corridor, b Dallas/Fort Worth, c Palacios, and d Houston in Texas
Table 5 shows that the unique records of web demographics, including Intelius and WhitePages, represent more than 50% of the estimated VA population in ACS 2008 and ACS 2006–2008 for the state of Texas. Similarly, Table 6 reveals that the differences in normalized population between the web demographics and ACS data (1- and 3-year estimates) are less than 5 percentage points in all counties. This preliminary assessment indicates that the spatial distribution of the VA sampled from web demographics is very similar to the geographic pattern described by the available census data at the county level. Hence, the solicited web demographics are reasonably reliable, although their accuracy has yet to be validated by a large-scale field survey.
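The comparison described above can be sketched in a few lines. This is a hedged illustration rather than the authors' procedure: it assumes that each "%" column in Table 6 is a county's share of the corresponding statewide total in Table 5, and that "Diff %" is the web share minus the ACS share. Under that assumption the printed values for the rows used here are reproduced. The function names are hypothetical; the counts are copied from Tables 5 and 6.

```python
# Statewide totals from Table 5.
WEB_TOTAL = 97_600        # unique WhitePages + Intelius records
ACS_2008_TOTAL = 190_647  # ACS 2008 1-year estimate


def share_pct(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def share_difference(web_count, acs_count,
                     web_total=WEB_TOTAL, acs_total=ACS_2008_TOTAL):
    """Web share minus ACS share, in percentage points (assumed 'Diff %')."""
    return share_pct(web_count, web_total) - share_pct(acs_count, acs_total)


# State-level coverage of the web records relative to ACS 2008.
coverage_2008 = share_pct(WEB_TOTAL, ACS_2008_TOTAL)

# County-level differences for two rows of Table 6 (ACS 2008 columns).
harris_diff = share_difference(31_393, 68_690)
dallas_diff = share_difference(12_484, 23_654)

for name, value in [("Harris", harris_diff), ("Dallas", dallas_diff)]:
    print(f"{name} County: {value:+.1f} percentage points")
```

Normalizing by each dataset's own total compares spatial distributions rather than raw counts, which is why the web sample can agree closely with ACS at the county level while covering only about half of the estimated population.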
The population count of VA collected from Intelius and/or WhitePages in 2009 was found to be significantly different from Census 2000 at both the county and census tract levels. However, Pearson coefficients revealed modest but significant correlations among the datasets, suggesting that the overall population distribution was similar for the years 2000 and 2009. The correlation coefficient (r) with Census 2000 was higher for Intelius or WhitePages alone than for the combined web demographics, supporting the notion that each people search engine provided some unique demographics relative to the Census 2000 data. The Web 2.0 records solicited from the Internet can be considered a sample of the official census data. In
Fig. 5 Z-score of population change of Vietnamese-Americans between 2000 and 2009 in a Austin-San Antonio corridor, b Dallas/Fort Worth, c Palacios, and d Houston in Texas

Table 4 Hoover index of Vietnamese-American population in 2000 and 2009
Data source              County    Census tract
Census 2000              89.70     95.94
Intelius                 86.78     –
WhitePages               86.83     93.32
Intelius + WhitePages    86.74     –
turn, the official census is an estimation of the “true” population. Dual System Estimation (DSE), a method adopted by the US Census Bureau to estimate net undercount based on a capture-recapture assessment, surveyed a small sample of the population independently from the decennial census (Wright 2000). The
Table 5 Comparison between the web demographics and ACS data at the state level
State level                WP + Intelius    ACS 2008    ACS 2006–2008
Population                 97,600           190,647     187,613
WP + Intelius / ACS (%)    –                51.19       52.02
samples from an independent survey were then matched to the sample from the decennial census data to derive a match/no-match ratio to estimate census coverage. Although web demographics achieve an incomplete count and do not survey every VA in Texas, the samples from both people search engines
represented a respectable sample size. Moreover, web demographics can both provide supplementary data for DSE and provide data for analyzing demographic patterns.

Table 6 Comparison between the web demographics and ACS data at the county level
County               WP + Intelius   %      ACS 2008   %      Diff %   ACS 2006–2008   %      Diff %
Bell County          239             0.2    1,716      0.9    −0.7     863             0.5    −0.2
Bexar County         3,168           3.2    6,280      3.3    0.0      4,416           2.4    0.9
Brazoria County      2,175           2.2    1,790      0.9    1.3      2,699           1.4    0.8
Brazos County        409             0.4    570        0.3    0.1      573             0.3    0.1
Calhoun County       80              0.1    –          –      –        213             0.1    0.0
Cameron County       166             0.2    –          –      –        329             0.2    0.0
Collin County        5,995           6.1    10,647     5.6    0.6      8,523           4.5    1.6
Comal County         90              0.1    –          –      –        92              0.0    0.0
Coryell County       56              0.1    –          –      –        238             0.1    −0.1
Dallas County        12,484          12.8   23,654     12.4   0.4      26,137          13.9   −1.1
Denton County        2,734           2.8    8,701      4.6    −1.8     6,037           3.2    −0.4
El Paso County       437             0.4    1,955      1.0    −0.6     1,377           0.7    −0.3
Fort Bend County     5,789           5.9    9,030      4.7    1.2      10,420          5.6    0.4
Galveston County     1,495           1.5    1,635      0.9    0.7      2,335           1.2    0.3
Guadalupe County     182             0.2    –          –      –        209             0.1    0.1
Harris County        31,393          32.2   68,690     36.0   −3.9     67,413          35.9   −3.8
Hays County          121             0.1    –          –      –        172             0.1    0.0
Hidalgo County       249             0.3    172        0.1    0.2      434             0.2    0.0
Jefferson County     1,805           1.8    –          –      –        3,425           1.8    0.0
Lubbock County       283             0.3    368        0.2    0.1      738             0.4    −0.1
McLennan County      314             0.3    412        0.2    0.1      514             0.3    0.0
Midland County       116             0.1    –          –      –        56              0.0    0.1
Montgomery County    702             0.7    506        0.3    0.5      485             0.3    0.5
Nueces County        416             0.4    –          –      –        394             0.2    0.2
Potter County        592             0.6    –          –      –        1,410           0.8    −0.1
Randall County       177             0.2    –          –      –        480             0.3    −0.1
Rockwall County      223             0.2    –          –      –        25              0.0    0.2
Tarrant County       12,749          13.1   26,919     14.1   −1.1     26,935          14.4   −1.3
Taylor County        156             0.2    –          –      –        126             0.1    0.1
Tom Green County     163             0.2    –          –      –        195             0.1    0.1
Travis County        6,681           6.8    11,663     6.1    0.7      11,804          6.3    0.6
Wichita County       632             0.6    –          –      –        1,320           0.7    −0.1
Williamson County    1,480           1.5    1,728      0.9    0.6      1,925           1.0    0.5

According to the Hoover Indices derived from Census and web demographics, there were high concentrations of the VA population at both the county and census tract levels in Texas. The high concentration may be attributed in part to the fact that many VA have moved from California to Texas
cities, such as Houston, since 2000 because of affordable housing in Texas (Houston Institute for Culture 2010). Another factor that may help to explain this migration is that Houston contains the second largest VA community in Texas. Such large communities typically provide resources that serve the needs of VA migrants, and these movements produce a snowball effect that further encourages VA migration and concentration. According to the theory of ‘spatial assimilation’ (Massey 1985), it is common
for immigrants to live in dense communities during the early stages of acculturation but move to more ethnically mixed communities in later stages of the assimilation process (Alba et al. 1999). Such clustering allows immigrants to maintain cohesive bonds with families, relatives, and friends, who help them make the transition into the new environment. Such concentration also reduces disadvantages (such as prejudice) that immigrants might experience from exposure to other ethnicities at the initial transition stage.

The present study showed slight decreases in the overall concentration of VA at both county and census tract levels between 2000 and 2009. The results were consistent with other studies of immigrant distribution. For example, Liaw and Frey (2007) reported a deconcentration trend among immigrants in the United States from the late 1980s to the late 1990s. After gaining more capital and experience through work and education, immigrants were able to rely less on their ethnic community and to strive for higher socioeconomic status and better job prospects. Hall (2009) suggested several stages of immigrant distribution: concentration, deconcentration, and dispersal. Thus, the trend of concentration and deconcentration observed in this study conforms to the general patterns of acculturation strategies among immigrants.

Significant population growth was observed in the present study around the city edge. This growth in the suburbs of major metropolitan areas revealed a trend of urban-to-suburban migration among the VA population. Further statistical analysis revealed a modest but significant negative correlation between VA population change and overall population density at both county and census tract levels (Table 7). This negative correlation reflects a scenario in which higher VA population growth was associated with low-density areas and vice versa.
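The correlation analysis summarized above can be illustrated with a short sketch of the Pearson coefficient. The function below is a plain implementation of the standard formula, not the authors' code, and the density and population-change values are hypothetical numbers chosen only to show a negative relationship. They are not drawn from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical areal units: overall population density (persons per km^2)
# and percentage change in VA population between two dates.
density = [50, 200, 800, 1500, 3000, 6000]
va_change_pct = [40, 35, 10, 5, -10, -20]

r = pearson_r(density, va_change_pct)
print(f"r = {r:.3f}")  # negative: growth is concentrated in low-density units
```

A negative r of this kind describes an association only. It does not by itself show that individual VA residents moved from dense to sparse areas.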
This association of higher VA growth with low-density areas has been referred to as “counter-urbanisation” (Simpson and Finney 2009; Hall 2009). It may be that many immigrants move from central metropolitan areas to suburbs with low concentrations of immigrant populations because of social and economic factors, such as expensive housing in central cities (Saiz 2007) and the attractiveness of living in the suburbs. The counter-urbanisation phenomenon can be considered a part of the urban sprawl in many US cities. Some
Table 7 Pearson correlations for the Vietnamese-American population changes with overall population density
Data source of population change           County            Census tract
Intelius – Census 2000                     −0.338 (0.000)    –
WhitePages – Census 2000                   −0.482 (0.000)    −0.222 (0.000)
Intelius and WhitePages – Census 2000      −0.477 (0.000)    –
studies find that such migration patterns are different for second- and third-generation immigrants as compared to the first generation (Walker and Hannan 1989; Newbold 1999). A reverse trend, rural–urban migration, was found in coastal Palacios, Texas, where the new generation moved out of the traditional shrimp capital to pursue better jobs or education elsewhere.

Two possible limitations of the present study are the reliability of web demographics and the completeness of Census data. First, the biggest uncertainty in this study is the quality of the web demographic data. Because the web demographic data came from existing administrative records in the public domain (e.g. driving records), VGI, and other third-party data suppliers, there were undoubtedly errors in the web demographics (such as typographical errors in names and addresses). There were also inherent uncertainties associated with such data because the sample was limited to the list of common Vietnamese family names examined, and because the QA/QC process and the filtering of non-VA records were primarily based on the subjective knowledge and experience of the authors. The uncertainty was further complicated by the fact that several VA family names are common to other Asian groups (such as Lam and Cao among the Chinese, and Kim among Koreans). It is likely that the web demographics incorporated some non-VA records, which introduced some bias in the results. Second, the present study modeled population change by using the Census 2000 data, which has issues of net under/overcount that have been well documented (Anderson and Fienberg 1999, 2005). The Census Bureau assessed the net under/overcount problem in the Census 2000 by using demographic analysis and dual system estimation, but their results were inconsistent and uncertain (U.S. Census Bureau 2001a, b, 2010; Anderson and Fienberg 2002;
Robinson and Adlakha 2002). Hence, the Executive Steering Committee for ACE Policy (ESCAP) recommended no adjustment for the Census 2000 data for both redistricting and non-redistricting uses (U.S. Census Bureau 2001b). Without an estimate of possible undercount in Census 2000, the finding of counter-urbanisation in this study might be biased. Simpson and Middleton (1999) concluded that the estimate of non-response, when taken into account, has the effect of reducing the count of the net flow out of central cities.

Despite the limitations of the present study, it nevertheless reveals the potential of web demographics in modeling population change in a way that conforms to our understanding of population dynamics. The automation of web mining of personal-level demographics from the people search engines makes the process fast and repeatable. Compared to conventional surveying methods, such as mail surveys, telephone and personal interviews, web demographics methods provide an economical alternative (US GAO 2001). In addition, web demographics provide a way to include people who ignore the Census because the web approach is a passive data collection tool as opposed to active surveying methods.

The present investigation has also identified several areas for future research. First, the accuracy of web demographics could be assessed by validating web demographic data against field survey data. In order to conduct an accuracy assessment of web demographics for an ethnic population, it will be necessary to conduct a large-scale field survey at the regional level to minimize sampling bias. The present study provided evidence for the need to validate the accuracy and reliability of data available on the Internet. A natural extension of this study would be to conduct a similar analysis comparing web demographics with the forthcoming Census 2010 demographics.
It would also be useful to examine the relationship between demographic profiles, census participation rates, and the acculturation strategies of individuals and communities. Future research could also investigate the driving force(s) for demographic change, test predictions of internal migration patterns, and include other demographic attributes (e.g., age, household information, occupation, education, and other indicators of socio-economic status). Furthermore, data collected from the Internet can provide continuous population monitoring with applications to the understanding of the spatio-temporal evolution of a
population and the development of demographic profiles of people moving in and out of ethnic enclaves. While the present study illustrates the potential of web demographics in modeling population dynamics, future research regarding the utilization of web demographics must be cautious about the strategic implications for “personal privacy and integrity of personal identity” (Goss 1995).

Conclusion
The present study explored the usefulness of web demographics to analyze the spatio-temporal distribution among minorities and ethnic groups and also to improve the accuracy of the census. Comparisons of 2000 census data and web demographics data collected for this study revealed significant changes in residential patterns of the VA population in Texas between 2000 and 2009. The trend of population change of VA in Texas can be described as dispersal from urban areas to suburban areas. This trend was evident regardless of whether the analysis of population change was at the county level or at the census tract level. The potential for using web demographics is far reaching in applications such as identifying and predicting areas of significant population change among VA and other ethnic groups with similar migration patterns. Web demographics can also serve as an effective planning tool for future censuses to better allocate resources for post-enumeration surveys and to make population adjustments. Finally, more accurate and timely predictions of future growth and decline of towns and cities in the USA may become possible by using web demographics.

Acknowledgments
The authors are grateful to Derek Chan, who provided technical assistance to debug the initial version of the web application used in this research. The comments from two anonymous reviewers were helpful in improving the draft of this manuscript.
References
Alba, R. D., Logan, J. R., Stults, B. J., Marzan, G., Zhang, W., et al. (1999). Immigrant groups in the suburbs: A reexamination of suburbanization and spatial assimilation. American Sociological Review, 64(3), 446–460.
Anderson, M. J., & Fienberg, S. E. (1999). Who counts? The politics of census-taking in contemporary America. New York, NY: Russell Sage Foundation.
Anderson, M., & Fienberg, S. E. (2002). Why is there still a controversy about adjusting the census for undercount. PSOnline, March, 83–85.
Anderson, M., & Fienberg, S. E. (2005). Census undercount and adjustment. Encyclopedia of Social Measurement, 267–274.
Bruno, F. A., & Beilke, P. F. (2005). Vietnam and Vietnamese Americans: Helping K-8 school librarians and educators understand the history, culture, and literature. Multicultural Review, Fall, 53–61.
Chen, K. (2002). An approach to linking remotely sensed data and areal census data. International Journal of Remote Sensing, 23(1), 37–48.
Chow, T. E. (2008). The potential of maps APIs for internet GIS. Transactions in GIS, 12(2), 179–191.
Davis, J. M. (1985). Der Stress einer Emigration - eine Analyses der Reaktionen vietnamesischer Fluchtlinge [Stress of immigration - an analysis of the reactions of Vietnamese refugees]. In M. Rosch (Ed.), Auslandische Arbeitnehmer und Immigranten - sozialwissenschaftliche Beitrage zu einem praktischen Problem [Foreign guestworkers and immigrants—social science contributions to a practical problem] (pp. 181–201). Weiheim, Germany: Beltz Verlag.
Do, H. D. (1996). The new migrants from Asia: Vietnamese in the United States. Organization of American Historians, 10(4), 61–66.
Goodchild, M. F. (2007). Citizens as voluntary sensors: Spatial data infrastructure in the world of Web 2.0. International Journal of Spatial Data Infrastructures Research, 2, 24–32.
Goodchild, M. F. (2008). Spatial accuracy 2.0. In J.-X. Zhang & M. F. Goodchild (Eds.), Spatial uncertainty. Proceedings of the eighth international symposium on spatial accuracy assessment in natural resources and environmental sciences (Vol. 1, pp. 1–7). Liverpool, UK: World Academic Union.
Goss, J. (1995). “We know who you are and we know where you live”: The instrumental rationality of geodemographic systems. Economic Geography, 71, 171–198.
Hall, M. (2009). Interstate migration, spatial assimilation, and the incorporation of US immigrants. Population Space and Place, 15(1), 57–77.
Hansen, R. A., Henley, A. C., Brouwer, E. S., Roth, M. T., et al. (2007). Geographic information system mapping as a tool to assess nonresponse bias in survey research. Research in Social and Administrative Pharmacy, 3(3), 249–264.
Herring, S. D. (2001). Using the World Wide Web for research: Are faculty satisfied? The Journal of Academic Librarianship, 27(3), 213–219.
Hoa, L. T. (2005). Họ Và Tên Người Việt Nam (Vietnamese Family and Personal Names). Social Sciences Publishing House.
Houston Institute for Culture. (2010). The Asian American experience: Building New Saigon. http://www.houstonculture.org/cultures/viet.html. Accessed 4 June 2010.
Jordan, T. G. (1986). A century and a half of ethnic change in Texas, 1836–1986. Southwestern Historical Quarterly, 89(4), 385–422.
Kish, L. (2004). Statistical systems: Censuses of population. International Encyclopedia of the Social & Behavioral Sciences, 15049–15053.
Liaw, K., & Frey, W. H. (2007). Multivariate explanation of the 1985–1990 and 1995–2000 destination choices of newly arrived immigrants in the United States: The beginning of a new trend? Population Space and Place, 13(5), 377–399.
Linberger, P., & White, G. W. (1998). Geographic information on the Web: Extracting demographic and market research information. In Proceedings of 19th annual national online meeting (pp. 235–242).
Lo, C. P. (2008). Population estimation using geographically weighted regression. Geographic Information Science & Remote Sensing, 45(2), 131–148.
Long, L., & Nucci, A. (1997). The Hoover index of population concentration: A correction and update. Professional Geographer, 49(4), 431–440.
Malone, N., Baluja, K. F., Costanzo, J. M., Davis, C. J., et al. (2003). The foreign-born population: 2000. Census 2000 Brief, C2KBR-34, U.S. Census Bureau, Washington, D.C. http://www.census.gov/prod/2003pubs/c2kbr-34.pdf. Accessed 1 June 2010.
Massey, D. S. (1985). Ethnic residential segregation—a theoretical synthesis and empirical review. Sociology and Social Research, 69(3), 315–350.
Mohl, R. A. (2003). Globalization, Latinization, and the nuevo New South. Journal of American Ethnic History, 22(4), 31–66.
Newbold, K. B. (1999). Internal migration of the foreign-born: Population concentration or dispersion? Population and Environment, 20(3), 259–276.
Nirel, R., & Glickman, H. (2009). Sampling surveys and censuses. In D. Pfeffermann & C. R. Rao (Eds.), Handbook of statistics, Vol. 29, Part 1, Chap. 21 (pp. 539–565). New York: Elsevier.
O’Reilly, T. (2007). What is Web 2.0: Design patterns and business models for the next generation of software. Communications and Strategies, 65(1st Q), 17–37.
Passell, J. S. (2001). Censuses: Demographic issues. International Encyclopedia of Social and Behavioral Sciences, 1599–1605.
Pealer, L. N., Weiler, R. M., Pigg, R. M., Miller, D., Dorman, S. M., et al. (2001). The feasibility of a Web-based surveillance system to collect health risk behavior data from college students. Health Education & Behavior, 28(5), 547–559.
Pettit, F. A. (1999). Exploring the use of the World Wide Web as a psychology data collection tool. Computers in Human Behavior, 15(1), 67–71.
Pfeifer, M. E. (2001). U.S. Census 2000: An overview of national and regional trends in Vietnamese residential distribution. The Review of Vietnamese Studies, 1(1), 1–9.
Quarzo, A. (2000). Plans for census 2000. Government Information Quarterly, 17(2), 97–120.
Reips, U. D. (2001). The Web experimental psychology lab: Five years of data collection on the Internet. Behavior Research Methods Instruments & Computers, 33(2), 201–211.
Robinson, J. G., & Adlakha, A. (2002). Comparison of A.C.E. revision II results with demographic analysis. U.S. Bureau of the Census, DSSD A.C.E. Revision II Estimates Memorandum Series #PP-41, December 31, 2002. http://www.census.gov/dmd/www/pdf/pp-41r.pdf. Accessed August 10, 2010.
Saiz, A. (2007). Immigration and housing rents in American cities. Journal of Urban Economics, 61(2), 345–371.
Simpson, L., & Finney, N. (2009). Spatial patterns of internal migration: Evidence for ethnic groups in Britain. Population Space and Place, 15(1), 37–56.
Simpson, L., & Middleton, E. (1999). Undercount of migration in the UK 1991 census and its impact on counterurbanisation and population projections. International Journal of Population Geography, 5, 387–405.
Sui, D. Z. (2008). The wikification of GIS and its consequences: Or Angelina Jolie’s new tattoo and the future of GIS. Computers Environment and Urban Systems, 32(1), 1–5.
U.S. Census Bureau. (2000). Census 2000 summary file 3. http://factfinder.census.gov/servlet/DatasetMainPageServlet?_program=DEC&_submenuId=&_lang=en&_ts. Accessed February 23, 2010.
U.S. Census Bureau. (2001a). Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy. http://www.census.gov/dmd/www/pdf/Escap2.pdf. Accessed February 23, 2010.
U.S. Census Bureau. (2001b). Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy on Adjustment for Non-Redistricting Uses. http://www.census.gov/dmd/www/pdf/Recommend2.pdf. Accessed August 10, 2010.
U.S. Census Bureau. (2004). Meeting 21st century demographic data needs—implementing the american community survey: Report 8: Comparison of the American community survey three-year averages and the census sample for a sample of counties and tracts. http://www.census.gov/acs/www/AdvMeth/acs_census/creports/Report08.pdf. Accessed February 23, 2010.
U.S. Census Bureau. (2010). Net undercount and undercount rate for U.S. and states (1990). http://www.census.gov/dmd/www//pdf/understate.pdf. Accessed 1 June 2010.
U.S. General Accounting Office. (2001). Significant increase in cost per housing unit compared to 1990. GAO-02-31, http://www.gao.gov/new.items/d0231.pdf. Accessed 1 June 2010.
Vu, R. (2006). Rising from the cold war ashes: Construction of a Vietnamese American community in Houston, 1975–2005 (pp. 1–406). PhD. dissertation, Department of History, University of Houston.
Walker, R., & Hannan, M. (1989). Dynamic settlement process: The case of US immigration. The Professional Geographer, 41(2), 172–183.
WhitePages. (2010). WhitePages pro API implementation guide—Version 2.1. http://pro.whitepages.com/static/WhitePages_PRO_API_Implementation_Guide.pdf. Accessed 20 August 2010.
Wright, T. (2000). Census 2000: Who says counting is easy as 1–2–3? Government Information Quarterly, 17(2), 121–136.