These ado files allow the researcher to take data aggregated to other units and aggregate them to egohoods. It is preferable to have the original data in small units such as blocks, since this allows for the smoothest construction of egohoods. If the data are in larger units, an additional level of measurement error is introduced into the egohood measures. This is not fatal; it simply implies some more error. Nearly all measures in social science research contain measurement error, and the prudent researcher might consider an analytic technique that explicitly accounts for this (e.g., structural equation modeling).
This ado file creates variables aggregated to egohoods.
There are two different ado files contained here. The MakeEgohoods.ado file will work in essentially all instances, but it runs slower. The MakeEgohoodsFast.ado file runs considerably faster and will work fine in many instances, but it can sometimes use up all of a computer’s active memory. When this occurs, it will either produce an error message or run extremely slowly.
You can download the two ado files and the example datasets here, in a zipped file. Alternatively, you can download MakeEgohoods.ado here, or MakeEgohoodsFast.ado here.
To make the ado files available to Stata, add their location to the ado path:
*Path to the location of the egohoods ado file
adopath + "\FILEPATH\"
where FILEPATH points to the location on your computer where the ado file is saved.
When invoking the code for MakeEgohoods.ado, the command is:
MakeEgohoods, countvars(VARNAMES) meanvars(VARNAMES) latitude(VARNAME) longitude(VARNAME) bufferdistance(XX)
where VARNAME(s) is the name of the variable(s) of interest, and XX is the distance in miles for the preferred radius of the egohood. For example, to declare a 0.5 mile egohood, enter .5 for bufferdistance.
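As an illustration, a complete call might look like the sketch below. The variable names pop, hhinc, lat, and lon are hypothetical and are not taken from the example datasets; substitute the names in your own data. The sketch also assumes vincenty.ado may not yet be installed and installs it only if needed.

```stata
* Sketch of a do-file. pop, hhinc, lat, and lon are hypothetical variable names.
adopath + "\FILEPATH\"

* Install vincenty.ado only if Stata cannot already find it
capture which vincenty
if _rc != 0 {
    ssc install vincenty
}

use mydata.dta, clear

* Half-mile egohoods
MakeEgohoods, countvars(pop) meanvars(hhinc) ///
    latitude(lat) longitude(lon) bufferdistance(.5)
```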
To invoke the code for MakeEgohoodsFast.ado, the command is similar, except that one additional option can be added:
squnit(XXX)
This is the size of the square units that the code uses for grabbing many geographic units at once. The default, if not overridden by the user, is 0.017, which is the latitude value approximately equal to one mile. The code will therefore grab the geographic units within a one-square-mile block, compute the egohoods for them, and then go on to the next one-square-mile block. (In contrast, the MakeEgohoods.ado file grabs each geographic unit one at a time and computes the egohood for it.) Thus, increasing the value for squnit will increase the speed of the MakeEgohoodsFast.ado code, but runs the risk of overburdening the active memory.
Note: if you get an error message when using MakeEgohoodsFast.ado that memory has been exhausted, run the code again using a smaller value for squnit.
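For example, if the default of 0.017 exhausts memory, the call could be repeated with squares half that size. This is only a sketch: the variable names are hypothetical, and 0.0085 is an arbitrary illustrative value rather than a recommended setting.

```stata
* Sketch only. pop, hhinc, lat, and lon are hypothetical variable names.
* squnit(.0085) halves the default square size of 0.017 to reduce memory use.
MakeEgohoodsFast, countvars(pop) meanvars(hhinc) ///
    latitude(lat) longitude(lon) bufferdistance(.5) squnit(.0085)
```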
This code uses Austin Nichols’ vincenty.ado file to calculate distances (http://ideas.repec.org/c/boc/bocode/s456815.html).
To install, type in the command line:
ssc install vincenty
When the code finishes running, your dataset will contain new variables with the prefix EH_ added to the original variable names. The prefix marks these as the egohood versions of your initial variables. For this reason, none of the variables in your initial dataset should already have this prefix, as this would likely have some unhappy consequences for the algorithm.
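One way to guard against such a clash is to check for the prefix before running the ado file. The sketch below uses only built-in Stata commands; the macro name clashes is arbitrary.

```stata
* Sketch: stop early if any variable already starts with EH_
capture unab clashes : EH_*
if _rc == 0 {
    display as error "Variables already using the EH_ prefix: `clashes'"
    display as error "Rename or drop them before running MakeEgohoods."
    exit 110
}
```

If no variable matches EH_*, unab returns a nonzero code and the block is skipped.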
*****What does the code do?*****
For every observation in the dataset, the code draws a buffer of some specified distance around the observation.
Once the buffer around an observation is drawn, the code sums all other observations in the dataset that are spatially located within the buffer.
The focal observation is also summed in this calculation.
When using census blocks, these are egohoods. There is no explicit distance decay function.
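The logic described above can be sketched in a few lines of Stata. This is not the code inside MakeEgohoods.ado: it assumes hypothetical variables lat, lon (decimal degrees), and pop (a count), and it substitutes the haversine great-circle formula for vincenty.ado so that it has no dependencies. It is meant only to show the buffer-and-sum idea, and it will be slow on large datasets.

```stata
* Sketch only: NOT the code inside MakeEgohoods.ado.
* Hypothetical variables: lat, lon (decimal degrees) and pop (a count).
* Haversine distance is used here as a stand-in for vincenty.ado.

* Buffer radius in miles
local radius = 0.5
* Approximate mean earth radius in miles
local earth = 3958.8
local n = _N

tempvar dist
generate double `dist' = .
generate double sketch_pop = .

forvalues i = 1/`n' {
    * Distance in miles from observation `i' to every observation
    quietly replace `dist' = 2 * `earth' * asin(sqrt(              ///
        sin((lat - lat[`i']) * _pi / 360)^2 +                      ///
        cos(lat * _pi / 180) * cos(lat[`i'] * _pi / 180) *         ///
        sin((lon - lon[`i']) * _pi / 360)^2))

    * Sum all observations inside the buffer. The focal observation
    * has distance 0, so it is included in the sum.
    quietly summarize pop if `dist' <= `radius'
    quietly replace sketch_pop = r(sum) in `i'
}
```

Every observation inside the radius receives equal weight, which is what the absence of a distance decay function means in practice.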
The code is not specific to any particular Census unit or year. The code could be used for points (e.g., individuals) or polygons (needs centroids).
The example uses year 2000 census blocks for Irvine, California (mydata.dta), or for Los Angeles.
*****Boundary Problem (See Wong, David W. S. 1997. Spatial Dependency of Segregation Indices. Canadian Geographer 41 (2):128-136.):
This code only examines units that are within the current dataset. In our example case, these are only the blocks within the city boundary of Irvine. As a result, the egohood measures for blocks near the city boundary will be biased if we do not include blocks outside of the city boundary. To overcome this bias, we would need to include blocks from the other side of the boundary. It is better to include blocks that lie a bit farther out than the size of the egohood buffer.
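A possible workflow is to build the egohoods on a dataset that extends past the city boundary and then keep only the city's own blocks. In the sketch below, the file blocks_plus_buffer.dta, the flag incity, and the variable names are all hypothetical.

```stata
* Sketch only. blocks_plus_buffer.dta is a hypothetical file holding the
* Irvine blocks plus surrounding blocks out to somewhat more than the
* buffer distance; incity == 1 flags blocks inside the city boundary.
use blocks_plus_buffer.dta, clear

MakeEgohoods, countvars(pop) meanvars(hhinc) ///
    latitude(lat) longitude(lon) bufferdistance(.5)

* Surrounding blocks were needed only to fill the buffers of edge blocks
keep if incity == 1
```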
*The code uses geographic coordinates (not projected), and makes no adjustments for where the coordinates are on the earth. This is likely good enough for most purposes.
Contents:
MakeEgohoods.ado – basic ado file to create variables aggregated to egohoods
MakeEgohoodsFast.ado – advanced ado file to create variables aggregated to egohoods
example_do_file.do – do file to run example code to create variables aggregated to egohoods
mydatairv.dta – example dataset
mydata.dta – example dataset
mydataLA.dta – example dataset