These ado files create variables aggregated to egohoods.  The approach was described in this published article:

Hipp, John R. and Adam Boessen. 2013. “Egohoods as waves washing across the city: A new measure of “neighborhoods”.” Criminology 51:287-327.

Egohoods are an overlapping approach to constructing neighborhoods—whereas almost all other approaches to constructing neighborhoods utilize a non-overlapping approach (neighborhoods do not overlap with one another), egohoods take an explicitly spatial approach to measuring context.  These ado files allow the researcher to take data aggregated to other units and aggregate them to egohoods.  Note that it is preferable to have the original data contained in small units such as blocks—this allows for the smoothest construction of egohoods.  If the data are in larger units, there will be an additional level of measurement error introduced to the egohoods measures.  This is not fatal—it just implies some more error.  Nearly all measures in social science research contain measurement error, and the prudent researcher might consider an analytic technique that explicitly accounts for this (e.g., structural equation modeling).

There are two different ado files contained here.  The MakeEgohoods.ado file will work in essentially all instances, but it runs slower.  The MakeEgohoodsFast.ado file runs considerably faster and will work fine in many instances, but it can sometimes use up all of a computer’s active memory: when this occurs, it will either produce an error message or run extremely slowly.

You can download the two ado files and the example datasets here.  They are in a zipped file.  Or you can download the MakeEgohoods.ado here, or the MakeEgohoodsFast.ado here.

To use either of these ado files, you must either save them to the location where your computer stores Stata ado files, or else use this command:

*Path to the location of the egohoods ado file

adopath + "\FILEPATH\"

where FILEPATH points to the location on your computer where the ado file is saved.

When invoking the code for MakeEgohoods.ado, the command is:

MakeEgohoods,  countvars(VARNAMES) meanvars(VARNAMES) latitude(VARNAME) longitude(VARNAME) bufferdistance(XX)

where VARNAMES are the names of the variables of interest (variables listed in countvars are summed across the egohood, whereas variables listed in meanvars are averaged), and XX refers to the distance in miles for the preferred radius of the egohood.  For example, to declare a 0.5-mile egohood, enter .5 for bufferdistance.
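As a concrete illustration, suppose the dataset holds hypothetical block-level variables pop (a population count) and medinc (a median income to be averaged), with centroid coordinates in latit and longit (all of these names are illustrative, not taken from the example datasets).  A half-mile egohood would then be constructed as:

```stata
* Hypothetical variable names; substitute your own count, mean, and
* coordinate variables
use mydata, clear
MakeEgohoods, countvars(pop) meanvars(medinc) latitude(latit) longitude(longit) bufferdistance(.5)
```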

To invoke the code for MakeEgohoodsFast.ado, the command is similar with the exception that there is one additional option that can be added:

squnit(XXX)

this is the value of the size of the square units that the code uses for grabbing many geographic units at once.  The default specification, if not overridden by the user, is 0.017, which is the latitude value approximately equal to one mile; the code will therefore grab geographic units within one-square-mile blocks, compute the egohoods for them, and then go on to the next one-square-mile unit.  (In contrast, the MakeEgohoods.ado file grabs each geographic unit one at a time and computes the egohood for it.)  Thus, increasing the value for squnit will increase the speed of the MakeEgohoodsFast.ado code, but runs the risk of over-burdening the active memory.

Note: if you get an error message when using MakeEgohoodsFast.ado that memory has been exhausted, run the code using a smaller value for squnit.
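The faster routine takes the same options plus squnit.  For example, assuming the same hypothetical variable names as above, grabbing roughly two-mile-square chunks at a time would look like:

```stata
* squnit(.034) is roughly two miles of latitude (2 x .017); larger values
* run faster but use more memory (variable names are hypothetical)
MakeEgohoodsFast, countvars(pop) meanvars(medinc) latitude(latit) longitude(longit) bufferdistance(.5) squnit(.034)
```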

This code uses Austin Nichols’ vincenty ado file to calculate distances (http://ideas.repec.org/c/boc/bocode/s456815.html).

To install, type in the command line:

ssc install vincenty

When the code finishes running, you will have new variables in your dataset that have added the prefix

EH_

to the original variable names.  This will let you know that these are egohood versions of your initial variables.  You will therefore not want any of the variables in your initial dataset to already have this particular prefix, as that would likely have unhappy consequences for the algorithm.
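One way to guard against such a collision, sketched here, is to check for the prefix before running:

```stata
* If any variables already start with EH_, rename them before running;
* capture suppresses the error that ds throws when no variable matches
capture ds EH_*
if !_rc display "Rename these before running: " r(varlist)
```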

*****What does the code do?*****

For every observation in the dataset, the code draws a buffer of some specified distance around the observation.

Once the buffer around an observation is drawn, the code sums all other observations in the dataset that are spatially located within the buffer.

The focal observation is also summed in this calculation.

When using census blocks, these are egohoods. There is no explicit distance decay function.

The code is not specific to any particular Census unit or year, and it can be used for points (e.g., individuals) or for polygons (using their centroids).
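The buffer-and-sum logic above can be sketched as a simple Stata loop.  This is only an illustrative stand-in for the actual ado code (which uses vincenty for distances), with hypothetical variable names and an inline haversine distance in miles:

```stata
* Sketch: for each observation, sum pop over all observations whose
* centroid lies within a half-mile radius (including the focal observation)
local R = 3959                      // approximate earth radius in miles
gen EH_pop = .
quietly forvalues i = 1/`=_N' {
    tempvar d
    * haversine distance from observation i to every observation
    gen `d' = 2*`R'*asin(sqrt(sin((latit - latit[`i'])*_pi/360)^2 + ///
        cos(latit*_pi/180)*cos(latit[`i']*_pi/180)* ///
        sin((longit - longit[`i'])*_pi/360)^2))
    summarize pop if `d' <= .5, meanonly
    replace EH_pop = r(sum) in `i'
    drop `d'
}
```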

The example uses year 2000 census blocks for Irvine, California (mydata.dta), or for Los Angeles (mydataLA.dta).

*****Boundary Problem*****

(See Wong, David W. S. 1997. “Spatial Dependency of Segregation Indices.” Canadian Geographer 41(2):128-136.)

This code only examines units that are within the current dataset.  In our example case, that is only blocks within the city boundary of Irvine.  As a result, the egohood measures for blocks near the city boundary will be biased if we do not include blocks outside of the city boundary.  To overcome this bias, we would need to include blocks from the other side of the boundary.  It is best to include blocks that extend a bit farther out than the buffer radius of the egohoods.
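A sketch of that workaround, using hypothetical file and variable names (the dataset and indicator shown here are not part of the example files):

```stata
* Hypothetical workflow: build egohoods on city blocks plus a ring of
* blocks outside the boundary, then keep only the city blocks afterward
use blocks_plus_buffer, clear       // city blocks + blocks just outside
MakeEgohoods, countvars(pop) latitude(latit) longitude(longit) bufferdistance(.5)
keep if incity == 1                 // incity: hypothetical city indicator
```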

*The code uses geographic coordinates (not projected), and makes no adjustments for where the coordinates are on the earth.  This is likely good enough for most purposes.

Example dataset zip file

Contents:

MakeEgohoods.ado – basic ado file to create variables aggregated to egohoods

MakeEgohoodsFast.ado – advanced ado file to create variables aggregated to egohoods

example_do_file.do – do file to run example code to create variables aggregated to egohoods

mydatairv.dta – example dataset

mydata.dta – example dataset

mydataLA.dta – example dataset
