Latent Class Analysis Fix

Automated cleaning/adjustment of MPlus LCA .txt files imported into Stata

Written by Nicholas Branic — Dec 4, 2014

You can download the mpluslcafix ado file here

********** Description of this .ado file

When using MPlus to conduct latent class analysis, users may instruct MPlus to generate an output file that includes the predicted probabilities for each estimated class in addition to the source variables used to estimate the models. MPlus provides this output file in a default .txt format and uses asterisks to denote missing data. When read into Stata, these asterisks cause the data in certain rows to “shift” into different columns depending on the amount of missing data. In addition, these shifts can turn numeric variables into strings and distort the overall dataset. Running the code below will correct these shift patterns and output the following variables:

1) All of the predicted probability variables generated by MPlus
2) The “indicator” variable signifying class membership

********** How to use this code

Before running this .ado file, users will need to import the .txt file generated by MPlus into Stata. Below are the two easiest methods that I have found:

1) Use StatTransfer to convert the original .txt file directly into a .dta file that may then be read into Stata. The most direct approach is to use the “stcmd” command from within Stata, although users may also open the StatTransfer program outside of Stata, convert the file, and then open the new .dta file in Stata.

2) Open the .txt file in Microsoft Excel and then save the data as a .csv file. Then, use the following commands to read the .csv file into Stata:

clear
import delimited <yourfilenamehere>
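
As a quick check after importing, the sketch below lists which variables Stata read as strings and counts the rows in each that contain an asterisk. It is only a diagnostic and not part of mpluslcafix. The file name lca_output.csv is hypothetical, and the sketch assumes that the asterisks survived the Excel step as literal text.

```stata
* Hypothetical file name; substitute the .csv file saved from Excel
clear
import delimited "lca_output.csv"

* Find the variables that were imported as strings rather than numbers
ds, has(type string)
local strvars `r(varlist)'

* For each string variable, count the rows that contain an asterisk
foreach v of local strvars {
    quietly count if strpos(`v', "*") > 0
    display "`v': " r(N) " rows contain an asterisk"
}
```

A variable that should be numeric but appears in this list is a likely sign of the column shifting described above.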

Alternative approaches probably exist, but note that some import strategies, such as reading in an .xlsx file, may induce additional column shifting that causes the code in this .ado file to malfunction. Once the MPlus data are loaded into Stata, add the folder containing the .ado file to Stata's ado path so that the file can be run:

adopath + "<foldercontainingadofile>"

For example:

adopath + "C:\Stata Code\Ado Files\"
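
To confirm that Stata can locate the file once the folder has been added, one option is Stata's built-in which command. This is a minimal sketch that assumes the file is saved as mpluslcafix.ado in the example folder.

```stata
* Add the folder that holds the .ado file (example path; substitute your own)
adopath + "C:\Stata Code\Ado Files\"

* Report where Stata finds mpluslcafix.ado; an error means it was not found
which mpluslcafix
```

This check only confirms that the .ado file is on the ado path. It does not clean the data.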

Running the .ado file generates a new .dta file that excludes the original variables used in the latent class analysis and keeps only the predicted probability and class indicator variables. Next, merge the adjusted MPlus data back into the original Stata dataset using the following commands:

use <originaldataset>, clear
merge 1:1 _n using <newdataset>

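The LCA variables should now correspond to the original data and match each respective case in the dataset. Because this merge matches on observation number (_n), the match is correct only if both datasets contain the same cases in the same order.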
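
An observation-number merge cannot detect a misaligned file on its own, so a few checks around the merge can help. The following is a sketch, with original.dta and lca_fixed.dta as hypothetical file names.

```stata
* Hypothetical file names; substitute your own datasets
use "lca_fixed.dta", clear
local n_lca = _N

use "original.dta", clear

* Both files should contain the same number of cases
assert _N == `n_lca'

* Match rows by observation number; _merge records the result
merge 1:1 _n using "lca_fixed.dta"

* Stop with an error if any row failed to find a partner
assert _merge == 3
drop _merge
```

These checks catch a difference in the number of cases but not a difference in row order, so they complement rather than replace careful preparation of the data before export.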

********** Additional notes

1) Prior to using the “outfile” command in Stata, which will export your data for use in latent class analysis within MPlus, users should perform two steps:

a) Drop any cases that have missing values on all of the variables that you will include in your latent class analysis. MPlus drops these cases automatically during latent class analysis, and the loss of these cases will result in a faulty merge after cleaning the LCA data.

b) As an added precaution against faulty merging, use the “sort” command on one or more variables. MPlus retains the original sorting of the data during latent class analysis and in its output .txt file, but this step provides additional confidence that the LCA data will merge back to their appropriate place within the original dataset.
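
The two preparation steps above might look like the following sketch. The variable names item1 through item4 and caseid, and the file names original_prepared.dta and lca_input.dat, are hypothetical. The sketch sorts on a variable that uniquely identifies each case, because Stata's sort does not fix the order of tied observations unless the stable option is used.

```stata
* Hypothetical indicator variables for the latent class analysis
local lcavars item1 item2 item3 item4
local n_vars : word count `lcavars'

* a) Drop cases that are missing on every indicator
egen n_missing = rowmiss(`lcavars')
drop if n_missing == `n_vars'
drop n_missing

* b) Confirm the identifier is unique, then sort on it
isid caseid
sort caseid

* Save the prepared data so the later merge uses the same cases and order
save "original_prepared.dta", replace

* Export the indicators for MPlus, one line per case
outfile `lcavars' using "lca_input.dat", wide replace
```

Saving the prepared dataset before exporting means the file used in the later merge has exactly the cases and ordering that MPlus received.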

2) This code was designed for use with MPlus only and has not been tested with other statistical software capable of conducting latent class analysis. The code was originally run and tested using Stata 13 and MPlus 5.


A PowerPoint presentation describing the procedure is available here.

I hope that this code helps and please direct all comments, suggestions, and issues to Nicholas Branic at