asdocx Forum: Bug with Regression Labels on Categorical Variables
Tagged: encoded variables, labels, nested regression
I have some encoded categorical variables, which I create like this:
gen Fund_Strategy = "" // Initialize the new variable
replace Fund_Strategy = "[01] Micro Cap" if trim(lower(Fund_Strategy_EV_Size)) == "micro cap"
replace Fund_Strategy = "[02] Small Cap" if trim(lower(Fund_Strategy_EV_Size)) == "small cap"
replace Fund_Strategy = "[03] Lower Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "lower mid cap"
replace Fund_Strategy = "[04] Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "mid cap"
replace Fund_Strategy = "[05] Large Cap" if trim(lower(Fund_Strategy_EV_Size)) == "large cap"
replace Fund_Strategy = "[99] Unknown" if missing(Fund_Strategy_EV_Size)

* Step 4: Encode the Fund_Strategy variable into a numeric variable
encode Fund_Strategy, generate(cat_Fund_Strategy) label(cat_Fund_Strategy)
label variable cat_Fund_Strategy "Fund Strategy Categories, From Raw Data"
I label both the variable and its categories. The problem appears when I set the base to anything other than the default of 1, that is, when I make another category the base and run the regression on the remaining categories, like this:
global x_fact_controls ///
    ib(2).cat_Fund_Strategy /// // Base: [02] Small Cap
    ib(3).cat_Fund_Status /// // Base: [03] Divestment
The asdocx Word output for nested regressions seems to misinterpret this. Only the labeling is affected: the regressions themselves correctly drop the base category from the estimates.

Also, for binary variables coded 0 and 1, where 0 is automatically the base, the labels appear to be indented quite far to the right.
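Because `ib(#)` refers to the stored numeric code rather than to the label text, it can help to confirm which code `encode` gave each category before choosing a base. The following is a small self-contained Stata sketch that uses only the six category strings from the code above (the one-row-per-category dataset is illustrative, not the poster's data):

```stata
* Sketch: confirm which numeric code -encode- assigned to each category,
* so the level passed to ib(#) is the intended base.
clear
input str20 Fund_Strategy
"[01] Micro Cap"
"[02] Small Cap"
"[03] Lower Mid Cap"
"[04] Mid Cap"
"[05] Large Cap"
"[99] Unknown"
end

* -encode- numbers the distinct strings 1, 2, 3, ... in sorted order
encode Fund_Strategy, generate(cat_Fund_Strategy) label(cat_Fund_Strategy)

* Show the code-to-label mapping: the bracketed prefixes fix the order,
* but "[99] Unknown" is stored as 6, not 99
label list cat_Fund_Strategy
tabulate cat_Fund_Strategy, nolabel
```

With this coding, `ib(2)` does select “[02] Small Cap”, whereas `ib(99)` would not refer to the Unknown category; its stored code is 6.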
Dear Ross
To replicate the issue you’re facing, I created a dummy dataset and ran a nested regression. In the code and output below, I noticed that the base category row includes extra wording (“Fund Strategy Categories, From Raw Data”), which I will fix. If you are experiencing a different issue, please either use my dataset or share a sample of your own so I can better understand the problem.

*-----------------------------------------
* Step 1: Create dummy data
*-----------------------------------------
clear
set seed 1234
set obs 10

* Create a string variable Fund_Strategy_EV_Size
gen Fund_Strategy_EV_Size = ""

* Populate dummy data with various cases and missing values:
replace Fund_Strategy_EV_Size = "micro cap" in 1
replace Fund_Strategy_EV_Size = "small cap" in 2
replace Fund_Strategy_EV_Size = "lower mid cap" in 3
replace Fund_Strategy_EV_Size = "mid cap" in 4
replace Fund_Strategy_EV_Size = "large cap" in 5
replace Fund_Strategy_EV_Size = "MICRO CAP" in 6 // test case: uppercase letters
replace Fund_Strategy_EV_Size = "sMaLL CaP" in 7 // test case: mixed case
* Observations 8 and 10 remain empty (i.e., missing)
replace Fund_Strategy_EV_Size = "mid cap" in 9

*-----------------------------------------
* Step 2: Create and assign Fund_Strategy based on EV_Size
*-----------------------------------------
gen Fund_Strategy = "" // Initialize the new variable
replace Fund_Strategy = "[01] Micro Cap" if trim(lower(Fund_Strategy_EV_Size)) == "micro cap"
replace Fund_Strategy = "[02] Small Cap" if trim(lower(Fund_Strategy_EV_Size)) == "small cap"
replace Fund_Strategy = "[03] Lower Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "lower mid cap"
replace Fund_Strategy = "[04] Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "mid cap"
replace Fund_Strategy = "[05] Large Cap" if trim(lower(Fund_Strategy_EV_Size)) == "large cap"
* For observations with a missing Fund_Strategy_EV_Size, mark as Unknown
replace Fund_Strategy = "[99] Unknown" if missing(Fund_Strategy_EV_Size)

*-----------------------------------------
* Step 3: Encode Fund_Strategy into a numeric variable
*-----------------------------------------
encode Fund_Strategy, generate(cat_Fund_Strategy) label(cat_Fund_Strategy)
label variable cat_Fund_Strategy "Fund Strategy Categories, From Raw Data"

* Expand to 1,000 observations and add random outcome/control variables
expand 100
gen returns = runiform()
gen size = runiform()
gen expense = returns + runiform()/10
gen cat_Fund_Status = mod(_n,4)
label define fundstatus 1 "Active" 2 "Passive" 3 "Divestment" 0 "Unknown", modify
label values cat_Fund_Status fundstatus

* The last line has no trailing /// so the global ends here
global x_fact_controls ///
    ib(2).cat_Fund_Strategy /// // Base: [02] Small Cap
    ib(3).cat_Fund_Status // Base: [03] Divestment

asdocx reg returns size expense $x_fact_controls, replace label tzok fs(9) abb(.) nest
Table: Regression results

Variables                                                      (1) returns
size                                                           -0.001 (0.003)
expense                                                         0.990*** (0.003)
[01] Micro Cap                                                  0.002 (0.003)
Fund Strategy Categories, From Raw Data : base [02] Small Cap
[03] Lower Mid Cap                                              0.001 (0.003)
[04] Mid Cap                                                    0.000 (0.003)
[05] Large Cap                                                 -0.003 (0.003)
[99] Unknown                                                    0.003 (0.003)
Unknown                                                        -0.001 (0.003)
Active                                                         -0.002 (0.003)
Passive                                                         0.001 (0.003)
: base Divestment
Intercept                                                      -0.044*** (0.003)
Observations                                                   1000.000
R2                                                              0.991

Notes: Standard errors are in parentheses. *** p<.01, ** p<.05, * p<.1

Thank you, Dr. Shah, but the issue is a bit different. I actually have two issues.
1) The generated base (reference) text is attached to the wrong label: “(Reference: )” appears on the first category rather than on the category I set as the base for the regression. Using the example above, I get “Fund Strategy Categories, From Raw Data (Reference: Micro Cap)” on Micro Cap, which is not the base I set with the syntax below. It should be assigned to Small Cap:
global x_fact_controls ///
    ib(2).cat_Fund_Strategy /// // Base: [02] Small Cap
In other words, the text “(Reference: )” should move from the first label to the second label, “Small Cap.” That said, I quite like having the variable label “Fund Strategy Categories, From Raw Data” above the categories, so it would be great if it could stay!

2) In the .docx output, the individual variables/labels are sometimes indented and sometimes not. This seems to happen only with my dummy variables, whose values are either 0 or 1:
COVID-19 Recession (2020)
New Normal, Post-Pandemic Economy (2021-2024)
Dummy indicator if fund is the first-time fund of the PE firm (Reference: Not first-time fund)
Yes a first-time fund
Dummy Indicator If Fund Is Completed (Reference: Unrealized Fund)
Also, the first part of the dummy variables' value labels was always cut off. I worked around this by adding some characters at the start of each label.

Syntax:
gen d_first_time_fund = 0
replace d_first_time_fund = 1 if Fund_Generation_Sequence == 1
label variable d_first_time_fund "Dummy indicator if fund is the first-time fund of the PE firm"
label define first_time_fund_lbl 0 "0 Not first-time fund" 1 "1 Yes a first-time fund"
label values d_first_time_fund first_time_fund_lbl
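As an alternative to typing the numeric code into each value label by hand, Stata's `numlabel` command can prepend it. The sketch below keeps the variable and label names from the syntax above, but the four values of `Fund_Generation_Sequence` are made up for illustration, and whether asdocx then displays these labels without cutting them off has not been tested here:

```stata
* Sketch: same 0/1 dummy as above, with -numlabel- adding the numeric
* prefix to the value labels instead of typing "0 " and "1 " by hand.
clear
input Fund_Generation_Sequence
1
2
3
.
end

* 1 if first-time fund, 0 otherwise (a missing sequence also gives 0,
* matching the gen/replace version above)
generate byte d_first_time_fund = (Fund_Generation_Sequence == 1)
label variable d_first_time_fund "Dummy indicator if fund is the first-time fund of the PE firm"

label define first_time_fund_lbl 0 "Not first-time fund" 1 "Yes a first-time fund"
label values d_first_time_fund first_time_fund_lbl

* Prepend each code to its label using the default mask, e.g. "0. Not first-time fund"
numlabel first_time_fund_lbl, add
label list first_time_fund_lbl
```

The one-line `generate` relies on a logical comparison evaluating to 1 or 0, so it produces the same values as the `gen`/`replace` pair.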
Actually, I just noticed that your output shows “base”, not “(Reference:”, after the variable label and before the category label. Why is that?

My regression syntax looks like this:
asdocx regress V1 V2 ///
    $x_cont_controls $x_fact_controls, ///
    replace nest ///
    rep(p) save($export_folder\temp_reg01.doc) title(`reg1') dec(2) abb(.) fs(8) label

asdocx regress V3 V2 ///
    $x_cont_controls $x_fact_controls, ///
    nest ///
    rep(p) save($export_folder\temp_reg01) title(`reg1') dec(2) stat(r2_a) abb(.) fs(8) label
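Since the mislabeling described in this thread only appears when the base is not the default (the lowest code), one possible workaround, untested with asdocx, is to give the intended base the lowest code so that a plain `i.` prefix can replace `ib(2).`. `encode` reuses a value label that already exists, so predefining the label controls which code each string receives. This is a sketch on illustrative one-row-per-category data:

```stata
* Hypothetical workaround sketch (not tested with asdocx): predefine the
* value label so the intended base, Small Cap, receives code 1.
clear
input str20 Fund_Strategy
"[01] Micro Cap"
"[02] Small Cap"
"[03] Lower Mid Cap"
"[04] Mid Cap"
"[05] Large Cap"
"[99] Unknown"
end

label define cat_Fund_Strategy 1 "[02] Small Cap" 2 "[01] Micro Cap" ///
    3 "[03] Lower Mid Cap" 4 "[04] Mid Cap" 5 "[05] Large Cap" 6 "[99] Unknown"

* -encode- reuses the existing label; -noextend- makes it stop with an error
* instead of silently adding codes for strings missing from the label
encode Fund_Strategy, generate(cat_Fund_Strategy) label(cat_Fund_Strategy) noextend

* Small Cap is now the lowest code, so it is the default base of i.cat_Fund_Strategy
label list cat_Fund_Strategy
```

The trade-off is that Small Cap would then be listed first in the table, and the global would use `i.cat_Fund_Strategy` in place of `ib(2).cat_Fund_Strategy`.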
Dear Ross
Could you please email me some sample data that replicates the issue? I suspect this specific problem may be an artifact of your dataset.