Attaullah Shah
Moderator

Dear Ross,
To replicate the issue you are facing, I created a dummy dataset and ran a nested regression. In the output below, the base-category row for the fund strategy variable carries extra wording (the variable label, "Fund Strategy Categories, From Raw Data"), which I will fix. If you are experiencing a different issue, please either use my dataset or share a sample of your own so I can better understand the problem.
[code]
*------------------------------------------
* Step 1: Create dummy data
*------------------------------------------
clear
set seed 1234
set obs 10

* Create a string variable Fund_Strategy_EV_Size
gen Fund_Strategy_EV_Size = ""

* Populate dummy data with various cases and missing values:
replace Fund_Strategy_EV_Size = "micro cap" in 1
replace Fund_Strategy_EV_Size = "small cap" in 2
replace Fund_Strategy_EV_Size = "lower mid cap" in 3
replace Fund_Strategy_EV_Size = "mid cap" in 4
replace Fund_Strategy_EV_Size = "large cap" in 5
replace Fund_Strategy_EV_Size = "MICRO CAP" in 6 // test case: uppercase letters
replace Fund_Strategy_EV_Size = "sMaLL CaP" in 7 // test case: mixed case
* Observations 8 and 10 remain empty (i.e., missing)
replace Fund_Strategy_EV_Size = "mid cap" in 9

*------------------------------------------
* Step 2: Create and assign Fund_Strategy based on EV_Size
*------------------------------------------
gen Fund_Strategy = "" // Initialize the new variable

* trim(lower()) makes the match insensitive to case and surrounding blanks
replace Fund_Strategy = "[01] Micro Cap" if trim(lower(Fund_Strategy_EV_Size)) == "micro cap"
replace Fund_Strategy = "[02] Small Cap" if trim(lower(Fund_Strategy_EV_Size)) == "small cap"
replace Fund_Strategy = "[03] Lower Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "lower mid cap"
replace Fund_Strategy = "[04] Mid Cap" if trim(lower(Fund_Strategy_EV_Size)) == "mid cap"
replace Fund_Strategy = "[05] Large Cap" if trim(lower(Fund_Strategy_EV_Size)) == "large cap"

* For observations with a missing Fund_Strategy_EV_Size, mark as Unknown
replace Fund_Strategy = "[99] Unknown" if missing(Fund_Strategy_EV_Size)

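*------------------------------------------
* Step 3: Encode Fund_Strategy into a numeric variable
*------------------------------------------
encode Fund_Strategy, generate(cat_Fund_Strategy) label(cat_Fund_Strategy)
label variable cat_Fund_Strategy "Fund Strategy Categories, From Raw Data"

* Replicate each observation 100 times (10 -> 1,000 observations)
expand 100

* Random outcome and controls; uniform() is the legacy name for runiform()
gen returns = uniform()
gen size = uniform()
gen expense = returns + uniform()/10

* Second categorical variable with values 0-3
gen cat_Fund_Status = mod(_n,4)
label define fundstatus 1 "Active" 2 "Passive" 3 "Divestment" 0 "Unknown", modify
label values cat_Fund_Status fundstatus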

* Factor-variable controls with explicit base levels
global x_fact_controls ///
    ib(2).cat_Fund_Strategy /// base: [02] Small Cap (code 2)
    ib(3).cat_Fund_Status   // base: Divestment (code 3)

asdocx reg returns size expense $x_fact_controls, replace label tzok fs(9) abb(.) nest
[/code]
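A note on why ib(2) picks "[02] Small Cap" as the base: encode assigns the integer codes 1, 2, 3, ... in alphabetical order of the string values, so the "[01]" to "[99]" prefixes fix the ordering, and "[99] Unknown" ends up with code 6, not 99. The sketch below checks this on a tiny standalone dataset; strategy_demo and cat_demo are hypothetical names used only for this illustration.

```stata
* Minimal, self-contained check of how encode numbers the categories.
* strategy_demo and cat_demo are hypothetical names used only here.
clear
input str20 strategy_demo
"[04] Mid Cap"
"[02] Small Cap"
"[99] Unknown"
"[01] Micro Cap"
"[05] Large Cap"
"[03] Lower Mid Cap"
end

* encode sorts the strings alphabetically and numbers them from 1
encode strategy_demo, generate(cat_demo)

* Show the code-to-text mapping stored in the value label
label list cat_demo

* Show the underlying numeric codes without labels
tabulate cat_demo, nolabel
```

Running label list on cat_Fund_Strategy in the main example gives the same kind of mapping, which is a quick way to confirm the base level before running the regression.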

Table: Regression results

                                                    (1)
Variables                                           returns
-----------------------------------------------------------------
size                                                -0.001
                                                    (0.003)
expense                                             0.990***
                                                    (0.003)
[01] Micro Cap                                      0.002
                                                    (0.003)
Fund Strategy Categories, From Raw Data : base [02] Small Cap
[03] Lower Mid Cap                                  0.001
                                                    (0.003)
[04] Mid Cap                                        0.000
                                                    (0.003)
[05] Large Cap                                      -0.003
                                                    (0.003)
[99] Unknown                                        0.003
                                                    (0.003)
Unknown                                             -0.001
                                                    (0.003)
Active                                              -0.002
                                                    (0.003)
Passive                                             0.001
                                                    (0.003)
: base Divestment
Intercept                                           -0.044***
                                                    (0.003)
Observations                                        1000.000
R2                                                  0.991
-----------------------------------------------------------------
Notes: Standard errors are in parentheses. *** p<.01, ** p<.05, * p<.1
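
Judging from the output, the text before ": base" appears to be the variable label: cat_Fund_Strategy has one, while cat_Fund_Status has none and its base row starts with a bare colon. If that reading is right, a possible interim workaround is to shorten the label before exporting. This is only a sketch under that assumption; it also assumes the dummy data and $x_fact_controls from the do-file in this post are still in memory, and the label text "Fund Strategy" is just an example.

```stata
* Assumes the dummy data and $x_fact_controls defined earlier are in memory.
* Assumption: asdocx builds the ": base" row from the variable label.

* Shorten the variable label (example text, pick whatever suits the table)
label variable cat_Fund_Strategy "Fund Strategy"

* Re-run the same nested regression with the same asdocx options
asdocx reg returns size expense $x_fact_controls, replace label tzok fs(9) abb(.) nest
```

With the shorter label, the base row should read along the lines of "Fund Strategy : base [02] Small Cap", though I have not confirmed the exact wording.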