Interaction terms for Categorical Variables in a RE Model #566

Open
Arceus opened this issue Nov 5, 2023 · 4 comments

Arceus commented Nov 5, 2023

I'm trying to assess the effects on "returnAvg" of the interactions between my "SupplyClass" and "DupeClass" categorical variables, but I'm not sure I'm doing it right given that I always get the following error:

Traceback (most recent call last):
  File "F:\Python Projects\Tesi\Random Effects Regression.py", line 51, in <module>
    model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + HasGlow + HasCutout + SupplyClass*DupeClass", data=data)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2670, in from_formula
    mod = cls(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2616, in __init__
    super().__init__(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 328, in __init__
    self._validate_data()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 479, in _validate_data
    rank_of_x = self._check_exog_rank()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 434, in _check_exog_rank
    raise ValueError(
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set check_rank=False.

Setting check_rank=False instead leads to a singular matrix error.

I've tried looking into the documentation but I haven't found any answer on how to do it properly.
Here's how I tried inserting the interaction terms:
model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + SupplyClass + DupeClass + SupplyClass*DupeClass", data=data)

Owner

bashtage commented Nov 7, 2023

Are these both pandas categoricals?

Owner

bashtage commented Nov 7, 2023

Can you provide some more information on the structure of the data you are modeling?

Author

Arceus commented Nov 7, 2023

I'm pulling the data from a long-format, unbalanced panel dataset stored in a .csv file. By "categorical" (e.g. SupplyClass or DupeClass), I mean a column of strings that classifies each id in the "itemName" column. I don't run into any problems using these as separate explanatory variables without creating dummies beforehand, since linearmodels recognizes them as categoricals and handles them automatically.

Owner

bashtage commented Nov 7, 2023

What size is the array? What are the entity and time indices?
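
The thread does not establish what causes the rank error. One common cause with interacted categoricals, though, is a combination of levels that never occurs in the data: its dummy column is all zeros, so the design matrix loses rank. The sketch below is a standard-library illustration of that check, not what linearmodels does internally. The labels are made up, the helper names (design_columns, matrix_rank, empty_cells) are hypothetical, and the treatment coding is only an assumed stand-in for what the formula parser produces.

```python
from fractions import Fraction
from itertools import product


def design_columns(a, b):
    """Treatment-coded columns for 1 + a + b + a:b.

    The first (sorted) level of each variable is used as the baseline.
    This is an assumed coding for illustration only.
    """
    a_levels, b_levels = sorted(set(a)), sorted(set(b))
    cols = {"Intercept": [1] * len(a)}
    for la in a_levels[1:]:
        cols[f"a[{la}]"] = [int(x == la) for x in a]
    for lb in b_levels[1:]:
        cols[f"b[{lb}]"] = [int(y == lb) for y in b]
    for la, lb in product(a_levels[1:], b_levels[1:]):
        cols[f"a[{la}]:b[{lb}]"] = [
            int(x == la and y == lb) for x, y in zip(a, b)
        ]
    return cols


def matrix_rank(columns):
    """Exact rank of a matrix given as a list of columns (Gaussian elimination)."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in column c.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][c] / rows[rank][c]
            if factor:
                rows[r] = [x - factor * p for x, p in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def empty_cells(a, b):
    """Level combinations of (a, b) that never appear together in the data."""
    seen = set(zip(a, b))
    return [cell for cell in product(sorted(set(a)), sorted(set(b))) if cell not in seen]


# Made-up labels standing in for SupplyClass and DupeClass.
supply = ["low", "low", "high", "high", "mid", "mid"]
dupe = ["none", "some", "none", "some", "none", "none"]

cols = design_columns(supply, dupe)
print(len(cols))                          # 6 columns
print(matrix_rank(list(cols.values())))   # 5, so the matrix is rank deficient
print(empty_cells(supply, dupe))          # [('mid', 'some')]
```

If a check like this finds empty cells in the real data, merging sparse levels or dropping the affected interaction columns are the usual options. Whether that applies here depends on the data details requested above.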
