Assume we're examining the factors affecting the price of residential properties in a certain region. The dataset contains the following variables:
- Price (dependent variable): The selling price of the property.
- Size (independent variable): The total square footage of the property.
- Bedrooms (independent variable): Number of bedrooms in the property.
- Bathrooms (independent variable): Number of bathrooms in the property.
- Age (independent variable): Age of the property in years.
- Location (independent variable, categorical): Categorized as 'Urban', 'Suburban', or 'Rural'.
- Condition (independent variable, ordinal): Rated on a scale ranging from 'needs major repairs' to 'excellent condition'.
Suppose we apply linear regression to predict the house price and we include location as a predictor via one-hot encoding.
If our goal is to determine whether location is a statistically significant predictor of house price, how many binary indicator variables should we add to the model?
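As a rough illustration of the encoding step, the sketch below one-hot encodes a three-level Location column with pandas. The toy data are invented for this example and are not from the dataset described above, and the sketch assumes the regression includes an intercept. Under that assumption, one level is usually dropped as the reference category, because keeping an indicator for every level makes the design matrix rank deficient.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, made up purely for illustration.
df = pd.DataFrame({
    "Size": [1500, 2200, 900, 1800, 2600, 1200],
    "Location": ["Urban", "Suburban", "Rural", "Urban", "Rural", "Suburban"],
})

n_levels = df["Location"].nunique()

# drop_first=True drops one level (the reference category),
# leaving k - 1 indicator columns for k categories.
indicators = pd.get_dummies(df["Location"], prefix="Location", drop_first=True)
n_indicators = indicators.shape[1]

# For comparison: an intercept plus indicators for ALL levels.
# The indicator columns sum to the intercept column, so the matrix
# has 4 columns but fewer than 4 linearly independent ones.
full = pd.get_dummies(df["Location"]).to_numpy(dtype=float)
X_full = np.column_stack([np.ones(len(df)), full])
rank_full = np.linalg.matrix_rank(X_full)

print(n_levels, n_indicators, list(indicators.columns))
print(X_full.shape[1], rank_full)
```

With the indicators encoded this way, whether location matters as a whole is typically assessed with a joint test of all its indicator coefficients rather than by reading each coefficient's individual t-test.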