Question
Consider the sf_crime data. These data contain information on 10,000 police incidents in San Francisco, CA during 2003-2015. In what follows, we will investigate the Category and PdDistrict categorical variables, indicating the category of offense and police district in which the offense occurred, respectively. We will restrict attention to only the 12 most prevalent incident categories. Run the following code to prepare the data and construct the joint distribution, the marginal distributions, and the conditional distributions.
## Read data.
dta <- read.csv("sf_crime.csv")
dim(dta)
head(dta)
summary(dta)
n <- nrow(dta)
## Separate out and filter the 'Category' and 'PdDistrict' variables.
category <- dta$Category
district <- dta$PdDistrict
tt <- table(category)
oo <- order(tt, decreasing = TRUE)
ii_keep <- (1:n)[category %in% names(tt)[oo][1:12]]
category <- factor(as.character(category)[ii_keep])
district <- factor(as.character(district)[ii_keep])
## Joint and marginal distributions.
f_cat_dis <- prop.table(table(category, district))
f_cat <- rowSums(f_cat_dis)
f_dis <- colSums(f_cat_dis)
## Conditional distributions: 'category' given 'district' (rows = districts)
## and 'district' given 'category' (rows = categories).
f_cat_given_dis <- t(f_cat_dis) / f_dis
f_dis_given_cat <- f_cat_dis / f_cat
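The division that builds f_cat_given_dis relies on R recycling the vector f_dis down the columns of the transposed table, so that each row (one district) is divided by that district's marginal probability. Here is a minimal sketch for checking that behaviour on a small made-up table. The counts and the labels A, B, X, Y, Z are hypothetical and do not come from sf_crime.csv.

```r
## Hypothetical 2 x 3 table of counts (NOT taken from sf_crime.csv).
counts <- matrix(c(10, 30, 20, 20, 5, 15), nrow = 2,
                 dimnames = list(category = c("A", "B"),
                                 district = c("X", "Y", "Z")))
f_joint <- prop.table(counts)
f_row <- rowSums(f_joint)   # marginal distribution of 'category'
f_col <- colSums(f_joint)   # marginal distribution of 'district'

## Same constructions as in the question's code.
f_row_given_col <- t(f_joint) / f_col   # rows = districts
f_col_given_row <- f_joint / f_row      # rows = categories

## Each row is a conditional distribution, so it should sum to 1.
rowSums(f_row_given_col)
rowSums(f_col_given_row)

## Equivalent, more explicit formulation using the 'margin' argument.
all.equal(f_row_given_col, t(prop.table(counts, margin = 2)))
```

If the row sums are all 1 and the final comparison is TRUE, the recycling is dividing by the intended marginal.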
Question 1a
What is the probability the incident was drug / narcotic-related and occurred in the Tenderloin district? Answer to four significant figures.
Question 1b
What is the probability the incident was drug / narcotic-related and occurred in the Richmond district? Answer to four significant figures.
Question 1c
What is the probability the incident was drug / narcotic-related, regardless of the district in which it occurred? Answer to four significant figures.
Question 1d
What is the probability the incident occurred in the Tenderloin district, regardless of the incident category? Answer to four significant figures.
Question 1e
What is the probability the incident occurred in the Richmond district, regardless of the incident category? Answer to four significant figures.
Question 1f
Given that a crime occurred in the Tenderloin district, what is the probability it was drug / narcotic-related? Answer to four significant figures.
Question 1g
Given that a crime occurred in the Richmond district, what is the probability it was drug / narcotic-related? Answer to four significant figures.
Question 1h
Are Category and PdDistrict independent?
Explanation / Answer
Question 1a
What is the probability the incident was drug / narcotic-related and occurred in the Tenderloin district? Answer to four significant figures.
P(drug/narcotic and Tenderloin) = 0.020922, or 0.02092 to four significant figures.
Question 1b
What is the probability the incident was drug / narcotic-related and occurred in the Richmond district? Answer to four significant figures.
P(drug/narcotic and Richmond) = 0.001350
Question 1c
What is the probability the incident was drug / narcotic-related, regardless of the district in which it occurred? Answer to four significant figures.
P(drug/narcotic) = 0.070866, or 0.07087 to four significant figures.
Question 1d
What is the probability the incident occurred in the Tenderloin district, regardless of the incident category? Answer to four significant figures.
P(Tenderloin) = 0.09190
Question 1e
What is the probability the incident occurred in the Richmond district, regardless of the incident category? Answer to four significant figures.
P(Richmond) = 0.05467
Question 1f
Given that a crime occurred in the Tenderloin district, what is the probability it was drug / narcotic-related? Answer to four significant figures.
P(drug/narcotic | Tenderloin) = P(drug/narcotic and Tenderloin) / P(Tenderloin) = 0.020922 / 0.09190 = 0.2277
Question 1g
Given that a crime occurred in the Richmond district, what is the probability it was drug / narcotic-related? Answer to four significant figures.
P(drug/narcotic | Richmond) = P(drug/narcotic and Richmond) / P(Richmond) = 0.001350 / 0.05467 = 0.02469
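The two conditional probabilities can be reproduced from the joint and marginal values reported in this answer. The sketch below uses only those reported numbers; the variable names are made up for illustration. With the tables from the question, the same values would presumably be read from f_cat_given_dis, but the exact row and column labels depend on how the levels are spelled in sf_crime.csv, which is not shown here.

```r
## Values reported in the answers to 1a, 1b, 1d and 1e.
p_drug_and_tenderloin <- 0.020922
p_tenderloin          <- 0.09190
p_drug_and_richmond   <- 0.001350
p_richmond            <- 0.05467

## P(A | B) = P(A and B) / P(B), rounded to four significant figures.
p_drug_given_tenderloin <- signif(p_drug_and_tenderloin / p_tenderloin, 4)
p_drug_given_richmond   <- signif(p_drug_and_richmond / p_richmond, 4)

p_drug_given_tenderloin   # 0.2277
p_drug_given_richmond     # 0.02469
```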
Question 1h
Are Category and PdDistrict independent?
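For two events A and B to be independent, the following condition must be satisfied:
P(A and B) = P(A) * P(B)
Category and PdDistrict are independent only if this condition holds for every combination of category and district, so a single combination that violates it is enough to rule out independence.
In our case, let A be the event that the incident is drug/narcotic-related and B the event that it occurred in the Richmond district. If Category and PdDistrict were independent, we would have:
P(drug/narcotic and Richmond) = P(drug/narcotic) * P(Richmond)
To verify, we substitute the actual figures:
P(drug/narcotic) * P(Richmond) = 0.070866 * 0.05467 = 0.003874
P(drug/narcotic and Richmond) = 0.001350
Since 0.001350 is not equal to 0.003874, that is, P(drug/narcotic and Richmond) is not equal to P(drug/narcotic) * P(Richmond), we conclude that Category and PdDistrict are not independent of each other. The conditional probabilities point the same way: P(drug/narcotic | Tenderloin) = 0.2277 and P(drug/narcotic | Richmond) = 0.02469 both differ from the unconditional P(drug/narcotic) = 0.070866.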
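The comparison above can be reproduced in R, first with the reported values and then for a whole joint table at once. This is only a sketch: max_independence_gap is a hypothetical helper written for this illustration, and f_indep is a made-up table, not data from sf_crime.csv.

```r
## Values reported in the answers above.
p_joint <- 0.001350   # P(drug/narcotic and Richmond)
p_cat   <- 0.070866   # P(drug/narcotic)
p_dis   <- 0.05467    # P(Richmond)

## Product of the marginals, which would equal p_joint under independence.
p_product <- p_cat * p_dis
signif(p_product, 4)          # 0.003874
p_joint / p_product           # this ratio would be 1 under independence

## Hypothetical helper: largest absolute gap between a joint table and
## the product of its marginals (0 for an exactly independent table).
max_independence_gap <- function(f_joint) {
  max(abs(f_joint - outer(rowSums(f_joint), colSums(f_joint))))
}

## Made-up independent table to illustrate the helper.
f_indep <- outer(c(A = 0.3, B = 0.7), c(X = 0.4, Y = 0.6))
max_independence_gap(f_indep)
```

With the tables built in the question, max_independence_gap(f_cat_dis) would summarise the largest departure across all categories and districts. Because these are sample proportions, small gaps can arise by chance alone; a formal check such as chisq.test(table(category, district)) is one common way to judge whether the departure is larger than chance would explain.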