
Question

Consider the sf_crime data. These data contain information on 10,000 police incidents in San Francisco, CA during 2003-2015. In what follows, we will investigate the Category and PdDistrict categorical variables, indicating the category of offense and police district in which the offense occurred, respectively. We will restrict attention to only the 12 most prevalent incident categories. Run the following code to prepare the data and construct the joint distribution, the marginal distributions, and the conditional distributions.

## Read data.
dta <- read.csv("sf_crime.csv")
dim(dta)
head(dta)
summary(dta)

n <- nrow(dta)

## Separate out and filter the 'Category' and 'PdDistrict' variables.
category <- dta$Category
district <- dta$PdDistrict

tt <- table(category)
oo <- order(tt, decreasing = TRUE)
ii_keep <- (1:n)[category %in% names(tt)[oo][1:12]]
category <- factor(as.character(category)[ii_keep])
district <- factor(as.character(district)[ii_keep])

## Joint and marginal distributions.
f_cat_dis <- prop.table(table(category, district))
f_cat <- rowSums(f_cat_dis)
f_dis <- colSums(f_cat_dis)

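## Conditional distributions. Each row of each result sums to 1.
## f_cat_given_dis: rows are districts, columns are categories.
## f_dis_given_cat: rows are categories, columns are districts.
f_cat_given_dis <- t(f_cat_dis) / f_dis
f_dis_given_cat <- f_cat_dis / f_cat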
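
The two divisions above rely on R recycling a vector down the rows of a matrix, which is why the joint table is transposed before dividing by f_dis. The following is a minimal sketch of a sanity check, not part of the original assignment; it uses small made-up counts (not the sf_crime data) so that it runs on its own, and the labels "A", "B", "X", "Y", "Z" are placeholders.

```r
## Hypothetical toy counts (NOT the sf_crime data): 2 categories x 3 districts.
counts <- matrix(c(10, 30, 20, 40, 60, 40), nrow = 2,
                 dimnames = list(category = c("A", "B"),
                                 district = c("X", "Y", "Z")))

## Same construction as in the question.
f_cat_dis <- prop.table(counts)
f_cat <- rowSums(f_cat_dis)
f_dis <- colSums(f_cat_dis)
f_cat_given_dis <- t(f_cat_dis) / f_dis   # rows: district, columns: category
f_dis_given_cat <- f_cat_dis / f_cat      # rows: category, columns: district

## The joint distribution sums to 1, and so does every row of a
## conditional table.
stopifnot(isTRUE(all.equal(sum(f_cat_dis), 1)))
stopifnot(isTRUE(all.equal(unname(rowSums(f_cat_given_dis)), rep(1, 3))))
stopifnot(isTRUE(all.equal(unname(rowSums(f_dis_given_cat)), rep(1, 2))))

## Cross-check: prop.table() with margin = 2 normalizes each column of the
## counts, giving the same numbers with categories in the rows.
alt <- prop.table(counts, margin = 2)
stopifnot(isTRUE(all.equal(t(f_cat_given_dis), alt, check.attributes = FALSE)))
```

If any of these checks failed on the real tables, the most likely cause would be dividing along the wrong dimension.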

Question 1a

What is the probability the incident was drug / narcotic-related and occurred in the Tenderloin district? Answer to four significant figures.

Question 1b

What is the probability the incident was drug / narcotic-related and occurred in the Richmond district? Answer to four significant figures.

Question 1c

What is the probability the incident was drug / narcotic-related, regardless of the district in which it occurred? Answer to four significant figures.

Question 1d

What is the probability the incident occurred in the Tenderloin district, regardless of the incident category? Answer to four significant figures.

Question 1e

What is the probability the incident occurred in the Richmond district, regardless of the incident category? Answer to four significant figures.

Question 1f

Given that a crime occurred in the Tenderloin district, what is the probability it was drug / narcotic-related? Answer to four significant figures.

Question 1g

Given that a crime occurred in the Richmond district, what is the probability it was drug / narcotic-related? Answer to four significant figures.

Question 1h

Are Category and PdDistrict independent?

Explanation / Answer

Question 1a

What is the probability the incident was drug / narcotic-related and occurred in the Tenderloin district? Answer to four significant figures.

P (drug/narcotic and Tenderloin) = 0.020922, which is 0.02092 to four significant figures

Question 1b

What is the probability the incident was drug / narcotic-related and occurred in the Richmond district? Answer to four significant figures.

P (drug/narcotic and Richmond) = 0.001350

Question 1c

What is the probability the incident was drug / narcotic-related, regardless of the district in which it occurred? Answer to four significant figures.

P (drug/narcotic-related) = 0.070866, which is 0.07087 to four significant figures

Question 1d

What is the probability the incident occurred in the Tenderloin district, regardless of the incident category? Answer to four significant figures.

P (Tenderloin) = 0.09190

Question 1e

What is the probability the incident occurred in the Richmond district, regardless of the incident category? Answer to four significant figures.

P (Richmond) = 0.05467
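
The joint and marginal probabilities in 1a through 1e can be read from the objects built by the question's code by indexing with level names. The following is a minimal sketch on a tiny made-up data set (eight hypothetical incidents, not the real sf_crime data) so that it runs on its own. The level spellings "DRUG/NARCOTIC", "TENDERLOIN", and "RICHMOND" are assumptions; check levels(category) and levels(district) on the real data.

```r
## Hypothetical toy incidents (NOT the real sf_crime data). The level
## spellings below are assumed, not taken from the source.
category <- factor(c("DRUG/NARCOTIC", "DRUG/NARCOTIC", "ASSAULT", "ASSAULT",
                     "DRUG/NARCOTIC", "ASSAULT", "ASSAULT", "ASSAULT"))
district <- factor(c("TENDERLOIN", "TENDERLOIN", "TENDERLOIN", "RICHMOND",
                     "RICHMOND", "RICHMOND", "RICHMOND", "TENDERLOIN"))

## Same construction as in the question.
f_cat_dis <- prop.table(table(category, district))
f_cat <- rowSums(f_cat_dis)
f_dis <- colSums(f_cat_dis)

## Joint probability: one cell of the joint table (as in 1a and 1b).
p_drug_and_tenderloin <- f_cat_dis["DRUG/NARCOTIC", "TENDERLOIN"]
p_drug_and_richmond   <- f_cat_dis["DRUG/NARCOTIC", "RICHMOND"]

## Marginal probabilities: one entry of a marginal vector (as in 1c to 1e).
p_drug       <- f_cat["DRUG/NARCOTIC"]
p_tenderloin <- f_dis["TENDERLOIN"]
p_richmond   <- f_dis["RICHMOND"]

## A marginal is the sum of the joint probabilities over the other variable.
stopifnot(isTRUE(all.equal(unname(p_drug),
                           p_drug_and_tenderloin + p_drug_and_richmond)))

## Report to four significant figures, as the questions ask.
signif(c(p_drug_and_tenderloin, p_drug_and_richmond,
         unname(p_drug), unname(p_tenderloin), unname(p_richmond)), 4)
```

On the real tables the same indexing expressions apply; only the numbers differ.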

Question 1f

Given that a crime occurred in the Tenderloin district, what is the probability it was drug / narcotic-related? Answer to four significant figures.

P (drug/narcotic | Tenderloin) = P (drug/narcotic and Tenderloin) / P (Tenderloin) = 0.2277

Question 1g

Given that a crime occurred in the Richmond district, what is the probability it was drug / narcotic-related? Answer to four significant figures.

P (drug/narcotic | Richmond) = P (drug/narcotic and Richmond) / P (Richmond) = 0.02469
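
The conditional probabilities in 1f and 1g follow from the joint and marginal figures already reported. The sketch below only re-does that arithmetic from the rounded values in the answers; the variable names are illustrative. With the full tables in hand, the same quantities would presumably sit in f_cat_given_dis, indexed by district and then category.

```r
## Figures reported in the answers to 1a, 1b, 1d, and 1e.
p_drug_and_tenderloin <- 0.020922
p_drug_and_richmond   <- 0.001350
p_tenderloin          <- 0.09190
p_richmond            <- 0.05467

## P(A | B) = P(A and B) / P(B)
p_drug_given_tenderloin <- p_drug_and_tenderloin / p_tenderloin
p_drug_given_richmond   <- p_drug_and_richmond / p_richmond

## Round to four significant figures, as the questions ask.
signif(p_drug_given_tenderloin, 4)   # 0.2277
signif(p_drug_given_richmond, 4)     # 0.02469
```

Because the inputs are themselves rounded, the last digit could differ slightly from a value computed on the unrounded tables.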

Question 1h

Are Category and PdDistrict independent?

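Two events A and B are independent if and only if the following condition is satisfied:

P (A and B) = P (A) * P (B)

For the variables Category and PdDistrict to be independent, this condition must hold for every combination of category and district, so a single combination that violates it is enough to show dependence. Let A be the event that the incident is drug/narcotic-related and B be the event that it occurred in the Richmond district. If Category and PdDistrict were independent, we would have:

P (drug/narcotic and Richmond) = P (drug/narcotic) * P (Richmond)

Substituting the figures from 1b, 1c, and 1e:

P (drug/narcotic) * P (Richmond) = 0.070866 * 0.05467 = 0.003874

P (drug/narcotic and Richmond) = 0.001350

Since 0.001350 is not equal to 0.003874 (the observed joint probability is only about a third of the product), P (drug/narcotic and Richmond) is not equal to P (drug/narcotic) * P (Richmond), and we conclude that Category and PdDistrict are not independent in these data. The conditional probabilities in 1f and 1g show the same thing: P (drug/narcotic | Tenderloin) = 0.2277 and P (drug/narcotic | Richmond) = 0.02469 both differ from the marginal P (drug/narcotic) = 0.070866, whereas under independence all three would be equal.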
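
The comparison can be repeated for both districts using only the figures reported in the answers. This is a small sketch of that arithmetic with illustrative variable names; a ratio of 1 between the joint probability and the product of marginals is what independence would predict.

```r
## Figures reported in the answers to 1a through 1e.
p_drug  <- 0.070866
p_dis   <- c(TENDERLOIN = 0.09190,  RICHMOND = 0.05467)
p_joint <- c(TENDERLOIN = 0.020922, RICHMOND = 0.001350)

## Product of marginals: the joint probability expected under independence.
p_product <- p_drug * p_dis

## Ratio of observed joint to expected joint; 1 would mean the product rule
## holds for that district.
ratio <- p_joint / p_product

round(cbind(joint = p_joint, product = p_product, ratio = ratio), 6)
```

Drug/narcotic incidents come out over-represented in Tenderloin (ratio above 1) and under-represented in Richmond (ratio below 1). With the full tables, the same check could be run on every cell at once by comparing f_cat_dis with outer(f_cat, f_dis).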