Please write this python code: Data Mining the Internet Movie Database Websites
ID: 3780337 • Letter: P
Question
Please write this python code:
Data Mining the Internet Movie Database
Websites like the Internet Movie Database (www.imdb.com) maintain extensive information about movies and actors. If you search for a movie on the website, a web page showing information about the movie is displayed. It also shows all the actors in the movie. If you click on the link for an actor, you are taken to an actor’s page, where you can find information about him or her, including the movies the actor has appeared in. This assignment should give you some insight into the working of such websites. Here is what we’d like to do with the data:
(a) Given two titles of a movie, each representing the set of actors in that movie:
Find all the actors in those movies: i.e., A union B (A & B).
Find the common actors in the two movies: i.e., A intersection B (A | B).
Find the actors who are in either of them movies but not both: symmetric difference (A - B).
(b) Given an actor’s name, find all the actors with whom he or she has acted.
The data are available as a huge, compressed text file (at www.imdb.com/interfaces) that lists each actor followed by his or her movies and the year the movies were made.
Here is a small sample (also available on the text website) that you can work with for
this exercise:
What is an appropriate data structure? A dictionary is suggested, as we want to access the movies and actors efficiently, but what should be the key? A key needs to be unique, which rules out actors’ names—they are unique in our sample but not in the whole database. On the other hand, movie titles and production dates form a unique identity that suggests an immutable tuple—perfect as keys. We can arrange our dictionary with (title,year) pairs as keys and have a collection of actors in each movie as the dictionary values. As we will be looking at the intersection and union of actor combinations, that suggests using sets for the collection of actors’ names in each movie. Read in the data and add the data to a dictionary that is structured as described.
Repeatedly prompt the user until some sentinel is entered. If two movies are entered, they should be separated by the appropriate operator: &, |, to indicate the appropriate set operation to be performed (union, intersection, symmetric difference). If an actor is entered, find all the actors that he or she has been in movies with.
In the comments at the top of your program, put your name, the course, and the date. Then code your algorithm in Python and include proper comments along with the code to explain it. Missing or altering the order of the template will result in points being taken from your grade. You can copy your algorithm into the Python code and add further comments.
Explanation / Answer
MovieDataMining.py
==============================================================
imdb_Data.txt
=======================================================================