
Create Python Program: Open four websites of your choice Print the title of each


Question

Create a Python program that does the following:

• Open four websites of your choice and print the title of each website.

• Open the web page https://commons.wikipedia.org/wiki/Main_Page and search for all the links that contain the word Category. As you find each link, open it and print the first 10 links on that page. Write all links to a text file.

• The website http://www.gutenberg.org/files/ holds a list of ebooks. The text of each book can be found by using the full address (with book number 2000 as an example): http://www.gutenberg.org/files/2000/2000.txt. Ask the user to enter the number of books to search for. Since the user does not know the book numbers (such as the 2000 used above), each number should be chosen at random from 2000 to 8000. When a valid number is found, print the first 300 characters of the book text. If no book matches the random number, print a message saying that the book with that number cannot be found, and do not count that unsuccessful try toward the total number of books being searched for. For example, if the user asks for five books, the first 300 characters of five books should be printed.

Explanation / Answer

I usually use lxml for tasks like this. BeautifulSoup works as well.

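import lxml.html
from urllib.request import urlopen

url = "https://www.google.com"  # any page whose title you want
# Fetch with urllib first: lxml's own URL loader may not handle https.
t = lxml.html.parse(urlopen(url))
print(t.find(".//title").text)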

Here is a simpler version using BeautifulSoup:

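# Python 3: urllib2 became urllib.request, and BeautifulSoup 4 lives in the bs4 package.
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(urlopen("https://www.google.com"), "html.parser")
print(soup.title.string)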

NOTE:

• soup.title finds the first title element anywhere in the HTML document.

• title.string assumes the element has only one child node and that the child is a string; if the title has several child nodes, it returns None.
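
The two caveats above matter when a page has no title at all or when the title text is padded with whitespace. Here is a minimal sketch of the first part of the question (printing the title of several websites) that guards against both. It uses only the standard library's html.parser in place of BeautifulSoup, so it is a swapped-in technique and not the approach shown above. The names TitleParser, extract_title and print_titles are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element only."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # currently inside the first <title>?
        self.found = False     # has a <title> start tag been seen?
        self.done = False      # has the first <title> been closed?
        self.parts = []        # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.parts).split())


def print_titles(urls):
    """Fetch each URL and print its title (assumes the pages are reachable)."""
    for url in urls:
        with urlopen(url) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
        print(url, "->", extract_title(html))
```

Calling print_titles with a list of four URLs of your choice would print one line per site. Some servers reject requests that carry urllib's default user agent, so a request may fail for reasons unrelated to the parsing.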