Create Python Program: Open four websites of your choice Print the title of each
Question
Create a Python program that does the following:
• Open four websites of your choice and print the title of each website.
• Open the web page https://commons.wikipedia.org/wiki/Main_Page and search for all the links that contain the word "Category". As you find each link, open it and print the first 10 links on that page. Write all links to a text file.
• The website http://www.gutenberg.org/files/ has a list of ebooks. The text of each book can be found by using the full address (with book number 2000 as an example): http://www.gutenberg.org/files/2000/2000.txt. The user is asked to enter how many books to search for. Since the user does not know the numbers of the books (such as the 2000 used above), each number should be a random number selected from 2000 to 8000. When a valid number is found, print the first 300 characters of the book text. If no book matches the random number, print a message saying that the book with that number cannot be found, and do not count that unsuccessful try toward the total number of books being searched for. For example, if the user enters five books to search for, the first 300 characters of five books should be printed.
Explanation / Answer
I always use lxml for tasks like this, though you could use BeautifulSoup as well.
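import lxml.html
from urllib.request import urlopen
url = "https://www.python.org"  # example page; replace with any URL
tree = lxml.html.parse(urlopen(url))
print(tree.find(".//title").text)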
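The first task asks for four websites, so the single fetch above needs a loop. Below is a minimal sketch of that loop; to keep it dependency-free it swaps lxml for the standard library's html.parser. The four URLs in `SITES` are placeholder choices, and `TitleParser`, `extract_title`, `fetch_title` and `main` are names made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Placeholder choice of four sites; swap in any pages you like.
SITES = [
    "https://www.python.org",
    "https://www.wikipedia.org",
    "https://www.gutenberg.org",
    "https://www.example.com",
]


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # text may arrive in several chunks


def extract_title(html):
    """Return the stripped text of the first <title> in an HTML string."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url, timeout=5):
    """Download a page and return its title (undecodable bytes are replaced)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)


def main():
    for site in SITES:
        try:
            print(f"{site}: {fetch_title(site)}")
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
            print(f"{site}: could not be opened ({exc})")
```

Call `main()` to fetch and print the four titles; a site that cannot be reached is reported instead of stopping the loop.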
Here's a version using BeautifulSoup:
from urllib.request import urlopen
from bs4 import BeautifulSoup
soup = BeautifulSoup(urlopen("https://www.google.com"), "html.parser")
print(soup.title.string)
NOTE:
• soup.title finds the first <title> element anywhere in the HTML document.
• .string assumes the title element has a single child node and that this child is a string; if the element contains more than one child, .string is None.
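For the Commons part of the question, one possible approach (a sketch, and only one reading of the task) is to collect every `<a href>` on the main page, keep those whose URL or link text contains "Category", then fetch each of those pages and take its first 10 links. It again uses html.parser instead of BeautifulSoup so it runs with the standard library alone. It assumes the site meant is Wikimedia Commons, which is served from commons.wikimedia.org; the User-Agent string, the output file name `links.txt`, the `max_categories` limit and all function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Wikimedia Commons is served from commons.wikimedia.org.
START_URL = "https://commons.wikimedia.org/wiki/Main_Page"
# Hypothetical User-Agent; Wikimedia asks automated clients to send a descriptive one.
HEADERS = {"User-Agent": "category-link-demo/0.1 (student exercise)"}


class LinkParser(HTMLParser):
    """Collects (absolute href, link text) pairs for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10):
    with urlopen(Request(url, headers=HEADERS), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def category_links(links):
    """Keep hrefs whose URL or link text contains 'Category', without duplicates."""
    matches = [href for href, text in links if "Category" in href or "Category" in text]
    return list(dict.fromkeys(matches))  # drop duplicates, keep page order


def main(out_path="links.txt", max_categories=None):
    categories = category_links(parse_links(fetch(START_URL), START_URL))
    all_links = []
    for cat_url in categories[:max_categories]:  # [:None] keeps every category
        all_links.append(cat_url)
        try:
            page_links = parse_links(fetch(cat_url), cat_url)
        except OSError as exc:
            print(f"Could not open {cat_url}: {exc}")
            continue
        first_ten = [href for href, _text in page_links[:10]]
        print(cat_url)
        for href in first_ten:
            print("   ", href)
        all_links.extend(first_ten)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(all_links) + "\n")
    return all_links
```

Something like `main(max_categories=5)` keeps a trial run short, since the main page links to many category pages. Writing each category URL followed by its first 10 links is one interpretation of "write all links to a text file".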
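The Gutenberg part can be sketched the same way: draw random numbers from 2000 to 8000, build the URL in the pattern the question gives, and only count a number once its text was actually downloaded. This is a minimal sketch under some assumptions: any HTTP error status (typically 404) is treated as "no such book", the text is decoded as UTF-8 with undecodable bytes replaced, and the hypothetical `max_attempts` cap stops the loop from running forever if almost nothing is found. The `fetch` parameter is only there so the loop can be tried without network access; the function names are made up for this example.

```python
import random
from urllib.error import HTTPError
from urllib.request import urlopen

# URL pattern from the question, e.g. http://www.gutenberg.org/files/2000/2000.txt
BOOK_URL = "http://www.gutenberg.org/files/{n}/{n}.txt"


def fetch_book_start(number, length=300, timeout=10):
    """Return the first `length` characters of book `number`, or None if not found."""
    try:
        with urlopen(BOOK_URL.format(n=number), timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
    except HTTPError:
        return None  # any HTTP error status (usually 404) counts as "no such book"
    return text[:length]


def search_books(count, fetch=fetch_book_start, max_attempts=1000):
    """Try random book numbers in 2000..8000 until `count` books are found."""
    found = {}
    attempts = 0
    while len(found) < count and attempts < max_attempts:
        attempts += 1
        number = random.randint(2000, 8000)  # both ends inclusive
        if number in found:
            continue  # already printed this book; draw again
        start = fetch(number)
        if start is None:
            print(f"The book with number {number} cannot be found.")
            continue  # failed tries do not count toward `count`
        found[number] = start
        print(f"--- Book {number} ---")
        print(start)
    return found


def main():
    count = int(input("How many books should be searched for? "))
    search_books(count)
```

Call `main()` and enter, say, 5 to get the opening 300 characters of five books. A number that was already found is redrawn, so the same book is never printed twice.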