A cookie is data that a Web site stores on your computer to record something abo

Question

A cookie is data that a Web site stores on your computer to record something about its interaction with you. The cookie might contain data such as the date you last visited, whether you are currently signed in, or something else about your interaction with that site. Cookies can also contain a key value to one or more tables in a database that the server company maintains about your past interactions. In that case, when you access a site, the server uses the value of the cookie to look up your history. Such data could include your past purchases, portions of incomplete transactions, or the data and appearance you want for your web page. Most of the time cookies ease your interaction with Web sites.

Cookie data includes the URL of the Web site of the cookie's owner. Thus, for example, when you go to Amazon, it asks your browser to place a cookie on your computer that includes its name, www.amazon.com. Your browser will do so unless you have turned cookies off.

A third-party cookie is a cookie created by a site other than the one you visited. Such cookies are generated in several ways, but the most common occurs when a Web page includes content from multiple sources. For example, Amazon designs its pages so that one or more sections contain ads provided by the ad-serving company, DoubleClick. When the browser constructs your Amazon page, it contacts DoubleClick to obtain the content for such sections (in this case, ads). When it responds with the content, DoubleClick instructs your browser to store a DoubleClick cookie. That cookie is a third-party cookie. In general, third-party cookies do not contain the name or any value that identifies a particular user. Instead, they include the IP address to which the content was delivered.

On its own servers, when it creates the cookie, DoubleClick records that data in a log, and if you click on the ad, it adds the fact of that click to the log. This logging is repeated every time DoubleClick shows an ad. Cookies have an expiration date, but that date is set by the cookie creator, and cookies can last many years. So, over time, DoubleClick and any other third-party cookie owner will have a history of what they've shown, what ads have been clicked, and the intervals between interactions.

But the opportunity is even greater. DoubleClick has agreements not only with Amazon, but also with many others, such as Facebook. If Facebook includes any DoubleClick content on its site, DoubleClick will place another cookie on your computer. This cookie is different from the one that it placed via Amazon, but both cookies have your IP address and other data sufficient to associate the second cookie as originating from the same source as the first. So, DoubleClick now has a record of your ad response data on two sites. Over time, the cookie log will contain data to show not only how you respond to ads, but also your pattern of visits across all the Web sites on which it places ads.

You might be surprised to learn how many third-party cookies you have. The browser Firefox has an optional feature called Lightbeam that tracks and graphs all the cookies on your computer. This feature can easily show how quickly a computer user can collect third-party cookies from numerous sources without even knowing it.

Who are these companies that specialize in gathering browser behavior data? Lightbeam can also determine the name and location of the companies that placed cookies on your computer. If you conduct an analysis using Lightbeam, you might find familiar URLs such as Facebook, DoubleClick, and other commonly recognized names. However, you might be surprised to find that a large majority of these cookies are more than likely to originate from websites that you have never heard of, such as Bluekai or Rubiconproject.

Third-party cookies generate incredible volumes of log data. For example, suppose a company, such as DoubleClick, shows 100 ads to a given computer in a day. If it is showing ads to 10 million computers (this number is very much within the realm of possibility), that is a total of 1 billion log entries per day, or roughly 365 billion in one year. Truly this is Big Data!

Storage is essentially free, but how can they possibly process all that data? How do they parse the log to find entries just for your computer? How do they integrate data from different cookies on the same IP address? How do they analyze those entries to determine which ads you clicked on? How do they then characterize differences in ads to determine which characteristics matter most to you? The answer is to use parallel processing. Using a MapReduce algorithm, they distribute the work to thousands of processors that work in parallel. They then aggregate the results of these independent processors and then, possibly, move to a second phase of analysis where they do it again. Hadoop, the open-source program that you learned about in Chapter 9 of your textbook, is a favorite for this process.
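The MapReduce idea described above can be sketched in a few lines of Python. This is only an illustrative, single-machine sketch, not how DoubleClick or Hadoop actually implement it: the log layout (IP address, site, ad ID, clicked flag), the sample entries, and the function names are all assumptions made up for the example. On a real cluster each chunk would be mapped on a different processor.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical log records: (ip_address, site, ad_id, clicked)
LOG = [
    ("203.0.113.7", "amazon.com", "ad-1", False),
    ("203.0.113.7", "facebook.com", "ad-2", True),
    ("198.51.100.4", "amazon.com", "ad-1", True),
    ("203.0.113.7", "amazon.com", "ad-3", True),
    ("198.51.100.4", "facebook.com", "ad-2", False),
]

def split(records, n_chunks):
    """Divide the log into chunks, one per (simulated) worker."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Map step: emit (ip, (shown, clicked)) for every entry in a chunk."""
    return [(ip, (1, int(clicked))) for ip, _site, _ad, clicked in chunk]

def shuffle(mapped_chunks):
    """Shuffle step: collect all values emitted for the same key (IP)."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for ip, value in pairs:
            grouped[ip].append(value)
    return grouped

def reduce_key(values):
    """Reduce step: sum the (shown, clicked) pairs for one IP."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)

def ad_stats_by_ip(records, n_chunks=2):
    # Each map_chunk call is independent, so a cluster could run them in parallel.
    mapped = [map_chunk(chunk) for chunk in split(records, n_chunks)]
    return {ip: reduce_key(vals) for ip, vals in shuffle(mapped).items()}

stats = ad_stats_by_ip(LOG)
for ip, (shown, clicked) in sorted(stats.items()):
    print(ip, "ads shown:", shown, "ads clicked:", clicked)
```

Because each chunk is mapped without reference to any other chunk, adding more processors shortens the map phase; only the shuffle and reduce steps need to bring results for the same IP address together.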

1/ Suppose you are an ad-serving company, and you have a log of cookie data for ads served to Web pages of all your customers (Amazon, Facebook, etc).

1/ Describe, in general terms, how you can process the cookie data to associate log entries for a particular IP address.

2/ Describe how you can use this log data to determine users who consistently seek the lowest price.

3/ Describe how you can use this log data to determine users who consistently seek the latest fashion.

4/ Explain why uses like those in 1.2 and 1.3 above are only possible with MapReduce or a similar technique.

Explanation / Answer

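The cookie log can be processed by IP address. Each log entry records the IP address to which the ad was delivered, so sorting or grouping the log on that field brings together all entries for one address. From those entries, the ads that were shown and the ads that were clicked on can be determined.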

Ads are used not only to see which interests are popular among users; they also reveal the IP addresses of those users.
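As a rough illustration of associating log entries with one IP address, the sketch below groups a small log by IP and orders each group by time. The field names (ip, site, ad, clicked, ts) and the sample entries are hypothetical; a real ad-serving log would have its own layout.

```python
from collections import defaultdict

# Hypothetical log entries; field names are illustrative only.
LOG = [
    {"ip": "203.0.113.7", "site": "amazon.com", "ad": "shoes-sale", "clicked": True, "ts": 300},
    {"ip": "198.51.100.4", "site": "amazon.com", "ad": "laptop", "clicked": False, "ts": 120},
    {"ip": "203.0.113.7", "site": "facebook.com", "ad": "new-jacket", "clicked": False, "ts": 100},
]

def history_by_ip(entries):
    """Group log entries by IP address and sort each group by timestamp."""
    histories = defaultdict(list)
    for entry in entries:
        histories[entry["ip"]].append(entry)
    for history in histories.values():
        history.sort(key=lambda entry: entry["ts"])
    return dict(histories)

histories = history_by_ip(LOG)

# Sites on which ads were served to one address, in time order.
sites = [entry["site"] for entry in histories["203.0.113.7"]]
print(sites)
```

The grouped history for an address contains entries from every customer site, which is what lets the ad server see activity across Amazon, Facebook, and the rest in one place.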

From the cookie log, one can find users who consistently look for cheaper options by checking which IP addresses repeatedly click on ads for discounts or sales.

In the same way as in the previous question, one can find users who consistently look for the latest trends by checking which IP addresses repeatedly click on ads that use words such as fashion, trending, or clothes.
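One possible way to turn the two answers above into a procedure is to tag each ad and measure what share of an address's clicks fall on a given kind of ad. The sketch below assumes that ads carry descriptive tags and that "consistently" means at least 60% of clicks; both the tag sets and the threshold are assumptions chosen for illustration, not anything stated in the case.

```python
from collections import Counter

# Hypothetical clicked-ad records: (ip_address, tags describing the ad)
CLICKS = [
    ("203.0.113.7", {"discount", "shoes"}),
    ("203.0.113.7", {"sale", "laptop"}),
    ("203.0.113.7", {"clearance"}),
    ("198.51.100.4", {"new-arrival", "fashion"}),
    ("198.51.100.4", {"trending", "clothes"}),
    ("198.51.100.4", {"discount", "books"}),
]

# Assumed tag vocabularies for the two behaviors of interest.
PRICE_TAGS = {"discount", "sale", "clearance", "coupon"}
FASHION_TAGS = {"fashion", "trending", "new-arrival"}

def tag_share(clicks, tags):
    """Fraction of each IP's clicks that hit at least one of the given tags."""
    total = Counter()
    hits = Counter()
    for ip, ad_tags in clicks:
        total[ip] += 1
        if ad_tags & tags:
            hits[ip] += 1
    return {ip: hits[ip] / total[ip] for ip in total}

def consistent_seekers(clicks, tags, threshold=0.6):
    """IPs whose share of matching clicks meets the (assumed) threshold."""
    shares = tag_share(clicks, tags)
    return sorted(ip for ip, share in shares.items() if share >= threshold)

price_seekers = consistent_seekers(CLICKS, PRICE_TAGS)
fashion_seekers = consistent_seekers(CLICKS, FASHION_TAGS)
print("price seekers:", price_seekers)
print("fashion seekers:", fashion_seekers)
```

Using a share of clicks rather than a raw count keeps a heavy browser with a few discount clicks from being labeled a bargain hunter.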

These uses are only possible with MapReduce or a similar technique because the log is far too large for one computer to process in a reasonable time. MapReduce is a programming model for processing and generating large data sets on clusters of computers. With it, thousands of computers work in parallel on separate portions of the log and their results are then combined, which makes the analysis much faster than it would be on a single computer.