Studyspark Study Document

Extracting Information Sentiment From Blogs Research Proposal

Pages:20 (5886 words)

Sources:50

Subject:Technology

Topic:Net Neutrality

Document Type:Research Proposal

Document:#44933379




4. Transparency, authenticity, and focus are good. Bland is bad. Many people are looking for someone who is in authority to share their ideas, experiences, or suggestions (Bielski, 2007, p. 9).

Moreover, just as content analysis of other written and symbolic forms has provided new insights that might otherwise have gone unnoticed, the analysis of blog content may reveal some unexpected findings concerning hot topics and significant social trends that are shaping the users of this information. For example, an intern working on Facebook's data infrastructure engineering team recently generated an eerily accurate global map based on Facebook friendship links. According to the developer, "I was interested in seeing how geography and political borders affected where people lived relative to their friends. I wanted a visualization that would show which cities had a lot of friendships between them" (Butler, 2010, para. 3). While Butler had some vague ideas about the types of clusters that would populate the map, he was surprised by how accurately the results mirrored the population densities of the world, with some noticeable absences (Cuba, North Korea, large parts of Africa and South America, the western half of the United States, etc.).

Based on his content analysis of 10 million Facebook friendship links, Butler plotted each individual's location by latitude and longitude and drew a connecting line between each friendship pair, with higher numbers of paired links shown as brighter lines in the map in Figure 1 below.

Figure 1. Butler's Facebook friendship links map: dark areas on the map represent where Facebook use is less prevalent
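The weighting idea behind the map can be illustrated with a minimal sketch. It assumes each friendship link is already reduced to a pair of (latitude, longitude) locations; the function name `pair_brightness`, the sample coordinates, and the linear scaling are illustrative assumptions, not Butler's actual method.

```python
from collections import Counter

def pair_brightness(friend_links):
    """Map each unordered location pair to a brightness in (0, 1].

    friend_links: iterable of ((lat, lon), (lat, lon)) tuples.
    The busiest pair gets brightness 1.0; others scale linearly (an assumption).
    """
    # Sort each pair so that A-B and B-A count as the same connection.
    counts = Counter(tuple(sorted(pair)) for pair in friend_links)
    if not counts:
        return {}
    busiest = max(counts.values())
    return {pair: n / busiest for pair, n in counts.items()}

# Hypothetical sample data: approximate city coordinates.
links = [
    ((51.5, -0.1), (40.7, -74.0)),  # London - New York
    ((40.7, -74.0), (51.5, -0.1)),  # same pair, reversed
    ((48.9, 2.4), (51.5, -0.1)),    # Paris - London
]
brightness = pair_brightness(links)
```

A plotting library could then draw one line per pair, using the brightness value as the line's opacity.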

The map's striking similarity to geopolitical maps was also noted by Butler. According to Butler, "Not only were continents visible, certain international borders were apparent as well. What really struck me, though, was knowing that the lines didn't represent coasts or rivers or political borders, but real human relationships. Each line might represent a friendship made while travelling, a family member abroad, or an old college friend pulled away by the various forces of life" (2010, para. 4).

A similar link-based analytical approach is used by Finin and his associates for sentiment-identification purposes. According to these authorities, "Our approach uses the link structure of a blog graph to associate sentiments with the links connecting blogs. Such links are manifested as a URL that blogger A uses in his blog post to refer to blogger B's post. We call this sentiment link polarity, and the sign and magnitude of this value is based on the sentiment of text surrounding the link" (p. 78). Clearly, this type of online data can be used to reveal valuable new information in ways that were not possible in the past.
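To make the idea of link polarity concrete, the following is a minimal sketch rather than the method Finin and his associates implemented. It assumes a tiny hand-made word list, a fixed window of five words on either side of the link, and a simple (positive - negative) / total score; all of these, along with the name `link_polarity` and the example URLs, are hypothetical.

```python
import re

# Tiny illustrative lexicon (an assumption, not taken from the cited work).
POSITIVE = {"great", "excellent", "agree", "insightful", "love"}
NEGATIVE = {"wrong", "terrible", "disagree", "misleading", "hate"}

def link_polarity(post_text, link_url, window=5):
    """Score the words within `window` tokens of link_url.

    Returns a float in [-1, 1]: the sign gives the direction of the
    sentiment and the magnitude its strength. Returns 0.0 if the link
    is absent or no lexicon word appears near it.
    """
    tokens = post_text.split()
    positions = [i for i, tok in enumerate(tokens) if link_url in tok]
    if not positions:
        return 0.0
    i = positions[0]
    nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    # Strip punctuation and lowercase before the lexicon lookup.
    words = [re.sub(r"\W+", "", w).lower() for w in nearby]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

post = ("I think this post http://blog-b.example/post1 is misleading "
        "and simply wrong about taxes.")
score = link_polarity(post, "http://blog-b.example/post1")
```

In this example both nearby lexicon words are negative, so the link from blogger A to blogger B receives a strongly negative polarity.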

Such graphic representations are just some of the insights into written communication that content analysis can provide. Blogs (and this term can be expanded to include the idle chit-chat, back-and-forth, thoughts, ramblings, viewpoints and other posts shared on Facebook and other social networking fora every day) represent an incredibly accessible way to reach other people, and people who know those people, and so forth in an ever-widening network of social interaction. This accessibility may be fundamentally more significant in the long term than other important innovations in communication such as the telephone. In this regard, a growing number of observers cite the increasing importance of the Internet in the business world and suggest that blogging has become the platform of choice for consumers and their favorite companies (Pikas, 2005). Bielski, however, emphasizes that not all bloggers are created equal, at least with respect to their online posts: "Certainly, there is hype surrounding Web 2.0 with its dual message of the internet as application platform and internet as the ultimate participatory forum. And, blogging is viewed as a staple of this new internet" (2007, p. 8).

Identifying recurring themes and emerging trends in this type of dynamic environment is a challenging enterprise to be sure. As Bielski points out, "Yet out of the glare, the reality of user-generated content is a mixed bag. The writing can be freeform, to put it politely. Many blogs look horrible," she notes and adds that many are "boring, or 'safe' might be better adjectives" (2007, p. 8). Furthermore, this "mixed bag" of blog content makes identifying posts that may communicate certain sentiments even more challenging. According to Bielski, "Corporate creators don't make these blogs easy to subscribe to, search through, or otherwise interact with" (2007, p. 8).

Fortunately, Google provides a series of URL templates that can be "invoked via command M-x emacspeak-url-template-fetch normally bound to control e u. This command prompts for the name of the template, and completion is available via Emacs' minibuffer completion" (Google Blog Search, 2010, para. 2). The steps involved in conducting this analysis for each URL template are as follows:

A. Prompt for the relevant information.

B. Fetch the resulting URL using an appropriate fetcher.

C. Set up the resulting resource with appropriate customizations.
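The three steps above can be sketched in a few lines. This is an illustration of the prompt, fetch, and set-up pattern only, not the Emacs command quoted in the text; the helper `make_template`, the base URL, and the `q` field are hypothetical placeholders, and the prompt, fetcher, and set-up functions are passed in so that the sketch runs without network access.

```python
from urllib.parse import urlencode

def make_template(base_url, fields):
    """Return a function that runs the three steps for one URL template."""
    def run(prompt, fetch, setup):
        # A. Prompt for the relevant information.
        answers = {name: prompt(name) for name in fields}
        # B. Fetch the resulting URL using an appropriate fetcher.
        url = f"{base_url}?{urlencode(answers)}"
        resource = fetch(url)
        # C. Set up the resulting resource with appropriate customizations.
        return setup(resource)
    return run

# Hypothetical template: the base URL and the "q" field are placeholders.
blog_search = make_template("https://blogsearch.example/search", ["q"])

result = blog_search(
    prompt=lambda name: "net neutrality",         # stands in for interactive input
    fetch=lambda url: {"url": url, "body": ""},   # stands in for a real HTTP fetch
    setup=lambda res: res["url"],                 # e.g. hand off to a viewer
)
```

Keeping the fetcher and set-up step as parameters mirrors the text's point that each template can use "an appropriate fetcher" and its own customizations.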

Although "unblog-related," the template application used by Google Blog Search developers provides a useful example of how this procedure operates. According to Google Blog Search, "As an example, the URL templates that enable access to NPR media streams prompt for a program id and date, and automatically launch the realmedia player after fetching the resource" (2010, para. 3). The developers at Google Blog Search describe their online application as follows: "Blog Search is Google search technology focused on blogs. Google is a strong believer in the self-publishing phenomenon represented by blogging, and we hope Blog Search will help our users to explore the blogging universe more effectively, and perhaps inspire many to join the revolution themselves" (2010, para. 2). As to the sentiment-related blog content that can be expected, the developers make it clear that their index spans the entire range of human experience:

Whether you're looking for Harry Potter reviews, political commentary, summer salad recipes or anything else, Blog Search enables you to find out what people are saying on any subject of your choice. Your results include all blogs, not just those published through Blogger; our blog index is continually updated, so you'll always get the most accurate and up-to-date results; and you can search not just for blogs written in English, but in French, Italian, German, Spanish, Korean, Brazilian Portuguese, Dutch, Russian, Japanese, Swedish, Malay, Polish, Thai, Indonesian, Tagalog, Turkish, Vietnamese and other languages as well (Google Blog Search, 2010, para. 3).

Some of the other key features that make Google Blog Search useful for the purposes of the proposed study include the following:

A. The links allow users to browse Google Blog Search results by topic. For example, clicking the Technology link shows top stories in the tech world.

B. The goal of Blog Search is to include every blog that publishes a site feed (either RSS or Atom). It is not restricted to Blogger blogs, or blogs from any other service.

C. Google Blog Search uses a set of algorithms to try to determine the most popular stories in the blogosphere. The application takes into account factors such as a blog's title and content, as well as its popularity throughout the rest of the blogging community. The results are displayed as groups of closely related posts.

An informal blog search using Google's "search blogs" feature provides the following raw sentiment-related search results:

Table 1

Blog Search Results of Sentiment-Related Terms (as of December 20, 2010)

Search Term    Number of Matches
Love           467,098,607
Hate           67,059,281
Awesome        79,550,156
Terrible       17,692,083
Angry          24,621,192
Like           821,870,100
Dislike        6,399,023
Enjoy          152,132,318

Clearly, there is a great deal of sentiment being expressed in blogs. Without knowing the specific context in which these sentiment-related terms are used, though, it is impossible to discern their true meanings. For instance, some bloggers might enthuse that they "just love the pasta at Joe's Spaghetti House," while others might state they "love the president's economic policies." Likewise, some bloggers might "hate the weather" while others "hate the president's economic policies." Given the enormous response to the search term "like," it is clear that some bloggers might "like Ike" while others use the term as a comparison as in, "Eating at this restaurant is like a trip to the dentist's office." The context of the sentiment-related posts will therefore require comparison to a corpus of various sentiments used in common practice to distinguish positive from negative sentiments (Ojala, 2009). For example, the word "like" or "love" when used immediately with or adjacent to descriptors such as "movie" or "restaurant" could be categorized as a review, while these words used with descriptors such as personal nouns might indicate a romantic relationship. This corpus would be fine-tuned as the learning process proceeded through additional iterations of the supporting algorithms.
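A rough sketch of this kind of context check follows. It is a toy rule-based illustration under stated assumptions, not the proposed study's corpus or algorithm: the word lists, the three-word window, the category labels, and the function name `classify_usage` are all hypothetical.

```python
import re

SENTIMENT_WORDS = {"like", "love", "hate"}
REVIEW_NOUNS = {"movie", "restaurant", "pasta"}   # assumed review descriptors
PERSONAL_WORDS = {"you", "him", "her"}            # assumed personal words
LINKING_VERBS = {"is", "was", "are", "were"}      # "is like a trip ..."

def classify_usage(sentence, window=3):
    """Label the first sentiment word in a sentence by its nearby context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for i, word in enumerate(words):
        if word not in SENTIMENT_WORDS:
            continue
        before = words[i - 1] if i > 0 else ""
        after = words[i + 1:i + 1 + window]
        # "like" after a linking verb is treated as a comparison, not a sentiment.
        if word == "like" and before in LINKING_VERBS:
            return "comparison"
        if any(w in REVIEW_NOUNS for w in after):
            return "review"
        if any(w in PERSONAL_WORDS for w in after):
            return "personal"
        return "other"
    return "none"

examples = [
    "I just love the pasta at Joe's Spaghetti House",
    "Eating at this restaurant is like a trip to the dentist's office.",
    "They love the president's economic policies",
]
labels = [classify_usage(s) for s in examples]
```

Hand-written rules of this sort would only be a starting point; the text envisions a corpus that is refined as the learning process proceeds.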

The results of a study by Manning (2009) that sought to identify effective ways to garner sentiment-related data from online reviews provide…


References

Bichard, S.L. (2006). Building blogs: a multi-dimensional analysis of the distribution of frames on the 2004 presidential candidate Web sites. Journalism and Mass Communication Quarterly, 83, 329-333.

Bielski, L. (2007). Got blogs? Not exactly a banking staple, a few pioneers have embraced this 'new media.' ABA Banking Journal, 99(5), 7-9.

Brynko, B. (2007, June). Northern Light's MI Analyst: New visions in marketing research.
