Blog Category: Social media

Visually Interpreting Names as Demographic Attributes


At the AAAI 2015 conference, we presented the work “Visually Interpreting Names as Demographic Attributes by Exploiting Click-Through Data,” a collaboration with a research team at National Taiwan University. This study aims to automatically associate a name with its likely demographic attributes, e.g., gender and ethnicity. More specifically, the associations are driven by web-scale search logs collected by a search engine when Internet users retrieve images.

Demographic attributes are vital for semantically characterizing a person or a community, which makes them valuable for marketing, personalization, face retrieval, social computing, and other human-centric research. Since users tend to keep their online profiles private, a name is often the most readily available piece of personal information in these contexts. The problem we address is this: given a name, associate it with and predict its likely demographic attributes. For example, a person named “Amy Liu” is likely an Asian female. A name makes the first impression of a person because naming conventions are strongly influenced by culture, e.g., first names correlate with gender and last names with location of origin. Typically, names are associated with these two attributes either by referring to demographic data maintained by governments or by manually labeling attributes based on other personal information (e.g., a photo). The former is limited to the regions covered by census data. The latter is costly and time-consuming to scale to large data sets.

Unlike prior approaches, we propose to exploit click-throughs between text queries and retrieved face images in web search logs, where the names are extracted from the queries and the attributes are detected automatically from the face images. Here, a click-through occurs when a user clicks one of the URLs returned for a text query to view the web image it points to. This mechanism has two implications: (1) the association between a query and an image is based on viewers’ clicks, i.e., on human intelligence from web-scale users; (2) users are likely to have considerable knowledge of the associations, because they are at least partially aware of what they are looking for and search engines are increasingly good at satisfying user intent. Both characteristics of click-throughs reduce the risk of incorrect associations. Moreover, Internet users’ collective knowledge makes it possible to discover name-attribute associations that generalize to more countries.
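To make the click-through idea concrete, here is a minimal Python sketch of turning logged clicks into name-attribute associations. It is an illustration under stated assumptions, not the method from the paper: the `ClickThrough` record, the `build_tables` and `predict` helpers, and the attribute labels are hypothetical; each query is assumed to be a “First Last” name; and gender and ethnicity are assumed to have already been detected on each clicked face image. Following the naming conventions mentioned above, it tallies gender votes per first name and ethnicity votes per last name, which also lets it produce a guess for a full name that never appeared as a query.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickThrough:
    """Hypothetical log record: a text query plus attributes detected on the clicked face image."""
    query: str      # assumed to look like "First Last"
    gender: str     # assumed output of a face attribute detector
    ethnicity: str  # assumed output of a face attribute detector


def build_tables(records):
    """Tally gender votes per first name and ethnicity votes per last name."""
    gender_by_first = defaultdict(Counter)
    ethnicity_by_last = defaultdict(Counter)
    for r in records:
        parts = r.query.lower().split()
        if len(parts) < 2:
            continue  # skip queries that do not look like "First Last"
        gender_by_first[parts[0]][r.gender] += 1
        ethnicity_by_last[parts[-1]][r.ethnicity] += 1
    return gender_by_first, ethnicity_by_last


def _top(counter):
    """Most common label and its share of the votes, or (None, None) if there are no votes."""
    if not counter:
        return None, None
    label, n = counter.most_common(1)[0]
    return label, n / sum(counter.values())


def predict(name, gender_by_first, ethnicity_by_last):
    """Return (gender, p_gender, ethnicity, p_ethnicity) for a "First Last" name."""
    parts = name.lower().split()
    gender, p_gender = _top(gender_by_first.get(parts[0], Counter()))
    ethnicity, p_eth = _top(ethnicity_by_last.get(parts[-1], Counter()))
    return gender, p_gender, ethnicity, p_eth
```

For example, with clicks logged only for queries such as “Amy Liu” and “David Chen”, the unseen name “Amy Chen” still receives a gender estimate from “Amy” and an ethnicity estimate from “Chen”. Splitting the name this way is a simplifying design choice; a real system would also have to cope with noisy queries, multiple faces per image, and detector errors.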

In the experiments, the proposed name-attribute associations achieve accuracy competitive with that of manual labeling. They also benefit the profiling of social media users and keyword-based face image retrieval, especially in adapting to unseen names. This is the first work to interpret a name as demographic attributes in a visual-data-driven manner using web search logs. In the future, we plan to extend the visual interpretation of abstract names to other targets whose naming conventions are strongly influenced by visual appearance.

Do Topic-Dependent Models Improve Microblog Sentiment Estimation?


When estimating the sentiment of movie and product reviews, domain adaptation has been shown to improve performance. When estimating sentiment in microblogs, however, topic-independent sentiment models are commonly used.

We examined whether topic-dependent models improve performance when a large number of training tweets are available. We collected tweets with emoticons for six months and then created two types of topic-dependent polarity estimation models: models trained on tweets containing a target keyword, and models trained on an enlarged set of tweets containing terms related to a topic. We also created a topic-independent model trained on a general sample of tweets. Comparing the models, we found that topic-dependent models performed better for some topics, although for the majority of topics there was no significant difference in performance between a topic-dependent and a topic-independent model.
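The difference between the model types can be sketched as follows. This is a hedged illustration, not the setup from the ICWSM paper: it assumes emoticons serve as noisy sentiment labels (as with the emoticon-bearing tweets described above), uses a small hand-written multinomial naive Bayes classifier in place of whatever learner the paper used, and the emoticon lists, the `topic_terms` parameter, and all helper names are hypothetical. Passing no topic terms yields the topic-independent model; passing a single keyword or a set of related terms yields the two kinds of topic-dependent model.

```python
import math
from collections import Counter

# Hypothetical emoticon lists used as noisy sentiment labels.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}
EMOTICONS = POSITIVE | NEGATIVE


def emoticon_label(tweet):
    """Return 'pos' or 'neg' based on emoticons, or None if they are absent or mixed."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(tweet):
    """Lowercased tokens with emoticons removed, so the model cannot see its own labels."""
    return [t.lower() for t in tweet.split() if t not in EMOTICONS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n = sum(self.prior.values())

        def log_score(c):
            s = math.log(self.prior[c] / n)
            for w in doc:
                s += math.log((self.counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
            return s

        return max(self.classes, key=log_score)


def train(tweets, topic_terms=None):
    """Train on emoticon-labeled tweets.

    topic_terms=None gives a topic-independent model; otherwise only tweets
    containing at least one of the given terms are used (topic-dependent).
    """
    docs, labels = [], []
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        tokens = tokenize(tweet)
        if topic_terms is not None and not set(tokens) & set(topic_terms):
            continue
        docs.append(tokens)
        labels.append(label)
    return NaiveBayes().fit(docs, labels)
```

Training `train(tweets)` and `train(tweets, {"phone"})` on the same corpus and scoring both on held-out tweets about phones mirrors, at toy scale, the kind of comparison described above; with enough data, the general model often wins simply because it sees far more examples.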

We then proposed a method for predicting which topics are likely to have better sentiment estimation performance when a topic-dependent sentiment model is used. This method also identifies terms and contexts for which a term’s polarity often differs from its expected polarity. For example, ‘charge’ is generally positive, but in the context of ‘phone’, it is often negative. Details can be found in our ICWSM 2014 paper.
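The notion of a term whose polarity depends on context, like ‘charge’ near ‘phone’, can be illustrated with a simple frequency comparison. This sketch is our own assumption, not the prediction method from the paper: it takes tweets already labeled positive or negative (for example, by emoticons) and flags terms whose majority polarity among tweets mentioning a context word differs from their majority polarity overall. The `positive_rate` and `polarity_flips` functions and the `min_count` threshold are hypothetical.

```python
def positive_rate(labeled, term):
    """Share of labeled tweets containing `term` that are positive, and how many contain it."""
    hits = [label for tokens, label in labeled if term in tokens]
    if not hits:
        return None, 0
    return hits.count("pos") / len(hits), len(hits)


def polarity_flips(labeled, context, min_count=2):
    """Find terms whose majority polarity near `context` differs from their overall one.

    `labeled` is a list of (set_of_tokens, 'pos' | 'neg') pairs. A rate above 0.5
    counts as positive; terms seen fewer than `min_count` times with `context`
    are skipped as too sparse to judge.
    """
    in_context = [(tokens, label) for tokens, label in labeled if context in tokens]
    vocab = {t for tokens, _ in in_context for t in tokens} - {context}
    flips = {}
    for term in sorted(vocab):
        local, n_ctx = positive_rate(in_context, term)
        if n_ctx < min_count:
            continue
        overall, _ = positive_rate(labeled, term)
        if (overall > 0.5) != (local > 0.5):
            flips[term] = (round(overall, 2), round(local, 2))
    return flips
```

On a toy set where ‘charge’ is mostly positive overall but negative whenever ‘phone’ appears, the function reports ‘charge’ along with its overall and in-context positive rates. A count threshold like `min_count` matters in practice, since rare co-occurrences would otherwise produce many spurious flips.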

Social media mining intern


We are looking for an intern to work with us this summer in the area of social media analysis. The project will involve understanding and mining patterns within Twitter data, in both text and images. An ideal candidate is a PhD student with strong machine learning skills. Prior experience in image understanding, text data mining, social network analysis, or statistical modeling is a plus. If you are interested in this project, please send your CV to Dhiraj or Francine.

Facebook UX, an analogy

This may be old news to some of the true social media junkies, but thanks to Gentry Underwood’s PARC forum today, I saw a great video analogy for the Facebook interaction style. Enjoy.

The video is made by a British comedy group called Idiots of Ants; the pun becomes evident when the group’s name is pronounced with a British accent.

Social Media Overload

In the aftermath of the recent SXSW event, Alexandra Samuel wrote on the HBR blog about five unsolved problems facing Social Media. She enumerated contact list overload, search overload, information overload, brand overload, and apathy overload. It’s not clear to me, however, whether these are pressing issues, and whether universal solutions to them would constitute an improvement over the current chaos.

Rapid evolution of social media has its drawbacks

(Please be aware that some ChatRoulette links may contain mature content.)

Dear me. All those folks doing naughty things on ChatRoulette, secure in their Net-anonymity, may suddenly meet a rude awakening: Chat Roulette Map, a new Google Maps mash-up, maps users’ chat images to their locations, based on IP address. Last week, it also showed users’ IP addresses.

Note that Chat Roulette Map has just added a new pop-up window when you first load the page:

Welcome To Chat Roulette Map
We’d like to advise to stop using
student’s names in their hostnames.

We’ve decided, at least for the time being, to
hide IP & host information as some user-identifiable
information was found in some entries.

No, you think? It’ll be interesting to see how this warning window evolves over the next few weeks.

Eddi-fying tweet browsing


Michael Bernstein and the usual suspects wrote a nice position paper for the CHI 2010 microblogging workshop. They describe Eddi, a system that groups tweets by topic to help people make sense of large numbers of tweets. In some sense, they are addressing a problem similar to the one that Miles Efron and I tackled in our paper. In both cases, the system uses various sorts of analysis to group and filter tweets to help people understand the collection or the stream.

Microblogging Inside and Outside the Workplace


Kate Ehrlich and N. Sadat Shami have written a paper (accepted to ICWSM 2010) that compares IBMers’ use of Twitter and of an internal microblogging tool (with the unfortunate name BlueTwit). The paper analyzes the tweeting patterns of 34 people over a four-month period. The authors found that people in their sample tended to use both systems more for asking and answering questions and disseminating information than for status updates, which contrasts with Naaman et al.’s finding that “meformers” (i.e., people who tweet about what they are up to) outnumber “informers” in the sample they analyzed.

Ehrlich and Shami’s study found that people used these tools to improve their social status: internally, to manage their reputation and to be seen as a source of useful answers rather than just of questions; and on Twitter, both to promote their company and to develop their professional status.

Modeling social media


Marti Hearst gave an interesting talk at JHU on social media in which she described some important dimensions through which we can understand the variety of phenomena that are tagged with that label. She examined expertise, the degree to which data are shared (synchronized!) among the people engaged in some activity, and the degree to which participants are working toward an explicitly shared goal (even if they approach it with different personal motivations).

Ask not what Twitter can do for Yahoo!…


Yesterday Yahoo! announced that it had reached an agreement with Twitter to incorporate the Twitter feed into its properties in a variety of ways, including surfacing tweets related to particular topics, returning more tweets in search results, and allowing users to read their tweets and tweet directly from their Yahoo! pages. The move is interesting more as another vote for the importance of Twitter as a communication channel than for the value it adds to people’s interactions with Yahoo!