At the AAAI 2015 conference, we presented the work “Visually Interpreting Names as Demographic Attributes by Exploiting Click-Through Data,” a collaboration with a research team at National Taiwan University. This study aims to automatically associate a name with its likely demographic attributes, e.g., gender and ethnicity. More specifically, the associations are driven by web-scale search logs that a search engine collects when internet users retrieve images.
Demographic attributes are vital for semantically characterizing a person or a community. This makes them valuable for marketing, personalization, face retrieval, social computing, and other human-centric research. Since users tend to keep their online profiles private, a name is the most reachable piece of personal information in these contexts. The problem we address is this: given a name, associate it with and predict its likely demographic attributes. For example, a person named “Amy Liu” is likely an Asian female. A name makes the first impression of a person because naming conventions are strongly influenced by culture, e.g., a first name suggests gender and a last name suggests location of origin. Typically, the associations between names and these two attributes are made either by referring to demographics maintained by governments or by manually labeling attributes based on the available personal information (e.g., a photo). The former is limited to regional census data. The latter is costly and time-consuming when applied to large-scale data.
Unlike prior approaches, we propose to exploit click-throughs between text queries and retrieved face images in web search logs, where the names are extracted from the queries and the attributes are detected automatically from the face images. In this paper, a click-through occurs when a user clicks one of the URLs returned for a text query to view the web image it points to. This mechanism has two useful properties: (1) the association between a query and an image is based on viewers’ clicks, that is, on the human intelligence of web-scale users; (2) users may have considerable knowledge of the associations, because they are often partially aware of what they are looking for and search engines are getting much better at satisfying user intent. Both properties reduce the risk of incorrect associations. Moreover, the knowledge of Internet users makes it possible to discover name-attribute associations that generalize to more countries.
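As a rough illustration of the idea, the following Python sketch aggregates click-through records into per-name attribute frequencies. It is not the method from the paper: the record layout, the `STOPWORDS` set, the whitespace-based name extraction, and the toy data are all assumptions made for this example, and the attribute labels are assumed to have already been produced by some face attribute detector.

```python
from collections import Counter, defaultdict

# Hypothetical click-through records: (text query, attribute label detected
# on the clicked face image). The values are made up for illustration only.
CLICKS = [
    ("amy liu", "female"),
    ("amy liu photos", "female"),
    ("amy adams", "female"),
    ("amy", "male"),
    ("john liu", "male"),
]

# Assumed set of query words that are not names (hypothetical).
STOPWORDS = {"photos", "images", "pictures"}


def build_associations(clicks):
    """Count, per name token, how often each detected attribute was clicked."""
    counts = defaultdict(Counter)
    for query, attribute in clicks:
        # Naive name extraction: every non-stopword token is treated as a name.
        for token in query.lower().split():
            if token not in STOPWORDS:
                counts[token][attribute] += 1
    return counts


def attribute_distribution(counts, name):
    """Return normalized attribute frequencies for a name token ({} if unseen)."""
    name_counts = counts.get(name.lower(), Counter())
    total = sum(name_counts.values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in name_counts.items()}


if __name__ == "__main__":
    associations = build_associations(CLICKS)
    print(attribute_distribution(associations, "amy"))
    print(attribute_distribution(associations, "liu"))
```

With the toy records above, three of the four clicks for queries containing "amy" land on faces labeled female, so the sketch assigns that token a female frequency of 0.75. A real system would need proper name parsing and a way to handle noisy detector output.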
In our experiments, the proposed name-attribute associations achieve accuracy competitive with manual labeling. They also benefit the profiling of social media users and keyword-based face image retrieval, especially in adapting to unseen names. This is the first work to interpret a name as demographic attributes in a visual-data-driven manner using web search logs. In the future, we plan to extend the visual interpretation of an abstract name to more targets whose naming conventions are highly influenced by visual appearance.