Study delves into predicting social network users’ activities and relationships

A new study has shown that capturing social influence quantitatively is useful for predicting social network users’ activities and relationships simultaneously.

The study, A Unified Probabilistic Model of User Activities and Relations on Social Networking Sites, was presented at the International Joint Conference on Artificial Intelligence (IJCAI) in Argentina this week.

The study’s authors wanted to explore the effect of social influence when predicting users’ activities and user-user relationships within a single framework. Other studies in this field treat social influence only heuristically: they do not measure and capture it quantitatively, so it cannot be taken into account as a key factor in their predictions.

“We study the social influence quantitatively as the probability that a user follows an opinion from others, for both [a] user’s activities and this user’s relationships to others,” researchers from HP Labs China wrote in their paper.

“Social activity prediction and social relationship discovery have become critical research goals in academia and industry recently in the field of social network analysis and have played an essentially important role in a variety of World Wide Web applications.”

The researchers noted that, as of December 2014, Twitter had more than 500 million users, 284 million of whom were active: a large, rich pool of data to work with.

The researchers built a joint activity and relation (JAR) predictive model, which learns the social influence between users and their personal preferences for both activity prediction and relationship discovery.

“We demonstrate that the learned social influence and users’ personal preferences in the JAR model are very useful for boosting both user activity prediction and user-user relation discovery.”

The researchers set out to model how a user performs an activity probabilistically, according either to his or her own preferences or to preferences adopted from others through social influence.

“Ultimately, the activity of u [user] is performed implicitly in accordance with the overall collective interests of u’s and others’. In summary, activity performing procedure of user u is to draw a preference from the collection based on u’s independence and others’ influence.

“As a result, a preference similar to u has a higher probability to be drawn than a different preference from others. Similarly, u creates relationship with v [a different user] probabilistically to his prior impressions to v or the agreement from others. For user-user relationships, we may conclude similar preference drawing process which captures influence from others,” the paper said.

The model is built on conditional probability: the probability of an event given certain known or observed conditions. It estimates the conditional probability of user u’s activity y, P(y|u), and the conditional probability of u’s relationship to another user v, P(v|u).
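To make the notation concrete, the simplest way to estimate a conditional probability such as P(y|u) or P(v|u) is by counting: the share of u’s observed events that involve y (or v). The Python sketch here does just that on made-up toy data. The function name `conditional_distribution` and all data are hypothetical, and the JAR model itself does not estimate these quantities by raw counting; it derives them through latent topics and social influence.

```python
from collections import Counter


def conditional_distribution(pairs, given):
    """Estimate P(x | given) by counting observed (condition, outcome) pairs.

    With (user, activity) pairs this gives P(y|u); with (user, followed_user)
    links it gives P(v|u). Returns an empty dict if `given` never appears.
    """
    outcomes = Counter(x for cond, x in pairs if cond == given)
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {x: n / total for x, n in outcomes.items()}


# Hypothetical toy data, not taken from the study's dataset.
activities = [("alice", "share"), ("alice", "comment"), ("alice", "share"), ("bob", "comment")]
follows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

p_y_given_alice = conditional_distribution(activities, "alice")  # share: 2/3, comment: 1/3
p_v_given_alice = conditional_distribution(follows, "alice")     # bob: 1/2, carol: 1/2
```

The same helper serves both tasks, which mirrors the study’s point that activities and relationships can be treated as two conditional distributions of the same user.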

A set of latent topics (t) is used to capture a user’s own interests and preferences, where each topic t is correlated with an activity and its associated tag (w).

The model jointly considers the following distributions:

  • social influence P(s|u) from s (another user) to u,
  • user u’s own preference P(t|u) over the latent topics,
  • s’s preference P(t|s) over the latent topics,
  • activity P(y|t) for each topic,
  • generated content P(w|t) for each topic.

“Both the activity of user u and the relationship of user u to user v are probabilistically determined based on u’s own interests or the preferences of s … via social influence. We define the social influence dependency P(s|u) as the probability of user u to be influenced by user s.”
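The bulleted distributions can be combined into an activity prediction. Here is a minimal sketch, assuming the process the quoted passage describes: u first draws an influencing user s with P(s|u) (s may be u itself, standing for u’s own interests), then a topic t from s’s topic preference, written here as P(t|s), then an activity y with P(y|t). That gives P(y|u) = Σ_s P(s|u) Σ_t P(t|s) P(y|t). The function name and all probabilities are hypothetical, and the relationship term P(v|u) is left out because the article does not give its exact form.

```python
def activity_distribution(u, p_s_given_u, p_t_given_s, p_y_given_t):
    """Compute P(y|u) = sum_s P(s|u) * sum_t P(t|s) * P(y|t).

    Each argument is a dict of dicts: the outer key is the conditioning
    variable, the inner dict maps outcomes to probabilities summing to 1.
    """
    result = {}
    for s, p_s in p_s_given_u[u].items():          # who influences u (may be u itself)
        for t, p_t in p_t_given_s[s].items():      # which topic s prefers
            for y, p_y in p_y_given_t[t].items():  # which activity the topic produces
                result[y] = result.get(y, 0.0) + p_s * p_t * p_y
    return result


# Hypothetical numbers: alice follows her own interests 70% of the time
# and is influenced by bob the remaining 30%.
p_s_given_u = {"alice": {"alice": 0.7, "bob": 0.3}}
p_t_given_s = {"alice": {"sport": 0.9, "tech": 0.1},
               "bob": {"sport": 0.2, "tech": 0.8}}
p_y_given_t = {"sport": {"share": 0.6, "comment": 0.4},
               "tech": {"share": 0.3, "comment": 0.7}}

dist = activity_distribution("alice", p_s_given_u, p_t_given_s, p_y_given_t)
most_likely = max(dist, key=dist.get)  # "share" (about 0.507 vs 0.493 for "comment")
```

In this toy setting, raising bob’s share of the influence in P(s|alice) shifts alice’s prediction towards bob’s favoured topic and, through it, towards "comment".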

To find the most likely activity and relationship for user u, the researchers needed to optimise the joint conditional probability P(y, v|u).

The researchers used Twitter data to train and test their model. They crawled the microblogging service to obtain a dataset of 5,275 users, including 22,382 friendship/followership links, 120,285 posted tweets, and 282,450 user activities such as commenting on and sharing tweets.

The researchers estimated the model parameters with an expectation maximisation (EM) algorithm, which they generalised to avoid overfitting, so that the model does not perform badly on previously unseen test data.
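The article does not describe the JAR model’s generalised EM procedure, so as a stand-in here is plain EM for a simpler, PLSA-style aspect model, P(y|u) = Σ_t P(t|u) P(y|t), built from two of the distributions listed above. It is a sketch of the general technique only: the E-step computes each topic’s posterior share of every observed (user, activity) count, and the M-step renormalises those expected counts. Function names and the count matrix are hypothetical.

```python
import math
import random


def _normalise(row):
    """Scale a list of non-negative weights to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit P(t|u) and P(y|t) for P(y|u) = sum_t P(t|u) P(y|t) by plain EM.

    counts[u][y] is how often user u performed activity y. This is a simplified
    stand-in for illustration, not the JAR model's generalised EM.
    """
    rng = random.Random(seed)
    n_users, n_acts = len(counts), len(counts[0])
    # Random positive initialisation so that no probability starts at zero.
    p_t_u = [_normalise([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_users)]
    p_y_t = [_normalise([rng.random() + 0.1 for _ in range(n_acts)]) for _ in range(n_topics)]

    for _ in range(n_iters):
        exp_t_u = [[0.0] * n_topics for _ in range(n_users)]
        exp_y_t = [[0.0] * n_acts for _ in range(n_topics)]
        for u in range(n_users):
            for y in range(n_acts):
                c = counts[u][y]
                if c == 0:
                    continue
                # E-step: posterior P(t | u, y) is proportional to P(t|u) P(y|t).
                joint = [p_t_u[u][t] * p_y_t[t][y] for t in range(n_topics)]
                z = sum(joint)
                for t in range(n_topics):
                    weight = c * joint[t] / z  # expected count of (u, y) explained by t
                    exp_t_u[u][t] += weight
                    exp_y_t[t][y] += weight
        # M-step: renormalise expected counts into probability distributions.
        p_t_u = [_normalise(row) for row in exp_t_u]
        p_y_t = [_normalise(row) for row in exp_y_t]
    return p_t_u, p_y_t


def log_likelihood(counts, p_t_u, p_y_t):
    """Log-likelihood of the counts under the fitted model (EM never decreases it)."""
    total = 0.0
    for u, row in enumerate(counts):
        for y, c in enumerate(row):
            if c:
                total += c * math.log(sum(p_t_u[u][t] * p_y_t[t][y] for t in range(len(p_y_t))))
    return total


# Hypothetical user-by-activity counts with two rough groups of users.
counts = [[5, 3, 0, 0], [4, 4, 1, 0], [0, 0, 6, 5], [0, 1, 5, 4]]
p_t_u, p_y_t = plsa_em(counts, n_topics=2)
```

Plain EM like this maximises training likelihood and can overfit; the "generalised" variant the researchers describe is aimed at exactly that weakness, but its details are not given in the article.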

Five-fold cross validation was also used to guard against overfitting. The technique partitions the data into five subsets; in each of five rounds, the model is trained on four subsets and tested on the remaining one, and the results of the rounds are then averaged.
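A minimal, library-free sketch of k-fold cross validation as just described, with k = 5 by default. `train_fn` and `score_fn` are hypothetical placeholders for whatever model and metric are being evaluated; the toy majority-label "model" exists only to make the example runnable and is not one of the methods in the study.

```python
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, train_fn, score_fn, k=5, seed=0):
    """Hold out each fold once, train on the other k-1 folds, and average the scores."""
    scores = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [row for j, row in enumerate(data) if j not in held_out]
        test = [data[j] for j in test_idx]
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return mean(scores)


# Toy usage with a hypothetical "model" that always predicts the majority
# label seen in its training data.
data = [(i, i % 3 == 0) for i in range(20)]


def train_majority(train):
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(model, test):
    return sum(1 for _, label in test if label == model) / len(test)


cv_score = cross_validate(data, train_majority, accuracy)
```

Averaging over held-out folds estimates how well a model generalises to unseen data; it does not itself change the trained model, which is why it is a check on overfitting rather than a cure for it.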

The model was compared with other popular methods for similar prediction tasks, namely Logistic Regression, a correlation-based (CORR) classifier and Friendship-Interest Propagation (FIP).

The results show that the researchers’ JAR model, trained with generalised EM, outperformed the others. For user activity prediction, it achieved a precision of 84.68 per cent, recall of 82.97 per cent and an F1 score (the harmonic mean of precision and recall) of 83.82 per cent.

For user-user relationship discovery, the researchers’ model achieved a precision of 90.82 per cent, recall of 88.83 per cent and an F1 score of 89.81 per cent.

In comparison, Logistic Regression achieved an F1 score of only 68.39 per cent for user activity prediction and 73.86 per cent for user-user relationship discovery. Its precision and recall were 69.44 per cent and 67.37 per cent for user activity, and 74.55 per cent and 73.19 per cent for user-user relationships.

CORR achieved an F1 score of 69.84 per cent for user activity, with a precision of 70.89 per cent and recall of 68.82 per cent. For user-user relationships, it achieved an F1 score of 71.14 per cent, with a precision of 72.48 per cent and recall of 69.85 per cent.

FIP also trailed the researchers’ model, with an F1 score of 74.46 per cent for user activity and 77.95 per cent for user-user relationships. Its precision and recall were 75.25 per cent and 73.69 per cent for user activity, and 79.65 per cent and 76.33 per cent for user-user relationships.
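As a quick consistency check on the figures above, F1 can be recomputed from each precision and recall pair using F1 = 2PR / (P + R). The numbers in this sketch are exactly those reported in the article; the function and variable names are illustrative.

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall, in the same units as the inputs."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in per cent, as reported in this article.
reported = {
    ("JAR", "activity"): (84.68, 82.97, 83.82),
    ("JAR", "relation"): (90.82, 88.83, 89.81),
    ("Logistic Regression", "activity"): (69.44, 67.37, 68.39),
    ("Logistic Regression", "relation"): (74.55, 73.19, 73.86),
    ("CORR", "activity"): (70.89, 68.82, 69.84),
    ("CORR", "relation"): (72.48, 69.85, 71.14),
    ("FIP", "activity"): (75.25, 73.69, 74.46),
    ("FIP", "relation"): (79.65, 76.33, 77.95),
}

for (method, task), (p, r, f) in reported.items():
    print(f"{method:<20} {task:<9} computed F1 = {f1(p, r):.2f}   reported F1 = {f:.2f}")
```

Each recomputed value rounds to the reported F1 score, so the precision, recall and F1 figures quoted for all four methods are mutually consistent.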

For future work, the researchers want to test the model on data from other large social networking websites, and extend the model to do more general semi-supervised and unsupervised tasks.


Copyright © 2015 IDG Communications, Inc.
