Social Network analysis

In this post, we will discuss the basics of social network analysis. With the recent pandemic disease caused by covid-19 virus, it is important to develop certain set of models that can better assess how the disease will be propagated with time. Social networks is of significant importance in modeling spread of infectious disease.

Preliminary definitions

Nodes – Nodes represent the basic entity or building block of a network. In a social network, it represents the people. Different kind of nodes will lead to multimode network.

Links – Links represents the communication between two nodes. In a social network, it represents the communication or contact between people. Differnt kind of links will lead to multiplex network.

Walk – It is defined as passing among nodes through links.

Cycle – It is a kind of walk where we start and end at the same node.

Path – It is a special kind of walk in which there is passing through different nodes. There is no repeatation of any node when you walk along nodes.

Geodesic – Geodesic between two nodes is defined as the shortest path between them.

Diameter – For a network, it is defined as the largest geodesic (longest shortest path).

Average path length – For a network, It is the defined as the average geodesic (average shortest path).

Degree of separation – Degree of separation defines the minimum number of steps that it takes for one node to reach another node in a network. On world network, there is a famous theory called six degrees of separation that suggests that every person knows every other person through \( 5.5 \approx 6 \) steps or connections or people away. On social networking site, facebook, the degrees of separation was found to be \( \approx 4.7 \).

Triangles – In a group of network, it is defined as the connection that would be established if they share a common connection.

Structural holes – It is defines as the places in a network when nodes/people are unconnected. In other words, it can be seen as a separation between non-redundant contacts.

Network Centrality – Network centrality indicates which nodes are most central in the network. The interpretation of the term centrality varies by context/purpose and therefore it is generally divided to the following:

Degree Centrality – This indicates the most connected node (or having highest degree) in the network.
Closeness Centrality – This indicates the node that are closest to all other nodes. The minimum number of steps that is required from a node to every other node determines its closeness centrality. In other words, this centrality also determines the degree of separation of a node.
Betweenness Centrality – This indicates that how often does a node lie on the path that is connecting every node with every node. In simpler terms, we are determining the sum of shortest paths passing through the node.
EigenVector centrality – It includes not only how well a node is connected, but whether the connection of that node are themseleves popular or having good connections. In other words, it’s proportional to the sum of the neighbor’s centrality. Example - Google page rank algorithm - algorithm that render pages after search according to ranks that is proportional to the sum of the rank of the linked pages. The more popular page links to the page, the better it is, and it will find its way to the popular results when searching. So, it is important to get a page being linked by the most popular page, the page that is most searched by people across the globe.

Classes of Network measures – Network measures are divided into the following classes:

Global measures– average degree, degree distribution, path length, etc. computes one number for the entire network.
Local measures – clustering, transitivity, structural equivalence etc. computes several numbers for each of the subnetwork within the entire network.
Individual measures – degree centrality, closeness, betweenness, eigenvector, etc. computes one number for each node in the network.

Conductance – This is computed as the number of edges pointing outside the community divided by the number of edges inside the community.

Homophily – It is a way that like-minded people link with the other like-minded people.

Community Detection

Girvan-Newman algorithm takes advantage of the use of betweenness in order to detect different groups or communities in the network. This works in the following manner:

Compute the betweeness of all existing ties (edges).
Remove the tie (edge) with the highest betweenness.
Recompute the betweeness of all ties (edges) affected by the removal.
Repeat steps 2 and 3 until no ties (edges) remain.

Each time after step 2 (removing of an edge), there is computation of modularity which measures the density of links between communities.

Network Evolution

The idea behind network evolution is that network is never static and always changes with time. The formation of the networks evolves over time and is govern by some network model. This model captures the behavior of the changing real-world networks and explains the patterns in their behavior. In general, networks can be formed as a combination of different model, however, in the context of social netwrorking science, we study the most frequently used models:

Small world network

A small world network comes from the idea that everyone knows everyone else using common connections. Every node may or may not necessarily connected to each other directly. If they are directly connected, it is fine, but, if they are not, then they are reachable to each other using small number of steps. Small world networks consists of the following characteristics:

There is inherent high-level of clustering
Small average path length

Growing a network

The important question that one can formulate when doing the analysis on social networks is Can we grow networks with certain characteristics?

Predicting diffusion patterns

Let us consider a stable network and see what happens inside the network. So for example, like something spreading in the network, a diffusion model. A disease could spread in the network, or a rumor could spread, or innovations could spread.

Erdos Renyi network

Erdos Renyi network corresponds to a random graph model where each edge can appear with independent and identical probability.

Preferential attachment network

This network forms a preference of new getting new links for nodes that are well connected. This implies that the more connected node (node with higher degree) will preferably recieve new links.

Hubs are important because they are more likely to not only get but also pass disease/innovation/opinion/rumour in a social network. Node with degree 1000 are 1000^2 or 1000000 more likely to be contagious.

Influentials

In a social network, influentials or more aptly influencers are the people who with all their reachability, popularity and celebrity status has a large impact on people. This implies that their reachability effectiveness can be used greatly for promoting a product, advertising an annoucement, or spread an awareness about useful news to large set of audience in a more effective way.

The idea of influentials comes from the law of the few. This law suggests that whenever something spreads, somewhere along the way, one of the exceptional people found about the trend and using their social connections, enthusiasm and charisma, those exceptional people creates the cascade that gets everybody else on board with a trend.

The idea of engaging influential people or influencers is a common strategy used in viral or word-of-mouth marketing. The use of influencers can help spread a positive word of mouth about a product, business or service.

Useful Resources

Gephi – For viusal analyzation of a social network

Stanford Large Network Dataset Collection

Comprehensive collection of network graph data

Datasets for Social Network Analysis