Fast filtering and animation of large dynamic networks (2024)

(1)

R E S E A R C H

Open Access

Fast ﬁltering and animation of large dynamic

networks

Przemyslaw A Grabowicz

1,2*

_{, Luca Maria Aiello}

_{and Filippo Menczer}

*_{Correspondence:}

pms@mpi-sws.org

1_{Max Planck Institute for Software}

Systems, Saarland University,Saarbrucken, Germany

2_{Institute for Cross-Disciplinary}

Physics and Complex Systems,University of Balearic Islands, Palmade Mallorca, Spain

Full list of author information isavailable at the end of the article

Abstract

Detecting and visualizing what are the most relevant changes in an evolving networkis an open challenge in several domains. We present a fast algorithm that ﬁlterssubsets of the strongest nodes and edges representing an evolving weighted graphand visualize it by either creating a movie, or by streaming it to an interactive networkvisualization tool. The algorithm is an approximation of exponential sliding

time-window that scales linearly with the number of interactions. We compare thealgorithm against rectangular and exponential sliding time-window methods. Ournetwork ﬁltering algorithm: (i) captures persistent trends in the structure of dynamicweighted networks, (ii) smoothens transitions between the snapshots of dynamicnetwork, and (iii) uses limited memory and processor time. The algorithm is publiclyavailable as open-source software.

1 Introduction

Network visualization is widely adopted to make sense of, and gain insight from, com-plex and large interaction data. These visualizations are typically static, and incapable todeal with quickly changing networks. Dynamic graphs, where nodes and edges churn andchange over time, can be eﬀective means of visualizing evolving networked systems suchas social media, similarity graphs, or interaction networks between real world entities.The recent availability of live data streams from online social media motivated the devel-opment of interfaces to process and visualize evolving graphs. Dynamic visualization issupported by several tools [–]. In particular, Gephi [] supports graph streaming witha dedicated API based on JSON events and enables the association of timestamps to eachgraph component.

While there is some literature on dynamic layout of graphs [–], not much work hasbeen done so far about developing information ﬁltering techniques for dynamic visual-ization of large and quickly changing networks. Yet, for large networks in which the rateof structural changes in time could be very high, the task of determining the nodes andedges that can represent and transmit the salient structural properties of the network at acertain time is crucial to produce meaningful visualizations of the graph evolution.

We contribute to ﬁlling this gap by presenting a new graph ﬁltering and visualizationtool calledfastviz that processes a chronological sequence of weighted interactionsbetween the graph nodes and dynamically ﬁlters the most relevant parts of the networkto visualize. Our algorithm:

(2)

• captures persistent trends in structural properties of dynamic networks, whileremoving no longer relevant portions of the networks and emphasizing old nodes andlinks that show fresh activity;

• smoothens transitions between the snapshots of a dynamic network by leveragingshort-term and long-term node activity;

• uses limited memory and processor time and is fast enough to be applied to large livedata streams and visualize their representation in the form of a network.

The reminder of this paper is structured as follows. First, we introduce related studies inSection . Next, we introduce thefastvizﬁltering method for dynamic networks in Sec-tion . We compare this method against rectangular and exponential sliding time-windowapproaches and show what are the advantages of our method. Finally, we present visual-izations created with our ﬁltering methods for four diﬀerent real datasets in Section ,and conclude the study.

2 Related work

Graph drawing [, ] is a branch of information visualization that has acquired great im-portance in complex systems analysis. A good pictorial representation of a graph can high-light its most important structural components, logically partition its diﬀerent regions,and point out the most central nodes and the edges on which the information ﬂows morefrequently or quickly. The rapid development of computer-aided visualization tools andthe reﬁnement of graph layout algorithms [–] allowed increasingly higher-quality vi-sualizations of large graphs []. As a result, many open tools for static graph analysis andvisualization have been developed in the last decade. Among the best known we mentionWalrus [], Pajek [, ], Visone [], GUESS [], Networkbench [], NodeXL [], andTulip []. Studies about comparisons of diﬀerent tools have also been published recently[].

The interest in depicting the shape of online social networks [, ] and the avail-ability of live data streams from online social media motivated the development of toolsfor animated visualizations ofdynamic graphs[], inoﬄinecontexts, where temporalgraph evolution is known in advance, as well as in onlinescenarios, where the graphupdates are received in a streaming fashion []. Several tools supporting dynamics vi-sualization emerged, including GraphAEL [] (http://graphael.cs.arizona.edu/), GleamViz(www.gleamviz.org), Gephi [] (gephi.org), and GraphStream [] (graphstream-project.org). Despite static visualizations based on time-windows [], alluvial diagrams [], ormatrices [–] have been explored as solutions to capture the graph evolution, dynamicgraph drawing remains the technique that has attracted more interest in the research com-munity so far. Compared to static visualizations, dynamic animations present additionalchallenges: user studies have shown that they can be perceived as harder to parse visually,even though they have the potential to be more informative and engaging [].

(3)

that make the visualization more enjoyable, such as fadeout deletion of nodes. Graph

readabilityhas been measured in user studies in relation to several tasks [–]; the ex-perimental ﬁndings highlight the importance of visualization criteria such as minimizingbends and edge crossings and maximizing cluster separation in facilitating the viewer’sinterpretation and understanding of the graph. A general concept that has been studiedfor long in relation to the quality of dynamic graph visualization is themental map[–] that the viewer has of the graph structure. In practical terms, the placement of existingnodes and edges should change as little as possible when a change is made to the graph [],under the hypothesis that if the mental map is preserved the parsing of the visual informa-tion is faster and more accurate. More recent work [] has reappraised the importance ofthe mental map in the comprehension of a dynamic graph series, while identifying somecases in which it may help [, ] (e.g., memorability of the graph evolution, followinglong paths, recognition of recurrent patterns, tracking a large number of moving objects).More in general, there are several open fronts in empirical research in graph visualiza-tion to identify the impact of certain factors on the quality of the animavisualiza-tion (e.g., speed[], interactivity []). An extensive overview of this aspect has been conducted recentlyby Kriglsteinet al.[]. Methods to preserve the stability of nodes and the consistency ofthe network structure leveraging hierarchical organization on nodes have been proposed[–]. User studies have shown that hierarchical approaches that collapse several nodesin larger meta-nodes can improve graph readability in cases of high edge density []. Thegraph layout also has a signiﬁcant impact on the readability of graphs []. Some work hasbeen done to adapt spectral and force-directed graph layouts [] toincremental layouts

that recompute the position of nodes at timetbased on the previous positions at timet– minimizing displacement of vertices [, –] or to propose new “stress-minimization”strategies to map the changes in the graph [].

Although much exploration has been done in the visualization principles to achievehighly-readable animations, two aspects have been overlooked so far.

First, not many techniques to extract and visualize the most relevant information from

very largegraphs have been studied yet. Graph decomposition has been used in a staticcontext to increase the readability of the network by splitting it into modules to be visu-alized separately [], while sliding time-windows have been employed to discard oldernodes and edges in visualization of graph evolution []. A hierarchical organization ofnodes according to some authority or centrality measure allows to visualize the graph atdiﬀerent levels of details, eliminating the need to display all nodes and edges at once [].Some work has been done about interactive exploration by blending diﬀerent visualiza-tion paradigms [] and time-varying clustering []. Indices to measure the relevance ofevents in a dynamic graph at both node and community level have also been proposed[], even if they have not been applied to any graph animation task. Yet, none of thesetechniques has been tested on very large data and none of the modern visualization toolsprovide features for the detection of the most relevant components of a graph at a giventime. On the other hand, quantitative studies on the characterization of temporal net-works [–] have been conducted, but with no direct connection with the dynamicvisualization task.

(4)

carried out about information selection techniques for dynamic graph visualization, in-cluding solutions based on temporal decay of nodes and edges [], node clustering [],and centrality indices [, ].

3 Network ﬁltering

We introduce thefastvizalgorithm that takes in input a chronological stream of in-teractions between nodes (i.e., network edges) and converts it into a set of graph updatesthat account only for the most relevant part of the network. The algorithm has two stages:buﬀering of ﬁltered network and generation of diﬀerential updates for the visualization(see Figure ). The algorithm stores and visualizes the nodes with the highest strengths,i.e., the highest sum of weights of their connections.

3.1 Input

The data taken as input is an ordered chronological sequence of interactions betweennodes. The interactions can be either pairwise or cliques of interacting nodes. For instance,the following input:

ti,n, . . . ,nm,wi

represents the occurrence of interactions between nodesn, . . . ,nmof weightwiat epoch

timeti. Entries with more than two nodes are interpreted as interactions happening

be-tween each pair of members of the clique with the respective weight. Multiple interactions

(5)

between the same pair of nodes sum up by adding up their corresponding weights. Theadvantage of the clique-wise format over the pairwise format is that the size of input ﬁlesis smaller.

3.2 Filtering criterion

In the ﬁrst stage of the algorithm, at mostNbnodes with the highest strengths are saved

in the buﬀer together with the interactions among them. The strengthSi of a nodeiis

a sum of weights of all connections of that node, i.e.,Si=

jwij, wherewijis the weight

of an undirected connection between nodesiandj. Whenever a new node, which doesnot appear in the buﬀer yet, is read from the input, it replaces the node in the buﬀer withthe lowest value of the strength. If an incoming input involves a node that is already in thebuﬀer, then the strength of the node is increased by the weight of the incoming connection.To emphasize the most recent events and penalize stale ones, a forgetting mechanism thatdecreases the strengths of all nodes and weights of all edges is run periodically every timeperiodTfby multiplying their current values by a forgetting factor ≤Cf< . This process

leads to the removal of old inactive nodes having low strength and storage of old nodeswith fresh activity and high strength.

Note that the forgetting mechanism corresponds to a sliding time-window with expo-nential decay. The decay determines the weighting of the past interactions in the slid-ing window aggregation of a dynamic network. Standard rectangular slidslid-ing time-window aggregates all past events within the width Ttw of the time-window weighting

them equally. In contrast, infastvizand in the sliding time-window with an exponen-tial decay the weighting decreases exponenexponen-tially (see Figure ). (Under a set of assumptionsone can calculate how much time will a given node stay in the buﬀered network. Let usassume that at the timetnthe strength of a nodenisSn(tn), that this strength will not be

in-creased after timetn, that the next forgetting will happen inTftime, and that the strength

of the weakest buﬀered nodeSw<Sn(tn) is constant over time. Under these assumptions,

the nodenwill stay buﬀered for timet–tn>log(S_log(Cw/Sn(tn))

f) Tf.) Such exponential decay has

two advantages over a standard rectangular sliding time-window approach. First, it givesmore importance both to the most recent and to the oldest connections, while giving lessimportance to the middle-aged interactions. Second, it produces a dynamic network in

Figure 2 The aggregating curves of dynamic network ﬁltering methods.The aggregating curves for

fastviz(black line), rectangular sliding time-window (green dashed), and exponential time-window (bluedotted). The steps of thefastvizmethod correspond to consecutive multiplications by the forgettingfactorCf= 2/3 performed after each forgetting periodTf. The rectangular time-window width is set to

Ttw= 3Tf. The exponent of the exponentially decaying time-window corresponds to the forgetting factor of

(6)

which changes are smoother due to the balanced weighting of old and new connections.Finally, instead of using the sliding time-window with exponential decay, we introducedthe fastvizalgorithm to limit the computational complexity of network ﬁltering. Inprinciple, time-window methods do not introduce such a bound. We explore and conﬁrmthese points in the following subsections using real dynamic networks.

3.3 Filtering criterion versus rectangular and exponential sliding time-windows

Comparison of structural properties of networks produced with diﬀerent ﬁltering meth-ods is not straightforward. First, since the networks are dynamic, one needs to compare thestructural properties of the static snapshots of the networks produced by the two meth-ods at the same time. Second, parameters of the methmeth-ods, i.e., forgetting factorCf and

time-window widthTtw, inﬂuence the algorithms, so one needs to draw an equivalency

between them to compare the methods under the same conditions. A natural condition toconsider is the one of equal areas under the curves from Figure , representing the contri-bution of an interaction event to the representation of a node over time. Note that underthis condition a node with constant non-zero activity in time will have the same strengthin networks created with each method. Forfastviz, the areaAfvunder the aggregation

curve is equal to the sum of a geometric progression. Assuming an inﬁnite geometric pro-gression, we get the approximateAfv=Tf/( –Cf). The area under the aggregation curve

of the rectangular time-window is simplyAtw=Ttw. By demanding the areas to be equal,

we obtain the relation between the parameters of the two methods

Ttw=

 –Cf

. ()

In general, the forgetting period Tf is ﬁxed, therefore there is only one free parameter

controlling the ﬁltering, e.g., the forgetting factorCf, which we assign according to the

dy-namic network, i.e., the faster the network densiﬁes in time, the more aggressive forgettingwe use (see Appendix B for more details about the values of parameters). In the followingparagraphs, we analyze the dynamics of several structural properties of the networks pro-duced withfastviz, rectangular, and exponential sliding time-window methods havingequal aggregating areas.

To highlight the diﬀerences between the three ﬁltering methods, we apply them to tworeal dynamic networks from Twitter characterized by high changeability and measure thestructural properties of resulting networks (Figure ). The networks represent interactionsin Twitter during two widely popular events: the  Super Bowl and the announcementof Osama bin Laden’s death. Further description and properties of these datasets are pro-vided in the next section.

Due to this fact the computational complexity of sliding-time window methods in-creases in time, whereas it is bounded infastviz. Since network structural propertiessuch as average degree and clustering depend on the size of the network, we calculate theseproperties for the subgraphs of equal size, i.e., for theNbstrongest nodes of the full

net-work produced by each of the sliding time-window methods (Figures C-J). For simplicity,we refer to these subgraphs ofNbnodes as the buﬀered networks.

(7)

Figure 3 Structural properties of ﬁltered dynamic networks.Structural properties of ﬁltered dynamicnetworks representing user interactions surrounding the 2013 Super Bowl or the announcement of Osamabin Laden’s death. The values of the properties are plotted as a function of time for thefastvizﬁltering(black line), rectangular sliding time-window of matching width (green dashed), or exponential slidingtime-window (blue dotted). The following network properties are plotted from the left-most to the right-mostcolumn: the total number of nodesNb, the average degreeki, the global clustering coeﬃcientCg, the

average local clustering coeﬃcientci, and the degree assortativityr.

and I). We conclude that thefastvizﬁltering produces smoother transitions betweennetwork snapshots than rectangular sliding time-window. This property of our methodmay improve readability of visualizations of such dynamic networks.

Finally,fastvizcaptures persistent trends in the values of the properties by leveragingthe short-term and long-term node activity. For instance, it captures the trends in degree,clustering coeﬃcients, and assortativity that are less visible with the rectangular time-window, while they are well-visible with the exponential time-window (Figures C-F, I,and J). Note that high average degree obtained for networks produced with exponentialtime-window corresponds to the nodes that are active over a prolonged time-span, whoseactivity is aggregated over unbounded aggregation period, and the number of nodes isunbounded as well. On the contrary, rectangular sliding time-window shows the degreeaggregated over a ﬁnite time-window, whilefastvizlimits the number of tracked nodes,leading to lower reported average degree.

To measure the similarity of sets of nodes ﬁltered with diﬀerent methods we calculateJaccard similarity coeﬃcient. Speciﬁcally, we measure the Jaccard coeﬃcientJof the sets of

Nbstrongest nodes ﬁltered withfastvizand each of the time-window methods (Figures

(8)

Figure 4 Jaccard similarity of networks produced with various methods.Jaccard similarity coeﬃcientJbetween sets of nodes obtained withfastvizand rectangular sliding time-window (green dashed) orexponential sliding time-window (blue dotted). The nodes belong to either buﬀered or visualized networksrepresenting Twitter interactions during the Super Bowl or Osama bin Laden’s death. Speciﬁcally:(A-D)theJaccard coeﬃcient as a function of time;(E-F)the Jaccard coeﬃcient averaged over time as a function of thenumber of nodes in the visualized network.

3.4 Network updates for visualization

In the second stage, for the purpose of visualization, the algorithm selectsNv<Nbnodes

with the highest strength and creates a diﬀerential update to the visualized network con-sisting of these nodes and the connections between them. Each such diﬀerential updateis meant to be visualized in the resulting animation of the network, e.g., as a frame of amovie.

We compare the visualized networks generated by each of the ﬁltering methods. Eachof the visualized networks consists ofNv=  strongest nodes and all connections

exist-ing between them in the buﬀered network. The similarity of the nodes visualized by thefastvizand exponential time-window methods, measured as Jaccard coeﬃcientJ, is or close to  (Figures C and D). The visualized networks of the two methods are almostidentical. The structural properties of the networks created with the two methods yieldalmost the same values at each point in time (Figures A-J). This result is to be expected,since the forgetting mechanism offastvizcorresponds closely to the exponential decayof connection weights. The advantage of our method over exponential time-window con-sists of the limited computational complexity, which makes thefastvizﬁltering feasibleeven for the largest datasets of pairwise interactions. Naturally, the similarity between vi-sualized networks created with the two methods decreases with the size of the vivi-sualizednetworkNv(Figures E and F). More speciﬁcally, the similarity decreases with the ratio

Nv/Nb, as we keep in our experiments a constant value ofNb= ,. Hence, to visualize

larger networks one can choose to buﬀer more nodes.

(9)

(Fig-Figure 5 Structural properties of visualized dynamic networks.Structural properties of the visualizeddynamic networks representing Twitter interactions during the Super Bowl or Osama bin Laden’s death. Thevalues of the properties are plotted as a function of time for thefastvizﬁltering (black line), rectangularsliding time-window (green dashed), or exponential sliding time-window (blue dotted). The followingnetwork properties are plotted from the left-most to the right-most column: the total number of visualizednodesNv, the average degreeki, the global clustering coeﬃcientCg, the average local clustering coeﬃcient

ci, and the degree assortativityr.

ures C-F vs. Figures C-F). We conclude that the structure of visualized network diﬀerssigniﬁcantly from the structure of buﬀered network, although this diﬀerence is smaller forfastvizthan for rectangular sliding time-window.

3.5 Computational complexity

The computational complexity of the buﬀering stage of the algorithm isO(ENb), where

Eis the total number of the pairwise interactions read (the cliques are made of multiplepairwise interactions). Each time when an interaction includes a node that is not yet storedin the buﬀered graph the adjacency matrix of the graph needs to be updated. Speciﬁcally,the weakest node is replaced with the new node, soNbentries in the adjacency matrix are

zeroed, which corresponds toO(ENb). The memory usage scales asO(N_b), accounting

for the adjacency matrix of the buﬀered graph. (For certain real dynamic networks, thebuﬀered graph is sparse. In such cases, one can propose more optimized implementationsoffastviz. Here, we focus on limiting the time complexity so that it scales linearly withthe number of interactions and describe the generic implementation that achieves it.) Thesecond, update-generating, stage has computational complexity ofO(UNblog(Nb)), where

U is the total number of diﬀerential updates, which is a fraction ofEand commonly itis many times smaller thanE. (Typically, a large number of interactions is aggregated tocreate one diﬀerential update to the visualized network. In the examples that we show inthe next section, one update aggregates from  to  million interactions. Therefore,U

is from  to  million times smaller thanE.) This term corresponds to the fact that thestrengths of all buﬀered nodes are sorted each time an update to the visualized networkis prepared. The memory trace of this stage is very low and scales asO(Nv). We conclude

(10)

4 Visualization

In this section, we describe animations of exemplary dynamic graphs ﬁltered withfastviz. Principally, the sequence of graph updates can be converted into image frames

that are combined into a movie depicting the network evolution. We implement this vi-sualizing technique and create with it the network animations described below. Alterna-tively, the updates can be fed directly to the Gephi Streaming API to produce an interactivevisualization of the evolving network. The Gephi Streaming API allows graph streamingfrom a client to a server where Gephi is running. In such a case, the graphs are streameddirectly from our ﬁltering system to the Gephi server without any third-party modules.In Appendix A, we introduce implementation details of both approaches. Finally, corre-sponding animations can be created by other visualization tools fed with thefastvizupdates; we highly encourage their development.

4.1 Datasets

We test thefastvizﬁltering and our visualizing technique on four datasets very diﬀer-ent from each other in nature, size, and time span (see Table ). The datasets and moviesproduced from each dataset are described in the following subsections (see Figure ). InAppendix A, we present the source code of both tools with their documentation, four dy-namic graph datasets, and instructions to recreate the visualizations introduced in this

Table 1 Statistics of the experimental datasets

Dataset Time period Nodes Edges Nodes drawn

Super Bowl 2 days 49k 1.1M 577

Osama bin Laden’s death 2 hours 95k 198k 279

IMDB movie keywords 6 years 1k 220M 301

US patent title words 34 years 414k 90M 106

Figure 6 Screenshots of the generated movies.Screenshots of the movies generated from the datasets:

(11)

section. In Appendix B, we provide and describe the values of the parameters of the algo-rithm and the visualizing tool used for these datasets.

4.2 Twitter

We use data obtained through the Twittergardenhosestreaming API, which covers around% of the tweet volume. We focus on two events: the announcement of Osama bin Laden’sdeath and the  Super Bowl. We consider user mentions and hashtags as entities andtheir co-occurrence in the same tweet as interactions between them.

The ﬁrst video (Figure A) shows how the anticipation for the Super Bowl steadily growson early Sunday morning and afternoon, and how it explodes when the game is about tostart. Hashtags related to #commercials and concerts (e.g., #beyonce) are evident. Later,the impact of the #blackout is clearly visible. The interest about the event drops rapidlyafter the game is over and stays low during the next day.

The video about the announcement of Osama bin Laden’s death (Figure B) shows theinitial burst caused by @keithurbahn and how the breaking news was spread by users@brianstelter and @jacksonjk. The video shows that the news appears later via #cnn andis announced by @obama. The breaking of this event on Twitter is described in detail byLotan [].

4.3 IMDB movies

We use a dataset from IMDB of all movies, their year of release and all the keywords as-signed to them (from imdb.to/SZD). We create a network of keywords that are asas-signedto the same movies. Our video (Figure C) shows interesting evolution of the keywordsfrom “character-name-in-title” and “based-on-novel” (ﬁrst half of th century), through“martial-arts” (s and s) to “independent-ﬁlm” (s and later), “anime” and “surreal-ism” (s).

4.4 Patents

We use a set of US patents issued between  and  []. We analyze the appearanceof words in their titles. Whenever two or more words appear in a title of a patent we createa link between them at the moment when the patent was issued. To improve readability weﬁlter out stopwords and the generic frequent words: “method,” “device” and “apparatus.”Our video (Figure D) shows that at the beginning of the period techniques related to “en-gine” and “combustion” were popular, and later start to cluster together with “motor” and“vehicle.” Another cluster is sparked by patents about “magnetic” “recording” and “image”“processing.” It merges with a cluster of words related to “semiconductor” and “liquid”“crystal” to form the largest cluster of connected keywords at the end of the period.

4.5 Other visualizations

(12)

4.6 Summary

The datasets in our case studies are fairly diverse in topicality, time span, and size, as shownin Table . Nevertheless, our method is able to narrow down the visualization to meaning-ful small subgraphs with less than  distinct nodes in all cases. The high performance ofthe algorithm makes it viable for real-time visualizations of live and large data streams. Ona desktop machine the algorithm producing diﬀerential updates of the network took sev-eral minutes to ﬁnish for the US patents and less than two minutes for the other datasets.Given such a performance, it is possible to visualize in real-time highly popular eventssuch as the Super Bowl, which produced up to , tweets per second.

5 Conclusions

Tools for dynamic graph visualization developed so far do not provide specialized ways todynamically select the most important portions of large evolving graphs. We contributeto ﬁlling this gap by proposing an algorithm to ﬁlter nodes and edges that best repre-sent the network structure at a given time. Our method captures trends and smoothensthe dynamics of structural properties of weighted networks by leveraging the short-termand long-term node activity. Furthermore, our ﬁltering method uses limited memory andprocessor time making it viable for large live data streams. We implemented our ﬁlteringalgorithm in open source tools that take in input a stream of interaction data and out-put a movie of the network evolution or a live Gephi animation. As future work, we wishto improve our algorithm by means of further optimization and to enhance the tools byproviding a standalone module for live visualization of graph evolution.

Appendix A: Implementation details and source code

We have implemented two independent tools described in the manuscript. The ﬁrst toolis thefastvizalgorithm. The second tool converts the sequence of updates into imageframes that are combined into a movie depicting the network evolution. We release thesource code of both tools (see the project website github.com/WICI/fastviz). Here, wedescribe the two tools in more detail.

The ﬁrst tool is the fastvizalgorithm. It takes in input a chronological stream ofinteractions between nodes and converts it into a set of graph updates that account onlyfor the most relevant part of the network in the JSON format. In the network ﬁltering stage,the algorithm stores a buﬀered network of sizeNb, limiting the computational complexity

and memory usage of the algorithm. In the second stage, for the purpose of visualization,the algorithm selectsNv<Nbnodes with the highest strength and all edges between these

nodes with the highest strength and all edges between these nodes that have weight abovea certain thresholdwmin_{. The subgraph induced by the}_N

v nodes is compared with the

subgraph in the previous state and a diﬀerential update is created. The updates are createdper every time interval that is determined with the time contraction parameterTc. A value

(13)

of nodes and edges. We also introduced a new type of object to deal with labels on thescreen, for example, to write the date and time on the screen.

The second tool converts the sequence of updates into image frames that are combinedinto a movie depicting the network evolution. To this end, the sequence of updates pro-duced by the ﬁltering algorithm is fed to a python module that builds a representation of a

dynamic graph, namely an object that handles each of the updates and reﬂects the changesto its current structure. The transition between the structural states of the graph deter-mined by the received updates is depicted by a sequence of image frames. Each diﬀerentialupdate correspond to one visualization frame, i.e., one frame of an animation. In its ini-tial state, the nodes in the network are arranged according to the Fruchterman Reingoldgraph layout algorithm []. The choice of the layout is arbitrary and other layouts canbe used and compared. However, due to the focus of this study on the ﬁltering method,rather than the quality of the visualization, we do not explore any other layout algorithms.For each new incoming event, a new layout is computed by runningN iterations of thelayout algorithm, using the previous layout as a seed. Intermediate layouts are producedat each iteration of the algorithm. Every intermediate layout is converted to a png framethat is combined through themencodertool [] to produce a movie that shows a smoothtransition between diﬀerent states. The movie is encoded with the frequency of  framesper second. To avoid nodes and edges to appear or disappear abruptly in the movie, weuse animations that smoothly collapse dying nodes and expand new ones. A conﬁgura-tion ﬁle allows to modify the default movie appearance (e.g, resoluconﬁgura-tion, colors) and layoutparameters (see the project website).

We release the source code of both tools with the documentation under the GNU Gen-eral Public License (see the project website github.com/WICI/fastviz). Together with thetools we release the datasets used in this paper and instructions on how to recreate all theexamples of animations presented in this manuscript. Additionally, the updates createdwithfastvizcan be fed directly to the Gephi Streaming API to produce an interactivevisualization of the evolving network. Respective instructions can be found at the websiteof the project.

Appendix B: Algorithm parameters

The exact behavior of thefastvizﬁltering depends on the parameters introduced in themanuscript. We present the values of the parameters used in the case studies and their de-fault values in Table . The dede-fault values of the parameters are meant to be universal andgive reasonably good visualizations for most datasets. Overall, three parameters requireadjustment to the input data, namely time contractionTc, edge width thresholdwmin, and

Table 2 Values of the parameters of thefastvizalgorithm for the introduced case studies

Dataset Tc wmin Cf

Super Bowl 3,600 10 0.8

Osama bin Laden’s death 500 0.95 0.9

IMDB movie keywords 3,600×24×1,095 10 0.75

US patent title words 3,600×24×400 20 0.65

Default 3,600 0.95 0.75

(14)

forgetting factorCf. We provide exemplary values of these parameters for the introduced

datasets in Table  and describe these parameters in detail below.

The time contractionTccorresponds to the number of seconds in data time scale that

are going to be contracted to one second of the visualization. The larger the time span ofthe dataset, the larger should be this parameter in order to keep the length of visualizationﬁxed. For instance, if the timespan of the network is  hours, and one wants to see itsevolution in a -second-long animation, thenTcshould be set to ,. It is crucial to

provide a desired value for this parameter, because providing a value that is too large willcreate just a few network updates and a very short animation, while providing a value thatis too small will create a large number of updates making the JSON ﬁle very big and theanimation very long.

The minimal edge weightwmin _{is a threshold above which edges appear in the}

visual-ization. Low value of this parameter may results in many edges of low weight appearingin the animation, while high value of the parameter may prevent any edges from beingvisualized. In case a user does not have any information about the visualized network,we recommend leaving this parameter at its default value of ., which will visualize alledges of standard weight  or higher.

The forgetting factorCfdecides how fast older interactions among nodes are forgotten

in comparison with more recent interactions. This parameter can be tuned individuallyfor the purpose of the visualization. In general, the faster the network densiﬁes in time,the more aggressive should be the forgetting, i.e., the lower should be the forgetting factor

Cf. In general, keeping the default value of this parameter is safe, although its adjustment

will improve the quality of visualization.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors designed the research. PAG wrote the source code of the algorithm and LMA wrote the source code of thevisualization tool. All authors deployed the tools. PAG and LMA analyzed the data. All authors wrote, reviewed andapproved the manuscript.

Author details

1_{Max Planck Institute for Software Systems, Saarland University, Saarbrucken, Germany.}2_{Institute for Cross-Disciplinary}

Physics and Complex Systems, University of Balearic Islands, Palma de Mallorca, Spain. 3_{Yahoo! Research, Barcelona, Spain.}4_{Center for Complex Networks and Systems Research, Indiana University, Bloomington, USA.}

Acknowledgements

We are grateful to André Panisson for inspiration and to Jacob Ratkiewicz, Bruno Gonçalves, Mark Meiss, and othermembers of the Truthy project (cnets.indiana.edu/groups/nan/truthy) for helpful discussions and suggestions. PAGacknowledges funding from the JAE-Predoc program of CSIC and partial ﬁnancial support from the MINECO underproject MODASS (FIS2011-24785). This work is supported in part by the NSF (ICES award CCF-1101743) and the James S.McDonnell Foundation and by the SocialSensor FP7 project, partially funded by the EC under contract number 287975.

Received: 7 August 2013 Accepted: 17 September 2014

References

1. Erten C, Harding P, Kobourov S, Wampler K, Yee G (2004) GraphAEL: graph animations with evolving layouts. In: LiottaG (ed) Graph drawing. Lecture notes in computer science, vol 2912. Springer, Berlin

2. Broeck W, Gioannini C, Goncalves B, Quaggiotto M, Colizza V, Vespignani A (2011) The GLEaMviz computational tool, apublicly available software to explore realistic epidemic spreading scenarios at the global scale. BMC Infect Dis 11:373. Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks.

In: ICWSM’09: proceedings of the international AAAI conference on weblogs and social media. AAAI Press, MenloPark

4. Dutot A, Guinand F, Olivier D, Pigné Y (2007) GraphStream: a tool for bridging the gap between complex systems anddynamic graphs. In: EPNACS: emergent properties in natural and artiﬁcial complex systems

(15)

6. Brandes U, Fleischer D, Puppe T (2007) Dynamic spectral layout with an application to small worlds. J GraphAlgorithms Appl 11(2):325-343

7. Brandes U, Indlekofer N, Mader M (2012) Visualization methods for longitudinal social networks and stochasticactor-oriented modeling. Soc Netw 34(3):291-308

8. Kamada T (1989) Visualizing abstract objects and relations. World Scientiﬁc, Singapore

9. Tollis IG, Di Battista G, Eades P, Tamassia R (1999) Graph drawing: algorithms for the visualization of graphs. PrenticeHall, New York

10. Kamada T, Kawai S (1989) An algorithm for drawing general undirected graphs. Inf Process Lett 31:7-1511. Fruchterman TMJ, Reingold EM (1991) Graph drawing by force-directed placement. Softw Pract Exp

21(11):1129-1164. doi:10.1002/spe.4380211102

12. Gansner ER, North SC (1998) Improved force-directed layouts. In: GD’98: proceedings of the 6th internationalsymposium on graph drawing. Springer, London

13. Hu YF (2005) Eﬃcient and high quality force-directed graph drawing. The Mathematica J 10:37-71

14. Herman I, Melançon G, Marshall MS (2000) Graph visualization and navigation in information visualization: a survey.IEEE Trans Vis Comput Graph 6:24-43. doi:10.1109/2945.841119

15. Munzner TM (2000) Interactive visualization of large graphs and networks. PhD thesis, Stanford University, Stanford16. Batagelj V, Mrvar A (2002) Pajek—analysis and visualization of large networks. In: Mutzel P, Junger M, Leipert S (eds)

Graph drawing. Lecture notes in computer science, vol 2265. Springer, Berlin

17. De Nooy W, Mrvar A, Batagelj V (2005) Exploratory social network analysis with Pajek. Cambridge University Press,Cambridge

18. Brandes U, Wagner D (2003) Visone—analysis and visualization of social networks. In: Graph drawing software.Springer, Berlin

19. Adar E (2006) GUESS: a language and interface for graph exploration. In: Proceedings of the SIGCHI conference onhuman factors in computing systems, CHI’06. ACM, New York

20. Network workbench tool, Indiana University, Northeastern University, and University of Michigan.http://nwb.cns.iu.edu. Accessed 13 Mar 2014

21. Smith MA, Shneiderman B, Milic-Frayling N, Mendes Rodrigues E, Barash V, Dunne C, Capone T, Perer A, Gleave E(2009) Analyzing (social media) networks with NodeXL. In: C&T’09: proceedings of the fourth internationalconference on communities and technologies. ACM, New York

22. Auber D, Archambault D, Lambert RBA, Mathiaut M, Mary P, Delest M, Dubois J, Melancon G (2012) The tulip 3framework: a scalable software library for information visualization applications based on relational data. Technicalreport 7860, INRIA

23. Ahn JW, Taieb-Maimon M, Sopan A, Plaisant C, Shneiderman B (2011) Temporal visualization of social networkdynamics: prototypes for nation of neighbors. In: SBP’11: proceedings of the 4th international conference on socialcomputing, behavioral-cultural modeling and prediction. Springer, Berlin

24. Heer J, Vizster BD (2005) Visualizing online social networks. In: InfoVis’05: proceedings of the IEEE symposium oninformation visualization. IEEE Computer Society, Washington

25. Falkowski T, Bartelheimer J, Spiliopoulou M (2006) Mining and visualizing the evolution of subgroups in socialnetworks. In: WI’06: proceedings of the 2006 IEEE/WIC/ACM international conference on web intelligence. IEEEComputer Society, Washington

26. Demetrescu C, Eppstein D, Galil Z, Italiano GF (2010) Dynamic graph algorithms. In: Atallah MJ, Blanton M (eds)Algorithms and theory of computation handbook. Chapman & Hall/CRC, Boca Raton

27. Rosvall M, Bergstrom CT (2010) Mapping change in large networks. PLoS ONE 5:e8694.doi:10.1371/journal.pone.0008694

28. Yi JS, Elmqvist N Lee S (2010) TimeMatrix: analyzing temporal social networks using interactive matrix-basedvisualizations. Int J Hum-Comput Interact 26:11-12

29. Stein K, Wegener R, Schlieder C (2010) Pixel-oriented visualization of change in social networks. In: ASONAM’10:proceedings of the international conference on advances in social networks analysis and mining. IEEE ComputerSociety, Washington

30. Gove R, Gramsky N, Kirby R, Sefer E, Sopan A, Dunne C, Shneiderman B, Taieb-Maimon M (2011) NetVisia: heat map &matrix visualization of dynamic social network statistics & content. In: SocialCom’11: proceedings of the 3rd IEEEinternational conference on social computing. IEEE Computer Society, Washington

31. Farrugia M, Quigley A (2011) Eﬀective temporal graph layout: a comparative study of animation versus static displaymethods. Inf Vis 10:47-64. doi:10.1057/ivs.2010.10

32. Ramalingam G, Reps T (1996) On the computational complexity of dynamic graph problems. Theor Comput Sci158:233-277

33. Henzinger MR, King V (1999) Randomized fully dynamic graph algorithms with polylogarithmic time per operation.J ACM 46:502-516

34. Friedrich C, Houle ME (2002) Graph drawing in motion II. In: Mutzel P, Junger M, Leipert S (eds) Graph drawing.Lecture notes in computer science, vol 2265. Springer, Berlin, pp 220-231. doi:10.1007/3-540-45848-4_18

35. Purchase HC (1997) Which aesthetic has the greatest eﬀect on human understanding? In: GD’97: proceedings of the5th international symposium on graph drawing. Springer, London, pp 248-261.

http://dl.acm.org/citation.cfm?id=647549.728779

36. Huang W, Hong SH, Eades P (2006) How people read sociograms: a questionnaire study. In: APVis’06: proceedings ofthe 2006 Asia-Paciﬁc symposium on information visualisation—volume 60. Australian Computer Society,Darlinghurst, pp 199-206. http://dl.acm.org/citation.cfm?id=1151903.1151932

37. Huang W, Eades P, Hong SH (2008) Beyond time and error: a cognitive approach to the evaluation of graph drawings.In: Proceedings of the 2008 workshop on BEyond time and errors: novel evaLuation methods for InformationVisualization, BELIV’08, vol 3. ACM, New York, pp 1-8. http://doi.acm.org/10.1145/1377966.1377970

38. Eades PWL, Misue K, Sugiyama K (1991) Preserving the mental map of a diagram. In: Compugraphics, pp 24-3339. Misue K, Eades P, Lai W, Sugiyama K (1995) Layout adjustment and the mental map. J Vis Lang Comput 6(2):183-21040. Freire M, Rodríguez P (2006) Preserving the mental map in interactive graph interfaces. In: AVI’06: proceedings of the

(16)

41. Coleman MK, Parker DS (1996) Aesthetics-based graph layout for human consumption. Softw Pract Exp26(12):1415-1438. doi:10.1002/(SICI)1097-024X(199612)26:12<1415::AID-SPE69>3.0.CO;2-P

42. Archambault D, Purchase HC (2013) The “Map” in the mental map: experimental results in dynamic graph drawing.Int J Hum-Comput Stud 71(11):1044-1055. http://www.sciencedirect.com/science/article/pii/S107158191300102X43. Archambault D, Purchase H (2012) The mental map and memorability in dynamic graphs. In: 2012 IEEE Paciﬁc

visualization symposium (PaciﬁcVis), pp 89-96

44. Archambault D, Purchase H (2013) Mental map preservation helps user orientation in dynamic graphs. In: Didimo W,Patrignani M (eds) Graph drawing. Lecture notes in computer science, vol 7704. Springer, Berlin, pp 475-486.doi:10.1007/978-3-642-36763-2_42

45. Ghani S, Elmqvist N, Yi JS (2012) Perception of animated node-link diagrams for dynamic graphs. Comput GraphForum 31(3):1205-1214. doi:10.1111/j.1467-8659.2012.03113.x

46. Archambault D, Munzner T, Auber D (2008) GrouseFlocks: steerable exploration of graph hierarchy space. IEEE TransVis Comput Graph 14(4):900-913. doi:10.1109/TVCG.2008.34

47. Kriglstein S, Pohl M, Stachl C (2012) Animation for time-oriented data: an overview of empirical research. In: 16thinternational conference on information visualisation, pp 30-35

48. North SC (1996) Incremental layout in DynaDAG. In: GD’95: proceedings of the symposium on graph drawing.Springer, London

49. North SC, Woodhull G (2002) Online hierarchical graph drawing. In: GD’01: revised papers from the 9th internationalsymposium on graph drawing. Springer, London

50. Archambault D, Munzner T, Auber D (2007) TopoLayout: multilevel graph layout by topological features. IEEE TransVis Comput Graph 13(2):305-317

51. Archambault D (2009) Structural diﬀerences between two graphs through hierarchies. In: GI’09: proceedings ofgraphics interface. Canadian Information Processing Society, Toronto

52. Archambault D, Purchase H, Pinaud B (2010) The readability of path-preserving clusterings of graphs. Comput GraphForum 29(3):1173-1182. http://hal.inria.fr/inria-00471432

53. Blythe J, McGrath C, Krackhardt D (1996) The eﬀect of graph layout on inference from social network data. In: GD’95:proceedings of the symposium on graph drawing. Springer, London, pp 40-51.

http://dl.acm.org/citation.cfm?id=647547.728581

54. Brandes U (2001) Drawing on physical analogies. In: Kaufmann M, Wagner D (eds) Drawing graphs. Springer, London55. Branke J (2001) Dynamic graph drawing. In: Kaufmann M, Wagner D (eds) Drawing graphs. Springer, London56. Diehl S, Gorg C (2002) Graphs, they are changing. In: Goodrich M, Kobourov S (eds) Graph drawing. Lecture notes in

computer science, vol 2528. Springer, Berlin

57. Frishman Y, Tal A (2008) Online dynamic graph drawing. IEEE Trans Vis Comput Graph 14:727-740

58. Rodrigues EM, Milic-Frayling N, Smith M, Shneiderman B, Hansen D (2011) Group-in-a-box layout for multi-facetedanalysis of communities. In: SocialCom’11: proceedings of the 3rd IEEE international conference on social computing.IEEE Computer Society, Washington

59. Dynes SBC, Gloor PA, Gloor PA, Gloor PA, Laubacher R, Laubacher R, Zhao Y, Zhao Y, Dynes S (2004) Temporalvisualization and analysis of social networks. In: NAACSOS’04: conference of North American Association forComputational Social and Organizational Science

60. Kumar G, Garland M (2006) Visual exploration of complex time-varying graphs. IEEE Trans Vis Comput Graph12:805-812

61. Hadlak S, Schulz HJ, Schumann H (2011) In situ exploration of large dynamic networks. IEEE Trans Vis Comput Graph17(12):2334-2343. http://dblp.uni-trier.de/db/journals/tvcg/tvcg17.html#HadlakSS11

62. Sallaberry A, Muelder C, Ma KL (2013) Clustering, visualizing, and navigating for large dynamic graphs. In: GD’12:proceedings of the 20th international conference on graph drawing. Springer, Berlin, pp 487-498.

http://dx.doi.org/10.1007/978-3-642-36763-2_43

63. Asur S, Parthasarathy S, Ucar D (2007) An event-based framework for characterizing the evolutionary behavior ofinteraction graphs. In: KDD’07: proceedings of the 13th ACM SIGKDD international conference on knowledgediscovery and data mining. ACM, New York

64. Clauset AEN (2007) Persistence and periodicity in a dynamic proximity network. In: DIMACS workshop oncomputational methods for dynamic interaction networks

65. Cattuto C, Van den Broeck W, Barrat A, Colizza V, Pinton JF, Vespignani A (2010) Dynamics of person-to-personinteractions from distributed RFID sensor networks. PLoS ONE 5(7):e11596. doi:10.1371/journal.pone.001159666. Krings G, Karsai M, Bernhardsson S, Blondel V, Saramaki J (2012) Eﬀects of time window size and placement on the

structure of an aggregated communication network. EPJ Data Sci 1:4.http://www.epjdatascience.com/content/1/1/4

67. Lotan G (2011) Breaking bin Laden: a closer look.

http://blog.socialﬂow.com/post/5454638896/breaking-bin-laden-a-closer-look. Accessed 13 Mar 201468. LaRowe G, Ambre S, Burgoon J, Ke W, Börner K (2009) The scholarly database and its utility for scientometrics

research. Scientometrics 79(2):219-234

69. Graph Streaming API documentation. http://wiki.gephi.org/index.php/Speciﬁcation_-_GSoC_Graph_Streaming_API.Accessed 13 Mar 2014

70. Mencoder tool documentation. http://www.mplayerhq.hu/design7/documentation.html. Accessed 13 Mar 2014

doi:10.1140/epjds/s13688-014-0027-8