With 180 million weblogs worldwide, weblogs are with good reason considered one of the killer applications of the World Wide Web. It has been shown on several occasions that extracting information from the blogosphere can be highly valuable for individuals, institutions, and even governments.
However, it is increasingly difficult, if not impossible, for the average internet user and weblog enthusiast to grasp the blogosphere's complexity as a whole, because thousands of new weblogs and an almost uncountable number of new posts are added to it every day.
Therefore, mining, analyzing, modeling, and presenting this immense data collection is of central interest. This could enable users to detect technical trends, gauge the political mood, or find news articles about a specific topic.
The basis of the Blog-Intelligence project is the large amount of data provided by all weblogs in the world. This data has been, and continues to be, gathered by an intelligent crawler. Blog-Intelligence already provides some basic analysis functionality for the crawled data. However, as the crawler improves and the amount of data grows, the analysis runs into serious performance problems.
Because of these performance problems, the analyses are computed only once a week to limit the run time of the analysis algorithms. As a result, the results are no longer up to date, and the web portal can only show outdated results.
SAP HANA therefore offers entirely new opportunities. The fast execution of the analysis algorithms enables a new and better interaction with the system for the end user. Besides the advantage of exploring the blogosphere in real time, it becomes possible to compute analyses separately for each user according to that user's interests. Furthermore, text and graph analysis algorithms that were formerly too time-consuming can now be integrated into our framework, because SAP HANA offers fast variants of these algorithms. This opens up new perspectives on the data and the blogosphere for the user.
For example, it is now possible to determine what is being discussed about products or companies in the blogosphere, and how. Traditional providers limit these analyses to the biggest blogs worldwide. With Blog-Intelligence and SAP HANA, it becomes possible to run analyses over all weblogs worldwide.
The result of the long-term research initiative presented here is Blog-Intelligence, a blog analysis framework integrated into a web portal (http://www.blog-intelligence.com/) that makes these findings available in an appropriate format to anyone interested. The portal has by now reached mature functionality, but it requires ongoing optimization in each of its three layers (see figure 1): data extraction (reconfiguration of the existing crawler framework), data analysis (optimization of the portal's six main information services), and data provision (visualization of key information services and ongoing web development for the portal).
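The three layers can be pictured as a small pipeline. The following Python sketch is purely illustrative and is not the Blog-Intelligence implementation: the `Post` record, the function names `extract`, `analyze`, and `provide`, and the sample feed are all hypothetical, and a simple term count stands in for the portal's actual information services.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    blog: str  # hypothetical blog identifier
    text: str  # plain-text body of the post


def extract(raw_feed):
    """Data extraction layer (assumed shape): keep non-empty (blog, text) pairs."""
    return [Post(blog, text) for blog, text in raw_feed if text.strip()]


def analyze(posts, term):
    """Data analysis layer (stand-in): count posts per blog that mention a term."""
    needle = term.lower()
    return Counter(p.blog for p in posts if needle in p.text.lower())


def provide(mentions, top_n=3):
    """Data provision layer (stand-in): format the top blogs for display."""
    return [f"{blog}: {count}" for blog, count in mentions.most_common(top_n)]


# Made-up sample data, only to show how the layers connect.
raw_feed = [
    ("blog-a", "SAP HANA speeds up our analysis."),
    ("blog-b", "Nothing about databases here."),
    ("blog-a", "More notes on HANA."),
    ("blog-c", "   "),
]
report = provide(analyze(extract(raw_feed), "hana"))
```

Keeping each layer behind its own function mirrors the separation described above: the crawler, the analyses, and the presentation can each be optimized without touching the other two.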
Future SOC Resources
The HPI Future SOC Lab is a cooperation between the Hasso-Plattner-Institut (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners. The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components that might be too expensive for an ordinary research environment, such as servers with up to 64 cores.
Blog-Intelligence uses several hardware and software components located at the HPI Future SOC Lab.
- Philipp Berger, Patrick Hennig, Justus Bross, Christoph Meinel, Mapping the Blogosphere - Towards a Universal and Scalable Blog-Crawler, in: Third IEEE International Conference on Social Computing (SocialCom 2011), IEEE CS, MIT, Boston, USA, 10, 2011. ISBN: 978-0-7695-4578-3.
- Justus Bross, Patrick Schilf, Maximilian Jenders, Christoph Meinel, Visualizing the Blogosphere with BlogConnect, in: Third IEEE International Conference on Social Computing (SocialCom 2011), IEEE CS, MIT, Boston, USA, 10, 2011. ISBN: 978-0-7695-4578-3.
- Justus Bross, Keven Richly, Matthias Kohnen, Christoph Meinel, Identifying the top-dogs of the blogosphere, in: Social Network Analysis and Mining, Volume 2, Number 1, 53-67. DOI: 10.1007/s13278-011-0027-7.
- J. Bross, K. Richly, P. Schilf, C. Meinel, Social Physics of the Blogosphere: Capturing, Analyzing and Presenting Interdependencies of Partial Blogospheres, in: "From Sociology to Computing in Social Networks: Theory, Foundations and Applications", Series: Lecture Notes in Social Networks, Vol. 1, Memon, Nasrullah; Alhajj, Reda (Eds.), Springer: New York/Wien, 2010. ISBN: 978-3-7091-0293-0.
- J. Bross, P. Hennig, P. Berger, C. Meinel, Feed-Crawler Enhancement for Blogosphere-Mapping, International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 1, No. 2, US, August 2010. ISSN: 2156-5570 (Online).
- J. Bross, M. Quasthoff, P. Berger, P. Hennig, C. Meinel, Mapping the blogosphere with RSS-feeds, 24th IEEE International Conference on Advanced Information Networking and Applications (AINA-2010), Perth, Australia, 20-23 April 2010.
- J. Bross, P. Schilf, C. Meinel, Visualizing blog archives to explore content- and context-related interdependencies, 2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI'10), Toronto, Canada, 2010.