ScaDS Big Data Industry Forum

The ScaDS Big Data Industry Forum brings together big data researchers and practitioners from industry to speak about their use cases, innovative approaches, and the challenges they currently face in development.

Excellent speakers, mainly experts from companies with strong research efforts and experience in big data, will give deep insights into their views on current big data trends and challenges. In addition, we have been able to win outstanding researchers with a strong focus on the use of big data in business.

Big data and data science are discussed as part of the next wave of disruptive innovation for business and society. At this event, participants gain unique insights from practitioners into how big data applications are revolutionizing the business models and production cycles we know today.

The forum addresses business application developers and consultants as well as members of the scientific community. Throughout the forum there will be ample room for discussion and exchange.

The Big Data Forum is organised by the Competence Center for Scalable Data Services and Solutions ScaDS Dresden/Leipzig, one of the two leading big data centers of excellence in Germany. It works on topics such as data quality, knowledge extraction, and visualization in five application areas: Life Sciences, Digital Humanities, Material Sciences, Environmental and Transport Sciences, and Business Data.

Agenda

09:00 – 09:05 Opening (Prof. Dr. Bogdan Franczyk, Leipzig University)
09:05 – 09:15 Introduction Competence Center for Scalable Data Services and Solutions (Prof. Erhard Rahm, Leipzig University)
09:15 – 10:30 Blending Tools and Data in KNIME (Dr. Björn Lohrmann, KNIME)
10:30 – 11:00 Coffee break
11:00 – 12:30 Quo Vadis Research and Higher Education ICT Landscapes (Markus Zappolino, EMC)
12:30 – 13:30 Lunch break
13:30 – 14:30 Monitoring the Cloud (Dr. Wojtek Kozaczynski, Microsoft)
14:30 – 14:45 Coffee break
14:45 – 16:15 Linked Big Data Analytics using IBM System G (Frederick Ho, IBM)

Blending Tools and Data in KNIME

Björn Lohrmann, KNIME
From .csv, R, and Python to Spark, MLlib and Hive

Abstract: The open source KNIME Analytics Platform allows the blending of many data sources and tools within a single workflow. In this talk, I will walk through a real-world use case and build one integrative workflow that reads data from text files and Hive, then integrates, transforms, and aggregates the data locally and on Hadoop. I will then demonstrate how to control a parameter sweep using Spark and how models can be trained locally (with native KNIME nodes and Python/R) and directly on Hadoop using MLlib. The final model is deployed using KNIME workflows, web services, and Spark.
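For readers unfamiliar with this pattern, the sketch below is a rough textual analogue of the workflow described in the abstract, written in plain PySpark rather than with KNIME's graphical nodes. The file path, Hive table, and column names are hypothetical placeholders, and the model and parameter grid are chosen only for illustration.

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Blend two sources: a text file and a Hive table (names are made up).
csv_df = spark.read.csv("/data/events.csv", header=True, inferSchema=True)
hive_df = spark.table("warehouse.customers")
joined = csv_df.join(hive_df, on="customer_id")

# Aggregate on the cluster, then assemble a feature vector for MLlib.
per_customer = (joined.groupBy("customer_id", "age", "label")
                      .count()
                      .withColumnRenamed("count", "num_events"))
features = VectorAssembler(inputCols=["age", "num_events"],
                           outputCol="features").transform(per_customer)

# Parameter sweep over the regularization strength, executed via Spark.
lr = LogisticRegression(labelCol="label")
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1, 1.0]).build()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=3)
model = cv.fit(features)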

Biography: Björn Lohrmann received his Ph.D. in Computer Science from Technische Universität Berlin in 2016. From June 2010 to August 2015 he was a research associate at Technische Universität Berlin. Since then, he has been working as a Sr. Software Developer at KNIME.com, where he integrates the KNIME Analytics Platform with various systems of the Hadoop ecosystem. His research interests span distributed systems, including Big Data systems such as engines for batch and stream analytics.


Linked Big Data Analytics using IBM System G

Fred Ho
Program Director, Business Development
IBM Research (Almaden)

Abstract: In the Big Data era, data are increasingly linked, forming large graphs. Most traditional IT systems, however, were designed to process independent data, and analyses are mostly carried out in independent scenarios. Processing connected data has therefore been a major challenge for Big Data Analytics, which requires both traditional big data platforms for data processes that are easily parallelized and novel graph computing platforms for data that are linked.
IBM System G, developed at IBM’s Watson Research Center, is a complete set of database, visualization, analytics, and middleware tools to support graph applications and a variety of graph-based solutions. In this talk, I will discuss the technology and architecture of System G, along with use cases that are being considered for commercial and government applications. I will then demo a couple of such scenarios.
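To make the distinction concrete, the short sketch below shows the style of question a graph platform answers directly. It uses the open-source networkx library purely for illustration, not System G's own APIs, and the accounts and edges are invented.

import networkx as nx

# Linked data: accounts connected by transactions (all names are made up).
g = nx.Graph()
g.add_edges_from([
    ("acct_A", "acct_B"), ("acct_B", "acct_C"),
    ("acct_C", "acct_A"), ("acct_D", "acct_E"),
])

# Centrality: which account sits at the heart of the network?
ranks = nx.pagerank(g)
print(max(ranks, key=ranks.get))

# Connectivity: groups of transitively linked accounts, a common
# starting point for fraud-ring detection.
for component in nx.connected_components(g):
    print(sorted(component))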

Biography: Frederick Ho received his B.S. in Computer Engineering from the University of Illinois at Urbana-Champaign and his M.S. in Electrical & Computer Engineering from the University of California, Santa Barbara. With more than 30 years in the computer industry, he has held technical and management positions at Tandem Computers, Sun Microsystems, Informix Software, and IBM. At IBM, he was the Chief Technologist for Informix Data Warehousing & Analytics. Currently, he is a Research Program Director focusing on Cognitive solutions, including Financial Fraud, Compliance, and Visual Comprehension.