IDEAS 13 Final Program






 Foreword

Measured in human terms, database systems, now over half a century old, have attained maturity. During this time, the database community has developed a number of models in an effort to improve the handling of larger and more varied data. For example, the relational model, though slow to be adopted, has come to acquire wide acceptance, withstanding incursions from the object model, in part by adopting and incorporating some of its concepts, thus offering support for both object features and unstructured data.

With the introduction and rapid proliferation of the World Wide Web, the personal computer, supported by the graphical browser pioneered by NCSA Mosaic and popularized by Netscape, became the standard way of accessing information. As the primary mode of web access, the desktop computer has, in its turn, given way to the laptop, the tablet and, most notably, the mobile phone, which has evolved from a relatively uncommon briefcase-sized device into the ubiquitous handheld appendage of the young and not so young alike.

Despite their infancy, these web-enabled devices have already produced such an enormous amount of digital data that no human being could possibly peruse it all in a single lifetime. With the advent of 'Big Data', the database community has arrived at a crucial turning point, where the encounter between scientific endeavor and social transformation can no longer be underestimated or ignored.

The database community is increasingly aware of the opportunities and difficulties issuing from these novel developments, as evidenced, for example, by the inclusion of a position paper on this very topic at this year's meeting. Just as numerous promising applications have emerged from the Big Data revolution, other, less benign outcomes are also beginning to appear, outcomes which affect the most vulnerable users: children and adolescents. Both corporations and public institutions must work harder and become more aware of the challenges and sometimes tragic consequences precipitated by these technological changes.

In particular, the extent to which situations of bullying, blackmail and intimidation are facilitated and even exacerbated by the profusion of social media networks and metadata collection clearly indicates the crucial importance of developing regulatory measures to which parents and concerned parties can have recourse.

It is therefore incumbent on the database community to wholeheartedly confront these challenges with the same ingenuity and vigor with which we have, over the last half century, solved problems of less immediate social consequence. Specifically, we must develop much-needed tools for analyzing traffic and detecting harmful protocols, while strengthening our capacity to raise alarms and prevent tragedies.

Bipin C. Desai

General Chair



Session: Opening
Welcome

Date Time:

  2013-10-09   From   09:00   To   09:15

Location:

  Sala d'Actes

Chair:

  Josep L. Larriba-Pey

Session: Data Mining, OLAP, and Knowledge Discovery

Date Time:

  2013-10-09   From   09:15   To   10:45

Location:

  Sala d'Actes

Chair:

  Pedro Furtado

  -- On-The-Fly Generation of Multidimensional Data Cubes for Web of Things (Full Paper)
      Muntazir Mehdi, Ratnesh Sahay

The dynamicity of sensor data sources and the publishing of real-time sensor data over a generalised infrastructure like the Web pose a new set of integration challenges. The recent Linked Data initiative has shown an edge in combining large amounts of data from many organizations over the Web. This article specifically addresses the problem of adapting data-model-specific (or context-specific) properties in the automatic generation of multidimensional data cubes. The idea is to generate data cubes on the fly from syntactic sensor data to sustain decision making and event processing, and to publish this data as LOD.
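
As a rough illustration of the on-the-fly cube idea, here is a minimal sketch (our own, not the authors' system) that materialises every group-by combination of a set of dimensions over a batch of sensor readings; the field names and values are invented for the example.

    from itertools import combinations
    from collections import defaultdict

    def cube(rows, dims, measure):
        # Aggregate `measure` over every subset of `dims`: a full data cube.
        results = {}
        for r in range(len(dims) + 1):
            for group in combinations(dims, r):
                agg = defaultdict(float)
                for row in rows:
                    agg[tuple(row[d] for d in group)] += row[measure]
                results[group] = dict(agg)
        return results

    rows = [
        {"sensor": "s1", "city": "Galway", "hour": 9,  "temp": 14.0},
        {"sensor": "s2", "city": "Galway", "hour": 9,  "temp": 13.0},
        {"sensor": "s3", "city": "Derry",  "hour": 10, "temp": 11.5},
    ]
    print(cube(rows, ("city", "hour"), "temp")[("city",)])
    # {('Galway',): 27.0, ('Derry',): 11.5}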


  -- Content-Based Annotation and Classification Framework: A General Multi-Purpose Approach (Full Paper)
      Michal Batko, Jan Botorek, Petra Budikova

Nowadays, unprecedented amounts of digital data are becoming available. However, most of the data lacks the semantic information necessary to organize these resources. For images in particular, textual annotations that represent the semantics are highly desirable. Only a small percentage of images is created with reliable annotations, so considerable effort is invested in automatic annotation. We address this problem from a general perspective and introduce a new annotation model applicable to many text assignment problems. We also provide experimental results from several instances of our model.


  -- Breaking Skyline Computation down to the Metal - the Skyline Breaker Algorithm (Full Paper)
      Dominik Köppl

Given a sequential input connection, we tackle skyline computation of the read data using a spatial tree structure to index fine-grained feature vectors, exploiting concurrency whenever possible. With these methods, we seek to provide a robust algorithm.
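
For readers unfamiliar with the operation, a minimal block-nested-loop skyline in Python; this is only the baseline the paper improves upon with spatial indexing and concurrency, and it assumes smaller is better in every dimension.

    def dominates(p, q):
        # p dominates q: no worse everywhere, strictly better somewhere.
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def skyline(points):
        # Keep a window of points not dominated by anything seen so far.
        window = []
        for p in points:
            if any(dominates(w, p) for w in window):
                continue                                # p is dominated: discard
            window = [w for w in window if not dominates(p, w)]
            window.append(p)
        return window

    print(skyline([(1, 9), (3, 3), (2, 8), (4, 4)]))    # [(1, 9), (3, 3), (2, 8)]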


Session: Database Query Languages

Date Time:

  2013-10-09   From   11:15   To   13:00

Location:

  Sala d'Actes

Chair:

  Guadalupe M Canahuate

  -- Verification of k-Coverage on Query Line Segments (Full Paper)
      En Tzu Wang, Arbee L.P. Chen

In this paper, we address the k-coverage verification problem regarding a given query line segment, which returns all sub-segments from the line segment that are covered by at least k sensors. We propose three methods based on the R-tree index.
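
A one-dimensional sketch of the verification step, assuming each sensor's coverage has already been projected onto the query segment as an interval; the paper's contribution lies in doing this at scale over R-tree-indexed sensors.

    def k_covered(intervals, k, lo, hi):
        # Sweep the coverage intervals and report the maximal sub-segments of
        # [lo, hi] that at least k intervals cover simultaneously.
        events = []
        for s, e in intervals:
            s, e = max(s, lo), min(e, hi)
            if s < e:
                events += [(s, +1), (e, -1)]
        events.sort()
        out, depth, start = [], 0, None
        for x, d in events:
            depth += d
            if depth >= k and start is None:
                start = x
            elif depth < k and start is not None:
                out.append((start, x))
                start = None
        return out

    # Two sensors cover [0.0, 0.6] and [0.4, 1.0]; with k = 2 only the overlap qualifies.
    print(k_covered([(0.0, 0.6), (0.4, 1.0)], k=2, lo=0.0, hi=1.0))   # [(0.4, 0.6)]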


  -- Evaluating Skyline Queries over Vertically Partitioned Tables (Short Paper)
      José Rafael Subero Carrillo, Marlene Goncalves

Skyline queries make it possible to filter high volumes of data. In this work, we propose two new algorithms to evaluate skyline queries over vertically partitioned tables, and we present an experimental study on synthetic and real datasets showing that our algorithms outperform the state-of-the-art.


  -- Personalized Progressive Filtering of Skyline Queries in High Dimensional Spaces (Short Paper)
      Yann Loyer, Isma Sadoun, Karine Zeitouni

The usefulness of skyline queries deteriorates as the size of their answers grows with the number of criteria. We refine the skyline answer via successive relaxations of the dominance conditions with respect to the user's preferences. We also define ranking and top-k methods over the skyline set.


  -- How to Exploit the Device Diversity and Database Interaction to Propose a Generic Cost Model (Short Paper)
      Ladjel Bellatreche, Ahcene Boukorca, Jalil Boukhobza

Cost models have followed the life cycle of databases. The spectacular growth of complex decision queries amplifies the importance of the physical design phase. Most cost models are developed for a single storage device with a well-identified storage model, and ignore the interaction between the different components of a database. In this paper, we propose a generic cost model for physical design that can be instantiated for each need, and we contribute an ontology describing storage devices.


  -- DynamicNet: An Effective and Efficient Algorithm for Supporting Community Evolution Detection in Time-Evolving Information Networks (Short Paper)
      Alfredo Cuzzocrea, Francesco Folino

DynamicNet, an effective and efficient algorithm for supporting community evolution detection in time-evolving information networks, is presented and experimentally evaluated in this paper. DynamicNet introduces a graph-based, model-theoretic approach to represent time-evolving information networks and to capture how they change over time.


Session: Access Methods and Data Structures

Date Time:

  2013-10-09   From   14:15   To   15:45

Location:

  Sala d'Actes

Chair:

  Elio Masciari

  -- On the Efficiency of Multiple Range Query Processing in Multidimensional Data Structures (Full Paper)
      Peter Chovanec, Michal Krátký

Multidimensional data are commonly utilized in many application areas. Processing range queries in a multidimensional data structure has some performance issues, especially in the case of a higher space dimension. Many real-world queries can be transformed into a multiple range query, i.e., a query including more than one query rectangle. In this article, we introduce range query algorithms for a sequence of range queries and the Cartesian range query. We show the optimality of these algorithms in terms of I/O and CPU costs. These algorithms are tested in the well-known R-tree.
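
To fix the semantics, here is a naive, index-free rendition of a multiple range query; the paper's algorithms answer the same queries in a single shared R-tree traversal instead of scanning.

    def multi_range_query(points, rects):
        # Answer several axis-aligned range queries in one pass over the data.
        hits = {i: [] for i in range(len(rects))}
        for p in points:
            for i, (lo, hi) in enumerate(rects):
                if all(l <= c <= h for c, l, h in zip(p, lo, hi)):
                    hits[i].append(p)
        return hits

    points = [(1, 1), (2, 5), (6, 3)]
    rects = [((0, 0), (3, 3)), ((1, 2), (7, 6))]   # each rect: (low corner, high corner)
    print(multi_range_query(points, rects))        # {0: [(1, 1)], 1: [(2, 5), (6, 3)]}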


  -- Approximate High-Dimensional Nearest Neighbor Queries Using R-Forests (Full Paper)
      Michael C Nolen, King-Ip Lin

We propose using a forest of R-trees to find approximate nearest neighbors in high-dimensional space. The idea is to partition the space into regions and then use a modified branch-and-bound approach. Experiments show that our method produces higher-quality results than the LSB-tree while maintaining the same efficiency.
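
The flavour of the approach can be conveyed with a forest of randomized kd-style trees, a stand-in for the authors' R-trees: each tree routes the query to a single leaf, and only the union of those leaves is scanned exactly.

    import random

    def build_tree(points, leaf_size=8):
        # Randomized kd-style tree: split on a random dimension at the median.
        if len(points) <= leaf_size:
            return points                                  # leaf: a bucket of points
        dim = random.randrange(len(points[0]))
        pts = sorted(points, key=lambda p: p[dim])
        mid = len(pts) // 2
        return (dim, pts[mid][dim],
                build_tree(pts[:mid], leaf_size), build_tree(pts[mid:], leaf_size))

    def leaf_of(node, q):
        while isinstance(node, tuple):                     # internal nodes are tuples
            dim, split, left, right = node
            node = left if q[dim] < split else right
        return node                                        # leaves are lists

    def approx_nn(forest, q):
        # Candidates = union of the single leaf per tree the query falls into.
        candidates = {p for tree in forest for p in leaf_of(tree, q)}
        return min(candidates, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))

    pts = [(random.random(), random.random()) for _ in range(1000)]
    forest = [build_tree(pts) for _ in range(5)]
    print(approx_nn(forest, (0.5, 0.5)))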


  -- Top-k join queries: Overcoming the curse of anti-correlation (Full Paper)
      Manish M Patil

We build a linear-space index for top-k join queries which, in anticipation of the worst-case scenario (anti-correlated data), maintains a subset of answers. Based on this, we show that one can achieve a number of join trials proportional to sqrt(kn), i.e., average-case performance even for worst-case queries.


Session: Panel Session

Date Time:

  2013-10-09   From   16:15   To   18:00

Location:

  Sala d'Actes

Chair:

  Alfredo Cuzzocrea

Session: Keynote I
Matching Bounds for the All-Pairs MapReduce Problem
Jeff Ullman

Date Time:

  2013-10-10   From   09:00   To   10:00

Location:

  Sala d'Actes

Chair:

  Josep L. Larriba-Pey

  -- Matching Bounds for the All-Pairs MapReduce Problem
      Jeffrey David Ullman, Foto N Afrati

The all-pairs problem is an input-output relationship where each output corresponds to a pair of inputs, and each pair of inputs has a corresponding output. It models similarity joins where no simplification of the search for similar pairs, e.g., locality-sensitive hashing, is possible, and each input must be compared with every other input to determine those pairs that are “similar.” When implemented by a MapReduce algorithm, there was a gap, a factor of 2, between the lower bound on necessary communication and the communication required by the best known algorithm.
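
A small simulation of the standard group-based scheme (our own sketch, not necessarily the construction from the talk): hash the n inputs into g groups and give each reducer one pair of groups, so every input is communicated to g - 1 reducers and every pair of inputs meets at exactly one of them.

    from itertools import combinations

    def all_pairs_mapreduce(inputs, g, compare):
        # Map: send each input, tagged with its group, to every reducer whose
        # pair of groups contains that group (replication factor g - 1).
        reducers = {pair: [] for pair in combinations(range(g), 2)}
        for i, x in enumerate(inputs):
            grp = i % g                                   # stand-in for a hash
            for pair in reducers:
                if grp in pair:
                    reducers[pair].append((grp, x))
        # Reduce: cross-group pairs meet at exactly one reducer; pairs from the
        # same group are deduplicated at one designated "home" reducer.
        for pair, items in reducers.items():
            home = {gr: tuple(sorted((gr, (gr + 1) % g))) for gr in pair}
            for (ga, a), (gb, b) in combinations(items, 2):
                if ga != gb or home[ga] == pair:
                    compare(a, b)

    seen = []
    all_pairs_mapreduce(list(range(10)), 4, lambda a, b: seen.append((a, b)))
    print(len(seen))   # 45 = C(10,2): every pair is compared exactly once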


Session: Best Paper Lecture

Date Time:

  2013-10-10   From   10:30   To   11:15

Location:

  Sala d'Actes

Chair:

  Jeffrey David Ullman

  -- Efficiency and Precision Trade-Offs in Graph Summary Algorithms (Full Paper)
      Stephane Campinas, Renaud Delbru

In many applications, it is convenient to substitute a large data graph with a smaller homomorphic graph. Accurate graph summarization algorithms are sub-optimal for a shared-nothing infrastructure such as MapReduce, as they require multiple iterations over the data graph. We investigate approximate graph summarization algorithms that are efficient to compute in a shared-nothing infrastructure. We evaluate the trade-offs between performance and precision over several datasets. Experiments highlight the need for trade-offs between the precision and the complexity of a summarization algorithm.
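
As a toy instance of such an approximate summary (an assumption on our part, not the authors' algorithm), one can collapse nodes by their outgoing edge-label signature in a single pass, which parallelises naturally in MapReduce.

    from collections import defaultdict

    def summarize(edges):
        # Collapse nodes that share the same set of outgoing edge labels into
        # one summary node: a cheap one-pass approximation of a graph summary.
        out = defaultdict(set)
        for src, label, dst in edges:
            out[src].add(label)
        sig = lambda v: frozenset(out.get(v, ()))   # node -> its label signature
        return {(sig(s), lbl, sig(d)) for s, lbl, d in edges}

    edges = [("a", "knows", "b"), ("c", "knows", "d"), ("b", "likes", "d")]
    print(summarize(edges))   # "a" and "c" collapse into the same summary node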


Session: Data Warehousing, Integration

Date Time:

  2013-10-10   From   11:15   To   12:45

Location:

  Sala d'Actes

Chair:

  Alfredo Cuzzocrea

  -- A Compact Representation for Efficient Uncertain-Information Integration (Full Paper)
      Fereidoon Sadri

We present the extended probabilistic relation, a compact model for uncertain data that admits efficient data integration, and explore data integration and query evaluation in this model. This work is the first and critical step towards practical and efficient uncertain information integration.


  -- Near Real-Time with Traditional Data Warehouse Architectures: Factors and How-to (Full Paper)
      Nickerson Fonseca Ferreira, Pedro Furtado, Pedro Miguel Oliveira Martins

Traditional data warehouses integrate new data during lengthy offline periods, with indexes being dropped and rebuilt. We analyze how a set of factors influence near real-time and frequent loading capabilities, and what can be done to improve near real-time capacity using a traditional architecture. We analyze how the query workload affects and is affected by the ETL process, and the influence of factors such as the type of load strategy, the size of the load data, indexing, integrity constraints, refresh activity over summary data, and fact table partitioning. We evaluate the factors experimentally.


  -- Dynamic Bitmap Index Recompression through Workload-Based Optimizations (Full Paper)
      David Chiu, Jason Sawin, Gheorghi Guzun, Guadalupe M Canahuate

We present an optimizer which recompresses a bitmap index over time. Based on query history, our approach allows the user to specify the priority of compression versus query efficiency, then possibly recompress the bitmap accordingly. In an empirical study, our approach was able to achieve both better compression and query speedup over WAH and PLWAH. On the largest data set, our VLC optimizer compressed up to 1.73X better than WAH, and 1.46X better than PLWAH. We also show a slight improvement in query efficiency in most experiments, while observing substantial (11X to 16X) speedups in special cases.
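
For context, a toy word-aligned hybrid (WAH) encoder; real WAH uses 32-bit words, and the paper's optimizer further decides, from the observed query history, when re-encoding regions is worth the compression-versus-query-speed trade.

    def wah_compress(bits, w=8):
        # Split the bitmap into (w-1)-bit chunks; runs of identical homogeneous
        # chunks become one fill word ('F'), anything else stays a literal ('L').
        c = w - 1
        chunks = [bits[i:i + c] for i in range(0, len(bits), c)]
        words, i = [], 0
        while i < len(chunks):
            ch = chunks[i]
            if set(ch) == {ch[0]}:                 # all-0 or all-1 chunk
                j = i
                while j < len(chunks) and chunks[j] == ch:
                    j += 1
                words.append(("F", ch[0], j - i))  # fill: bit value + run length
                i = j
            else:
                words.append(("L", ch))            # literal chunk, kept verbatim
                i += 1
        return words

    print(wah_compress("0000000" * 3 + "1010101"))
    # [('F', '0', 3), ('L', '1010101')]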


Session: Privacy and Security in Databases

Date Time:

  2013-10-10   From   14:00   To   15:45

Location:

  Sala d'Actes

Chair:

  Manish M Patil

  -- Load Balance for Semantic Cluster-based Data Integration Systems (Short Paper)
      Edemberg Rocha Da Silva, Guilherme Barros De Souza, Ana Carolina Salgado

Data integration systems based on Peer-to-Peer systems have been developed to integrate dynamic, autonomous and heterogeneous data sources on the Web. Some of these systems adopt a semantic approach for clustering their data sources, reducing the search space. However, the clusters may become overloaded, and traditional load-balancing strategies do not apply to semantic clusters. In this paper, we discuss the limitations of load balancing in semantic clusters. In addition, we propose a solution for load balancing in semantic clusters and present experimental results.


  -- SVMAX: A System for Secure and Valid Manipulation of XML Data (Short Paper)
      Houari Mahfoud

We present SVMAX, the first system that supports specification and enforcement of both read and update access policies over arbitrary XML views (recursive or not). SVMAX defines expressive models for controlling access to XML data using the W3C standards. It features an efficient incremental validation of XML documents that yields better performance w.r.t. traditional approaches. SVMAX can be easily integrated within commercial database systems. Experiments have shown the efficiency of our system.


  -- Querying data across different legal domains (Short Paper)
      Marco Taddeo, Danilo Montesi, Alberto Trombetta, Stefano Pierantozzi

The management of legal domains is gaining great importance in the context of data management. In fact, the geographical distribution of data implied, for example, by cloud-based services requires that legal restrictions and obligations be taken into account whenever data circulates across different legal domains. In this paper, we begin to investigate an approach for coping with the complex issues that arise when dealing with data spanning different legal domains. Our approach consists of a conceptual model that takes into account the notion of legal domain (to be paired with the corresponding data) and a reference architecture for implementing our approach in an actual relational DBMS.


  -- Sequential Pattern Mining from Trajectory Data (Short Paper)
      Elio Masciari

In this paper, we study the problem of mining for frequent trajectories, which is crucial in many application scenarios such as vehicle traffic management, hand-off in cellular networks, and supply chain management. We approach this problem as one of mining for frequent sequential patterns.
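
A minimal Apriori-flavoured pass over symbolic trajectories; representing each trajectory as a sequence of visited regions is an assumption of this sketch.

    from collections import Counter
    from itertools import combinations

    def frequent_patterns(trajectories, min_support, length=2):
        # Count order-preserving subsequences of visits; keep those occurring
        # in at least `min_support` distinct trajectories.
        counts = Counter()
        for traj in trajectories:
            counts.update(set(combinations(traj, length)))
        return {p: c for p, c in counts.items() if c >= min_support}

    trips = [["A", "B", "C"], ["A", "C", "D"], ["B", "A", "C"]]
    print(frequent_patterns(trips, min_support=2))
    # {('A', 'C'): 3, ('B', 'C'): 2}: e.g. A is followed (not necessarily
    # immediately) by C in all three trips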


  -- Self-Managing Online Partitioner for Databases (SMOPD) – A Vertical Database Partitioning System with a Fully Automatic Online Approach (Short Paper)
      Liangzhe Li, Le Gruenwald

This paper introduces an algorithm, the Self-Managing Online Partitioner for Databases (SMOPD), which can dynamically monitor database performance using user-configured parameters and automatically perform a re-partitioning action, without feedback from DBAs, when bad performance is detected.
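
The monitor-and-repartition cycle the abstract implies might look as follows; every function name, parameter and threshold here is hypothetical, since the paper defines its own user-configured parameters.

    import time

    def monitor_loop(avg_query_cost, repartition, slowdown=1.5, interval_s=3600):
        # Sample a performance statistic periodically and trigger vertical
        # repartitioning on degradation, with no DBA in the loop.
        baseline = avg_query_cost()
        while True:
            time.sleep(interval_s)
            if avg_query_cost() > slowdown * baseline:
                repartition()                 # e.g. recluster columns by co-access
                baseline = avg_query_cost()   # reset the baseline after the change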


Session: Keynote II
LDBC: benchmarks for graph and RDF data management
Peter Boncz

Date Time:

  2013-10-11   From   09:00   To   10:00

Location:

  C6-E101, Building C6

Chair:

  Jorge Bernardino

  -- LDBC: benchmarks for graph and RDF data management
      Peter Boncz

The Linked Data Benchmark Council (LDBC) is an EU project that aims to develop industry-strength benchmarks for graph and RDF data management systems. LDBC introduces so-called choke-point-based benchmark development, through which experts identify key technical challenges and introduce them into the benchmark workload; we describe this process in some detail. We also present the status of two LDBC benchmarks currently in development, one targeting graph data management systems using a social network data case, and the other targeting RDF systems using a data publishing case.


Session: Access Methods and Data Structures - 2

Date Time:

  2013-10-11   From   10:30   To   12:00

Location:

  C6-E101, Building C6

Chair:

  David Dominguez-Sal

  -- A hybrid page layout integrating PAX and NSM (Full Paper)
      Ilia Petrov

The paper explores a hybrid page layout (HPL) combining the advantages of NSM and PAX. The design defines a continuum between NSM and PAX, supporting both efficient scans that minimize cache faults and efficient insertions and updates. Our evaluation shows that HPL fills the PAX-NSM performance gap.
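
To make the endpoints of that continuum concrete, a minimal byte-level sketch of NSM versus PAX for one page of (id, temp) records; HPL itself interpolates between these two layouts.

    import struct

    records = [(1, 14.5), (2, 13.75), (3, 11.5)]   # (id: int32, temp: float32)

    # NSM: whole rows stored contiguously -- cheap inserts and updates, but a
    # scan of one column drags every other column through the cache.
    nsm = b"".join(struct.pack("<if", rid, t) for rid, t in records)

    # PAX: within the same page, each column lives in its own minipage -- a
    # scan of `temp` reads one contiguous strip, wasting no cache lines on `id`.
    pax = (b"".join(struct.pack("<i", rid) for rid, _ in records) +
           b"".join(struct.pack("<f", t) for _, t in records))

    assert len(nsm) == len(pax) == 24   # same bytes, different arrangement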


  -- Read Optimisations for Append Storage on Flash (Full Paper)
      Robert Gottstein, Ilia Petrov

In this short paper, we present approaches to read optimisations for append storage, analyse their potential benefits, and propose combining a multi-versioning DBMS with ordered append-log storage and multi-version index structures optimised for Flash.


  -- Cloudy: Heterogeneous middleware for in time queries processing (Full Paper)
      Pedro Miguel Oliveira Martins, Pedro Furtado

We propose a timeliness-aware execution architecture, Cloudy, which balances data and query processing across an elastic set of non-dedicated, heterogeneous nodes in order to provide scale-out performance and timely results, neither faster nor slower, using both Complex Event Processing (CEP) and a database (DB). Data is distributed across nodes according to their hardware characteristics; a set of layered mechanisms then rearranges queries to deliver results on time. We present an experimental evaluation of Cloudy and demonstrate its ability to provide timely results.


Session: Closing
IDEAS14 & Farewell

Date Time:

  2013-10-11   From   12:00   To   12:10

Location:

  C6-E101, Building C6

Chair:

  Bipin C. Desai





