Mison: A Fast JSON Parser for Data Analytics

Authors / Contributors:
[Yinan Li, Nikos Katsipoulakis, Badrish Chandramouli, Jonathan Goldstein, Donald Kossmann]
Place, publisher, year:
Association for Computing Machinery, 2017
Contained in:
Proceedings of the VLDB Endowment, pp. 1118-1129
Format:
Article (online)
ID: 528783246
LEADER naa a22 4500
001 528783246
005 20180924065515.0
007 cr unu---uuuuu
008 180924e201706 xx s 100 0 eng
024 7 0 |a 10.3929/ethz-b-000234616  |2 doi 
035 |a (ETHRESEARCH)oai:www.research-collecti.ethz.ch:20.500.11850/234616 
245 0 0 |a Mison: A Fast JSON Parser for Data Analytics  |h [Electronic data]  |c [Yinan Li, Nikos Katsipoulakis, Badrish Chandramouli, Jonathan Goldstein, Donald Kossmann] 
260 |b Association for Computing Machinery  |c 2017 
506 |a Open access  |2 ethresearch 
520 3 |a The growing popularity of the JSON format has fueled increased interest in loading and processing JSON data within analytical data processing systems. However, in many applications, JSON parsing dominates performance and cost. In this paper, we present a new JSON parser called Mison that is particularly tailored to this class of applications, by pushing down both projection and filter operators of analytical queries into the parser. To achieve these features, we propose to deviate from the traditional approach of building parsers using finite state machines (FSMs). Instead, we follow a two-level approach that enables the parser to jump directly to the correct position of a queried field without having to perform expensive tokenizing steps to find the field. At the upper level, Mison speculatively predicts the logical locations of queried fields based on previously seen patterns in a dataset. At the lower level, Mison builds structural indices on JSON data to map logical locations to physical locations. Unlike all existing FSM-based parsers, building structural indices converts control flow into data flow, thereby largely eliminating inherently unpredictable branches in the program and exploiting the parallelism available in modern processors. We experimentally evaluate Mison using representative real-world JSON datasets and the TPC-H benchmark, and show that Mison produces significant performance benefits over the best existing JSON parsers; in some cases, the performance improvement is over one order of magnitude. 
520 2 |a 43rd International Conference on Very Large Data Bases (VLDB 2017) in Munich, Germany (August 28 - September 1, 2017) 
540 |a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International  |u http://creativecommons.org/licenses/by-nc-nd/4.0  |2 ethresearch 
700 1 |a Li  |D Yinan  |e joint author 
700 1 |a Katsipoulakis  |D Nikos  |e joint author 
700 1 |a Chandramouli  |D Badrish  |e joint author 
700 1 |a Goldstein  |D Jonathan  |e joint author 
700 1 |a Kossmann  |D Donald  |e joint author 
773 0 |t Proceedings of the VLDB Endowment  |d Association for Computing Machinery  |g pp. 1118-1129 
856 4 0 |u http://hdl.handle.net/20.500.11850/234616  |q text/html  |z WWW backlink to the repository (Open access) 
908 |D 1  |a Conference Paper  |2 ethresearch 
950 |B ETHRESEARCH  |P 856  |E 40  |u http://hdl.handle.net/20.500.11850/234616  |q text/html  |z WWW backlink to the repository (Open access) 
950 |B ETHRESEARCH  |P 700  |E 1-  |a Li  |D Yinan  |e joint author 
950 |B ETHRESEARCH  |P 700  |E 1-  |a Katsipoulakis  |D Nikos  |e joint author 
950 |B ETHRESEARCH  |P 700  |E 1-  |a Chandramouli  |D Badrish  |e joint author 
950 |B ETHRESEARCH  |P 700  |E 1-  |a Goldstein  |D Jonathan  |e joint author 
950 |B ETHRESEARCH  |P 700  |E 1-  |a Kossmann  |D Donald  |e joint author 
950 |B ETHRESEARCH  |P 773  |E 0-  |t Proceedings of the VLDB Endowment  |d Association for Computing Machinery  |g pp. 1118-1129 
898 |a BK010053  |b XK010053  |c XK010000 
949 |B ETHRESEARCH  |F ETHRESEARCH  |b ETHRESEARCH  |j Conference Paper  |c Open access
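
Illustrative note on the abstract (field 520): the lower level of the approach is described as building structural indices that map logical field locations to physical positions. The record gives no implementation details, so the following is only a minimal Python sketch under stated assumptions. Python integers stand in for machine-word bitmaps, all function names are hypothetical, and the escaped-quote check and the nesting-depth walk are simplified scalar stand-ins rather than the authors' method.

```python
def char_bitmap(text: str, ch: str) -> int:
    """Bit i is set when text[i] == ch (bit 0 is the first character)."""
    bits = 0
    for i, c in enumerate(text):
        if c == ch:
            bits |= 1 << i
    return bits


def structural_quotes(text: str) -> int:
    """Quote bitmap, excluding quotes preceded by an odd run of backslashes."""
    bits = 0
    for i, c in enumerate(text):
        if c != '"':
            continue
        j = i - 1
        while j >= 0 and text[j] == "\\":
            j -= 1
        if (i - 1 - j) % 2 == 0:  # even number of backslashes: not escaped
            bits |= 1 << i
    return bits


def string_mask(quotes: int, n: int) -> int:
    """Prefix XOR of the quote bitmap: bit i is set from an opening quote
    up to, but not including, the matching closing quote."""
    mask = quotes
    shift = 1
    while shift < n:
        mask ^= mask << shift
        shift <<= 1
    return mask & ((1 << n) - 1)


def level1_colons(text: str) -> tuple[list[int], int]:
    """Return positions of colons at nesting depth 1 plus the quote bitmap."""
    n = len(text)
    quotes = structural_quotes(text)
    outside = ~string_mask(quotes, n) & ((1 << n) - 1)
    colons = char_bitmap(text, ":") & outside
    opens = char_bitmap(text, "{") & outside
    closes = char_bitmap(text, "}") & outside
    positions, depth = [], 0
    structural = colons | opens | closes
    while structural:
        low = structural & -structural  # isolate the lowest set bit
        if low & opens:
            depth += 1
        elif low & closes:
            depth -= 1
        elif depth == 1:
            positions.append(low.bit_length() - 1)
        structural ^= low
    return positions, quotes


def field_name(text: str, quotes: int, colon_pos: int) -> str:
    """Read the key that ends just before a colon, using the quote bitmap."""
    before = quotes & ((1 << colon_pos) - 1)
    end = before.bit_length() - 1      # closing quote of the key
    before &= (1 << end) - 1
    start = before.bit_length() - 1    # opening quote of the key
    return text[start + 1:end]


# Hypothetical sample record: colons inside strings and nested objects
# must not be reported as top-level field separators.
record = '{"id":"a:1","user":{"name":"x","n":2},"text":"say \\"hi\\": ok"}'
positions, quotes = level1_colons(record)
names = [field_name(record, quotes, p) for p in positions]
print(names)
```

With such an index, the k-th top-level field (its logical location) can be reached by reading positions[k] directly instead of tokenizing the record from the start. The upper, speculative level mentioned in the abstract is not modeled here.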