Mison: A Fast JSON Parser for Data Analytics
Authors / Contributors:
[Yinan Li, Nikos Katsipoulakis, Badrish Chandramouli, Jonathan Goldstein, Donald Kossman]
Place, Publisher, Year:
Association for Computing Machinery,
2017
Contained in:
Proceedings of the VLDB Endowment, pp. 1118-1129
Format:
Article (online)
Online access:
http://hdl.handle.net/20.500.11850/234616
| LEADER | naa a22 4500 | ||
|---|---|---|---|
| 001 | 528783246 | ||
| 005 | 20180924065515.0 | ||
| 007 | cr unu---uuuuu | ||
| 008 | 180924e201706 xx s 100 0 eng | ||
| 024 | 7 | 0 | |a 10.3929/ethz-b-000234616 |2 doi |
| 035 | |a (ETHRESEARCH)oai:www.research-collecti.ethz.ch:20.500.11850/234616 | ||
| 245 | 0 | 0 | |a Mison: A Fast JSON Parser for Data Analytics |h [Electronic data] |c [Yinan Li, Nikos Katsipoulakis, Badrish Chandramouli, Jonathan Goldstein, Donald Kossman] |
| 260 | |b Association for Computing Machinery |c 2017 | ||
| 506 | |a Open access |2 ethresearch | ||
| 520 | 3 | |a The growing popularity of the JSON format has fueled increased interest in loading and processing JSON data within analytical data processing systems. However, in many applications, JSON parsing dominates performance and cost. In this paper, we present a new JSON parser called Mison that is particularly tailored to this class of applications, by pushing down both projection and filter operators of analytical queries into the parser. To achieve these features, we propose to deviate from the traditional approach of building parsers using finite state machines (FSMs). Instead, we follow a two-level approach that enables the parser to jump directly to the correct position of a queried field without having to perform expensive tokenizing steps to find the field. At the upper level, Mison speculatively predicts the logical locations of queried fields based on previously seen patterns in a dataset. At the lower level, Mison builds structural indices on JSON data to map logical locations to physical locations. Unlike all existing FSM-based parsers, building structural indices converts control flow into data flow, thereby largely eliminating inherently unpredictable branches in the program and exploiting the parallelism available in modern processors. We experimentally evaluate Mison using representative real-world JSON datasets and the TPC-H benchmark, and show that Mison produces significant performance benefits over the best existing JSON parsers; in some cases, the performance improvement is over one order of magnitude. | |
| 520 | 2 | |a 43rd International Conference on Very Large Data Bases (VLDB 2017) in Munich, Germany (August 28 - September 1, 2017) | |
| 540 | |a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International |u http://creativecommons.org/licenses/by-nc-nd/4.0 |2 ethresearch | ||
| 700 | 1 | |a Li |D Yinan |e joint author | |
| 700 | 1 | |a Katsipoulakis |D Nikos |e joint author | |
| 700 | 1 | |a Chandramouli |D Badrish |e joint author | |
| 700 | 1 | |a Goldstein |D Jonathan |e joint author | |
| 700 | 1 | |a Kossman |D Donald |e joint author | |
| 773 | 0 | |t Proceedings of the VLDB Endowment |d Association for Computing Machinery |g pp. 1118-1129 | |
| 856 | 4 | 0 | |u http://hdl.handle.net/20.500.11850/234616 |q text/html |z WWW backlink to the repository (Open access) |
| 908 | |D 1 |a Conference Paper |2 ethresearch | ||
| 950 | |B ETHRESEARCH |P 856 |E 40 |u http://hdl.handle.net/20.500.11850/234616 |q text/html |z WWW backlink to the repository (Open access) | ||
| 950 | |B ETHRESEARCH |P 700 |E 1- |a Li |D Yinan |e joint author | ||
| 950 | |B ETHRESEARCH |P 700 |E 1- |a Katsipoulakis |D Nikos |e joint author | ||
| 950 | |B ETHRESEARCH |P 700 |E 1- |a Chandramouli |D Badrish |e joint author | ||
| 950 | |B ETHRESEARCH |P 700 |E 1- |a Goldstein |D Jonathan |e joint author | ||
| 950 | |B ETHRESEARCH |P 700 |E 1- |a Kossman |D Donald |e joint author | ||
| 950 | |B ETHRESEARCH |P 773 |E 0- |t Proceedings of the VLDB Endowment |d Association for Computing Machinery |g pp. 1118-1129 | ||
| 898 | |a BK010053 |b XK010053 |c XK010000 | ||
| 949 | |B ETHRESEARCH |F ETHRESEARCH |b ETHRESEARCH |j Conference Paper |c Open access | ||
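
The two-level approach summarized in the abstract (field 520) can be illustrated with a small sketch. This is not the paper's implementation: the function names (`bitmap`, `prefix_xor`, `top_level_colons`, `field_name_before`, `find_field`) are hypothetical, Python integers stand in for the processor-word bitmaps, escaped quotes inside strings are assumed not to occur, and only the top nesting level of an object is indexed. The lower level builds bitmaps of structural characters and masks out anything inside a string; the upper level tries a guessed field ordinal first and falls back to a scan when the guess is wrong.

```python
def bitmap(text: str, ch: str) -> int:
    """Bit i is set when text[i] == ch (bit 0 is the first character)."""
    bits = 0
    for i, c in enumerate(text):
        if c == ch:
            bits |= 1 << i
    return bits


def prefix_xor(bits: int, width: int) -> int:
    """Bit i of the result is the XOR of input bits 0..i."""
    shift = 1
    while shift < width:
        bits ^= bits << shift
        shift <<= 1
    return bits & ((1 << width) - 1)


def top_level_colons(text: str) -> list[int]:
    """Positions of the colons that separate top-level field names from values."""
    # Assumption: no escaped quotes, so quote parity marks string interiors.
    in_string = prefix_xor(bitmap(text, '"'), len(text))
    colons = bitmap(text, ":") & ~in_string
    opens = bitmap(text, "{") & ~in_string
    closes = bitmap(text, "}") & ~in_string
    positions, depth = [], 0
    structural = colons | opens | closes
    while structural:
        low = structural & -structural  # isolate the lowest set bit
        if low & opens:
            depth += 1
        elif low & closes:
            depth -= 1
        elif depth == 1:
            positions.append(low.bit_length() - 1)
        structural ^= low  # clear that bit and continue
    return positions


def field_name_before(text: str, colon_pos: int) -> str:
    """Return the quoted field name that ends just before a structural colon."""
    end = text.rindex('"', 0, colon_pos)
    start = text.rindex('"', 0, end)
    return text[start + 1:end]


def find_field(text: str, name: str, guess: int):
    """Try the guessed ordinal first; fall back to scanning the index."""
    colons = top_level_colons(text)
    if guess < len(colons) and field_name_before(text, colons[guess]) == name:
        return colons[guess]  # speculation succeeded
    for pos in colons:
        if field_name_before(text, pos) == name:
            return pos
    return None


record = '{"id": 7, "user": {"name": "a:b"}, "lang": "en"}'
names = [field_name_before(record, p) for p in top_level_colons(record)]
print(names)
print(find_field(record, "lang", 2))
```

In this sketch the colon inside the string `"a:b"` and the colon of the nested `name` field are both excluded from the top-level index, so a query for `lang` can jump to its colon without tokenizing the nested object.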