Please provide a short (approximately 100-word) summary of the following web Content, written in the voice of the original author. If there is anything controversial, please highlight the controversy. If there is something surprising, unique, or clever, please highlight that as well.

Content:
Title: Nobody cares about our concurrency control research [pdf]
Site: www.cs.cmu.edu

WHAT ARE WE DOING WITH OUR LIVES?
Nobody Cares About Our Concurrency Control Research
SIGMOD 2017
@andy_pavlo

I am only allowed 3 plugs in this talk.

DISK-ORIENTED CONCURRENCY CONTROL
Allows a DBMS to mask the latency of non-volatile storage. Pioneering work on transaction processing from the 1970s. Jim Gray The Great Phil Bernstein

IN-MEMORY CONCURRENCY CONTROL
New concurrency control schemes are needed if the database is assumed to be in memory. Early research in 1980s. Some commercial DBMSs from 1990s.

21ST CENTURY RESEARCH ON IN-MEMORY CONCURRENCY CONTROL
Partitioned Protocols
→ H-Store (VLDB 2007)
Non-Partitioned Protocols
→ Microsoft Hekaton (VLDB 2011)
→ Silo (SOSP 2013)

NOBODY CARES ABOUT OUR CONCURRENCY CONTROL RESEARCH
All of this research is great for “classic” OLTP applications. We are not addressing the needs of new fields and environments.
Peter Bailis examined real-world DB applications. Few of them use txns and many of them don’t use them correctly.
We did an automated evaluation with the CMDBAC corpus. Few apps written in popular frameworks use txns. 1

COMMON ASSUMPTIONS MADE IN CONCURRENCY CONTROL RESEARCH
Assumption #1: All transactions execute as stored procedures.
Assumption #2: All transactions execute with serializable isolation.

CONFERENCE PAPER SURVEY
Examined SIGMOD and VLDB publications from 2011-2016.
We found 95 out of 1843 (5%) papers on transaction processing and concurrency control.

DATABASE ADMIN SURVEY OVERVIEW
We commissioned a survey of DBAs in April 2017 on how applications use databases. 50 responses for 79 DBMS installations. +Nine others

DATABASE ADMIN SURVEY STORED PROCEDURES
What percentage of the transactions run on your DBMS are executed as stored procedures?
[Figure: bar chart, # of Responses by category (None, 1-10%, 11-25%, 26-50%, 51-75%, 76-100%); bar labels in extraction order: 21, 20, 11, 9, 12, 4]

DATABASE ADMIN SURVEY ISOLATION LEVEL
What isolation level do transactions execute at on this DBMS?
[Figure: grouped bar chart, # of Responses (None / Few / Most / All) by isolation level: Read Uncommitted, Read Committed, Cursor Stability, Repeatable Read, Snapshot Isolation, Serializable]

DATABASE ADMIN SURVEY FEEDBACK
Stored Procedures
→ Software engineering challenges.
→ Don’t want devs to update too often.
Serializable Isolation
→ It was always done this way.
→ Not worth the overhead.

WHAT DOES THIS MEAN FOR OUR RESEARCH?
Assuming that every txn executes as a stored procedure with serializable isolation changes the bottleneck. You end up optimizing things that are not as important as you think… Aren’t I being hypocritical?

A RESEARCH AGENDA FOR THE NEXT 10 YEARS
→ Examine Entire DBMS Architecture
→ Communication Overhead
→ Understand Lower Isolation Levels

IN-MEMORY MULTI-VERSION CONCURRENCY CONTROL STUDY
The DBMS’s concurrency control protocol is not the only critical part of executing txns in a DBMS.
→ Secondary Indexes
→ Version Storage / Ordering
→ Garbage Collection

[Figure: Hybrid Workload, TPC-C + OLAP Query (40wh); Throughput (Ktxn/sec) vs. # of Threads (2 to 40)]
AN EMPIRICAL EVALUATION OF IN-MEMORY MULTI-VERSION CONCURRENCY CONTROL VLDB 2017
MVCC Configurations: Oracle/MySQL, NuoDB, HYRISE, MemSQL, HyPer, SAP HANA, Hekaton, Postgres
2.5

RE-EXAMINE DBMS COMMUNICATION OVERHEAD
Most applications are in the same data center as the DBMS machine.
Kernel bypass methods:
→ RDMA
→ Intel DPDK
Prefetching with machine learning.

UNDERSTAND LOWER ISOLATION LEVELS
We don’t understand how applications are affected by lower isolation levels. Maybe READ COMMITTED is good enough or maybe people don’t know how dirty their data actually is…

WHAT ARE WE DOING WITH OUR LIVES?

CONCLUSION
It is (still) an interesting time for database research. Let’s make sure we work on the right problems. We need a better way of collecting information about applications.

SOME PEOPLE DO CARE ABOUT OUR CONCURRENCY CONTROL RESEARCH
Serializable Snapshot Isolation: Michael Cahill
Deterministic Concurrency Control: Dan Abadi

END
@andy_pavlo
Joy Arulraj Winter 2018 3