Provide a detailed summary of the following web content, including what type of content it is (e.g. news article, essay, technical report, blog post, product documentation, content marketing, etc). If the content looks like an error message, respond 'content unavailable'. If there is anything controversial please highlight the controversy. If there is something surprising, unique, or clever, please highlight that as well:

Title: The Design of Postgres [pdf]
Site: dsf.berkeley.edu

THE DESIGN OF POSTGRES

Michael Stonebraker and Lawrence A. Rowe
Department of Electrical Engineering and Computer Sciences
University of California
Berkeley, CA 94720

Abstract

This paper presents the preliminary design of a new database management system, called POSTGRES, that is the successor to the INGRES relational database system. The main design goals of the new system are to:

1) provide better support for complex objects,
2) provide user extendibility for data types, operators and access methods,
3) provide facilities for active databases (i.e., alerters and triggers) and inferencing including forward- and backward-chaining,
4) simplify the DBMS code for crash recovery,
5) produce a design that can take advantage of optical disks, workstations composed of multiple tightly-coupled processors, and custom designed VLSI chips, and
6) make as few changes as possible (preferably none) to the relational model.

The paper describes the query language, programming language interface, system architecture, query processing strategy, and storage system for the new system.

1. INTRODUCTION

The INGRES relational database management system (DBMS) was implemented during 1975-1977 at the University of California. Since 1978 various prototype extensions have been made to support distributed databases [STON83a], ordered relations [STON83b], abstract data types [STON83c], and QUEL as a data type [STON84a]. In addition, we proposed but never prototyped a new application program interface [STON84b].
The University of California version of INGRES has been “hacked up enough” to make the inclusion of substantial new function extremely difficult. Another problem with continuing to extend the existing system is that many of our proposed ideas would be difficult to integrate into that system because of earlier design decisions. Consequently, we are building a new database system, called POSTGRES (POST inGRES).

This paper describes the design rationale, the features of POSTGRES, and our proposed implementation for the system. The next section discusses the design goals for the system. Sections 3 and 4 present the query language and programming language interface, respectively, to the system. Section 5 describes the system architecture including the process structure, query processing strategies, and storage system.

2. DISCUSSION OF DESIGN GOALS

The relational data model has proven very successful at solving most business data processing problems. Many commercial systems are being marketed that are based on the relational model and in time these systems will replace older technology DBMS’s. However, there are many engineering applications (e.g., CAD systems, programming environments, geographic data, and graphics) for which a conventional relational system is not suitable. We have embarked on the design and implementation of a new generation of DBMS’s, based on the relational model, that will provide the facilities required by these applications. This section describes the major design goals for this new system.

The first goal is to support complex objects [LORI83, STON83c]. Engineering data, in contrast to business data, is more complex and dynamic. Although the required data types can be simulated on a relational system, the performance of the applications is unacceptable. Consider the following simple example. The objective is to store a collection of geographic objects in a database (e.g., polygons, lines, and circles).
In a conventional relational DBMS, a relation for each type of object with appropriate fields would be created:

    POLYGON (id, other fields)
    CIRCLE (id, other fields)
    LINE (id, other fields)

To display these objects on the screen would require additional information that represented display characteristics for each object (e.g., color, position, scaling factor, etc.). Because this information is the same for all objects, it can be stored in a single relation:

    DISPLAY (color, position, scaling, obj-type, object-id)

The “object-id” field is the identifier of a tuple in a relation identified by the “obj-type” field (i.e., POLYGON, CIRCLE, or LINE). Given this representation, the following commands would have to be executed to produce a display:

    foreach OBJ in {POLYGON, CIRCLE, LINE} do
        range of O is OBJ
        range of D is DISPLAY
        retrieve (D.all, O.all)
            where D.object-id = O.id
            and D.obj-type = OBJ

Unfortunately, this collection of commands will not be executed fast enough by any relational system to “paint the screen” in real time (i.e., one or two seconds). The problem is that regardless of how fast your DBMS is, there are too many queries that have to be executed to fetch the data for the object. The feature that is needed is the ability to store the object in a field in DISPLAY so that only one query is required to fetch it. Consequently, our first goal is to correct this deficiency.

The second goal for POSTGRES is to make it easier to extend the DBMS so that it can be used in new application domains. A conventional DBMS has a small set of built-in data types and access methods. Many applications require specialized data types (e.g., geometric data types for CAD/CAM or a latitude and longitude position data type for mapping applications). While these data types can be simulated on the built-in data types, the resulting queries are verbose and confusing and the performance can be poor. A simple example using boxes is presented elsewhere [STON86].
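To give a rough sense of that verbosity, the sketch below contrasts a box-overlap test written against four plain numeric fields with the same test written against a dedicated box type. It is an illustration only and is not the example from [STON86]; the names `overlaps_flat`, `Box`, and `Box.overlaps` are hypothetical, boxes that merely touch are assumed to count as overlapping, and Python stands in here for a query language.

```python
from dataclasses import dataclass


def overlaps_flat(ax0, ay0, ax1, ay1, bx0, by0, bx1, by1):
    """Overlap test when each box is simulated with four numeric fields.

    (x0, y0) is the lower-left corner and (x1, y1) the upper-right corner.
    Every query that needs the test must repeat this four-way comparison.
    Boxes that only touch along an edge are treated as overlapping.
    """
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


@dataclass(frozen=True)
class Box:
    """Hypothetical user-defined box type with its own overlap operator."""
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        # Same geometry as overlaps_flat, written once inside the type.
        return (self.x0 <= other.x1 and other.x0 <= self.x1
                and self.y0 <= other.y1 and other.y0 <= self.y1)


unit = Box(0, 0, 1, 1)
shifted = Box(0.5, 0.5, 2, 2)
far = Box(3, 3, 4, 4)
```

With the typed form, a qualification reduces to a single call such as `unit.overlaps(shifted)`, whereas the flat form has to pass eight separate fields on every use.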
Such applications would be best served by the ability to add new data types and new operators to a DBMS. Moreover, B-trees are only appropriate for certain kinds of data, and new access methods are often required for some data types. For example, K-D-B trees [ROBI81] and R-trees [GUTM84] are appropriate access methods for point and polygon data, respectively.

Consequently, our second goal is to allow new data types, new operators and new access methods to be included in the DBMS. Moreover, it is crucial that they be implementable by non-experts, which means easy-to-use interfaces should be preserved for any code that will be written by a user. Other researchers are pursuing a similar goal [DEWI85].

The third goal for POSTGRES is to support active databases and rules. Many applications are most easily programmed using alerters and triggers. For example, form-flow applications such as a bug reporting system require active forms that are passed from one user to another [TSIC82, ROWE82]. In a bug report application, the manager of the program maintenance group should be notified if a high priority bug that has been assigned to a programmer has not been fixed by a specified date. A database alerter is needed that will send a message to the manager calling his attention to the problem. Triggers can be used to propagate updates in the database to maintain consistency. For example, deleting a department tuple in the DEPT relation might trigger an update to delete all employees in that department in the EMP relation.

In addition, many expert system applications operate on data that is more easily described as rules rather than as data values.
For example, the teaching load of professors in the EECS department can be described by the following rules:

1) The normal load is 8 contact hours per year
2) The scheduling officer gets a 25 percent reduction
3) The chairman does not have to teach
4) Faculty on research leave receive a reduction proportional to their leave fraction
5) Courses with fewer than 10 students generate credit at 0.1 contact hours per student
6) Courses with more than 50 students generate EXTRA contact hours at a rate of 0.01 per student in excess of 50
7) Faculty can have a credit balance or a deficit of up to 2 contact hours

These rules are subject to frequent change. The leave status, course assignments, and administrative assignments (e.g., chairman and scheduling officer) all change frequently. It would be most natural to store the above rules in a DBMS and then infer the actual teaching load of individual faculty rather than storing teaching load as ordinary data and then attempting to enforce the above rules by a collection of complex integrity constraints. Consequently, our third goal is to support alerters, triggers, and general rule processing.

The fourth goal for POSTGRES is to reduce the amount of code in the DBMS written to support crash recovery. Most DBMS’s have a large amount of crash recovery code that is tricky to write, full of special cases, and very difficult to test and debug. Because one of our goals is to allow user-defined access methods, it is imperative that the model for crash recovery be as simple as possible and easily extendible. Our proposed approach is to treat the log as normal data managed by the DBMS, which will simplify the recovery code and simultaneously provide support for access to the historical data.

Our next goal is to make use of new technologies whenever possible. Optical disks (even writable optical disks) are becoming available in the commercial marketplace.
Although they have slower access characteristics, their price-performance and reliability may prove attractive. A system design that includes optical disks in the storage hierarchy will have an advantage. Another technology that we foresee is workstation-sized processors with several CPU’s. We want to design POSTGRES in such a way as to take advantage of these CPU resources. Lastly, a design that could utilize special purpose hardware effectively might make a convincing case for designing and implementing custom designed VLSI chips. Our fifth goal, then, is to investigate a design that can effectively utilize an optical disk, several tightly coupled processors and custom designed VLSI chips.

The last goal for POSTGRES is to make as few changes to the relational model as possible. First, many users in the business data processing world will become familiar with relational concepts and this framework should be preserved if possible. Second, we believe the original “spartan simplicity” argument made by Codd [CODD70] is as true today as in 1970. Lastly, there are many semantic data models but there does not appear to be a small model that will solve everyone’s problem. For example, a generalization hierarchy will not solve the problem of structuring CAD data and the design models developed by the CAD community will not handle generalization hierarchies. Rather than building a system that is based on a large, complex data model, we believe a new system should be built on a small, simple model that is extendible. We believe that we can accomplish our goals while preserving the relational model. Other researchers are striving for similar goals but they are using different approaches [AFSA85, ATKI84, COPE84, DERR85, LORI83, LUM85].

The remainder of the paper describes the design of POSTGRES and the basic system architecture we propose to use to implement the system.

3. POSTQUEL

This section describes the query language supported by POSTGRES.
The relational model as described in the original definition by Codd [CODD70] has been preserved. A da