CS 471000 Introduction to Database Systems

Implementation, architectural design, and trade-offs.

Description

This course provides an overview of current cloud database management systems and explains how they differ from traditional database systems. The goal is to familiarize students with well-known implementations such as NoSQL databases, Google BigTable, Google MegaStore, and Google Spanner, and, more importantly, to help students make better design trade-off decisions when configuring or building their own database systems for a particular set of target applications (tenants).

A solid understanding of Java, object-oriented programming, and data structures is required.

Syllabus

Instructor

Teaching Assistants

Pinkie Chen
陳玠霖

Wei-Hung Chang
張維紘

Bo-Cheng Yang
楊博丞

Yen-Ting Wang
王彥婷


Contact TAs: dbta2024@datalab.cs.nthu.edu.tw
DB Team

Time & Location

  • Lecture: Mon. 15:30-17:20 at Delta 103
  • Lab: Thur. 14:20-15:10 at Delta 103
  • Office hour: Thur. 13:20-14:10 at Delta 729

Grading Policy

  • Assignments (x5): 50%
  • Quiz: 15%
  • Midterm exam: 15%
  • Final project: 20%

Prerequisites

This course is intended for senior undergraduate and junior graduate students who understand

  • Object-oriented programming,
  • Multi-threaded programming,
  • Data structures, and
  • Version control.
We use Java as the main programming language throughout the course. Although not required, background knowledge of operating systems will be helpful.

Announcement

Curriculum

If you have any feedback, feel free to contact: shwu@cs.nthu.edu.tw

Lecture 00

Introduction

What Is a Database? | About This Course... | FAQ

Slides

Lecture 01

Using a DBMS

Main Features of a DBMS | Data Models

Slides

Lab1 Using PostgreSQL

Using PostgreSQL

Slides

Lab1 Java Concurrency

Java Concurrency

Slides

Lecture 02

Data Modeling

ER & Relational Models | Weak Entities | Functional Dependencies | Normal Forms

Video Slides

Lab2 General Rules For Assignments

This lab introduces the assignment platform used in this course and the general rules for assignments.

Slides

Lab2 Introduction to Git

This lab guides you through the main idea of version control systems and the basic usage of Git.

Slides GitLab

Lab2 Using VanillaDB

Basic tutorial of VanillaDB

Slides GitLab

Assignment 1

Design an ER model for a given scenario.

GitLab

Lecture 03

Architecture and Interfaces

Architecture Overview | SQL | JDBC | Native Interface | RecordFile | MetaData

Video Slides

Lab3-1 Introducing Benchmark Project

This lab guides you through the Benchmark Project.

Slides

Lecture 04

Server and Threads

Introduction | Threads & Processes | Supporting Concurrent Clients | Embedded Clients vs. Remote Clients | RMI | JDBC Implementation

Video Slides

Lab3-2 VanillaCore Walkthrough Part 1

This lab guides you through VanillaCore (Server, Remote Access, Utilities).

Slides

Assignment 2

Implement the JDBC version and the stored-procedure version of the read/write transaction in VanillaDB.

GitLab

Assignment 2 Solution

Explain Assignment 2

Slides

Lecture 05

Query Processing

Overview | Parsing & Verification | Parser & SQL Data | Predicates | Scans | Plans | Assignment

Video Slides

Lab4 VanillaCore Walkthrough Part 2

This lab guides you through VanillaCore (Query).

Slides

Assignment 3

Implement the EXPLAIN SQL operation in VanillaDB.

GitLab

Lecture 06

Data Access and File Management

Storage Engine | I/O Interfaces | Compromised File Management | Implementation: Page & FileMgr

Video Slides

VanillaCore Walkthrough Part 3

This lab guides you through VanillaCore (File Access).

Slides

Lecture 07

Memory Management

Buffer Pools & Pinning | Buffer Replacement | Pool Size & Deadlock Handling | ACID & Logging | Caching Logs

Video Slides
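
To make the buffer pool topics above concrete, here is a minimal sketch of pinning combined with least-recently-used replacement. It is an illustration only and not VanillaCore's buffer manager: the class `TinyBufferPool`, its methods, and the use of integer block IDs are all assumptions made for this example. Pinned frames are never evicted, and a pin request fails when every frame is pinned, which is the situation a real system has to handle by waiting or timing out.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a buffer pool with pinning and LRU replacement. */
final class TinyBufferPool {
    /** A cached block; only the pin count matters for this sketch. */
    private static final class Frame {
        int pins;
    }

    private final int capacity;
    // An access-ordered LinkedHashMap iterates from least to most recently touched.
    private final LinkedHashMap<Integer, Frame> frames =
            new LinkedHashMap<>(16, 0.75f, true);

    TinyBufferPool(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Pins a block, evicting an unpinned victim if the pool is full.
     * Returns false when every frame is pinned and nothing can be evicted.
     */
    synchronized boolean pin(int blockId) {
        Frame frame = frames.get(blockId); // get() marks the block as recently touched
        if (frame == null) {
            if (frames.size() >= capacity && !evictOldestUnpinned()) {
                return false;
            }
            frame = new Frame();
            frames.put(blockId, frame);
        }
        frame.pins++;
        return true;
    }

    /** Releases one pin; the block stays cached until it is chosen as a victim. */
    synchronized void unpin(int blockId) {
        Frame frame = frames.get(blockId); // also counts as a touch in this sketch
        if (frame != null && frame.pins > 0) {
            frame.pins--;
        }
    }

    /** Reports whether a block is cached, without changing the LRU order. */
    synchronized boolean isCached(int blockId) {
        return frames.containsKey(blockId);
    }

    // Scans from the least recently touched frame and removes the first unpinned one.
    private boolean evictOldestUnpinned() {
        Iterator<Map.Entry<Integer, Frame>> it = frames.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().pins == 0) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

With a capacity of two, pinning blocks 1 and 2, unpinning block 1, and then pinning block 3 evicts block 1, because block 2 is still pinned even though it was touched less recently.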

VanillaCore Walkthrough Part 4

This lab guides you through VanillaCore (Memory Management).

Slides

Assignment 4

Optimize file and buffer modules of VanillaCore.

Slides GitLab

Assignment 4 Solution

Explain Assignment 4

Slides

Lecture 08

Record Management

Record Management | Design Considerations for Record Manager | The VanillaCore Record Manager

Video Slides

VanillaCore Walkthrough Part 5

This lab guides you through VanillaCore (Record).

Slides

Lecture 09

Transaction Management Part I: Concurrency Control

Overview of Transaction API | Schedules | Anomalies | 2PL & S2PL | Deadlock Handling | Assignment | Multi-Granularity Locking | Phantoms | Isolation Levels | Meta Structure | CC in VanillaCore

Video Slides

VanillaCore Walkthrough Part 6

This lab guides you through VanillaCore (Lock).

Slides

Assignment 5

Implement the conservative locking protocol.

GitLab
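
As a rough illustration of the idea behind conservative locking, the sketch below acquires every lock a transaction declares before running any of its logic, and releases them all at the end. It is not the assignment's required design or VanillaCore's lock manager: the class `ConservativeLocking`, the method `runTx`, the string record keys, and the choice of exclusive-only locks acquired in sorted order are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of conservative locking with a pre-declared key set. */
final class ConservativeLocking {
    // One exclusive lock per record key (a simplification: no shared locks).
    private final Map<String, ReentrantLock> lockTable = new ConcurrentHashMap<>();

    /** Locks every declared key up front, runs the body, then releases all locks. */
    void runTx(Collection<String> declaredKeys, Runnable body) {
        // Sorting gives all transactions one global acquisition order,
        // so a cycle of waiting transactions cannot form.
        TreeSet<String> ordered = new TreeSet<>(declaredKeys);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String key : ordered) {
                ReentrantLock lock = lockTable.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            body.run(); // the transaction logic starts only after all locks are held
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConservativeLocking mgr = new ConservativeLocking();
        int[] balance = {100, 100}; // balance[0] is record "A", balance[1] is record "B"
        // The two transactions declare the same keys in opposite orders.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                mgr.runTx(List.of("A", "B"), () -> { balance[0]--; balance[1]++; });
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                mgr.runTx(List.of("B", "A"), () -> { balance[1]--; balance[0]++; });
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(balance[0] + " " + balance[1]);
    }
}
```

The trade-off of this approach is that a transaction must know its full read/write set in advance, which is why it pairs naturally with stored procedures whose parameters determine the records they touch.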

Assignment 5 Solution

Explain Assignment 5

Slides

Lecture 10

Transaction Management Part II: Recovery

Recap: WAL | Physical Logging & Tx Rollbacks | Undo Recovery | Undo-Redo Recovery | Repeated Failures | Checkpointing

Video Slides

Lecture 11

Query Optimization

Cost Estimation | Histogram

Video Slides

Lecture 12

AI & Vector DBMS

Vector DBMS | PASE | Milvus

Slides

Final Project

Final Project

Introduction to Final Project

Slides

Final Project

Implement a vector DBMS.

GitLab
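
For orientation, the core query a vector DBMS answers is nearest-neighbor search. The sketch below shows the brute-force baseline that index structures aim to beat: it scans every stored vector and returns the positions of the k closest ones by Euclidean distance. It is not the project's specification or any library's API; the class `BruteForceVectorSearch` and its method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of exact k-nearest-neighbor search by full scan. */
final class BruteForceVectorSearch {
    /** Squared Euclidean distance; it ranks vectors the same way as the true distance. */
    static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the positions of the k stored vectors closest to the query, nearest first. */
    static List<Integer> nearest(List<float[]> vectors, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        // Sorting all candidates costs O(n log n); a bounded heap would avoid that.
        ids.sort(Comparator.comparingDouble(
                (Integer i) -> squaredDistance(vectors.get(i), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        List<float[]> vectors = List.of(
                new float[] {0f, 0f},
                new float[] {1f, 1f},
                new float[] {5f, 5f});
        System.out.println(nearest(vectors, new float[] {0.9f, 0.9f}, 2));
    }
}
```

A full scan gives exact answers but its cost grows linearly with the number of stored vectors, which motivates the approximate indexes covered in Lecture 12.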

Lecture 13

Cloud Database

Introducing SAE for cloud DBMS | Data Partitioning for Scalability | Replication for Availability | Case Study

Slides

Lecture 14

Trade-Offs and NoSQL

SAE revisited | Non-relational partitioned DDBMS | Non-relational replicated DDBMS | Elasticity in non-relational DDBMS

Slides

Resources

The following links point to some useful online resources. If this course starts your DB journey, don't stop here. Enroll in the advanced courses (shown below) to learn more.

Other Course Materials

For more course materials (such as assignments and score sheets) and the online forum, please refer to GitLab.

GitLab

Documentation

Describes the VanillaDB API operations in detail.

Java Doc

Reference Books

  • Raghu Ramakrishnan et al., Database Management Systems, 3rd Edition, McGraw-Hill, 2002, ISBN: 0072465638

  • Abraham Silberschatz et al., Database System Concepts, 6th Edition, McGraw-Hill, 2010, ISBN: 0073523321

  • Edward Sciore, Database Design and Implementation, Wiley, 2008, ISBN: 0471757160

  • M. Tamer Özsu, Principles of Distributed Database Systems, 3rd Edition, Springer, 2011, ISBN: 1441988335

  • Rachid Guerraoui et al., Introduction to Reliable Distributed Programming, Springer, 2006, ISBN: 3540288457