SYLLABUS

CSCI 515 Data Engineering and Management
School of Electrical Engineering and Computer Science, University of North Dakota
Spring 2024

Class times: 03:30 pm – 04:45 pm, TuTh
Classroom: Harrington Hall 324
Credit hours: 3
Prerequisite: CSCI 513 Advanced Database Systems
Class pages: http://undcemcs01.und.edu/~wen.chen.hu/course/515/
 
Instructor: Wen-Chen Hu   (my teaching philosophy)
Email: wenchen@cs.und.edu
Office: Upson II 366K
Office hours: 12:30 pm – 02:30 pm, TuTh
Zoom link: https://und.zoom.us/j/2489867333


Synchronous Class Delivery
The class lectures will be delivered synchronously via https://und.zoom.us/j/2489867333, and the Zoom recordings will be posted on Blackboard afterward. Students can watch the recordings at any time.

Lecture Notes
No textbook will be used. Instead, award-winning, detailed, and precise class instructions and interactive, informative, and practical lecture notes (based on books, papers, online documents, and user manuals) will be provided. Collectively, the lecture notes and instructions resemble a small book, which supplies much more information than regular notes do and makes the subject much easier to study. Students should have no problem learning the subjects or taking the exams after studying these materials and doing the programming exercises.

Course Description
This course studies theoretical and applied issues related to data engineering and mining. Data engineering identifies, investigates, and analyzes the underlying principles in the design and effective use of information systems; data mining discovers patterns in large data sets and transforms the patterns into a comprehensible structure for further applications. The following topics are covered:
  • Data crawling, collection, preparation, indexing, storage, searching, ranking, and mining,
  • Information retrieval,
  • Text analysis,
  • Database processing,
  • Database-driven web site construction,
  • Data processing and analysis,
  • Data classification and clustering,
  • Knowledge discovery,
  • Data visualization, sharing, and applications, and
  • Some other special topics.
Each student is required to build the following two systems:
  • a focused web search engine based on a data life cycle and
  • a data mining system using Firebase and TensorFlow.



Objectives
After taking this course, students should achieve goals including, but not limited to, the following:
  • Knowledge of data crawling, collection, and preparation,
  • Knowledge of data indexing, storage, searching, and ranking,
  • Knowledge of information retrieval,
  • Knowledge of data mining,
  • Knowledge of Firebase, Google's cloud-based, real-time NoSQL database,
  • Knowledge of Google TensorFlow for machine learning,
  • Knowledge of Google APIs for the Web, and
  • Proficiency in data analytics and processing.

Evaluations
    Two programming exercises:
      1. Data life cycle          ——  20%
      2. Data mining & analytics  ——  20%
    Two exams                     ——  20% each
    Final exam                    ——  20%

Tentative Schedule
    Week               1  ——  Introduction
    Weeks      2,  3,  4  ——  Programming Exercise I construction
    Weeks      5,  6,  7  ——  Information retrieval
    Weeks  8, 10, 11, 12  ——  Firebase and data analytics
    Weeks     13, 14, 15  ——  Data mining
    Weeks         16, 17  ——  Data mining and management concepts

Remark I
Terminology and definitions will be discussed minimally in this course. Instead, (i) effective methods and practical work will be emphasized and (ii) trends in data engineering and mining will be discussed.

Remark II
Unlike disciplines such as databases or the World Wide Web, data engineering and management (DEM) is one of the disciplines (like image processing or artificial intelligence) without a coherent set of methods or algorithms. DEM uses many methods (such as artificial neural networks or relevance feedback), and each method is usually not closely related to the others (such as decision trees or sequential pattern mining).

Remark III
To show what data engineering and management (DEM) is within one semester, this course investigates a small number of fundamental topics instead of many topics. Students can then use this training to choose appropriate methods for the problems they encounter in the future.

Remark IV
Data engineering and management (and information retrieval) is a mature subject. A wide variety of methods have been applied to it, and because of this maturity the current methods are rather complicated. In order to cover more topics, the methods introduced in this course are fundamental or primitive ones. Students learn how the DEM methods work and may try to enhance the methods or apply them in their programming exercises.

Remark V
DEM is a well-developed subject, and it is not easy to find a brand-new method. On the other hand, artificial intelligence (AI), data mining (DM), machine learning (ML), and information retrieval (IR) have plenty of methods available to be used or adapted. To take advantage of both, DEM borrows many methods from AI/DM/ML/IR. However, DEM is not the same as AI/DM/ML/IR because of the problem of data processing. That is, a data research topic may consist of two parts, DEM and AI/DM/ML/IR, and this course puts the emphasis on the former instead of the latter because DEM is more useful and practical.

Instructor’s Qualifications
The instructor’s current research interests include (mobile) data research and applications such as (mobile) data security & mining, and mobile/smartphone/spatial/web computing. He has applied various information retrieval methods (such as artificial neural networks, finite-state machines, and association-rule and sequential-pattern mining) to mobile applications and web searches. The instructor has authored more than 100 research publications and advised more than 50 graduate students. Most of the research topics are related to (mobile) data engineering, management, and mining.

Dishonesty
Under no circumstances will acts of academic dishonesty be tolerated. Any suspected incidents of dishonesty will be promptly referred to the Assistant Dean of Students. Refer to the Code of Student Life, Appendix B.2: Academic Dishonesty.

Disability
Students who need special accommodations for learning or who have special needs are invited to share these concerns or requests with the instructor as soon as possible.