An Introduction to This Course (Cont.)


Data Life Cycle
Data is everywhere and has become a precious commodity for companies. Raw data, however, is rarely useful as is. The data life cycle is the sequence of stages that a unit of data goes through, from its initial generation or capture to its eventual application at the end of its useful life:

  1. Problem specification: The same data set may be useful for one application but useless for another. The target data therefore needs to be specified precisely before any data is collected.

  2. Data collection: The process of gathering and measuring information on targeted variables in an established system, which makes it possible to answer relevant questions and evaluate outcomes. In this phase, data enters an organization, usually through data entry, acquisition from an external source, or signal reception (such as transmitted sensor data).

  3. Data preparation: Raw data has to be cleaned and transformed before it can be processed and analyzed. Preparation often involves reformatting data, correcting errors, and combining data sets to enrich the data.

  4. Data indexing and storage: When data is saved in storage, index data structures (such as B-trees or hash tables) are built to speed up data retrieval operations. An index makes it possible to locate data quickly without searching every data item in storage.
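
As a rough illustration of steps 3 and 4 above, the following Python sketch cleans a few raw records, enriches them from a second data set, and then builds a simple hash index over the result. Everything in it is an assumption made for illustration: the sample sensor readings, the field names, and the helper functions prepare, build_index, and lookup are hypothetical and do not come from the course material. A Python dict stands in for the hash table; a real storage system would typically keep its index (often a B-tree) on disk.

```python
from datetime import date

# Hypothetical raw sensor readings, as they might arrive after collection.
raw_readings = [
    {"sensor_id": " S1 ", "day": "2024-01-05", "temp_c": "21.5"},
    {"sensor_id": "s2", "day": "2024-01-05", "temp_c": "n/a"},  # unusable value
    {"sensor_id": "S1", "day": "2024-01-06", "temp_c": "22.0"},
    {"sensor_id": "S3", "day": "2024-01-06", "temp_c": "19.0"},
]

# Hypothetical second data set used to enrich the readings.
sensor_locations = {"S1": "roof", "S3": "basement"}


def prepare(readings, locations):
    """Step 3: clean, reformat, and enrich raw readings."""
    prepared = []
    for row in readings:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # correction: drop rows whose temperature cannot be parsed
        sensor_id = row["sensor_id"].strip().upper()  # normalize the identifier
        prepared.append({
            "sensor_id": sensor_id,
            "day": date.fromisoformat(row["day"]),  # reformat text into a date
            "temp_c": temp,
            "location": locations.get(sensor_id, "unknown"),  # combine data sets
        })
    return prepared


def build_index(records, key):
    """Step 4: map each value of `key` to the positions of matching records."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[key], []).append(position)
    return index


def lookup(records, index, value):
    """Fetch matching records through the index, without scanning every record."""
    return [records[i] for i in index.get(value, [])]


records = prepare(raw_readings, sensor_locations)
by_sensor = build_index(records, "sensor_id")
s1_rows = lookup(records, by_sensor, "S1")
```

The index costs extra memory and has to be kept up to date whenever records are added or removed, which is the usual trade-off for faster retrieval.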