Data exists in one of three states: Data at Rest, Data in Motion, and Data in Use. Understanding these three states lays the foundation for extracting value from data to support business operations. This blog introduces the concepts and lays a basis for the upcoming data engineering blogs in the series.
Transactional systems such as point of sale (POS) and enterprise resource planning (ERP) systems generate data and store it in a database or on a mainframe. Website clicks and social media activity are other sources of data. Using these two kinds of data as examples, we can define the three states of data.
Data at Rest
Data at rest is data that a transactional system has generated and stored. Data analytics and business intelligence teams use data at rest to extract value. Data at rest resides on hard drives in the company’s network or in cloud storage, so it needs to be protected by security policies. Best practice favors encrypting data at rest and disabling access from external devices such as USB sticks or external hard drives. For a long time, data at rest has been the primary source of business intelligence. Even today, most data engineering tasks still use data at rest. One reason for its prevalence is that older systems still provide value. Another is that transactional systems account for 60% of the data sources for analytics, business intelligence, and algorithms.
Data in Motion and Data in Use
When transactions happen in real time, they generate Data in Motion. The value of data in motion has a limited shelf life. To ensure end users get that value while it lasts, data analytics teams need to make the data available for consumption quickly. For example, a social media mention may hold its value for only about a day. It follows that if end users are to get any value from that data, data engineers need to extract and deliver it in less than 24 hours.
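To make the shelf-life idea concrete, here is a minimal sketch of a freshness check. It assumes each event carries a timezone-aware timestamp, and it uses the 24-hour window from the example above. The names `SHELF_LIFE`, `still_valuable`, and `fresh_events`, as well as the sample mentions, are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed shelf life for a social media mention (the one-day figure from the text).
SHELF_LIFE = timedelta(hours=24)


def still_valuable(event_time, now, shelf_life=SHELF_LIFE):
    """Return True if the event is recent enough to still be worth delivering."""
    age = now - event_time
    # Events from the future (negative age) are treated as invalid.
    return timedelta(0) <= age < shelf_life


def fresh_events(events, now, shelf_life=SHELF_LIFE):
    """Keep only the events whose age falls within the shelf life."""
    return [e for e in events if still_valuable(e["created_at"], now, shelf_life)]


# Hypothetical sample data: one fresh mention and one stale mention.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
mentions = [
    {"id": 1, "text": "Loving the new release", "created_at": now - timedelta(hours=3)},
    {"id": 2, "text": "Is the site down?", "created_at": now - timedelta(hours=30)},
]

fresh = fresh_events(mentions, now)
print([m["id"] for m in fresh])  # [1]
```

In a real pipeline the window would depend on the data source, and stale events would more likely be routed to storage as data at rest than simply dropped.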
Once data is processed and available for consumption, it gives rise to Data in Use. As the name suggests, data in use is not static or passive. Instead, it is actively moving through an IT system. Examples of data in use include data being processed in a CPU, in a database, or in RAM.
Now that we have identified the different data states, we will talk about extracting the data from the sources in the next set of blogs. We will also discuss how we load it into a storage layer and transform the data for consumption.