<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>datapipelinetypes Archives - Napa Analytics</title>
	<atom:link href="https://napaanalytics.com/datapipeline/dataengg/datapipelinetypes/feed/" rel="self" type="application/rss+xml" />
	<link>https://napaanalytics.com/datapipeline/dataengg/datapipelinetypes/</link>
	<description>Cloud Data Engineering Talent On Demand</description>
	<lastBuildDate>Wed, 27 Oct 2021 23:30:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>What are different types of Data Pipelines</title>
		<link>https://napaanalytics.com/dataengg/datapipelinetypes/</link>
					<comments>https://napaanalytics.com/dataengg/datapipelinetypes/#respond</comments>
		
		<dc:creator><![CDATA[napalytics]]></dc:creator>
		<pubDate>Thu, 21 Oct 2021 00:03:56 +0000</pubDate>
				<category><![CDATA[datapipelinetypes]]></category>
		<guid isPermaLink="false">https://napaanalytics.com/?p=925</guid>

					<description><![CDATA[<p>As part of the Data Pipeline series, in parts one and two we talked about what a data pipeline is and its components. This third part deals with the types of data pipelines. The type of data pipeline is driven by how fresh the data needs to be. The main types are traditional (batch) and real-time. [&#8230;]</p>
<p>The post <a href="https://napaanalytics.com/dataengg/datapipelinetypes/">What are different types of Data Pipelines</a> appeared first on <a href="https://napaanalytics.com">Napa Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>As part of the Data Pipeline series, in parts one and two we discussed what a data pipeline is and the components of a data pipeline. This third part deals with the types of data pipelines.</p>



<p>The type of data pipeline you need is driven by how fresh the data must be. The main types are traditional (batch), real-time, and near real-time. The type of pipeline also determines its architecture and the underlying technology.</p>



<h3 class="wp-block-heading">Traditional (Batch) data pipeline</h3>



<p>Traditionally, data is consumed for business intelligence and data analytics. The metrics used in business intelligence reports and analytics rely on historical data, or data generated a few hours earlier, so the pipeline serving these consumers is a batch data pipeline. In a batch process, data is periodically collected, loaded, and transformed at a scheduled time &#8211; once or more than once a day. The architecture and the technologies used for this kind of pipeline therefore need to:</p>



<ol type="1"><li>Process large amounts of data</li><li>Run batch jobs when there is little activity on the source system</li><li>Handle failures flexibly, with options to rerun based on the failure type and the time available</li></ol>
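<p>The failure-handling requirement can be sketched as a minimal batch job runner. Everything here is illustrative: the <code>run_batch_job</code> helper, its extract/transform/load callables, and the retry policy are assumptions for this sketch, not a specific product's API.</p>

```python
import time

def run_batch_job(extract, transform, load, max_retries=2):
    """Run one batch cycle: extract, transform, load.

    Transient failures are retried up to max_retries times,
    reflecting the 'handle failures flexibly' requirement above.
    """
    for attempt in range(max_retries + 1):
        try:
            raw = extract()            # pull a large chunk of source data
            rows = transform(raw)      # clean / aggregate the whole chunk
            load(rows)                 # write the results to the warehouse
            return {"status": "ok", "attempts": attempt + 1, "rows": len(rows)}
        except IOError:
            if attempt == max_retries:
                return {"status": "failed", "attempts": attempt + 1}
            time.sleep(0)  # back off before rerunning (kept at 0 for the sketch)
```

<p>A scheduler (cron, Airflow, or similar) would invoke a job like this once or more per day, typically during a quiet window on the source system.</p>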



<h3 class="wp-block-heading">Traditional Data Pipeline use case</h3>



<p>A large retailer (with both an online and a brick-and-mortar presence) runs its infrastructure on AWS and uses Snowflake as its centralized data warehouse. The warehouse receives data from various systems, including transactional data from the online store, the legacy POS system in the physical stores, and web clicks from the website.</p>



<p>The data pipeline that caters to the web analytics team is as follows:</p>



<ol type="1"><li>Data from all the sources is extracted into staging tables in Snowflake</li><li>Data from the staging tables is loaded into the Snowflake data warehouse or into specific data marts that provide end-user behavior analytics and the features that describe that behavior</li><li>The aggregated data feeds the analytics sent to the web marketing team.</li></ol>
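<p>Step 2 above can be illustrated with a small in-memory version of the staging-to-mart load. In practice this step would be SQL running inside Snowflake; the row shape, column names, and metrics here are hypothetical.</p>

```python
from collections import defaultdict

def load_staging_to_mart(staging_rows):
    """Aggregate raw click/purchase rows from a staging table into a
    per-user behavior mart, mimicking step 2 of the pipeline above."""
    mart = defaultdict(lambda: {"page_views": 0, "purchases": 0})
    for row in staging_rows:
        user = mart[row["user_id"]]
        if row["event"] == "click":
            user["page_views"] += 1
        elif row["event"] == "purchase":
            user["purchases"] += 1
    return dict(mart)
```

<p>The resulting per-user features are what the downstream behavior analytics would consume.</p>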



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img fetchpriority="high" decoding="async" src="https://napaanalytics.com/wp-content/uploads/2021/10/image-15.png" alt="" class="wp-image-926" width="694" height="358" srcset="https://napaanalytics.com/wp-content/uploads/2021/10/image-15.png 694w, https://napaanalytics.com/wp-content/uploads/2021/10/image-15-300x155.png 300w" sizes="(max-width: 694px) 100vw, 694px" /><figcaption><strong>Batch Processing Data Pipeline</strong></figcaption></figure></div>



<h3 class="wp-block-heading"><strong>Real-time analytics</strong></h3>



<p>Data pipelines supporting real-time analytics deliver the data and the corresponding analytics as the data is generated. Working with data as a continuous flow is called stream processing: ingesting data and calculating the metrics and analytics on every piece of data as it arrives. Real-time data pipelines are mainly used where large volumes of sensor data must be analyzed to understand operations and proactively identify potential failures.</p>
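<p>The defining trait of stream processing, computing a metric incrementally on every event rather than recomputing over a stored batch, can be shown with a minimal sketch. The running-average metric here is just one example of a per-event calculation.</p>

```python
class RunningAverage:
    """Per-event metric for stream processing: the average is updated
    incrementally as each reading arrives, so no batch of historical
    readings needs to be stored or rescanned."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean update: mean += (x - mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean
```

<p>A stream processor would keep one such state object per sensor (or per key) and call <code>update</code> on every incoming reading.</p>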



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="652" height="337" src="https://napaanalytics.com/wp-content/uploads/2021/10/image-16.png" alt="" class="wp-image-927" srcset="https://napaanalytics.com/wp-content/uploads/2021/10/image-16.png 652w, https://napaanalytics.com/wp-content/uploads/2021/10/image-16-300x155.png 300w" sizes="(max-width: 652px) 100vw, 652px" /><figcaption><strong>Real-time data pipelines</strong></figcaption></figure></div>



<h3 class="wp-block-heading"><strong>Real-time analytics use case</strong></h3>



<p>A large steel manufacturing company reduced equipment downtime by actively analyzing sensor&nbsp;data from its machinery. At Napa Analytics, we used the following data pipeline architecture to achieve results for our client:</p>



<ol type="1"><li>Text data is ingested from all the machines using Kafka</li><li>Data from Kafka is fed into Apache Spark for calculations and analytics</li><li>Results from Apache Spark are stored in a database</li><li>Messages based on thresholds are sent to a distribution list of engineers.</li></ol>
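<p>Step 4 above can be sketched as a threshold check over the computed sensor metrics. The sensor names, limits, and message format are made up for illustration; they are not the client's actual configuration.</p>

```python
def check_thresholds(readings, thresholds):
    """Compare each sensor metric against its configured threshold and
    produce the alert messages that would go to the engineers'
    distribution list (step 4 of the pipeline above, in miniature)."""
    alerts = []
    for sensor, value in readings.items():
        limit = thresholds.get(sensor)
        if limit is not None and value > limit:
            alerts.append(f"ALERT {sensor}: {value} exceeds {limit}")
    return alerts
```
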



<h3 class="wp-block-heading"><strong>Near real-time analytics</strong></h3>



<p>Real-time analytics is not always possible, and sometimes a compromise must be made. That compromise is what we call near real-time analytics: providing data to consumers with a time lag of 5&#8211;10 minutes. The pipeline structure is similar to the traditional (batch) data pipeline, but run on a much shorter cycle.</p>
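<p>The scheduling pattern behind near real-time pipelines is micro-batching: grouping incoming events into short fixed windows and processing each window as a small batch shortly after it closes. A minimal sketch, assuming events arrive as (timestamp, payload) tuples (an illustrative shape, not a specific framework's API):</p>

```python
def micro_batches(events, window_seconds=300):
    """Group timestamped events into fixed windows (5 minutes by
    default), the cadence that gives near real-time pipelines their
    5-10 minute lag."""
    batches = {}
    for ts, payload in events:
        # Align each event to the start of its window
        window_start = ts - (ts % window_seconds)
        batches.setdefault(window_start, []).append(payload)
    return [batches[k] for k in sorted(batches)]
```

<p>Each returned batch would then flow through the same extract-transform-load steps as a traditional pipeline, just far more often.</p>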



<h3 class="wp-block-heading"><strong>Near real-time use case</strong></h3>



<p>A large medical insurance provider needs to look at medical claims as they enter the system. At Napa Analytics, we used the following data pipeline architecture to achieve results for our client:</p>



<ol type="1"><li>Claims data from the mainframe is read into Hadoop using Apache Flume</li><li>The data is loaded into Apache Kafka</li><li>The metrics and analytics are computed with Apache Spark</li><li>The output is stored in Apache Kudu</li><li>The tables in Apache Kudu feed the MicroStrategy reports</li></ol>



<p>From the three types of data pipelines we have examined, it is evident that data freshness is one of the deciding factors in choosing a pipeline. If your organization needs help selecting the data pipeline best suited to your needs, reach out to <a href="https://napaanalytics.com/contact/">Contact &#8211; Napa Analytics</a>.</p>



<hr class="wp-block-separator"/>
<p>The post <a href="https://napaanalytics.com/dataengg/datapipelinetypes/">What are different types of Data Pipelines</a> appeared first on <a href="https://napaanalytics.com">Napa Analytics</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://napaanalytics.com/dataengg/datapipelinetypes/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
