<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DataIngestion Archives - Napa Analytics</title>
	<atom:link href="https://napaanalytics.com/datapipeline/dataengg/dataingestion/feed/" rel="self" type="application/rss+xml" />
	<link>https://napaanalytics.com/datapipeline/dataengg/dataingestion/</link>
	<description>Cloud Data Engineering Talent On Demand</description>
	<lastBuildDate>Tue, 23 Nov 2021 12:50:06 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>Data Ingestion Methods</title>
		<link>https://napaanalytics.com/dataengg/dataingestion/</link>
					<comments>https://napaanalytics.com/dataengg/dataingestion/#respond</comments>
		
		<dc:creator><![CDATA[napalytics]]></dc:creator>
		<pubDate>Tue, 23 Nov 2021 12:50:04 +0000</pubDate>
				<category><![CDATA[DataIngestion]]></category>
		<guid isPermaLink="false">https://napaanalytics.com/?p=943</guid>

					<description><![CDATA[<p>A Data Pipeline is a set of steps that Extract, Load and Transform data for consumption by the end user. As part of the blog series on Data Pipelines, we spoke about Data Ingestion and the different open-source and commercial players. In this blog we will talk about different data ingestion methods The need for [&#8230;]</p>
<p>The post <a href="https://napaanalytics.com/dataengg/dataingestion/">Data Ingestion Methods</a> appeared first on <a href="https://napaanalytics.com">Napa Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>A Data Pipeline is a set of steps that Extract, Load, and Transform data for consumption by the end user. As part of the blog series on <a href="https://napaanalytics.com/datapipeline/dataengg/dataingestion/">Data Pipelines</a>, we spoke about Data Ingestion and the different open-source and commercial players. In this blog, we will talk about the different data ingestion methods.</p>



<h3 class="wp-block-heading">The need for Data Ingestion types</h3>



<p>In an earlier blog, we spoke about the different <a href="https://napaanalytics.com/datapipeline/dataengg/datapipelinetypes/">Data Pipeline types</a> and how the need for data defined the data pipeline. The approaches to data ingestion we are about to explore result from how quickly the end user needs the data to be available for consumption. The Data Ingestion methods are:</p>



<ol><li>Batch</li><li>Real-time</li><li>Lambda Architecture</li></ol>



<h3 class="wp-block-heading">Batch Data Ingestion method</h3>



<p>As the name suggests, the data is extracted from the source and moved to the destination in batches at a specified time. The ingestion process can run once a day or multiple times a day, at predetermined times. This is the most commonly used ingestion method.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="1026" height="582" src="https://napaanalytics.com/wp-content/uploads/2021/11/image-4.png" alt="" class="wp-image-944" srcset="https://napaanalytics.com/wp-content/uploads/2021/11/image-4.png 1026w, https://napaanalytics.com/wp-content/uploads/2021/11/image-4-300x170.png 300w, https://napaanalytics.com/wp-content/uploads/2021/11/image-4-768x436.png 768w, https://napaanalytics.com/wp-content/uploads/2021/11/image-4-705x400.png 705w" sizes="(max-width: 1026px) 100vw, 1026px" /></figure></div>
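
<p>The batch method described above can be sketched in a few lines of Python. This is a minimal illustration, not a production job: the <code>run_batch</code> function, the CSV export, and the in-memory <code>warehouse</code> list are all hypothetical stand-ins, and the sketch assumes a scheduler such as cron calls the job at the predetermined time.</p>

```python
import csv
import io


def run_batch(source_file, destination):
    """Read every record that accumulated in the source and load it in one pass."""
    rows = list(csv.DictReader(source_file))
    destination.extend(rows)
    return len(rows)


# Hypothetical nightly export from a source system (assumed CSV layout).
export = io.StringIO("order_id,amount\n1,10.00\n2,25.50\n")

# Hypothetical destination; a real pipeline would write to a data lake or warehouse.
warehouse = []

loaded = run_batch(export, warehouse)
print(loaded)
```

<p>Nothing reaches the destination between runs, which is the trade-off of the batch method: the job is simple, but the data is only as fresh as the last scheduled run.</p>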



<h3 class="wp-block-heading">Real-time Data Ingestion Methods</h3>



<p>Data ingestion in real-time, also known as streaming ingestion, is the continuous ingestion of data from a streaming source. A streaming source can be social media feeds, social listening data, or data from IoT devices. In this method, data is retrieved as it is generated and is then stored in the data lake.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="833" height="523" src="https://napaanalytics.com/wp-content/uploads/2021/11/image-5.png" alt="" class="wp-image-945" srcset="https://napaanalytics.com/wp-content/uploads/2021/11/image-5.png 833w, https://napaanalytics.com/wp-content/uploads/2021/11/image-5-300x188.png 300w, https://napaanalytics.com/wp-content/uploads/2021/11/image-5-768x482.png 768w, https://napaanalytics.com/wp-content/uploads/2021/11/image-5-705x443.png 705w" sizes="(max-width: 833px) 100vw, 833px" /></figure></div>
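
<p>As a rough sketch of streaming ingestion, the Python example below handles each event as soon as it is read instead of waiting for a scheduled run. The function names, the sample IoT readings, and the list used as a data lake are assumptions made for illustration; a real source would typically be a message queue or a device gateway.</p>

```python
import json
from typing import Iterable, Iterator


def sensor_stream(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Yield each event as soon as it arrives instead of collecting a full batch."""
    for line in raw_lines:
        yield json.loads(line)


def ingest(events: Iterator[dict], data_lake: list) -> None:
    """Store every event in the data lake the moment it is read."""
    for event in events:
        data_lake.append(event)


# Hypothetical IoT readings standing in for a live streaming source.
incoming = [
    '{"device": "thermostat-1", "temp_c": 21.5}',
    '{"device": "thermostat-2", "temp_c": 19.0}',
]

data_lake = []
ingest(sensor_stream(incoming), data_lake)
print(len(data_lake))
```

<p>Because <code>sensor_stream</code> is a generator, each event flows to storage individually, which mirrors the way a streaming pipeline processes data while it is still being generated.</p>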



<h3 class="wp-block-heading">Lambda architecture-based Data Ingestion Method</h3>



<p>Lambda architecture is a data ingestion setup that combines the real-time and batch methods. It consists of three layers: batch, serving, and speed. The first two layers index data in batches, while the speed layer indexes the most recent data as it arrives so that it is immediately available for consumption. Together, the layers ensure that data is available for consumption with low latency.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="1030" height="559" src="https://napaanalytics.com/wp-content/uploads/2021/11/image-6-1030x559.png" alt="" class="wp-image-946" srcset="https://napaanalytics.com/wp-content/uploads/2021/11/image-6-1030x559.png 1030w, https://napaanalytics.com/wp-content/uploads/2021/11/image-6-300x163.png 300w, https://napaanalytics.com/wp-content/uploads/2021/11/image-6-768x417.png 768w, https://napaanalytics.com/wp-content/uploads/2021/11/image-6-705x383.png 705w, https://napaanalytics.com/wp-content/uploads/2021/11/image-6.png 1033w" sizes="(max-width: 1030px) 100vw, 1030px" /></figure></div>
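
<p>The interplay of the three layers can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the <code>LambdaStore</code> class, its method names, and the page-view counting use case are hypothetical, and a real deployment would run each layer on separate systems.</p>

```python
from collections import Counter


class LambdaStore:
    """Toy model of the batch, serving, and speed layers (hypothetical names)."""

    def __init__(self):
        self.master = []             # batch layer: every event ever received
        self.batch_view = Counter()  # serving layer: view precomputed by the batch job
        self.speed_view = Counter()  # speed layer: events seen since the last batch job

    def ingest(self, page):
        # Each event is kept for the next batch run and indexed at once by the speed layer.
        self.master.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # The batch job recomputes the view from all data, after which the speed view resets.
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, page):
        # A query merges the precomputed batch view with the recent speed view.
        return self.batch_view[page] + self.speed_view[page]


store = LambdaStore()
for page in ["home", "home", "pricing"]:
    store.ingest(page)
store.run_batch()
store.ingest("home")  # arrives after the batch run, so only the speed layer has it
print(store.query("home"))
```

<p>The query still counts the event that arrived after the batch run, which is how the speed layer keeps latency low while the batch layer remains the complete record.</p>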






<h3 class="wp-block-heading">Summary</h3>



<p>Data Ingestion is the first step of the ELT process and has different methods of extracting data from the sources. The data consumption needs of the data users define the data ingestion method as either batch, real-time, or lambda architecture. In the next blog of the Data Pipeline series, we will talk about the Data Storage layer.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://napaanalytics.com/dataengg/dataingestion/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Data Ingestion &#8211; Extraction part of a Data Pipeline</title>
		<link>https://napaanalytics.com/dataengg/dataingestion/</link>
					<comments>https://napaanalytics.com/dataengg/dataingestion/#respond</comments>
		
		<dc:creator><![CDATA[napalytics]]></dc:creator>
		<pubDate>Fri, 12 Nov 2021 11:25:40 +0000</pubDate>
				<category><![CDATA[DataIngestion]]></category>
		<guid isPermaLink="false">https://napaanalytics.com/?p=938</guid>

					<description><![CDATA[<p>In the previous blog series, we defined data pipelines, the types of data pipelines, and the data pipeline components. We identified the three main pieces of a data pipeline: Extract, Load, and Transform. In this blog, we focus on &#8220;Extract&#8221;, also referred to as Data Ingestion. Data Ingestion Data Ingestion is the movement of data [&#8230;]</p>
<p>The post <a href="https://napaanalytics.com/dataengg/dataingestion/">Data Ingestion &#8211; Extraction part of a Data Pipeline</a> appeared first on <a href="https://napaanalytics.com">Napa Analytics</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In the previous blog series, we defined <a href="https://napaanalytics.com/datapipeline/dataengg/datapipeline-basics/">data pipelines</a>, <a href="https://napaanalytics.com/datapipeline/dataengg/datapipelinetypes/">the types of data pipelines</a>, and <a href="https://napaanalytics.com/datapipeline/dataengg/components/">the data pipeline components</a>. We identified the three main pieces of a data pipeline: Extract, Load, and Transform. In this blog, we focus on &#8220;Extract&#8221;, also referred to as Data Ingestion.</p>



<h3 class="wp-block-heading">Data Ingestion</h3>



<p>Data Ingestion is the movement of data from different data sources to a storage destination for further processing/analysis.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="660" height="352" src="https://napaanalytics.com/wp-content/uploads/2021/11/image-2.png" alt="" class="wp-image-939" srcset="https://napaanalytics.com/wp-content/uploads/2021/11/image-2.png 660w, https://napaanalytics.com/wp-content/uploads/2021/11/image-2-300x160.png 300w" sizes="(max-width: 660px) 100vw, 660px" /></figure></div>



<p>In the past, most data sources were structured, so data ingestion was a simple matter of connecting through JDBC/ODBC and extracting the data. With the increase in the number and variety of data sources, data ingestion has become complex. Fortunately, there are many open-source and commercial tools that take away the complexity and make it easier to extract data from a wide variety of data sources.</p>
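
<p>To illustrate the connect-and-extract pattern for structured sources, here is a minimal Python sketch. It uses the standard DB-API through the built-in <code>sqlite3</code> module as a stand-in for a JDBC/ODBC connection; the <code>customers</code> table, its columns, and the <code>extract_customers</code> helper are assumptions made for the example.</p>

```python
import sqlite3


def extract_customers(connection, chunk_size=2):
    """Run one query and pull the result set in fixed-size chunks."""
    cursor = connection.execute("SELECT id, name FROM customers ORDER BY id")
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


# Hypothetical structured source: an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)

rows = list(extract_customers(connection))
print(rows)
```

<p>Fetching in chunks keeps memory use bounded when the source table is large, which is one of the details that ingestion tools usually handle on the user's behalf.</p>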



<h3 class="wp-block-heading">Data Ingestion tools</h3>



<p>Data Ingestion tools are software products that gather and transfer structured, semi-structured, and unstructured data from the source to a staging layer. The tools provide connectivity to diverse data sources, automate data movement, and monitor the movement. There are two categories of Data Ingestion tools:</p>



<ol><li>Open-source tools &#8211; Apache HDFS (Hadoop Distributed File System) is an open-source tool for storing large amounts of data. With its increased use, a multitude of Apache open-source projects have emerged for ingesting, loading, and transforming data into HDFS and cloud storage. These tools are free to use and have a large community of developers who add to and support the feature sets.</li><li>Commercial tools &#8211; These come from software companies that have been in the data space and have evolved to provide the connectivity, security, UI/UX, and ease of use needed to ingest data from different data sources.</li></ol>



<p>The figure below shows a subset of open-source and commercial solutions for data ingestion.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="682" height="361" src="https://napaanalytics.com/wp-content/uploads/2021/11/image-3.png" alt="" class="wp-image-940" srcset="https://napaanalytics.com/wp-content/uploads/2021/11/image-3.png 682w, https://napaanalytics.com/wp-content/uploads/2021/11/image-3-300x159.png 300w" sizes="(max-width: 682px) 100vw, 682px" /></figure></div>



<h3 class="wp-block-heading">Prevalence of Open-source tools </h3>



<p>Most ELT data pipelines use open-source tools for data ingestion. Open-source ingestion tools first appeared to support Apache Hadoop: Sqoop was added to extract data from structured sources such as relational databases, and Flume to collect log and event data. More connectors were added as the number and variety of data sources increased. In addition, the need for real-time data led to open-source tools such as Apache Kafka, Apache Samza, and Apache NiFi.</p>



<h3 class="wp-block-heading">Commercial tools usage</h3>



<p>These tools cater to users who do not have the depth of technical expertise that is required by the open-source tools. These tools provide drag-and-drop functionality and the risk management that is needed by large firms.</p>



<h3 class="wp-block-heading">Which is better: open-source or commercial tools?</h3>



<p>Commercial tools, with their functionality, UI/UX, and support, are welcome in most major organizations. However, open-source tools are catching up in functionality and UI/UX. Organizations are noticing the improvements in open-source tools and their broad community support, and development teams are moving more towards open-source.</p>






<h3 class="wp-block-heading">Summary</h3>



<p>Data Ingestion takes data from different data sources and loads it into a staging layer. Both open-source and commercial tools are available for data ingestion. Even though commercial tools provide support and ease of use, open-source tools are catching up and are becoming important players in Data Ingestion.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://napaanalytics.com/dataengg/dataingestion/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
