- What is streaming data?
Published September 27, 2021
Overview
Streaming data is the continuous flow of real-time information, and the foundation of the event-driven architecture software model. Modern applications can use streaming data to enable data processing, storage, and analysis.
One way to think about streaming data is as a running log of changes or events that have occurred to a data set—often one changing at an extremely high velocity.
The large, fast-moving data sets that can serve as sources of streaming data are as varied as financial transactions, Internet of Things (IoT) sensor data, logistics operations, retail orders, and hospital patient monitoring. As a next generation of messaging, data streaming is suited to situations that demand real-time responses to events.
One example of streaming data is event data, which forms the foundation of event-driven architecture. Event-driven architecture brings together loosely coupled microservices as part of agile development.
Why is streaming data important?
Application users expect real-time digital experiences. Apps that can consume and process streaming data raise the level of performance and improve customer satisfaction.
Traditionally, applications that needed real-time responses to events relied on databases and message processing systems. Such systems cannot keep up with the torrent of data produced today. For example, traditional request-driven systems can struggle to react quickly to fast-moving data arriving from multiple sources.
With an event-streaming model, events are written to a log rather than stored to a database. Event consumers can read from any part of the stream and can join the stream at any time.
Event stream processing can be used to detect meaningful patterns in streams. Event stream processing uses a data streaming platform to ingest events and process or transform the event stream.
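One way to picture this model is the sketch below: a minimal, plain-Java illustration (with invented event names), not a real streaming platform. Events are appended to an immutable log, and each consumer keeps its own position, so a consumer can join at any time and read from any point in the stream.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the event-log idea: producers append events to an
// immutable log, and each consumer tracks its own offset into that log.
public class EventLogSketch {
    static final List<String> log = new ArrayList<>(); // append-only event log

    public static void main(String[] args) {
        // Producers append events; existing entries are never updated in place.
        log.add("order-created");
        log.add("payment-received");
        log.add("order-shipped");

        // Consumer A reads from the beginning of the stream.
        for (int offset = 0; offset < log.size(); offset++) {
            System.out.println("consumer-A read: " + log.get(offset));
        }

        // Consumer B joins late and starts at the latest event;
        // each consumer advances its own offset independently.
        int offsetB = log.size() - 1;
        System.out.println("consumer-B read: " + log.get(offsetB));
    }
}
```

Because the log itself is never modified, any number of consumers can read the same events at their own pace, which is what distinguishes this model from a database row that is overwritten in place.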
What are some common use cases for data streams?
When you think of streaming data, think of real-time applications. Some common use cases include:
- Digital experiences that rely on immediate access to information.
- Microservices applications that support agile software development.
- Streaming scenarios that modernize database-driven applications previously limited to batch processing.
- Real-time analytics, especially those that ingest data from multiple sources.
- Edge computing that brings together data from diverse and disparate devices and systems.
Apps built around messaging, geolocation, stock trades, fraud detection, inventory management, marketing analytics, IT systems monitoring, and industrial IoT data are some popular use cases for data streams.
How does Apache Kafka work with streaming data?
Apache Kafka is an open-source distributed messaging platform that has become one of the most popular ways to work with large quantities of streaming, real-time data.
Software developers use Kafka to build data pipelines and streaming applications. With Kafka, applications can:
- Publish and subscribe to streams of records.
- Store streams of records.
- Process records as they occur.
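As a rough illustration of the publish step, the sketch below uses Kafka's Java client library (kafka-clients). The broker address, topic name, key, and payload are placeholder values assumed for the example, not anything prescribed by this article.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one event to the hypothetical "orders" topic.
            // The record key determines which partition the event lands on,
            // so events with the same key stay in order.
            producer.send(new ProducerRecord<>("orders", "order-1001", "{\"status\":\"created\"}"));
        }
    }
}
```

Keying records this way is how Kafka preserves per-key ordering while still spreading load across partitions.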
Kafka is designed to manage streaming data while being fast, horizontally scalable, and fault-tolerant. Because Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can reduce latency to milliseconds. This means data is available to users faster, which is advantageous in use cases that require real-time data availability, such as IT operations and e-commerce.
Apache Kafka can handle millions of data points per second, which makes it well suited to big data challenges. In many data processing use cases, such as IoT and social media, data grows exponentially and can quickly overwhelm an application designed for today's data volume.
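On the consuming side, a minimal sketch with the same client library polls the hypothetical "orders" topic in a loop and handles each record as it arrives; the group ID is likewise an illustrative value.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "order-processors");        // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                // Poll for new records and process each one as it occurs.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```

Consumers in the same group share the topic's partitions, so adding more consumer instances is one way such an application scales horizontally with growing data volume.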
What are some of the challenges of data streaming?
By definition, data streams must deliver sequenced information in real time. Streaming data applications depend on streams that are consistent and highly available, even during times of high activity. Delivering or consuming a data stream with these qualities can be challenging.
The amount of raw data in a stream can surge rapidly. Consider the sudden exponential growth of new data created by stock trades during a market selloff, social media posts during a big sporting event, or log activity during a system failure. Data streams must be scalable by design. Even during times of high activity, they need to prioritize proper data sequencing, data consistency, and availability. Streams also must be designed for durability in the event of a partial system failure.
Across a distributed hybrid cloud environment, a streaming data cluster demands special considerations. Typical streaming data brokers are stateful, and their state must be preserved in the event of a restart. Scaling requires careful orchestration to make sure messaging services behave as expected and no records are lost.
Why use a streaming data service?
The challenge of delivering a complex, real-time, highly available streaming data platform can consume significant resources. It often requires expertise and hardware beyond the capabilities of an in-house IT organization.
For these reasons, many streaming data users opt for a managed cloud service, in which infrastructure and system management is offloaded to a service provider. This option helps organizations focus on their core competencies, rather than management and administration of a complex streaming data solution.
More about integration
Products
A comprehensive set of integration and runtime technologies engineered to help build, deploy, and operate applications with security in mind and at scale across the hybrid cloud.
Hosted and managed platform, application, and data services that streamline the hybrid cloud experience, reducing the operational cost and complexity of delivering cloud-native applications.
A set of products, tools, and components for developing and maintaining cloud-native applications. Includes Red Hat AMQ, Red Hat Data Grid, Red Hat JBoss® Enterprise Application Platform, Red Hat JBoss Web Server, a Red Hat build of OpenJDK, a Red Hat build of Quarkus, a set of cloud-native runtimes, Migration Toolkit for Applications, single sign-on, and a launcher service.
A comprehensive set of integration and messaging technologies to connect applications and data across hybrid infrastructures. Includes Red Hat 3scale API Management, Red Hat AMQ, Red Hat Runtimes, change data capture, and a service registry.
Related articles
- What is integration?
- What is a service registry?
- What is an event mesh?
- What is Apache Kafka?
- What is change data capture (CDC)?
- What is event-driven architecture?
- REST vs. SOAP
- Why choose Red Hat for agile integration?
- Why run Apache Kafka on Kubernetes?
- What is an IDE?
- Understanding middleware
- What is middleware?
- What is application integration?
- Why choose Red Hat for middleware?
- Understanding APIs
- API security
- What is an API?
- What does an API gateway do?
- What is a REST API?
- What is API design?
- What is API management?
- What is API monetization?
- What is GraphQL?
- Why choose Red Hat for API management?
- What is streaming data?
Resources
E-book: Create an agile infrastructure—and enable an adaptive organization
Keep exploring
Analyst material: Key questions to ask when modernizing integration capabilities
Training: Red Hat Agile Integration Technical Overview (free training course)