Fast Data Ingestion, ML Equates to Smarter Decisions Faster
March 27, 2018 | Written by: Steven Astorino
Categorized: Analytics | Databases | IoT | Machine Learning
Human beings tend to filter out events they deem unimportant to avoid sensory overload. They can only process so much at any given time.
Computer systems, however, must handle a massive number of digital “events” – everything from changes in a car’s engine to millions of retail transactions – in real time or near-real time to support a wide and growing range of applications and customized services. Financial applications must monitor events to help counter fraud and provide insights to investors. Retail apps must use online shopping events to capitalize on growth opportunities and ensure effective supply-chain management. And manufacturing production lines and energy systems must use event data from Internet of Things (IoT) devices to automate faster, smarter responses to changes that demand attention. These are just a few examples where event processing can be applied to great effect.
Enter a New Kind of Data Store
With the high velocity, volume and variety of data that digital events can generate, today’s “event data store” must deliver fast data ingest rates, in-memory indexing for fast and efficient lookups, near-real-time analytics on all ingested data with online analytical processing (OLAP), and much more. That’s exactly what IBM’s Db2 Event Store is built to provide.
The solution has three critical components: ingest, analytics and availability.
- Ingest Car sensors, home appliances, credit card purchases, mobile transactions and aircraft flight systems generate high volumes of events, many at high velocity, to support automated or augmented systems. This demands efficient ingestion of vast amounts of data. IBM Db2 Event Store achieves ingest rates on the order of a million inserts per second per node[1], consuming data through messaging and streaming systems such as Kafka, Spark, IBM Streams, and other vendors’ streaming solutions (a minimal ingest sketch follows this list). Support for industry standards and open APIs such as Scala and Python is necessary to make use of existing skill sets and to democratize event streaming and processing. Furthermore, Db2 Event Store writes its data in the open Apache Parquet format, allowing it to interact natively with open source tooling.
- Analytics With Db2 Event Store, users can access the latest, ungroomed data because queries run directly against the optimized event nodes and their associated cached data in the cluster. Even data that is only minutes old and already hardened to storage can be queried from “vanilla” Spark nodes using compatible analytics tools of the user’s choice (see the query at the end of the sketch following this list). Ultimately, queries retrieve the most recent data and combine it with groomed data, whether in cache or in the storage layer.
- Availability High availability is vital for all the data an event store processes and stores; technology often fails when users can least afford it. With IBM Db2 Event Store, data, along with all associated log data, is replicated across nodes for redundancy. Should a node fail, queries can continue to be processed because the configured number of query replicas remains reachable (a conceptual failover sketch also follows below). In addition, Db2 Event Store provides sophisticated management and monitoring capabilities that help provide insight into the health of the system.
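To make the ingest and analytics paths concrete, here is a minimal, hypothetical sketch. It uses plain Apache Spark Structured Streaming with a Kafka source and the open Parquet format that Db2 Event Store writes, rather than the Db2 Event Store client API itself; the broker address, topic name, schema and paths are illustrative assumptions.

```python
# Hypothetical sketch of the ingest and analytics flow using plain Spark,
# Kafka and Parquet (all named above); not the Db2 Event Store client API.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("event-ingest-sketch").getOrCreate()

# Assumed schema for sensor events arriving as JSON on a Kafka topic.
event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("reading", DoubleType()),
])

# Ingest path: stream events from Kafka and land them as open Parquet files.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "sensor-events")              # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)
ingest = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/events")               # assumed landing path
    .option("checkpointLocation", "/data/chkpt")  # needed by streaming sinks
    .start()
)

# Analytics path: once data has landed, any "vanilla" Spark session can query
# the same Parquet files, combining minutes-old events with groomed history.
spark.read.parquet("/data/events").createOrReplaceTempView("events")
spark.sql("""
    SELECT device_id, avg(reading) AS avg_reading
    FROM events
    WHERE ts > current_timestamp() - INTERVAL 5 MINUTES
    GROUP BY device_id
""").show()
```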
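The failover behavior described under Availability can also be illustrated generically. This is a conceptual sketch, not product code: it simply routes a query to any reachable replica so processing continues after a node failure.

```python
# Conceptual sketch (not product code): keep serving queries by trying
# healthy replicas in turn, failing only when every replica is down.
import random

def query_with_failover(replicas, run_query):
    """Try replicas in random order; raise only if all are unreachable."""
    for node in random.sample(replicas, len(replicas)):
        try:
            return run_query(node)
        except ConnectionError:
            continue  # this node failed; fall through to the next replica
    raise RuntimeError("no healthy replica available")
```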
Event-driven AI with machine learning
Consider the impact of AI, and machine learning in particular, on event processing. Machine learning thrives on large quantities of data, and IBM Db2 Event Store can capture, analyze and store more than 250 billion events per day. This enables users to apply machine learning to the most recent data alongside the historical data.
In addition, thanks to its tightly coupled, built-in machine learning capabilities, IBM Db2 Event Store helps automate the entire end-to-end pipeline, from ingest through data preparation and real-time analytics to the apply stage, providing the most up-to-date, insightful data for applications to build on (a minimal scoring sketch follows below).
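As a minimal illustration of that apply stage, the sketch below trains a simple fraud-scoring model with Spark MLlib on historical events and applies it to the freshest data. The paths, feature columns and label are invented for the example and are not part of the product.

```python
# Hypothetical sketch of the ingest -> prepare -> analyze -> apply pipeline:
# fit a model on historical events, then score the most recent ones.
# Paths, column names and the fraud label are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("event-scoring-sketch").getOrCreate()

# Historical, groomed events carrying a known fraud label (assumed layout).
history = spark.read.parquet("/data/events/history")

pipeline = Pipeline(stages=[
    VectorAssembler(
        inputCols=["amount", "merchant_risk", "velocity_1h"],  # assumed features
        outputCol="features",
    ),
    LogisticRegression(labelCol="is_fraud", featuresCol="features"),
])
model = pipeline.fit(history)

# Apply stage: score the most recent, still-ungroomed events as they land.
recent = spark.read.parquet("/data/events/recent")
model.transform(recent).select("event_id", "prediction", "probability").show()
```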
Each time an event occurs, the system “learns,” processing and reacting rapidly as events happen. Event processing coupled with machine learning helps applications become “aware” of what is happening, to the point that they could potentially predict when similar events might recur, helping protect against impending fraud or disaster by correlating previous events with their outcomes.
There are many scenarios where this linkage between event processing and machine learning can be applied, faster and more often than human beings could ever manage. Having an intelligent, “aware” event-driven system like Db2 Event Store augmenting one’s own capabilities is like having a personal, trusted adviser or assistant.
How to get started
These combined capabilities add up to a tall order for an event streaming and processing solution. Fortunately, IBM has infused its event processing capabilities with advanced machine learning technologies to deliver IBM Db2 Event Store, which is available in a developer edition and an enterprise edition. The technology is also a core element of the newly announced IBM Cloud Private for Data, an engineered solution for data science, data engineering and application building, with no assembly required. Anyone, including aspiring data scientists, can find relevant data, do ad hoc analysis, build models and deploy them into production within a single integrated experience.
With Db2 Event Store, users can start with the developer edition on a single node, or with the enterprise edition for pre-production use and testing, at no cost.
In this high-speed data world of IoT, mobile, social and more, users will continue to demand more sophisticated and integrated capabilities from their databases, such as fast data ingest, machine learning and analytics. IBM Db2 Event Store is poised to play a pivotal role in this movement, providing users with faster insights for faster, more accurate business decisions.
[1] Based on an internal 5-hour test in February 2018, producing over 3 million inserts per second across three nodes (roughly one million inserts per second per node). Each of the three servers had: CPU: quad Intel Xeon E7-4850 v2 (48 physical cores, clock speed 2.30 GHz); memory: 256 GB RAM; network adapter: 10 Gbps; compute storage: five 1.7 TB SSD drives (3 DWPD); storage: five 8.00 TB SATA HDD drives; OS: RHEL 7.4.
__________________________________________
A version of this story first appeared on IBM Big Data & Analytics Hub.
__________________________________________
Follow Steven on Twitter @astorino_steven
Vice President, Development, Hybrid Cloud and z Analytics, IBM