Data transformation: The missing ingredient from edge computing strategies
It’s easy to get excited about the Things in the Internet of Things (IoT). Tiny computers, factory robots, devices with 3D cameras, devices with GPS units -- everything from telemetry units on long-haul freight trucks to gumball-size sensors reporting from atop the cooling towers of power plants. The world of network-connected devices, whether specialized or general purpose, is more varied and creative than ever.
The innovation taking place with this hardware is truly impressive. But to take full advantage of the hardware, it’s important to pay just as much attention to the software. Edge computing is going to quintuple the amount of data enterprises collect from devices in just the next three years. How to manage that data and make the most of it -- well, that’s a pretty edgy job.
Data Management and IoT
The success of many organizations’ IoT initiatives will hinge on data, data integration and data management. Fail to architect data integrations and transformations properly, and IoT systems will have trouble scaling and delivering the profound business transformations their vendors have promised. Mishandle the data of the thousands of IoT devices in use, and the business will underperform.
Here’s an example to illustrate my point.
Let’s say you’re using IoT devices to monitor production equipment on a factory floor and have thousands of these devices active across the entire assembly line. If any of these devices report an error, you need to analyze the error right away. Worst case, you might want to stop production until the error is corrected to prevent damage to the material in production, or even to the factory equipment. Finally, let’s imagine that these thousands of devices are reporting status every 30 seconds.
You have a couple of choices here:
- You can have this multitude of devices each deliver their status messages to a cloud application used for trouble-ticketing across the organization. That means that twice a minute, thousands of messages will be sent to the cloud, and this cloud application, regardless of whatever else it is doing, will analyze the messages and attempt to respond in a timely manner. The response might entail sending an alert to a factory floor manager or could entail sending a command to a factory automation application.
- Alternatively, you can have the thousands of devices deliver their status messages to an edge gateway device installed on premises. The gateway runs a lightweight trouble-ticketing application. If the status messages are fine, the application does nothing. If the status messages indicate a problem, the gateway takes action. It could alert a local factory automation application. It could also, or instead, open a trouble ticket with a cloud application (see the sketch after this list).
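To make the second option concrete, here is a minimal sketch of the kind of filter a gateway could run. The message fields (device_id, status, detail) and the alert_local_automation and open_cloud_ticket helpers are hypothetical placeholders, not any particular vendor's API:

```python
# Minimal sketch of an edge-gateway filter. Field names and the two
# helper functions are illustrative assumptions, not a specific product's API.
import json

def alert_local_automation(msg):
    """Hypothetical hook into an on-premises factory automation application."""
    print(f"ALERT (local): device {msg['device_id']} reported {msg['detail']}")

def open_cloud_ticket(msg):
    """Hypothetical hook that opens a trouble ticket in a cloud application."""
    print(f"TICKET (cloud): device {msg['device_id']} -> {msg['detail']}")

def handle_status_message(raw: bytes) -> None:
    """Run on the gateway for every 30-second status report."""
    msg = json.loads(raw)
    if msg.get("status") == "OK":
        return                      # healthy report: nothing leaves the plant
    alert_local_automation(msg)     # act immediately at the edge
    open_cloud_ticket(msg)          # and record the problem centrally

# Example: one healthy report and one failing report
handle_status_message(b'{"device_id": "press-0042", "status": "OK"}')
handle_status_message(
    b'{"device_id": "press-0042", "status": "ERROR", "detail": "spindle overheating"}'
)
```

In this arrangement only the exceptions ever cross the WAN; the routine status-OK traffic stays inside the factory network.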
From the perspective of IT performance, the second scenario makes a lot more sense (even though the first is the one being championed by some major cloud software vendors). There’s nothing to be gained by sending every status-OK message to the cloud; they simply consume bandwidth. The second scenario dramatically reduces network traffic and handles everything else at the network edge.
Data Transformation
Now let’s make our example a little more complicated -- and here I’m going to draw on a real-world use case.
It is very typical for IoT scenarios to entail collecting all, or most, of the data from a landscape of devices in order to store history and mine it for insights with analytics tooling. It is also very common to have to transform that data, which is often done by way of stored procedures, since the common wisdom is to transform at the database. However, performing transformations on persisted data while new data is simultaneously and continuously streaming into the database is not feasible when the transformed data is needed in a timely manner -- all it creates is competition for computational resources.
Because of the volume and velocity of device data being persisted, it would not be atypical for a stored procedure to take a full minute or more. That would render impossible many scenarios that depend on more timely access to the transformed data.
Instead of depending on data transformation to happen at the SQL database, can the same edge principle applied above be used to transform IoT data before it’s sent to the database, thereby distributing the computational load and speeding up the transformations themselves?
In this case -- and in many others, too -- the answer is yes. The transformations were rewritten to run on a high-performance run-time engine installed on an edge gateway, so the work is performed at the network edge rather than inside the database.
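As a rough illustration of the pattern, the sketch below reshapes device readings on the gateway so rows arrive at the database already in their final form. The field names, the unit conversion, and the persist_batch placeholder are assumptions for illustration, not the actual logic from this use case:

```python
# Minimal sketch: transform device readings at the edge, then persist them in bulk.
# Field names, conversions, and persist_batch() are illustrative assumptions only.
from datetime import datetime, timezone

def transform(reading: dict) -> dict:
    """Reshape a raw reading into the schema the analytics tooling expects."""
    return {
        "device_id": reading["id"],
        "recorded_at": datetime.fromtimestamp(reading["ts"], tz=timezone.utc).isoformat(),
        "temperature_c": round((reading["temp_f"] - 32) * 5.0 / 9.0, 2),
        "status": reading.get("status", "UNKNOWN").upper(),
    }

def persist_batch(rows: list) -> None:
    """Placeholder for a single bulk insert into the database."""
    print(f"inserting {len(rows)} pre-transformed rows")

# The gateway transforms each batch before it reaches the database, so no
# stored procedure has to compete with the incoming stream for resources.
raw_batch = [
    {"id": "press-0042", "ts": 1_700_000_000, "temp_f": 212.0, "status": "ok"},
    {"id": "press-0043", "ts": 1_700_000_030, "temp_f": 451.0, "status": "error"},
]
persist_batch([transform(r) for r in raw_batch])
```

The design point is simply that the database now receives finished rows and does nothing but insert them; the reshaping work has moved out to the gateway.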
The result? In one instance, the time to transform 100 rows shrank from 60 seconds in the database to 0.3 seconds when performed at the edge, a 200X improvement.
What’s more, testing shows that these improvements are highly scalable: running this transformation on 1000 records instead of 100 records took only 0.7 seconds.
For this particular factory, the optimization extends the existing investment in its database by offloading many data transformations to the edge, and it allows the system to keep supporting the growing number of devices being added to the factory floor. It also improves the odds of detecting potential production issues in time, keeping operations fast and efficient.
Scalability Depends on Getting Data Transformation Right
These examples demonstrate the importance of getting data transformations right.
Here’s a general rule: In high-velocity, high-volume data environments, instead of performing data transformations in a data lake or database using stored procedures, IT organizations should deploy integrations as close to the edge as possible. This kind of performance improvement is critical if organizations are going to scale up IoT and other edge devices in a manageable way in the coming years.
Edge computing devices are necessary, but the hardware alone isn’t enough. Perform data integration and data transformation at the optimal time and location, and you will get the most out of edge computing software -- and the best business results.
Image credit: BeeBright/depositphotos.com
Michael Morton is the Chief Technology Officer of Dell Boomi, where he drives product direction and innovation. He has been leading and producing a wide range of enterprise IT solutions for over 25 years. Prior to joining Dell Boomi in 2013, Michael had an impressive career with IBM, where he became an IBM Master Inventor and worked directly with a number of Fortune 100 Companies. He was a founding developer and Chief Architect of IBM WebSphere Application Server, providing architecture leadership on the IBM InfoSphere data integration and IBM Tivoli systems management family of products.