by Poojitha Jayadevan

LogiCloud scales up its IoT data processing capacity 100x in the cloud

Feature
Jun 23, 2021
Cloud Computing | IoT Security | Transportation and Logistics Industry

When business is booming, capacity problems are looming: Success in recruiting new customers created a challenge for the WebXpress IT team.

[Image: India, mapped on a dark globe. Credit: 1Xpert / Getty Images]

After the first lockdown in 2020, many companies realized they needed better technologies to support their logistics operations. That was good news for logistics service providers like WebXpress—but also bad, as the sudden influx of customers put a great strain on infrastructure.

“Logistics was first to open” after lockdown, says Apurva Mankad, founder and CEO at WebXpress. “More companies started using our service after the pandemic because they were under a lot of uncertainty and tracking of vehicles became more important.”

WebXpress offers a number of logistics services, including LogiCloud, a platform that tracks cargo vehicles by collecting real-time data from GPS devices. It connects over 150 logistics service providers to retail and manufacturing companies, processing more than 2 million orders every month. For each vehicle, it provides clients with tracking details including origin, destination, current location, type of cargo, and estimated arrival time, and it sends alerts as customers request.

The company started out with a small number of customers, each with just a few vehicles to track. But then, says Mankad, “We started onboarding large logistics customers with almost 800 vehicles each that needed to be tracked. We had multiple such customers onboarded. When we started taking their data, the amount of data really jumped multifold.”

There was something else amping up the demands on infrastructure, too: One customer wanted to collect data from its vehicles every 15 seconds, rather than every two or three minutes. “Even for the same set of vehicles, the data multiplied. That’s when the data shot up and we had to rush to invent a new solution,” says Mankad. The number of datapoints from the Internet of Things (IoT) the system had to ingest grew from 50,000 each day to 5 million, up one hundredfold.

Ups and downs in the flow of data

The control tower for the LogiCloud service ran on Microsoft SQL Server, collecting all the IoT data into temporary tables for processing, but that system was fast approaching its limits, says Shashank Trivedi, co-founder and data scientist at LogiCloud: “The number of devices kept increasing and it started becoming difficult to process data load and create events.”

Since it was already working with SQL Server, the company turned to Microsoft for help. “We discussed this use case with the Microsoft team, and they advised us on implementing a scalable infrastructure that’s based on PaaS components,” he says.

The LogiCloud team followed the advice, deploying Azure Stream Analytics to handle the increase in IoT data volume.

The first PaaS component to be deployed was Event Hubs, a Microsoft service that can listen to millions of devices at the same time, says Mankad. “We customized this to make it receive data only from the devices we needed. This infrastructure is scalable: It can expand and contract as we need. Suppose tomorrow we lose a big customer, then the number of vehicles that need tracking reduces. I should be able to reduce the infrastructure cost at that time.”
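The article doesn’t publish LogiCloud’s ingestion code, but the customization Mankad describes, accepting data only from devices the platform needs, amounts to a filter over incoming telemetry. A minimal sketch, with hypothetical device IDs and field names:

```python
# Hypothetical sketch of the device-filtering step: only telemetry from
# devices registered to an active customer is passed on for processing.
# Device IDs and field names are illustrative, not LogiCloud's actual schema.

ACTIVE_DEVICES = {"GPS-1001", "GPS-1002", "GPS-2001"}

def filter_telemetry(events, active_devices=ACTIVE_DEVICES):
    """Keep only events whose device_id belongs to a tracked vehicle."""
    return [e for e in events if e.get("device_id") in active_devices]

incoming = [
    {"device_id": "GPS-1001", "lat": 19.076, "lon": 72.877},
    {"device_id": "GPS-9999", "lat": 28.613, "lon": 77.209},  # not tracked
]
accepted = filter_telemetry(incoming)  # only the GPS-1001 reading survives
```

Keeping the registered-device set small and editable is also what lets the infrastructure “contract,” as Mankad puts it, when a customer leaves, its devices simply drop out of the set.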

After the data is received by Event Hub, it has to be processed. Previously, the data collected was stored on the server and a scheduled program processed it, but this system failed when the data volume increased. 

Now, the data is sent to the second PaaS component, Stream Analytics, for real-time flow analysis. Stream Analytics analyzes the data as it flows through the pipeline, and the processing logic then detects events and creates alerts.
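Stream Analytics jobs are normally written in a SQL-like query language; the Python sketch below is not LogiCloud’s actual logic but illustrates the kind of per-reading event detection described here, flagging a speed violation against an assumed, customer-configured threshold:

```python
# Illustrative event-detection logic (an assumption, not the actual
# Stream Analytics job): emit an alert for each reading that exceeds
# a customer-defined speed limit.

SPEED_LIMIT_KMPH = 80  # assumed customer-configured threshold

def detect_events(readings, limit=SPEED_LIMIT_KMPH):
    """Return an alert dict for every reading that violates the limit."""
    alerts = []
    for r in readings:
        if r["speed_kmph"] > limit:
            alerts.append({
                "vehicle": r["vehicle"],
                "event": "speed_violation",
                "speed_kmph": r["speed_kmph"],
            })
    return alerts

stream = [
    {"vehicle": "MH-12-AB-1234", "speed_kmph": 72},
    {"vehicle": "MH-12-AB-1234", "speed_kmph": 95},  # over the limit
]
alerts = detect_events(stream)  # one speed_violation alert
```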

“Most of the resources are charged on an hourly basis and some of them are based on the size of data and the processing power used,” says Trivedi. “Stream Analytics… is charged on the basis of streaming units and the clusters that are formed to analyze the IoT data as it flows,” he says. This data is then stored in Azure Tables and Azure Blobs for long-term use.
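The article doesn’t describe the table schema used for that long-term storage. A common Azure Tables pattern for telemetry, offered here only as an assumption, keys each reading by device (PartitionKey) with an inverted-timestamp RowKey so the most recent reading sorts first:

```python
# Assumed storage-key design, not confirmed by the article: PartitionKey
# groups readings by device; an inverted, zero-padded timestamp RowKey
# makes the newest reading sort first lexicographically.

MAX_TICKS = 10**13  # arbitrary upper bound used for the inversion

def to_entity(reading):
    """Map a telemetry reading to an Azure-Tables-style entity dict."""
    return {
        "PartitionKey": reading["device_id"],
        "RowKey": f"{MAX_TICKS - reading['captured_at']:013d}",
        "lat": reading["lat"],
        "lon": reading["lon"],
    }

older = to_entity({"device_id": "GPS-1001", "captured_at": 1_700_000_000,
                   "lat": 19.07, "lon": 72.87})
newer = to_entity({"device_id": "GPS-1001", "captured_at": 1_700_000_060,
                   "lat": 19.08, "lon": 72.88})
# Lexicographic RowKey order puts the newer reading first.
```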

When LogiCloud finally shifted from its old SQL Server model to Azure, it expected all its problems to be solved—but a new one awaited: The data gathered every 15 seconds did not arrive every 15 seconds!

The GPS devices on the vehicles transmit their location at regular intervals via mobile networks—unless, as is sometimes the case in India, they are in an area that doesn’t have reliable mobile coverage. In that case, the devices store their location in onboard memory and, as soon as the mobile network is available, push all the stored information to the cloud in one burst.

Mankad explains the problem: “It might go quiet for almost three minutes, and suddenly all the data comes in five seconds. Bursting of data was the biggest challenge. We sometimes get 10x data in 20 seconds. The uneven flow of data was tackled by algorithms developed in-house to even it out as far as our customer’s visibility is concerned.”
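The in-house smoothing algorithms aren’t published, but the core of the problem is that a burst delivers backlogged readings in arrival order, not capture order. A minimal sketch of one obvious first step, re-sequencing a burst by the device’s own capture timestamp so the customer-facing track stays chronological (field names assumed):

```python
# Hypothetical sketch of burst handling: when several minutes of
# backlogged readings land in one burst, re-order them by the device's
# onboard capture time rather than by arrival time.

def smooth_burst(burst):
    """Re-sequence a burst of readings by their onboard capture time."""
    return sorted(burst, key=lambda r: r["captured_at"])

# Three minutes of backlog arriving in one burst, out of order:
burst = [
    {"captured_at": 1_700_000_120, "lat": 19.080, "lon": 72.880},
    {"captured_at": 1_700_000_000, "lat": 19.070, "lon": 72.870},
    {"captured_at": 1_700_000_060, "lat": 19.075, "lon": 72.875},
]
ordered = smooth_burst(burst)
```

LogiCloud’s real algorithms presumably also pace the replay and deduplicate readings, but those details aren’t in the article.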

Developing skills

Trivedi led the project, along with a technical architect who sat down with the Microsoft team to devise the entire solution. Three software developers, who were also strong database developers with expertise in data analytics, rounded out the team, which pieced together all the components: Event Hub, Stream Analytics, Azure Tables, and the front end. The team had to write Python code and SQL scripts. There were no special recruitment requirements, although the team was trained for this project; the developers also attended Microsoft workshops with help from subject-matter experts.

The primary work of the project, including the move from the old infrastructure to the new and the pilot, took three months. “A number of issues were faced to make certain components work. We also had to make certain difficult decisions about what data to store on what infrastructure. We had to experiment many times. There were many cases when this burst issue happened. Initially, our understanding was that we would get data only at a certain rate. And that’s exactly what did not happen. So it took two more months to stabilize. On the whole, it took five months,” says Mankad.

The implementation of the project helped WebXpress retain customers who had initially been unhappy with the platform’s lack of scalability. The business team had agreed to take data every 15 seconds without realizing the amount of stress it would put on the existing product. “Then we saw this as an opportunity because if this is what one customer is demanding, this is what the market will need,” Mankad says.

The project has enabled WebXpress to collect data at shorter intervals, which has helped it monitor vehicles better. It can now create and send alerts in case of speed violations, route deviations, or when a vehicle crosses a customer-defined geofence. “The customers are happy because now they are getting value on their investment in these GPS devices. What they initially got in return was only a Google map kind of interface. This has become a game-changer and helped us in closing more sales,” says Mankad.
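A geofence alert of the kind mentioned above reduces to a distance check against a customer-defined boundary. A minimal sketch for a circular geofence, using the standard haversine formula (the center point and radius here are made up for illustration):

```python
import math

# Hypothetical geofence check: alert when a vehicle's position falls
# outside a circular, customer-defined geofence. Coordinates and radius
# are illustrative, not from the article.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def crossed_geofence(position, center, radius_km):
    """True when the position lies outside the circular geofence."""
    return haversine_km(position[0], position[1], center[0], center[1]) > radius_km

depot = (19.0760, 72.8777)  # assumed geofence centre (Mumbai)
near = crossed_geofence((19.0790, 72.8800), depot, radius_km=5.0)   # ~0.4 km away
far = crossed_geofence((19.2183, 72.9781), depot, radius_km=5.0)    # ~19 km away
```

In production, polygonal geofences and hysteresis (requiring several consecutive out-of-bounds readings before alerting) are common refinements, though the article doesn’t say which WebXpress uses.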

There are many things Mankad would have done differently if he hadn’t been under the same pressure of time. “We kind of did an on-the-fly learning. We learned and deployed the technology at the same time because we had customers to satisfy. That leads you to make mistakes. If I could do it again, and if I had the time, I would architect it better. That could have reduced the number of iterations we did on this project,” he concludes.