The Benefits and Considerations of Big Data Implementations
With the proliferation of devices and sensors that generate data, today’s federal agencies face the difficult task of storing, managing and sharing a mountain of data and information so that those who need it can access it.
However, the upside is that agencies can now analyze that data to uncover relationships and actionable insights that make them more proactive and help them accomplish their missions more effectively.
We recently had the opportunity to sit down with Jim Smid, the CTO of Iron Bow Technologies – a leading information technology solution provider – to discuss what is enabling big data initiatives, how the federal government is using big data and data analytics today and what agencies have to consider when building out the networks and data centers that will drive their big data programs.
Here is what Jim had to say:
TechSource: Big data is a major trend across the entire federal government today. Can you provide our readers with some information about the ways various federal agencies are using – or are looking to use – big data to help them accomplish their mission?
Jim Smid: From a military perspective, there’s always a use for analytics. An easy use case that most people can understand or relate to involves the insurgency in Iraq.
Big data and data analytics enable the military to identify relationships between things that seem unrelated, such as an increase in cellphone activity and insurgent attacks. Using data and analytics, the military was able to see the relationship between those two things. They were then able to identify spikes in cellphone conversations, triangulate where those spikes were located and, as a result, identify places where an attack was about to happen.
That’s just one example of how big data and data analytics can help the military. Today, you have sensor information coming from multiple places in an environment, whether from Predator drones, sensors in the field, wearable technology or other data-generating devices. And the market for things like wearable technology is exploding, so the sheer amount of data being generated is only going to increase.
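The interview doesn’t say how the spikes were detected. As a rough illustration only, here is a minimal Python sketch of one common approach: flag an hour whose call count sits far above the recent average (a trailing-window z-score). The hourly counts, the `find_spikes` function, the window size and the threshold are all hypothetical, and triangulation is not shown.

```python
from statistics import mean, stdev


def find_spikes(counts, window=6, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by more
    than `threshold` sample standard deviations (hypothetical parameters)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # the `window` values just before i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as a spike.
            if counts[i] > mu:
                spikes.append(i)
        elif (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical hourly call counts for one area.
hourly_calls = [40, 42, 38, 41, 39, 43, 40, 120, 41, 40]
print(find_spikes(hourly_calls))  # prints [7]
```

A fixed threshold like this is the simplest possible baseline; correlating the flagged hours with attack reports is a separate step.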
With all of this data available, government agencies and military branches have to ask themselves some questions. How do I aggregate and store all of that data? How do I analyze it? How long do I need it for? Who needs to get it and what can they do with it?
TechSource: What about other agencies? Is this something limited to military and intelligence organizations, or are there other agencies using big data and data analytics in other ways?
Jim Smid: This isn’t limited to the military. Take a look at what’s happening in healthcare today. The Human Genome Project has led organizations to try to predict what health problems a person is likely to have based on their genes, that is, their propensity to develop diseases and chronic conditions. All of that is going to fundamentally change healthcare, and that change relies on being able to store, analyze and share data.
That will be useful to military healthcare organizations, the Department of Veterans Affairs and other healthcare-focused federal agencies.
Another trend we’re seeing is in cyber analytics. The many tiers of security tools in a network gather an enormous amount of data. Cyber analytics finds the correlations in that data to identify cybersecurity threats preemptively.
For example, say Jim logs in from California and then, 30 minutes later, logs in from New York City. That’s physically impossible, so the second login is most likely coming from a bad actor. Using cyber analytics, we can establish users’ behavioral patterns and detect when they deviate from “normal activity,” helping agencies identify attacks and respond to them.
Security tools have traditionally been built to help us determine, through forensics, what has already happened. Cyber analytics tools allow us to identify what is happening in real time and work to mitigate the damage. These are tools that every federal agency, and every private enterprise, will want to implement to keep their networks and sensitive information safe.
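The California/New York example is commonly called an “impossible travel” check. A minimal sketch, assuming each login has already been geolocated to a latitude and longitude (for instance, from its IP address) and assuming a hypothetical plausibility limit of 1,000 km/h, compares two consecutive logins and flags the pair if the implied travel speed exceeds that limit. The `Login` type, `impossible_travel` function and coordinates are illustrative, not part of any product.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: somewhat faster than a commercial jet


@dataclass
class Login:
    user: str
    time: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(prev: Login, curr: Login) -> bool:
    """Flag two logins whose implied travel speed is not physically plausible."""
    hours = (curr.time - prev.time).total_seconds() / 3600
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if hours <= 0:
        # Simultaneous or out-of-order logins from different places are suspicious.
        return distance > 0
    return distance / hours > MAX_PLAUSIBLE_KMH


t0 = datetime(2020, 1, 1, 9, 0)
la = Login("jim", t0, 34.05, -118.24)                          # Los Angeles
nyc = Login("jim", t0 + timedelta(minutes=30), 40.71, -74.01)  # New York City
print(impossible_travel(la, nyc))  # prints True
```

The two cities are nearly 4,000 km apart, so covering that distance in 30 minutes implies a speed far above the limit. A production system would baseline each user’s normal behavior rather than rely on a single fixed threshold.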
TechSource: What technological considerations do agencies have to make to accommodate big data and data analytics? What impact does network and data center architecture have on big data and data analytics initiatives?
Jim Smid: There are many flavors and elements of big data. I tend to think and talk about the analytics piece because it’s more interesting. But there are going to be times when you just need to ingest a large amount of data at once. Or, you’ll need to store a lot of data or house it for a long period of time. All of those things will have unique impacts on the infrastructure.
But you also need to be able to ingest and process data quickly for the analytics piece. Things like Hadoop and other analytics tools that process information in parallel have really enabled agencies to tackle problems in different ways. This involves agencies investing in new technologies, many of which are open source, which also changes what the infrastructure looks like.
This is also affecting architectures. For example, big data puts tremendous strain on storage systems. People had started moving away from direct-attached storage, but the sheer amount of storage and speed required, along with these systems’ ability to handle data loss, has led IT administrators to start using it in their data centers again.
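Hadoop’s original processing engine, MapReduce, splits a job so that each node processes its own block of data (the map step), and the partial results are then combined (the reduce step). The sketch below only mimics that pattern in plain Python and is not Hadoop’s API. Threads stand in for cluster nodes, and the sensor IDs and partitions are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(records):
    """Map step: count readings per sensor within one partition."""
    return Counter(sensor_id for sensor_id, _value in records)


def merge_counts(a, b):
    """Reduce step: combine partial counts from two partitions."""
    return a + b


def parallel_sensor_counts(partitions, workers=4):
    # Each partition is processed independently, the way a cluster node
    # would process its own block of data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


# Hypothetical (sensor_id, reading) records spread across three partitions.
partitions = [
    [("drone-1", 0.4), ("field-7", 1.2), ("drone-1", 0.5)],
    [("field-7", 1.1), ("wearable-3", 72)],
    [("drone-1", 0.6)],
]
print(parallel_sensor_counts(partitions))
```

Because the map step needs no data from other partitions, adding nodes adds throughput; only the small partial results have to move for the reduce step.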
TechSource: What considerations – if any – should government agencies make when making data center and network hardware and architecture decisions if they’re planning on embracing big data?
Jim Smid: Take the example that Cisco used a few Partner Summits ago, when they were talking about the Internet of Everything and the proliferation of sensors. They talked about being able to take microscopic sensors and essentially till them into the earth where you’re going to be farming. Instead of guessing when you water and fertilize, these sensors give you information back about how much water the land needs and what fertilizers it needs, etc.
When you think about a field with all of these sensors in it, it wouldn’t make sense to send all of that raw data back to a data center to process. I want to process all of that data on site and make decisions in real time. Do I need more water? Great, let’s turn on a sprinkler in this particular part of the field. What I probably want back at the hub is aggregated data: what I’ve watered, how much I’ve watered, etc.
This is what people mean when they say, “analytics at the edge.”
It’s not optimal to transport all of that data back to a centralized location to process. Pushing all of that data back to a central hub takes much more bandwidth, which is very expensive. However, you will need the processing power and storage at the edge to do the analytics. The push to consolidate data centers and move everything into centralized facilities runs contrary to this idea. But it’s neither cost-effective nor practical to bring all data back to one place to analyze.
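A minimal sketch of “analytics at the edge” for the farming example, assuming a single hypothetical soil-moisture threshold: the edge node decides on its own whether to water each zone and keeps only a running summary, which is what it would send back to the hub. The `EdgeNode` class, the zone names and the 30% threshold are illustrative assumptions, not details from Cisco or Iron Bow.

```python
from dataclasses import dataclass, field

MOISTURE_THRESHOLD = 0.30  # hypothetical: water when soil moisture drops below 30%


@dataclass
class EdgeNode:
    """Processes raw sensor readings locally and keeps only a summary for the hub."""
    watered: dict = field(default_factory=dict)  # zone -> number of watering actions
    readings_seen: int = 0

    def ingest(self, zone: str, moisture: float) -> bool:
        """Handle one raw reading; return True if the sprinkler for `zone` is turned on."""
        self.readings_seen += 1
        if moisture < MOISTURE_THRESHOLD:
            # Real-time decision made at the edge, with no round trip to a data center.
            self.watered[zone] = self.watered.get(zone, 0) + 1
            return True
        return False

    def summary(self) -> dict:
        """Aggregated record sent to the central hub instead of every raw reading."""
        return {"readings_processed": self.readings_seen,
                "watering_actions": dict(self.watered)}


node = EdgeNode()
raw = [("north", 0.42), ("north", 0.28), ("south", 0.35), ("south", 0.22), ("south", 0.25)]
for zone, moisture in raw:
    node.ingest(zone, moisture)
print(node.summary())
# prints {'readings_processed': 5, 'watering_actions': {'north': 1, 'south': 2}}
```

Five raw readings collapse into one summary record. With thousands of sensors reporting continuously, that reduction is where the bandwidth savings come from.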