CIA - big tasks and big data. Towards a Global Information Cap
Ira Gus Hunt, the CIA's current Chief Technology Officer, talks about his vision of Big Data at the CIA, the challenges it raises, and how to address them. The talk was given at the GigaOM Structure: Data 2013 conference, held March 20 in New York. According to those who attended, it was one of the most interesting and memorable talks.
If you do not applaud our next speaker, he will certainly pull up your personal file and make a note in it. Mr. Gus Hunt is the Chief Technology Officer of the Central Intelligence Agency. He is going to talk about the big challenges of processing Big Data at the CIA. Please welcome Mr. Hunt to the stage.
IRA GUS HUNT 00:22
Since I am the only thing left between you and dinner, I am not so sure I want to be here, but we will see whether I can hold your attention. My name is Gus Hunt, I am the Chief Technology Officer at the CIA, and I would like to talk with you about the things you have been hearing about here all day. I will try to tell you how everything happening in the world looks from our point of view, why it matters to us, and what, in our opinion, needs to change so that we, and I believe the entire private sector, can take advantage of Big Data.
If you think back to the world as it was, it was all about the Cloud. That was three years ago. Now we are at the point where it is all about Big Data, so for the past year we have been reading all these breathless articles. And then there are the glossy covers. I half expect Big Data to become Time's Man of the Year. This year we are seriously talking about how to get value out of what we already have, and I have already heard a lot of discussion and opinion on the matter...
In case you do not know how we earn our living, the CIA has three main lines of "business."
We collect information about the plans and intentions of our adversaries. We do all-source analysis, combining the freshly collected information with what we already hold, so that we can tell the President, the Secretary of State Security, policymakers, and everyone else what it all means. And the third thing we do, and we are the only agency permitted by law to do it, with the knowledge of the President of the United States, is covert operations. These are the three areas for which we are responsible.
About four years ago, when I was appointed Chief Technology Officer, we sat down and asked ourselves: "What do we need in order to be confident in our future?" We arrived at what I call our four big tasks.
Big task number one, which arose four years ago, concerned Big Data and our ability to take advantage of the large information flows emerging across the planet. We need this so that we can understand what is happening in them and protect national security. This is what we are doing.
Number two, and this was even before the talk of sequestration and other things, is that we have a responsibility to you, the taxpayers, and you must be sure that we spend every dollar as efficiently as possible. But when we think about efficiency, it is not a matter of lowest cost. It is about the best value proposition, and for us "value" is defined as results divided by cost and time. Better results in less time and for less money give the greatest value.
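Read literally, the definition of value given here can be written as a single ratio. This is only one interpretation of the spoken phrase "results divided by cost and time"; the symbols V, R, C, and T are introduced here for illustration and do not come from the talk.

```latex
V = \frac{R}{C \cdot T}
```

Under this reading, holding results R fixed while halving either cost C or time T doubles the value V.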
Third, and sometimes we deliberately emphasize this, we must work together as a community, despite what you may read about everything working badly, about us not sharing information, and all that. This is actually not the case. We do our job well. And like any organization, including those in private business, we look at problems from different points of view and angles, and sometimes this gives rise to small discussions about the best way to resolve the issues we face.
Number four is the workforce. If we do not have talent, people with the abilities we need, we will not be able to accomplish the tasks we set for ourselves.
Then we said that, to achieve these goals, we need a solid foundation in which to invest. We have pulled together six key technologies to solve our problems, and we are going to invest in these technologies for the long term.
All this is necessary so that we have confidence that we are a viable and competitive organization, looking to the future.
These are fairly simple things and they are well known to you, but secure mobility is a topic of great importance to us. Mobile technology is not secure. Repeat after me: mobile technology is not secure. And that really is so. How are we going to make it secure in our environment so that we can benefit from it? This is a serious task.
The second thing listed here is what we call advanced analytics. In fact, we think of analytics as a service. By this we mean everything we need to do with Big Data to carry out the work of keeping our nation secure.
The third thing we have is widgets and services. We approached this with something called the Ozone Framework. Ozone is a framework that the intelligence community developed based on the Google framework. The main reason we use it is the same reason you use your smartphones, iPads, and other gizmos: you can personalize them and load them with all sorts of things you need for your business or personal life. We need to create an environment where our analysts, operators, and other employees can pull in the functionality they need and personalize their world. You can call it a WebTop, or a device-top, or whatever you like.
Fourth, which by the way is number three on the slide, and I do not really want to explain the oddities of that numbering system, it is a long story. So, fourth: security as a service. We do not want you to have to rebuild the security system from top to bottom every time you deliver or build a new system for us. We need a set of services, and best practices from the old world of service-oriented architecture. Does everyone remember that world? I certainly do. These are the security services: widgets and analytics sit at the top level, they talk to the security services in the middle, which interact with the compute infrastructure and other low-level things. So the security services have a lot in common, and we want to be sure that they are applied consistently, end to end, for any person who has access to any data element through any analytic system. And those controls should also be delivered through one of the security services.
The fifth is data, and I am going to talk about it in more detail. It immediately brings to mind: "It's the data, stupid." We have a concept of data as a service, and a concept we call the 'data bay'. The data bay is not a precisely defined place, but we plan to gather powerful compute engines there, similar to the ones you saw in the exhibition hall. We found, or at least we believe, that all analytics above a certain level tend to rely on common sets of large, high-performance compute infrastructure hidden underneath.
We want to create an environment where all our data and massive computing infrastructure live together, so that it is easy for us to try out new ideas or new capabilities at the top level by driving what sits below. Doing all of this takes a lot of computing power, and that fun little thing is called the Cloud.
Have you ever asked yourself how much 'a lot' is? We do it all the time. I want to run quickly through how big 'big' is in Big Data. You all know Google. Google is a great provider of all sorts of interesting things. Google stopped reporting its size, as far as we were able to find out, about four years ago, in its 2009 or 2010 SEC filings.
At that time, they said they had about 100 petabytes of data and more than a trillion indexed URLs. That is quite a lot.
Facebook. As you know, Facebook passed a billion users in August of last year, so now there are more than a billion of them. I found one interesting thing: the latest figures show that approximately 35% of all the world's photos are posted on Facebook.
YouTube. We believe that YouTube is the only exabyte-sized or larger repository on the planet, at least in the public sphere. According to the latest documents we were able to get, YouTube's size was about 768 petabytes. If you roughly calculate how much data is being added to YouTube, you will find that YouTube was already larger than an exabyte three or four years ago.
World population. If you go back to about April, you will find that the population quietly passed the seven billion mark.
Everyone is talking about Twitter and how great Twitter is. Twitter has about 124 billion tweets per year, 4,500 per second.
But Twitter is a drop in the bucket compared to SMS, the global short message system, which carries about 193,000 messages per second. Of which 190,000 are typed by my daughter [laughter]. I have the bills from the carrier; I can confirm this.
But even that is small compared to the number of cell phone calls in the United States. In the United States alone, 2.2 trillion minutes of calls take place per year, which is 19 minutes per person per day. I find that incredibly small, unless, of course, I again use my daughter as the average. It is about two orders of magnitude less than it should be, but if you convert all of this into ordinary data volumes, it is another YouTube per year.
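The rates quoted above are easy to sanity-check with a few lines of arithmetic. The sketch below converts the per-second SMS rate into a yearly total and spreads the yearly call minutes over people and days. The function names are illustrative, and the US population figure of roughly 315 million is an assumption that does not appear in the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years
DAYS_PER_YEAR = 365

# Assumption (not stated in the talk): a US population of roughly 315 million.
ASSUMED_US_POPULATION = 315_000_000


def per_second_to_per_year(rate_per_second):
    """Convert an average per-second rate into a yearly total."""
    return rate_per_second * SECONDS_PER_YEAR


def minutes_per_person_per_day(minutes_per_year, population):
    """Spread a yearly total of call minutes evenly over people and days."""
    return minutes_per_year / (population * DAYS_PER_YEAR)


if __name__ == "__main__":
    sms_per_year = per_second_to_per_year(193_000)
    print(f"SMS per year: {sms_per_year:.3e}")

    daily = minutes_per_person_per_day(2.2e12, ASSUMED_US_POPULATION)
    print(f"Call minutes per person per day: {daily:.1f}")
```

Under the assumed population, 2.2 trillion minutes per year works out to about 19 minutes per person per day, which matches the figure quoted in the talk, and 193,000 messages per second comes to roughly six trillion messages per year.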
What makes all this happen? I think you know all of it. There have been three fundamental drivers over the past few years, which together make up one curious little thing: the Social Mobile Cloud. It is what brought us most of Big Data. In the social world, things spread very fast, like viruses, and so they need an information space that scales elastically, far beyond what was intended when the Cloud was just getting started. Everyone wants to be social and exchange information. All of this, taken together, creates what we are talking about: Big Data.
The pace of innovation has increased substantially. Ask any of you who run start-ups: have you ever, except in some special cases, gone to your investors and told them that you were going to buy a pile of hardware, hire a crowd of admins to run it, and only then start work? Has anyone done that? Hardly... So what do you usually do? You take out your credit card, buy services from Amazon or Rackspace or someone like that, get the capacity, and start doing your job. You start a project quickly and very cheaply, and you can concentrate on your task without thinking about the underlying infrastructure.
For our world, this means that the Social Mobile Cloud has significantly accelerated social communication in ways we did not expect, ways that I believe did not exist at all before these technologies arrived in real life. A classic example is the Arab Spring. The ability of the groups of citizens who took part in the Arab Spring to stay in touch, despite totalitarian governments trying in every possible way to hinder them, allowed the processes and protests of the Arab Spring to develop and, be that as it may, to come to fruition, the results of which we will see in time. But we are still trying to understand what it all means.
Most importantly, in our world the Social Mobile Cloud has completely changed the flow of information across the entire planet. When I started at the CIA many years ago as an analyst, the world was simple enough. In terms of information flows, it was few-to-many. There were NBC and CNN, the Soviet TASS and the American Times, and the Washington Post. It was the classic model, in which a few generators of information told everyone else what to think and how, and that is how things spread. The Social Mobile Cloud has turned this model upside down and moved us to a complex many-to-many model, and of course I have to say that we really like the many-to-many model [laughter]. Gaining an advantage in this model was fairly simple. After all, what is interesting is that when everyone talks and exchanges information, there is, despite the high noise level, a useful signal that we need to find. And this, I suppose, is one of the big problems of Big Data in the world: how to find the signal in ever-growing oceans of noise.
If you think that this is hard, and you think you already know it, there is more. They talked about it here, the guy who handles health care at Aetna and the others who spoke a bit earlier: three more emerging forces, Nano, Bio, and Sensors.
You are already a walking sensor platform, and I hope you know that. Your mobile devices, your smartphone, your iPad, whatever you have on you; everyone has plenty of these gizmos. I think there is a definite list of what is built into these devices and what goes on inside them. As you walk around the neighborhood as a mobile sensor platform, and remember, I told you that your devices are not secure, you should be aware that someone may know where you have been at all times, because you have a mobile device. Even if your mobile device is turned off. I hope you know that. Yes? No? If not, then you should know it [laughter]. Because it is really important.
Suppose you were once a Star Trek fan, as I was when I was a kid, and now imagine that your mobile platform, your smartphone, turns into your Communicator, becomes your Tricorder and, finally, your Transporter. How do you board a plane today? Do you want to go through with a piece of paper, as I do, because where I work mobile devices are not exactly encouraged? Or will you walk up to a small symbol, wave your hand in front of it, and have this magic thing take you wherever you want to go?
It can also become your mobile platform for monitoring your health. Right now you can buy add-on devices for your pacemaker that will monitor your blood sugar, control insulin, and handle other health-related things. The healthcare industry itself is looking very hard for ways to monitor your health remotely, so that they can always know what is happening to you and your body and can then make adjustments remotely. You are thinking: Gus talks very fast. Well, I am very worried that someone is going to hack my settings remotely and speed up my little pacemaker so that I talk to you even faster. And that is exactly what we have to worry about: cyber attacks, as they emerge, are not directed only against your business. In the end, they can be directed against you and your health. And if you do not take precautions, you will face serious risks.
In fact, speaking of your sensor platform, there is a cool little program called Activity Tracker. It is a small program for Android. Are you familiar with it? It uses the three-axis accelerometer in your phone to collect data. I actually have a Fitbit. You know about Fitbit, right? It is just a simple three-axis accelerometer. We love these gizmos because they don't have .... However, I will not go into the specifics here [laughter]. What happens is this: they collect information, and from that data one can determine with high accuracy your gender and your height, whether you are tall or short, heavy or not; but more surprisingly, you can be identified by your gait, by how you move when you walk.
But this can actually be a really good thing. Imagine it as a security application. If you go somewhere and need access to your bank account, it may get a little easier, because the bank will know with absolute accuracy that you are you, having established it from your gait, and will then let you carry out your banking transactions. On the other hand, if you do not want to reveal yourself or you want to protect yourself, you do not want anyone to know what your gait looks like, so that no one can work out where you have been all this time.
What is curious is that, as you begin to bring all these things together, the inanimate becomes intelligent. We are already seeing this happen. IBM talks about its Smarter Planet project. Google has a car that drives itself. You already have appliances that know what you need; you could see them at the last CES. Did you read the article about the refrigerator that scans groceries? It does this as you put them in or take them out, and then sends a message to your smartphone: "Buy milk." I picture a somewhat gloomy future for myself: Friday evening, I am very tired, I worked late, I get into my self-driving car, I say "take me home," and where does it take me? Safely, steering around every obstacle, it takes me to get the damn milk [laughter]. Why? Because it knows better than you that, in the end, you will need milk! [laughter] So of course there are a number of good things here, but some things may not be so wonderful.
But still, when you put it all together, it usually works out well, because if you think about it, the potential of these things is incredible. And you know that too. Radical improvements in traffic control, the ability to reroute dynamically so you can optimize your time and save gasoline or anything else, that is great. We have already talked about engaging society; it also helps us be green (automated transport management), and we have already talked about how cool that is.
Crime prevention. Probably everyone saw the recent article about a study the British conducted in London, which is considered the city with the most cameras on the planet, where fighting crime was the argument for installing the cameras. Do you know how many crimes they managed to prevent thanks to the cameras alone? Does anyone here know the exact answer?
So some of these things raise questions.
The problem we face is this. Remember, I talked about the big world of data from the Social Mobile Cloud. Now add the world of sensors to it, and of course it becomes a place of really interesting problems, especially for us, because sensors are unlimited. They are just small pieces of silicon that we would like to place everywhere; they can go anywhere, and they are quite simple to make. Sensors are transparent; they will never process a signal that is not intended for them. And they do not discriminate: they process any signal they receive.
And when we apply this to the Internet, full of the entities we talked about earlier, everything becomes connected and everything is equipped with sensors, so that everything exchanges information and talks to everything else, and the volume of that conversation only grows. What people can do pales in comparison with what can happen in a sensor-connected world. And this is a very big challenge for our future.
You may ask yourself why we should think about this. We care because all of this information contains signals that matter to us in protecting national security. It concerns us because we need to understand what is happening or about to happen in the world around us, so that we can alert the people responsible for our policy before trends take shape and before problems arise.
We need this because we want to stop the next terrorist who is about to bring a bomb onto a plane in his underpants before his pants catch fire.
We do this, and I have to be careful saying this here, because it may be better for you and your friends to know where you are, which in my particular case may not be such a good thing. But most importantly, we are concerned about the direction in which this world is developing.
And it also concerns us because the information that exists now differs significantly from what existed in the world where intelligence work was driven entirely by humans. There is a nice chart here, with a greenish bubble and a purple bubble. The green one is the world according to the library's universal decimal classification system, which when I was at school was called the Dewey Decimal Classification (DDC), if I remember correctly. The other, red one is the world of information according to Wikipedia. Which one should you trust? Which ordering of information do you trust? I know which world I trust: I trust Wikipedia.
What does Big Data do for us? Basically, it helps us understand what is happening in the world and know what we know, and understand where our gaps are, so that we can do our job better. Filling those gaps takes us a lot of time and requires some very expensive assets, so we need to understand how and with what we can fill them, and we really should not be collecting information that we do not need or that we can find and collect through other mechanisms, such as social media and similar things. This leads to some important implications, and over the next six minutes I am going to talk about them and about what I call the four big rules of Big Data.
Number one: "It's the data, stupid!" Remember James Carville: "It's the economy, stupid!" Two: power to the people. Three: we will talk about latency, which breeds contempt. And four: in the world of the future, everything is in context, and everything is in your context.
Number one, "It's the data, stupid." A little history lesson from our world, which may sound somewhat ordinary to you, but which we learned the hard way: no matter how sophisticated and complex your tools are, if they do not work with my data, they are completely useless. Our users will, as a rule, choose a rather mediocre tool that works with their data over the best tool available that merely shows what wonderful things you could create with it.
And this is needed in order to understand what is happening in the world of information: we must bring everything together, we must understand the plans of our adversaries, we need to connect all the dots.
The Big Data problem is this: the database of useless information holds 500 million gigabytes, while the database of useful information holds only 5K.
Our problem is determining what goes into that 5K. Over our long history we have learned that information has time value, just as money has time value, and the value of any piece of information only becomes known when you can combine it with something else that arrives at some point in the future. In our world, if some piece of information is quietly discarded because you thought it had no value, or you decided not to consider or collect it because you thought it did not meet the needs of the moment, then as new events and new information appear in the world, you will be missing a link in the big picture. The point is that, since we cannot find and connect all the links in our chain now, we have to keep trying to put everything together later, and so we are forced to hang on to everything forever. Although "forever," of course, belongs in quotation marks.
Some of the characteristics of Big Data that have emerged are quite simple, such as 'more is always better'. The signal-to-noise ratio in this world only gets worse, but the reason 'more is better' holds is that it lets you measure what is happening in your data numerically rather than engage in costly modeling. Does anyone remember George E. P. Box's famous line about modeling? "All models are wrong, but some are useful." The problem with modeling is that it forces you to make assumptions, all of which are distorted, one way or another, by your view of current events. We want to get away from that distorted perspective and have a clear understanding of what is happening in the world.
On the other hand, users are not data scientists or engineers. They do not know the material in detail. So we need to make sure that, whatever happens in our world, our information, the data sets themselves, carries enough intelligence that the user needs to do nothing more than ask a question and get a meaningful answer from the data set itself. If they have to go through thousands of data sets by hand and try to work out which of them contain information relevant to their question, that is a losing proposition all around.
Next is power to the people. I will tell you that today's analytics and tools are hard to use. To get valuable information out of the data we need specialists; we call them data scientists, and we are trying to raise the standing of this science, because the information, skills, and knowledge it requires are very complex and take considerable time to master. The problem is that a lot of the work has to be done by hand, and much of what happens is not built into our business environment.
A new class of scientists is developing these fields, which we have already talked about a lot: data scientists, data engineers, and so on.
A data scientist, according to Wikipedia, must have fundamental training in all of these areas. How many people on the planet have these skills? Not many. Of course, many universities around the world, having received grants, have started programs in these new sciences, and that is good news, but so far the [indistinguishable] state of affairs is still far from ideal.
We believe that the democratization of Big Data will win out. Our goal is to hasten the moment when we can put the power of Big Data and analytics into the hands of the average user. The only way we can realize real value, and by the way this is just as true in the commercial sector and for individual companies, is when everyone has access to tools and data that let them do their job without worrying about how it all works.
We want elegant, easy-to-use tools to appear tomorrow. Let the machines do the hard work; what we need are simple things like search. Search in the modern world, which we talk about constantly and which already scales to petabytes, is still not intelligent.
We understand all these things, and we can name seven universal constructs on which we want to do analytics. We care about people, places, and organizations; we care about time, events, things, and concepts. What we want for analysts is to make analytics as easy as using functions in Excel. You go into Excel, write your little formulas, a sum, a standard deviation, open bracket, select the list of values, close bracket, and you get the answer. And you can see whether it is right or not. We want a similar tool for, say, analyzing a group of people. I need, for example, to see the relationships among them, and it would be great if we could open a bracket, enter the list of names, and close the bracket. And what would I like to get back? A beautiful network graph from which it would be seen
I believe that for those who use it, all this will be quite simple. We want people to be able to use all these things, including in unexpected ways, and to be able to recombine them, so that they can get more and more complex results from relatively simple building blocks.
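The Excel-style call described above, a function that takes a list of names and returns a relationship graph, might look something like the following. This is a minimal sketch and not any actual CIA tool: the function name `relationship_graph`, the idea of deriving links from shared events, and the sample names are all hypothetical.

```python
from itertools import combinations


def relationship_graph(names, events):
    """Return an undirected graph linking the given names.

    names:  the people the analyst asked about.
    events: an iterable of participant lists; two people are linked if
            they appear together in at least one event (an assumption
            made for this illustration).
    """
    wanted = set(names)
    graph = {name: set() for name in names}
    for participants in events:
        present = wanted & set(participants)
        # Link every pair of requested people seen in the same event.
        for a, b in combinations(sorted(present), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    # Entirely made-up sample data.
    sample_events = [["Alice", "Bob"], ["Bob", "Carol", "Dave"]]
    for person, links in relationship_graph(
        ["Alice", "Bob", "Carol"], sample_events
    ).items():
        print(person, "->", sorted(links))
```

The point of the design is the one made in the talk: the user supplies only a list of names, and everything about where the links come from stays hidden behind the function.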
This is exactly where I would like to come back to the participants in the Arab Spring: here I would like to analyze sentiment over time and put it on a map as a heat map. And I would like all the user has to do is draw a diagram, as in Visio, and see what comes out the other end. We would like it to be as simple as possible for them.
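The idea above of tracking mood over time and place comes down to averaging scores per map cell and time bucket. The sketch below assumes that some upstream step has already assigned each message a region, a day, and a numeric sentiment score; that classifier, the function name `mood_heatmap`, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mood_heatmap(records):
    """Average sentiment per (region, day) cell.

    records: iterable of (region, day, score) tuples, where score is a
    sentiment value from some upstream classifier (assumed, not shown).
    """
    cells = defaultdict(list)
    for region, day, score in records:
        cells[(region, day)].append(score)
    # One averaged value per cell is what a heat map would color.
    return {cell: mean(scores) for cell, scores in cells.items()}


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    sample = [
        ("region-a", "day-1", -0.5),
        ("region-a", "day-1", -1.0),
        ("region-b", "day-1", 0.5),
    ]
    for cell, value in mood_heatmap(sample).items():
        print(cell, value)
```

A plotting layer would then map each cell's average to a color; that part is left out here to keep the sketch self-contained.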
Latency breeds contempt. This is about speed. Speed is the only thing that matters in our world, and I think it is also the only thing that matters from a commercial point of view.
That is simply because we want everything to be fast and do not want to wait. What drives my users crazier than anything else is waiting for something to happen. I think we are gradually moving into a world where that speed already exists. We have work running in near real time that is related to MapReduce: we are getting rid of MapReduce that is flexible, powerful, and slow, and want to use MapReduce that is flexible, powerful, and very fast.
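For readers unfamiliar with the MapReduce pattern mentioned here, the following single-process sketch shows its three stages: map, shuffle, and reduce. It is a toy word count for illustration only and says nothing about how the CIA's systems or any real framework are implemented; all function names are made up for this example.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every document and collect (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(mapper(doc))
    return pairs


def shuffle(pairs):
    """Group values by key, as happens between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(doc):
    # Emit (word, 1) for every word in the document.
    return [(word.lower(), 1) for word in doc.split()]


def count_reducer(key, values):
    return sum(values)


def word_count(documents):
    pairs = map_phase(documents, word_mapper)
    return reduce_phase(shuffle(pairs), count_reducer)


if __name__ == "__main__":
    print(word_count(["big data", "Big tasks"]))
```

The flexibility the talk refers to comes from being able to swap in any mapper and reducer; the slowness in real systems comes from running these stages as batch jobs over distributed storage, which this in-memory toy does not model.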
We really want to move all of this onto what we call petabyte-scale in-memory architectures, so that we can do distributed analytics and other things of that kind. These are the kinds of things that drive the technological changes you constantly read about.
And we think these processes will lead to new competing architectures that radically change the order of things in the world.
And finally, everything is in context. In your context, and this is important, because that is the world as we believe we have built it and as we perceive it.
It will be in the context of your concepts, because everything else will be in the context of someone else's. So the purpose of all the widgets is to let you build your WebTop, or whatever you want to call it, out of the tools and capabilities you need to do your work properly. What is the purpose of everything that emerged in the world of Big Data around 'schema on read'? It is data without context, from which you need to extract value. I want, as I said earlier, user-assembled analytics in the context of the problems and questions being asked, and then all of it computed in the context of what the work at hand requires. That, roughly, is what elastic computing is for in our world.
A few thoughts in conclusion. I believe it is high noon in our information age, with the sun high above our heads, and here is why I say so.
We are already very close to being able to process all the information generated by mankind. Do you know what is good about people compared with sensors? You can do a ton of things within 24 hours, but at any moment you do one thing: you sit here, take your notes, take photos, or just listen. You cannot do many other things. You generate only so much data. That is the situation now, and if you do not believe me, go back to my Facebook example, which has one seventh of the planet's population and 35% of all digital photos taken, and then think about what they [sensors] can do.
The inanimate becomes intelligent. When it becomes intelligent, it gets a little gloomy for me. A third wave of computing emerged with the arrival of cognitive machines. Watson is a prime example. Interestingly, Watson is to cognitive machines roughly what the 8088-based IBM PC is to modern machines. Gradually these machines will radically change our world and will do everything from medicine to trading on the stock exchange, as well as helping us with foreign intelligence analysis.
It is a fact that the world moves faster than government and the law can keep up with. I bet it moves faster than you can keep up with, too. You may ask: what are your rights, and who owns your data? I bet you will raise that question. As I said earlier, this is driving social change at a pace and in ways we cannot even anticipate, and all of it makes for a very interesting world. I will not talk about cyber threats here because we are running out of time. Thank you very much.
Thank you very much. That was amazing. I think we are now ready to go to dinner, and I think everyone will be able to find you somewhere nearby. Thank you again, Mr. Gus Hunt, CTO of the CIA. I do not know about you, gentlemen, but I am going to throw my phone into the river after dinner.