Harel Kodesh: The Biggest Internet Industry Empire Today is China, not the U.S.
From Microsoft to Amdocs, then to EMC, and now as President of Red Loop Media as well as CEO of Nurego, Harel Kodesh has received worldwide acclaim for his expertise in Big Data, production chains and IT technologies.
As an honored guest of Guangming Daily and its official website, GMW.cn, also known as Guangming Online, Harel Kodesh came to share his interpretations and ideas on Big Data with reporters on the afternoon of July 28.
He sees both the pros and the cons of this double-edged sword, yet he still believes: “At the end of the day, Big Data can save people’s lives.” He understands the possible threat to privacy behind Big Data, but he remains optimistic, as he said in the interview: “I mean that you are going to see the ways to deal with big data not only becoming easier and easier, but also coming down to the point where small companies can use big data to their advantage, as everybody is able to use that.”
GMW.cn: First, Big Data now touches every aspect of life. Ranked by necessity, which aspect of our daily life do you think should be the very first to apply Big Data?
Mr Kodesh: Big Data has already been applied in the consumer space. If you go and order a book on Amazon, Amazon will come back with recommendations for other books that other people may have rated. That is a use of Big Data. What we have seen over the last several years is that consumer software has taken the lead in using Big Data for its own ideas. Gaming companies are using Big Data by creating all kinds of virtual goods that people can buy, and then analyzing, you know, how many virtual umbrellas, for instance, people buy as a function of whether it is raining outside or not. So there is an interesting connection between the virtual world and the real world.
But this is for the consumer space, which is the first place Big Data is being used. It is much easier there, as it does not require any regulation, and consumer software usually matures much faster. But if you think about the important uses of Big Data, it takes you to other areas. One of the scientists, for instance, talked about the industrial internet: how you connect, how you get information from a jet engine, how you get information from a car’s engine, and how you make sure that the operation of those big industrial systems is optimized. So the industrial part is probably even more important, and it is really going to change the world in terms of how things are and how things look. Think about, let us say, China Airlines or Air China jets flying over the ocean. With two engines, the amount of data you can create during the flight is usually about 5 TB. A system can analyze that information and tell the captain whether the engines are doing well or whether there is a problem, before the captain actually notices the temperature or any other problem with the engine. So this is really what is going on in aviation now, and it is a very important thing that saves lives.
The other area is really health care: the ability to create optimized medications that fit people’s profiles. These are the important things, but in terms of where Big Data is used first, it is probably consumers, and then it goes to telecommunications, where telecommunication companies are trying to optimize the use of their capital equipment by using Big Data.
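Editor's illustration: the engine-monitoring idea described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not how any airline or engine maker actually does it. The function name `flag_anomalies`, the window size, the threshold and the temperature values are all hypothetical; the technique shown is a simple rolling-window check that flags a reading when it deviates from the mean of the preceding readings by more than a chosen number of standard deviations.

```python
from statistics import mean, stdev

def flag_anomalies(temps, window=5, threshold=3.0):
    """Return indexes of readings that deviate sharply from the recent baseline.

    Hypothetical helper: each reading is compared with the mean and sample
    standard deviation of the `window` readings that came just before it.
    """
    alerts = []
    for i in range(window, len(temps)):
        recent = temps[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        # Skip perfectly flat windows to avoid dividing the world by zero spread.
        if sigma > 0 and abs(temps[i] - mu) > threshold * sigma:
            alerts.append(i)
    return alerts

# Made-up engine temperature samples with one sudden spike at index 7.
readings = [600, 602, 599, 601, 600, 603, 601, 650, 602]
print(flag_anomalies(readings))  # [7]
```

A real system would look at many signals at once and would be tuned to avoid false alarms; the point here is only that the alert can be raised from the data stream itself, before a person notices the gauge.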
GMW.cn: Second, within such a large volume of data, much is actually worthless. When making use of Big Data, how should we remove the worthless data so that the valid data can be put to full use?
Mr Kodesh: So this is part of the dirty secret of Big Data. A lot of the data is either worthless or simply wrong. Look at one of the ways that people are using Big Data today in agriculture. You can put sensors in the ground. They’ll tell you how much moisture there is in the ground, and rather than just watering by a fixed amount, you can actually see whether the field is dry or moist enough, and then you can decide how much more water you need to put in. So you rely on sensors to give you this information. But first, a lot of data is wrong: a sensor gets corrupted and gives data that doesn’t make any sense. It says that the moisture is minus 5 percent, and obviously you can’t have a negative number. And as you pointed out, a lot of data in general is interesting but completely irrelevant to making any decisions. So the technologies that use Big Data need to address what we call the cleanup of the data. The first thing you do when you take the data, before you even move it from where it was generated into storage, is clean it up, making sure that the data you put in is useful data. Sometimes you can ignore 5 percent, while sometimes you need to ignore 95 percent of the data because there is too much of it. You don’t want to store worthless data, because these operations cost you lots of money. So there are technologies that allow you to distribute the data and get rid of the worthless parts before you have to store and process them. And so you are absolutely correct: there is a lot of worthless data, which shouldn’t be stored. There is no value in using it. A lot of companies are trying to get involved in the Big Data revolution. Many people assume that all they need to do is just take every piece of data they can create. That isn’t going to be economically useful because there is too much data. So you have to understand what you need.
To find solutions to something on the basis of Big Data, you want to get rid of the data that is not relevant and focus on the data that is.
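Editor's illustration: the cleanup step described in this answer, dropping impossible sensor values before they are stored, might look like the following Python sketch. It is hedged and illustrative only: the function name `clean_moisture_readings`, the 0 to 100 percent bounds and the sample values are assumptions made for the example, not details given in the interview.

```python
import math
from typing import Iterable, Iterator

def clean_moisture_readings(readings: Iterable[float],
                            low: float = 0.0,
                            high: float = 100.0) -> Iterator[float]:
    """Yield only physically plausible soil-moisture percentages.

    Hypothetical filter: missing values (NaN) and values outside the
    [low, high] range are discarded before anything is stored.
    """
    for value in readings:
        if math.isnan(value):
            continue  # sensor returned no usable number
        if low <= value <= high:
            yield value

# Made-up raw feed, including the "minus 5 percent" case from the interview.
raw = [23.5, -5.0, 41.2, 250.0, float("nan"), 38.0]
cleaned = list(clean_moisture_readings(raw))
print(cleaned)  # [23.5, 41.2, 38.0]
```

Because the filter is a generator, it can sit between the sensor feed and storage and discard bad readings as they arrive, which matches the idea of cleaning the data before moving it.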
GMW.cn: Third, Big Data processing methods are at present becoming more and more modular. Will it be easier and simpler for us to deal with Big Data in the future?
Mr Kodesh: The answer is yes, and it is not just the modularization. If you look at the number of new tools that are being developed to address Big Data, it’s amazing how much progress has been made over the last, let’s say, two or three years. The interesting thing about Big Data is that the name actually comes from two elements. For one thing, there is a lot of it; that’s why we call it Big Data. For the second thing, it comes in a completely unstructured way. It used to be that computer systems could only handle data if it came in a certain format, and if it didn’t come in that format, you were required to reformat it. For instance, as I said, a gaming company is starting to look at Twitter patterns and weather predictions, which are not necessarily part of the data that the game creates, right? So they would like to look at the new type of data in the context of their old solutions, and this would be called unstructured data. So it’s important to be able to handle unstructured data, and what you see today is innovation in the way those pieces of data are stored. It used to be that people developed databases, where you store data. Since companies are now dealing with unstructured data, they have developed new technologies like Hadoop that allow them to store the data in a completely unstructured way. So this is one thing.
The second thing is that, because there is a lot of data, it doesn’t make any sense to take a mainframe or a supercomputer and go over all those pieces of data. What you do is break the problem into sets of smaller problems, and you address each one of those problems with much more efficient computing schemes. This is what people are doing with cloud computing. The fact is that if you go to a Google data center, you don’t see a mainframe; you see a lot of, for example, the same boards that you have in your own PC, in your own laptop.
So the ability to break the problems into sets of small problems and solve each one of the problems and then combine the results is what is being developed now. And the third one is to bring it all together to use people’s experiences from decades ago, to allow them to look at Big Data problems and not to require them to learn a new language, which is another part of the solution. Even though the data is stored differently today, you can still use the old technologies, and there was lots of work that was done in order to combine the new types of technologies with the old types of technologies.
So you can use all the computer scientists that graduated 5 years ago and 10 years ago and make them part of the workforce working on Big Data. So the long answer to your short question is absolutely. I mean that you are going to see the ability to deal with Big Data not only becoming easier and easier but also coming down to the point where small companies can use Big Data to their advantage. So...yes, everybody is able to use that.
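Editor's illustration: the "break the problem into smaller problems, solve each one, then combine the results" idea from this answer can be sketched in Python as follows. This is a toy sketch, not Hadoop and not any vendor's implementation. The helper names `split`, `count_purchases` and `combine`, and the purchase list, are hypothetical, and worker threads on one machine stand in for the many inexpensive machines in a data center.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(records, chunk_size):
    """Break one big list into smaller, independent chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def count_purchases(chunk):
    """Solve the small problem: count the items within a single chunk."""
    return Counter(chunk)

def combine(partials):
    """Merge the per-chunk counts into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Made-up virtual-goods purchases, echoing the umbrella example.
purchases = ["umbrella", "hat", "umbrella", "boots", "hat", "umbrella"]
chunks = split(purchases, chunk_size=2)

# Each chunk is handled by a separate worker; threads are an assumption
# standing in for separate machines.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_purchases, chunks))

totals = combine(partial_counts)
print(totals["umbrella"])  # 3
```

The design choice worth noting is that each chunk is counted without knowing anything about the others, so adding more workers, or more machines, does not change the code that solves the small problem.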
GMW.cn: For the fourth question: what are the latest developments in Big Data in the United States? You may comment on the technology, the market, and your own experience in management and operations.
Mr Kodesh: You know, I think that there is a little bit of craziness going on with Big Data today. Everybody is talking about Big Data. There are a lot of visions and great plans to take advantage of Big Data. But in order to do that, certain things, as you said, have to happen. First, Big Data usually requires big computing. The question is how you build big computing in a way that is much more effective than, let’s say, the computers that we built in the seventies. Today the way to build big computing is to use cloud computing. Cloud computing allows you to gather an infinite amount of processing power with very cheap, affordable models. You can connect all those multiple computers either in the same data center or in many data centers in many places. You can take as much processing power as you want, and when you don’t need it you can put it back where it was. But at the same time it relies on the operation of those cloud centers. That is very interesting, because all of a sudden you have to deal with hundreds and thousands of distributed machines, so there is a whole set of technologies being developed to make sure the operation is smooth.
Because everybody is trying to rely on this thing, you have to make sure that there is always enough processing power. It is almost like wanting to make sure that your utility company, especially on a warm day like today, has enough kilowatts, enough energy, to power all the air-conditioning systems in Beijing. You don’t want to be in a situation where something fails and nobody can turn the air conditioners on. It’s the same thing with computing. You have utility companies providing you with power, water and energy, and now you also have utility companies providing computing power. So we start to see a separation of the users of that computing power from the producers. It used to be that if a company wanted to run software, it had to go and buy tons of servers. Today, companies don’t necessarily buy all the servers they need. They can rent them, or they can rent virtual servers from China Telecom or China Mobile or any of the companies that have been providing communication services and are now starting to provide computing services. So that’s the operation side.
So we talked about the modularity. We talked about the ability to make sure that there is no failure. The third one is protecting the data. You want to make sure that the data doesn’t start growing legs and end up with people who are not authorized to have it. Everybody is talking about hacking into websites. There is a big concern when you go to the website of a company and, all of a sudden, you see that somebody has hacked into it. Just imagine what it would be like if somebody hacked into the Beijing power grid and, all of a sudden, you know, they could actually take the power grid down. So you want to make sure that those things, even though they are distributed, are very well protected. Now, some people would say that we shouldn’t expose all those things or we shouldn’t connect all those things, because people may take advantage of that. But the reality is that you can expose them if you have the right defense for the data. You don’t want anybody to turn off a jet engine, but you still need the jet engine, and by using Big Data we can optimize its operation. So there are many technologies being built, but there is still a need for many other types of technologies, which will probably make venture capitalists in China, the U.S. and Europe very happy. But we are just at the beginning.
GMW.cn: Okay, as you have mentioned the safety of our data, could you please give us more suggestions on data protection?
Mr Kodesh: You know, there are all kinds of ways to compromise data. There is denial of service, in which a hacker causes the server to be too busy, so all of a sudden the server is not available to you even though you are sitting there. They just give it some random work to do, and that’s called a denial of service. There is the issue of someone building a website that looks like a normal, legitimate website; they take your credit card and password and compromise you. And there is definitely a way to break into a system and either steal your credit card or, even worse, steal your ability to connect to everything.
All those things require protection. Unfortunately, when it comes down to the safety of the data, you actually have a war going on, and people are not aware of that. There are good guys trying to protect the data and there are the hackers, the bad people, who are trying to come up with ways to break into the data, and the bad people are very, very smart. So it’s really a never-ending problem: you have to make sure that the technologies you have are in step with the technologies that hackers and other people who would like to break in are building, and you have to be one step ahead of them. Over the last five years we have seen, again, the use of Big Data in order to defend Big Data. You try to understand whether something is going wrong in a system even before you see the results of bad things happening. You want to make sure that you understand how the system is getting compromised, and there are many new ways to actually break into a computer system. It used to be that the virus was the only thing, but unfortunately now there are many types.
You can see from the investment in safety software, you know, defense software, cyber security and so on, that this is a very hot area where a lot of technologies are being built all the time. At the end of the day, when you measure the risk of having data compromised against the usefulness and benefit of having data where it can be analyzed (and we’ll talk about that in a second), I think the bottom line is net positive. You just have to make sure that you protect the information. Mobile banking is a classic example. It’s actually very frightening, because you may be concerned that people will break in and take your money. But at the same time, would you be able to function without mobile banking? The answer today is that it’s too late; you can’t do that. You don’t want to have an accident on the road, but you don’t want to give up driving either. So at the end of the day the industry is building a lot of good tools to make sure that data is not compromised and that security is maintained.
GMW.cn: Okay, thank you. You have mentioned that Big Data has been applied to human health. What is your opinion on developments in this field?
Mr Kodesh: This is probably the most exciting part, because people talk about data and it sounds very technical and very computer sciency. But at the end of the day I do believe that Big Data can save people’s lives. I’ll just give you an example. If you look at how cancer is treated, usually there are protocols and procedures that were developed with averages: this is what we need to do for a patient. Today, if you can take a patient’s blood test, which is available to you, and a patient’s protein or DNA mapping, which is also available to you, you can actually come up with a procedure that is individual. So if I have two friends who have cancer, the treatments for them may be completely different based on their own data, based on what blood types they have and what diseases they have had in the past. So the ability to optimize medication based on a specific patient’s data is very exciting. This is a massive amount of data, because you have millions of people who get treated. But the ability to use maybe even a thousand, maybe ten thousand, computers for 15 minutes to analyze all this information and then come back and say “you know, usually the way you would do it is this, but this patient has a different problem, so you are going to do it that way” is very exciting. In my hometown, Seattle, a lot of work has been done on exactly that type of solution. I think that, give it two or three years, you’ll see everything, from DNA mapping to blood tests to specific medication, tailored for you and not for somebody else who has the same illness, because they may not have the same properties. I think this is probably the most exciting use of Big Data.
By Zhao Gang & Zheng Yi, GMW.cn (Guangming Online)
Sincere thanks to our cameraman Liu Lian & Wang Enhui
President of Red Loop Media & CEO of Nurego
Mr. Harel Kodesh currently serves as the Chief Executive Officer of Nurego Inc., a company dedicated to creating a new type of Cloud Business Management System. Previously, Mr. Kodesh managed the Cloud Infrastructure Business at EMC and served as the CEO of Mozy, a wholly owned subsidiary of EMC dedicated to Backup as a Service.
Consultant producer: Zeng Fanhua
Designer: Zhou Yueqin
Art Editor: Li Wenfeng
Executive Editor: Zeng Fanhua