Saturday, November 22, 2014

A Tale of Crowdsourcing and Diversity




Crowdsourcing has been a hot topic recently, and I think it is a promising paradigm for solving many problems. Diversity is an important phenomenon in complex systems such as human society, and I am very interested in understanding its nature. Some of my previous blog posts, such as The Amazing Diversity, are dedicated to this topic. In addition, our recent paper on a vulnerability disclosure program was inspired by these two keywords. But how are the two concepts connected?

Over the summer, I read a famous Chinese wuxia novel called Ode to Gallantry (侠客行), and I found that its tale serves as a perfect example for understanding crowdsourcing and diversity. I will very briefly introduce the story here; please stop reading if you don't want to see the spoiler.


Sometime in ancient China, kung fu masters are kidnapped to a mysterious island every ten years by mysterious strangers, and these masters never return. It turns out that two top kung fu masters have obtained an old martial arts book written in undecipherable text. They have tried hard to understand the meaning of the book but failed, so they decide to invite (or kidnap) other kung fu masters and ask them to decipher it (crowdsourcing). The masters, unwilling to travel to the remote island at first, are soon captivated by the book and concentrate entirely on the deciphering task. Yet many years pass and still no one has figured it out.

It is no surprise that the protagonist of the story solves the problem. His unique advantages are:
  • He is illiterate, and therefore tends to read the writing as pictures.
  • He has seen a picture-based kung fu manual before, which further guides him to interpret the text as pictures.
  • He had little prior knowledge of kung fu, and therefore carries few prejudices and biases (the Einstellung effect).
The protagonist then learns the supreme kung fu in the book and becomes invincible.


We can see that the initial crowdsourcing effort fails because of a shared bias, and that this bias is overcome by increasing the diversity of the crowd. The protagonist is drastically different from everyone else and thus significantly increases the diversity of the pool. This is one reason why diversity is important to crowdsourcing.

In general, crowdsourcing is still in its infancy, we are still exploring the meaning of diversity, and many questions remain to be answered.









Sunday, November 2, 2014

On Academic Presentation



(v0.1)

I will present our paper at a CCS workshop next Friday, and then present my thesis proposal in my comprehensive exam the Friday after. Facing these two important occasions, I decided to summarize my current understanding of presentations. This is NOT a collection of advice, because I am far from a good academic speaker. I simply hope this article may spark some discussion and help you think about what leads to a good academic presentation.

Here I have several points to share:

(1) A clear story flow is the top priority. Flow holds the attention of the audience. As others have said [1], flow matters even more in a talk than in a paper, because the audio channel is more brittle. A good flow also helps the presenter remember what to say.

I think there are at least two types of flows:
  • The logical flow of the research. The audience should see the natural transitions between research steps; then they will appreciate the work.
  • The knowledge flow. Introduce enough background before going into details, and make sure the terminology is understandable.
In addition, try to have only one story line. A research project usually branches in several directions, but presenting every branch interrupts the flow and confuses the listeners.

(2) A presentation is a process of convincing others. Listeners will be convinced if the study is rigorous and the language is precise. Do not overclaim.

(3) Make the presentation tight. Connect things together and refer back to earlier important points. This enriches the structure of the presentation, and people enjoy that kind of complexity. Similar strategies are often used in movies; Lock, Stock and Two Smoking Barrels is a perfect example.

(4) A presentation is also a form of teaching. Think about what the audience will learn from it.

(5) We each have our own presentation style, and it is generally hard to copy someone else's. For example, native English speakers can deliver jokes and funny pictures (e.g. the one used in this blog), which are sometimes hard for non-native speakers even to understand, let alone deliver. Nonetheless, even without these funny elements one can still give a good talk. I sometimes think too much "fun" actually has negative effects, i.e., "amusing ourselves to death".

Here are the general steps I take when preparing a presentation. Please feel free to comment on them and share your own opinions:

(1). Have a rough story line first.

(2). Turn the story line into slides. Focus first on the completeness of the information.

(3). Practice lightly and then update the slides. Don't expect them to be perfect at this stage. Also look at similar talks to "steal" good presentation ideas.

(4). Write a script for every slide, or at least an outline for each. You don't need to read them aloud, but they will remind you of the story line. Written text is also easier to study and improve.

(5). Practice seriously.

(6). Present to others. Your adviser or research collaborators are the best choice: they know your research, but they are not trapped in a myriad of details like you, so they can give very good suggestions for improving the story line! People with enough background knowledge (e.g. your lab mates) are also good; they can tell you which parts are unclear or confusing. Also, try to collect creative presentation ideas from others.

(7). Improve slides, practice, improve slides, ....

In general, you will feel unconfident and uncomfortable in the beginning, because the quality of your taste is always ahead of the quality of your work [2]. However, as long as you keep improving, the final version will be very good. Furthermore, after thorough preparation you will not only give a great talk, but may also find new research ideas!





References:

[1] 博士五年总结(三) (Five-Year PhD Summary, Part 3; in Chinese), http://blog.sina.com.cn/s/blog_946b64360101dych.html

[2] Ira Glass on Storytelling, http://vimeo.com/24715531

[3] The picture. http://assets.diylol.com/hfs/ae1/38e/525/resized/business-cat-meme-generator-boss-wished-me-luck-on-the-presentation-like-i-need-it-52c717.jpg

Saturday, October 11, 2014

English Name or Not?




(v0.1)

For a Chinese student in America, an important question is: should I choose an English (first) name? Those who are against the idea usually make the following points:
  • The original name defines your identity.
  • You should respect the original name because it is given by your parents.
  • If I am good enough, others will pronounce and remember my name correctly anyway. 
Some of my American, Indian, and Chinese friends hold these views. Some time ago, I also watched a YouTube video in which an American student advocates them to some Taiwanese students.

Other people, such as Philip Guo [2], support the idea of choosing an English name when moving to an English-speaking country.

Here is my opinion: although I currently do not have an English name, I agree that Chinese students (and possibly other East Asian students) studying in America should find an English (first) name. Obviously, an English name is easier for both natives and students from other countries to pronounce and remember. An English name also indicates the person's gender, which is more convenient in some situations. I guess most Indian students don't choose an English name because their original names are relatively easy to pronounce and already indicate gender, at least in my experience. After all, English and Hindi both belong to the Indo-European language family.

Furthermore, I disagree with the three points against finding an English name. To refute them, we can look in the opposite direction: what did Westerners do when they were in China? During the Age of Discovery, many Jesuit priests came to China and played an important role in communication between the civilizations. These priests all used Chinese names, such as 利玛窦 (Matteo Ricci, the man in the above figure), 汤若望 (Johann Adam Schall von Bell), and 郎世宁 (Giuseppe Castiglione), names still known by many Chinese today.

Also, having a second name is actually part of traditional Chinese culture. Ancient Chinese people used their style name (字) rather than their given name in daily life, and it was actually impolite to address someone by their given name. It is not a bad idea to think of an English name as a style name.


References

[1] The picture, http://www.faculty.fairfield.edu/jmac/sj/scientists/riccimap.gif

[2] http://www.pgbovine.net/choosing-english-name.htm

Saturday, September 20, 2014

A Quick Analysis of Facebook Bug Bounty Program




(v2, updated 10/15/2014)

Nowadays, Web companies rely on vulnerability reward programs (VRPs, also called bug bounty programs) to discover vulnerabilities in their products. Basically, a white hat (a good hacker) submits a vulnerability report and gets some money in return. We have written a preliminary paper analyzing a related program called Wooyun; please take a look if you are interested in this new paradigm for improving security.

Facebook is one of the companies that embrace this idea, although it is only sometimes generous (see this and this), :P. Facebook also hides information about which vulnerabilities have been discovered and the details of each white hat's accomplishments (e.g. how many vulnerabilities one has found, and when). It only provides a yearly list of white hats who have contributed to Facebook security, at this page.

Anyway, we can start with this page and do some quick analysis. The data was obtained on 9/20/2014. First, there are 670 names on the list (in the several cases where multiple names appear on one line separated by commas, we count each name separately). Quite a lot, isn't it? But some enthusiastic white hats may have contributed every year and left their names multiple times, so we also count the number of unique names, which is 516.
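(For the curious, here is a minimal Python sketch of the counting step. It assumes the names have already been scraped from the page into per-year lists; the names below are illustrative placeholders, not the real data.)

    # Count total and unique names, assuming one list of names per year.
    # Some lines on the page hold several comma-separated names, so we
    # split those first; the example data here is purely illustrative.
    def parse_line(line):
        return [name.strip() for name in line.split(",") if name.strip()]

    names_by_year = {
        2014: parse_line("Alice Example, Bob Example"),
        2013: parse_line("Alice Example") + parse_line("Carol Example"),
        # ... one entry per year section on the page
    }

    all_names = [n for names in names_by_year.values() for n in names]
    print("total names: ", len(all_names))       # 670 on the real data
    print("unique names:", len(set(all_names)))  # 516 on the real data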

Next, we count the number of white hats each year, shown in the following table:

Time                 White Hat Count
2014 (up to 9/20)    191
2013                 255
2012                 126
2011                  55
Prior to 2011         43

We clearly see the trend: more and more players are joining this game, and the number roughly doubles every year:) I guess VRPs really are a promising idea (please see our paper for more discussion).

There is also an interesting fact: a lot of white hats are active in only one year. To show this, we create another table counting the white hats by the number of years they have been active:

Number of Years Active    White Hat Count
1                         402
2                          82
3                          26
4                           5
>= 5                        1
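(Again a small sketch of how this table can be computed, with the same kind of assumed per-year name lists repeated so the snippet runs on its own:)

    # For each white hat, count in how many distinct years the name
    # appears, then build a histogram of those counts.
    from collections import Counter

    names_by_year = {  # illustrative placeholders again
        2014: ["Alice Example", "Bob Example"],
        2013: ["Alice Example", "Carol Example"],
        2012: ["Bob Example"],
    }

    years_active = Counter()
    for names in names_by_year.values():
        for name in set(names):  # ignore duplicates within one year
            years_active[name] += 1

    for n_years, count in sorted(Counter(years_active.values()).items()):
        print(f"active in {n_years} year(s): {count} white hats")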

So far, 402 white hats have appeared in only one year's thank-you list, and we can see that the distribution is highly skewed: far fewer white hats stay active for more than one year, and only one person has been thanked every single year! This suggests that the value of this kind of VRP lies not only in a few experts but also in a large number of one-time contributors. But since we don't know how many vulnerabilities each white hat contributed or how severe they were, a firm conclusion is hard to draw. Still, the observation is consistent with what we claim in our paper.

You might wonder who the "every year" person is, and the answer is: Szymon Gruszecki. You can access his personal page here.

Please feel free to discuss by leaving a message. Thank you for your time!


Update:

Facebook has released some interesting statistics of its bounty program here:
https://www.facebook.com/notes/facebook-bug-bounty/bug-bounty-highlights-and-updates/818902394790655

Some interesting points:

  • From the statistics we see a huge number of invalid reports; the valid rate is only 4.7%. Why?
  • It says that "One of the most encouraging trends we've observed is that repeat submitters usually improve over time. It's not uncommon for a researcher who has submitted non-security or low-severity issues to later find valuable bugs that lead to higher rewards." Actually, we plan to investigate this issue further in our data set.
  • The country ranking: Russia -> India -> USA -> Brazil -> UK








Saturday, August 23, 2014

Bugs and Patches for Papers



In an earlier article, Writing Like Compiling, I made some connections between programs and papers. This article draws a connection from a different perspective.

Publications in Computer Science (and possibly other domains) have a problem: a paper can contain bugs, such as errors, unclear sentences, and missing background. These bugs may be caused by the knowledge gap between the authors and the readers, or they may simply arise from the conference-driven publication paradigm, which puts researchers in a fast race and leaves them less time to ponder and polish their work [2]. These bugs afflict readers' minds and eat up their time. Some smart readers may find ways to fix them, just as an advanced user finds a bug in a program and writes a patch for it. However, since there is no good way to share the fix, and a paper is usually frozen after the camera-ready version, these paper patches stay in one printed copy as red marks...

Programs, too, are not perfect after release. However, software developers and users constantly discover new bugs and apply corresponding patches, and this model generally works well; after all, there seems to be no alternative. Therefore, I think we should treat papers like programs and create ways to report bugs and share patches. A first step is to store these paper bugs and patches in a database and let readers search for them, as sketched below. To avoid detachment between the papers and the patches, we could also allow energetic readers to fork a paper and make version 2.0 of it. This ability to fork is very common in the open-source software community [3].
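(As a very rough illustration, a shared paper-patch record might look like the following Python sketch; all field names are hypothetical.)

    # A minimal sketch of a searchable paper-patch record.
    from dataclasses import dataclass

    @dataclass
    class PaperPatch:
        paper_doi: str    # which paper the patch applies to
        location: str     # e.g. "Section 3.2, paragraph 2"
        bug: str          # what is wrong or unclear
        fix: str          # the proposed patched text
        author: str       # who contributed the patch

    patches = [
        PaperPatch("10.1000/example.doi", "Section 2, Eq. (3)",
                   "missing normalization term", "divide by N", "a reader"),
    ]

    # the simplest possible "search": keyword match on the bug text
    print([p.location for p in patches if "normalization" in p.bug])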

In general, isn't it a bit ironic that Computer Science, the field that aims to digitize paper-based information, still records its cutting-edge findings in papers?


References

[1] The picture. http://www.rebeccaheflin.com/wordpress/wp-content/uploads/2013/08/rejected-writing.jpg

[2] Fortnow, Lance. "Viewpoint Time for computer science to grow up." Communications of the ACM 52.8 (2009): 33-35.

[3] Raymond, Eric. "The cathedral and the bazaar." Knowledge, Technology & Policy 12.3 (1999): 23-49.

Saturday, July 12, 2014

Two Downsides of Privacy



v0.2

These days everyone is talking about privacy, partly thanks to Mr. Snowden's efforts. While I definitely support the individual right to privacy, I want to discuss two of its downsides here. Again, I strongly agree that each individual should have full control of his or her information; I just think that sometimes we might want to trade privacy for more important things.

The first downside is that privacy can lead to distrust. An often-used example in support of privacy: you got drunk one night and shared a photo of the drinking on Facebook. Your friends might like it, but your boss doesn't. So people have been designing advanced access-control technologies to keep your boss away from your "little secrets". Some people even avoid Facebook altogether, given that some companies demand employees' Facebook passwords [2]. However, such personal disclosure can help others understand you better and thus strengthen the relationship. Conversely, a person who cannot be traced at all on the Internet is a mystery in others' eyes. Will you trust a mysterious figure?

The second downside is that privacy can consign a person to oblivion. From East to West, from past to present, an eternal pursuit of mankind has been immortality. At the least, a person wants to leave something behind after death, which in the past only a small group of people could achieve. The digital age enables a kind of immortality for everyone, in the sense that their words and activities on the public Internet can be recorded and kept almost forever. A search engine can retrieve one's words and return them to somebody in the future; we can imagine this as a conversation between the dead and the living. Privacy, however, would keep this information secret to only a few people, or even one person, through technologies like encryption. If that person happens to pass away and no one else knows the password, his words are gone with him. What a pity if an Einstein II encrypted his remarkable theory and then died in an accident. Wouldn't it be better to publish it on a blog, like this one?

Update: we all feel deeply sad about the MH17 tragedy. It has been reported that more than 100 AIDS researchers were on board. I hope we can rescue as many of their ideas and thoughts as possible.



References:

[1] The picture. http://blog.static.abine.com/blog/wp-content/uploads/2011/10/privacy.jpg?e835a1

[2] http://www.usatoday.com/story/money/business/2014/01/10/facebook-passwords-employers/4327739/

Friday, June 27, 2014

Heartbleed and the Paradox of Security Professionals


The recent Heartbleed vulnerability in OpenSSL shook the whole Internet. Yet the incident might not be a total surprise: before it, OpenSSL had only one full-time employee and received about $2,000 a year in donations [1]. Such support is by no means enough for the developers to produce high-quality code and test the software comprehensively. I suspect even the hackers who keep searching for vulnerabilities in OpenSSL have far more funding.

But the situation has changed now that tech giants have agreed to fund OpenSSL with at least $3.9 million over three years. Even a Chinese phone company, Smartisan, announced a donation of 1 million yuan ($160,000) [2]. You might already sense the strangeness of this event: a mistake brings in millions of dollars!

The more astonishing thought is that if the OpenSSL developers had done a better job and never introduced the vulnerability, they would still be starving in poverty! This paradox applies not only to OpenSSL but probably to every company that needs security. If a company's security team is doing a good job, the CEO may feel the team is redundant because nothing bad ever happens. The CEO may not be so dumb as to fire the security team, but he or she may fail to appreciate its effort and never raise its salary. Thus the security team has hardly any incentive to do better; rather, it may aim merely to meet the minimum requirements, or even tolerate a few security incidents to attract the attention of company managers.

Do you know how to break this paradox?




References:

[1] Tech giants, chastened by Heartbleed, finally agree to fund OpenSSL. http://arstechnica.com/information-technology/2014/04/tech-giants-chastened-by-heartbleed-finally-agree-to-fund-openssl/

[2] http://www.ithome.com/html/android/86232.htm

[3] The picture. http://www.paradoxproductions.com/pics/tritwo.gif

Tuesday, May 20, 2014

Research vs. Learning



A PhD student has two roles: a research assistant who strives to produce good research papers, and a learner who keeps improving. Even after obtaining the PhD, a researcher still needs to learn, perhaps for a lifetime. Research results are explicit while learning results are implicit; that is why some students and faculty (e.g. [2]) neglect the latter role. I would like to use a simple model to show that such neglect leads to inefficiency.

Let's model each research attempt as a random sample from a normal distribution representing the capability of the PhD student, where a higher sample value corresponds to higher-quality research. No matter how many samples are drawn, their average stays near the mean of the distribution, and only a few of them will have high values.

Now, students A and B both start with a normal distribution with mean = 1 and variance = 3, and we further assume that a sample x > 5 means a very good paper while x < 0 means a failed research project.

Now, A decides to focus solely on research, so A keeps sampling from this distribution. However, the probability of producing a good paper is only 1.0%, so roughly 100 trials yield one good paper. And the chance of failure is 28.2%, so more than a quarter of the trials fail.

On the other hand, B focuses on learning, which pushes the mean of the distribution to 3.

For B, the probability of producing a good paper is 12.4%, an order of magnitude better than A's. Moreover, the probability of failure drops to 4.2%.
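(These numbers are easy to verify; here is a quick sketch using scipy. Note that scipy's scale parameter is the standard deviation, hence the square root.)

    # Verify the model numbers: N(mean, variance = 3), good paper if
    # x > 5, failed project if x < 0.
    from math import sqrt
    from scipy.stats import norm

    for label, mean in [("A (mean = 1)", 1), ("B (mean = 3)", 3)]:
        dist = norm(loc=mean, scale=sqrt(3))
        print(label,
              "good paper: {:.1%},".format(dist.sf(5)),
              "failure: {:.1%}".format(dist.cdf(0)))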

I don't want to draw strong conclusions from such a rough model, but I think you can see the point. I am not saying that a PhD student should focus only on learning without any research responsibility; I personally think a PhD student should definitely spend more time on research than on learning, and doing research is itself a very important way of learning. That is why I put the Taiji graph at the beginning of this article: the two roles boost each other. The point I want to make is that during the journey there will sometimes be stagnant periods in research, and we might feel sad. But we should smile, because as long as we keep learning and keep pushing the mean of our normal distribution to the right, things will be fine:)


References:

[1] The picture. http://www.acuherb.us/image/taiji01.png

[2] http://blog.liyiwei.org/?p=1429

Monday, May 19, 2014

Learn the Upstream

v0.1

It seems that learning the upstream of your research field can be very helpful to your research. This claim assumes that a field has an upstream field, i.e., that fields are organized in a hierarchical structure, and I guess most people would agree. For example, the upstream field of Computer Science is mainly Mathematics: it provides the language, tools, and theorems on which Computer Science is built, and many people (e.g. [2]) believe that a prerequisite of a good computer scientist is solid knowledge of Mathematics. This article extends this point to other fields through several examples.

Most subfields of computer security are more or less rooted in cryptography, which is without doubt the earliest and most rigorous subfield of computer security. This explains why several famous security researchers, such as Ross Anderson and Bruce Schneier, started their careers in cryptography and then "invaded" many other subfields. Andrew Yao may be another example: he switched from Physics to Computer Science and won the Turing Award.

More examples can be found. Yin Wang has criticized many important artifacts of computer science, such as SQL, Unix, and the Go language. While the validity of his criticisms is always in debate, I think they do have some value, and I have come to realize it is because Yin Wang comes from the programming languages field, which is more or less the upstream of many other computer science subfields, such as databases and operating systems. My roommate is another interesting example: once a Math undergraduate, he has now switched to Deep Learning. Compared with researchers from a CS background, his knowledge of Math helps him understand the problems more deeply.

People always say that jumping out of the box is the way to creativity. Well, I guess learning the upstream is one way out of the box. Isn't it?




References:

[1] The picture. http://www.nolandalla.com/wp-content/uploads/2014/02/salmon.jpg

[2] How to do Research At the MIT AI Lab. David Chapman. 1988

Saturday, April 19, 2014

Technological Niche

v0.1


Facebook's purchase of Oculus VR for 2 billion dollars is still quite shocking, particularly because the company is only two years old and its founder is only 21. For people more familiar with the technology, there are two more reasons to be shocked. One, the legendary graphics programmer John Carmack joined the company. And two, Virtual Reality (VR), a once-hot technology, was considered a mistake and pretty much dead.

I still remember that when I was an intern at Microsoft Research Asia in 2011, I attended a talk by Professor James A. Landay. During the Q&A session, a student asked Dr. Landay about the prospects of VR, and he said something like "many people believe VR is a mistake, and augmented reality is the right way". Indeed, in the first VR fad of the late 1990s and early 2000s, companies spent a lot of money on VR but the returns were frustrating [1]. However, the founder of Oculus VR simply loved VR all along and collected most of the available VR equipment. The love gave him motivation and the collection gave him inspiration. I guess most of us have reason to regret the unique interests we abandoned in order to cater to others, and the large amounts of money we lost along the way...

Let's wipe our tears and move on. Beyond the lesson of perseverance, there are other interesting things to find here. I would say a unique thing is never a mistake; every unique thing has a position and purpose in the world. This is why I am able to write this article and you are able to read it. We know that humans are mammals, but 65 million years ago the lords of the earth, the dinosaurs, hardly noticed mammals, which at the time were just tiny creatures eating bugs to survive. An alien visiting earth at that point might have called mammals "a mistake" compared with dinosaurs. We all know the story afterwards: the dinosaurs were wiped out in a disaster and mammals now run the world. We could say that mammals won this round, but the ultimate winner is Nature, who maintains the diversity of the ecosystem to survive disasters.

The same principle applies to technology. VR was not the mainstream technology of the past 20 years and may not be the mainstream technology of the next 20; Oculus VR itself might fail. However, VR deserves to exist, because it might play a critical role in the future. Every unique technology fits a technological niche and is prepared for its day.



References:

[1] http://time.com/39577/facebook-oculus-vr-inside-story/

[2] The picture. http://media.pcgamer.com/files/2013/04/Eve-Oculus-RIft.jpg

Sunday, April 13, 2014

Resume an Old Project

v0.2


It is common to resume an old project (a research project, a programming project, etc.) after interruptions (e.g. a vacation) or while multitasking. Multitasking in particular is necessary for graduate students and even more so for professors. The challenge is that we usually forget the details of the project, or even the motivation for doing it, so we struggle to regain what we once knew. How to reduce the difficulty of resuming an old project is what we will discuss today.

First, I have found it useful to have a warm-up period at the beginning. To recall the progress of an old research project, we can warm up by reading a related paper. To continue a suspended coding project, we can warm up by adding some comments to the source code and refactoring it a little. In general, go slowly at the beginning and do not hurry. After the warm-up and a good sleep, your dormant memories of the project will start to revive and you will have more confidence to continue.

Another important habit is to take notes during a project. Write down as much detail as possible; it can save a lot of time when we want to pick up where we left off. I find a simple daily note especially useful.

Finally, before suspending a project, we should consider how we or somebody else can pick it up in the future. One tip is to make our work as automated as possible, so the future person can run it quickly. For example, creating a script for the experiment or data-analysis tasks lets a future person reproduce what we've done in one command. It gives that person an instant feeling of accomplishment, and it is much easier than having to study a lot of material before running anything.
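For instance, a tiny top-level script can chain the steps so that a single command reproduces everything. Here is a minimal sketch; the step scripts named in it are hypothetical placeholders:

    # run_all.py -- one-command entry point for resuming the project.
    # The step scripts listed below are hypothetical placeholders.
    import subprocess

    STEPS = [
        ["python", "collect_data.py"],   # regenerate the raw data
        ["python", "analyze.py"],        # rerun the analysis
        ["python", "make_figures.py"],   # rebuild the figures
    ]

    for step in STEPS:
        print("running:", " ".join(step))
        subprocess.run(step, check=True)  # abort if a step fails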


Monday, April 7, 2014

Writing Like Compiling

v0.1


There have been suggestions to write programs like articles [2], but I am thinking in the opposite direction: could we write articles like programs? There is an interesting article that makes this analogy for novels [3]; as a grinding PhD student, I am more interested in applying it to academic paper writing. In this article, I would like to discuss the connection between writing an academic paper and compiling a program.

When a programmer has written some source code, it must be compiled into a machine-understandable format by a program called a compiler. Writing a paper is similar: you translate the thoughts in your mind into a form understandable to others. A compiler typically processes the source code in many passes, transforming it step by step toward the final form, with each pass focused on a specific task. This architecture simplifies the compiler's design and allows future extensions. When writing a paper, we can do the same thing: first write an awful version, then improve it through multiple passes, each focused on one goal. Below is a simple example:

  1. Make sure that the paper does not miss any important information.
  2. Make sure that the story line of the whole paper makes sense.
  3. Make sure that the core concepts are correctly defined.
  4. Make sure that terminologies are consistent and sentences are correct.
  5. Improve the paper line by line to make it readable (it may still contain a lot of redundant information).
  6. Revise the paper by removing redundant information.
  7. ...
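(To make the analogy concrete, here is a toy Python sketch of the pass architecture. Each pass is a small single-goal function; the particular passes shown are only illustrations.)

    # A draft flows through an ordered list of single-purpose passes,
    # just as source code flows through compiler passes.
    def unify_terminology(text):
        # pass 4: keep one consistent term for each concept
        return text.replace("bug bounty program", "VRP")

    def squeeze_whitespace(text):
        # part of pass 5: line-level cleanup
        return " ".join(text.split())

    PASSES = [unify_terminology, squeeze_whitespace]

    draft = "Our  VRP data ...  the bug bounty program rewards ..."
    for p in PASSES:
        draft = p(draft)
    print(draft)  # "Our VRP data ... the VRP rewards ..."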
This method certainly cannot guarantee a good paper; after all, the quality of the paper is determined by the quality of the research. However, it can at least reduce the writer's anxiety. When looking at a ghastly draft, they won't panic or feel overwhelmed. They can simply start from pass 1:)



References:

[1]. Picture. http://uploads3.wikipaintings.org/images/m-c-escher/drawing-hands.jpg

[2]. Literate programming. http://en.wikipedia.org/wiki/Literate_programming

[3]. 金庸笔下的良好代码风格 (Good Code Style in Jin Yong's Novels; in Chinese). http://blog.sina.cn/dpool/blog/s/blog_6a55d6840101ek3y.html

Tuesday, April 1, 2014

Research Idea Forensics




v0.1


When reading a paper, I always wonder how the authors came up with the idea. Knowing this can help us better understand the essence of the paper, and we can also learn how to find good ideas by studying our predecessors' paths. Finding a good idea is much harder than reading and understanding one; the history of science and technology shows that only a few creative minds were able to propose great ideas.

Since most papers do not describe how the authors got the idea, readers have to figure it out by themselves. We can call this activity research idea forensics; it is much like a detective deducing a criminal's motive and methods. In this article I just share some thoughts on idea forensics; for a more general and comprehensive discussion, [2] may be helpful.

How do we do research idea forensics? A few authors mention the story in some section of the paper, such as the introduction or related work. The reader can also find clues in the paper's citations, since the authors may have been directly inspired by existing work. The authors' publication records define their specialties and ways of thinking, which are usually important factors in generating ideas. The inspiration might also come from industry, because a new technology can turn impractical ideas practical.

Actually, these heuristics do not sound difficult to understand or apply, so researchers in information retrieval, data mining, etc. could even try to build tools that automatically infer the idea-generation process.

However, I do find some works whose idea paths are more interesting and unique. Recently, I re-read the following classic paper:

A Sense of Self for Unix Processes, by Stephanie Forrest et al., 1996

It proposes a way to differentiate, at run time, the intended execution of a process from maliciously injected execution (e.g. shellcode executed through a stack-overflow attack), so that intrusions into a system can be detected. The idea is to use short system-call sequences to build a model of self (i.e. the intended execution of a process), and then apply the model to detect abnormal system-call sequences. The closest prior work uses system calls as building blocks of a policy language that lets users specify what is correct and incorrect; while that is an interesting approach to intrusion detection, humans may not be able to write a comprehensive policy, whereas Forrest's approach is automated. To date, the paper has received 1863 citations on Google Scholar. It particularly inspired later intrusion detection work and, more recently, behavior-based software analysis [3, 4].
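(To make the core idea concrete, here is a simplified Python sketch. It uses fixed-length system-call n-grams as a stand-in for the paper's exact lookahead scheme, and the traces are toy examples rather than real ones.)

    # Build the "self" database from normal traces, then flag windows
    # of the suspect trace that were never seen during normal runs.
    def ngrams(trace, k=3):
        return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

    # toy traces; real ones would come from tracing a process (e.g. strace)
    normal_traces = [
        ["open", "read", "mmap", "read", "close"],
        ["open", "mmap", "read", "close"],
    ]
    self_db = set().union(*(ngrams(t) for t in normal_traces))

    suspect = ["open", "read", "execve", "socket", "close"]
    anomalies = ngrams(suspect) - self_db
    print(len(anomalies), "of", len(ngrams(suspect)), "windows are unseen")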

I am quite curious how the authors discovered this idea, and my current hypothesis is that it is a product of interdisciplinary research. The last sentence of the abstract is:

"This work is part of a research program aimed at building computer security systems that incorporate the mechanisms and algorithms used by natural immune systems."

It seems that this motivation pushed the authors to think about how to create an immune system for a computer system. Many concepts from the immune system might be useful in computer security, and the authors seem to focus on phagocytes, cells that "eat" foreign particles in the body. Phagocytes use chemical cues to identify foreigners, so to implement them in a computer system you need to find the corresponding "chemical cues". These cues must be simple (i.e. not too time-consuming to check) and effective (e.g. they must not let the bad guys escape or kill the good ones). The system-call trace is a very good candidate, because it captures the critical behaviors of a process and is much smaller than a raw instruction trace.

The lesson: thinking from a new angle can make a difference:)



References
[1] The picture. http://www.teamyeater.com/2011/09/phases-of-computer-forensics/computer-forensics-2/

[2] Where Good Ideas Come From, Steven Johnson

[3] Zhao, Bin, and Peng Liu. "Behavior Decomposition: Aspect-Level Browser Extension Clustering and Its Security Implications." Research in Attacks, Intrusions, and Defenses. Springer Berlin Heidelberg, 2013. 244-264.

[4] Wang, Xinran, et al. "Behavior based software theft detection." Proceedings of the 16th ACM conference on Computer and communications security. ACM, 2009.

[5] Data mining approaches for intrusion detection. Defense Technical Information Center, 2000.

Thursday, March 13, 2014

The Amazing Diversity

v0.2



I have been observing amazing diversity in the class I TA.

At the start of the semester, each team chose two information security topics and presented one of them. There are about 14 teams and 18 topics. Initially I was concerned that if many teams chose the same topics, especially the hot ones, we would have to spend a lot of time adjusting. It turned out that the preferences of the class were so diverse that every team could present what it wanted. I am pretty sure there were no negotiations between teams, so the diversity occurred naturally. Why are students' interests so different? When did the differences emerge in their lives? College? High school? Whatever the origin, this diversity could put them in different positions in the future, which is a very good thing to see.

Another place I observed diversity is the project report. Each team is required to write a vulnerability-analysis report for a fictional company. Although the assignment allows rather little flexibility, I still observed great diversity in the reports: the angles are different, the technologies are different, the formats are different, and so on...

And when I grade quizzes, I tend to see the poor performance of some students as a sign of diversity rather than of laziness or stupidity. Some answers are not wrong, just different from the standard ones, and I do believe the students who gave those different answers are suited for special roles in society.

It might sound crazy, but I do feel the world is beautiful because such diversity exists.

If I have time, I will definitely spend some effort studying diversity.



Wednesday, March 12, 2014

Breaking Bad is Bad



Breaking Bad is one of the most successful TV shows in history. It scored 9.6/10 on IMDb and won 94 awards, including Emmys.

After watching the first two seasons and the beginning of season 3, I have to say I do not know why it is so successful. Is it because bad guys finally get to be the main characters? Is it because drugs are a topic many Americans are interested in? Or do most ordinary people have a hidden impulse to break bad?

But I do know why I don't like it:

1. Many episodes are unnecessary, especially those about Walter's family. You know exactly what is going to happen there: Skyler will find out the truth someday. Yet at least a third of the running time is spent on family conflicts that do not push the main story forward. In other episodes, where Walter and Jesse run the business, the writers keep inserting incidents to make their lives harder, which feels intrusive and makes the show unrealistic.

2. The show carries a negative mood. I am not saying it is negative because it portrays drug dealers; I feel it is negative because it is full of petty arguments and fights, yelling, dirty words, losers, and frustration. It is especially sad to watch Walter gradually change from a decent teacher into a lunatic. I think this leaves some of the audience, including me, uncomfortable after watching.

3. You cannot learn much from it. While amusement is one motivation to watch Breaking Bad, I also want to learn something from it. However, I failed to retain anything valuable, even English. For a non-native speaker, watching TV shows is a good way to learn the language, but Breaking Bad cannot help with that because most of its sentences are short and simple...

I won't watch it anymore; instead, I will resume watching Law & Order. It is a fairly old show, but it reveals a lot about the American legal system and society, and the sophistication of its language is a plus for an English learner.

