Thursday, January 15, 2015

Inflation and StackOverflow.com




(v0.1)

I really enjoy asking and answering questions on StackOverflow.com, which has some exciting features. For example, it is mainly managed by the community, not by administrators. So users on StackOverflow.com contribute not only information (Web 2.0) but also "computation" (Web 3.0?). Another feature is the scoring system. A user earns points based on the votes that her questions and answers receive. This is quite a good incentive for users to offer their knowledge, because the score reflects one's progress and ability and can be used in job interviews. This second feature separates StackOverflow.com from traditional mailing-list-based Q & A.


However, this reward mechanism also comes with an issue, which I call the disadvantage of late members. That is, it is more difficult for a late member to earn the same score as an early member. This is mainly because early questions are in general more important, yet easier to answer, than late questions. For example, user X asks how to sort a dictionary in Python, and user Y easily answers this question. Since this question is quite common among new Python users, we can expect that many users will find this answer and vote it up. Y thus earns a high score through a simple answer. Now, one year later, Z joins StackOverflow.com. Z is much more knowledgeable in Python than Y, but there are no such easy and rewarding questions left for Z. So Z's score might never go beyond Y's.

This issue could hurt participation on such sites. Newcomers could lack a strong incentive to contribute, because they can hardly find questions to answer, and they cannot catch up with the early members anyway.

We can think about how to address this issue. I propose two simple ideas here. The first idea is to reduce the scores of early members over time. However, reducing one's "possessions" sounds bad and might hurt the early members. A slightly different idea is to introduce inflation. That is, the Q & A site should increase the score awarded for an up-vote over time, thus giving more points to new members. Maybe inflation in the economy serves a similar purpose. After some early people have accumulated a large amount of wealth, it would be hard for latecomers to catch up, because the rich can earn better returns simply through safe investments such as government bonds. Inflation, to some degree, could alleviate this issue.
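
The inflation idea can be sketched in a few lines of Python. This is only an illustration: the base score of 10 per up-vote, the 10% yearly rate, and the function name vote_value are assumptions made up for this sketch, not how StackOverflow.com actually works.

```python
def vote_value(years_since_launch, base=10.0, rate=0.10):
    """Score awarded for one up-vote cast `years_since_launch` years after launch.

    `base` and `rate` are hypothetical numbers chosen for illustration only.
    """
    return base * (1.0 + rate) ** years_since_launch


# Y collects 100 up-votes in the site's first year (year 0).
score_y = 100 * vote_value(0)

# Z collects the same 100 up-votes five years later.
score_z = 100 * vote_value(5)

print(score_y, round(score_z, 2))
```

With these made-up numbers, the same 100 votes are worth about 1.6 times as much to Z as they were to Y, and nobody's existing score is ever reduced.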

Actually, the solution StackOverflow.com uses now is to have a recent ranking (e.g. over the past year) alongside the overall ranking. The recent ranking is a fairer playing field for everybody. However, the recent ranking is not as stable as the overall ranking, so I think the inflation idea still makes sense. Of course, StackOverflow.com can complement the recent ranking with permanent badges (e.g. a "Top 1 in one month" badge).

A related article:

Why I no longer contribute to Stackoverflow
http://michael.richter.name/blogs/why-i-no-longer-contribute-to-stackoverflow/


Saturday, November 22, 2014

A Tale of Crowdsourcing and Diversity




Crowdsourcing has been a hot topic recently, and I think it is a promising paradigm for solving a lot of problems. Diversity is an important phenomenon in complex systems such as human society, and I am very interested in understanding its nature. Some of my previous blog posts, such as The Amazing Diversity, are dedicated to this topic. In addition, our recent paper on a vulnerability disclosure program was also inspired by these two keywords. But how are these two concepts connected?

This summer I read a famous Chinese Wuxia novel called Ode to Gallantry (侠客行). I found that the tale in the book serves as a perfect example for understanding crowdsourcing and diversity. I will briefly introduce the story here, so please stop reading if you don't want to see spoilers.


Sometime in ancient China, every 10 years many kung fu masters are kidnapped and taken to a mysterious island by some mysterious people. These masters never return. It turns out that two top kung fu masters have obtained an old martial arts book with undeciphered text. They have tried hard to understand the meaning of the book, but failed. Therefore, they decide to invite (or kidnap) kung fu masters and ask them to decipher it (crowdsourcing). These masters, unwilling to go to the remote island at first, are soon attracted by the book and concentrate entirely on the deciphering task. However, many years have passed and still no one has figured it out.

It is no surprise that the protagonist of this story solves the problem. His unique advantages are:
  • He is illiterate. Therefore, he tends to see the writing as pictures rather than as text.
  • He has seen a picture-based kung fu book before, and this further guides him to interpret the text as pictures.
  • He has little prior knowledge of kung fu, and therefore does not have many prejudices and biases (the Einstellung effect).
Then, the protagonist learns the super kung fu in the book and becomes invincible.


We can see that the initial crowdsourcing effort fails because the crowd shares a bias, and this bias is overcome by increasing the diversity of the crowd. Here, the protagonist is drastically different from the rest and thus significantly increases the diversity of the pool. This is one reason why diversity is important to crowdsourcing.

In general, crowdsourcing is still in its infancy and we are still exploring the meaning of diversity. There are many questions to be answered.



Reference:

[1] The picture. http://pjh568.gotoip2.com/data/attachment/forum/201207/28/101235i5w4ir1hj4ik2g45.jpg






Sunday, November 2, 2014

On Academic Presentation



(v0.1)

I will present our paper at a CCS workshop next Friday. Then I will present my thesis proposal in the comprehensive exam the Friday after that. Facing these two important occasions, I decided to summarize my current understanding of presentations. This is NOT a collection of advice, because I am far from a good academic speaker. I simply hope this article raises some discussion and helps you think about what leads to a good academic presentation.

Here I have several points to share:

(1) A clear story flow is the top priority in a presentation. A good flow holds the attention of the audience. As others have said [1], the flow is much more important in slides than in a paper, because the audio channel is more brittle. In addition, a good flow also helps the presenter remember what to say.

I think there are at least two types of flows:
  • The logic flow of research. The audience should see the natural transition between research steps, so that they can appreciate the work.
  • The knowledge flow. We need to introduce enough background before going into details. Also, make sure that terms etc. are understandable.
In addition, try to have only one story line. It is true that a research project usually expands into several branches, but including them all will interrupt the flow and confuse the listeners.

(2) Presentation is a process of convincing others. The listeners will be convinced if the study is rigorous and the language is accurate. Do not overclaim.

(3) Make the presentation tight. Try to connect things together. Try to refer back to previous important points. This actually increases the complexity of the presentation structure, and people enjoy complexity. Similar strategies have often been used in movies. Lock, Stock and Two Smoking Barrels is a perfect example.

(4) Presentation is also a form of teaching. Try to think what the audience will learn from it.

(5) We each have our own style of presentation. I feel it is in general hard to copy others' styles. For example, native English speakers can use jokes and funny pictures (e.g. the one used in this blog), which are sometimes hard for non-native speakers to understand, let alone deliver. Nonetheless, even without these funny elements one can still give a good talk. I sometimes think too much "fun" will actually have negative effects, i.e., "amuse to death".

Here are the general steps I take to prepare a presentation. Please feel free to comment on them and share your own opinions:

(1). Have a rough story line first.

(2). Turn the story line into slides. Focus more on the completeness of the information.

(3). Practice lightly and then update the slides. At this stage don't expect them to be perfect. Also take a look at similar talks to "steal" good presentation ideas.

(4). Write the scripts for all slides. At least write an outline for each slide. You don't need to read them aloud, but you need them to remind you of the story line. Also, written text is easy to study and improve.

(5). Practice seriously.

(6). Present to others. Your adviser or research collaborators are the best choices. They know your research, but they are not trapped in a myriad of details like you. So they can give very good suggestions on improving the story line! People with enough background knowledge (e.g. your lab mates) are also good. They can tell you which parts are unclear or confusing. Also, try to collect creative presentation ideas from others.

(7). Improve slides, practice, improve slides, ....

In general, you will feel unsure and uncomfortable in the beginning, because the quality of your taste is always ahead of the quality of your work [2]. However, as long as you keep improving it, the final version will be very good. Furthermore, after good preparation, you will not only give a great talk, but also find new research ideas!





References:

[1] 博士五年总结(三) (Summary of Five Years of a PhD, Part 3), http://blog.sina.com.cn/s/blog_946b64360101dych.html

[2] Ira Glass on Storytelling, http://vimeo.com/24715531

[3] The picture. http://assets.diylol.com/hfs/ae1/38e/525/resized/business-cat-meme-generator-boss-wished-me-luck-on-the-presentation-like-i-need-it-52c717.jpg

Saturday, October 11, 2014

English Name or Not?




(v0.1)

As a Chinese student in America, an important question to ask is: should I choose an English (first) name? Those who are against this idea usually make the following points:
  • The original name defines your identity.
  • You should respect the original name because it was given by your parents.
  • If you are good, others will correctly pronounce and remember your name anyway.
Some of my American, Indian and Chinese friends hold these views. Some time ago, I also watched a YouTube video in which an American student advocates these points to some Taiwanese students.

Other people, such as Philip Guo [2], support the idea of choosing an English name when moving to an English-speaking country.

And here is my opinion: although I currently do not have an English name, I agree that Chinese students (and possibly other East Asian students) studying in America should choose an English (first) name. Obviously, an English name is easier to pronounce and remember for both the natives and students from other countries. An English name can also indicate the person's gender, which in some situations is more convenient. I guess the reason that most Indian students don't choose an English name is that their original names are relatively easy to pronounce and already indicate gender, at least based on my experience. After all, English and Hindi both belong to the Indo-European language family.

Furthermore, I disagree with the three points against choosing an English name. To refute them, we can look in the opposite direction: what did Westerners do when they were in China? During the Age of Discovery, many Jesuit priests came to China and played an important role in the communication between civilizations. These priests all used Chinese names, such as 利玛窦 (Matteo Ricci, the man in the above figure), 汤若望 (Johann Adam Schall von Bell) and 郎世宁 (Giuseppe Castiglione), which are still known by many Chinese people today.

Also, having a second name is actually a part of traditional Chinese culture. In ancient China, people used their style name (字), rather than their real name, in daily life. And it was actually impolite to address someone by their real name. It is not a bad idea to consider the English name as a style name.


References

[1] The picture, http://www.faculty.fairfield.edu/jmac/sj/scientists/riccimap.gif

[2] http://www.pgbovine.net/choosing-english-name.htm

Saturday, September 20, 2014

A Quick Analysis of Facebook Bug Bounty Program




(v2, updated 10/15/2014)

Nowadays, Web companies rely on vulnerability reward programs (VRPs, also called bug bounty programs) to discover vulnerabilities in their products. Basically, a white hat (good hacker) can submit a vulnerability report and get a monetary reward in return. We have written a preliminary paper analyzing a related program called Wooyun; please take a look if you are interested in this new paradigm of improving security.

Facebook is one of the companies that embrace this idea, although Facebook is generous sometimes (see this and this) :P. FB also hides information about which vulnerabilities have been discovered, and the details of each white hat's accomplishments (e.g. how many vulnerabilities one has discovered, and when). FB only provides a yearly list of white hats who have contributed to Facebook security, on this page.

Anyway, we can start with this page and do some quick analysis. The data was obtained on 9/20/2014. First, there are 670 names on the list (in several cases multiple names appear on one line, separated by commas, and we count each name separately). Quite a lot, isn't it? But it is possible that some enthusiastic white hats contributed in several years and so appear multiple times, so we also count the number of unique names, which is 516.

Next, we count the number of white hats each year, shown in the following table:

Time                  White Hat Count
2014 (up to 9.20)     191
2013                  255
2012                  126
2011                  55
Prior to 2011         43

We clearly see the trend: more and more players are joining this game, and the number roughly doubles every year :) I guess VRP is really a promising idea (please see our paper for more discussion).
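
As a quick check of the "roughly doubles" claim, here is a small Python sketch that computes the year-over-year ratios from the counts in the table. Only the complete calendar years are compared; the partial 2014 figure and the "Prior to 2011" bucket are left out because neither covers exactly one year. The function name yearly_growth is just a label chosen for this sketch.

```python
# Counts copied from the table; only complete calendar years are compared.
counts = {2011: 55, 2012: 126, 2013: 255}


def yearly_growth(counts):
    """Return {year: count / previous year's count} for consecutive years."""
    years = sorted(counts)
    return {y: counts[y] / counts[y - 1] for y in years if y - 1 in counts}


growth = yearly_growth(counts)
for year, ratio in growth.items():
    print(year, round(ratio, 2))
# prints:
# 2012 2.29
# 2013 2.02
```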

There is also an interesting fact: a lot of white hats are only active in one year. To show this, we create another table counting the white hats based on number of years being active:

Number of Years Active    White Hat Count
1                         402
2                         82
3                         26
4                         5
>=5                       1

So far, 402 white hats have appeared in only one year's thank-you list. We can see that the distribution is highly skewed: far fewer white hats are active for more than one year, and only one person has been thanked in every period! This probably shows that the value of this kind of VRP lies not only in a few experts but also in a large number of people. But since we don't know how many vulnerabilities each white hat contributed or how severe they were, it is hard to draw a firm conclusion. Still, this observation is consistent with what we claim in our paper.
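
For readers who want to repeat this kind of counting, here is a minimal Python sketch of the three steps described in this post: splitting comma-separated lines into names, counting unique names, and counting how many years each person appears in. The function name summarize and the sample data are made up for illustration; this is not the script or the data actually used for the numbers above.

```python
from collections import Counter


def summarize(thanks_by_year):
    """Summarize a {year_label: [line, ...]} mapping of thank-list lines.

    A line may hold several comma-separated names; each name is counted alone.
    """
    names_by_year = {
        year: [n.strip() for line in lines for n in line.split(",") if n.strip()]
        for year, lines in thanks_by_year.items()
    }
    total_names = sum(len(names) for names in names_by_year.values())

    # Number of distinct years in which each person appears.
    years_active = Counter()
    for names in names_by_year.values():
        for name in set(names):
            years_active[name] += 1
    unique_names = len(years_active)

    # How many people were active for exactly 1 year, 2 years, ...
    distribution = dict(Counter(years_active.values()))
    return total_names, unique_names, distribution


# Made-up sample data, not the real Facebook list.
sample = {
    "2013": ["Alice", "Bob, Carol"],
    "2014": ["Alice", "Dave"],
}
total, unique, dist = summarize(sample)
print(total, unique, dist)
```

Note that this sketch matches names by exact spelling, so the same person listed under two different spellings would be counted twice.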

You might wonder who the "all the time" person is, and the answer is: Szymon Gruszecki. You can access his personal page here.

Please feel free to discuss by leaving a message. Thank you for your time!


Update:

Facebook has released some interesting statistics of its bounty program here:
https://www.facebook.com/notes/facebook-bug-bounty/bug-bounty-highlights-and-updates/818902394790655

Some interesting points:

  • From the statistics we see that there is a huge number of invalid reports. The valid rate is only 4.7%. Why?
  • It says that "One of the most encouraging trends we've observed is that repeat submitters usually improve over time. It's not uncommon for a researcher who has submitted non-security or low-severity issues to later find valuable bugs that lead to higher rewards." Actually, we plan to investigate this issue further in our data set.
  • The country rank: Russia -> India -> USA -> Brazil  -> UK





References

[1] The picture. http://america.aljazeera.com/content/dam/ajam/images/shows/Real%20Money%20with%20Ali%20Velshi/SG_FB2_1460.jpg



Saturday, August 23, 2014

Bugs and Patches for Papers



In an earlier article, Writing Like Compiling, I made some connections between programs and papers. This article makes a connection from a different perspective.

For publications in Computer Science (and possibly other domains), there is a problem. A paper could contain bugs: errors, unclear sentences, missing background, etc. These bugs might be caused by the knowledge gap between the authors and the readers. Or they simply arise from the conference-driven publication paradigm, which puts researchers in a fast race and leaves them less time to ponder and polish their work [2]. These bugs plague readers' minds and eat up their time. Some smart readers might find ways to fix those bugs, just like an advanced user finds a bug in a program and makes a patch for it. However, since there is no good way to share the fix, and a paper is usually frozen after the camera-ready version, these paper patches only stay on a paper copy as some red marks...

Programs, too, are not perfect after release. However, software developers and users constantly discover new bugs and apply corresponding patches, and this model generally works well. After all, there seems to be no alternative. Therefore, I think we need to treat papers as programs, and create ways for reporting bugs and sharing patches. A first step is to store these paper bugs and patches in some database and enable readers to search for them. However, we want to avoid detaching the patches from the papers, so we could allow some energetic readers to fork a paper and make version 2.0 of it. This ability to fork is very common in the open-source software community [3].
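
To make the "first step" concrete, here is a minimal in-memory sketch in Python of a searchable store of paper bugs and patches. Everything in it (the class names PaperPatch and PatchDatabase, the fields, and the example paper identifier) is hypothetical and only meant to illustrate the idea; a real service would need persistent storage, authentication, review of submitted patches, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class PaperPatch:
    paper_id: str   # any stable identifier for the paper (hypothetical field)
    location: str   # where the bug is, e.g. "Section 3"
    bug: str        # description of the problem
    fix: str        # the proposed patch


@dataclass
class PatchDatabase:
    patches: list = field(default_factory=list)

    def report(self, patch):
        """Store a reader-submitted bug report and its patch."""
        self.patches.append(patch)

    def search(self, paper_id, keyword=""):
        """Return patches for one paper, optionally filtered by a keyword."""
        kw = keyword.lower()
        return [
            p for p in self.patches
            if p.paper_id == paper_id
            and (kw in p.bug.lower() or kw in p.fix.lower())
        ]


db = PatchDatabase()
db.report(PaperPatch(
    paper_id="paper-001",
    location="Section 3",
    bug="Symbol k is used before it is defined",
    fix="Define k as the number of clusters",
))
hits = db.search("paper-001", "symbol")
print(len(hits), hits[0].location)
```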

In general, isn't it a bit ironic that Computer Science, the field that aims to digitize paper-based information, still records its cutting-edge findings in papers?


References

[1] The picture. http://www.rebeccaheflin.com/wordpress/wp-content/uploads/2013/08/rejected-writing.jpg

[2] Fortnow, Lance. "Viewpoint: Time for computer science to grow up." Communications of the ACM 52.8 (2009): 33-35.

[3] Raymond, Eric. "The cathedral and the bazaar." Knowledge, Technology & Policy 12.3 (1999): 23-49.

Saturday, July 12, 2014

Two Downsides of Privacy



v0.2

These days people are all talking about privacy, partly thanks to Mr. Snowden's efforts. While I definitely support the individual right to privacy, I want to talk about two downsides of privacy here. But again, I strongly agree that each individual should have full control of her or his information. I just think that sometimes we might want to trade privacy for more important things.

The first downside is that privacy could lead to distrust. An often-used example in support of privacy is: you got drunk one night and shared a photo of yourself drinking on Facebook. While your friends might like it, your boss doesn't. So people have been designing advanced access control technologies to keep your boss away from your "little secrets". Some people might even avoid using Facebook, considering that some companies require employees' Facebook passwords [2]. However, such disclosure of personal information could help others understand you better and thus strengthen the relationship. On the other hand, if a person cannot be traced at all on the Internet, he or she will be a mystery in others' eyes. And will you trust a mysterious figure?

The second downside is that privacy might consign a person to oblivion. From East to West, from past to present, an eternal pursuit of mankind is immortality. At the very least, a person wants to leave something to this world after death, which only a small group of people could achieve in the past. The digital age offers this kind of immortality to everyone, in the sense that their words and activities on the public Internet can be recorded and kept almost forever. A search engine could retrieve one's words and return them to somebody in the future, and we could imagine this as a kind of conversation between the dead and the living. However, privacy would keep this information secret to only a few people or even one person, through technologies like encryption. If that person happened to pass away, his words would be gone with him if no one else knew the password. What a pity if Einstein II encrypted his remarkable theory and then passed away accidentally. Would it not be better to publish it on a blog, like this one?

Update: we all feel really sad about the MH17 tragedy. It has been said that there were more than 100 AIDS researchers on board. I hope we can rescue as many of their ideas and thoughts as possible.



References:

[1] The picture. http://blog.static.abine.com/blog/wp-content/uploads/2011/10/privacy.jpg?e835a1

[2] http://www.usatoday.com/story/money/business/2014/01/10/facebook-passwords-employers/4327739/