Coded and categorized: analyzing a sample of 219 blocked Weibo words

Thought I ought to cross-post something I just added to my Blocked on Weibo site:

Back in December, after I’d completed searching through half of my 700,000-word list, I decided to look more closely at what kinds of words were being blocked. I used the 218 two- and three-character words that I’d found to be blocked at the time (only two one-character words are blocked: 屄, cunt; and ҉, a Cyrillic character that is associated with backwards or bi-directional writing) as a sample and then tagged them according to whatever categories I began to see developing. (The categories are at the end of this post and on the second page of the spreadsheet as well.)

As would be expected, a large share of these keywords of three characters or fewer were names of people (most Chinese names are made up of a one-character surname and a one- or two-character given name). 87 of the 219 were names of people, and most of those people, 54, were CCP members. Nine of them were involved in corruption or some other controversy, after which they were usually dismissed. Fifteen of the people are dissidents of various sorts. Three are criminals who were neither dissidents nor CCP politicians and are probably listed because their crimes were so gruesome.

Because of the way Weibo censors items, inoffensive words are inevitably caught in the net. For example, “grand justice” (大法官) was blocked because it contains 大法, i.e. Falun Dafa. Other inadvertent blocks included Théodore de Banville (庞维勒), because it contains 维勒, a reference to Uyghurs.
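To make that mechanism concrete, here is a minimal sketch in Python. It assumes the filter is a plain substring match, which is only what the blocks above suggest; Weibo's actual logic is not public. The fragment list holds just the two fragments mentioned in this post, and the function names are made up for illustration.

```python
# Blocked fragments mentioned in this post. The real list is not public,
# so this is an illustrative stand-in, not Weibo's actual data.
BLOCKED_FRAGMENTS = ["大法", "维勒"]


def find_trigger(term, blocked=BLOCKED_FRAGMENTS):
    """Return the first blocked fragment contained in term, or None."""
    for fragment in blocked:
        if fragment in term:
            return fragment
    return None


def is_inadvertent(term, blocked=BLOCKED_FRAGMENTS):
    """True when term is caught only because it contains a shorter blocked fragment."""
    trigger = find_trigger(term, blocked)
    return trigger is not None and trigger != term


if __name__ == "__main__":
    for word in ["大法官", "庞维勒", "大法", "星期一"]:
        print(word, find_trigger(word), is_inadvertent(word))
```

Under this assumption, any longer word that happens to contain a blocked fragment gets caught, which is how I ended up tagging words like 大法官 as “inadvertent?”.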

Words related to sex or sexual activity also make up a great deal of the list. These include anatomy like 女阴, slang like 吹箫 (blowjob, but literally “blow flute”), and “immoral” sex acts like 恋足, foot fetishes. Discrimination that would be considered wildly un-progressive in the West is also on display. Lesbian and homosexual (同性爱) are blocked, as are Islam (伊斯兰) and Muslim (穆斯林).

Finally, the list of blocked words on Weibo is actively adjusted and changed. These words were blocked at the time of their search in November and December, but most of this list has been unblocked since late January.

Here are the categories I used to code this list:

? – I’m unsure of why it is blocked [1]
person – a fictional or real human with a name
place – geographical location or named geographical body
“clean” slang – slang or abbreviated phrases that are not “obscene”
sex – related to sex, sexual activity, or sexual organs
obscenity – obscene phrases
morality – words dealing with actions that might be considered “immoral”
crime – self explanatory
scandal/controversy/corruption – self explanatory
religion/“cult” – self explanatory
CCP – Chinese Communist Party
princeling – deals with children of CCP members
demonstration – related to a mass protest
dissent – related to ideas or thoughts conflicting with CCP
force / violence – related to weapons, shows of force, or violence
minority – dealing with minorities or ethnic groups in China
nonCCP politics – other political issues not related to CCP
foreign – self explanatory
internet/computer – self explanatory
other media, content, and art – self explanatory
history – either historical figures or events
inadvertent? – it is blocked because a word within it is blocked

[1] If other categories are checked off, it indicates my best guess.

About Twitter and Google censorship: another Waging Nonviolence article

Wrote up a short article for Nathan and Eric’s blog again, this time on Google and Twitter’s recent announcements that they would begin restricting content more tightly in foreign countries.

Nathan pointed me to a very astute comment by Thomas Clark Wilson on WNV’s Facebook page, which I thought was worth reprinting here:

The actual details of what they’re doing will make it easier to circumvent censorship than it is just now. Tweeters in, say, Egypt, don’t just rely on tweets within their own country to organise, but also RTs and such from foreign accounts. Before this policy; censor a tweet in Egypt and nobody can see it. After this policy; censor a tweet in Egypt and some American can still see it, quote it, and send it back into the fray.

Besides, the contention in this article that ‘they might bend to shady censorship requests even though they say they’ll play nice’ has *always* been a danger, this policy hasn’t changed that fact. Yeah it would be super swell if they took a stand an said no to any censorship, but twitter’s a business not a revolutionary tech collective. Recognise it as a tool, an use it wisely.

My response, just for posterity’s sake:

That’s a great point Thomas. Indeed, this isn’t some sort of smoking gun revelation, merely an acknowledgment and reminder from Twitter that they are a business and not some sort of utopia maker.

As for circumvention methods, other ways potentially include setting your home country to, say, the U.S. so you don’t have restrictions. But we have to assume that Twitter and any governments that would want to utilize Twitter’s restrictions are obviously aware of these limitations that would make the whole thing pointless and would implement things like IP detection. Also, floods of foreign retweets might make it too hard to stomp out every one, but for smaller, budding movements these tools do allow governments to snuff out inchoate organizing. And even if a few foreign retweets get out, location shouldn’t matter. They would in all likelihood also be blocked because they would bear the same illegal content as the original local one. This would just be the first step of potential cat and mouse games, with governments likely requesting that Twitter move faster to remove such posts and with Twitter employing more active monitoring–once something has been blocked in a country and it knows the government wants future instances of such content blocked, it could employ some kind of flagging system to warn employees that this is another potentially illegal post and give them a chance to take instant action when requested.

But if none of this comes to be, then Twitter is the good guy, and we have no major reason to expect them to bend over backwards to regimes and governments whose values are so antithetical to those of the Internet. But one has to look at the trajectory here–this is a step in the other direction from an open Internet. And wouldn’t it be foolish for Twitter to take such a PR hit without following through on actually providing a credible and functional option to regulate content in countries? Why make yourself look so bad and buddy up to censorship regimes when you know you’re only going to half-heartedly enforce these sorts of things?

But your points are well taken. In the short-term Twitter should still be safe and useful, but this announcement definitely makes the future of Twitter as a revolutionary tool cloudier.

That’s all I got for now. Still working mightily on Blockedonweibo (now reachable via www.blockedonweibo.com) which I hope to start sharing more publicly in the next week or two (once I finish writing up a short summary for Nathan and Eric at WNV), so things will probably once again go quiet here for a bit.

Blocked on Weibo

I guess it’s about time I “unveil” what I’ve been working on for the past few weeks: Blocked on Weibo is a site where I’ll be posting words that you can’t search for on Weibo. I go into more detail on the site, but essentially, I’ve developed a computer script that is searching Weibo as we speak, uncovering words which return the bureaucratese “Sorry, due to relevant regulations and laws your results cannot be shown.” For instance, you might be amused to learn that “Mao bacon” is a banned search term on Weibo.

毛腊肉 (“hair bacon” / mao larou) is a reference to Mao’s embalmed body in the Mausoleum of Mao Zedong in Tian’anmen Square, Beijing. The character mao means hair, but is also Mao’s surname. larou commonly refers to bacon, but literally means “preserved meat.” Thus, the preserved meat of Mao: his embalmed body. The term is generally used in a derogatory fashion.

Why it might be blocked: Referring to Mao as a slab of meat is undoubtedly offensive to a government that still officially reveres the Great Leader, though only 70% of the time.

More: Among the top results for 毛腊肉 is a facetious recipe for how to prepare Mao Bacon. The instructions (a rough translation):

A fierce boar from the Huguang province [the pre-Qing name for Hunan and Hubei, where Mao lived—ed]: First, empty the internal organs and wash with 7 kg of salt, 0.2 kg nitrate (?), 0.4 kg pepper. For the deboned meat, use 2.5 kg salt, 0.2 kg fine nitrate, 5 kg of sugar, 3.7 kg of baijiu and soy sauce mixed, 3-4 kg of water. Optional ingredients that can be added prior include salt and crushed pepper, fennel, cinnamon and other spices; dry and flatten, seal up well and bathe in Chinese medicine for three days, until the surface fluffs up, that way the seasoning penetrates through the meat. Then disinfect it with alcohol and dry in the sun. [followed by various descriptions of how to eat/what it tastes like]

I’m currently finding new blocked words every day. I’ll try to post a new one every day or two. Hope you find it interesting.

The Disconnect Between Virtual and IRL: Han Han’s Dismissal of Online Protest

(If you’re already familiar with Han Han, feel free to jump ahead.)

Today, the Nobel Committee celebrated the Arab Spring by awarding a share of the Peace Prize to Tawakel Karman. Though Ms. Karman is certainly deserving of the award, it very well might have gone to any number of other individuals without argument; it could have also been awarded to the entire nations of Tunisia, Egypt, and others, à la Time’s “You” as Person of the Year, without much fuss as well (though cutting the check 200 million ways might be an administrative nightmare for Oslo).

Though Karman’s role as an inspiring leader in Yemen is indisputable, our elevation of people like her and Wael Ghonim and others as first among equals taps into something deeper than just a prize announcement: this is our desire for simple narratives of Great Men leading forth mass movements. Deviating from this storyline introduces (or, rather, accurately reflects) complexity and the possibility of the message being clouded or misrepresented—see the Occupy Wall Street protests as an example, with the mainstream media at wit’s end trying to figure out how to sell the story of varied, collective action.

Similarly, in recent years, those outside China have anointed a series of Great Democratic Hopes in China. In the run-up to last year’s Nobel Prize announcement, interest spiked in Liu Xiaobo and Charter 08. Then it was Ai Weiwei’s turn to get the hero treatment and coverage as the man leading masses of Chinese netizens toward the path to freedom.

Now revolution seems to rest in the hands of Han Han, he of the Evan Osnos New Yorker profile (behind a paywall, but well worth trekking down to your local library for): a renegade cultural icon, wise-ass blogger, and dogged reporter of government corruption. The voice of the Chinese youth is another way to put it. Though his name recognition here may still be on the low end, he’s more popular than Ai and Liu across the Chinese-speaking world and heralded at home as the next Lu Xun, one of China’s most important intellectuals and critics of the past century. He is so popular that he’s able to get away with writing just about whatever he wants–though especially hard-hitting posts often get deleted after the fact, albeit not before they’ve already been re-posted thousands of times over.

Which brings us to this recent interview he did with Channel News Asia (fast forward to 17:45):

Amid this mostly vapid interview suffused with social media buzzwords and one especially curious moment* where Han Han explains why he rejected an offer to meet with President Obama (see Will Moss of Imagethief for an insightful breakdown of Han Han’s answer), his dismissive comments on the illusory nature of online protest are seemingly shocking for those who expect Han Han to fit neatly into their preconceived notion of online freedom fighter:

Weibo and 星期 vs 礼拜

Just a note to myself that WEIBO IS AMAZING. After throwing up my hands at Twitter’s worthless search functionality* (Google’s discussion search is useful, but no holy grail), it is a pleasure to use something this intuitive. Even if I have to re-translate the whole thing into my second language, I daresay it still makes more sense than Twitter does. I’m playing around right now with all sorts of things, including working with my friend on writing some simple code for searching for banned keywords. For instance, searching for “艾未未” (Ai Wei Wei) yields this hilariously transparent message:

根据相关法律法规和政策,搜索结果未予显示。热门微博推荐 (Rough translation: According to laws, legislation, and policies, the search results are not shown. We recommend blogging about popular things. [emphasis mine; literal translation of 热门])

Pussyfooting around this Weibo is not.
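For the curious, here is a rough Python sketch of what a keyword check like that could look like. It is not the actual code: `fetch` is a hypothetical stand-in for whatever retrieves the search results page for a word, `fake_fetch` fakes it so the example runs offline, and the only detail taken from Weibo is the opening of the notice quoted above.

```python
# Opening of the notice quoted above. Matching only this prefix is an
# assumption; the exact wording and punctuation on the page may vary.
BLOCK_NOTICE = "根据相关法律法规和政策"


def is_blocked(page_text):
    """True if the results page contains the censorship notice."""
    return BLOCK_NOTICE in page_text


def scan(words, fetch):
    """Return the words whose results page shows the notice.

    fetch is a hypothetical callable: word -> text of the results page.
    """
    return [word for word in words if is_blocked(fetch(word))]


def fake_fetch(word):
    """Offline stand-in for a real search request (illustration only)."""
    if word == "艾未未":
        return "根据相关法律法规和政策,搜索结果未予显示。"
    return "(ordinary search results)"


if __name__ == "__main__":
    print(scan(["艾未未", "星期一"], fake_fetch))
```

Passing `fetch` in as a parameter keeps the detection logic separate from however the pages actually get retrieved, which is the part that still needs figuring out.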

So, interesting results so far?

1) As someone who grew up speaking Taishanese, laibai (pinyin: libai; 礼拜) was my word for week (e.g., 礼拜一 for Monday) while sengkay (xingqi; 星期) was reserved for newscasters and certain older speakers. While speaking with my language partner, who is Taiwanese, she almost exclusively used 礼拜 as well. But as any student of Mandarin today knows, 星期 is the standard word and 礼拜 seems to have developed a religious connotation.** But for the most part, they are semantically equivalent, and thus, variations in usage appear to simply be either a) regional b) generational or c) context (informal or more formal). It’s sort of (emphasis on sort of) like the great American debate between soda versus pop, and with Weibo, you don’t have to actually design and tabulate a survey of who uses what where; the data is all already up online, coded by gender, age, and location.

It’ll take some time to scrape some of this data (no way in hell I’m going to sit here and do this by hand; but the sad thing is that it probably will take me just as long to figure out how to code the script to do what I want… sigh), but preliminary results:

礼拜天: 251748 results 星期天: 2461924
礼拜日: 48962 星期日: 1115494
礼拜一: 272460 星期一: 3436480
礼拜二: 78241 星期二: 1336238
礼拜三: 88890 星期三: 1287634
礼拜四: 91038 星期四: 1272936
礼拜五: 245327 星期五: 3157894
礼拜六: 253894 星期六: 3177002
礼拜七: 2664*** 星期七: 50031***

All right! And because the deputy likes dots, here it is in visual form:

So it’s official, on Weibo, Monday is the most popular day, closely followed by Saturday and Friday. Wednesday and Thursday are in a dead heat for least popularly cited. Curious what a similar chart would be on Twitter… oh wait, I can’t generate one. Dur. (Though I guess you could use Google to get a rough estimate, but those aren’t hard numbers like these on Weibo.)
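For anyone who wants to check the arithmetic, here is a small Python sketch that adds up the two forms for each row of the table and works out what share of the mentions use 礼拜. The counts are copied from the table above; the 七 row is left out since it is not a real date, and the function name is made up for illustration.

```python
# Counts copied from the table above: suffix -> (礼拜 form, 星期 form).
# The 七 row is omitted because it is not a real date.
COUNTS = {
    "天": (251748, 2461924),
    "日": (48962, 1115494),
    "一": (272460, 3436480),
    "二": (78241, 1336238),
    "三": (88890, 1287634),
    "四": (91038, 1272936),
    "五": (245327, 3157894),
    "六": (253894, 3177002),
}


def summarize(counts):
    """Return (suffix, total, libai_share) rows sorted by total, descending."""
    rows = []
    for suffix, (libai, xingqi) in counts.items():
        total = libai + xingqi
        rows.append((suffix, total, libai / total))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for suffix, total, share in summarize(COUNTS):
        print(f"{suffix}: {total:>9,} total, 礼拜 share {share:.1%}")
```

Note that Sunday is split across two rows (天 and 日); the sketch keeps them separate, the same way the table does.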

A future project would be to do a similar analysis of other paired words like this, and to dig further into the data and figure out where these libai users come from and what similarities they share.

*What is it with Web 2.0 folks and broken search? That was aimed at you Tumblr, get your act together.


***Not a real date, but just curious to see if it’s used. I’ll have to go back and analyze what it actually means when people say Seven-day.