Tag Archives: weibo

Coded and categorized: analyzing a sample of 219 blocked Weibo words

Thought I ought to cross-post something I just added to my Blocked on Weibo site:

Back in December, after I’d finished searching through half of my 700,000-word list, I decided to look more closely at what kinds of words were being blocked. As a sample, I used the 218 two- and three-character words that I’d found to be blocked at the time (only two one-character words are blocked: 屄, "cunt"; and ҉, a Cyrillic character that is associated with backwards or bi-directional writing). I then tagged them according to whatever categories I saw developing. (The categories are listed at the end of this post and on the second page of the spreadsheet.)



As would be expected, names of people made up a large share of these keywords of three characters or fewer (most Chinese names consist of a one-character surname and a one- or two-character given name): 87 of the 219. Most of those people, 54, were CCP members. Nine of them were involved in corruption or other controversies, which usually ended in their dismissal. Fifteen of the people are dissidents of various sorts. Three are criminals who were neither dissidents nor CCP politicians and are probably listed because their crimes were so gruesome.
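To put those counts in proportion, here is a quick Python check that uses only the numbers reported above. It just turns each count into a percentage of the relevant total (the whole sample, or the people within it).

```python
# Counts as reported in this post.
SAMPLE_SIZE = 219   # coded words in the sample
PEOPLE = 87         # words that are names of people
CCP_MEMBERS = 54    # people who are CCP members
DISSIDENTS = 15     # people who are dissidents
CRIMINALS = 3       # people who are neither dissidents nor CCP politicians


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


people_pct = share(PEOPLE, SAMPLE_SIZE)     # people as % of the whole sample
ccp_pct = share(CCP_MEMBERS, PEOPLE)        # CCP members as % of the people
dissident_pct = share(DISSIDENTS, PEOPLE)   # dissidents as % of the people
criminal_pct = share(CRIMINALS, PEOPLE)     # criminals as % of the people

print(f"people: {people_pct}% of sample")
print(f"CCP members: {ccp_pct}% of people")
print(f"dissidents: {dissident_pct}% of people")
print(f"criminals: {criminal_pct}% of people")
```

So people account for roughly 40% of the sample, and CCP members for roughly 62% of those people.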

Because Weibo seems to block any search that contains a blocked string, inoffensive words are inevitably caught in the net. For example, “grand justice” (大法官) was blocked because it contains 大法, i.e., Falun Dafa. Other inadvertent blocks included Théodore de Banville (庞维勒), because it contains 维勒, a reference to Uyghurs.
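Those inadvertent blocks are what you'd expect if the filter rejects any query that contains a blocked string anywhere inside it. That is an assumption inferred from examples like 大法官 and 庞维勒, not documented Weibo behavior. A minimal Python sketch of that kind of substring matching, using a hypothetical two-entry blocklist:

```python
# Hypothetical blocklist: just the two embedded strings discussed above.
# (Assumes plain substring matching; Weibo's real filter is not documented.)
BLOCKLIST = {"大法", "维勒"}


def blocking_terms(query: str, blocklist: set[str] = BLOCKLIST) -> list[str]:
    """Return the blocked strings that appear anywhere inside the query."""
    return sorted(term for term in blocklist if term in query)


def is_blocked(query: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """A query counts as blocked if it contains any blocked string."""
    return bool(blocking_terms(query, blocklist))


for q in ["大法官", "庞维勒", "法官"]:
    print(q, is_blocked(q), blocking_terms(q))
```

Under this model 法官 ("judge") passes, while 大法官 is caught only because 大法 happens to sit inside it.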

Words related to sex or sexual activity also make up a large part of the list. These include anatomical terms like 女阴 (female genitalia), slang like 吹箫 (blowjob, but literally "blow the flute"), and "immoral" sex acts like 恋足 (foot fetishism). Discrimination that would be considered wildly un-progressive in the West is also on display: lesbian and homosexual (同性爱) are blocked, as are Islam (伊斯兰) and Muslim (穆斯林).

Finally, the list of blocked words on Weibo is actively adjusted. These words were blocked when I searched for them in November and December, but most of them have been unblocked since late January.
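Because the list shifts, one way to track it would be to re-scan the same word list periodically and diff the results. Here is a minimal Python sketch using set operations; the terms are placeholders, not real results, and it assumes both scans covered exactly the same word list (otherwise a word missing from the later set might simply not have been re-checked).

```python
def compare_snapshots(earlier: set[str], later: set[str]) -> dict[str, set[str]]:
    """Compare two scans of the same word list.

    `earlier` and `later` are the sets of words found blocked in each scan.
    """
    return {
        "unblocked": earlier - later,       # blocked before, searchable now
        "newly_blocked": later - earlier,   # searchable before, blocked now
        "still_blocked": earlier & later,   # blocked in both scans
    }


# Placeholder terms only; these are not actual Weibo results.
december = {"term_a", "term_b", "term_c"}
january = {"term_c", "term_d"}

changes = compare_snapshots(december, january)
for kind, words in changes.items():
    print(kind, sorted(words))
```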

Here are the categories I used to code this list:

? – I’m unsure why it is blocked [1]
person – a fictional or real human with a name
place – geographical location or named geographical body
“clean” slang – slang or abbreviated phrases that are not “obscene”
sex – related to sex, sexual activity, or sexual organs
obscenity – obscene phrases
morality – words dealing with actions that might be considered “immoral”
crime – self-explanatory
scandal/controversy/corruption – self-explanatory
religion/“cult” – self-explanatory
CCP – Chinese Communist Party
princeling – related to children of CCP members
demonstration – related to a mass protest
dissent – related to ideas or thoughts conflicting with the CCP
force / violence – related to weapons, shows of force, or violence
minority – dealing with minorities or ethnic groups in China
nonCCP politics – other political issues not related to the CCP
foreign – self-explanatory
internet/computer – self-explanatory
other media, content, and art – self-explanatory
history – historical figures or events
inadvertent? – blocked because a word within it is blocked

[1] If other categories are also checked, they indicate my best guess.
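Since a single word can fall into several of these categories, the coding is multi-label, and the category counts add up to more than the number of words. Here is a small Python sketch of that structure and a tally. The tag sets below are illustrative guesses for a few words mentioned in this post, not rows copied from the actual spreadsheet.

```python
from collections import Counter

# Illustrative coding only: guessed tags for words mentioned in this post,
# not the real spreadsheet rows.
CODED = {
    "大法官": {"religion/cult", "inadvertent?"},
    "庞维勒": {"person", "foreign", "minority", "inadvertent?"},
    "吹箫": {"sex", "obscenity"},
    "伊斯兰": {"religion/cult", "minority"},
}


def tally(coded: dict[str, set[str]]) -> Counter:
    """Count how many words carry each tag; one word can carry several tags."""
    counts: Counter = Counter()
    for tags in coded.values():
        counts.update(tags)  # adds 1 for each tag in the set
    return counts


def words_with(coded: dict[str, set[str]], tag: str) -> list[str]:
    """List the words that carry a given tag."""
    return sorted(word for word, tags in coded.items() if tag in tags)


counts = tally(CODED)
print(counts.most_common())
print(words_with(CODED, "inadvertent?"))
# Tag counts exceed the number of words because tags overlap.
print(sum(counts.values()), "tags across", len(CODED), "words")
```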

Blocked on Weibo

I guess it’s about time I “unveil” what I’ve been working on for the past few weeks: Blocked on Weibo is a site where I’ll be posting words that you can’t search for on Weibo. I go into more detail on the site, but essentially, I’ve developed a computer script that is searching Weibo as we speak, uncovering words that return the bureaucratese “Sorry, due to relevant regulations and laws, your results cannot be shown.” For instance, you might be amused to learn that “Mao bacon” is a banned search term on Weibo.
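The core of a script like this is a loop that searches each word and checks whether the results page contains that notice. Here is a minimal Python sketch, not the actual script behind the site. It matches on a fragment of the Chinese notice (搜索结果未予显示, "the search results are not shown"), and `fetch` is a hypothetical caller-supplied function returning a results page as text, since no Weibo URL or API is assumed here.

```python
import time
from typing import Callable, Iterable

# Fragment of the censorship notice Weibo shows for blocked searches
# (full text: 根据相关法律法规和政策，搜索结果未予显示). Matching only the fragment
# works whether the page uses a full-width or an ASCII comma.
CENSOR_NOTICE = "搜索结果未予显示"


def is_censored(page_text: str) -> bool:
    """True if a search results page contains the censorship notice."""
    return CENSOR_NOTICE in page_text


def scan(words: Iterable[str], fetch: Callable[[str], str],
         delay: float = 1.0) -> list[str]:
    """Return the words whose search results page shows the notice.

    `fetch` is hypothetical: it should return the text of a Weibo search
    results page for one query. How it gets that page is left open.
    """
    blocked = []
    for word in words:
        if is_censored(fetch(word)):
            blocked.append(word)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return blocked


# Usage with a fake fetch function standing in for real requests:
def fake_fetch(word: str) -> str:
    if word == "毛腊肉":
        return "根据相关法律法规和政策，搜索结果未予显示。"
    return "ordinary results page"


print(scan(["毛腊肉", "腊肉"], fake_fetch, delay=0))
```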

毛腊肉 (“hair bacon” / mao larou) is a reference to Mao’s embalmed body in the Mausoleum of Mao Zedong in Tian’anmen Square, Beijing. The character mao means hair, but is also Mao’s surname. larou commonly refers to bacon, but literally means “preserved meat.” Thus, the preserved meat of Mao: his embalmed body. The term is generally used in a derogatory fashion.

Why it might be blocked: Referring to Mao as a slab of meat is undoubtedly offensive to a government that still officially reveres the Great Leader, though only 70% of the time.

More: Among the top results for 毛腊肉 is a facetious recipe for how to prepare Mao Bacon. The instructions (a rough translation):

A fierce boar from Huguang province [ed.: the pre-Qing name for Hunan and Hubei, where Mao lived]: First, empty out the internal organs and wash with 7 kg of salt, 0.2 kg of nitrate (?), and 0.4 kg of pepper. For the deboned meat, use 2.5 kg of salt, 0.2 kg of fine nitrate, 5 kg of sugar, 3.7 kg of mixed baijiu and soy sauce, and 3-4 kg of water. Optional ingredients that can be added beforehand include salt and crushed pepper, fennel, cinnamon, and other spices. Dry and flatten, seal it up well, and bathe it in Chinese medicine for three days, until the surface fluffs up, so that the seasoning penetrates the meat. Then disinfect it with alcohol and dry it in the sun. [followed by various descriptions of how to eat it and what it tastes like]

I’m currently finding new blocked words every day. I’ll try to post a new one every day or two. Hope you find it interesting.

Weibo and 星期 vs 礼拜

Just a note to myself that WEIBO IS AMAZING. After throwing up my hands at Twitter’s worthless search functionality* (Google’s discussion search is useful, but no holy grail), it is a pleasure to use something this intuitive. Even though I have to work through the whole thing in my second language, I daresay it still makes more sense than Twitter does. I’m playing around with all sorts of things right now, including working with my friend on some simple code to search for banned keywords. For instance, searching for “艾未未” (Ai Weiwei) yields this hilariously transparent message:

根据相关法律法规和政策，搜索结果未予显示。热门微博推荐 (Rough translation: According to laws, legislation, and policies, the search results are not shown. We recommend blogging about popular things. [Emphasis mine; literal translation of 热门.])

Pussyfooting around this Weibo is not.

So, interesting results so far?

1) I grew up speaking Taishanese, and laibai (pinyin: libai; 礼拜) was my word for week (e.g., 礼拜一 for Monday), while sengkay (xingqi; 星期) was reserved for newscasters and certain older speakers. My language partner, who is Taiwanese, also used 礼拜 almost exclusively. But as any student of Mandarin today knows, 星期 is the standard word, and 礼拜 seems to have developed a religious connotation.** For the most part, though, the two are semantically equivalent, so variations in usage appear to come down to a) region, b) generation, or c) context (informal versus more formal). It’s sort of (emphasis on sort of) like the great American debate over soda versus pop, except that with Weibo you don’t have to design and tabulate a survey of who uses what where; the data is already online, coded by gender, age, and location.

It’ll take some time to scrape this data (no way in hell I’m going to sit here and do this by hand; the sad thing is that it will probably take me just as long to figure out how to code the script to do what I want… sigh), but here are some preliminary results:

Search results for each pair of terms:
礼拜天: 251,748 | 星期天: 2,461,924
礼拜日: 48,962 | 星期日: 1,115,494
礼拜一: 272,460 | 星期一: 3,436,480
礼拜二: 78,241 | 星期二: 1,336,238
礼拜三: 88,890 | 星期三: 1,287,634
礼拜四: 91,038 | 星期四: 1,272,936
礼拜五: 245,327 | 星期五: 3,157,894
礼拜六: 253,894 | 星期六: 3,177,002
礼拜七: 2,664*** | 星期七: 50,031***
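Because the raw counts are so lopsided, proportions can be easier to compare. This small Python sketch uses only the numbers in the table. It computes what percentage of each day's mentions use 礼拜 rather than 星期, ranks the days by their 星期 count, and adds up Sunday's two forms (天 and 日), which changes the picture. 七 is left out because it isn't a real day.

```python
# Result counts from the table: day character -> (礼拜 count, 星期 count).
# 七 is omitted because it is not a real day of the week.
COUNTS = {
    "天": (251748, 2461924),
    "日": (48962, 1115494),
    "一": (272460, 3436480),
    "二": (78241, 1336238),
    "三": (88890, 1287634),
    "四": (91038, 1272936),
    "五": (245327, 3157894),
    "六": (253894, 3177002),
}


def libai_share(libai: int, xingqi: int) -> float:
    """Percentage of a day's mentions that use 礼拜 rather than 星期."""
    return round(100 * libai / (libai + xingqi), 1)


shares = {day: libai_share(*pair) for day, pair in COUNTS.items()}

# Rank days by their 星期 count, treating 天 and 日 as separate terms.
ranking = sorted(COUNTS, key=lambda day: COUNTS[day][1], reverse=True)

# Sunday has two forms; combining them (both 礼拜 and 星期) changes the ranking.
sunday_total = sum(COUNTS["天"]) + sum(COUNTS["日"])
monday_total = sum(COUNTS["一"])

print("礼拜 share (%):", shares)
print("ranked by 星期 count:", ranking)
print("Sunday (天 + 日):", sunday_total, "vs Monday:", monday_total)
```

Treating 天 and 日 as separate terms leaves Monday on top. Adding them together puts Sunday ahead (3,878,128 vs. 3,708,940). 礼拜天 also has the highest 礼拜 share of any day.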

All right! And because the deputy likes dots, here it is in visual form:

So it’s official: on Weibo, Monday is the most popular day, closely followed by Saturday and Friday, at least when Sunday is counted as 星期天 alone (adding in 星期日 would put Sunday on top). Wednesday and Thursday are in a dead heat for least often mentioned. I'm curious what a similar chart would look like for Twitter… oh wait, I can’t generate one. Dur. (I guess you could use Google to get a rough estimate, but those aren’t hard numbers like these on Weibo.)

A future project would be to do a similar analysis of other paired words like this, and to dig further into the data to figure out where these libai users come from and what they have in common.


*What is it with Web 2.0 folks and broken search? That was aimed at you, Tumblr: get your act together.

**Sources
http://www.antimoon.com/forum/t12609-0.htm
http://www.italki.com/answers/question/67987.htm
http://www.cjvlang.com/Dow/official.html
http://www.languagehat.com/archives/001550.php
http://www.huayuqiao.org/articles/huangheqing/hhq16.htm
http://ks.cn.yahoo.com/question/261033.html
http://www.laohuangli.net/tianti12.html
http://wenda.tianya.cn/wenda/thread?tid=3011eae23f6a3607

***Not a real day, but I was curious to see whether it’s used. I’ll have to go back and analyze what people actually mean by 礼拜七 and 星期七.