ChinesePod dialogues for HSK 5

Posted on April 2, 2018

ChinesePod is a website for students of Mandarin Chinese. It offers dialogues organized by level (from newbie to advanced); each dialogue is accompanied by an explanation, a list of vocabulary, etc. It is an excellent resource, which I can highly recommend, especially to improve your listening skills.

The only minor downside is that the dialogues offered by ChinesePod are not organized by HSK level, nor are they specifically designed to cover the HSK vocabulary. I therefore previously constructed lists of dialogues covering the HSK 1, HSK 2, and HSK 3 levels (these lists are now also available directly from ChinesePod), followed by a list for HSK 4. This new post introduces a list for HSK 5.


If you only want to use the list and are not interested in how I constructed it, skip ahead to the next section.

When constructing the lists for HSK 1-4, part of the work was done by a software program that I wrote for this purpose (available on GitHub), and part of the work I did by hand. The analysis wasn’t fully automatic for reasons I explained before. However, this approach is no longer feasible for HSK 5, where the number of words is simply too large.

To construct the list for HSK 5 I therefore changed tactics a bit. Rather than focusing on words, I focused on the characters covered in each HSK level. I then assigned a score to each ChinesePod dialogue, defined as:

score = (number of dialogue characters from HSK 5) / (total number of dialogue characters - number of dialogue characters from HSK 1-4)

This means that the score is higher for dialogues with a lot of HSK 5 vocabulary and lower for dialogues with a lot of characters from HSK 6 or from outside the HSK vocabulary entirely. (The score counts each distinct character only once.)
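To make the definition concrete, here is a minimal sketch of how such a score could be computed. It is not the code from the GitHub repository: the function names, the restriction to the basic CJK Unified Ideographs block, and the choice to return 0 when the denominator is zero are all assumptions made for illustration, and the character sets are toy data rather than the real HSK lists.

```python
def is_han(ch):
    """Rough test for a Chinese character (assumption: basic CJK block only)."""
    return "\u4e00" <= ch <= "\u9fff"


def dialogue_score(text, hsk5_chars, hsk1_4_chars):
    # Count each distinct character once, ignoring punctuation and Latin text.
    chars = {ch for ch in text if is_han(ch)}
    from_hsk5 = len(chars & hsk5_chars)
    from_hsk1_4 = len(chars & hsk1_4_chars)
    denominator = len(chars) - from_hsk1_4
    # Assumption: a dialogue made up entirely of HSK 1-4 characters scores 0.
    return from_hsk5 / denominator if denominator else 0.0


# Toy character sets for illustration only; these are not the real HSK lists.
toy_hsk5 = set("伟")
toy_hsk1_4 = set("你好")

# Four distinct characters, two from the toy HSK 1-4 set, one from toy HSK 5:
# score = 1 / (4 - 2)
print(dialogue_score("你好，伟龘！", toy_hsk5, toy_hsk1_4))
```

In this toy example the one character outside both sets halves the score, which is the penalty for HSK 6 and non-HSK characters described above.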

I then sorted all dialogues by ChinesePod level, score, and release date (in that order). Finally, I limited the selection to dialogues that cover a minimum of 3 new words (given all the dialogues that came before them), contain a maximum of 10 characters that don’t appear in HSK at all (the average is actually less than 1 non-HSK character per dialogue), and have a maximum ChinesePod level of Upper Intermediate.
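The sort-and-filter step can be sketched roughly as follows. This is an illustration under several assumptions, not the repository code: the `Dialogue` record and its fields are hypothetical, levels are encoded as integers, the sort is assumed to run from easiest level, highest score, and oldest release first, and "the dialogues that came before" is taken to mean the dialogues already selected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dialogue:          # hypothetical record, for illustration only
    title: str
    level: int           # assumed numeric encoding of the ChinesePod level
    score: float
    released: date
    words: frozenset     # vocabulary appearing in the dialogue
    non_hsk_chars: int   # characters that do not appear in HSK at all


def select_dialogues(dialogues, target_words,
                     min_new=3, max_non_hsk=10, max_level=3):
    # Assumed order: easiest level first, then highest score, then oldest.
    ordered = sorted(dialogues,
                     key=lambda d: (d.level, -d.score, d.released))
    covered, selected = set(), []
    for d in ordered:
        if d.level > max_level or d.non_hsk_chars > max_non_hsk:
            continue
        # Only target words not yet covered by earlier picks count as new.
        new_words = (d.words & target_words) - covered
        if len(new_words) >= min_new:
            selected.append(d)
            covered |= new_words
    return selected, covered
```

Because the loop is greedy, a dialogue that appears later in the sort order is kept only if it still adds enough new vocabulary beyond what the earlier picks already cover.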

The code for all this is open source and available from GitHub.

HSK 5 (145 dialogues, covers 83% of the characters in HSK 5)

This covers 518 out of 621 characters in HSK 5. The 103 characters that aren’t covered are: 伟伸俊倡兑凌删勿匆厢县叙召吨唉喷嗯嘉奈妙妨姑姥姿娱娶媒嫩宴寿屿帘幼库恢恨悄愁慧憾抖捡措描敏昆柔柴桃桔梨梳棋歇泛泪滚漠炭熬燃猪猾玉皂碍窄竹筑粘糙纲纷绳翅耽胁胶舅艰虹裔裹诊诗豫贷跃辑迅返逐逗逻钓阻陌颗飘馒骤髦齿.
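For readers who want to check such coverage figures against their own selection of dialogues, a small sketch follows. The `coverage` function and its inputs are hypothetical, and the example uses a toy target set rather than the full HSK 5 character list.

```python
def coverage(dialogue_texts, target_chars):
    # Collect every character that occurs in any of the dialogues.
    seen = set()
    for text in dialogue_texts:
        seen.update(text)
    covered = target_chars & seen
    # List the uncovered characters in code point order.
    missing = "".join(sorted(target_chars - seen))
    return len(covered) / len(target_chars), missing


# Toy data: two of the four target characters occur in the texts.
ratio, missing = coverage(["伟大", "伸手"], set("伟伸俊倡"))
print(ratio, missing)

# The figure quoted above: 518 of 621 characters.
print(round(100 * 518 / 621))
```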

For Pleco users I also constructed a Pleco flashcard file that contains the key and supplementary vocabulary from these dialogues (for the supplementary vocabulary the flashcard file only includes the vocabulary that actually appears in the dialogue).