Cantonese tones explained: a beginner's guide to all six tones

Cantonese tones are the single most important thing to get right when learning the language. Unlike English, where pitch mainly signals emotion or sentence type (rising for questions, falling for statements), Cantonese uses tone to distinguish meaning. Say the same syllable with a different tone, and you get a completely different word.
This guide breaks down all six Cantonese tones with examples, practical tips, and common mistakes to avoid.
Why tones matter in Cantonese
Consider the syllable "si." Depending on which tone you use, it can mean poem (詩 si1), history (史 si2), try (試 si3), time (時 si4), city (市 si5), or matter (事 si6). That is six completely different words from one syllable. Get the tone wrong, and you are saying something entirely different from what you intended.
Native speakers process tones automatically, the same way English speakers process the difference between "bat" and "bet" without thinking. Your brain needs training to do the same, but the good news is that this training is absolutely achievable with consistent practice.
The six Cantonese tones
Tone 1: high level
Pitch: high and flat, like holding a note at the top of your comfortable range. Think of the pitch you use when saying "hmmm" while thinking.
Example: 詩 (si1) meaning poem, 夫 (fu1) meaning husband, 三 (saam1) meaning three.
This is often the easiest tone for beginners because it is simply high and steady. No movement, just hold the pitch.
Tone 2: mid rising
Pitch: starts at a middle pitch and rises to high. Similar to the intonation English speakers use when asking a short question like "What?"
Example: 史 (si2) meaning history, 九 (gau2) meaning nine, 走 (zau2) meaning to leave.
The key is to start noticeably lower than tone 1 and end at roughly the same height as tone 1.
Tone 3: mid level
Pitch: flat and steady like tone 1, but at a middle pitch instead of high. Think of your normal, relaxed speaking pitch.
Example: 試 (si3) meaning try, 過 (gwo3) meaning to cross, 四 (sei3) meaning four.
Beginners often confuse tone 1 and tone 3 because both are level. The difference is purely height: tone 1 is high, tone 3 is mid. Practice them back to back to train your ear.
Tone 4: low falling
Pitch: starts low and falls even lower. Think of the tone of voice when you say "aww" in disappointment.
Example: 時 (si4) meaning time, 人 (jan4) meaning person, 來 (loi4) meaning to come.
This tone goes down. It starts in the low range and drops. Some learners describe it as a "sad" tone.
Tone 5: low rising
Pitch: starts low and rises to mid. Similar to tone 2 but starting and ending lower overall.
Example: 市 (si5) meaning city, 你 (nei5) meaning you, 我 (ngo5) meaning I/me.
Tone 5 is often the trickiest for learners because it is easily confused with tone 2 (both rise). The difference: tone 2 goes from mid to high, while tone 5 goes from low to mid.
Tone 6: low level
Pitch: flat and steady at a low pitch. Like tone 1 and tone 3 but at the bottom of your range.
Example: 事 (si6) meaning matter, 二 (ji6) meaning two, 食 (sik6) meaning to eat.
This tone sits at the bottom. No movement, just low. Some learners find it helpful to think of it as a relaxed, low hum.
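The six descriptions above can be collected into one small lookup table. The sketch below is only an illustration: the 1 (lowest) to 5 (highest) pitch scale and the start and end numbers are assumptions chosen to match the wording in this guide, not measurements, and the names TONES, shape, and describe are made up for the example.

```python
# Illustrative pitch levels on a 1 (lowest) to 5 (highest) scale.
# The start/end numbers are assumptions that mirror this guide's
# descriptions; real speakers vary.
TONES = {
    1: ("high level", 5, 5),
    2: ("mid rising", 3, 5),
    3: ("mid level", 3, 3),
    4: ("low falling", 2, 1),
    5: ("low rising", 2, 3),
    6: ("low level", 2, 2),
}


def shape(start, end):
    """Classify a contour as rising, falling, or level."""
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"


def describe(tone):
    """Return a one-line summary for a tone number from 1 to 6."""
    name, start, end = TONES[tone]
    return f"Tone {tone} ({name}): {shape(start, end)}, {start} -> {end}"


for tone in sorted(TONES):
    print(describe(tone))
```

Laid out this way, the groupings are easy to see: tones 1, 3, and 6 are level and differ only in height, tones 2 and 5 both rise, and tone 4 is the only one that falls.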
The tone pairs that confuse learners most
Tone 1 vs tone 3: both are level, but tone 1 is high and tone 3 is mid. Practice saying 詩 (si1, poem) and then 試 (si3, try) back to back. Because the syllable is the same, the only thing that changes is the pitch height.
Tone 2 vs tone 5: both rise, but from different starting points. Tone 2 starts mid, tone 5 starts low. Practice 史 (si2, history) and 市 (si5, city) to feel the difference.
Tone 4 vs tone 6: both are low, but tone 4 falls while tone 6 is level. Practice 時 (si4, time) and 事 (si6, matter) to hear the falling vs flat distinction.
Practical tips for mastering tones
Listen before you speak. Spend your first week or two focused on hearing tone differences rather than producing them. Listen to native audio and try to identify which tone is being used before checking the answer.
Use Jyutping numbers. The tone numbers (1 through 6) in Jyutping are your training wheels. Every time you learn a new word, pay attention to the number. When you see nei5, that number 5 tells you exactly what your pitch should do.
Practice with minimal pairs. Words that differ only by tone (like the six "si" examples above) are the fastest way to train your ear. Say them in sequence: si1, si2, si3, si4, si5, si6.
Record yourself. Use your phone or an app with speech recognition to record yourself saying toned words, then compare to native audio. YumCha's speech recognition feature does this automatically, giving you instant feedback on whether your tones are accurate.
Do not skip tone practice. It is tempting to rush ahead to vocabulary and grammar, but tones are the foundation. Every word you learn from this point forward has a tone. Getting tones right early means every future word sticks better and is more easily understood by native speakers.
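The Jyutping tip above relies on one simple fact: the final digit of a romanised syllable is its tone. A minimal sketch of reading that digit, assuming plain Jyutping with a single tone digit from 1 to 6 at the end (the function name split_jyutping and the TONE_NAMES table are made up for this example):

```python
import re

# Tone names as used in this guide.
TONE_NAMES = {
    1: "high level",
    2: "mid rising",
    3: "mid level",
    4: "low falling",
    5: "low rising",
    6: "low level",
}


def split_jyutping(syllable):
    """Split a syllable such as 'nei5' into (base, tone number, tone name)."""
    match = re.fullmatch(r"([a-z]+)([1-6])", syllable.strip().lower())
    if match is None:
        raise ValueError(f"not a Jyutping syllable with a tone 1-6: {syllable!r}")
    base = match.group(1)
    tone = int(match.group(2))
    return base, tone, TONE_NAMES[tone]


for word in ["si1", "gau2", "nei5", "sik6"]:
    base, tone, name = split_jyutping(word)
    print(f"{word}: syllable '{base}', tone {tone} ({name})")
```

The same habit works on paper: when you write down a new word, write the number too, and say the tone name to yourself before you say the word.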
How tones work in context
In natural speech, tones are relative rather than absolute. You do not need to hit exact musical pitches. What matters is the relationship between tones: high vs low, rising vs falling, level vs moving. Native speakers adjust their overall pitch range based on context, emotion, and emphasis, but the relative distinctions between tones remain consistent.
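To make "relative rather than absolute" concrete, here is a rough sketch that places a pitch inside one speaker's own range instead of comparing raw frequencies. It assumes you already know the speaker's lowest and highest comfortable pitch in hertz; the numbers used are hypothetical, the 1 to 5 scale is a convention chosen for this example, and the logarithmic scale reflects the fact that pitch is heard roughly in ratios rather than in absolute differences.

```python
import math


def relative_level(f0_hz, speaker_low_hz, speaker_high_hz):
    """Map a pitch in Hz to a 1-5 level within one speaker's own range."""
    if f0_hz <= 0 or not (0 < speaker_low_hz < speaker_high_hz):
        raise ValueError("pitches must be positive and low must be below high")
    # Position within the range on a log scale: 0.0 = bottom, 1.0 = top.
    position = math.log(f0_hz / speaker_low_hz) / math.log(
        speaker_high_hz / speaker_low_hz
    )
    # Clamp pitches that fall outside the stated range.
    position = min(1.0, max(0.0, position))
    return 1 + 4 * position


# Two hypothetical speakers with very different voices.
deep_voice = (100.0, 200.0)   # assumed range in Hz
high_voice = (180.0, 360.0)   # assumed range in Hz

# Each speaker's own top note counts as level 5, despite the different Hz.
print(relative_level(200.0, *deep_voice))
print(relative_level(360.0, *high_voice))
```

A pitch of 200 Hz is the top of the first speaker's range but close to the bottom of the second speaker's, which is why listeners judge tones against the speaker's range and not against a fixed pitch.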
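Connected speech also causes some tones to shift. Cantonese has relatively little tone sandhi in the strict sense, but some words take a "changed tone" in certain words and phrases, most often a rise similar to tone 2. For beginners, do not worry about this. Focus on getting individual word tones right first, and the connected speech adjustments will come naturally with exposure.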
With consistent daily practice, most learners can reliably distinguish and produce all six tones within two to three months. It is a skill that builds gradually, so trust the process and keep practicing.


