In this article, the challenges facing a Persian SDS are briefly explained. Implementing an SDS has many obstacles in every language, but here we only mention those that are specific to Persian and are not a concern for English. Wherever the pronunciation of a Persian word is needed, it is given in IPA for Persian between slashes (/ /).
- 1.1. Differences in spoken and written forms
The most important hurdle for a Persian SDS seems to be the large number of differences between the spoken and written forms of the language. Some of the most obvious differences are listed below. At the moment there is no rigorous reasoning behind this division: some items might belong to the same linguistic category, and the numbering serves only for later reference and discussion.
In conversational Persian, long vowels are often simplified to make them easier to pronounce. For instance, words ending in /ɒːn/ or /ɒːm/ are simplified to /uːn/ and /uːm/ respectively. Some examples are given in Table 1.
Persian written | IPA written | IPA spoken
خبرخوان | xabarxɒːn | xabarxuːn
مسلمان | mosalmɒːn | mosalmuːn
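The vowel change in Table 1 can be sketched as a rewrite rule that generates a spoken candidate from a formal transcription, for example to extend a pronunciation lexicon. This is only an illustrative sketch: the function names are hypothetical, and the rule covers just the word-final pattern described above, so it will over-apply to words that keep the long vowel in speech.

```python
import re

# Assumed rule: word-final /ɒːn/ and /ɒːm/ become /uːn/ and /uːm/.
_FINAL_LONG_A = re.compile(r"ɒː([nm])$")


def colloquial_ipa(formal: str) -> str:
    """Return a candidate spoken form for a formal IPA transcription."""
    return _FINAL_LONG_A.sub(r"uː\1", formal)


def candidate_pronunciations(formal: str) -> list[str]:
    """Return the formal form plus its colloquial variant, without duplicates."""
    spoken = colloquial_ipa(formal)
    return [formal] if spoken == formal else [formal, spoken]
```

Keeping both forms as candidates, rather than replacing the formal one, lets a recognizer accept either pronunciation.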
There is also a very frequent object marker, /rɒː/, which in spoken form is normally pronounced /ro/, or even more simply as /o/ attached to the end of the object. Some examples are presented in Table 2.
Persian written | IPA written | IPA spoken
سیب را خوردم | sib rɒː xordam | sibo xordam
حرف خود را زد | harfe xod rɒː zad | harfe xodesho zad
Some prepositions are usually dropped in everyday Persian conversation.
Persian written | IPA written | Persian spoken | IPA spoken
در را زدم | dar rɒː zadam | در زدم | dar zadam
هر کدام از ما | har kodɒːm az mɒː | هر کدوم ما | har koduːme mɒː
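Word choice also differs between the formal (written) and less formal (spoken) registers. This happens in English too, so it falls outside the scope of this article; a few examples are listed only for completeness.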
Formal, normally written | Less formal, more used in speech
نوشیدم | خوردم
افزود | اضافه کرد
پیشین | قبلی
To construct the future tense, a specific form of the verb "خواستن" is used, for example "خواهم خورد", which means "I will eat". However, since it is relatively hard to pronounce, Persian speakers usually replace it with "میخورم", which is in fact another tense: the present continuous. In other words, "میخورم" is used in conversation for both the future and the present continuous. Native speakers disambiguate the two using other clues in the context. Thus the NLU unit must be capable of telling these tenses apart. Similarly, NLG must include some clues in the produced sentence to make the intended meaning clear to the listener.
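One simple way to picture the contextual clues mentioned above is a lookup of temporal cue words around the verb. The sketch below is only a toy heuristic under strong assumptions: the cue lists and the function name are hypothetical, and real disambiguation would need much richer context than single words.

```python
# Hypothetical cue lists, chosen only for illustration.
FUTURE_CUES = {"فردا", "بعدا"}    # "tomorrow", "later"
PRESENT_CUES = {"الان", "دارم"}   # "now", progressive auxiliary (also means "I have")


def guess_tense(tokens):
    """Guess the tense of an ambiguous present-form verb from cue words."""
    has_future = any(t in FUTURE_CUES for t in tokens)
    has_present = any(t in PRESENT_CUES for t in tokens)
    if has_future and not has_present:
        return "future"
    if has_present and not has_future:
        return "present_continuous"
    # No cue, or conflicting cues: leave the decision to a later component.
    return "ambiguous"
```

Returning "ambiguous" instead of forcing a decision lets the dialogue manager ask a clarifying question when no cue is present. On the NLG side, the same lists could be used in reverse, by adding a cue word to the generated sentence.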
Another problem is that in written Persian most short vowels are omitted most of the time. While in English vowels are represented by separate, independent letters, in Persian they are pronounced but normally left out in writing. This omission causes words that differ in speech to take the same written form. For example, for "wrestling" and "ship", Persian speakers say 'koshti' and 'keshti' respectively. Their fully vocalized written forms are "کُشتی" and "کِشتی" respectively; in most KBs, however, both words are written in the same form, "کشتی", with the signs "-ُ" (= o) and "-ِ" (= e) omitted. This creates an ambiguity, and NLU must be robust enough to resolve such cases and recover the intended meaning. Not only NLU but also SLG must be aware of these sources of ambiguity in order to avoid them and produce clear sentences.
Some distinct letters are pronounced the same, which is another source of ambiguity. For instance, "قالب" (form/shape) and "غالب" (majority) are both pronounced "qaleb". Another case is "حیات" (life) and "حیاط" (yard), both of which are said colloquially as 'hajat'.
Unlike English, in which interrogative sentences are clearly distinguished from declarative ones by inverting the verb and the subject, in Persian the same sequence of words can be used for both types; the only difference is that the interrogative form is pronounced with a different intonation. For example, the following two sentences are written with the same words:
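To see how often this kind of ambiguity occurs in a vocalized word list, one can strip the vowel signs and group the words that collapse onto the same written form. The sketch below assumes the vowel signs are encoded as Unicode combining marks (category Mn), which holds for the standard Arabic-script short-vowel signs; the function names are illustrative.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Drop combining marks (Unicode category Mn), e.g. the short-vowel signs."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def homograph_groups(vocalized_words):
    """Group fully vocalized words by their unvocalized written form."""
    groups = defaultdict(set)
    for word in vocalized_words:
        groups[strip_diacritics(word)].add(word)
    # Keep only written forms shared by more than one vocalized word.
    return {form: words for form, words in groups.items() if len(words) > 1}
```

Given the two vocalized words from the example, `homograph_groups` returns a single group keyed by the shared unvocalized form, which flags it as a case where context is needed to pick the meaning.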
Declarative | من سیبو خوردم | I ate the apple.
Interrogative | من سیبو خوردم؟ | Did I eat the apple?
As you can see, the declarative and the interrogative sentence are written with the same sequence of words, but the question is pronounced with rising intonation. This makes the task harder for both ASR, which must distinguish the two from intonation alone, and TTS, which must render them differently.
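In contrast to English, in which the position of each part of speech within the sentence is largely fixed, in colloquial Persian words can appear in almost any order. Words that come earlier carry more emphasis. For example, all of the following orderings are used (each roughly meaning "I definitely closed the window at the company last night"):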
من پنجره رو دیشب تو شرکت حتما بستم
پنجره رو من دیشب تو شرکت حتما بستم
دیشب من پنجره رو تو شرکت حتما بستم
بستم من پنجره رو دیشب تو شرکت حتما
حتما من پنجره رو دیشب تو شرکت بستم
...
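The orderings above can be compared with an order-insensitive representation, which shows that they share exactly the same tokens and differ only in which word comes first. This is a minimal sketch, not a proposal for how NLU should handle word order; the helper names are hypothetical, and treating the first token as the emphasized one is a simplifying assumption based on the remark about precedence.

```python
from collections import Counter

# The five orderings listed above.
VARIANTS = [
    "من پنجره رو دیشب تو شرکت حتما بستم",
    "پنجره رو من دیشب تو شرکت حتما بستم",
    "دیشب من پنجره رو تو شرکت حتما بستم",
    "بستم من پنجره رو دیشب تو شرکت حتما",
    "حتما من پنجره رو دیشب تو شرکت بستم",
]


def bag_of_words(sentence: str) -> Counter:
    """Order-insensitive representation: token -> count."""
    return Counter(sentence.split())


def same_content(sentences) -> bool:
    """True if all sentences contain exactly the same tokens, ignoring order."""
    bags = [bag_of_words(s) for s in sentences]
    return all(bag == bags[0] for bag in bags)


def emphasized_token(sentence: str) -> str:
    """Assumption: the sentence-initial token carries the most emphasis."""
    return sentence.split()[0]
```

A representation like this discards the emphasis, so a system that needs it would have to keep the original order alongside the bag of words.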
This section mentions some cultural differences between English and Persian speakers. For more insight, the interested reader is referred to [1].
- 1.2.1. Taarof
In Iranian culture it is quite usual to say something that is not the speaker's real intention. This is known as "Taarof". For example, if both sides of a conversation agree to start reading a poem, the user may say "you first" as a polite suggestion that the other side take precedence. If a robot does not take Iranian culture into account and starts reading immediately, it violates the expected behaviour. Repeated over time, such reactions make a negative impression. Instead, the other side is expected to say "no, you first please", even though that is not its real intention.
- 1.2.2. Compliments
“Another example of a Persian taboo is complimenting a man on his wife’s looks. The remark "You have a lovely wife." or "Your wife is very beautiful." would be regarded as almost indecent by many Iranians. Yet the same compliment would be considered perfectly natural and even highly appreciated by Westerners.” [1]
- 1.2.3. Addressing people
Unlike in English, where it is common to call almost everybody by their first name, in Iranian culture doing so is considered impolite, especially in some situations. For instance, it is unacceptable for a child or a student to call their parents or teacher by their first name. Disobeying these rules might lead the conversational robot to be seen as unnatural, rude, or at least ignorant.
- 1.3. Other challenges
- 1.3.1. Mixed words from English
Persian speakers are increasingly using English words in their speech. The ASR phase poses no problem: the utterance is simply transcribed into its equivalent written form in Persian script. In the NLU phase, however, the Persian form may not be found in any knowledge base (KB) or ontology. For instance, suppose the user says the sentence below and ASR produces:
پیج رسمی فرهادی تو اینستاگرام کدومه؟
For the first word (پیج) we cannot find any entry in KBs such as Persian Wikipedia, because it is a word imported from English ("page") and is not annotated anywhere in Persian KBs.
As in the cases above, English abbreviations are another source of difficulty in Persian speech recognition. Take, for example, the sentence "داوران ای اف سی انتخاب شدند", where "ای اف سی" is the Persian transliteration of AFC. When the letters are mixed with other Persian words, it is challenging to detect the boundaries of the abbreviation and to map it back to the equivalent English form.
Last but not least, the amount of annotated data for Persian is considerably lower than for English. As an example, in Wikipedia, which serves as the main resource for the Alana chatbot, the number of pages in English is about 52,293,836, whereas it is around 4,815,055 for Persian (less than 10% of English, see here). Another example is the WordNet ontology, which contains 155,327 words organized in 175,979 synsets (see here for more details), whereas FarsNet (the most up-to-date Persian wordnet) has 100,000 words and more than 40,000 synsets.
In conclusion, many of the barriers mentioned above occur only occasionally. How frequent and how likely they are is a matter for further research. One useful study in this area is perhaps this Persian paper.
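A first approximation to the boundary-detection problem is to look for runs of tokens that spell English letter names and merge each run into a Latin abbreviation. The sketch below assumes a small, hypothetical lookup table covering only a few letters; a usable table would need every letter and its spelling variants. It is also deliberately naive about ambiguity: "سی", for example, is also the Persian word for "thirty".

```python
# Hypothetical, partial table of Persian spellings of English letter names.
LETTER_NAMES = {
    "ای": "A",
    "بی": "B",
    "سی": "C",
    "دی": "D",
    "اف": "F",
}


def merge_abbreviations(tokens, min_len=2):
    """Replace runs of at least `min_len` letter-name tokens with a Latin abbreviation."""
    out, run = [], []
    for tok in list(tokens) + [None]:  # None is a sentinel that flushes the last run
        if tok is not None and tok in LETTER_NAMES:
            run.append(tok)
            continue
        if len(run) >= min_len:
            out.append("".join(LETTER_NAMES[t] for t in run))
        else:
            out.extend(run)  # too short to count as an abbreviation: keep as-is
        run = []
        if tok is not None:
            out.append(tok)
    return out
```

Applied to the whitespace-split tokens of the example sentence, the three letter-name tokens are merged into the single token `AFC` while the surrounding Persian words are left untouched. The `min_len` threshold keeps an isolated letter-name token, such as a lone "سی", from being rewritten.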