Validating Cyrillic (UTF-8) alphanumeric input with PHP preg_match and regular expressions

Recently I needed a validation rule to check that the string entered in an input field contains only Cyrillic and alphanumeric characters. The answers I found suggested listing every character of the alphabet one by one. Luckily, I found a cleverer solution.

It can be done with preg_match() with the following pattern:

$str="ABC abc 1234 АБВ абв";

$pattern  = "/^[a-zA-Z\p{Cyrillic}0-9\s\-]+$/u";

$result = (bool) preg_match($pattern, $str);
if ($result) {
    echo "$str is composed of Cyrillic and alphanumeric characters\n";
}

Here is a decomposition of the pattern:

a-zA-Z is for the Latin characters. You can omit this if you want only Cyrillic input.

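\p{Cyrillic} is for the Cyrillic characters. It relies on the u modifier at the end of the pattern, which makes PHP treat both the pattern and the input as UTF-8. You can also use \p{Arabic}, \p{Greek} or other scripts. See this site for a full list of Unicode scripts.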

0-9 is for the numbers.

\s is for whitespace. Note that it matches tabs and newlines as well as the plain space.

\- is for the hyphen. The backslash keeps it from being read as a range inside the character class.
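To make it easier to swap the script or drop the Latin letters, the pattern can be built from its parts. Below is a minimal sketch; the function name isScriptAlphanumeric and its parameters are hypothetical and not part of the original snippet.

```php
<?php
// Hypothetical helper: builds the pattern from the pieces described above.
function isScriptAlphanumeric(string $str, string $script = 'Cyrillic', bool $allowLatin = true): bool
{
    // The script name is interpolated into the pattern, so accept only letters and underscores.
    if (!preg_match('/^[A-Za-z_]+$/', $script)) {
        throw new InvalidArgumentException("Invalid script name: $script");
    }

    $latin   = $allowLatin ? 'a-zA-Z' : '';
    $pattern = '/^[' . $latin . '\p{' . $script . '}0-9\s\-]+$/u';

    // preg_match returns 1 on a match, 0 on no match, false on error (e.g. invalid UTF-8).
    return preg_match($pattern, $str) === 1;
}

var_dump(isScriptAlphanumeric('ABC abc 1234 АБВ абв'));       // bool(true)
var_dump(isScriptAlphanumeric('ABC', 'Cyrillic', false));     // bool(false)
var_dump(isScriptAlphanumeric('αβγ 12', 'Greek'));            // bool(true)
var_dump(isScriptAlphanumeric('abc!'));                       // bool(false)
```

Comparing the result with === 1 means that invalid UTF-8 input, for which preg_match returns false, is rejected rather than silently accepted.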


  1. Nice solution, but what if I wanted to validate only UPPERCASE Cyrillic characters?

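  2. /^[А-Я]+$/u might be what you are looking for. Note that this range covers the basic Russian capitals only; Ё and the capitals of other Cyrillic alphabets fall outside it.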

  3. What if I want to strip all non-alphanumeric characters but still allow all foreign characters, in a UTF-8 safe way?
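
One possible approach to both questions above uses Unicode general categories together with the u modifier: \p{Lu} for uppercase letters, \p{L} for letters of any script, and \p{N} for numbers. This is only a sketch; the function names isUppercaseCyrillic and stripNonAlphanumeric are hypothetical.

```php
<?php
// Hypothetical helpers for the two questions above; the names are illustrative.

// True when every character is an uppercase letter (\p{Lu}) in the Cyrillic script.
function isUppercaseCyrillic(string $str): bool
{
    // The lookahead requires an uppercase letter; \p{Cyrillic} then consumes it.
    return preg_match('/^(?:(?=\p{Lu})\p{Cyrillic})+$/u', $str) === 1;
}

// Removes everything that is not a letter (\p{L}) or a number (\p{N}) in any script.
// Returns null if preg_replace fails, for example on invalid UTF-8 input.
function stripNonAlphanumeric(string $str): ?string
{
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $str);
}

var_dump(isUppercaseCyrillic('АБВЁ'));                  // bool(true)
var_dump(isUppercaseCyrillic('АБв'));                   // bool(false)
echo stripNonAlphanumeric('Привет, world! 123'), "\n";  // Приветworld123
```

\p{N} also matches characters such as fractions and Roman numerals; \p{Nd} is the narrower choice if only decimal digits should survive.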