Validating Cyrillic (UTF-8) alphanumeric input with PHP preg_match and regular expressions

Recently I needed a validation rule that accepts only a Cyrillic alphanumeric string in an input field. The answers I found suggested matching every character of the alphabet one by one. Luckily, I found a cleverer solution.

It can be done with preg_match() and the following pattern:


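// Sample input mixing Latin letters, digits and Cyrillic letters
$str = "ABC abc 1234 АБВ абв";

// The /u modifier makes PCRE treat the pattern and the subject as UTF-8
$pattern = '/^[a-zA-Z\p{Cyrillic}0-9\s\-]+$/u';

// preg_match() returns 1 on a match, 0 on no match and false on error
if (preg_match($pattern, $str) === 1) {
    echo "$str is composed of Cyrillic and alphanumeric characters\n";
}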
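
As a rough sketch, the same check can be wrapped in a small function so it is easy to try against several inputs. The function name isCyrillicAlphanumeric() is made up for this illustration; only the pattern comes from the post.

```php
<?php
// Hypothetical helper name; the pattern is the one shown in the post.
function isCyrillicAlphanumeric(string $input): bool
{
    // /u: treat pattern and subject as UTF-8 so \p{Cyrillic} works
    $pattern = '/^[a-zA-Z\p{Cyrillic}0-9\s\-]+$/u';

    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example when $input is not valid UTF-8), so compare against 1.
    return preg_match($pattern, $input) === 1;
}

var_dump(isCyrillicAlphanumeric("ABC abc 1234 АБВ абв")); // bool(true)
var_dump(isCyrillicAlphanumeric("Привет-мир 42"));        // bool(true)
var_dump(isCyrillicAlphanumeric("Γειά σου"));             // bool(false), Greek is not in the class
var_dump(isCyrillicAlphanumeric(""));                     // bool(false), + needs at least one character
```

Comparing the result with === 1 means that invalid UTF-8 input, for which preg_match() returns false, is rejected too.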



Here is a decomposition of the pattern:

a-zA-Z is for the Latin characters. You can omit this if you want only Cyrillic input.

\p{Cyrillic} is for the Cyrillic characters. You can also use \p{Arabic}, \p{Greek} or other scripts. See this site for a full list of Unicode scripts.

0-9 is for the digits.

\s is for whitespace. Note that it matches tabs and newlines as well as the space character.

\- is for the dash (hyphen).
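
To illustrate how the parts of the pattern can be swapped, here is a minimal sketch that builds the character class from a script name and makes the Latin range optional. The function matchesScript() and its parameters are assumptions made for this example, not part of any library.

```php
<?php
// Hypothetical helper: builds the validation pattern for a given Unicode script.
// $script is concatenated into the regex, so pass only trusted, hard-coded names.
function matchesScript(string $input, string $script, bool $allowLatin = true): bool
{
    // Leave out a-zA-Z when only the chosen script should be accepted
    $latin = $allowLatin ? 'a-zA-Z' : '';
    $pattern = '/^[' . $latin . '\p{' . $script . '}0-9\s\-]+$/u';

    return preg_match($pattern, $input) === 1;
}

var_dump(matchesScript("абв 123", "Cyrillic", false)); // bool(true)
var_dump(matchesScript("abc абв", "Cyrillic", false)); // bool(false), Latin was omitted
var_dump(matchesScript("αβγ abc", "Greek"));           // bool(true)
```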

3 comments:

  1. Nice solution, but what if I wanted to validate only UPPERCASE Cyrillic characters?

  2. /^[А-Я]+$/u might be what you are looking for.

  3. What if I want to strip all non-alphanumeric characters but still allow all foreign characters, so that it is UTF-8 safe?
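
The questions in the first and third comments could be approached along these lines. This is only a sketch under a few assumptions: the function names are invented for the example, "uppercase Cyrillic" is read as any character that is both \p{Lu} and \p{Cyrillic} (which, unlike the range А-Я, also covers Ё), and "alphanumeric" is read as any Unicode letter or number.

```php
<?php
// Hypothetical helper: true when every character is an uppercase Cyrillic letter.
function isUppercaseCyrillic(string $input): bool
{
    // The lookahead requires an uppercase letter (\p{Lu}); the same
    // character must then also belong to the Cyrillic script.
    return preg_match('/^(?:(?=\p{Lu})\p{Cyrillic})+$/u', $input) === 1;
}

// Hypothetical helper: removes everything that is not a letter or a number.
function stripNonAlphanumeric(string $input): ?string
{
    // \p{L} is any letter, \p{N} any number, in any script.
    // preg_replace() returns null on error, e.g. for invalid UTF-8.
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $input);
}

var_dump(isUppercaseCyrillic("АБВЁ")); // bool(true)
var_dump(isUppercaseCyrillic("АБв"));  // bool(false), contains a lowercase letter
var_dump(isUppercaseCyrillic("ABC"));  // bool(false), Latin letters
var_dump(stripNonAlphanumeric("Привет, world! 123")); // string "Приветworld123"
```

Note that \p{L} does not include combining marks, so text stored in decomposed form would lose its accents; adding \p{M} to the class is one way to keep them.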