Tag Archives: regex

Using Oracle Regular Expressions

Something a little more simple this time, or so i thought 😉 . Anyway, what I was trying to do was convert the first letter of every word into uppercase and the rest into lowercase. Immediatly I thought this would be a job for Oracle Regex. I started to have a look at some of the documentation and slowly noticed that this might not even be possible by directly using regular expressions, or more specifically, Oracle’s implementation of them.

Basically, the problem was that there are only 4 types that can be used and they are seperated info functions:

  • REGEXP_LIKE
  • REGEXP_INSTR
  • REGEXP_SUBSTR
  • REGEXP_REPLACE

More info on this from the earlier link!!

To cut a long story short, oracle has not implemented all of the functionality of regular expressions, that is available to os’s like linux/unix. So my thought of just using regex_replace and doing an uppercase on the first letter group could not be done. However, there is a function provided by oracle which will do that automatically with no regex :

SELECT INITCAP('don talbert jones') FROM DUAL;

This solved my initial problem of uppercasing the names, but I was thinking what if you wanted to do something more complicated like initcap the names and re-arange as well:

SELECT str,
       REGEXP_REPLACE
                 (INITCAP (str),
                  '([a-zA-Z]+)([^a-zA-Z]*)([a-zA-Z]+)([^a-zA-Z]*)([a-zA-Z]+)',
                  '5, 1 3'
                 )
  FROM (SELECT 'first middle last' str FROM DUAL)

This will work for strings that are names only and no special characters eg “hyphen”. A more robust approach would be :

SELECT REGEXP_REPLACE( INITCAP (str), '([^a-zA-Z]*)([a-zA-Z]{1})([a-zA-Z]*)([^a-zA-Z]*)', '1' || '2' || '3' || '4' )  RESULT
FROM (SELECT '-,riyaz ahm:ad khan' AS str FROM DUAL)

This is the breakdown of what this version will be doing:

  • Search for anything that is not a character and any number of them: ([^a-zA-Z]*)
  • Followed by a single character: ([a-zA-Z]{1})
  • Then any character and any number of them: ([a-zA-Z]*)
  • Finally, anything other than a character and any number of them: ([^a-zA-Z]*)

These groupings represent the groupings in order from 1-4, so if you wanted to change the order of the characters or even remove the non characters then you would just manipulate  the numbers.

Overall, I am a little disappointed that the functions like U “convert uppercase” was not included in the original implementation but all I can do is winge 😉 .