Remove a random expression from string

Submitted by 我是研究僧i on 2019-12-20 02:09:07

Question


I have a string/column something like this

String a = "000003023_AggregateStopLossLimit_W x3A 973911_2012-12-22.PDF";

I want to create a substring which doesn't have the part ' x3A 973911' in it.

Which means I want something like this:

000003023_AggregateStopLossLimit_W_2012-12-22.PDF

There is a list of such strings which will have different values but the format will be the same. I want the part of string to be removed which comes after the first space and ends at the next '_'.

This is what I have done already, this is working fine, but want to know if there is a better way of doing it.

String b = a.replaceAll(a.substring(a.indexOf(" "), a.indexOf("_",a.indexOf(" "))),"");

It would be even better if I could do this in the database itself, which is Oracle, instead of in Java. Is there a way to get this formatted string from the column directly with a SELECT?

One more requirement: I don't want to display the file extension.
Nothing after the '.' should be displayed, which means something like this: '000003023_AggregateStopLossLimit_W_2012-12-22'
I tried the following, based on APC's earlier solution:

 select regexp_replace ( your_string
                          , '([^[:space:]]*) (.*)_(.*)....'
                          , '\1_\3') as new_string from your_table

This is working fine for now.
However, it simply removes the last 4 characters, so it risks giving the wrong result if the extension is longer or shorter than 3 characters, or if the string has no extension to truncate.
I'm looking for a more elegant way to do it.
Any chance?


Answer 1:


To do it in the database:

select regexp_replace ( your_string
                         , '([^[:space:]]*) (.*)_(.*)'
                         , '\1_\3') as new_string
from your_table

Unfortunately Oracle doesn't have any syntax to enforce laziness (non-greediness) in its regex implementation. That's why my original '(.*) ' included the x3A: it matched up to the last space with a following underscore. However, the negation syntax will isolate the string up to the first space.
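
The difference between the greedy '(.*) ' and the negated class can be reproduced outside the database. The following is a minimal Java sketch, offered only as an analogy: it uses java.util.regex rather than Oracle's engine, it writes the negated class as [^ ] instead of the POSIX form, and the class and method names (GreedyDemo, firstGroup) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns the text captured by group 1, or null if the pattern does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String a = "000003023_AggregateStopLossLimit_W x3A 973911_2012-12-22.PDF";

        // Greedy: (.*) grabs as much as it can and only backs off to the LAST
        // space that still has an underscore after it, so "x3A" stays in group 1.
        System.out.println(firstGroup("(.*) (.*)_(.*)", a));

        // Negated class: [^ ]* cannot cross a space, so group 1 ends at the FIRST space.
        System.out.println(firstGroup("([^ ]*) (.*)_(.*)", a));
    }
}
```

The first call captures everything up to and including "x3A", while the second stops at "W", which mirrors why the negated class is needed in the query above.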

"The '_' after W is missing. Any chance to get that also?"

You can format the replacement string any way you want. The easy way out is to do what I have done, and hardcode the underscore between the two matched patterns. Alternatively you could make it a search pattern in its own right and include it in the replacement string (although you're more likely to do that for more complicated searches).


Oracle introduced Regular Expressions in 10g; the functions are covered in the documentation. The regex implementation is POSIX compliant, so it lacks some of the functions you might have come across in say Perl. The Regex support is detailed in an appendix to the SQL ref.

As for tutorials, well I have a much-thumbed copy of the O'Reilly pocket book; I was given my copy at Open World 2003 but the ebook is reasonably priced. Buy it here. Another good starting point is a series of threads by cd on the OTN forum: start reading here.




Answer 2:


final String r = a.replaceAll(" .*?(?=_)", "");

Printing r gives this output:

000003023_AggregateStopLossLimit_W_2012-12-22.PDF
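
Since the question mentions a whole list of such strings, here is a hedged sketch of applying the same lazy pattern to several values. The class name LazyStrip, the helper strip, and the second sample string are invented for illustration. It uses replaceFirst rather than replaceAll, on the assumption that only the part after the first space should go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LazyStrip {
    // A space, then as few characters as possible, up to (not including) the next '_'.
    private static final Pattern SPACE_TO_UNDERSCORE = Pattern.compile(" .*?(?=_)");

    // Removes the first "space ... before underscore" span; other strings are returned unchanged.
    static String strip(String name) {
        return SPACE_TO_UNDERSCORE.matcher(name).replaceFirst("");
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("000003023_AggregateStopLossLimit_W x3A 973911_2012-12-22.PDF");
        names.add("A_B c1_2013-01-01.PDF"); // made-up second sample in the same format

        for (String name : names) {
            System.out.println(strip(name));
        }
    }
}
```

Compiling the pattern once avoids recompiling it for every element, which String.replaceAll would otherwise do on each call.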



Answer 3:


If you need a SQL solution, this will update the rows:

update yourtable
set field = substr(field, 0, instr(field, ' ')-1) || substr(field, instr(field, '_', instr(field, ' ')))
;

and this will just show the converted value:

select
  yourtable.field,
  case
    when instr(field, '_', instr(field, ' '))>instr(field, ' ')
    then substr(field, 0, instr(field, ' ')-1) || substr(field, instr(field, '_', instr(field, ' ')))
    else field
  end as new_field
from
  yourtable



Answer 4:


Apart from the regex issues in the code you provided, I also found it less readable.

Try the following:

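int f = a.indexOf(" ");       // first space
int l = a.lastIndexOf("_");   // last underscore, the one before the date
a = a.substring(0, f) + a.substring(l);  // text before the space + text from that underscore on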



Answer 5:


replaceAll takes a regex as its argument, so if the substring contains regex metacharacters (such as [ or +) you will get unexpected behaviour.

You can use replace instead which does the same thing but takes a string as a parameter.

Apart from that, if you know that you will have a space and a _ as delimiters, AND the substring in between does not occur elsewhere, then your approach looks fine. You could possibly make it slightly more readable with intermediate variables:

int start = a.indexOf(" ");
int end = a.indexOf("_", start);
String b = a.substring(0, start) + a.substring(end, a.length());
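
To illustrate the replaceAll pitfall mentioned above, here is a small sketch. The input string is hypothetical (chosen so that the middle part contains + and [ ]), and the class and method names are made up. It assumes both delimiters are present in the string.

```java
import java.util.regex.Pattern;

public class LiteralReplace {
    // Returns the span from the first space up to (not including) the next '_'.
    // Assumes the string contains a space followed later by an underscore.
    static String middlePart(String s) {
        int start = s.indexOf(' ');
        int end = s.indexOf('_', start);
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical input whose middle part contains regex metacharacters.
        String a = "report_W x+3 [a]_2012-12-22.PDF";
        String part = middlePart(a); // " x+3 [a]"

        // Read as a regex, "x+" means "one or more x" and "[a]" is a character
        // class, so the pattern does not match and the string comes back unchanged.
        System.out.println(a.replaceAll(part, ""));

        // replace treats its argument as literal text.
        System.out.println(a.replace(part, ""));

        // Pattern.quote is an alternative if a regex-based call must be kept.
        System.out.println(a.replaceAll(Pattern.quote(part), ""));
    }
}
```

Only the second and third calls print report_W_2012-12-22.PDF; the first leaves the input as it was.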



Answer 6:


You should use the REGEXP_REPLACE function.

http://docs.oracle.com/cd/B12037_01/server.101/b10759/functions115.htm#SQLRF06302




Answer 7:


The Java solution given by @Kent above is very elegant and I recommend it. That said, if you want to accomplish this using Oracle's regex engine, you might try the following:

WITH t1 AS (
    SELECT '000003023_AggregateStopLossLimit_W x3A 973911_2012-12-22.PDF' AS filename
      FROM dual
)
SELECT filename, REGEXP_REPLACE(filename, ' [^_]*', '')
  FROM t1
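
For comparison on the Java side, the same negated-class pattern can be combined with the extension removal the question also asks for. This is only a sketch: the class name NegatedClassStrip and the method clean are invented, and the second pattern assumes that the extension is whatever follows the last dot.

```java
public class NegatedClassStrip {
    static String clean(String filename) {
        // Same idea as the REGEXP_REPLACE call: a space followed by any run of non-underscores.
        String withoutMiddle = filename.replaceAll(" [^_]*", "");
        // Drop the final ".ext" whatever its length; names without a dot are left unchanged.
        return withoutMiddle.replaceAll("\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        String a = "000003023_AggregateStopLossLimit_W x3A 973911_2012-12-22.PDF";
        System.out.println(clean(a));
    }
}
```

Because the second pattern is anchored at the end and keyed on the last dot, it does not depend on the extension being exactly three characters long.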


Source: https://stackoverflow.com/questions/14100168/remove-a-random-expression-from-string
