C# Regex for a match outside a specific region

1

I have to find occurrences of a certain string (needle) within another string (haystack) that don't occur between specific "braces".

For example, consider this haystack: "BEGIN something END some other thing BEGIN something else END yet some more things." and this needle: "some", with the braces "BEGIN" and "END".

I want to find all needles that are not between braces. (There are two matches: the "some" followed by "other" and the "some" followed by "more".)

I figured I could solve this with a Regex using negative lookahead/lookbehind, but how?

I have tried

(?<!(BEGIN))some(?!(END))

which gives me 4 matches (obviously because no "some" is immediately preceded by "BEGIN" or immediately followed by "END").

I also tried

(?<!(BEGIN.*))some(?!(.*END))

but this gives me no matches at all (obviously because each needle is preceded by a "BEGIN" somewhere earlier in the string).

Now I'm stuck.

Here's the latest C# code I used:

using System.Text.RegularExpressions;
using NUnit.Framework;

string input = "BEGIN something END some other thing BEGIN something else END yet some more things.";
Regex re = new Regex(@"(?<!(BEGIN.*))some(?!(.*END))");
MatchCollection matches = re.Matches(input);
Assert.AreEqual(2, matches.Count); // fails: this pattern yields no matches

asked April 8, 2011 10:46 am CDT
posted via StackOverflow

4 Answers

0
 

You might try splitting the string on occurrences of BEGIN or END so that you can ensure that there is only one BEGIN and one END in the string that you apply your regex to. Also, if you are looking for occurrences of "some" that are outside your BEGIN/END braces, then I think you'd want to look behind for END and look ahead for BEGIN (positive lookahead/lookbehind), the opposite of what you have.
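
To make the splitting idea concrete, here is a minimal sketch. It assumes the braces are never nested and that the needle does not itself contain BEGIN or END. The class and method names (SplitSearch, CountOutside) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SplitSearch
{
    // Counts needles in the parts of the haystack that lie outside BEGIN ... END.
    // Assumption: braces are not nested.
    public static int CountOutside(string haystack, string needle)
    {
        // The delimiter pattern is wrapped in a capturing group, so Regex.Split
        // keeps the BEGIN/END tokens in the resulting array.
        string[] parts = Regex.Split(haystack, "(BEGIN|END)");
        bool inside = false;
        int count = 0;
        foreach (string part in parts)
        {
            if (part == "BEGIN") inside = true;
            else if (part == "END") inside = false;
            else if (!inside) count += Regex.Matches(part, Regex.Escape(needle)).Count;
        }
        return count;
    }
}
```

Calling SplitSearch.CountOutside(input, "some") on the haystack from the question should give 2.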

Hope this helps.

answered April 8, 2011 11:23 am CDT
0
 

What if you just process the entire haystack and ignore the hay that is in between the braces? (Am I pushing the metaphor too far?)

For example, look through all the tokens (or characters, if you need to go to that level) and look for your braces. When the opening one is found, you loop through until you find the closing brace. At that point, you start looking for your needles until you find another opening brace. It's a bit more code than a Regex, but might be more readable and easier to troubleshoot.
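
A rough sketch of such a scan, using String.IndexOf instead of a tokenizer. The names BraceScanner and FindOutside are hypothetical. It assumes non-nested, non-empty braces and treats everything after an unclosed BEGIN as being inside the braces.

```csharp
using System;
using System.Collections.Generic;

public static class BraceScanner
{
    // Returns the start index of every needle found outside open ... close regions.
    public static List<int> FindOutside(string haystack, string needle,
                                        string open = "BEGIN", string close = "END")
    {
        if (string.IsNullOrEmpty(needle))
            throw new ArgumentException("needle must not be empty", nameof(needle));

        var hits = new List<int>();
        int pos = 0;
        while (pos < haystack.Length)
        {
            // The region outside the braces runs from pos to the next opening brace.
            int nextOpen = haystack.IndexOf(open, pos, StringComparison.Ordinal);
            int regionEnd = nextOpen < 0 ? haystack.Length : nextOpen;

            // Collect every needle that lies entirely within [pos, regionEnd).
            int hit = haystack.IndexOf(needle, pos, regionEnd - pos, StringComparison.Ordinal);
            while (hit >= 0)
            {
                hits.Add(hit);
                int from = hit + needle.Length;
                hit = haystack.IndexOf(needle, from, regionEnd - from, StringComparison.Ordinal);
            }

            if (nextOpen < 0) break; // no more braces

            // Skip ahead to just past the matching closing brace.
            int nextClose = haystack.IndexOf(close, nextOpen + open.Length, StringComparison.Ordinal);
            if (nextClose < 0) break; // unclosed brace: ignore the rest
            pos = nextClose + close.Length;
        }
        return hits;
    }
}
```

For the haystack in the question, BraceScanner.FindOutside(input, "some") should return the positions of the two needles outside the braces.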

answered April 8, 2011 11:23 am CDT
0
 

Would something like this work for you:

(?:^|END)((?!BEGIN).*?)(some)(.*?)(?:BEGIN|$)

This appears to match your text, as I tested using RegExDesigner.NET.
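
As a sketch of how that pattern could be used from C#: the needle is the second capturing group, so its position can be read from Groups[2]. The class and method names below are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GapPattern
{
    // Returns the index of the needle captured by each match of the pattern above.
    public static List<int> NeedleIndexes(string input)
    {
        var re = new Regex(@"(?:^|END)((?!BEGIN).*?)(some)(.*?)(?:BEGIN|$)");
        var indexes = new List<int>();
        foreach (Match m in re.Matches(input))
        {
            // Group 1 is the text before the needle, group 2 the needle itself.
            indexes.Add(m.Groups[2].Index);
        }
        return indexes;
    }
}
```

One caveat, as far as I can tell: each match consumes everything up to the next BEGIN (or the end of the string), so only the first needle in each gap is reported. For "some and some" the sketch returns a single index.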

answered April 8, 2011 11:23 am CDT
1
 

One simple option is to skip the parts you don't want to match, and capture only the needles you need:

MatchCollection matches = Regex.Matches(input, "BEGIN.*?END|(?<Needle>some)");

You'll get the two "some"s you're after by taking the successful "Needle" groups out of all matches:

IEnumerable<Group> needles = matches.Cast<Match>()
                                    .Select(m => m.Groups["Needle"])
                                    .Where(g => g.Success);
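
Put together as a self-contained helper, this might look like the sketch below. The wrapper class SkipAndCapture and the RegexOptions.Singleline flag are additions of mine, not part of the snippet above. The flag only matters if a BEGIN ... END region can span several lines, because "." does not match a newline by default.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SkipAndCapture
{
    // Returns the start index of every "some" that is not inside BEGIN ... END.
    public static List<int> NeedleIndexes(string input)
    {
        // The first alternative swallows a whole BEGIN ... END region without
        // setting the Needle group; the second captures a needle outside one.
        MatchCollection matches = Regex.Matches(
            input, "BEGIN.*?END|(?<Needle>some)", RegexOptions.Singleline);

        return matches.Cast<Match>()
                      .Select(m => m.Groups["Needle"])
                      .Where(g => g.Success)
                      .Select(g => g.Index)
                      .ToList();
    }
}
```

With the haystack from the question, SkipAndCapture.NeedleIndexes(input) should contain two entries, one for each "some" outside the braces.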

answered April 9, 2011 1:09 pm CDT
