Question
I am trying to extract the data out of an encoded 2D barcode. The extraction part is working fine, and I can get the value in a text input.
E.g., the decoded string is
]d20105000456013482172012001000001/:210000000001
Based on the following rules (couldn't get the proper table markdown thus attaching a picture), I am trying to extract the substrings from the string mentioned above.
Substrings I want to extract:
05000456013482 (which is after the delimiter 01)
201200 (which is after delimiter 17)
00001 (which is after delimiter 10)
0000000001 (which is after delimiter 21)
P.S. The first 3 chars of the original string (]d2) are always the same, since they simply signify the decoding method.
Now some quirks:
1) The number of characters after delimiter 10 is not fixed. In the example above it is 00001, but it could just as well be 001. Similarly, the number of characters after delimiter 21 is not fixed and can vary in length. For these variable-length fields, I have added a constant /: that marks where the value ends when the barcode is scanned with a handheld device. So I have to look for /: after delimiter 10 and extract the string until it hits /: or EOL, then find delimiter 21 and extract the string until it hits /: or EOL.
2) The number of characters after delimiters 01 and 17 is always fixed (14 characters and 6 characters respectively), as shown in the table.
Note: The position of the delimiters could change. In other words, the encoded barcode could be written in a different sequence.
]d20105000456013482172012001000001/:210000000001 - Note: No /: sign after the 21 group since it is EOL.
]d2172012001000001/:210000000001/:0105000456013482 - Note: Both the 10 and 21 groups have a /: sign to signify we have to extract until that sign.
]d21000001/:210000000001/:010500045601348217201200 - The first two are of varying length, and the next two are of fixed length.
I am not an expert in regex, and thus far I have only tried simple patterns like (01)(\d*)(21)(\d*)(10)(\d*)(17)(\d*)$, which doesn't work on the given example since it looks for 10 as the first 2 chars. Also, the substring(x, x) method only works for a fixed-length string, where I know which indexes to pluck the string from.
P.S. Help in either JS or jQuery is appreciated.
Answer 1:
While you could try to make a very complicated regex to do this, it would be more readable, and maintainable to parse through the string in steps.
Basic steps would be to:
- remove the decode method characters (]d2).
- Split off the first two characters from the result of step 1.
- Use that to choose which method to extract the data
- Remove and save that data from the string, then go back to step 2 and repeat until the string is exhausted.
Now, since you have a table of the structure of the AI/data, you can make several methods to extract the different forms of data.
For instance, since AIs 01, 11, 15 and 17 are all fixed length, you can just use the string's slice method with the length:
str.slice(0,14); //for 01
str.slice(0,6); //for 11 15 17
The variable-length ones, like AI 21, would be something like:
var fnc1 = "/:";
var fnc1Index = str.indexOf(fnc1);
str.slice(0,fnc1Index);
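One caveat with the variable-length snippet: indexOf returns -1 when the separator is absent (the field runs to EOL), and str.slice(0, -1) would then drop the last character. A minimal sketch of a helper that covers both cases; the name takeVariable and its return shape are hypothetical and only serve the illustration:

```javascript
// Hypothetical helper (not part of the original answer): reads a
// variable-length field that ends at the "/:" separator or at end of input.
function takeVariable(str, fnc1) {
  var fnc1Index = str.indexOf(fnc1);
  if (fnc1Index === -1) {
    // No separator left: the field runs to the end of the string.
    return { value: str, rest: "" };
  }
  return {
    value: str.slice(0, fnc1Index),
    // Skip over the separator itself before continuing.
    rest: str.slice(fnc1Index + fnc1.length)
  };
}

console.log(takeVariable("00001/:210000000001", "/:")); // value "00001", rest "210000000001"
console.log(takeVariable("0000000001", "/:"));          // value "0000000001", rest ""
```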
Demo
var dataNames = {
'01': 'GTIN',
'10': 'batchNumber',
'11': 'prodDate',
'15': 'bestDate',
'17': 'expireDate',
'21': 'serialNumber'
};
var input = document.querySelector("input");
document.querySelector("button").addEventListener("click",function(){
var str = input.value;
console.log( parseGS1(str) );
});
function parseGS1(str) {
  var fnc1 = "/:";
  var data = {};
  //remove ]d2
  str = str.slice(3);
  while (str.length) {
    //get the AI identifier: 01,10,11 etc
    let aiIdent = str.slice(0, 2);
    //get the name we want to use for the data object
    let dataName = dataNames[aiIdent];
    //update the string
    str = str.slice(2);
    switch (aiIdent) {
      //fixed length: 14 characters
      case "01":
        data[dataName] = str.slice(0, 14);
        str = str.slice(14);
        break;
      //variable length: read up to fnc1 or end of string
      case "10":
      case "21": {
        let fnc1Index = str.indexOf(fnc1);
        //eol or fnc1 cases
        if (fnc1Index == -1) {
          data[dataName] = str.slice(0);
          str = "";
        } else {
          data[dataName] = str.slice(0, fnc1Index);
          str = str.slice(fnc1Index + 2);
        }
        break;
      }
      //fixed length: 6 characters
      case "11":
      case "15":
      case "17":
        data[dataName] = str.slice(0, 6);
        str = str.slice(6);
        break;
      default:
        console.log("unexpected ident encountered:", aiIdent);
        return false;
    }
  }
  return data;
}
<input><button>Parse</button>
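The same loop can also be driven by a lookup table instead of a switch, which keeps the per-AI rules in one place. This is only a sketch under the assumptions of this thread (two-digit AIs, a "]d2" prefix, "/:" as the separator); aiSpecs and parseWithTable are hypothetical names, and the table lists only the AIs used in the demo.

```javascript
// Assumed spec table, limited to the AIs used in the demo.
// length: fixed number of characters; null means "read up to /: or end".
var aiSpecs = {
  "01": { name: "GTIN", length: 14 },
  "10": { name: "batchNumber", length: null },
  "11": { name: "prodDate", length: 6 },
  "15": { name: "bestDate", length: 6 },
  "17": { name: "expireDate", length: 6 },
  "21": { name: "serialNumber", length: null }
};

function parseWithTable(str) {
  var fnc1 = "/:";
  var data = {};
  str = str.slice(3); // drop the "]d2" prefix
  while (str.length) {
    var spec = aiSpecs[str.slice(0, 2)];
    if (!spec) {
      return null; // unknown AI: give up rather than guess
    }
    str = str.slice(2);
    var end = spec.length;
    if (end === null) {
      // Variable length: stop at the separator, or at end of input.
      end = str.indexOf(fnc1);
      if (end === -1) end = str.length;
    }
    data[spec.name] = str.slice(0, end);
    str = str.slice(end);
    // A variable-length field may be followed by the separator; skip it.
    if (spec.length === null && str.indexOf(fnc1) === 0) {
      str = str.slice(fnc1.length);
    }
  }
  return data;
}

// The three sample strings from the question.
var samples = [
  "]d20105000456013482172012001000001/:210000000001",
  "]d2172012001000001/:210000000001/:0105000456013482",
  "]d21000001/:210000000001/:010500045601348217201200"
];
samples.forEach(function (s) {
  console.log(parseWithTable(s));
});
```

All three sample strings should yield the same four values (GTIN 05000456013482, expireDate 201200, batchNumber 00001, serialNumber 0000000001), regardless of field order.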
Answer 2:
Ok, here's my take on this. I created a regex that will match all possible patterns. That way all parts are split correctly, all that remains is to use the first two digits to know what it means.
^\]d2(?:((?:10|21)[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|((?:11|15|17)[0-9]{6}))*
I suggest you copy it into regex101.com to read the full descriptors and test it out against different possible results.
There are 3 main parts:
((?:10|21)[a-zA-Z0-9]{1,20}(?:\/:|$))
This tests for the sections starting with 10 and 21. It looks for between 1 and 20 alphanumeric characters, and it should end with either EOL or /:
(01[0-9]{14})
Looks up for the GTIN, pretty straightforward.
((?:11|15|17)[0-9]{6})
Looks up for the 3 date fields.
As we expect those 3 segments to come in any order, I've glued them together with | to imply an OR, and I expect this big sequence to repeat (the * at the end means 0 or more; we could define the exact minimum and maximum for more reliability).
I am unsure if this will work for everything as the test strings you gave do not include identifiers inside actual values... It could very well happen that a product's best before date is in January so there will be a 01 in its value. But forcing the regex to execute in this manner should circumvent some of those problems.
EDIT: Capturing groups only capture the last occurrence, so we need to split their definitions:
^\]d2(?:(21[a-zA-Z0-9]{1,20}(?:\/:|$))|(10[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|(11[0-9]{6})|(15[0-9]{6})|(17[0-9]{6}))*
EDIT AGAIN: Javascript seems to cause us some headaches... I am not sure of the correct way to handle it, but here's an example code that could work.
var str = "]d20105000456013482172012001000001/:210000000001";
var r = new RegExp("(21[a-zA-Z0-9]{1,20}(?:\/:|$))|(10[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|(11[0-9]{6})|(15[0-9]{6})|(17[0-9]{6})", "g");
var match;
//start searching after the "]d2" prefix so its "2" is never read as part of an AI
r.lastIndex = 3;
while ((match = r.exec(str)) != null) {
  console.log(match[0]);
}
I am not very happy with how it turns out though. There might be better solutions.
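One possible refinement, offered as a sketch rather than a definitive fix: a search that uses only the g flag may skip characters and begin a match in the middle of a value, and if it scans from index 0, the 2 of the ]d2 prefix followed by a 1 can be read as AI 21. The sticky flag (y) forces every match to begin exactly where the previous one ended, so each segment is consumed in sequence. The names aiNames and parseSticky are hypothetical, and the pattern covers only the AIs discussed here.

```javascript
// Assumed name table, mirroring the AIs discussed in this thread.
const aiNames = {
  "01": "GTIN", "10": "batchNumber", "11": "prodDate",
  "15": "bestDate", "17": "expireDate", "21": "serialNumber"
};

// Hypothetical function; returns null when the input does not parse.
function parseSticky(str) {
  // Odd-numbered groups capture the AI, even-numbered groups its value.
  // The "y" (sticky) flag makes each match start exactly at lastIndex.
  const r = /(10|21)([a-zA-Z0-9]{1,20})(?:\/:|$)|(01)([0-9]{14})|(11|15|17)([0-9]{6})/y;
  const data = {};
  r.lastIndex = 3; // skip the "]d2" prefix
  while (r.lastIndex < str.length) {
    const m = r.exec(str);
    if (m === null) return null; // no segment starts at this position
    const ai = m[1] || m[3] || m[5];
    const value = m[2] || m[4] || m[6];
    data[aiNames[ai]] = value;
  }
  return data;
}

console.log(parseSticky("]d21000001/:210000000001/:010500045601348217201200"));
```

Because each match must start at lastIndex, an unrecognised segment makes the function return null instead of silently skipping ahead.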
Source: https://stackoverflow.com/questions/45918849/extracting-substring-from-string-based-on-delimiter