Documentation
Prime combinator
Parser combinator library with human-readable syntax
- Version: 1.0.9
- Author: Roman Fantaev
- Created: 1 November, 2020
If you have any questions that are beyond the scope of this help page, please feel free to email me.
Installation
The library is available on Maven Central:
Maven:
<dependency>
<groupId>com.primeframeworks</groupId>
<artifactId>primeCombinator</artifactId>
<version>1.0.9</version>
<scope>compile</scope>
</dependency>
Gradle:
implementation 'com.primeframeworks:primeCombinator:1.0.9'
How to use
Let's say we need to extract the protocol and the domain name from a URL. Here is how we extract them from the address http://combinator.primeframeworks.com:
//Kotlin example
val parsedUrl = SequenceOf(
Any(Str("http"), Str("https")),
Str("://"),
CustomWord(EnglishLetter().asChar(), Character('.')))
.parse("http://combinator.primeframeworks.com")
.get()
val protocol = (Any().fromSequence(parsedUrl.sequence, 0).anyOne as Str.StrParsed).str
val domainName = CustomWord().fromSequence(parsedUrl.sequence, 2).customWord
assertEquals("http", protocol)
assertEquals("combinator.primeframeworks.com", domainName)
In the example above we specify that the text starts with either the string "http" or the string "https". Next comes the string "://". Then comes a word consisting of letters and the "." symbol, which forms the full domain name.
Because we parse a "sequence", the output is a list of results in which we know that the first element comes from "Any" and the third from "CustomWord".
We use "fromSequence" to convert a parsed part into the corresponding class and take its value.
We have to cast Any().fromSequence(parsedUrl.sequence, 0) to Str.StrParsed because Any returns a general type (it could be the output of any parser), but we know that only Str.StrParsed can be returned here, as that is what the Str parser produces.
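To make the positional lookup concrete, here is a minimal from-scratch sketch of the same idea. It is not the primeCombinator API: Step, P, str, anyOf, seq and customWord are hypothetical names used only for illustration. seq keeps each parser's output at that parser's index, which is why the protocol is read from position 0 and the domain from position 2. Because the first successful alternative wins, the sketch lists "https" before "http": in this sketch, with the opposite order "http" matches the start of an https URL and the following "://" then fails.

```kotlin
// Hypothetical sketch of positional sequence results; NOT the primeCombinator API.

// A successful step: the parsed value and the index of the next unread character.
data class Step<out T>(val value: T, val next: Int)

// A parser takes the text and a start index and returns a Step, or null on failure.
typealias P<T> = (String, Int) -> Step<T>?

// Matches an exact string at the current position.
fun str(s: String): P<String> = { text, pos ->
    if (text.startsWith(s, pos)) Step(s, pos + s.length) else null
}

// Ordered choice: returns the result of the first alternative that succeeds.
fun <T> anyOf(vararg alternatives: P<T>): P<T> = { text, pos ->
    alternatives.asSequence().mapNotNull { it(text, pos) }.firstOrNull()
}

// Runs parsers one after another and fails as soon as one fails.
// The i-th element of the output list is the value produced by the i-th parser.
fun seq(vararg parsers: P<Any?>): P<List<Any?>> = parser@{ text, pos ->
    val values = mutableListOf<Any?>()
    var cur = pos
    for (p in parsers) {
        val step = p(text, cur) ?: return@parser null
        values.add(step.value)
        cur = step.next
    }
    Step(values, cur)
}

// Consumes one or more characters accepted by `allowed`.
fun customWord(allowed: (Char) -> Boolean): P<String> = { text, pos ->
    var end = pos
    while (end < text.length && allowed(text[end])) end++
    if (end > pos) Step(text.substring(pos, end), end) else null
}

// Protocol, "://", then a word of English letters and dots.
val urlSketch = seq(
    anyOf(str("https"), str("http")), // longer alternative first
    str("://"),
    customWord { it in 'a'..'z' || it in 'A'..'Z' || it == '.' }
)
```

Calling urlSketch("https://combinator.primeframeworks.com", 0) yields a list whose element 0 is the protocol and element 2 is the domain, mirroring fromSequence(…, 0) and fromSequence(…, 2) above.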
How to create custom parser
You can create a custom parser and combine it with others. This is where primeCombinator shines: it lets you build advanced parsing logic that stays easy to read.
//Kotlin example
class UrlParsed(val protocol: String, val domain: String, mappedFrom: Parsed) : Parsed(mappedFrom)
val urlParser = SequenceOf(
Any(Str("http"), Str("https")),
Str("://"),
CustomWord(EnglishLetter().asChar(), Character('.'))
).map { sequenceParserOutput->
val protocol = (Any().fromSequence(sequenceParserOutput.sequence, 0).anyOne as Str.StrParsed).str
val domainName = CustomWord().fromSequence(sequenceParserOutput.sequence, 2).customWord
UrlParsed(protocol, domainName, sequenceParserOutput)
}
val urlParsed = urlParser.parse("http://combinator.primeframeworks.com").get()
assertEquals("http", urlParsed.protocol)
assertEquals("combinator.primeframeworks.com", urlParsed.domain)
Congratulations! You now have a custom parser, urlParser, which you can combine with other parsers.
Try combining it with another parser:
SequenceOf(Str("url: "), urlParser).parse("url: http://combinator.primeframeworks.com")
Parsed is the common base class for parser output. We created the custom output UrlParsed for the custom parser urlParser and made it strictly typed.
class UrlParsed(val protocol: String, val domain: String, mappedFrom: Parsed) : Parsed(mappedFrom)
The base constructor call, Parsed(mappedFrom), is needed to save the positions in the text that the parser read. Because we only change the output and not the parsing logic, we simply copy the indexes from the output of the mapped parser.
In other words, we save position "a" (where parsing started) and position "b" (where parsing stopped) in the core Parsed class.
You can use this approach for any custom parser without needing to look inside the framework, unless you want to.
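The idea that map changes only the output while the positions come from the wrapped parser can be sketched in isolation. This is a simplified, hypothetical model, not the library's source: Parsed, StrParsed, Parser, str, map and ProtocolParsed here are stand-ins written for illustration, and indexEnd is treated as inclusive, as in the examples on this page.

```kotlin
// Hypothetical sketch of "map keeps positions"; NOT the primeCombinator API.

// Base output type: inclusive start and end indexes of the matched text.
open class Parsed(val indexStart: Int, val indexEnd: Int) {
    // Copies positions from another output, like `: Parsed(mappedFrom)` above.
    constructor(mappedFrom: Parsed) : this(mappedFrom.indexStart, mappedFrom.indexEnd)
}

class StrParsed(val str: String, indexStart: Int, indexEnd: Int) : Parsed(indexStart, indexEnd)

// A parser returns its output, or null when it does not match at `pos`.
fun interface Parser<out R : Parsed> {
    fun parse(text: String, pos: Int): R?
}

fun str(s: String): Parser<StrParsed> = Parser<StrParsed> { text, pos ->
    if (text.startsWith(s, pos)) StrParsed(s, pos, pos + s.length - 1) else null
}

// map only changes the output type; the parsing logic, and therefore the
// positions, come from the wrapped parser.
fun <A : Parsed, B : Parsed> Parser<A>.map(transform: (A) -> B): Parser<B> =
    Parser<B> { text, pos -> this@map.parse(text, pos)?.let(transform) }

// A strictly typed custom output that copies positions from the mapped result.
class ProtocolParsed(val protocol: String, mappedFrom: Parsed) : Parsed(mappedFrom)

val protocolParser: Parser<ProtocolParsed> = str("http").map { ProtocolParsed(it.str, it) }
```

Parsing "http://…" with protocolParser gives a ProtocolParsed whose indexStart and indexEnd (0 and 3) were copied from the underlying StrParsed, which is the same bookkeeping UrlParsed relies on.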
Available parsers
Documentation and examples for the parsers available out of the box.
All parsers can be combined with each other.
Any
Takes several parsers and returns the result of the first one that succeeds. Parsers are tried in the order they are passed to the constructor.
//Kotlin example
val anyParsed = Any(Word(), EnglishDigit()).parse("1 is not a name").get()
assertEquals(0, anyParsed.indexStart)
assertEquals(0, anyParsed.indexEnd)
assertEquals(1, (anyParsed.anyOne as EnglishDigit.EnglishDigitParsed).digit)
Here the parser tries Word first, but a word consists only of letters, so it fails; it then tries EnglishDigit, which successfully parses the first part of the string.
The Any parser returns AnyParsed, which contains anyOne: the result of the first successful parser (in our case, EnglishDigitParsed).
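Ordered choice is simple to sketch, and the sketch shows why the order of alternatives matters. The names below (CharParsed, AnyParsed, charIf, anyOf, letter, digit) are hypothetical and not part of primeCombinator; the winner field is an extra added here only to make visible which alternative was picked.

```kotlin
// Hypothetical sketch of ordered choice; NOT the primeCombinator API.

// Output of a single-character parser with inclusive indexes.
data class CharParsed(val char: Char, val indexStart: Int, val indexEnd: Int)

// Output of the choice: `anyOne` is the winning result, `winner` its position in the list.
data class AnyParsed(val anyOne: CharParsed, val winner: Int)

typealias CharParser = (String, Int) -> CharParsed?

// Succeeds on one character that satisfies `test`.
fun charIf(test: (Char) -> Boolean): CharParser = { text, pos ->
    if (pos < text.length && test(text[pos])) CharParsed(text[pos], pos, pos) else null
}

// Tries alternatives in the given order and stops at the first success.
fun anyOf(vararg alternatives: CharParser): (String, Int) -> AnyParsed? = choice@{ text, pos ->
    for ((index, parser) in alternatives.withIndex()) {
        val result = parser(text, pos)
        if (result != null) return@choice AnyParsed(result, index)
    }
    null
}

val letter = charIf { it in 'a'..'z' || it in 'A'..'Z' }
val digit = charIf { it in '0'..'9' }
val letterOrDigit = anyOf(letter, digit)
```

On "1 is not a name", letter fails and digit succeeds, so the result wraps the digit at index 0, just as Any falls through from Word to EnglishDigit in the example above.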
AnyCharacter
Parses any single character: a digit, a letter, or a symbol. Fails only at the end of input.
//Kotlin example
val anyCharacterParsed = AnyCharacter().parse("abc").get()
assertEquals(0, anyCharacterParsed.indexStart)
assertEquals(0, anyCharacterParsed.indexEnd)
assertEquals('a', anyCharacterParsed.char)
We parsed the first character of the string. The result is AnyCharacterParsed.
Beginning
Parses the beginning of the input.
//Kotlin example
val beginningParsing = Beginning().parse("").get()
assertEquals(0, beginningParsing.indexStart)
assertEquals(-1, beginningParsing.indexEnd)
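Here we parsed the beginning of an empty string and got BeginningParsed without errors. Beginning consumes no characters, so its indexEnd (-1) is one less than its indexStart.
Beginning is useful when we want to parse a whole document, for example: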
//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed
val eightParserResultInsideRepeat = repeatParserOutput.repeatersParsed[8].anyOne as Word.WordParsed
assertEquals("name", eightParserResultInsideRepeat.word)
Between
Parses a value enclosed between two delimiters. It is typically used with brackets, for text like [value].
//Kotlin example
val betweenParsed = Between(Character('['), Character('b'), Character(']')).parse("[b]").get()
assertEquals(0, betweenParsed.indexStart)
assertEquals(2, betweenParsed.indexEnd)
assertEquals('b', betweenParsed.between.char)
assertEquals('[', betweenParsed.left.char)
assertEquals(']', betweenParsed.right.char)
Here we parsed the character "b" enclosed in brackets. The parser returns BetweenParsed.
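A Between-style parser is just three parsers run back to back, where the left and right results are kept next to the middle value. Below is a hypothetical sketch (Match, Parser, character, BetweenParsed and between are illustrative names, not the library's); with inclusive end indexes, each next parser starts at indexEnd + 1.

```kotlin
// Hypothetical sketch of "between"; NOT the primeCombinator API.

// A match with its value and inclusive start/end indexes.
data class Match<out T>(val value: T, val indexStart: Int, val indexEnd: Int)

typealias Parser<T> = (String, Int) -> Match<T>?

// Matches exactly the character `c`.
fun character(c: Char): Parser<Char> = { text, pos ->
    if (pos < text.length && text[pos] == c) Match(c, pos, pos) else null
}

data class BetweenParsed<L, M, R>(
    val left: L, val between: M, val right: R,
    val indexStart: Int, val indexEnd: Int
)

// Parses left, then the middle value, then right; all three must match.
fun <L, M, R> between(
    left: Parser<L>, middle: Parser<M>, right: Parser<R>
): (String, Int) -> BetweenParsed<L, M, R>? = { text, pos ->
    val l = left(text, pos)
    val m = l?.let { middle(text, it.indexEnd + 1) }
    val r = m?.let { right(text, it.indexEnd + 1) }
    if (l != null && m != null && r != null) {
        // The whole match spans from the left delimiter to the right one.
        BetweenParsed(l.value, m.value, r.value, l.indexStart, r.indexEnd)
    } else {
        null
    }
}
```

Applied to "[b]", this yields left '[', between 'b', right ']' with indexes 0 and 2, matching the assertions in the example above; a missing closing bracket makes the whole parse fail.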
Character
Parses specified single character.
//Kotlin example
val characterParsed = Character('a').parse("abc").get()
assertEquals(0, characterParsed.indexStart)
assertEquals(0, characterParsed.indexEnd)
assertEquals('a', characterParsed.char)
Here we parsed the specified character "a". The parser returns CharacterParsed.
CustomWord
Parses a word that consists of the specified characters. The regular Word parser accepts only letters; CustomWord lets you specify any characters. It can be used to parse variable names, domains, and more.
//Kotlin example
val document = CustomWord(EnglishLetter().asChar(), Character('.')).parse("my.domain.com").get()
assertEquals("my.domain.com", document.customWord)
Here we parsed the domain name as a single custom word. The parser returns CustomWordParsed.
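The core of a CustomWord-style parser is a loop that consumes the longest run of characters accepted by any of the allowed character classes. This is a hypothetical sketch: WordParsed, customWord, englishLetter and dot are illustrative names, and requiring at least one character is an assumption of the sketch.

```kotlin
// Hypothetical sketch of a CustomWord-style parser; NOT the primeCombinator API.

data class WordParsed(val word: String, val indexStart: Int, val indexEnd: Int)

// Consumes the longest run (at least one character) of characters accepted
// by any of the `allowed` predicates; indexEnd is inclusive.
fun customWord(vararg allowed: (Char) -> Boolean): (String, Int) -> WordParsed? = { text, pos ->
    var end = pos
    while (end < text.length && allowed.any { accepts -> accepts(text[end]) }) end++
    if (end > pos) WordParsed(text.substring(pos, end), pos, end - 1) else null
}

// Character classes comparable to EnglishLetter().asChar() and Character('.').
val englishLetter: (Char) -> Boolean = { it in 'a'..'z' || it in 'A'..'Z' }
val dot: (Char) -> Boolean = { it == '.' }
```

With both classes, customWord(englishLetter, dot) reads "my.domain.com" as one word; with letters alone it stops at the first dot.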
DoubleQuote
Parses the double-quote symbol (").
//Kotlin example
val doubleQuoteParsed = DoubleQuote().parse(""""""").get()
assertEquals(0, doubleQuoteParsed.indexStart)
assertEquals(0, doubleQuoteParsed.indexEnd)
assertEquals(""""""", doubleQuoteParsed.str)
Here we parsed a single ". As with Str, the parsed text is available as str.
End
Parses the end of the document.
End is useful when we want to parse a whole document, for example:
//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed
val eightParserResultInsideRepeat = repeatParserOutput.repeatersParsed[8].anyOne as Word.WordParsed
assertEquals("name", eightParserResultInsideRepeat.word)
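Beginning and End can be thought of as position checks that consume nothing, which is why Beginning on an empty string reports indexEnd = -1. The sketch below is hypothetical (Empty, beginning and end are illustrative names) and assumes that Beginning accepts only position 0 and End only the position just past the last character; the library may implement them differently.

```kotlin
// Hypothetical sketch of Beginning/End; NOT the primeCombinator API.
// Both consume nothing, so their inclusive indexEnd is indexStart - 1.
data class Empty(val indexStart: Int, val indexEnd: Int)

// Assumption: succeeds only at the very start of the text.
@Suppress("UNUSED_PARAMETER")
fun beginning(text: String, pos: Int): Empty? =
    if (pos == 0) Empty(pos, pos - 1) else null

// Assumption: succeeds only once every character has been consumed.
fun end(text: String, pos: Int): Empty? =
    if (pos == text.length) Empty(pos, pos - 1) else null
```

Wrapping a Repeat between these two checks is what forces the document example to account for every character of "1 is not a name".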
EndOfInputParser
An abstract class that checks that the end of input has not been reached before delegating to parseNext.
//Kotlin
abstract class EndOfInputParser : Parser {
override fun parse(previous: Parsed): ParsedResult {
return if (previous.currentIndex() > previous.textMaxIndex()) {
ParsedResult.asError("Unexpected end of input")
} else {
parseNext(previous)
}
}
abstract fun parseNext(previous: Parsed): ParsedResult
}
It is usually extended by parsers that need to consume one or more characters.
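The same template-method pattern can be shown with a simplified, self-contained analogue. State, ParseResult, EndOfInputGuard and AnyChar are hypothetical stand-ins, not the library's Parsed, ParsedResult or EndOfInputParser; the point is that the end-of-input check lives in the base class, so subclasses never index past the last character.

```kotlin
// Hypothetical, simplified analogue of EndOfInputParser; NOT the library's classes.

// State reduced to the text plus the index of the next character to read.
data class State(val text: String, val currentIndex: Int) {
    val textMaxIndex: Int get() = text.length - 1
}

sealed class ParseResult {
    data class Ok(val value: Char, val next: State) : ParseResult()
    data class Error(val message: String) : ParseResult()
}

// Template method: the end-of-input check is done once, here.
abstract class EndOfInputGuard {
    fun parse(previous: State): ParseResult =
        if (previous.currentIndex > previous.textMaxIndex) {
            ParseResult.Error("Unexpected end of input")
        } else {
            parseNext(previous)
        }

    protected abstract fun parseNext(previous: State): ParseResult
}

// Example subclass: consumes exactly one character, whatever it is.
class AnyChar : EndOfInputGuard() {
    override fun parseNext(previous: State): ParseResult =
        ParseResult.Ok(
            previous.text[previous.currentIndex],
            previous.copy(currentIndex = previous.currentIndex + 1)
        )
}
```

AnyChar itself contains no bounds check; calling it at the end of the text still yields an error result because the base class intercepts that case.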
EnglishDigit
Parses a single digit from 1 to 9.
//Kotlin example
val englishDigitParsed = EnglishDigit().parse("123").get()
assertEquals(0, englishDigitParsed.indexStart)
assertEquals(0, englishDigitParsed.indexEnd)
assertEquals(1, englishDigitParsed.digit)
Here we parsed the single digit 1. The parser returns EnglishDigitParsed.
EnglishLetter
Parses one letter, a-z or A-Z.
//Kotlin example
val englishLetterParsed = EnglishLetter().parse("abv").get()
assertEquals(0, englishLetterParsed.indexStart)
assertEquals(0, englishLetterParsed.indexEnd)
assertEquals('a', englishLetterParsed.letter)
Here we parsed the single letter a. The parser returns EnglishLetterParsed.
Long
Parses a Long value, such as 232312324.
//Kotlin example
val longParsed = Long().parse("134").get()
assertEquals(0, longParsed.indexStart)
assertEquals(2, longParsed.indexEnd)
assertEquals(134L, longParsed.long)
Here we parsed the long value 134. The parser returns LongParsed.
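Parsing a Long amounts to accumulating decimal digits as value = value * 10 + digit. The hypothetical parseLong below is an illustrative sketch, not the library's Long parser: it ignores signs, and it uses Math.multiplyExact and Math.addExact so that numbers that do not fit into a Long fail instead of silently wrapping around. Whether the real parser handles signs or overflow this way is not stated on this page.

```kotlin
// Hypothetical sketch of a Long parser; NOT the primeCombinator API.
// Assumptions: unsigned ASCII digits only; overflow makes the parse fail.

data class LongParsed(val long: Long, val indexStart: Int, val indexEnd: Int)

fun parseLong(text: String, pos: Int): LongParsed? {
    var end = pos
    var value = 0L
    try {
        while (end < text.length && text[end] in '0'..'9') {
            // value = value * 10 + digit, throwing on Long overflow
            value = Math.addExact(Math.multiplyExact(value, 10L), (text[end] - '0').toLong())
            end++
        }
    } catch (e: ArithmeticException) {
        return null // the digits do not fit into a Long
    }
    // At least one digit is required; indexEnd is inclusive.
    return if (end > pos) LongParsed(value, pos, end - 1) else null
}
```

For "134" this accumulates 1, 13, 134 and reports indexes 0 to 2, as in the example above.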
Not
Not inverts a parser: it succeeds when the wrapped parser fails. It does not advance the end index, so the next parser starts from the same index.
//Kotlin example
val notA = Not(Str("a")).parse("b").get()
assertEquals(0, notA.indexStart)
Here we parsed "not the symbol a". The parser returns NotParsed.
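The "does not move the index" behaviour is easiest to see with a tiny model in which a parser returns only the index after what it consumed. This is a hypothetical sketch: Parser, str, notMatching and seqOf are illustrative names, not primeCombinator classes.

```kotlin
// Hypothetical sketch of a non-consuming Not; NOT the primeCombinator API.

// A parser returns the index after what it consumed, or null on failure.
typealias Parser = (String, Int) -> Int?

fun str(s: String): Parser = { text, pos ->
    if (text.startsWith(s, pos)) pos + s.length else null
}

// Succeeds only when `inner` fails, and returns the SAME position it was given,
// so the next parser in a sequence starts where Not started.
fun notMatching(inner: Parser): Parser = { text, pos ->
    if (inner(text, pos) == null) pos else null
}

// Runs parsers left to right, threading the position through; null on any failure.
fun seqOf(vararg parsers: Parser): Parser = { text, pos ->
    var cur: Int? = pos
    for (p in parsers) cur = cur?.let { p(text, it) }
    cur
}
```

In seqOf(notMatching(str("a")), str("b")) applied to "b", the first step succeeds at index 0 without consuming anything, so str("b") still sees the "b".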
Repeat
Repeats the specified parser until it fails.
//Kotlin example
val repeatParsed = Repeat(EnglishLetter()).parse("Name1").get()
assertEquals(0, repeatParsed.indexStart)
assertEquals(3, repeatParsed.indexEnd)
assertEquals(4, repeatParsed.repeatersParsed.size)
Here we parsed the 4 letters N, a, m, e. The parser returns RepeatParsed.
Note that the digit is not parsed in this example: repetition stops at the first character that is not a letter.
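A repeat combinator collects results until the inner parser fails. The sketch below is hypothetical (Match, Parser, RepeatParsed, englishLetter and repeatParser are illustrative names) and makes two assumptions that this page does not confirm for the library: zero repetitions still count as success, and the loop stops if the inner parser succeeds without consuming input, which would otherwise loop forever.

```kotlin
// Hypothetical sketch of Repeat; NOT the primeCombinator API.

data class Match<out T>(val value: T, val indexStart: Int, val indexEnd: Int)

typealias Parser<T> = (String, Int) -> Match<T>?

data class RepeatParsed<T>(val repeatersParsed: List<T>, val indexStart: Int, val indexEnd: Int)

val englishLetter: Parser<Char> = { text, pos ->
    if (pos < text.length && (text[pos] in 'a'..'z' || text[pos] in 'A'..'Z')) {
        Match(text[pos], pos, pos)
    } else {
        null
    }
}

// Applies `inner` until it fails. Assumption: zero repetitions is a success
// (indexEnd = indexStart - 1 for an empty match).
fun <T> repeatParser(inner: Parser<T>): (String, Int) -> RepeatParsed<T> = { text, pos ->
    val items = mutableListOf<T>()
    var next = pos
    while (true) {
        val m = inner(text, next) ?: break
        if (m.indexEnd < next) break // empty match: no progress, stop to avoid looping forever
        items.add(m.value)
        next = m.indexEnd + 1
    }
    RepeatParsed(items, pos, next - 1)
}
```

On "Name1" this collects N, a, m, e and stops at the digit, giving indexes 0 to 3 as in the example above.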
RepeatableBetween
Parses repeated content between two delimiters; the repeated results can also be joined into one value.
//Kotlin example
val repeatableBetweenParsed = RepeatableBetween(Str("["), EnglishLetter(), Str("]")).parse("[Na]").get()
assertEquals(0, repeatableBetweenParsed.indexStart)
assertEquals(3, repeatableBetweenParsed.indexEnd)
assertEquals("[", repeatableBetweenParsed.left.str)
assertEquals("]", repeatableBetweenParsed.right.str)
assertEquals(2, repeatableBetweenParsed.between.size)
assertEquals('N', repeatableBetweenParsed.between[0].letter)
assertEquals('a', repeatableBetweenParsed.between[1].letter)
Here we parsed the 2 letters N and a between [ and ]. The result exposes left, right, and between, the list of repeated results.
Note that the class has joinRepeaters, which merges the repeated results into a single value:
//Kotlin example
val repeatableBetweenParsed = RepeatableBetween(Str("["), EnglishLetter(), Str("]"))
.joinRepeaters { letters -> letters.map { it.letter }.joinToString(separator = "") }
.parse("[Na]").get()
assertEquals(0, repeatableBetweenParsed.indexStart)
assertEquals(3, repeatableBetweenParsed.indexEnd)
assertEquals("Na", repeatableBetweenParsed.between)
RepeatUntil
Repeats a parser until the "until" parser matches.
//Kotlin example
val repeatUntilParsed = RepeatUntil(Character('a'), Character('b')).parse("aaab").get()
assertEquals('a', repeatUntilParsed.repeatersParsed[0].char)
assertEquals('a', repeatUntilParsed.repeatersParsed[1].char)
assertEquals('a', repeatUntilParsed.repeatersParsed[2].char)
Here we parsed a three times before reaching b. The parser returns RepeatUntilParsed.
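RepeatUntil can be sketched as a loop that checks the "until" parser before every repetition. This hypothetical version (CharParser, character, RepeatUntilParsed and repeatUntil are illustrative names) assumes that the "until" match is not consumed and that the whole parse fails if the repeated parser fails, or input ends, before "until" is found; the library's exact behaviour on these points is not described here.

```kotlin
// Hypothetical sketch of RepeatUntil; NOT the primeCombinator API.
// Assumptions: `until` is checked before each repetition and is not consumed;
// the parse fails if `repeater` fails before `until` matches.

typealias CharParser = (String, Int) -> Char?

fun character(c: Char): CharParser = { text, pos ->
    if (pos < text.length && text[pos] == c) c else null
}

data class RepeatUntilParsed(val repeatersParsed: List<Char>, val indexStart: Int, val indexEnd: Int)

fun repeatUntil(repeater: CharParser, until: CharParser): (String, Int) -> RepeatUntilParsed? =
    parser@{ text, pos ->
        val items = mutableListOf<Char>()
        var next = pos
        while (until(text, next) == null) {
            val c = repeater(text, next) ?: return@parser null
            items.add(c)
            next++ // each repetition consumes exactly one character in this sketch
        }
        RepeatUntilParsed(items, pos, next - 1)
    }
```

On "aaab" the loop collects three a characters and stops when b is seen; on "aaa" the input ends before b, so the sketch fails.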
SequenceOf
SequenceOf runs a sequence of parsers. It fails if any of the parsers fails. It returns a list with the result of each parser, in the same order as the parsers were supplied.
//Kotlin example
val parsedSequenceOf = SequenceOf(Beginning(), Spaces(), Word(), Spaces(), Word(), End()).parse(" Name is").get()
val beginning = Beginning().fromSequence(parsedSequenceOf.sequence, 0)
val spaces = Spaces().fromSequence(parsedSequenceOf.sequence, 1)
val name = Word().fromSequence(parsedSequenceOf.sequence, 2)
val spaces2 = Spaces().fromSequence(parsedSequenceOf.sequence, 3)
val wordIs = Word().fromSequence(parsedSequenceOf.sequence, 4)
val end = End().fromSequence(parsedSequenceOf.sequence, 5)
assertEquals(" ", spaces.spaces)
assertEquals("Name", name.word)
assertEquals("is", wordIs.word)
Here we parsed supplied set of parsers
Alternatively, you can cast an element of the result list directly to the desired class (make sure it matches the type that the corresponding parser returns):
//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed
Spaces
Parses one or more spaces at once.
//Kotlin example
val parsedSpaces = Spaces().parse("   Name is ...").get()
assertEquals("   ", parsedSpaces.spaces)
assertEquals(0, parsedSpaces.indexStart)
assertEquals(2, parsedSpaces.indexEnd)
Here we parsed 3 spaces. The parser returns SpacesParsed.
Str
Parses the specified string.
//Kotlin example
val strParsed = Str("One,").parse("One, two, here we go ").get()
assertEquals("One,", strParsed.str)
Here we parsed the specified string "One,". The parser returns StrParsed.
Word
Parses any word that consists only of English letters (see also EnglishLetter).
//Kotlin example
val parsedWord = Word().parse("Name is ...").get()
assertEquals("Name", parsedWord.word)
assertEquals(0, parsedWord.indexStart)
assertEquals(3, parsedWord.indexEnd)
Here we parsed the first word, Name (4 letters). The parser returns WordParsed.
Source & Credits
Html template:
- HarnishDesign - https://themeforest.net/user/harnishdesign#contact
Logo:
- Icon made by Freepik from www.flaticon.com
Changelog
See what has been added, changed, fixed, improved, or updated in the latest versions.
For future updates, follow us on git.
Version 1.0.9 (1 November, 2020)
- Added Framework with basic Parsers