Documentation

Prime combinator

Parsing combinator with human-readable syntax


  • Created: 1 November, 2020

If you have any questions that are beyond the scope of this help page, please feel free to email me.


Installation

The library is available on Maven Central:

Maven:
<dependency>
     <groupId>com.primeframeworks</groupId>
     <artifactId>primeCombinator</artifactId>
     <version>1.0.9</version>
     <scope>compile</scope>
</dependency>
Gradle:
implementation 'com.primeframeworks:primeCombinator:1.0.9'
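
If the build uses the Gradle Kotlin DSL (a build.gradle.kts file, which is an assumption about your project setup), the same coordinates would typically be declared like this:

```kotlin
// build.gradle.kts (sketch; only the dependency coordinates come from this page)
repositories {
    mavenCentral()
}

dependencies {
    implementation("com.primeframeworks:primeCombinator:1.0.9")
}
```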

How to use

Suppose we need to extract the protocol and the domain name from a URL. Here is an example that extracts them from this address: http://combinator.primeframeworks.com

//Kotlin example

val parsedUrl = SequenceOf(
    Any(Str("http"), Str("https")),
    Str("://"),
    CustomWord(EnglishLetter().asChar(), Character('.')))
    .parse("http://combinator.primeframeworks.com")
    .get()

val protocol = (Any().fromSequence(parsedUrl.sequence, 0).anyOne as Str.StrParsed).str
val domainName = CustomWord().fromSequence(parsedUrl.sequence, 2).customWord

assertEquals("http", protocol)
assertEquals("combinator.primeframeworks.com", domainName)

In this example we specify that the text starts with either the string "http" or the string "https". Next comes the string "://". Then comes a word consisting of letters and the "." symbol, which forms the full domain name.

Because we parse a sequence, the output is a list of results in which we know that the first element comes from Any and the third from CustomWord. We use fromSequence to convert each parsed part into the corresponding class and read its value. We have to cast Any().fromSequence(parsedUrl.sequence, 0).anyOne to Str.StrParsed because Any returns a general type (it could be any result), but we know that only Str.StrParsed can be returned here, as that is what the Str parser produces.
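
A cast with "as" throws a ClassCastException when the assumption about the result type turns out to be wrong. As a minimal, self-contained sketch of a more defensive alternative, the snippet below uses Kotlin's "is" checks and the safe cast "as?". The types AnyResult, TextResult and DigitResult are hypothetical stand-ins for illustration, not the library's classes.

```kotlin
// Hypothetical stand-ins for a general result type and two concrete results.
open class AnyResult
class TextResult(val str: String) : AnyResult()
class DigitResult(val digit: Int) : AnyResult()

// "when" with "is" smart-casts each branch, so no unchecked "as" is needed.
fun describe(result: AnyResult): String = when (result) {
    is TextResult -> "text: ${result.str}"
    is DigitResult -> "digit: ${result.digit}"
    else -> "unexpected result type"
}

// The safe cast "as?" yields null instead of throwing when the type does not match.
fun textOrNull(result: AnyResult): String? = (result as? TextResult)?.str
```

With these definitions, textOrNull(TextResult("http")) gives "http", while textOrNull(DigitResult(1)) gives null rather than an exception.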


How to create custom parser

It is possible to create a custom parser and combine it with others. This is where primeCombinator shines: it lets you build very advanced parsing logic that is still easy to read.

//Kotlin example

class UrlParsed(val protocol: String, val domain: String, mappedFrom: Parsed) : Parsed(mappedFrom)

val urlParser = SequenceOf(
    Any(Str("http"), Str("https")),
    Str("://"),
    CustomWord(EnglishLetter().asChar(), Character('.'))
).map { sequenceParserOutput->
    val protocol = (Any().fromSequence(sequenceParserOutput.sequence, 0).anyOne as Str.StrParsed).str
    val domainName = CustomWord().fromSequence(sequenceParserOutput.sequence, 2).customWord

    UrlParsed(protocol, domainName, sequenceParserOutput)
}

val urlParsed = urlParser.parse("http://combinator.primeframeworks.com").get()

assertEquals("http", urlParsed.protocol)
assertEquals("combinator.primeframeworks.com", urlParsed.domain)

Congratulations! Now you have a custom parser, urlParser, and can combine it with other parsers. You can try out:

SequenceOf(Str("url: "), urlParser).parse("url: http://combinator.primeframeworks.com")

Parsed is the general base type for parser output. We created a custom output, UrlParsed, for the custom parser urlParser and made it strictly typed.

class UrlParsed(val protocol: String, val domain: String, mappedFrom: Parsed) : Parsed(mappedFrom)

The call to the base constructor is needed to save the positions in the text that the parser read from. As we are changing only the output and not the parsing logic, we simply copy the indexes from the output of the mapped parser. In other words, we save position "a" (where parsing started) and position "b" (where parsing stopped) in the Parsed core class. You can always use this approach for any custom parser without needing to look inside the framework, unless you want to.
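
To make the idea of copying positions concrete, here is a minimal sketch that does not use the library. Span and UrlSpan are hypothetical names; the real Parsed class may store its positions differently.

```kotlin
// Hypothetical stand-in for a base result class that remembers where a match
// started and ended (inclusive indexes, as in the examples on this page).
open class Span(val indexStart: Int, val indexEnd: Int) {
    // Copy constructor: a mapped result keeps the position of the result it was built from.
    constructor(mappedFrom: Span) : this(mappedFrom.indexStart, mappedFrom.indexEnd)
}

// A strictly typed result that adds its own fields and reuses the copied position.
class UrlSpan(val protocol: String, val domain: String, mappedFrom: Span) : Span(mappedFrom)
```

For example, UrlSpan("http", "combinator.primeframeworks.com", Span(5, 41)) still reports indexStart 5 and indexEnd 41, although only the typed fields were supplied explicitly.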


Available parsers

Documentation and examples for the parsers that are available out of the box.

Parsers are compatible with each other.


Any

Takes several parsers and picks the first successful one as the result. Parsers are tried in the same order in which they are specified in the constructor.

//Kotlin example

val anyParsed = Any(Word(), EnglishDigit()).parse("1 is not a name").get()
assertEquals(0, anyParsed.indexStart)
assertEquals(0, anyParsed.indexEnd)
assertEquals(1, (anyParsed.anyOne as EnglishDigit.EnglishDigitParsed).digit)

Here the parser tried Word first, but words consist only of letters, so it failed. It then tried EnglishDigit, which successfully parsed the first character of the string.

The Any parser returns AnyParsed, whose anyOne property holds the first successful parser result (in our case, EnglishDigitParsed).
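
The "first successful parser wins" rule can be sketched without the library as follows. Match, letters, digit and firstOf are hypothetical names used only for this illustration; the library's own implementation is not shown here and may differ.

```kotlin
// Hypothetical result type with inclusive start and end indexes.
data class Match(val start: Int, val end: Int, val text: String)

// In this sketch a parser is a function from (input, position) to a Match, or null on failure.
// Matches a run of one or more letters.
val letters: (String, Int) -> Match? = { input, pos ->
    val run = input.drop(pos).takeWhile { it.isLetter() }
    if (run.isEmpty()) null else Match(pos, pos + run.length - 1, run)
}

// Matches exactly one decimal digit.
val digit: (String, Int) -> Match? = { input, pos ->
    val c = input.getOrNull(pos)
    if (c != null && c.isDigit()) Match(pos, pos, c.toString()) else null
}

// Ordered choice: try each parser in the given order and keep the first success.
// The sequence is lazy, so parsers after the first success are not run.
fun firstOf(vararg parsers: (String, Int) -> Match?): (String, Int) -> Match? = { input, pos ->
    parsers.asSequence().mapNotNull { it(input, pos) }.firstOrNull()
}
```

With these definitions, firstOf(letters, digit)("1 is not a name", 0) yields Match(0, 0, "1"). Because the first success wins, the order of alternatives matters when they share a prefix.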


AnyCharacter

Takes any single character: a digit, a letter, or a symbol. Fails only at the end of input.

//Kotlin example

val anyCharacterParsed = AnyCharacter().parse("abc").get()
assertEquals(0, anyCharacterParsed.indexStart)
assertEquals(0, anyCharacterParsed.indexEnd)
assertEquals('a', anyCharacterParsed.char)

We parsed the first character of the string. The result is AnyCharacterParsed.


Beginning

Parses the beginning of the input.

//Kotlin example

val beginningParsing = Beginning().parse("").get()
assertEquals(0, beginningParsing.indexStart)
assertEquals(-1, beginningParsing.indexEnd)

We parsed the beginning of an empty string and got BeginningParsed without errors.
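
The index values in the examples on this page are consistent with inclusive end indexes: a one-character match at position 0 ends at 0, and an empty match at position 0 ends at -1. A minimal sketch of that arithmetic, where inclusiveEnd is a hypothetical helper and not part of the library:

```kotlin
// Inclusive end index of a match that starts at "start" and covers "length" characters.
// An empty match (length 0) therefore ends one position before it starts.
fun inclusiveEnd(start: Int, length: Int): Int = start + length - 1
```

Under this reading, inclusiveEnd(0, 0) is -1 (the empty match above), inclusiveEnd(0, 1) is 0 (a single character), and inclusiveEnd(0, 3) is 2 (three characters).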

Beginning is useful when we want to parse a whole document, for example:

//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed
val eightParserResultInsideRepeat = repeatParserOutput.repeatersParsed[8].anyOne as Word.WordParsed
assertEquals("name", eightParserResultInsideRepeat.word)

Between

Parses a value enclosed between two delimiters. Typically used with brackets, for text like [value].

//Kotlin example
val betweenParsed = Between(Character('['), Character('b'), Character(']')).parse("[b]").get()
assertEquals(0, betweenParsed.indexStart)
assertEquals(2, betweenParsed.indexEnd)
assertEquals('b', betweenParsed.between.char)
assertEquals('[', betweenParsed.left.char)
assertEquals(']', betweenParsed.right.char)

Here we parsed the character "b" enclosed in brackets. The parser returns BetweenParsed.


Character

Parses the specified single character.

//Kotlin example
val characterParsed = Character('a').parse("abc").get()
assertEquals(0, characterParsed.indexStart)
assertEquals(0, characterParsed.indexEnd)
assertEquals('a', characterParsed.char)

Here we parsed the specified character "a". The parser returns CharacterParsed.


CustomWord

Parses a word that consists of the specified characters. The normal Word parser accepts only letters; CustomWord lets you specify any characters. It can be used to parse variable names, domains, and similar tokens.

//Kotlin example
val document = CustomWord(EnglishLetter().asChar(), Character('.')).parse("my.domain.com").get()
assertEquals("my.domain.com", document.customWord)

Here we parsed a domain name as a single custom word. The parser returns CustomWordParsed.


DoubleQuote

Parses the double-quote symbol (").

//Kotlin example
val doubleQuoteParsed = DoubleQuote().parse("\"").get()
assertEquals(0, doubleQuoteParsed.indexStart)
assertEquals(0, doubleQuoteParsed.indexEnd)
assertEquals("\"", doubleQuoteParsed.str)

Here we parsed a single ". The parser returns a Str result (the matched text is available as str).


End

Parses the end of the document.

End is useful when we want to parse a whole document, for example:

//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed
val eightParserResultInsideRepeat = repeatParserOutput.repeatersParsed[8].anyOne as Word.WordParsed
assertEquals("name", eightParserResultInsideRepeat.word)

EndOfInputParser

Abstract class that checks that the end of input has not been reached before delegating to parseNext.

//Kotlin
abstract class EndOfInputParser : Parser {
    override fun parse(previous: Parsed): ParsedResult {
        return if (previous.currentIndex() > previous.textMaxIndex()) {
            ParsedResult.asError("Unexpected end of input")
        } else {
            parseNext(previous)
        }
    }

    abstract fun parseNext(previous: Parsed): ParsedResult
}

Usually inherited by parsers that need to consume one or more characters.
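
The same guard-then-delegate pattern can be sketched in a self-contained form. Cursor, Outcome, NeedsInput and OneLetter are simplified, hypothetical stand-ins and do not reproduce the library's Parser, Parsed and ParsedResult types.

```kotlin
// Hypothetical stand-in for "where we are in the text".
class Cursor(val text: String, val index: Int)

// Hypothetical stand-in for a parse result: either a value plus the next cursor, or an error.
sealed class Outcome {
    data class Ok(val value: String, val next: Cursor) : Outcome()
    data class Error(val message: String) : Outcome()
}

// Base class: rejects an exhausted input once, so subclasses can read a character safely.
abstract class NeedsInput {
    fun parse(cursor: Cursor): Outcome =
        if (cursor.index > cursor.text.lastIndex) Outcome.Error("Unexpected end of input")
        else parseNext(cursor)

    protected abstract fun parseNext(cursor: Cursor): Outcome
}

// Example subclass: consumes exactly one letter.
class OneLetter : NeedsInput() {
    override fun parseNext(cursor: Cursor): Outcome {
        val c = cursor.text[cursor.index] // in bounds: parse() already checked the index
        return if (c.isLetter()) Outcome.Ok(c.toString(), Cursor(cursor.text, cursor.index + 1))
        else Outcome.Error("Expected a letter at index ${cursor.index}")
    }
}
```

In this sketch, OneLetter().parse(Cursor("", 0)) returns the "Unexpected end of input" error without ever calling parseNext, while parsing "ab" from index 0 yields the value "a".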


EnglishDigit

Parses a single digit from 1 to 9.

//Kotlin example
val englishDigitParsed = EnglishDigit().parse("123").get()
assertEquals(0, englishDigitParsed.indexStart)
assertEquals(0, englishDigitParsed.indexEnd)
assertEquals(1, englishDigitParsed.digit)

Here we parsed the single digit 1. The parser returns EnglishDigitParsed.


EnglishLetter

Parses one letter, a-z or A-Z.

//Kotlin example
val englishLetterParsed = EnglishLetter().parse("abv").get()
assertEquals(0, englishLetterParsed.indexStart)
assertEquals(0, englishLetterParsed.indexEnd)
assertEquals('a', englishLetterParsed.letter)

Here we parsed the single letter a. The parser returns EnglishLetterParsed.


Long

Parses a Long, such as 232312324.

//Kotlin example
val longParsed = Long().parse("134").get()
assertEquals(0, longParsed.indexStart)
assertEquals(2, longParsed.indexEnd)
assertEquals(134L, longParsed.long)

Here we parsed the long 134. The parser returns LongParsed.


Not

Not inverts a parser: it succeeds when the wrapped parser fails. It does not capture an end index, which means that the next parser starts from the same index.

//Kotlin example
val notA = Not(Str("a")).parse("b").get()
assertEquals(0, notA.indexStart)

Here the input does not start with "a", so Not succeeds. The parser returns NotParsed.
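
This kind of check is often called negative lookahead. A minimal sketch of the idea without the library is shown below; literal and notAhead are hypothetical helpers, and here a parser simply returns the index just after its match, or null on failure.

```kotlin
// Matches the exact string "expected" at the given position.
fun literal(expected: String): (String, Int) -> Int? = { input, pos ->
    if (input.startsWith(expected, pos)) pos + expected.length else null
}

// Negative lookahead: succeeds only when "inner" fails, and never advances the position.
fun notAhead(inner: (String, Int) -> Int?): (String, Int) -> Int? = { input, pos ->
    if (inner(input, pos) == null) pos else null
}
```

With these definitions, notAhead(literal("a"))("b", 0) returns 0: the check passes and the position is unchanged, so a following parser would still start at index 0. On the input "a" it returns null.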


Repeat

Repeats the specified parser until it fails.

//Kotlin example
val repeatParsed = Repeat(EnglishLetter()).parse("Name1").get()
assertEquals(0, repeatParsed.indexStart)
assertEquals(3, repeatParsed.indexEnd)
assertEquals(4, repeatParsed.repeatersParsed.size)

Here we parsed 4 letters: N, a, m, e. The parser returns RepeatParsed.

Note that the digit is not parsed in this example.
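
The "repeat until failure" behaviour can be sketched in plain Kotlin as follows. repeatWhile is a hypothetical helper that works on single characters; it only illustrates the stopping rule and is not the library's implementation.

```kotlin
// Collects characters from "start" onward for as long as "accepts" keeps succeeding.
// The first rejected character (or the end of input) stops the loop and is not consumed.
fun repeatWhile(input: String, start: Int, accepts: (Char) -> Boolean): List<Char> {
    val matched = mutableListOf<Char>()
    var pos = start
    while (pos < input.length && accepts(input[pos])) {
        matched += input[pos]
        pos++
    }
    return matched
}
```

For the input "Name1", repeatWhile("Name1", 0) { it.isLetter() } returns the four letters N, a, m, e and leaves the digit untouched.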


RepeatableBetween

Parses repeated content between a left and a right delimiter and can join it into a single value.

//Kotlin example
val repeatableBetweenParsed = RepeatableBetween(Str("["), EnglishLetter(), Str("]")).parse("[Na]").get()
assertEquals(0, repeatableBetweenParsed.indexStart)
assertEquals(3, repeatableBetweenParsed.indexEnd)
assertEquals("[", repeatableBetweenParsed.left.str)
assertEquals("]", repeatableBetweenParsed.right.str)
assertEquals(2, repeatableBetweenParsed.between.size)
assertEquals('N', repeatableBetweenParsed.between[0].letter)
assertEquals('a', repeatableBetweenParsed.between[1].letter)

Here we parsed 2 letters, N and a, between [ and ]. The result exposes left, right, and the list of repeated results in between.

Note that the class has joinRepeaters, which allows merging the repeated results into one value:

//Kotlin example
val repeatableBetweenParsed = RepeatableBetween(Str("["), EnglishLetter(), Str("]"))
    .joinRepeaters { it.map { it.letter }.joinToString(separator = "") }
    .parse("[Na]").get()
assertEquals(0, repeatableBetweenParsed.indexStart)
assertEquals(3, repeatableBetweenParsed.indexEnd)
assertEquals("Na", repeatableBetweenParsed.between)

RepeatUntil

Repeats the first parser until the second ("until") parser matches.

//Kotlin example
val repeatUntilParsed = RepeatUntil(Character('a'), Character('b')).parse("aaab").get()
assertEquals('a', repeatUntilParsed.repeatersParsed[0].char)
assertEquals('a', repeatUntilParsed.repeatersParsed[1].char)
assertEquals('a', repeatUntilParsed.repeatersParsed[2].char)

Here we parsed a 3 times before reaching b. The parser returns RepeatUntilParsed.
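
A minimal sketch of the "repeat until" rule, written without the library, is shown below. repeatUntil is a hypothetical helper on single characters. Two details are assumptions of this sketch rather than documented library behaviour: the terminator is not consumed, and the parse fails when the terminator is never reached.

```kotlin
// Collects "item" characters from the start of the input until "until" is seen.
// Returns null if another character appears first or if "until" never appears.
fun repeatUntil(input: String, item: Char, until: Char): List<Char>? {
    val matched = mutableListOf<Char>()
    var pos = 0
    while (pos < input.length && input[pos] != until) {
        if (input[pos] != item) return null
        matched += input[pos]
        pos++
    }
    return if (pos < input.length) matched else null
}
```

For example, repeatUntil("aaab", 'a', 'b') returns three a characters, while repeatUntil("aaa", 'a', 'b') returns null because b is never reached.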


SequenceOf

SequenceOf runs a sequence of parsers. It fails if at least one of the parsers fails. It returns a list with the result of each parser, in the same order in which the parsers were supplied.

//Kotlin example
val parsedSequenceOf = SequenceOf(Beginning(), Spaces(), Word(), Spaces(), Word(), End()).parse("   Name is").get()

val beginning = Beginning().fromSequence(parsedSequenceOf.sequence, 0)
val spaces = Spaces().fromSequence(parsedSequenceOf.sequence, 1)
val name = Word().fromSequence(parsedSequenceOf.sequence, 2)
val spaces2 = Spaces().fromSequence(parsedSequenceOf.sequence, 3)
val wordIs = Word().fromSequence(parsedSequenceOf.sequence, 4)
val end = End().fromSequence(parsedSequenceOf.sequence, 5)

assertEquals("   ", spaces.spaces)
assertEquals("Name", name.word)
assertEquals("is", wordIs.word)

Here we parsed the input with the supplied sequence of parsers.
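
The way a sequence threads the position from one parser to the next can be sketched without the library. Match, literal and runInOrder are hypothetical names for this illustration only.

```kotlin
// Hypothetical result type with inclusive start and end indexes.
data class Match(val start: Int, val end: Int, val text: String)

// Matches the exact string "expected" at the given position, or returns null.
fun literal(expected: String): (String, Int) -> Match? = { input, pos ->
    if (input.startsWith(expected, pos)) Match(pos, pos + expected.length - 1, expected) else null
}

// Runs the parsers one after another; each starts right after the previous match.
// The whole sequence fails (returns null) as soon as one parser fails.
fun runInOrder(input: String, vararg parsers: (String, Int) -> Match?): List<Match>? {
    val results = mutableListOf<Match>()
    var pos = 0
    for (parser in parsers) {
        val match = parser(input, pos) ?: return null
        results += match
        pos = match.end + 1
    }
    return results
}
```

With these definitions, runInOrder("http://x", literal("http"), literal("://")) returns two matches, at indexes 0..3 and 4..6, in the order in which the parsers were supplied. The same call on "ftp://x" returns null because the first parser fails.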

Note that, as an alternative approach, we can simply cast an element of the result list to the desired class (make sure it corresponds to the type that the supplied parser returns):

//Kotlin example
val document = SequenceOf(Beginning(), Repeat(Any(Word(), Spaces(), EnglishDigit())), End()).parse("1 is not a name").get()
val repeatParserOutput = document.sequence[1] as Repeat.RepeatParsed

Spaces

Parses one or several spaces at once.

//Kotlin example
val parsedSpaces = Spaces().parse("   Name is ...").get()
assertEquals("   ", parsedSpaces.spaces)
assertEquals(0, parsedSpaces.indexStart)
assertEquals(2, parsedSpaces.indexEnd)

Here we parsed 3 spaces. The parser returns SpacesParsed.


Str

Parses the specified string.

//Kotlin example
val strParsed = Str("One,").parse("One, two, here we go ").get()
assertEquals("One,", strParsed.str)

Here we parsed the specified string "One,". The parser returns StrParsed.


Word

Parses any word that consists only of English letters (see also EnglishLetter).

//Kotlin example
val parsedWord = Word().parse("Name is ...").get()
assertEquals("Name", parsedWord.word)
assertEquals(0, parsedWord.indexStart)
assertEquals(3, parsedWord.indexEnd)

Here we parsed the first word, Name (4 letters). The parser returns WordParsed.


FAQ

Yes, as long as you don't change the author in classes from the current framework. In other words, preserve the author's name in the files.


Changelog

See what has been added, changed, fixed, improved, or updated in the latest versions.

For future updates, follow us on git.

Version 1.0.9 (1 November, 2020)

  • Added Framework with basic Parsers