This content originally appeared on DEV Community and was authored by Artemis
Parsers are what now?!
Even to a Haskeller —proficient in the magics of functors and monads— the term of relative monads might be foreign; let's thus motivate their existence.
Recently I've been thinking about how parsing expression grammar specifies very little about how syntax errors are to be reported or aggregated.
Declaration ← Function / Struct / Module
Function ← "fn" Name "(" ...
Struct ← "struct" Name "{" ...
Module ← "module" Name "{" ...
Take the above grammar fragment as an example; given the input struct {...
, what error would one expect?
The best outcome is along the lines of "expected a name in the struct declaration", but how do we know to report that error? Does the grammar not say to parse Module
if Struct
does not match? So then the error should be "expected module
but got struct
". That's not very helpful... Alternatively we could aggregate all errors and report whatever the three different rules report.
While specifying what is to be done can be handled by the specification, how do we actually implement any of the above?
Take typical parser combinators in Haskell:
-- we could parameterize the input type
-- it is kept as String for simplicity's sake
newtype Parser err suc = Parser ( String -> (String, Either err suc) )
There will be two basic combinators: chaining parsers (P1 P2 P3
) and alternating them (P1 / P2 / P3
). The former is implemented using simple applicative functors:
instance Applicative (Parser err) where
pure suc = Parser $ \i -> (i, Right suc)
(Parser p) <*> (Parser q) = Parser $ \i ->
let (rest, f) = p i in
let (final, suc) = q rest in
(rest, f <*> suc)
While the latter is implemented using the Alternative
typeclass:
instance Alternative (Parser ()) where
empty = Parser $ \i -> (i, Left ())
(Parser p) <|> (Parser q) = Parser $ \i ->
case p i of
(_, Left _) -> q i
(rest, suc) -> (rest, suc)
However we already see some issues: as we cannot reliably construct values for all possible error types we have to use ()
for empty. This also makes our implementation of (<|>)
only apply to parsers with said error type. Even if we managed to generalize (<|>)
to all errors, we would still only get the most recent error that occurred, representing only "expected module
" from the earlier example.
Instead we need the ability to aggregate our errors, just like we do with successful outputs in Applicative
...
So let's implement Applicative
for errors!
-- A newtype wrapper to change the order of type parameters on parser
newtype Flip f a b = Flip (f b a)
instance Applicative (Flip Parser suc) where
pure :: err -> Flip Parser suc err
pure err = Flip $ Parser $ \i -> (i, Left err)
-- interestingly the implementation looks quite close to Alternative
(<*>) :: Flip Parser suc (err -> err') -> Flip Parser suc err -> Flip Parser suc err'
(Flip (Parser p)) <*> (Flip (Parser q)) = Flip $ Parser $ \i ->
case p i of
(_, Left f) -> case q i of
(_, Left err) -> (i, f err)
(rest, suc) -> (rest, suc)
(rest, suc) -> (rest, suc)
Wow! That's unwieldy.
As Haskell lacks the machinery to implement typeclasses on multiple different type parameters, the result is quite ugly. Thus we arrive at the Bifunctor
typeclass. A result of exactly this problem on other types like tuples. If functors can be binary, why can't applicatives or even monads for that matter?
So let's do as true programmers do and look up those typeclasses on the internet:
Haskell's Data.Bifunctor
is basically:
class Bifunctor f where
bimap :: (a -> c) -> (b -> d) -> f a b -> f c d
I could find a Biapply
as well. My question is, why isn't there a complete bi-hierarchy (bierarchy?) like:
class Bifunctor f => Biapplicative f where
…
Following Sjoerd's answer, we finally arrive at relative bimonads.
A monad in category theory is an endofunctor, i.e. a functor where the domain and codomain is the same category. But a Bifunctor
is a functor from the product category Hask x Hask
to Hask
. But we could try to find out what a monad in the Hask x
…
Due to their categorical roots, monads don't play well with such ad-hoc extensions. A monad is an endofunctor, meaning it maps types to types. A bifunctor is a functor from two types to one type. These two notions are incompatible, but a bifunctor can collapse two types into one, so our bimonad just needs to act on the output of some bifunctor!
This process of using functors to 'preprocess' our inputs is what relative monads are.
After some experimentation however it becomes apparent that we cannot do the same for applicatives, as the biap
function does not compose for Either
. Using a normal extension of applicatives into two parameters isn't ideal either, as bipure :: a -> b -> f a b
does not give us the ability to specify whether the parser is an always failing or an always successful one. However, monads can express applicative composition using bind and return.
The same is true for relative bimonads.
Taking the typeclass from the stackoverflow thread linked:
class (Bifunctor j) => RelativeBimonad j m where
bireturn :: j a b -> m a b
bibind :: m a b -> (j a b -> m c d) -> m c d
Implementing these typeclasses for our parser is relatively simple. The bifunctor that performs the preprocessing is simply the Either
that we have in the definition of our parser.
instance RelativeBimonad Either Parser where
bireturn :: Either err suc -> Parser err suc
bireturn res = Parser $ \i -> (i, res)
bibind :: Parser err suc
-> (Either err suc -> Parser err' suc')
-> Parser err' suc'
bibind (Parser p) f = Parser $ \i ->
let (rest, out) = p i in
let (Parser q) = f out in
q rest
It is important to note, that bibind
collapses the alternative combinator (<|>)
(or as in PEG /
) and the sequencing combinator (juxtaposition in PEG). As this is usually hard to reason about, I define two small helpers:
onSuc :: (suc -> Parser err suc') -> Either err suc -> Parser err suc'
onSuc f (Right suc) = f suc
onSuc _ (Left err) = bireturn $ Left err
onErr :: (err -> Parser err' suc) -> Either err suc -> Parser err' suc
onErr f (Left err) = f err
onErr _ (Right suc) = bireturn $ Right suc
Which will allow us to write the long ago mentioned example as:
data Error = NoDeclErr
| StructErr Part
| FuncErr Part
| ModErr Part
deriving (Show)
data Part = Keyword
| Name
| Body
deriving (Show, Eq)
data Decl = Struct String String
| Func String String
| Mod String String
deriving (Show)
parseString err str = Parser $ \i -> case stripPrefix str i of
Just r -> (r, Right str)
Nothing -> (i, Left err)
-- as example is not a full grammar, this accepts any string surrounded by open, close
parseBody open close =
bibind (parseString Body [open ])(onSuc (\_ ->
bibind (parseUntil close )(onSuc (\s ->
bibind (parseString Body [close])(onSuc (\_ ->
bireturn $ Right s
))))))
where parseUntil char = Parser $ \i -> (dropWhile (/= char) i, Right $ takeWhile (/= char) i)
parseName = Parser $ \i -> case takeWhile isAlpha i of
_ : _ -> (dropWhile isAlpha i, Right $ takeWhile isAlpha i)
_ -> (i, Left Name)
whiteSpace = Parser $ \i -> (dropWhile isSpace i, Right $ ())
parseWhole mkWhole key open close =
bibind (parseString Keyword key)(onSuc (\_ ->
bibind whiteSpace (\_ ->
bibind parseName (onSuc (\s1 ->
bibind whiteSpace (\_ ->
bibind (parseBody open close) (onSuc (\s2 ->
bireturn $ Right $ mkWhole s1 s2
))))))))
parseStruct = parseWhole Struct "struct" '{' '}'
parseFunc = parseWhole Func "fn" '(' ')'
parseMod = parseWhole Mod "module" '{' '}'
parseDecl =
bibind parseFunc (onErr (\e1 ->
bibind parseStruct (onErr (\e2 ->
bibind parseMod (onErr (\e3 ->
bireturn $ Left $ errSelect e1 e2 e3
))))))
-- selects the first non-keyword (non-immediate) error
errSelect e1 e2 e3
| e1 /= Keyword = FuncErr e1
| e2 /= Keyword = StructErr e2
| e3 /= Keyword = ModErr e3
| otherwise = NoDeclErr
runParser :: Parser a b -> String -> Either a b
runParser (Parser p) i = let (_, o) = p i in o
main :: IO ()
main = print $ runParser parseDecl "struct Foo"
Which will print Left (StructErr Body)
, correctly parsing the existing structure and choosing the correct error to report based on a simple user supplied heuristic.
It is very evident that unlike monads, bimonads do not have the same level of language support. If there was a notation similar to the 'do'-notation for bimonads, this code would look way cleaner. Though the bibind parser (onSuc f)
pattern is just the normal monad instance, which we are not restricted from using.
The following Github Gist contains the example code with all boilerplate needed to run, as well as the typical implementation of Monad
on parsers and rewriting of two rules using do-notation.
This content originally appeared on DEV Community and was authored by Artemis
Artemis | Sciencx (2024-08-20T01:46:45+00:00) Parsers are relative bimonads. Retrieved from https://www.scien.cx/2024/08/20/parsers-are-relative-bimonads/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.