High kind data

    Yes, you have not dreamed, and you have not misheard - it is a high kind. A kind is a term of type theory, meaning in essence the type of [data] type.

    But first, some lyrics.

    Habré published several articles, which described in detail the method of data validation in functional languages.

    This article - my five cents in this HYIP. We will look at data validation in Haskell.

    Type validation


    Earlier, an example of a validation technique using type validation was considered:

    typeEmailContactInfo  = StringtypePostalContactInfo = StringdataContactInfo = EmailOnlyEmailContactInfo | PostOnlyPostalContactInfo | 
                       EmailAndPost (EmailContactInfo, PostalContactInfo)
    dataPerson = Person
      { pName :: String,
      , pContactInfo :: ContactInfo,
      }

    Using this method is simply impossible to create incorrect data. However, despite the fact that such validation is very easy to create and read, using it forces you to write a lot of routine and make many changes to the code. So, the use of such a method is limited only for really important data.

    Validation of data of a high kind




    In this article, we will look at another validation method — using data of a higher kind.

    Suppose we have a data type:

    dataPerson = Person
      { pName :: String
      , pAge  :: Int
      }

    And we will validate the data only if all the fields in the record are valid.
    Since Haskell is more functional than most functional languages, you can easily get rid of most of the routines.

    Here it is possible and therefore this method is widely used among library authors on Haskell.

    For purposes of discussion, let's imagine that we want the user to fill in personal information via a web form or something else. In other words, it is possible that they may spoil the filling of some of the information without necessarily canceling the rest of the data structure. If they successfully complete the entire structure, we would like to get a filled-in Record Person .

    One of the modeling methods is to use the second data type:

    dataMaybePerson = MaybePerson
      { mpName :: MaybeString
      , mpAge  :: MaybeInt
      }

    where, let me remind you use the optional type:

    -- already in PreludedataMaybe a = Nothing | Just a

    From here, the validation function is quite simple:

    validate :: MaybePerson -> MaybePersonvalidate (MaybePerson name age) =
      Person <$> name <*> age

    A little more detail about the functions (<$>) and (<*>)
    The function (<$>) is only an infix synonym for the functor fmap

    -- already in Preludefmap :: Functor f => (a -> b) -> f a -> f b
    (<$>) :: Functor f => (a -> b) -> f a -> f b
    (<$>) = fmap

    And (<*>) is a function of applying an applicative functor

    -- already in Prelude
    (<*>) :: Applicative f => f (a -> b) -> f a -> f b

    And for the optional type, these functions have the following definition

    -- already in Prelude
    (<$>) :: (a -> b) -> Maybe a -> Maybe b
    _ <$> Nothing   = Nothingf <$> (Just a)  = Just (f a)
    (<*>) :: Maybe (a -> b) -> Maybe a -> Maybe b
    (Just f) <*> m   = f <$> m
    Nothing  <*> _   = Nothing


    Our validation works, but it is annoying to write by hand an additional routine code, since this is done completely mechanically. Moreover, the duplication of these efforts means that we will need to use our brains in the future to make sure that all three definitions remain synchronized. Would it be great if the compiler could handle it?

    SURPRISE! HE CAN! We will help the high race!

    In Haskell there is such a thing as a genus, it is also a kind , and the simplest and most accurate enough explanation is that a genus is a type of [data] type. The most widely used genus is * , which can be called "finite"

    ghci> :k IntInt :: *
    ghci> :k StringString :: *
    ghci> :k MaybeIntMaybeInt :: *
    ghci> :k MaybeStringMaybeString :: *
    ghci> :k [Int]
    [Int] :: *

    And what kind of Maybe ?
    ghci> :k MaybeMaybe :: * -> *
    ghci> :k []
    [] :: * -> *

    This is an example of a high kind.

    Notice that we can describe both Person and MaybePerson with the following unique data of a high gender:

    dataPerson' f = Person
      { pName :: f String
      , pAge  :: f Int
      }

    Here we parameterize Person ' over something f (with the genus * -> * ), which allows us to do the following to use the source types:

    typePerson = Person'IdentitytypeMaybePerson = Person'Maybe

    Here we use a simple Identity type wrapper.

    -- already in PreludenewtypeIdentity a = Identity { runIdentity :: a }

    Although this works, it is a bit annoying in the case of Person , since now all our data is wrapped inside Identity :

    ghci> :t pName @IdentitypName :: Person -> IdentityStringghci> :t runIdentity. pName
    runIdentity. pName :: Person -> String

    We can eliminate this annoyance trivially, after which we consider why it is such a definition of Person ' that is really useful. To get rid of identifiers, we can use the family of types (function at the type level), which erases them:

    {-# LANGUAGE TypeFamilies #-}-- "Higher-Kinded Data"typefamilyHKD f a whereHKDIdentity a = a
      HKD f        a = f a
    dataPerson' f = Person
      { pName :: HKD f String
      , pAge  :: HKD f Int
      } deriving (Generic)

    Generic output we need for the 2nd part of the article.

    Using the family of HKD types means that GHC automatically erases any Identity wrappers in our views:

    ghci> :t pName @IdentitypName :: Person -> Stringghci> :t pName @MaybepName :: Person -> MaybeString

    and it is precisely such a version of Person of a high kind that can be best used as a replacement for our original one.

    The obvious question is that we bought ourselves with all this work done. Let's go back to the validation wording to help us answer this question.

    We can now rewrite it using our new technology:

    validate :: Person'Maybe -> MaybePersonvalidate (Person name age) =
      Person <$> name <*> age

    Not a very interesting change? But the intrigue is how little you need to change. As you can see, only our type and pattern match our initial implementation. What is neat here is that we have now consolidated Person and MaybePerson into the same representation, and therefore they are no longer related only in a nominal sense.

    Generics and more general validation function


    The current version of the validation function must be written for each new data type, even though the code is quite routine.

    We can write a validation version that will work for any higher data type.

    It would be possible to use a template Haskell ( TemplateHaskell ), but it generates the code and is used only in extreme cases. We will not.

    The secret is to contact GHC.Generics. If you are unfamiliar with the library, it provides an isomorphism from the regular Haskell data type into a general representation that can be structurally controlled by a smart programmer (that is, us.) By providing code for changing constant types, products, and coproducts, we can force GHC write code for us independent of the type. This is a very neat technique that will tickle your toes, if you have not seen it before.

    We want to end up with something like:

    validate :: _ => d Maybe -> Maybe (d Identity)

    From the point of view of Generics, any type can most generally be divided into several structures:

    -- undefined data, lifted version of EmptydataV1        p-- Unit: used for constructors without arguments, lifted version of ()dataU1        p = U1-- a container for a c, Constants, additional parameters and recursion of kind *newtypeK1    i c p = K1 { unK1 :: c } -- a wrapper, Meta-information (constructor names, etc.)newtypeM1  i t f p = M1 { unM1 :: fp } -- Sums: encode choice between constructors, lifted version of Eitherdata    (:+:) f g p = L1 (fp) | R1 (gp) -- Products: encode multiple arguments to constructors, lifted version of (,)data    (:*:) f g p = (fp) :*: (gp) 

    That is, there may be non-liberalized structures, non-argument structures, constant structures, meta-informational (constructors, etc.). As well as associations of structures - total or associations of the type of OR-OR and multiplication, they are also cortex associations or records.

    First we need to define a class that will be the workhorse of our transformation. By experience, this is always the most difficult part - the types of these generalized transformations are extremely abstract and, in my opinion, very difficult to reason. Let's use:

    {-# LANGUAGE MultiParamTypeClasses #-}classGValidate i o where
      gvalidate :: i p -> Maybe (o p)

    You can use the “soft and slow” rules for reasoning about how your class type should look, but in general you will need both an input parameter and an output parameter. Both of them must be of the kind * -> * , and then transmit this existentialized p , thanks to dark, unholy reasons that are not known to humanity. Then, using a small checklist, we go through to help wrap our head around this nightmarish hellish landscape, which we will go around in succession later.

    In any case, our class is already in our hands, now we just need to write out copies of our class for different types of GHC.Generic . We can start with the base case, which we should be able to check, namely, Maybe k :

    {-# LANGUAGE FlexibleInstances #-}{-# LANGUAGE TypeOperators     #-}instanceGValidate (K1a (Maybek)) (K1ak) where-- gvalidate :: K1 a (Maybe k) -> Maybe (K1 a k)
      gvalidate (K1 k) = K1 <$> k
      {-# INLINE gvalidate #-}

    K1 is a "constant type", which means that this is where our structural recursion ends. In our Person ' s example, this will be pName :: HKD f String .

    In most cases when you have a base case, the rest are simply mechanically determined instances for other types. If you do not need access to metadata about the original type anywhere, these instances will almost always be trivial homomorphisms.

    We can start with multiplicative structures — if we have GValidate io and GValidate i 'o' , we should be able to run them in parallel:

    instance (GValidateio, GValidatei'o')
        => GValidate (i :*: i') (o :*: o') where
      gvalidate (l :*: r) = (:*:)
                        <$> gvalidate l
                        <*> gvalidate r
      {-# INLINE gvalidate #-}

    If K1 refers directly to our Person ' selectors , (: * :) roughly corresponds to the comma syntax with which we separate our fields in the record.

    We can define a similar instance GValidate for coproducts or total structures (values are separated by | in the definition of the data):

    instance (GValidateio, GValidatei'o')
        => GValidate (i :+: i') (o :+: o') where
      gvalidate (L1 l) = L1 <$> gvalidate l
      gvalidate (R1 r) = R1 <$> gvalidate r
      {-# INLINE gvalidate #-}

    In addition, since we do not care about finding metadata, we can simply define GValidate io on the metadata constructor:

    instanceGValidate i o
        => GValidate (M1_a_bi) (M1_a'_b'o) where
      gvalidate (M1 x) = M1 <$> gvalidate x
      {-# INLINE gvalidate #-}

    Now there are uninteresting structures for a complete description. We will provide them with the following trivial copies for non-residential types ( V1 ) and for designers without any parameters ( U1 ):

    instanceGValidateV1V1where
      gvalidate = undefined
      {-# INLINE gvalidate #-}instanceGValidateU1U1where
      gvalidate U1 = JustU1{-# INLINE gvalidate #-}

    Using undefined is safe here because it can only be called with a value of V1 . Fortunately for us, V1 is uninhabited and uninitialized, so this can never happen, so we are morally right in our use of undefined .

    Without further ado, now that we have this whole mechanism, we can finally write a non-generic version of validation:

    {-# LANGUAGE FlexibleContexts #-}validate
        :: ( Generic (f Maybe)
           , Generic (f Identity)
           , GValidate (Rep (f Maybe))
                       (Rep (f Identity))
           )
        => f Maybe
        -> Maybe (f Identity)
    validate = fmap to . gvalidate . from

    Every time you can get a wide smile, when the signature for the function is longer than the actual implementation; that means we hired a compiler to write code for us. What is important for validation here is that there is no mention of Person ' ; this function will work for any type defined as high data. Voila!

    Results


    That's all for today, guys. We got to know the idea of ​​high-order data, saw how this is completely equivalent to the type of data that was determined in a more traditional way, and also caught a glimpse of what things are possible with this approach.

    It allows you to do all sorts of amazing things, such as: generate lenses for arbitrary data types without resorting to Pattern Haskel; sequence by data type; and automatically track dependencies for using record fields.

    Happy application of high delivery!

    Original: Higher-Kinded Data

    Also popular now: