Lexer in Haskell and Racket

Crafting Interpreters

image-title-here

This is the book I am reading and I started to code the interpreter using JDK19’s preview features. The book though uses an older version of the JDK.

The preview features introduce record pattern matching but soon I realized languages like Haskell and OCaml were built with such pattern matching in mind.

So I decided to code part of it using Haskell. And then I started to code the same parts using Racket which is a far more lofty goal. I think.

Explanation(TODO)

As I mention every time I am still learning Functional Programming principles. So I put together this code based on data structures generally used for such purposes. The intention is to drive this test based on structures like these.

Haskell Code

I will refactor this code and probably create a Git repo.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Char ( isSpace, isDigit, isAlpha, isAlphaNum )
import Data.Either
import Data.Maybe

data Operator i = Literal i| Plus | Minus | Div
  | InvalidChar i
    deriving (Show, Eq) 

data Error i e = Error
  { errorOffset :: Offset
  , error :: ErrorType i e
  } deriving (Eq, Show)


data ErrorType i e
  = EndOfInput
  | Unexpected i
  | Expected i e
  | ExpectedEndOfFile i
  | BeginningOfInput i
  | Empty
  deriving (Eq, Show)

type Offset = Int

operator :: Char -> Operator Char
operator c | c == '+' = Plus
           | c == '-' = Minus
           | c == '/' = Div
           
scanner :: Offset -> ErrorType Offset Char-> String -> Either [Error Offset (ErrorType Offset Char)]  [Operator Char]
scanner offset error (x:xs) 
  | isDigit x = tokenize offset error (Literal x) 
  | x == '+' = tokenize offset error (operator x)
  | otherwise = tokenize offset error (InvalidChar x) 
  where
      tokenize offset error t = 
          case scanner offset error xs of
              Left err -> Left err
              Right tokens -> Right (t : tokens)
scanner offset error "" = Right []


main :: IO ()
main = do
    let result = scanner 0 (BeginningOfInput 0) "1+x2"
    case result of
        Left err -> putStrLn $ "Error: " ++ show err
        Right tokens -> putStrLn $ "Tokens: " ++ show tokens

The result is this. The error condition is not tested.

λ> :main
Tokens: [Literal '1',Plus,InvalidChar 'x',Literal '2']

Typed Racket code

I read that Racket has many levels and types of languages and here I am using Typed Racket.

I have to note that this is my first ever Racket code and the intention is to port my haskell to Typed Racket. So I have explained what I learnt using pieces of code. Neither the Racket code or Haskell code implementes the entire interpreter or even the lexer. The git repo. will have the entire code if I manage to learn Racket sufficiently to code the whole interpreter in the book.

Step 1

This compiles without errors. But the code is not tested. I will add more test code while I refactor. In this case I have a feeling that the code is more verbose than the equivalent Haskell code.

#lang typed/racket
(require racket/list)

(: operator : Char -> (Operator Char))
(define (operator c )
  (match c
     ['+' Plus]
     ['-' Minus]
     ['/' Div]))
 
(struct (i) Literal ([v : i])
    #:transparent
  )


(struct Plus ())
(struct Minus ())
(struct Div ())
(struct (i) InvalidChar ([v : i]))

(define-type (Operator i ) (U Plus Minus Div (Literal i) (InvalidChar i)))

(struct EndOfInput())
(struct Empty() )
(struct (i) Unexpected([val : i]))
(define-type (Expected i e) (U Integer Char))
(struct (i)  ExpectedEndOfFile([val : i]))
(struct  (i) BeginningOfInput ([val : i]))

(define-type (ErrorType i e)
      (U Empty EndOfInput
      ( Unexpected i)
      ( Expected i e)
      ( ExpectedEndOfFile i)
      ( BeginningOfInput i)))

(define-type (Error i e) (U Integer
                            (ErrorType i e)))
(define (parse-exp e)
  (match e
    [(? number?) (Literal e)]
    ))


(define input 2)

(parse-exp input)

Simple test

(define (parse-exp e)
  (match e
    [(? number?) (Literal e)]
    ))


(define input 2)

(parse-exp input)

But this prints

#<Literal>

without the value. So I have to learn how to print a struct.

(struct (i) Literal ([v : i])
    #:transparent
  )
 
 

This is the Racket way of printing the struct with the value.

(Literal 2)

Step 2

This took longer than I expected as Racket seems to be very different than even Haskell. Many iterations later with help from Racket experts I was able to understand how to code the function.

(: parse-exp (-> (U (Listof Char)  )
                 (Listof(U (ErrorType Integer Char) (Operator Char) ))))
(define (parse-exp lst )
  (match lst 
  ['() (list(EndOfInput))]
  [(list x xs ...)
  (match x 
    [(? char-numeric?)  (cons (Literal x) (parse-exp xs))]
    [#\+ (cons (Plus) (parse-exp xs)) ]
    [_ (list (Unexpected 0)) ])])) 

 
 

The type of this function is only slightly different than the Haskell code and the error conditions are not tested fully. The Offset of each charater is not tracked but that should not be difficult.

The output is

(list (Literal #\1) #<Plus> (Literal #\2) #<EndOfInput>)
Written on April 20, 2023