Sunday, August 2, 2020

Generalization of list operations using foldr

Introduction:

List is a generic type or polymorphic type means it can take many shape like List[Int], List[String] etc. And to provide strong type checking, static-type language’s compiler have to know the type of data it will contain. Generally we define it with mentioning the containing type in the definition itself or contained type infered by the compiler itself if it is not mentioned. e.g.

val a:List[Int] = List(1,2,3,4) //defined with contained type

val b = List(1,2,3,4) //defined without contained type

List type also provide many operations which can behave uniformly to many contained types. Example include length, map, filter etc. Each of these operations act on list elements one at a time. Following we will see how these operation can be defined using a more generalized scheme that is provided to us by one other list operation called foldr.

List Operations length, map, filter

list operations usually apply their respective algorithm to each member of list one at a time. And recursively calls the next element till it reaches the end of the list. Following are the definition of some common operations.

def map[A, B](fn: A -> B, ls: List[A]): List[B] = {
        ls match {
                case Nil => Nil
                case h:t => fn(h): map(fn, xs)
        }
}

def filter[A](fn: A -> Boolean, ls: List[A]): List[A] = {
        ls match {
                case Nil => Nil
                case h:t => if (fn(h)) h:filter(fn, t) else filter(fn, t)
        }
}

def length(ls:List[_]): Int = {
        ls match {
                case Nil => 0
                case h:t => 1 + length(t)
        }
}

If we observe these operations closely they uses same common pattern.

  • Recursion scheme : each operation is recursing the list calling itself till it reaches the end of the list.

Following are the variable part in above operations.

  1. Computation and Aggregation: In case non empty list, each operation is applying some function (fn argument in map and filter) to the head of the list. In case of length we can assume a function which ignores its input and return constant 1. After computing we do some kind of aggregation, which helps in assembling the final result. In case of length its add(+). And In case of filter and map its cons(:) operator of list.

  2. Empty list return type.

If we abstract out the above two variable part as the argument. We can further genralise and obtain a more abstract function. It will have the the signature like below

def operateAndAggregateOnList[A, B](compAgg: (A -> B) -> B, zero: B, ls: List[A]): B

But we do not need that as list class already have similar operation called foldr. Following is the definition for foldr

def foldr(fn: (A -> B) -> B, zero: B, ls: List[A]): B = {
        ls match {
                case Nil => zero
                case h:t => fn(A, foldr(fn, zero, tail))
        }
}

As you can see foldr definition looks very similar to the operations map, filter and length except we have abstracted out base case return type and compute and aggregate function. We can further define map, length and filter in terms of foldr.

def map[A, B](fn: A -> B, ls: List[A]): List[B] = foldr((a,b) => fn(a):b, Nil, ls)

def filter[A](fn: A -> Boolean, ls: List[A]): List[A] = foldr((a,b) => if(fn(a)) a:b else b, Nil, ls)

def length(ls:List[A]): Int = foldr((a,b) => 1+b, 0, ls)

concat in terms of foldr

Last I want to show you one another function concat which concat two lists. General scheme is same in this case as well.

def concat[A](firstLs: List[A], secLs: List[A]): List[A] = {
        firstLs match {
                case Nil => secLs
                case h:t => h:concat(t, secLs)
        }
}

Can we define concat in terms of foldr of course we can. In foldr argument teminology zero argument is second list secLs, fn is A → List[B] → List[B]

def concat[A](firtLs: List[A], secLs: List[A]): List[A] = foldr((a,b) => a:b, secLs, firtLs)

conclusion

foldr gives the abstraction over computation and aggregation scheme over a list. As we have seen its a gernealized function and can be used to define other list function which have common list traversal pattern.


Statistics Notes

Statistics Measuring center of dataset : 1. Mean is sensitive to outliers its beneficial to use mean if dataset member are close to eac...