
Friday, April 8, 2022

DistributedSystem

NoSql

NoSql Database types

  1. Document based (e.g. MongoDB)
  2. Column based. Use cases: writes - a large number of small updates; reads - sequential scans. Example: HBase

NoSql schema design

  1. When to have multiple collections in NoSql:
     - If the objects you are going to embed may be accessed in an isolated way (it makes sense to access them out of the document context), you have a reason for not embedding.
     - If the array of embedded objects may grow in an unbounded way, you have another reason for not embedding.
     Embedding the many side of a one-to-many relationship into the one side can save extra queries, but the gain can quickly turn into a loss if the embedded objects are updated very frequently.

  2. Three basic schema designs for a One-to-N relationship in NoSql (illustrated in the sketch after this list):

     - Embed the N side if the cardinality is one-to-few and there is no need to access the embedded objects outside the context of the parent object.
     - Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects can be queried independently of the one side.
     - Use a reference to the one side in the N-side objects if the cardinality is one-to-squillions (an indefinitely large number).
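A hedged sketch of those three shapes, written as Java classes standing in for document layouts (the class and field names - BlogPost, Comment, Product, partIds, LogMessage, hostId - are hypothetical illustrations, not any particular database's API):

import java.util.List;

// 1. One-to-few: embed the N side inside the parent document.
class BlogPost {
        String title;
        List<Comment> comments;   // small, bounded, never queried on its own
}

class Comment {
        String author;
        String text;
}

// 2. One-to-many: the one side holds an array of references (ids);
// the N-side objects live in their own collection and can be queried alone.
class Product {
        String name;
        List<String> partIds;
}

// 3. One-to-squillions: each N-side object references its parent,
// because an unbounded array of ids would not fit in one document.
class LogMessage {
        String hostId;            // reference back to the one side
        String message;
}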

What to choose - sql or nosql

  1. NoSql - when to use:
     - Schema structure: if the data in your application has a document-like structure (i.e. a tree of one-to-many relationships where typically the entire tree is loaded at once), then it is probably a good idea to use the document model. The relational technique of shredding - splitting a document into multiple tables - can lead to cumbersome schemas and unnecessarily complicated application code.
     - Flexible and evolving schema.
     - Size: suitable for big volumes of data.
     - Compliance: suitable where eventual consistency can be tolerated.
     - A JSON document has better locality than a multi-table schema.

cons:
  1. Many-to-one and many-to-many relationships are only weakly supported. As projects get bigger they tend to acquire more use cases, and sub-objects of a document start being queried independently of the main object. As soon as those use cases involve many-to-many and many-to-one queries, they no longer fit well in a JSON schema, which leads to breaking the hierarchical model (JSON) back into a relational model.
  2. Querying a small piece of data from a big document fetches the whole document.
  3. An update that changes the size of a document requires rewriting the whole document.

  2. Sql - when to use:
     - Size: RDBMS are at their best when performing intensive read/write operations on small or medium sized data sets, and when you need strong consistency.
     - Compliance: use cases that require strict ACID compliance, e.g. finance, banking, e-commerce.
     - Schema structure: the schema is consistent and does not change much, and the data size is limited.

cons: Does not scale well horizontally because of the ACID guarantees.

Note: for highly interconnected data the document model is awkward, the relational model is acceptable, and the graph model is the most natural.

There is an implicit schema, because the application needs some kind of structure, but it is not enforced by the database. A more accurate term is schema-on-read (the structure of the data is implicit and only interpreted when the data is read), in contrast to schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it).

Schema-on-read advantages:
  Case 1: there are many different types of objects, and it is not practical to put each type of object in its own table.
  Case 2: the structure of the data is determined by external systems over which you have no control and which may change at any time.

Famous NoSql databases

  • Cassandra: records are sharded based on partition keys; within the same partition key, records are sorted by a clustering key.
  • BigTable: combines multiple files in a single block to store on disk, and is very efficient at reading small amounts of data.
  • HDFS/GlusterFS: distributed file storage systems; suggested for storing video binaries.

Bloom filter

  • It gives a definite answer when a particular key is not present (there are no false negatives); a positive answer may be a false positive, i.e. the key might not actually be present.
  • Given a master set of N bits, each key is passed to K hash functions; each function returns a bit position, which is set to mark the presence of the key. A lookup checks the same K positions and answers "possibly present" only if all of them are set (a code sketch follows this list).
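A minimal sketch of that scheme in Java, assuming the K bit positions are derived from the key's hashCode via double hashing (a real implementation would use stronger, independent hash functions):

import java.util.BitSet;

class BloomFilter {
        private final BitSet bits;
        private final int size;      // N bits in the master set
        private final int numHashes; // K hash functions

        BloomFilter(int size, int numHashes) {
                this.bits = new BitSet(size);
                this.size = size;
                this.numHashes = numHashes;
        }

        // Derive the i-th bit position from two base hashes (double hashing).
        private int position(String key, int i) {
                int h1 = key.hashCode();
                int h2 = (h1 >>> 16) | 1; // cheap second hash, forced odd
                return Math.floorMod(h1 + i * h2, size);
        }

        void add(String key) {
                for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
        }

        // false => key is definitely absent; true => key is possibly present.
        boolean mightContain(String key) {
                for (int i = 0; i < numHashes; i++)
                        if (!bits.get(position(key, i))) return false;
                return true;
        }
}

A key that was never added usually hits at least one unset bit and returns false immediately; it returns true only when all K of its positions collide with bits set by other keys, which is exactly the false-positive case.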

CAP theorem

It states that any networked shared-data system can have at most two of three desirable properties:
  1. Consistency - every node serves the latest copy of the data.
  2. Availability - any non-failing node replies to a request within a certain amount of time.
  3. Partition tolerance - the system continues to function even in the case of a network partition.

As a consequence of the CAP theorem, in practice we categorize distributed systems in the following ways:

  1. CA system: since this system needs to be consistent, in the case of a network partition the whole system stops working, because all nodes need to serve the latest data. It is not a coherent design for any distributed application. e.g. RDBMS

  2. CP system: very similar to a CA system, but in the case of a network partition a node will retry indefinitely (losing availability - see the definition above) until the client times out. e.g. BigTable, HBase

  3. AP system: these systems continue to serve stale data in the case of a network partition, without compromising availability, but the system will not be consistent. e.g. Cassandra, MongoDB. AP systems are also described as BASE: Basically Available, Soft state, Eventually consistent.

An extension to the CAP theorem is the PACELC theorem. PAC is from the CAP theorem: in the case of a Partition (P), the system can choose either Availability (A) or Consistency (C). ELC adds: Else (E), when there is no partition, the system can choose either lower Latency (L) or Consistency (C).

Sunday, August 2, 2020

Generalization of list operations using foldr

Introduction:

List is a generic (polymorphic) type, which means it can take many shapes, like List[Int], List[String], etc. To provide strong type checking, a statically typed language's compiler has to know the type of data the list will contain. Generally we define a list by mentioning the contained type in the definition itself, or the contained type is inferred by the compiler when it is not mentioned, e.g.

val a:List[Int] = List(1,2,3,4) //defined with contained type

val b = List(1,2,3,4) //defined without contained type

The List type also provides many operations that behave uniformly over many contained types; examples include length, map, and filter. Each of these operations acts on the list elements one at a time. Below we will see how these operations can be defined using a more general scheme, provided to us by another list operation called foldr.

List Operations length, map, filter

List operations usually apply their respective algorithm to each member of the list one at a time, recursing on the rest of the list until the end is reached. Following are the definitions of some common operations.

def map[A, B](fn: A => B, ls: List[A]): List[B] = {
        ls match {
                case Nil => Nil
                case h :: t => fn(h) :: map(fn, t)
        }
}

def filter[A](fn: A => Boolean, ls: List[A]): List[A] = {
        ls match {
                case Nil => Nil
                case h :: t => if (fn(h)) h :: filter(fn, t) else filter(fn, t)
        }
}

def length(ls: List[_]): Int = {
        ls match {
                case Nil => 0
                case _ :: t => 1 + length(t)
        }
}

If we observe these operations closely, they use the same common pattern.

  • Recursion scheme: each operation recurses over the list, calling itself until it reaches the end of the list.

The following are the variable parts in the above operations.

  1. Computation and aggregation: in the case of a non-empty list, each operation applies some function (the fn argument in map and filter) to the head of the list. In the case of length we can assume a function that ignores its input and returns the constant 1. After computing, we do some kind of aggregation, which assembles the final result: in the case of length it is addition (+), and in the case of filter and map it is the cons (::) operator of List.

  2. The value returned for the empty list (the base case).

If we abstract out the above two variable parts as arguments, we can generalize further and obtain a more abstract function. It will have a signature like the one below.

def operateAndAggregateOnList[A, B](compAgg: (A, B) => B, zero: B, ls: List[A]): B

But we do not need that, as the List class already has a similar operation, called foldr. Following is the definition of foldr.

def foldr[A, B](fn: (A, B) => B, zero: B, ls: List[A]): B = {
        ls match {
                case Nil => zero
                case h :: t => fn(h, foldr(fn, zero, t))
        }
}

As you can see, the foldr definition looks very similar to the operations map, filter and length, except that the base-case return value and the compute-and-aggregate function are abstracted out. We can now define map, length and filter in terms of foldr.

def map[A, B](fn: A => B, ls: List[A]): List[B] = foldr((a, b) => fn(a) :: b, Nil, ls)

def filter[A](fn: A => Boolean, ls: List[A]): List[A] = foldr((a, b) => if (fn(a)) a :: b else b, Nil, ls)

def length[A](ls: List[A]): Int = foldr((_, b) => 1 + b, 0, ls)

concat in terms of foldr

Last, I want to show you one more function, concat, which concatenates two lists. The general scheme is the same in this case as well.

def concat[A](firstLs: List[A], secLs: List[A]): List[A] = {
        firstLs match {
                case Nil => secLs
                case h :: t => h :: concat(t, secLs)
        }
}

Can we define concat in terms of foldr? Of course we can. In foldr's argument terminology, the zero argument is the second list secLs, and fn is (A, List[A]) => List[A]:

def concat[A](firstLs: List[A], secLs: List[A]): List[A] = foldr((a, b) => a :: b, secLs, firstLs)

Conclusion

foldr gives an abstraction over the compute-and-aggregate scheme for a list. As we have seen, it is a generalized function and can be used to define other list functions that share the common list traversal pattern.


Saturday, July 11, 2020

Java Generics Type Bounds


Introduction 

This article is an introduction to Java generic type bounds.

Normal Type

Typically in Java, normal (non-generic) types are covariant: a supertype reference can point to a subtype object. This allows the substitution principle to take its shape in our day-to-day coding. I bet you have seen this code snippet before:

Animal animal = new Bird();

Generic Type

Besides normal types we have special types in Java. These types take other types as input for their existence, e.g. List<T>, and are called generic types. The List<T> type needs the parameter T to take shape, where T is called the type parameter and List the generic type.
We generally use generic types to hold normal types and apply generic algorithms through them, e.g. sorting a List, mapping an Optional value to some other value, etc. Other examples of generic types are Class<T>, Map<K, V>, Stream<T>, and Optional<T>.

The substitution principle (that I mentioned earlier for normal types) needs some tweaking to work with generic types. This means you will get a compile error if you try to do this:

class Holder<T> {...}

Holder<Animal> holder = new Holder<Bird>();

Though Bird is a subtype of Animal, we cannot assign a Holder<Bird> to a Holder<Animal> reference (see the sketch below for why the compiler must reject this). Next I will introduce type bounds, which allow the substitution principle to work with type parameters as well.
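Here is a short sketch of the reason, using hypothetical Animal, Bird and Holder classes modeled on this post's examples: if the assignment compiled, we could put the wrong kind of object into the holder.

class Animal {}
class Bird extends Animal {}

class Holder<T> {
        private T value;
        void set(T value) { this.value = value; }
        T get() { return value; }
}

class InvarianceDemo {
        void demo() {
                Holder<Bird> birds = new Holder<Bird>();
                // Holder<Animal> animals = birds; // compile error, and for good reason:
                // animals.set(new Animal());      // this would sneak a plain Animal
                // Bird b = birds.get();           // ...into a holder that must contain Birds
        }
}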

Type Bounds

Type bounds allow us to put some extra type constraints on type parameters. Through type constraints we can limit the allowed types for a type parameter. There are two flavors, called upper bound and lower bound. Next I will explain them one by one.

Upper Bound

With an upper bound, the type constraint fixes the supertype: the type argument is bounded by the supertype and can be that type or any of its subclasses. On a reference, Java expresses this with the wildcard ?, using the following syntax:

MyGenericClass<? extends SuperType> ref;

Animal in the example below is the type constraint. The holdRef reference can point to any Holder object whose type argument is Animal or a subtype of Animal, e.g.

Holder<? extends Animal> holdRef;

holdRef = new Holder<Bird>();
holdRef = new Holder<Dog>();
 

Lower Bound

It is very similar to the upper bound, but the constraint direction is reversed: the type constraint fixes the subtype, and the type argument can be that type or any of its supertypes. We use the following syntax for it (note that Java only allows super on wildcards, not on declared type parameters):

MyGenericClass<? super SubType> ref;

In the example below, Dog is the type constraint: it limits the type argument to Dog or any supertype of Dog, so the holdRef reference can point to any Holder object whose type argument is a supertype of Dog. A short usage sketch follows the example.

Holder<? super Dog> holdRef;
holdRef = new Holder<Mammal>();
holdRef = new Holder<Animal>();
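A sketch of what the lower bound buys us, assuming the hypothetical hierarchy Dog extends Mammal extends Animal and the Holder.set method from the earlier sketch: a lower-bounded reference can always accept a Dog, whichever Holder it points to.

class LowerBoundDemo {
        // Works for Holder<Dog>, Holder<Mammal>, Holder<Animal>, Holder<Object>.
        static void putDog(Holder<? super Dog> holder) {
                holder.set(new Dog()); // always type-safe: the holder's T is some supertype of Dog
        }
}

Both putDog(new Holder<Mammal>()) and putDog(new Holder<Animal>()) compile. Reading back through such a reference only yields Object, which is the usual trade-off of a lower bound.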

Conclusion

In Java we can use generic types to represent classes that have very general functionality. This functionality is generally type agnostic, e.g. the size of a list: it does not matter whether the List contains Integer, String, etc.

Or functionality that has some type requirement but can still be generalized to a large number of types, e.g. sort functionality, where the type needs to implement the Comparable interface, through which we can know the relative order between two elements. It does not matter whether we pass Integer, String, or a user-defined class like Employee to the sort algorithm, as long as the type implements Comparable (see the sketch below).
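A minimal sketch of such a bound (Employee is the post's hypothetical example; the method name sortAscending is mine):

import java.util.List;

class SortDemo {
        // The upper bound on T guarantees every element can compare itself
        // to another T, so this works for Integer, String, Employee, ...
        static <T extends Comparable<T>> void sortAscending(List<T> list) {
                list.sort((a, b) -> a.compareTo(b));
        }
}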

Type bounds allow us to put constraints on the type parameters of generic types. An upper bound limits the type argument to subtypes of a constraint type; a lower bound limits it to supertypes of a constraint type. Both prevent nasty runtime exceptions, saving programmer time by surfacing the errors at compilation time.
    




   
