
Friday, April 8, 2022

DistributedSystem

NoSql

NoSql Database types

  1. Document based (e.g. MongoDB)
  2. Column based. Use cases: writes - a large number of small updates; reads - sequential scans. Example: HBase

NoSql schema design

  1. When to have multiple collections in NoSql:
     - If the objects you are going to embed may be accessed in an isolated way (it makes sense to access them out of the document context), you have a reason for not embedding.
     - If the array of embedded objects may grow in an unbounded way, you have another reason for not embedding.
     Embedding the many side of a one-to-many relationship into the one side can save extra queries, but the gain can quickly turn into a loss if the embedded objects are updated very frequently.

  2. Three basic schema designs for a One-to-N relationship in NoSql (illustrated in the sketch after this list):

     - Embed the N side if the cardinality is one-to-few and there is no need to access the embedded objects outside the context of the parent object.
     - Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects can be queried independently of the one side.
     - Use a reference to the one side in the N-side objects if the cardinality is one-to-squillions (an indefinitely large number).
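A hedged sketch of those three shapes, written as Java classes standing in for document layouts (the class and field names - BlogPost, Comment, Product, partIds, LogMessage, hostId - are hypothetical illustrations, not any particular database's API):

import java.util.List;

// 1. One-to-few: embed the N side inside the parent document.
class BlogPost {
        String title;
        List<Comment> comments;   // small, bounded, never queried on its own
}

class Comment {
        String author;
        String text;
}

// 2. One-to-many: the one side holds an array of references (ids);
// the N-side objects live in their own collection and can be queried alone.
class Product {
        String name;
        List<String> partIds;
}

// 3. One-to-squillions: each N-side object references its parent,
// because an unbounded array of ids would not fit in one document.
class LogMessage {
        String hostId;            // reference back to the one side
        String message;
}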

What to choose - sql or nosql

  1. NoSql - when to use:
     - Schema structure: if the data in your application has a document-like structure (i.e. a tree of one-to-many relationships where typically the entire tree is loaded at once), then it is probably a good idea to use the document model. The relational technique of shredding - splitting a document into multiple tables - can lead to cumbersome schemas and unnecessarily complicated application code.
     - Flexible and evolving schema.
     - Size: suitable for big volumes of data.
     - Compliance: suitable where eventual consistency can be tolerated.
     - A JSON document has better locality than a multi-table schema.

cons:
  1. Many-to-one and many-to-many relationships are only weakly supported. As projects get bigger they tend to acquire more use cases, and sub-objects of a document start being queried independently of the main object. As soon as those use cases involve many-to-many and many-to-one queries, they no longer fit well in a JSON schema, which leads to breaking the hierarchical model (JSON) back into a relational model.
  2. Querying a small piece of data from a big document fetches the whole document.
  3. An update that changes the size of a document requires rewriting the whole document.

  2. Sql - when to use:
     - Size: RDBMS are at their best when performing intensive read/write operations on small or medium sized data sets, and when you need strong consistency.
     - Compliance: use cases that require strict ACID compliance, e.g. finance, banking, e-commerce.
     - Schema structure: the schema is consistent and does not change much, and the data size is limited.

cons: Does not scale well horizontally because of the ACID guarantees.

Note: for highly interconnected data the document model is awkward, the relational model is acceptable, and the graph model is the most natural.

There is an implicit schema, because the application needs some kind of structure, but it is not enforced by the database. A more accurate term is schema-on-read (the structure of the data is implicit and only interpreted when the data is read), in contrast to schema-on-write (the traditional approach of relational databases, where the schema is explicit and the database ensures all written data conforms to it).

Schema-on-read advantages:
  Case 1: there are many different types of objects, and it is not practical to put each type of object in its own table.
  Case 2: the structure of the data is determined by external systems over which you have no control and which may change at any time.

Famous NoSql databases

  • Cassandra: records are sharded based on partition keys; within the same partition key, records are sorted by a clustering key.
  • BigTable: combines multiple files in a single block to store on disk, and is very efficient at reading small amounts of data.
  • HDFS/GlusterFS: distributed file storage systems; suggested for storing video binaries.

Bloom filter

  • It gives a definite answer when a particular key is not present (there are no false negatives); a positive answer may be a false positive, i.e. the key might not actually be present.
  • Given a master set of N bits, each key is passed to K hash functions; each function returns a bit position, which is set to mark the presence of the key. A lookup checks the same K positions and answers "possibly present" only if all of them are set (a code sketch follows this list).
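A minimal sketch of that scheme in Java, assuming the K bit positions are derived from the key's hashCode via double hashing (a real implementation would use stronger, independent hash functions):

import java.util.BitSet;

class BloomFilter {
        private final BitSet bits;
        private final int size;      // N bits in the master set
        private final int numHashes; // K hash functions

        BloomFilter(int size, int numHashes) {
                this.bits = new BitSet(size);
                this.size = size;
                this.numHashes = numHashes;
        }

        // Derive the i-th bit position from two base hashes (double hashing).
        private int position(String key, int i) {
                int h1 = key.hashCode();
                int h2 = (h1 >>> 16) | 1; // cheap second hash, forced odd
                return Math.floorMod(h1 + i * h2, size);
        }

        void add(String key) {
                for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
        }

        // false => key is definitely absent; true => key is possibly present.
        boolean mightContain(String key) {
                for (int i = 0; i < numHashes; i++)
                        if (!bits.get(position(key, i))) return false;
                return true;
        }
}

A key that was never added usually hits at least one unset bit and returns false immediately; it returns true only when all K of its positions collide with bits set by other keys, which is exactly the false-positive case.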

CAP theorem

It states that any networked shared-data system can have at most two of three desirable properties:
  1. Consistency - every node serves the latest copy of the data.
  2. Availability - any non-failing node replies to a request within a certain amount of time.
  3. Partition tolerance - the system continues to function even in the case of a network partition.

As a consequence of the CAP theorem, in practice we categorize distributed systems in the following ways:

  1. CA system: since this system needs to be consistent, in the case of a network partition the whole system stops working, because all nodes need to serve the latest data. It is not a coherent design for any distributed application. e.g. RDBMS

  2. CP system: very similar to a CA system, but in the case of a network partition a node will retry indefinitely (losing availability - see the definition above) until the client times out. e.g. BigTable, HBase

  3. AP system: these systems continue to serve stale data in the case of a network partition, without compromising availability, but the system will not be consistent. e.g. Cassandra, MongoDB. AP systems are also described as BASE: Basically Available, Soft state, Eventually consistent.

An extension to the CAP theorem is the PACELC theorem. PAC is from the CAP theorem: in the case of a Partition (P), the system can choose either Availability (A) or Consistency (C). ELC adds: Else (E), when there is no partition, the system can choose either lower Latency (L) or Consistency (C).

Sunday, August 2, 2020

Generalization of list operations using foldr

Introduction:

List is a generic (polymorphic) type, which means it can take many shapes, like List[Int], List[String], etc. To provide strong type checking, a statically typed language's compiler has to know the type of data the list will contain. Generally we define a list by mentioning the contained type in the definition itself, or the contained type is inferred by the compiler when it is not mentioned, e.g.

val a:List[Int] = List(1,2,3,4) //defined with contained type

val b = List(1,2,3,4) //defined without contained type

The List type also provides many operations that behave uniformly over many contained types; examples include length, map, and filter. Each of these operations acts on the list elements one at a time. Below we will see how these operations can be defined using a more general scheme, provided to us by another list operation called foldr.

List Operations length, map, filter

List operations usually apply their respective algorithm to each member of the list one at a time, recursing on the rest of the list until the end is reached. Following are the definitions of some common operations.

def map[A, B](fn: A => B, ls: List[A]): List[B] = {
        ls match {
                case Nil => Nil
                case h :: t => fn(h) :: map(fn, t)
        }
}

def filter[A](fn: A => Boolean, ls: List[A]): List[A] = {
        ls match {
                case Nil => Nil
                case h :: t => if (fn(h)) h :: filter(fn, t) else filter(fn, t)
        }
}

def length(ls: List[_]): Int = {
        ls match {
                case Nil => 0
                case _ :: t => 1 + length(t)
        }
}

If we observe these operations closely, they use the same common pattern.

  • Recursion scheme: each operation recurses over the list, calling itself until it reaches the end of the list.

The following are the variable parts in the above operations.

  1. Computation and aggregation: in the case of a non-empty list, each operation applies some function (the fn argument in map and filter) to the head of the list. In the case of length we can assume a function that ignores its input and returns the constant 1. After computing, we do some kind of aggregation, which assembles the final result: in the case of length it is addition (+), and in the case of filter and map it is the cons (::) operator of List.

  2. The value returned for the empty list (the base case).

If we abstract out the above two variable parts as arguments, we can generalize further and obtain a more abstract function. It will have a signature like the one below.

def operateAndAggregateOnList[A, B](compAgg: (A, B) => B, zero: B, ls: List[A]): B

But we do not need that, as the List class already has a similar operation, called foldr. Following is the definition of foldr.

def foldr[A, B](fn: (A, B) => B, zero: B, ls: List[A]): B = {
        ls match {
                case Nil => zero
                case h :: t => fn(h, foldr(fn, zero, t))
        }
}

As you can see, the foldr definition looks very similar to the operations map, filter and length, except that the base-case return value and the compute-and-aggregate function are abstracted out. We can now define map, length and filter in terms of foldr.

def map[A, B](fn: A => B, ls: List[A]): List[B] = foldr((a, b) => fn(a) :: b, Nil, ls)

def filter[A](fn: A => Boolean, ls: List[A]): List[A] = foldr((a, b) => if (fn(a)) a :: b else b, Nil, ls)

def length[A](ls: List[A]): Int = foldr((_, b) => 1 + b, 0, ls)

concat in terms of foldr

Last, I want to show you one more function, concat, which concatenates two lists. The general scheme is the same in this case as well.

def concat[A](firstLs: List[A], secLs: List[A]): List[A] = {
        firstLs match {
                case Nil => secLs
                case h :: t => h :: concat(t, secLs)
        }
}

Can we define concat in terms of foldr? Of course we can. In foldr's argument terminology, the zero argument is the second list secLs, and fn is (A, List[A]) => List[A]:

def concat[A](firstLs: List[A], secLs: List[A]): List[A] = foldr((a, b) => a :: b, secLs, firstLs)

Conclusion

foldr gives an abstraction over the compute-and-aggregate scheme for a list. As we have seen, it is a generalized function and can be used to define other list functions that share the common list traversal pattern.


Saturday, July 11, 2020

Java Generics Type Bounds


Introduction 

This article is an introduction to Java generic type bounds.

Normal Type

Typically in Java, normal (non-generic) types are covariant: a supertype reference can point to a subtype object. This allows the substitution principle to take its shape in our day-to-day coding. I bet you have seen this code snippet before:

Animal animal = new Bird();

Generic Type

Besides normal types we have special types in Java. These types take other types as input for their existence, e.g. List<T>, and are called generic types. The List<T> type needs the parameter T to take shape, where T is called the type parameter and List the generic type.
We generally use generic types to hold normal types and apply generic algorithms through them, e.g. sorting a List, mapping an Optional value to some other value, etc. Other examples of generic types are Class<T>, Map<K, V>, Stream<T>, and Optional<T>.

The substitution principle (that I mentioned earlier for normal types) needs some tweaking to work with generic types. This means you will get a compile error if you try to do this:

class Holder<T> {...}

Holder<Animal> holder = new Holder<Bird>();

Though Bird is a subtype of Animal, we cannot assign a Holder<Bird> to a Holder<Animal> reference (see the sketch below for why the compiler must reject this). Next I will introduce type bounds, which allow the substitution principle to work with type parameters as well.
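Here is a short sketch of the reason, using hypothetical Animal, Bird and Holder classes modeled on this post's examples: if the assignment compiled, we could put the wrong kind of object into the holder.

class Animal {}
class Bird extends Animal {}

class Holder<T> {
        private T value;
        void set(T value) { this.value = value; }
        T get() { return value; }
}

class InvarianceDemo {
        void demo() {
                Holder<Bird> birds = new Holder<Bird>();
                // Holder<Animal> animals = birds; // compile error, and for good reason:
                // animals.set(new Animal());      // this would sneak a plain Animal
                // Bird b = birds.get();           // ...into a holder that must contain Birds
        }
}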

Type Bounds

Type bounds allow us to put some extra type constraints on type parameters. Through type constraints we can limit the allowed types for a type parameter. There are two flavors, called upper bound and lower bound. Next I will explain them one by one.

Upper Bound

With an upper bound, the type constraint fixes the supertype: the type argument is bounded by the supertype and can be that type or any of its subclasses. On a reference, Java expresses this with the wildcard ?, using the following syntax:

MyGenericClass<? extends SuperType> ref;

Animal in the example below is the type constraint. The holdRef reference can point to any Holder object whose type argument is Animal or a subtype of Animal, e.g.

Holder<? extends Animal> holdRef;

holdRef = new Holder<Bird>();
holdRef = new Holder<Dog>();
 

Lower Bound

It is very similar to the upper bound, but the constraint direction is reversed: the type constraint fixes the subtype, and the type argument can be that type or any of its supertypes. We use the following syntax for it (note that Java only allows super on wildcards, not on declared type parameters):

MyGenericClass<? super SubType> ref;

In the example below, Dog is the type constraint: it limits the type argument to Dog or any supertype of Dog, so the holdRef reference can point to any Holder object whose type argument is a supertype of Dog. A short usage sketch follows the example.

Holder<? super Dog> holdRef;
holdRef = new Holder<Mammal>();
holdRef = new Holder<Animal>();
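A sketch of what the lower bound buys us, assuming the hypothetical hierarchy Dog extends Mammal extends Animal and the Holder.set method from the earlier sketch: a lower-bounded reference can always accept a Dog, whichever Holder it points to.

class LowerBoundDemo {
        // Works for Holder<Dog>, Holder<Mammal>, Holder<Animal>, Holder<Object>.
        static void putDog(Holder<? super Dog> holder) {
                holder.set(new Dog()); // always type-safe: the holder's T is some supertype of Dog
        }
}

Both putDog(new Holder<Mammal>()) and putDog(new Holder<Animal>()) compile. Reading back through such a reference only yields Object, which is the usual trade-off of a lower bound.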

Conclusion

In Java we can use generic types to represent classes that have very general functionality. This functionality is generally type agnostic, e.g. the size of a list: it does not matter whether the List contains Integer, String, etc.

Or functionality that has some type requirement but can still be generalized to a large number of types, e.g. sort functionality, where the type needs to implement the Comparable interface, through which we can know the relative order between two elements. It does not matter whether we pass Integer, String, or a user-defined class like Employee to the sort algorithm, as long as the type implements Comparable (see the sketch below).
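A minimal sketch of such a bound (Employee is the post's hypothetical example; the method name sortAscending is mine):

import java.util.List;

class SortDemo {
        // The upper bound on T guarantees every element can compare itself
        // to another T, so this works for Integer, String, Employee, ...
        static <T extends Comparable<T>> void sortAscending(List<T> list) {
                list.sort((a, b) -> a.compareTo(b));
        }
}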

Type bounds allow us to put constraints on the type parameters of generic types. An upper bound limits the type argument to subtypes of a constraint type; a lower bound limits it to supertypes of a constraint type. Both prevent nasty runtime exceptions, saving programmer time by surfacing the errors at compilation time.
    




   
