Assume you have two optional parameters: Maybe(firstName) and Maybe(lastName).
You want to build an instance of a Name class, possibly written in Java, from these parameters, but only if they exist (i.e., each is a Maybe.Just or an Option.Some).
Ok! Let us try writing it.
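Here is a minimal sketch of that naive version. It is written in Python rather than Scala, so Optional[str] stands in for Maybe/Option and None for the missing case; Name and build_name are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    first_name: str
    last_name: str


def build_name(first_name: Optional[str], last_name: Optional[str]) -> Optional[Name]:
    """Return a Name only when both optional parameters are present."""
    if first_name is None or last_name is None:
        return None
    return Name(first_name, last_name)


print(build_name("Ada", "Lovelace"))  # both parameters exist
print(build_name("Ada", None))        # last name is missing
```

The explicit None checks are exactly the part that gets repetitive once there are many classes and many optional parameters.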
Now think of multiple classes similar to Name, each taking a list of optional parameters. This is our problem definition. A quick solution would be to abstract out the above function (say f) and fold over the list of optional parameters using f, but that
discards the possibility of using a better mechanism provided by the functional programming paradigm. Given below is one solution. It may not be the only one, but it is a cleaner way of solving this problem.
A Solution
State Monad
I have intentionally avoided the use of State.modify function in scalaz, since it is less
expressive at this stage.
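To give a rough feel for the idea, here is a hand-rolled sketch in Python. It is not scalaz, and it is not the original solution: State, next_param and build_name are hypothetical names, and the "state" is simply the tuple of optional parameters that have not been consumed yet. No modify-style helper is used, in keeping with the note above.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Tuple, TypeVar

S = TypeVar("S")
A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Name:
    first_name: str
    last_name: str


@dataclass(frozen=True)
class State(Generic[S, A]):
    """A state transition: takes a state, returns (result, next state)."""

    run: Callable[[S], Tuple[A, S]]

    def flat_map(self, f: "Callable[[A], State[S, B]]") -> "State[S, B]":
        def step(s: S) -> Tuple[B, S]:
            a, s1 = self.run(s)   # run this transition first
            return f(a).run(s1)   # then feed its result and state onwards
        return State(step)

    def map(self, f: Callable[[A], B]) -> "State[S, B]":
        return self.flat_map(lambda a: State(lambda s: (f(a), s)))


Params = Tuple[Optional[str], ...]

# Consume the next optional parameter and leave the rest as the new state.
next_param: State[Params, Optional[str]] = State(lambda ps: (ps[0], ps[1:]))


def build_name(params: Params) -> Optional[Name]:
    program = next_param.flat_map(
        lambda first: next_param.map(
            lambda last: Name(first, last)
            if first is not None and last is not None
            else None
        )
    )
    result, _remaining = program.run(params)
    return result


print(build_name(("Ada", "Lovelace")))
print(build_name(("Ada", None)))
```

The parameter list is threaded through the computation implicitly, so each step only says "give me the next parameter" and never handles the list itself.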
If you don’t know State
You can see some of my code scribblings on the State data type (without scalaz) in the following links.
Read them in order. The comments in the code should give you some idea of what the State data type is.
That should make you more comfortable with the notes in scalaz tutorials (Google).
We have heard about this design principle in software development.
Let’s get straight into a problem and try and understand what this principle says!
Let us design an ATM (Automated Teller Machine).
An ATM has transactions. A transaction can be WithdrawalTransaction, DepositTransaction, TransferTransaction etc.
Next, we have a UI for ATM, and UI can be Braille UI, Screen UI and Speech UI.
Different transactions can call UI methods to display their messages; however, the message differs from one transaction to another.
A single UI interface with methods such as RequestTransferAmount and RequestDepositAmount that works with all transaction types
wouldn’t be the right solution here. A single type of transaction, say DepositTransaction, can force a change to the UI interface to add new functionality, which in turn leads to
changes in all the other transactions that extend the same UI. In short, our problem is:
“A change in the UI to solve a problem with DepositTransaction resulted in changes to TransferTransaction.”
It looks like we need a separate UI for each of Withdraw, Deposit and Transfer, i.e., DepositUI, WithdrawUI and TransferUI.
First stage of Solution:
All Transaction classes extend Transactions as expected.
We have separate DepositUI, WithdrawUI and TransferUI interfaces that get mixed into UI.
A particular transaction won’t work with UI directly, since that interface is already fat. (We call such an interface fat because not all of its functions are relevant to a particular implementation. That is, TransferTransaction would have to provide a fake/dummy/empty implementation of RequestDepositAmount if it extended this fat interface.)
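The fat-interface problem can be sketched as follows. This is an illustrative Python version, not the original code; the snake_case method names are adapted from RequestDepositAmount and RequestTransferAmount, and the returned amount is a placeholder.

```python
from abc import ABC, abstractmethod


class UI(ABC):
    """One fat interface serving every kind of transaction."""

    @abstractmethod
    def request_deposit_amount(self) -> int: ...

    @abstractmethod
    def request_transfer_amount(self) -> int: ...


class TransferTransaction(UI):
    def request_transfer_amount(self) -> int:
        return 25  # placeholder amount for the sketch

    def request_deposit_amount(self) -> int:
        # Irrelevant to a transfer, yet the fat interface forces us to
        # provide something, so we end up with a dummy implementation.
        raise NotImplementedError("deposits are not part of a transfer")


print(TransferTransaction().request_transfer_amount())
```

Every new method added to UI for the sake of deposits would need yet another dummy like this in TransferTransaction.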
Each transaction can then be given its corresponding UI type.
The intent is that clients should not be forced to depend on interfaces they don’t use. This is called the interface segregation principle.
So, here we go: A part of the solution:
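A sketch of this stage in Python, under a few assumptions: the method names, the execute method, ScreenUI and the canned amounts are all hypothetical and only there to make the example runnable.

```python
from abc import ABC, abstractmethod


class DepositUI(ABC):
    @abstractmethod
    def request_deposit_amount(self) -> int: ...


class WithdrawUI(ABC):
    @abstractmethod
    def request_withdraw_amount(self) -> int: ...


class TransferUI(ABC):
    @abstractmethod
    def request_transfer_amount(self) -> int: ...


class UI(DepositUI, WithdrawUI, TransferUI):
    """The UI family mixes in all three narrow interfaces."""


class ScreenUI(UI):
    # Canned amounts so the sketch runs without any real input device.
    def request_deposit_amount(self) -> int:
        return 100

    def request_withdraw_amount(self) -> int:
        return 40

    def request_transfer_amount(self) -> int:
        return 25


class Transaction(ABC):
    @abstractmethod
    def execute(self) -> str: ...


class DepositTransaction(Transaction):
    def __init__(self, ui: DepositUI) -> None:
        self._ui = ui  # depends on the narrow interface only

    def execute(self) -> str:
        return f"deposited {self._ui.request_deposit_amount()}"


class WithdrawTransaction(Transaction):
    def __init__(self, ui: WithdrawUI) -> None:
        self._ui = ui

    def execute(self) -> str:
        return f"withdrew {self._ui.request_withdraw_amount()}"


class TransferTransaction(Transaction):
    def __init__(self, ui: TransferUI) -> None:
        self._ui = ui

    def execute(self) -> str:
        return f"transferred {self._ui.request_transfer_amount()}"


screen = ScreenUI()
for transaction in (
    DepositTransaction(screen),
    WithdrawTransaction(screen),
    TransferTransaction(screen),
):
    print(transaction.execute())
```

The same ScreenUI object is handed to every transaction, but each transaction sees it only through the interface it actually needs.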
This is it!
A few more points to note
Note that DepositTransaction must know about DepositUI and WithdrawTransaction must know about WithdrawUI.
We solved this by making the constructor of each type of transaction expect the right type of UI, so the UI is passed in
when the transaction is constructed.
Another way of handling this is to have a global package listing all the types of UI, i.e., static global objects. That way we avoid
passing in the UI when constructing the different transactions. These are two different approaches, and both allow us to follow the
interface segregation principle. Note, however, that if these globals are put into a class instead of a package, that would violate the principle, because we would in effect be
combining all the interfaces together whenever we import this class to work with a particular transaction. Personally, I would go with the solution
given in the above example, that is, pass the right UI to the right transaction.
The polyad and monad - Monadic is not always the best approach
Assume that we need a function f that has to access both DepositUI and
WithdrawUI. We could pass these UIs to f separately.
We made the function polyadic, and passed multiple parameters into it.
Let me quote a passage from the book Clean Code by Robert C. Martin:
A function with two arguments is harder to understand than a monadic function. For example, writeField(name) is easier to understand than writeField(output-stream, name).
Dyads aren’t evil, and you will certainly have to write them. However, you should be aware that they come at a cost and should take advantage of what mechanism may be available to you to convert them into monads.
In the above example, you might have noted that we mix the various UIs together to form a UI type. Following the quote above,
let us make our function monadic. (Recall that UI in our example is a mix of all the interfaces.)
Here we made it monadic, but it comes at a cost: it moves us away from the interface segregation principle. That is, any change in any of the interfaces affects the function f and all of its clients,
forcing them to recompile. Hence we prefer the polyadic approach here.
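The two shapes of f can be put side by side. This is a Python sketch with hypothetical names (f_polyadic, f_monadic, ScreenUI, the canned amounts); Python has no recompilation step, so read the type annotations as the dependency each function declares.

```python
from abc import ABC, abstractmethod


class DepositUI(ABC):
    @abstractmethod
    def request_deposit_amount(self) -> int: ...


class WithdrawUI(ABC):
    @abstractmethod
    def request_withdraw_amount(self) -> int: ...


class TransferUI(ABC):
    @abstractmethod
    def request_transfer_amount(self) -> int: ...


class UI(DepositUI, WithdrawUI, TransferUI):
    """Mix of all the narrow interfaces."""


class ScreenUI(UI):
    def request_deposit_amount(self) -> int:
        return 100

    def request_withdraw_amount(self) -> int:
        return 40

    def request_transfer_amount(self) -> int:
        return 25


def f_polyadic(deposit_ui: DepositUI, withdraw_ui: WithdrawUI) -> int:
    """Two arguments, but it depends only on the two interfaces it uses."""
    return deposit_ui.request_deposit_amount() - withdraw_ui.request_withdraw_amount()


def f_monadic(ui: UI) -> int:
    """One argument, but it now depends on TransferUI as well, which it never calls."""
    return ui.request_deposit_amount() - ui.request_withdraw_amount()


screen = ScreenUI()
print(f_polyadic(screen, screen))  # 60
print(f_monadic(screen))           # 60
```

Both functions compute the same value; the difference is only in how much of the UI family each one is coupled to.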
What is State transformer and STRef trying to achieve?
Encapsulating stateful computations that manipulate multiple mutable objects in the context of a non-strict, purely functional language.
Once the actions are encoded in ST, any attempt to access a naked mutable object results in a compilation error.
Let us define the ST monad (in scalaz). In the simplified form it is,
Monad with unit as returnST and flatMap as flatMap itself.
The difference between ST and the State monad is that in ST the state is mutated in place and is not observable from outside.
The World represents some state of the world. (I tried to use scalaz’s State instead of the ST monad (basically, S instead of a World[S]) to solve the same problem, but got stuck. If interested, you may have a look at this line of code in my FP exercise repo.)
It encapsulates a state transformer.
The contents of the state don’t really matter, but the type does (for what we want to do with the ST monad: transform the state by mutating objects in place).
Since type S is unique for a given ST, it is going to be a pure function.
STRef
STRef is a mutable variable (an updatable location in the state capable of holding a value) that is used only within the context of the ST monad, that is, as ST[S, STRef[S, A]].
Exposing an STRef by any chance should lead to a compile-time error; in other words, there is no freedom to compose an STRef by itself without a state thread.
Examples
i.e., newVar(a).run in scalaz fails with a compile-time error.
newVar(a).flatMap(_.mod(_ + 1).flatMap(read)).run exposes only a and not STRef(a) to the client making it super safe.
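The shape of these two examples can be imitated in Python, with one big caveat: Python has no rank-2 types, so this sketch rejects a leaked reference with a runtime check instead of a compile-time error. ST, STRef, new_var, read, mod and run_st are hand-rolled, hypothetical names here, not scalaz.

```python
from typing import Any, Callable


class STRef:
    """A mutable cell that is only meant to be touched inside an ST action."""

    def __init__(self, value: Any) -> None:
        self._value = value


class ST:
    """A deferred action that may mutate STRefs when it is finally run."""

    def __init__(self, thunk: Callable[[], Any]) -> None:
        self._thunk = thunk

    def flat_map(self, f: Callable[[Any], "ST"]) -> "ST":
        return ST(lambda: f(self._thunk())._thunk())


def new_var(a: Any) -> ST:
    return ST(lambda: STRef(a))


def read(ref: STRef) -> ST:
    return ST(lambda: ref._value)


def mod(ref: STRef, f: Callable[[Any], Any]) -> ST:
    def go() -> STRef:
        ref._value = f(ref._value)  # in-place mutation, hidden inside the action
        return ref
    return ST(go)


def run_st(st: ST) -> Any:
    result = st._thunk()
    if isinstance(result, STRef):
        # scalaz rejects this at compile time; here we can only check at run time.
        raise TypeError("an STRef must not escape its state thread")
    return result


program = new_var(1).flat_map(lambda r: mod(r, lambda x: x + 1)).flat_map(read)
print(run_st(program))  # 2
```

Running new_var(1) on its own through run_st raises the TypeError, which is the runtime stand-in for the compile-time error described above.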
Rank 2 polymorphism technique is used here.
Refer to [my project](https://github.com/afsalthaj/supaku-sukara/blob/master/src/main/scala/com/thaj/functionalprogramming/exercises/part1/PureStatefulAPIGeneric.scala#L243) for further examples, where we try to run the actions in many ways that result in compile-time errors.
I am writing this in the hope that you will get a wide-angle view of the term “Symbol” in programming. I am deliberately mixing multiple programming languages to aid understanding; at least one of them should make sense to you, if not entirely, then intuitively! As you read on, you will discover better reasons why we use multiple languages to explain this concept.
Before we talk about Symbol….
Before we explain Symbols, let’s get familiarised with the behaviour of strings (instantiation, storage, etc.) in all these languages.
Let x be "afsal", a string (not language specific). x is allocated a memory space M and is identified by an id (sometimes called object_id). Evaluating x fetches the contents stored at the location identified by that id.
Note: Most of you would guess this to be an intro to reference identity, reference equality and related topics. For the time being, though, let us restrict ourselves to the terminology mentioned above.
Some code in Python
Let us use Python here as an example of getting the id of a variable. You may not see this feature in all languages.
But the concept remains the same.
String with same contents share same id?
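A small check you can run yourself. The behaviour shown is that of CPython (the standard interpreter); other Python implementations are free to behave differently, and the actual id values change from run to run.

```python
x = "afsal"
y = "afsal"

print(id(x))             # some integer identifying the object behind x
print(id(y))             # the same integer on CPython
print(id(x) == id(y))    # True on CPython
print(x is y)            # True on CPython: both names refer to one object
```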
As seen above, the strings with the same content share the same id.
It means, when we tried to define y with the same content as that of x, the application looked up the heap/memory and identified that y could reuse the contents of x.
If y re-uses x, they obviously have the same object_id: id(x) == id(y).
No extra copy of y is created, and we saved some space.
In Java/Scala?
In computer science, this behaviour is termed string interning. That is, interning strings ensures that all strings with the same contents share the same memory.
One source of drawbacks is that string interning may be problematic when mixed with multithreading, but this discussion is out of scope.
In the Python examples above, the string x is interned automatically, and the same holds for string literals in Java/Scala. (However, you will soon find differences in behaviour across these languages.)
In Java, String.intern() interns a string explicitly, but you don’t generally need to do this.
The intern() method returns a canonical representation of the string object.
Again, for demonstration I am using the Scala console. I strongly recommend viewing this example as a conceptual explanation of string interning; it does not intend to explain the various functions in Scala/Java and their differences. For the time being, all you need to know
is that the eq method in Scala checks whether two strings point to the same memory location.
Automatic String interning: naive testing in Scala (Skip through if you don’t care)
Let’s do a simple test on the automatic interning of Strings. For simplicity, let’s use Scala console. Let’s create ten strings, each with 10000000 characters of ‘a’.
All ten long strings have same contents, and ideally, they should share the same id.
Is interning a property of Strings?
Yes, it is a property of strings. You won’t find an intern method for an Integer variable.
However, you may note that ids are constant for certain primitives. For example, the id of the integer 1 is always the same.
In Python, it would look like this. You can verify the same in Ruby, where it behaves in the same manner.
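A quick way to see this on CPython. The integers are built with int(...) at run time so that the result does not depend on how the source file happens to be compiled; the caching of small integers is a CPython implementation detail, not a language guarantee.

```python
a = int("1")
b = int("1")
print(id(a) == id(b))    # True on CPython: small integers are cached

big_a = int("1000000")
big_b = int("1000000")
print(big_a == big_b)    # True: same value
print(big_a is big_b)    # False on CPython: two separate objects
```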
Is this a consistent behaviour for Strings?
Surprisingly, the answer is NO.
In the Java world, there is no guarantee that every string will be interned; it depends on the JVM, and probably on the content itself. Python, for example, doesn’t intern strings with special characters.
Let’s analyse the consistency of string interning by going back to Python/Ruby (the programming language does matter here). We will see that string interning, being such a general term, behaves differently in different languages.
In python:
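One case that is easy to reproduce on CPython, offered as a sketch rather than the original example: strings that are assembled at run time are not interned automatically, so equal contents do not imply equal ids. (Typing two literals with special characters on separate lines of the interactive console tends to show the same effect, but that depends on how the console compiles each line.)

```python
a = "".join(["hello", " world!"])
b = "".join(["hello", " world!"])

print(a == b)            # True: same contents
print(a is b)            # False on CPython: two distinct objects
print(id(a) == id(b))    # False on CPython
```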
No default string intern in Ruby?
In Ruby, you will see something known as object_id. It is the equivalent of id in Python. But as per the documentation, the object_ids of two active objects always differ.
Hence, two variables with the same content afsal get different object_ids (or ids). There is no string interning happening here!
How to intern strings in Ruby?
The answer is “Symbol”.
To put it more simply, call the method intern on a string in Ruby, and you get a “Symbol” in return.
Now you must be wondering: can we solve the interning inconsistencies in any language using Symbols? Yes and no. Yes for languages that have a specific Symbol type, although we may not always use it in practice; and languages that don’t have symbols built in offer alternative methods. In other words, even if a language doesn’t have the concept of a Symbol, it has string interning, and there is one way or another of doing it. In Ruby/Java/Scala it is done by making use of symbols. In Python, use the intern function (sys.intern in Python 3), and in Clojure, we could use keywords or symbols. Let us try to understand it better.
Symbol
The concept of a symbol is not language specific, but the details differ from one language to another. Let us explore.
Symbol in Ruby
As mentioned in the examples above, Ruby handles interning by converting strings into symbols, i.e., the Symbol representation of the String "afsal" is :afsal.
If you are not using symbols, every time you define “afsal”, Ruby instantiates a new String object: the interpreter looks at the memory (heap), allocates new memory, and assigns a new object_id to keep track of the object. The interpreter then marks it for destruction if it is not used, and the whole sequence repeats the next time the String “afsal” is defined. This can affect performance, especially when it comes to massive datasets.
Some performance comparison:
Let us define a string 100000000 times and similarly a symbol 100000000 times, and see which one performs better:
Hurray! Symbols are almost 2 times faster than strings.
Usage of symbols:
Symbols are often used as identifiers. For example, every method name in Ruby is saved as a symbol under the hood.
Symbols are widely used as hash keys. With symbols as keys, Ruby needs to compare only the object_ids of the stored keys with the new one, not their contents or a hash computed from each value. Symbols can be used anywhere within your application.
Are symbols always better than strings?
Be aware that excessive use of Symbols results in a lot of memory usage. Frequent conversion of Symbols to Strings can also slow down your application. That said, memory leakage due to Symbols is no longer a concern in Ruby (since version 2.2), as there is now a symbol garbage collector.
Symbol in Python
There is no Python equivalent of Ruby’s symbols. However, as you have seen from the examples above, many strings are interned by default. We have also seen a few examples of strings that were not interned by default in Python. We can force those strings to be interned using the intern function (sys.intern in Python 3).
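A short sketch of forcing interning with sys.intern from the Python 3 standard library. The strings are built at run time so that they are not interned to begin with.

```python
import sys

raw_a = "".join(["hello", " world!"])
raw_b = "".join(["hello", " world!"])
print(raw_a is raw_b)    # False on CPython: not interned automatically

a = sys.intern(raw_a)
b = sys.intern(raw_b)
print(a is b)            # True: both names now refer to the one interned copy
```

After interning, comparing the two strings by identity is enough, which is the same benefit Ruby gets from symbols.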
Symbols in Clojure
Clojurists have Strings, Symbols and Keywords. This trichotomy may confuse many. Part of it may already make sense to you, as we have seen the need for both String and Symbol.
According to the documentation, keywords (e.g., :afsal) are the ones that provide fast equality tests. Note also that these keywords are not Symbols. As far as I understand, a Symbol in Clojure/Lisp is mainly used to refer to function names and variables, and to manipulate program forms (closely related to macros).
We could use Symbols ('afsal) to manipulate objects, but that is less common in practice. Another difference is that while Symbols are namespace qualified, keywords are not.
However, we could explicitly qualify a keyword in Clojure to a particular namespace by adding an extra colon.
Although the difference is not explained in detail here, it is important to know at least this much if you are into Clojure.
Conclusion
In dynamic languages, symbols are often used to identify things that carry a stronger meaning than a string’s content: identifiers that are used more than once. Moreover, in homoiconic languages like Clojure, where code can act as data, the programmer can manipulate functions and variables using Symbols to produce various custom behaviours. In statically typed languages, however, we could argue that the comparison space is already restricted by types, and most of the time the homoiconic nature doesn’t exist. Hence, although symbols have the same meaning in the context of Java/Scala (i.e., guaranteed interning and faster equality operations), they are probably used less in practice than in Ruby or Clojure.