Tuesday, November 23, 2010

Devoxx 2010 Wrap Up (Epilog)

This post wraps up my series of posts on Devoxx 2010, which ended last Friday. Overall, it was an excellent conference, very informative and refreshing too.

If there are two things to retain from the conference -- at least for me -- they would be Scala and NoSQL. I believe these two subjects attracted a great deal of interest from the people who came to the conference.

The first sign was the Scala lab session on the 2nd day, which was packed before 9.30. A couple of fellow programmers simply could not join the session because of that. The trend was confirmed in the evening at the BOF session, where again many attendees had to stay outside the room. Two Scala 2.8 presentations the next day confirmed the popularity of Scala at the conference even more. The Akka presentation by Viktor Klang might not have had the same success as the others, but it was still a massive achievement for a specialized library outside the language itself. Not to mention that Scala was also brought up by Brian Goetz during his session, and by Josh Bloch and Bill Pugh in Java Puzzlers too.
As for content, Odersky's presentation on the Scala 2.8 collections was my favorite on the subject. It showed the power of higher-kinded types in Scala -- some of it is scary stuff -- but it also showed how productive a Scala programmer can be.


Unfortunately, not everything about Scala was good. Stephen Colebourne, for example, said that Scala would not be the next JVM language. I'm not sure about his reasons yet, but it might be because Scala does much more than what Java programmers are waiting for. Another problem was that Scala seemed to become a victim of its own popularity when tweets about Scail -- a fictitious new Scala web framework -- flooded the Internet. The joke came from the Scala BOF, which made the association Ruby => Rail, Groovy => Grail, Scala => Scail. I was also a victim of the joke when I retweeted a tweet about it. Damn!

The second big subject of the conference was NoSQL, and it was fantastically well represented. We had Hadoop, HBase, MongoDB, Cassandra, and Voldemort. We even had Mahout, one of the Hadoop subprojects. I would also add Elastic Search to the same category. I attended almost all the NoSQL sessions, except Cassandra and Voldemort, which unfortunately were at the same time as Odersky's Scala Collection and Goetz's Project Lambda. On the other hand, I attended the two sessions by Tom White from Cloudera on Hadoop and the two sessions by Jonathan Gray on HBase (and Facebook). I would even dare to say that Tom White was one of my favorite speakers at the conference. I'm particularly impressed by the NoSQL movement, although I still have some reservations on the subject.

NoSQL is popular because of its ability to scale out in a world of huge amounts of data, at petabyte scale. Twitter, Facebook, YouTube, ... are the most visible users of such applications, but there are more than those guys. Twitter and Facebook had interesting presentations where Dimitri Ryaboy and Jonathan Gray showed an impressive list of ongoing projects on the subject (yes, Facebook's announcement of its HBase use just before the conference played in favor of NoSQL again). No doubt the movement will play an important role in the near future. I myself am very interested in a conference dedicated to the subject, called Buzzwords, that will take place in Berlin in 2011.



Of course, the conference was not only about Scala and NoSQL.
There was the future of Java, where Mark Reinhold, Brian Goetz, and Dalibor Topic played important roles with their sessions on Java Modularity, Project Lambda, and OpenJDK. There were a couple of interesting things in their presentations (note that the cinema room was full for Brian Goetz's).

And of course there was the Java Puzzlers session by Josh Bloch and Bill Pugh. Who could forget such an inspiring session? Thanks for entertaining and enlightening us.


I would also like to mention an interesting presentation on Apache Camel by Claus Ibsen. Camel is awesome. Finally, a pretty cool presentation was given by Talip Ozturk on Hazelcast.

While the content was excellent, there were still many things that could be improved. First, the transport. Since there were not enough hotels around the venue, a lot of people came from hotels in the center of Antwerp. The trams from the city center to the conference venue were simply too small and overloaded in the morning. Dedicated buses for the conference would have been better.


Finally, I would like to thank SII, my employer, who financed my trip to the conference, and of course I would love to come back next year. Thank you too to my colleagues at Amadeus who showed interest in the conference. Needless to say, however, all opinions in this blog during the conference are entirely mine; they are not those of SII or of my client, Amadeus.

Notes on Devoxx 5th day

The fifth day notes are here, finally. I've been too lazy to write them, you know, a kind of post-conference syndrome.

There were only 4 parallel sessions, compared to 6 on the other days, and the conference ended at 1 pm, compared to 7 pm on the other days.


Keynote Panel on the Future of Java by Joshua Bloch, Mark Reinhold, Stephen Colebourne, Antonio Goncalves, Juergen Hoeller and Bill Venners


  • Josh Bloch used to work for Sun, where he was the architect of the Java Collections; he now works for Google. He is the author of Effective Java.
  • Mark Reinhold, from Oracle, has been in Java longer than Josh.
  • Stephen Colebourne is a Java Champion and active in the Apache Software Foundation; he sort of sits between the two worlds, Oracle and the ASF. During the conference he created quite a buzz by saying that "Scala will not be the future Java".
  • Antonio Goncalves represented the Java community. He's the Paris JUG leader.
  • Juergen Hoeller is from Spring Framework.
  • Bill Venners represented the new languages on the JVM, e.g. Scala.
The panelists were asked a couple of questions coming from the Devoxx whiteboards or from a site built specifically for the panel.

There were a couple of interesting questions: how Oracle is perceived by the Java community, Android, OSGi vs Jigsaw, the JCP and of course Doug Lea's departure, the impact of functional programming on Java, a modification to Java reflection, what Java lacks compared to .NET, whether Java should keep its backward compatibility, and even what a typical Java programmer is.

The panel was really interesting. Stephen talked a lot, Bill talked a little, and the others were somewhere in between. I admired Mark Reinhold, who was quite willing to answer the questions even though he could have dodged some of them.

If there are two things that I should retain from the panel, they would be the TCK (no restriction on the TCK will be lifted, that's final, so bye ASF) and the wait-and-see attitude of the Java User Groups regarding the possibility of being included in the Oracle User Group. I should also mention that Josh said "but Java already has functional programming capabilities".

In the end, the panel was quite informative, and it was nice to see Josh Bloch and Mark Reinhold sitting side by side. That image should be good for the Java community.

Apache Camel by Claus Ibsen

As a JMS user, I find JMS damn complicated. Messaging should not be that complicated. That's exactly what Apache Camel proposes: simplifying the integration of n-tier systems. It does so through Enterprise Integration Patterns, which is a catalog of patterns. Messaging, Remote Invocation, and Message Filter are some random picks from the catalog.

Oh, yes. Apache Camel has a Scala DSL and, according to the Akka presentation the day before, Akka supports Apache Camel too.

Quite a cool presentation, although I don't know Enterprise Integration Patterns that well. They are worth a look, though.

Elastic Search by Shay Banon

Elastic Search is a free-text search engine specially designed for cloud environments. It is a Lucene-powered system, and in this regard it's just like Solr or Hibernate Search.
Unlike Hibernate Search, however, Elastic Search works on a document-oriented database.
The data model is represented in JSON, and queries are written in a JSON DSL too.
Finally, Elastic Search is distributed, so Shay explained the index replication algorithms (sharding, etc.).

--
Devoxx ended for me with the Elastic Search and Apache Camel sessions, quite good presentations to finish the conference. I took some time to visit the Diamant Museum in downtown Antwerp and then flew back to Nice from Brussels in the evening.

Friday, November 19, 2010

Notes on Devoxx 4th day

This post might be my last daily note on the Devoxx university and conference days. Everything has an end, and Devoxx ends tomorrow (Friday, November 19). I'm not sure I will write notes on tomorrow's sessions, and if I do, they will be up on Saturday at the earliest.


After the difficulties with public transport yesterday, this morning I left a little later, at the risk of missing the first couple of minutes of the keynote.


So, here is my summary of the day. 


Future Roadmap of JEE (Keynote), Jerome Dochez, Linda de Michiel, Paul Sandoz


JEE on Cloud (Jerome Dochez, JD)
When I arrived at the conference, Jerome Dochez was presenting JEE in a cloud environment. He said that cloud support should not be a revolution but an evolution. Programmers should not be asked to change much of what they have known so far.

He mentioned at least two things in particular: state management and better packaging.

He finished the JEE on Cloud part with a small and successful demo on GlassFish.

Modularity (JD)
Significant efforts are ongoing to make JEE more modular, especially by leveraging the work on Java modularity in general (Jigsaw). Unfortunately, the dependency on Jigsaw means that modularity in JEE will also be late.

Some points on modularity that I noted:
  • Applications are made of modules (modules in the Jigsaw sense).
  • Dependencies are made explicit instead of relying on convention or configuration.
  • Versioning is built in.
JSF (JD)
He mentioned two kinds of modifications: short term and long term. Some of the short-term ones:
  • Transient state saving
  • XML view cleanup
  • Facelets cache API
  • XML free (oops, his slide said "fee", but yes, the point is to remove the XML tax)
He also mentioned HTML 5 support. I'm not sure whether it is short or long term.

JMS  (JD)
There have been almost no important modifications to JMS so far, but this time it will change. The modifications include resolving ambiguities, standardizing a couple of vendor extensions, and integration with other specs and also with non-Java languages (which ones?).

Web Tier (JD)
WebSocket support, a standard JSON API, and an NIO2-based web container (I'm not sure I understand the relation between NIO2 and the web container, but anyway ...). He mentioned the Grizzly library in his presentation.

JPA (Linda de Michiel)
There were a couple of interesting things in her talk. I only noted some of them here:

Mapping:
  • Support for custom mappings.
  • Dynamic fetch plans. This is in contrast to JPA today, which requires the fetch strategy to be defined upfront using annotations (EAGER, LAZY).
  • Better support for immutable attributes (read-only entities).
  • More flexible XML descriptors. Yeah, XML, why not make it Java based??
API:
  • Additional event listeners and callbacks. (Really?? Are they really used in production?)
  • Support for dynamic persistence units. This one is cool, no XML, right?
  • Inspection of persistence units. Cool as well.
Query:
  • Stored procedure support.
  • Interoperability between JPQL and criteria queries. For example, creating a criteria query from JPQL.
For me, the programmatic features like dynamic fetching and dynamic persistence units are the most interesting improvements.


JAX ??? (Paul Sandoz)
I did not really follow the last part of the presentation, so I have no notes to share here.


The Essence of Caching by Greg Luck


A side story -- this is the only presentation so far where the company I'm working for, Amadeus, is mentioned. Yes !! :-)

Why caching?
Because of performance problems.




Amdahl's Law
What is important to keep in mind when optimizing performance is Amdahl's Law:
Speedup = 1 / ( ( 1 - f) + ( f / s) ), where f is the proportion of the program being sped up and s is the speedup factor of that portion. Illustration: making 10% of the system 20 times faster yields an overall speedup of only 1.105.

The law is important when deciding which part of the system to speed up. For example, if the time to load a page is dominated by downloading its content, there is no point in improving the server-side code. Maybe a CDN is needed in this case.
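
To check the arithmetic for myself, here is a minimal Python sketch of the formula above. The function name amdahl_speedup is mine, not from Greg's slides, and I read f as the fraction of the run time being sped up and s as how many times faster that fraction becomes.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the run time is made s times faster."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# The illustration from the talk: 10% of the system made 20 times faster.
print(round(amdahl_speedup(0.10, 20), 3))    # 1.105
# Even a practically infinite speedup of that 10% cannot beat 1 / 0.9.
print(round(amdahl_speedup(0.10, 1e12), 3))  # 1.111
```

The second call is the real lesson: the untouched 90% caps the gain, so it pays to measure where the time actually goes before optimizing.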

Performance Problem Sources  

  • Rendering
  • Program
  • Marshalling/unmarshalling data.
  • Database

Caching solves the problem mainly by offloading some data to a cache, e.g. a memory-based cache.

Cache Efficiency 
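Cache efficiency = Cache Hits / Total Hits, i.e. the fraction of all requests that are served from the cache.
The Pareto principle needs to be taken into account: a small share of the data typically serves most of the requests.

Cache Coherency
To handle cache coherency, one of the simplest solutions is to apply TTL + LRU. But there are other strategies: eternal items + an invalidation strategy, or the write-through pattern.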
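
To make TTL + LRU and the efficiency ratio a bit more tangible, here is a small Python sketch. It is my own toy, not Ehcache and not anything shown in the session: the class name TtlLruCache, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """Tiny in-memory cache: entries expire after ttl seconds, and the
    least recently used entry is evicted when capacity is exceeded."""

    def __init__(self, capacity, ttl, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock            # injectable so expiry can be tested
        self.entries = OrderedDict()  # key -> (value, expiry time)
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.entries[key]     # stale entry: count it as a miss
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return value

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def efficiency(self):
        """Cache hits divided by total requests."""
        return self.hits / self.requests if self.requests else 0.0


cache = TtlLruCache(capacity=2, ttl=60)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # hit, "a" becomes most recently used
cache.put("c", 3)          # over capacity: evicts "b"
cache.get("b")             # miss
print(cache.efficiency())  # 0.5
```

The TTL bounds how stale an entry can get, which is the cheap answer to coherency, while LRU only bounds memory; the two solve different problems and happen to combine well.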

Cache in Clustered Environment
A problem called the N* problem is inherent to caching in a clustered environment. It is solved by clustering the cache as well. This introduces other problems: the bootstrap problem and cache coherency again.
CAP Theorem
CAP = Consistency, Availability, Partition Tolerance => there must be a trade-off among the three in a clustered environment.

The session was very interesting and informative, all that in only one hour. It was also quite well attended.


Akka by Viktor Klang

I had seen a couple of Akka presentations online, and this morning I saw one live. Even with the pretty distracting Devoxx template, the Akka presentation was still excellent. Great job by Viktor and the Akka team.

Akka is designed with the premise that it is hard to get concurrent programs right. Akka comes with two solutions: actors and STM. Akka is available for both Scala and Java. In general the Scala version is better, but Akka has succeeded in removing a lot of the boilerplate code that is almost inherent in Java.

Actor
An actor is a higher-level abstraction over threads. Its important property is that it shares nothing: one actor does not share anything with any other actor, so actors work in isolation (unlike Clint Eastwood, George Clooney, ..., who cannot work in isolation, although they might share nothing too). Actors communicate through message passing. Each actor has a mailbox where messages are queued.

There are three different types of message sending: fully one way, one way with an implicit future, and one way with an explicit future. In Scala, they are represented by the !, !!, and !!! methods.
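
To fix the idea of a mailbox in my head, here is a toy actor in Python. To be clear, this is not Akka's API: the class Actor and the methods tell and ask are names I made up, and the sketch only covers the fire-and-forget and explicit-future styles.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Minimal actor: private state, one mailbox, one thread draining it."""

    _STOP = object()

    def __init__(self, handler):
        self._handler = handler        # called with one message at a time
        self._mailbox = queue.Queue()  # messages wait here until processed
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            message, reply = self._mailbox.get()
            if message is Actor._STOP:
                break
            try:
                result = self._handler(message)
                if reply is not None:
                    reply.set_result(result)
            except Exception as error:
                # One-way messages have nobody to report the failure to.
                if reply is not None:
                    reply.set_exception(error)

    def tell(self, message):
        """Fire and forget: enqueue the message and return immediately."""
        self._mailbox.put((message, None))

    def ask(self, message):
        """Enqueue the message and return a future for the reply."""
        reply = Future()
        self._mailbox.put((message, reply))
        return reply

    def stop(self):
        self._mailbox.put((Actor._STOP, None))
        self._thread.join()


def make_counter():
    count = 0  # state owned by the actor, never shared with callers

    def handle(message):
        nonlocal count
        count += message
        return count

    return handle


counter = Actor(make_counter())
counter.tell(1)
counter.tell(2)
print(counter.ask(3).result(timeout=1))  # 6
counter.stop()
```

Because only the actor's own thread ever touches count, no lock is needed; the mailbox is the single synchronization point, which is the whole appeal of the model.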

For fault tolerance, Akka uses the "let it crash" semantics stolen from Erlang. It also has a notion of supervisor hierarchy.

Actors can be remote, and there are two types of remote actors: client-managed and server-managed. Client-managed remote actors are handy, but of course they cannot be deployed in an untrusted environment.

The remote actor implementation is based on Netty and uses ProtoBuf.

Software Transactional Memory (STM)
Akka supports STM. It provides a couple of transactional data structures like transactional maps, transactional lists, and so on.

The Java implementation uses a library called Multiverse.

The combination of Actor and STM is called a Transactor.

Miscellaneous
Akka has a couple of interesting add-ons like Spring, Camel, MongoDB, CouchDB, and a couple of other things.

Viktor's presentation was awesome. Akka is awesome.


Data Management at Twitter Scale System by Dimitri Ryaboy

Dimitri's presentation was not only about how Hadoop and its whole ecosystem are used at Twitter. Hadoop is appropriate for offline processing, but the presentation also covered online processing. It was dense and delivered so quickly that I sometimes had difficulty following. But here are some notes:
Twitter is 95 million tweets, with 3000 TPS (tweets per second).

First problem: the UUID generator. Twitter uses Snowflake: https://github.com/twitter/snowflake. The issue is how to make the generator scalable. The IDs are not necessarily strictly sorted; they only need to be roughly sorted (k-sorted).
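
Here is how I picture a roughly sorted ID generator, as a Python sketch. This is not Snowflake's code: the class name, the bit widths (10 bits of worker id, 12 bits of sequence), and the epoch (1 January 2010 UTC, picked arbitrarily) are all assumptions of mine.

```python
import threading
import time


class RoughlySortedIdGenerator:
    """IDs made of: milliseconds since an epoch, a worker id, and a
    per-millisecond sequence. Bit widths and epoch are assumptions."""

    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id, epoch_ms=1262304000000, clock_ms=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) % (1 << self.SEQUENCE_BITS)
                if self.sequence == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            shift = self.WORKER_BITS + self.SEQUENCE_BITS
            return (((now - self.epoch_ms) << shift)
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self.sequence)


generator = RoughlySortedIdGenerator(worker_id=7)
first, second = generator.next_id(), generator.next_id()
print(first < second)  # True
```

Each worker generates IDs without talking to anyone else, which is what makes this scale. The price is that IDs from different workers within the same millisecond are ordered by worker id rather than by real time, hence "roughly" sorted.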

Second problem: sharding. Twitter uses Gizzard, https://github.com/twitter/gizzard , a Scala-based framework for sharding. Sharding, by the way, means storing data across multiple nodes.
Gizzard maps a range of tweets to a particular shard. A shard is mapped to a replication tree (hmm..., not really sure I understand this, but I write it here anyway). A shard can be physical, when it refers to a particular backend. It can also be logical, when it refers to other shards.
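
To work through the range-to-shard idea, I wrote the following Python sketch. It reflects my own understanding rather than Gizzard's actual API: the names PhysicalShard, ReplicatingShard, and ForwardingTable are made up for the example, and a real replication tree can be deeper than the single level shown here.

```python
import bisect


class PhysicalShard:
    """Stands in for one concrete backend (for example one database)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class ReplicatingShard:
    """Logical shard: stores nothing itself, only refers to other shards."""

    def __init__(self, children):
        self.children = children

    def write(self, key, value):
        for child in self.children:  # copy every write to all replicas
            child.write(key, value)

    def read(self, key):
        for child in self.children:  # first replica with the value wins
            value = child.read(key)
            if value is not None:
                return value
        return None


class ForwardingTable:
    """Maps id ranges to shards; each entry is (lowest id of range, shard)."""

    def __init__(self, entries):
        entries = sorted(entries, key=lambda entry: entry[0])
        self.lower_bounds = [low for low, _ in entries]
        self.shards = [shard for _, shard in entries]

    def shard_for(self, tweet_id):
        index = bisect.bisect_right(self.lower_bounds, tweet_id) - 1
        if index < 0:
            raise KeyError(tweet_id)
        return self.shards[index]


a1, a2, b1 = PhysicalShard("a1"), PhysicalShard("a2"), PhysicalShard("b1")
table = ForwardingTable([
    (0, ReplicatingShard([a1, a2])),   # ids 0..999, two replicas
    (1000, ReplicatingShard([b1])),    # ids 1000 and above
])
table.shard_for(42).write(42, "hello")
table.shard_for(1500).write(1500, "world")
print(a1.read(42), a2.read(42), b1.read(1500))  # hello hello world
```

Since logical and physical shards expose the same read and write operations, a logical shard can point at other logical shards, which is presumably how the tree in "replication tree" comes about.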

Third problem: fault tolerance. I got a little lost in this section, and I don't have much in my notes. The only thing I have is that the system must tolerate eventual consistency and stay CALM (Consistency as Logical Monotonicity). Hmm... this is pretty puzzling for now. But anyway, that was about fault tolerance.

Fourth problem: timeline handling (message vector cache). Displaying timelines means that billions of tweets must be filtered to show only messages from the people one follows. The solution: Haplocheirus.
Because it is a cache, cache efficiency matters.

FlockDB is the solution used for social graph store. It is basically a customized distributed index database. It is used to handle, e.g. intersection operations on @ . https://github.com/twitter/flockdb .

Cassandra is used for the geo database (e.g. nearby search) and for realtime data analysis.
Ganglia is used for monitoring.

For offline processing, Hadoop is used. Hadoop is appropriate for some analyses that cannot be achieved using SQL.
Elephant-Bird is used to work with data in Hadoop, and finally HBase is used to address mutability and random access in Hadoop.

Excellent presentation from a Twitter engineer. Loved it very much.

Hadoop, HBase, and Hive in Production at Facebook by Jonathan Gray

The previous presentation came from Twitter and this one is from Facebook. First, Jonathan explained why HDFS / Hadoop. Basically, the choice came from the fact that traditional database processing was slower than the rate of incoming data: it needed 24 hours to process one day of data.

The use of Hadoop introduced some other problems: MapReduce jobs are difficult to write. Solution: Hive. So Hive is the data warehouse solution at Facebook. But Hive itself is still not user friendly: fear of the command line. HiPAL was introduced for querying through a web UI.

Current limitation: the name node is still a single point of failure. Today's high availability solution is not enough because it takes hours for the backup name node to take over. Facebook is now working on something called AvatarNode. Jonathan claimed a failover time of 10s with AvatarNode.

Another limitation is non-optimized MapReduce. Facebook is working on Hive optimization. Another issue is better scheduling, with the so-called fair share scheduler, which controls tasks by priority/nature. Jonathan claimed that queries at Facebook now take less than 10 minutes.

HBase is used because of its linear scalability, fast indexed units, and integration with Hadoop. It is also suitable for realtime analysis because it has optimized increments.

Why was Cassandra not selected for Facebook Messaging? Cassandra has a consistency problem that does not suit the messaging requirements. HBase is good.

Modularity in Java by Mark Reinhold

Mark's presentation this afternoon was really interesting. Unfortunately, I was too tired to take notes and missed many points. My notes usually contain errors given how quickly I write them, which should be OK for a blog, but I'm afraid that anything I write here would be completely wrong, even for a blog.

Towards the end of his presentation, Mark took some time to explain "Why not OSGi", which should answer many questions on the subject.

He also gave us one URL to follow: http://openjdk.java.net/projects/jigsaw/ . He said that the project had not been that active because of the JDK 7 delivery deadline, but that it will come back soon.

Java Puzzler by William Pugh and Josh Bloch
Before starting his presentation, Mark opened with a joke: "The reason I'm here is to make sure I have the best seat for the Java Puzzlers session". That says a lot about how popular this session is. Indeed, it was the most popular session so far at Devoxx.

The two speakers came with 6 brand new puzzles on a couple of subjects. Well, I will not write them down here because it would spoil the fun, and even if it didn't, it would take some time. The subjects were generics, collections, raw types, BigDecimal, and a couple of other things.
One thing that I keep in mind is: do not ignore the type warnings issued by the compiler.

I answered 3 of 6 puzzles correctly, and considered one of them a cheating puzzle :-) , so 3 of 5, not bad, heh? Yeah, not bad. But, to be honest, there was only one that I answered correctly with the correct explanation too.

--

OK. Java Puzzlers completed my day. It was just a terrific day, the best day at Devoxx. Quite sad that it will end tomorrow... Back to Nice in the afternoon.

    Thursday, November 18, 2010

    Notes on Devoxx 3rd day

    We entered the conference proper today, and yes, there were long queues everywhere: transportation, breakfast, cloakroom, sessions, everywhere.
    But the sessions were very interesting, at least the majority of them.


    Here is my summary.

    Introduction by Stephan Janssen
    Cool introduction by Stephan Janssen, including a word on Parleys. We can get the videos of all presentations through Parleys for only 79 Euros. With that, you can watch Parleys presentations in the toilet, if you have an iPad of course.


    Keynotes by Mark Reinhold
    Mark Reinhold presented a lot of cool things on Java. I tweeted something that I regretted afterward: I tagged his presentation as disappointing. It was indeed disappointing for me, because I expected some new information on Java's evolution. He didn't present anything new, but it was an excellent presentation on Java's evolution and roadmap. You cannot get a better presentation on the roadmap than his.


    He started with a statement that Oracle wants Java to still be around in 2030. He then explained some axes of improvement: productivity, performance, universality, modularity, integration, serviceability.


    For productivity, he cited some of the work in Project Coin, like the infamous <> (diamond) operator and Automatic Resource Management (ARM).


    For performance, he mentioned some changes to the language coming out of Project Lambda, like lambda expressions, extension methods, and a few other things on the subject. But, as Brian Goetz confirmed later on, this must be seen in the context of performance improvements on multicore.


    For universality, he mentioned JVM support for other languages. I had seen more or less the same cloud of languages before at SophiaConf, with Scala and Groovy in a relatively small font. I found this part of the presentation very short. There will be a lot of sessions on new JVM languages at Devoxx, but no presentation on the corresponding JSR.


    For modularity, he quickly presented the Jigsaw project, with the concept of simplifying classpaths with jmod. Commands like jmod install and jmod add-repo can be used. A little bit inspired by Ruby gems, I suppose.


    For serviceability, he talked a little about a quite sensitive subject: JVM convergence.


    He mentioned two interesting things: the possibility of having reification in Java, and value classes (= Scala case classes?).


    Overall, there was nothing really new in the presentation, but I think it was the comprehensive presentation that Java developers expect to get directly from the source.


    The State of the Web by Dion Almaer and Ben Galbraith
    You know what? This was the most entertaining presentation so far in the conference. Small problem: it was so entertaining, so well choreographed, that I didn't really get the content.


    I didn't even take the time to write notes; the presentation was just too entertaining, with graphics and stuff, really good for your eyes (but maybe not for your brain). So, here are just a couple of Math.random points that I vaguely retained from the presentation.

    • Application is content.
    • Mobile applications are becoming very important.
    • HTML 5 is great.
    • The web needs an app store model.
    Not much, really, so I should stop here.

    JPA by Linda de Michiel

    I think I made a mistake by coming to this session after reading the JPA book by Schincariol and Keith back to back at least 2 times. Everything in Linda's presentation is in the book. Sorry, but I cannot really say much about the presentation; just read the book. It was an extremely dense presentation though.

    The State of Hadoop by Tom White
    In terms of style, Tom White's presentation on Hadoop was the complete opposite of Almaer & Galbraith's: Tom's style is very monotone and without rhythm. But guess what? I got his content better.

    Tom started by explaining the replication mechanisms in HDFS, followed by how reads and writes are done. He gave a quick overview of the read and write algorithms, which optimize bandwidth usage.

    Then the explanation moved to failure modes, especially the difference between a data node crash and a name node crash. When a data node crashes, clients read from other replicas and the name node instructs data nodes to re-replicate. A name node crash is a much more important issue because it means downtime. There is an ongoing effort on a highly available name node: https://issues.apache.org/jira/browse/HDFS-1064 .

    The presentation continued with a MapReduce overview: Input => Map => Shuffle => Reduce => Output. Tom used a Unix pipeline to illustrate the concept. Pretty cool.
    Task tracker and job tracker failure modes were then explained.
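
    The Input => Map => Shuffle => Reduce => Output flow is easy to mimic in a few lines of plain Python. This is the classic word count, run in a single process; it has nothing to do with the actual Hadoop API, and the function names are mine.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into one result per key."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"], counts["fox"])  # 3 1
```

    In the real thing, the map and reduce calls run on many machines and the shuffle moves data over the network, but the programmer still only writes the two functions.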

    He also showed some examples of Hadoop use.

    The presentation moved on to the ecosystem. He showed an overly complicated graph of the relations among the Hadoop projects. Very complex, but then he simplified it (he mapped and reduced, right?).

    Fundamental Projects

    Some projects are fundamental. They are HDFS (the file system), the MapReduce framework, ZooKeeper, and Avro. ZooKeeper is a coordination service for distributed applications, including leader election algorithms and distributed locking. Avro is a data serialization library.
    The main challenges for the fundamental projects are the impact of API updates and support for multiple programming languages, typically Python.

    Component for Analysis

    Some Hadoop projects are intended for analysis. We have Pig, Hive, Cascading, and Mahout. Pig and Hive are data retrieval components that use either a SQL-like language (Hive) or a pretty independent query language (Pig). Mahout is a machine learning library that implements some of its algorithms on top of map/reduce.

    The Howl project is intended to share tables among services.

    Components for Data Loading

    Some projects, like Sqoop and Flume, are intended to facilitate data loading.

    Components for Coordination

    Oozie and Whirr are examples of components that handle coordination.

    I found the presentation extremely fluid and informative. Unfortunately, as in Tom White's other session, the audience was not that responsive. I wonder why.

    Lambda Project by Brian Goetz


    One thing that one must always keep in mind: Project Lambda is not intended to make Java more concise, but to make the changes needed for parallelization. With this in mind, not every cool thing that a language like Scala offers will be available in Java.

    Nothing really new in Brian's presentation:
    • Use of SAM (single abstract method) types instead of function types.
    • Some starter SAMs are to be included in the JDK.
    • Method references, which allow things like Collections.sort(persons, #Person.getLastName).
    • I did not hear the words "defender methods" anymore; I heard "extension methods" a lot instead.
    • Exception transparency.
    • Handling of interface conflicts.
    What's new in Scala 2.8 by Dick Wall and Bill Venners
    Dick Wall and Bill Venners presented the new things in 2.8 through live coding. A very interesting way of presenting things, although some errors made the presentation long. But this duo is simply my favorite in the conference so far. Dick Wall and Bill Venners for President!
    • Tooling. Dick showed IntelliJ's behavior on implicit methods.
    • The presentation started with the REPL. One new thing I learnt was :sh to invoke a shell command. Other cool REPL features were also presented.
    • Default values for case classes. Interesting example of the copy method that comes with case classes. I should give this a try.
    • The tailrec annotation, to make sure that a recursion we think is tail recursive is indeed tail recursive. I have used the annotation in the code on this blog, though. Fibonacci and factorial were used as examples. Classic ones.
    • Nested packages.
    • @specialized
    • Continuations: scary things, to stay away from unless you know what you're doing. Too advanced.
    Nothing much more to say, the demo was just great.

    Scala Collection and Parallelization by Martin Odersky
    That was not the title. The real title was more enigmatic: Future Proofing Collections from Mutable to Persistent to Parallel. But it was actually about this: Scala collections and parallelization.

    Martin started with the slide that he called "If I have to keep one slide, this is the one". Basically, he explained his concept of a scalable language that can be agile, yet type safe and performant. This looks contradictory, but it can be reached by combining object orientation and functional programming.

    The focus of the presentation was on collections, because collections are heavily used in code, and a lot of problems arise around them. Collections in Scala, especially in 2.8, are:
    • Object oriented
    • Generic
    • Persistent
    • Higher Order 
    • Uniform Return Type principle: a function should return a collection of the same type as its (left hand side) operand. That is, map on a List should return a List; on a Set, it should return a Set, and so on.

    Concurrency is hard. Actors and STM are two good tools to address concurrency, but they are not enough. In a concurrent world, immutable collections are needed for safety and performance. That's exactly what Scala offers: immutable collections, and also parallelization (well, not in 2.8 though).

    He took an example from a paper by Lutz Prechelt that compared several programming languages using a phone code case study. Martin showed how the phone code solution is implemented in Scala. It took more or less 30 lines of code, excellent. To take advantage of concurrency, the collection used in the example can be switched to a parallel one using the par method.

    Bit Rot is Dark Side

    So, Scala was, at least around 18.30 Antwerp time, was good. But, there was a dark side: the implementation of Uniform Return Type principle. It turned out that implementing the principle represented an important challenge. The duplications are needed. The function filter, for example, needs to be reimplemented in every class. What a mess !

    We then reached something Martin called Bit Rot: lots of duplicated methods, inconsistencies, and the broken window effect.

    Scala does not let you down on this. Martin Odersky presented a solution based on higher kinded types, which one can encode in Scala thanks to its implicit construct. He showed how this is encoded, especially in the Scala collection code, to implement the Uniform Return Type principle.
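
    As a first step of my homework, here is a heavily simplified sketch of the idea. It is NOT the encoding used in the real 2.8 collection library; Buildable, GenericOps and filterGeneric are names I made up. It only shows how a higher kinded type parameter plus an implicit lets filter be written once and still return the same collection type it received.

```scala
import scala.language.higherKinds

// Hypothetical type class: knows how to convert a collection C to and from a List.
// This is an illustration only, not the real Scala library API.
trait Buildable[C[_]] {
  def fromList[A](xs: List[A]): C[A]
  def toList[A](c: C[A]): List[A]
}

object Buildable {
  implicit val listBuildable: Buildable[List] = new Buildable[List] {
    def fromList[A](xs: List[A]): List[A] = xs
    def toList[A](c: List[A]): List[A] = c
  }
  implicit val vectorBuildable: Buildable[Vector] = new Buildable[Vector] {
    def fromList[A](xs: List[A]): Vector[A] = xs.toVector
    def toList[A](c: Vector[A]): List[A] = c.toList
  }
}

object GenericOps {
  // filter is written once; the implicit Buildable rebuilds the right collection type.
  def filterGeneric[C[_], A](c: C[A])(p: A => Boolean)(implicit b: Buildable[C]): C[A] =
    b.fromList(b.toList(c).filter(p))

  // The annotations show that the static result type follows the input type.
  val evens: List[Int] = filterGeneric(List(1, 2, 3, 4))(_ % 2 == 0)
  val longWords: Vector[String] = filterGeneric(Vector("a", "bb"))(_.length > 1)
}
```

    Going through a List every time is wasteful, of course; the point of the sketch is only that the duplication of filter disappears.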

    The complete solution included in the presentation was very hard to grasp (I might want to blog my understanding of the solution some day). I was just wondering why Martin Odersky insisted on presenting this at the Devoxx conference. Wouldn't it have been better for him to go into detail on par and parallel collections instead ?

    But anyway, the dark side of his presentation gave me some homework to do. Hopefully I will be able to finish it on time, right Professor Odersky ?

    Frites et Mayo
    The day ended (for me, that is; there were still a couple of BOFs running in the evening) with Frites et Mayonnaise to consume. Great ones, with very long queues for everybody. Before the conference, I could not have imagined being in the same queue as Mark Reinhold (and yes, he was 4 people behind me, not sure he got the frites though).

    Play Framework Meet-up at Axxess
    Before going back to the hotel to webcam with my children, I took some time to attend the Play Framework Meet-up at Axxess. I met a couple of interesting people there: of course Peter Hilton and Nicolas Leroux, and somebody from Alfresco who will present Activiti tomorrow (18 November). They are nice people.
    I played with Play some time ago, and I am thinking of playing with it again after the conference.





    Wednesday, November 17, 2010

    Notes on Devoxx 2nd day (2 - Evening)

    The reason I cut the notes into two is that there were a lot of things to say about the evening sessions. So, here they are.

    Scala BOF
    In my previous post, I said that I did not like to bash other programming languages -- after all, as a programming language lover, you can't do that. But... I could not avoid comparing the Groovy and Scala BOFs. The latter simply had double the audience... It could have been even better if the room had been bigger; some people simply could not get into the room anymore.

    All right. Dick Wall and Bill Venners were there, and in addition, Martin Odersky was there too.

    • Help with implicits. Martin said that IntelliJ shows them and helps with code completion. Great, maybe I will give it a try then.
    • Code completion for implicits in the REPL. Why not ? Technically feasible, just a question of resources.
    • Code completion in the REPL again: why doesn't it work for jars loaded by :cp ? No answer.
    • Compilation time. Martin said we should not expect compilation time to improve drastically, because the Scala compiler does a lot of things, especially type checking, type erasure, translation to Java bytecode, and so on. The best we can expect is a reduction by a factor of two. Otherwise, one may think of using a better build tool, why not sbt. Otherwise, the classical answer: Scala will pay off with less code (well...).
    • SBT news: well, it's fine, ongoing.
    • Debugger: could anybody share experiences with the debugger ? No answer (I myself did not use a debugger for my project, I used plain old println).
    • Scala ecosystem, industry adoption: Scala works very well with Java. Industry has started to adopt Scala. Play! was mentioned a couple of times. Dick Wall mentioned an interesting project called Squeryl, an ORM in Scala. Bill mentioned ScalaTest. The biggest missing pieces are tools like Checkstyle, PMD, and code coverage.
    • Binary compatibility issues when migrating from 2.7 to 2.8: yes, this will be taken into account so that it never happens again. But, unfortunately, it is likely to happen in 2.9 with the introduction of parallel collections. Martin considered that acceptable for a language like Scala that is still relatively young; he compared it with the Java collections, which are now very hard to change.
    • (A small incident happened while we discussed this: somebody somewhere outside the room yelled noisily ... we didn't know at whom .... We took it as a joke: that must be a Java collections developer)
    • Tooling again: Checkstyle. Dick Wall said that FindBugs might work, although it complains too much by default at the moment.
    • Best practices: they are important to control the variation of code, especially in big companies where code may move from one team to another. However, this might be challenging, given the two trends so far: the Scalaz trend and the Java-ish trend. But the three agreed to consider this issue as important.
    • Training. Scala does have some good training: Martin, Bill, and Dick are Scala trainers. One of the trainees was there too.
    • Big company behind Scala. Martin said that there might be advantages in having that kind of company, but Ruby and Python are two cases that did not have a big company behind them and still succeeded.
    • Code generation tools, like Grails for Groovy and Rails for Ruby. Dick argued that those two examples might not be applicable to Scala. The thing is to find another killer app for Scala, and it's not necessarily a tool like Grails or Rails. The killer app might be Akka or Lift (the Play! framework was again mentioned).
    • 2.9 version. Parallel collections.
    • Target of Scala, only the JVM ? No, .NET as well (funded by Microsoft directly). Is there any limitation imposed by this choice? Type erasure for example? Not really. Type erasure is not a limitation in itself either; the only difference is that .NET does not erase types. Manifest, introduced in 2.8, should help with type erasure.
    • Scala Solutions. Scala Solutions is intended to provide professional development tools (again, tooling) and, in the longer term, maybe middleware.
    • Is there any design decision that you regret ? Long reflection by Martin... I expected XML literals, but no :) it was the postfix operator.
    Like the Scala Lab in the morning, this session was fantastic. Thanks to the three of them. Scala Rocks !
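
    On the type erasure point from the BOF, a small sketch of what such a type descriptor buys you. Manifest is the 2.8 mechanism that was mentioned; I use ClassTag here instead, its later replacement in scala.reflect (available from Scala 2.10), so this assumes a more recent compiler than the one discussed. ErasureDemo and its method names are made up for illustration.

```scala
import scala.reflect.ClassTag

// Sketch: recovering the erased element type through an implicit ClassTag.
// ClassTag is the later successor of the 2.8 Manifest; names here are hypothetical.
object ErasureDemo {
  // Creating an Array[T] needs the runtime class of T, which erasure removes.
  // The context bound asks the compiler to pass it along implicitly.
  def filled[T: ClassTag](n: Int, value: T): Array[T] = Array.fill(n)(value)

  // The tag can also be inspected directly.
  def elementClassName[T](implicit tag: ClassTag[T]): String =
    tag.runtimeClass.getName
}
```

    Without the ClassTag context bound, the filled method does not compile, because the compiler cannot know which kind of JVM array to create.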

    Hazelcast by Talip Ozturk

    It was an interesting presentation on Hazelcast by Talip Ozturk. He is a nice guy.
    • How to share data structures (map, queue, list) across several JVMs.
    • Multicast is used for discovery, followed by TCP/IP communication.
    • Security: data can be encrypted.
    • Demo of how a map update on one node affects the other nodes.
    • Also a demo of locking/unlocking.
    • Interesting presentation of the internal mechanism of Hazelcast.
    • Executor service is useful to avoid unnecessary distributed computation.
    • Updates are synchronous.
    Talip presented pretty interesting things here. It is a very powerful tool, but damn simple. Maybe I'll give it a try.



    Notes on Devoxx 2nd day (Morning - Afternoon)

    Scala Hands On by Dick Wall and Bill Venners
    This session is my favorite session so far. There were not so many new things I learnt from it. After all, it was a very short course on beginning Scala.

    A couple of things I learnt at the session though:
    • I always took for granted things like args.foreach(s => println(s.reverse)). I had not noticed that args.foreach(s => println(s.reverse())) does not actually compile. After the session I knew why.
    • Extending a case class came to my mind several times during my project. I didn't know whether it worked. In the class, I saw that a case class extending another case class leads to a deprecation warning: it simply won't work smoothly.
    • I discovered some limitations of code completion in the Scala REPL.
    Overall, the session was very entertaining, nobody wanted a break ! It was just great. The duo of Bill and Dick was just fantastic. I also took advantage of it to observe Java programmers learning Scala. It was a good experience, especially for my upcoming talk on Scala at the office.
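
    For the record, here is the reverse case from the first bullet as a small sketch. The explanation in the comments is my own understanding, not a quote from the session: java.lang.String has no reverse method, so reverse comes from Scala's string enrichment, where it is declared without a parameter list. ReverseDemo and reversedAll are made-up names.

```scala
// Sketch of the reverse / reverse() difference; names are hypothetical.
object ReverseDemo {
  // Compiles: reverse is declared without parentheses, so it is called without them.
  def reversedAll(args: Seq[String]): Seq[String] = args.map(s => s.reverse)

  // Does not compile: the extra () is read as applying the reversed String
  // to an empty argument list, and indexing a String needs an Int argument.
  // def broken(args: Seq[String]): Seq[String] = args.map(s => s.reverse())
}
```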

    By the way, the lab was full of people. I didn't come late, but I sat far at the back and had difficulty reading what was on the screen.

    Lunch
    It was a pretty quick lunch with pasta and shrimp. It was great ! But after 3 hours at the lab without power, the priority was to plug in my laptop. So, short lunch.

    Programming in Pain by Enno Runne
    It was a quickie session by Enno Runne comparing Java and Scala. To be honest, comparing Java and Scala to show that Scala is superior is not that useful, and it is not the approach I support when promoting my favorite programming language (Scala). Why should one bash Java to show that Scala is great ? Scala IS great. That's all. No need for aggressive comparison with Java. No need to say that Java is painful to live with once you know Scala.

    But anyway, the content of the presentation is actually quite good. You can find it here:
    http://www.slideshare.net/ennorunne/programming-in-pain

    HBase by Michael Stack and Jonathan Gray

    I had some serious difficulties understanding the session. I hope my notes here make sense:
    • HBase is based on Hadoop, hence HDFS.
    • Based on Google's Bigtable paper that everyone should read; even Michael's son has read it (I tried to read the paper between sessions, and realized how brilliant Michael's son must be).
    • Some names of HBase users; the most popular one is of course Facebook.
    • But HDFS lacks random read/write capabilities, and HBase basically adds random read/write capabilities to Hadoop.
    • The read/write is achieved by writing items and then compacting later (??)
    • Guarantee: atomic row writes, regardless of the number of column families.
    • Compared to a relational database, HBase is more scalable, up to petabytes of data. It is best when there is a big number of columns, and it still performs well in the presence of sparse columns.
    • HBase data is structured in rows and columns, like a relational DB.
    • A row has a row key ~ primary key, and the table is ordered by the row key.
    • Each value has a timestamp that is provided by HBase, but applications can also use their own timestamps.
    • HBase may store multiple versions of the data.
    • HBase provides a shell, JIRB based.
    • Unfortunately no demo of the shell because of a network problem.
    • But some examples of create table, get, put, scan, ...
    • The syntax is Ruby heavy.
    • Then a long-heavy-boring-hard-to-understand presentation on HBase architecture, but ... anyway.
    • Some terms: Region ~ Partition, RegionServer.
    • Region split mechanism and recovery.
    • Back to programming again: Java API presentation: create table, put, get, scan, mapreduce.
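
    To check my own understanding of the data model in the notes above, here is a toy in-memory model in plain Scala. It is NOT the HBase API; ToyTable and its put, get and scan are names I made up, and I am assuming that timestamps version individual values. It only illustrates a table kept sorted by row key, with the newest timestamp winning on reads and scans working over a row key range.

```scala
import scala.collection.immutable.TreeMap

// Toy in-memory model of the ideas in the notes. NOT the HBase API;
// every name here is hypothetical.
object ToyTable {
  type Versions = TreeMap[Long, String] // timestamp -> value, sorted by timestamp
  type Row = Map[String, Versions]      // column name -> versions
  type Table = TreeMap[String, Row]     // row key -> row, sorted by row key

  val empty: Table = TreeMap.empty[String, Row]

  // Writes one value under (row key, column, timestamp) and returns the new table.
  def put(t: Table, rowKey: String, column: String, ts: Long, value: String): Table = {
    val row = t.getOrElse(rowKey, Map.empty[String, Versions])
    val versions = row.getOrElse(column, TreeMap.empty[Long, String])
    t.updated(rowKey, row.updated(column, versions.updated(ts, value)))
  }

  // Reads the value with the highest timestamp, if any.
  def get(t: Table, rowKey: String, column: String): Option[String] =
    t.get(rowKey).flatMap(_.get(column)).flatMap(_.lastOption).map(_._2)

  // Returns the row keys in [startKey, stopKey), in sorted order.
  def scan(t: Table, startKey: String, stopKey: String): List[String] =
    t.range(startKey, stopKey).keys.toList
}
```
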
    Pause. 10 minutes after the pause, the session restarted, but with less than 50 % of the audience. To be honest, even though the presentation content was good, a lot of improvement was needed to make the content easy to understand.
    • Interesting API on concurrency: locking, checkAndPut, increment, filters.
    • A little presentation on deployment: nothing really interesting though.
    • Monitoring with Ganglia was mentioned. JMX too.
    • More on backup, tuning.
    • Compactions can be configured.
    Overall, it was quite an informative session, but it was somehow quite hard to grasp. Maybe the subject is difficult, maybe the presentation was not that clear. But yes, like other NoSQL things, this one needs to be explored further.

    Double Pasta
    Since I planned to stay until 9, at least, I went down and found that some pasta was left. Yes, it was the same pasta as for lunch. It didn't matter. I took another one, with some clementines, and hop, ready to go till late evening.

    Java EE Tooling by Ludovic Champenois 
    The idea of the presentation was to compare three IDEs, Eclipse, IntelliJ, and of course NetBeans (Ludovic is one of the GlassFish architects), in terms of their support for Java EE 6. Java EE 6 is, btw, annotation based EJB/Servlet/CDI/JPA 2.0.
    So, he demonstrated features like code completion, contextual help, menus, and wizards in the three IDEs for the Java EE technologies previously mentioned. It was a pretty cool demo. He clicked here and there, opened wizards here and there, launched web pages here and there. Well, pretty cool to see.
    At the end, he showed a comparison matrix of the three: NetBeans was the best, followed by IntelliJ, and finally Eclipse, which had no support for CDI at all.

    Oh, yes. He showed some scary long JPA annotations. What was that ??? It went very fast and I was in the last row of the cinema. But it was something like 100 lines of JPA annotations out there.

    Excel on JVM by Peter Arrenbrecht
    Peter presented his tool formulacompiler http://www.formulacompiler.org that compiles Excel to Java byte code.
    • Excel is an excellent tool for modelling, but not for computation.
    • So, let the user model in Excel, compile the Excel file to Java classes, and then you can build any application from that.
    • Cool idea, isn't it?
    • The result of the compilation is a jar file, so that you can use it in any application without reparsing the Excel or the Java code file.
    • Some profiling against Apache POI was shown. Pretty cool indeed.
    I wouldn't say that it was my favorite presentation. But Peter did present the subject well.

    That was my morning and afternoon sessions.