Friday, May 27, 2016

Let's get it started!

Hello readers!
So the community bonding period has finally ended. It was a fun experience discussing ideas and designs with Florian, my mentor. We had numerous meetings with intense brainstorming, especially regarding the design part.

First, let me share one of the design boards I made over these meetings - https://docs.google.com/drawings/d/1b79uUHsDd1WHZQ9sZsVLTcra0eW0CA6Qc1h3ySpRY0E/edit?usp=sharing

OK, so let me give you some context. I am building a message queue based archiving interface. Since I have been investigating various backends for the message queue part, implementing it is not really a challenge anymore. In fact, I had developed prototypes for some message queue backends (RabbitMQ and ZeroMQ) while preparing my proposal and researching the project. During the community bonding period, I revisited RabbitMQ and found that it fits the needs perfectly, and is fairly simple to use and integrate. Thus, in our first meeting, Florian and I decided to go with RabbitMQ as our primary backend (ZeroMQ will be looked into once we are done with this).


Now the challenging part that remains is understanding the finer details: how the archiver interface is loaded, which functions it has to perform, and how it can be made to work under varying configurations. The configurations can be saved in ini-style files, which may be loaded and used on the fly. Basically, how are we going to integrate it with the current Mailman server?
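
To make the ini-style configuration idea a bit more concrete, here is a minimal sketch using Python's standard configparser module. The section names, keys and the load_archiver_config helper are all made up for illustration; they are not the actual Mailman or plugin configuration layout.

```python
import configparser

# Hypothetical ini content; section and key names are illustrative only.
SAMPLE_INI = """
[general]
backend = rabbitmq

[rabbitmq]
host = localhost
port = 5672
queue = archive
"""


def load_archiver_config(text):
    """Parse ini-style text and return the settings of the selected backend."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # The [general] section names the backend whose section we should read.
    backend = parser.get("general", "backend")
    settings = dict(parser[backend])
    # configparser returns strings, so convert the port explicitly.
    settings["port"] = parser.getint(backend, "port")
    return backend, settings


backend, settings = load_archiver_config(SAMPLE_INI)
print(backend, settings)
```

Keeping the backend name in one section and its settings in another would let the same interface switch backends just by editing the file, which is roughly what "varying configurations" means here.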

To tackle this, I spent days going over the codebase multiple times, trying to work out how the message queue part should interact with it. One suggestion by Florian that proved very fruitful was to go about it in a test-first manner, i.e. think of all the tests our interface should be subjected to, and what the input and expected output of each should be. This led me to research the unittest module and tox, and to read the existing tests for the current archiving modules, which gave me a firmer understanding of the interface design.
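
As a rough illustration of this "think in tests first" approach, here is a small unittest sketch. FakeQueueArchiver is a hypothetical stand-in written only for this example (it is not the real plugin and does not talk to any broker); the tests simply pin down an input message and the output we would expect to be handed to the queue.

```python
import unittest
from email.message import EmailMessage


class FakeQueueArchiver:
    """Hypothetical stand-in archiver that records what it would publish."""

    def __init__(self):
        self.published = []

    def archive_message(self, list_id, msg):
        # Record the list id and the serialised mail instead of
        # contacting a message broker.
        self.published.append((list_id, msg.as_string()))


class TestQueueArchiver(unittest.TestCase):
    def setUp(self):
        self.archiver = FakeQueueArchiver()
        self.msg = EmailMessage()
        self.msg["From"] = "anne@example.com"
        self.msg["To"] = "test@example.com"
        self.msg["Subject"] = "Hello"
        self.msg.set_content("A test message.")

    def test_message_is_published_once(self):
        self.archiver.archive_message("test.example.com", self.msg)
        self.assertEqual(len(self.archiver.published), 1)

    def test_payload_keeps_list_id_and_subject(self):
        self.archiver.archive_message("test.example.com", self.msg)
        list_id, payload = self.archiver.published[0]
        self.assertEqual(list_id, "test.example.com")
        self.assertIn("Subject: Hello", payload)
```

Saved in a file, tests like these can be run with python -m unittest, or through tox once the package has a tox configuration.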

Next, we actually discussed design. I had some things in mind, but over IRC it gets difficult to express those ideas. So I made diagrams explaining my ideas, which also allowed Florian to ask questions about them and offer suggestions. After two or maybe three meetings of intense discussion, we settled on a simple architectural design (calling the interface each time for each archiver) that is practical and clean. I still have another solution in mind (using a publish model), which has one flaw, and I shall try my hand at it once the simple design is implemented.

The coding period has already begun, and as of writing this, I have made the interface Python package, installed it in my virtualenv, and it shows up in Postorius. It also has some slightly dirty code for the message queue part, but as I mentioned, that isn't a problem.
Hoping to rock this project!
Have an awesome Summer!

Tuesday, May 3, 2016

The Proposal - Explained

Hi
Now let's get to what I am going to work on over the summer. As mentioned in the last post, this one is going to elaborate on my understanding and ideas for the project.
I applied for the project Message Queue based Email Archiver under the org GNU Mailman. GNU Mailman is an open source org which develops and maintains Mailman, the mailing list management system. Mailing lists are used in more places than one might think at first. They're probably being used in the institution and the organisation(s) you're enrolled in, in the promotional schemes of big corporations, and so on. You just might be a member of one of those without even realising it (think of some spam that you receive!). What Mailman does is manage these mailing lists, their moderators, owners, banned lists and so on. It also maintains a server to which all the mails to registered mailing lists are directed, validates each mail, puts it on hold for moderation by list admins if required, calculates the recipients and delivers the mail into their inboxes. Wait, another thing: it also has an option to archive the mails it sends out to list subscribers, and that's where my project steps in :) .

Right now, the archiving is done something like this -



[Image: Plugin.png]

WAIT! If you're freaked out by the complexity, worry not. I'll explain in a broader sense.
The Mailman server and the archive servers are currently separate, i.e. decoupled, which is a good thing: it keeps separate things as separate as we can.
A mail can be archived to different archives by their own archive-specific methods (like a POST request, or a mail (inside a mail :D) to the archive server).
Despite such a neat design, we have some issues here. Firstly, the POST request method currently isn't secure enough, so the archive server and the Mailman server need to be in the same subnet (or even on the same machine). Secondly, what if our archive server decides to go off to sleep for a teeny weeny lil second? BOOM. Message lost, an error code received, no retryability (right now). Enter the scene: message patterns. These beautiful architectural solutions provide some fascinating ways in which machines can pass messages among themselves. They also offer the option of placing a queue of messages between our Mailman server and the archive server, where messages can be stored for the archive server to consume as and when it can (in an asynchronous manner). Another thing: we can attach as many archive servers, or even other web apps, to these queues as we like and not bother our Mailman server. Also, the issue of whether our mail was received by the archive can now be handled by the message queuing system, saving our Mailman server quite some headache. These are just some of the possibilities that message patterns bring in.
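
Here is a tiny sketch of the "archive server takes a nap" scenario, using only Python's standard library queue module rather than a real broker. FlakyArchive and drain are hypothetical names invented for this example; a real backend like RabbitMQ handles acknowledgements and redelivery for us, so this only shows the idea of keeping a message queued until it has actually been stored.

```python
import queue


class FlakyArchive:
    """Hypothetical archive server that is down for its first few deliveries."""

    def __init__(self, failures):
        self.failures = failures
        self.stored = []

    def store(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("archive is down")
        self.stored.append(message)


def drain(pending, archive, max_attempts=5):
    """Deliver queued messages, requeueing failures instead of dropping them."""
    dead = []
    while not pending.empty():
        message, attempts = pending.get()
        try:
            archive.store(message)
        except ConnectionError:
            if attempts + 1 < max_attempts:
                # No acknowledgement: put the message back for a later retry.
                pending.put((message, attempts + 1))
            else:
                # Give up after too many attempts, but keep the message around.
                dead.append(message)
    return dead


pending = queue.Queue()
for body in ["mail 1", "mail 2", "mail 3"]:
    pending.put((body, 0))

archive = FlakyArchive(failures=2)
dead = drain(pending, archive)
print(archive.stored, dead)
```

Note that requeued messages end up behind the ones that were delivered in the meantime, so ordering is not preserved in this naive version.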

My project is to implement an interface to accommodate such message queue based systems in our current system, and then subsequently implement message patterns in various backends. The choice of backends depends on the features each one provides. I investigated some backends while studying the project, and found RabbitMQ quite good for the job. ZeroMQ and Redis were also found to be possible good candidates. A small note I made on these in my proposal is as follows -
 "
  • RabbitMQ - It also offers flexible and customizable routing options between publisher and message queues. This could be helpful to publish to a select subset (based on mailing list) of the subscribers listening to our publisher. Also, its reliability feature suits the requirements in our context, as it provides both acknowledgements and personalised queues for each subscriber.
  • ZeroMQ - Unlike the others, it can run without a dedicated message broker. It basically provides sockets which can be configured to customize message patterns, and can thus be used to suit specific personal needs. Though highly customizable, it might require explicit implementation of a few features.
  • Redis - Though it started with some initial confusion about the lack of any reliable way to transfer messages after I approached the Redis community for solutions, I subsequently found quite a nice solution in this blog post. It offers features like persistence to disk and message reference queues per subscriber, although no approach was found to provide retryability in case of failure. Thus, it is a lightweight broker system which can be used to provide reliability.
 "


The following is the timeline I proposed after much deliberation with myself, mentors, and some seniors who had worked on such projects before.

Timeline :


Till 10 May
Find out more about Message Patterns and understand Mailman and Archiver Interface code architecture. Get familiar with ‘tox’, the testing system.
10 May - 22 May
Find out more about the various backends available and corresponding message patterns supported. Discuss and finalise various architectural designs for messaging system with the mentors.
23 May - June 1
Implement plugin for IArchiver supporting multiple backends. Test the plugin with dummy messaging systems.
June 2 - June 12
Implement a message queue system for a viable backend. Finish with the minimum viable product.
June 13 - June 19
Write unit tests for the MVP. Ask for community review of the product. Catch up with any leftover tasks.
June 20 - June 27
Code submitted for mid-term evaluation. Write basic documentation for the MVP. Reconsider the architecture for message queues and plugin.
June 28 - July 8
Improve upon the message queue system based on the evaluation. Research additional backends and message patterns viable for the job.
July 9 - July 25
Discuss and implement more message queuing patterns suitable. Implement support for suitable alternative backends. Attempt to incorporate list events into the messaging system.
July 26 - August 1
Refactoring Week. Seek community review for existing product and design architecture. Write integration tests.
August 2 - August 13
Complete any leftover work. Write documentation. Improve based on reviews by mentors. Attempt to incorporate extensible metrics (as mentioned above). Write any tests left (system testing). Complete any unachieved milestone.
August 14 - August 20
Tidy up code. Complete documentation. Perform rigorous testing upon system. Submit for final submission.

If I am able to complete the intended tasks as part of the project, I intend to incorporate events such as list creation, moderator/owner assignment and other list events as part of a larger, more general archiving system. One approach I feel would be good is to implement a handler, or a system of handlers, which directs such list events to the archive servers.

I hope I have explained the objectives of this project well. Please feel free to comment here or contact me by email if you have any suggestions/doubts.
Cheers

Monday, May 2, 2016

GSoC - Applied and Selected. Yay!

Hello Readers
The past month has been full of activities: GSoC application, assignments, exams, and desperate waiting for results (GSoC). I'd like to dwell more on my application for GSoC, the proposal, waiting for results, and the RESULT.

As I approached the middle of March and the GSoC proposal deadline (25th March) neared, I came to a realization, a horrific one: I HAD NOT EVEN SELECTED A PROJECT TO APPLY FOR. Most applicants, and even my friends, had finalized their projects and had had lengthy discussions with the mentors. I felt it was the end of my GSoC journey, and felt that yes, it was fun contributing to GNU Mailman, BUT my dreams of GSoC were going to be shattered.
As one last measure to save the sinking ship, I went through the project list once more, and found Message Queue based Email Archiver. Now this was completely new to me; I had mostly worked in the Mailman core area, not the email archiving part. But with no options left, I went through the whole Mailman architecture, and understood where and how archiving is done and what the project could mean. Only after this did I contact the project mentor, Florian (a wonderful person, as I would find out). I did so keeping in mind the little time left, and to make my conversation with the mentor more fruitful, since discussing a project with little background isn't really an effective way to work.

Approaching Florian made me feel more confident about still having a chance at GSoC. He seemed interested in my application, and liked that I had done some homework before approaching him. Now it was time for more intense action: the next 4-5 days were spent understanding what message queues are, what backends are available and what they offer. This involved going through PyCon talks, approaching seniors (thanks to Nehal and Anhad) who had some experience with this, and reading articles and documentation.
Now, with just 3 days left until the proposal deadline, I became more anxious: my proposal hadn't even been drafted yet! Also, I needed to get it reviewed by mentors and seniors who had been through the drill.
One thing I learnt - writing a proposal requires a very clear idea of the project and the existing architecture (that is, when you genuinely want to write a good proposal). Thus, while drafting mine, I had to go over a lot of things again, and ensure I really had a good idea of the things involved. It took me a whole 2 days to complete the draft, and now it was time for reviews.
OOPS!! The timeline I proposed was crappy! Also, I hadn't clearly mentioned the minimum viable product, i.e. the deliverables of the project. Mentors and seniors pointed out that the timeline I had developed was not practical. It required a whole lot of thinking and asking for advice from seniors to get that straight. Also, I added what I thought the minimum viable product should be. Finally, with a thumbs up from mentors, I submitted the proposal and hoped for the best. (That night after the proposal deadline, my friends and I decided to go for a walk, which turned into a long one, longer than what was comfortable. Crap! We walked for around 15 km.)

Acads resumed, I struggled with assignments, and thought about the results, FOR A MONTH!! That was tiresome. I felt it had indeed been fun to apply for GSoC, and I had worked hard. Now even if I wasn't going to be selected, I would not have any regrets. It had certainly given me enough experience to crack it the next year, and exposed me to the beautiful world of the open source community.

Result day. I was excited and a bit nervous the whole day. My whole group of friends had applied, and we were all anxiously going over the GSoC IRC channel for any updates. Finally, result time. SELECTED!!! YAY. It was a dream come true. Another friend had also cracked it. I was on cloud nine. Now, as is the customary celebration at my college for any achievement or birthday, I was given GPL. Don't ask for the full form, it just means you are gonna receive the spanking of your life from your friends, followed by a juice treat at the canteen.

I would like to take this opportunity to thank my friends who kept me motivated (Motwani, Battan, Chenoo and Sanket), seniors (esp. Bhavesh Goyal, Ghaisas, Nehal and Anhad) who helped me out at various points of this journey, and the people at GNU Mailman who accommodated my doubts and discussions and guided me.

In the next post, I am going to share the particulars of the proposal I wrote. Till then, here's the final draft -
https://docs.google.com/document/d/1-ElRY-7IF4IlTqK_h6JAVCOFgJsant_dMTjGJKgxBV4/edit?usp=sharing