Polly/Design

Source Code Hosting

GitHub - An initial empty project has been created: https://github.com/ppau/polly

License Agreement

Creative Commons 0: http://creativecommons.org/publicdomain/zero/1.0/

Client Side Technology

Generally JavaScript/AJAX style interface.
HTML5. I can help with this - jscinoz. Ab: I can help with some of this as well.
Mobile clients are a non-requirement for now. Maybe a future project. - Brendan.
CoffeeScript
- http://coffeescript.org/
- For simpler, more productive JavaScript coding.
Twitter Bootstrap:
- http://twitter.github.com/bootstrap/
- Rich web component framework
Backbone.js:
- http://backbonejs.org/
- Model View Controller (MVC) infrastructure for JavaScript.
- Anticipating a very rich/dynamic web application. There may be a lot going on, so we need a good client side model to tie it all together.
Messaging to the server is JSON documents over REST
- http://en.wikipedia.org/wiki/Representational_state_transfer
- The message content across REST with its simple query/insert/update/delete model should be as near as possible to the database data formats.
- Validation is on the server side
- A publish/subscribe type design is preferred to allow loose coupling of components.
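
As a rough illustration of the REST messaging above, here is a minimal Python sketch of how the query/insert/update/delete model could map onto HTTP methods, with validation done on the server. Everything in it is an assumption for illustration only: the in-memory STORE stands in for a real database collection, and the "title" rule in validate() is a made-up example, not a Polly schema.

```python
import json
import uuid

# Hypothetical in-memory stand-in for a document collection.
STORE = {}

# HTTP method -> data operation (the simple query/insert/update/delete model).
OPERATIONS = {"GET": "query", "POST": "insert", "PUT": "update", "DELETE": "delete"}


def validate(doc):
    """Server-side validation. The 'title' rule is a made-up example."""
    if not isinstance(doc, dict):
        raise ValueError("document must be a JSON object")
    if not isinstance(doc.get("title"), str) or not doc["title"].strip():
        raise ValueError("'title' must be a non-empty string")
    return doc


def handle(method, doc_id=None, body=None):
    """Apply one REST request to the store and return (status, JSON text)."""
    op = OPERATIONS.get(method)
    if op is None:
        return 405, json.dumps({"error": "method not allowed"})
    try:
        if op == "insert":
            # Invalid JSON also raises ValueError, so it becomes a 400 below.
            doc = validate(json.loads(body or ""))
            doc_id = uuid.uuid4().hex
            STORE[doc_id] = doc
            return 201, json.dumps({"id": doc_id})
        if doc_id not in STORE:
            return 404, json.dumps({"error": "not found"})
        if op == "query":
            return 200, json.dumps(STORE[doc_id])
        if op == "update":
            STORE[doc_id] = validate(json.loads(body or ""))
            return 200, json.dumps(STORE[doc_id])
        del STORE[doc_id]
        return 204, ""
    except ValueError as exc:
        return 400, json.dumps({"error": str(exc)})
```

Note that the stored document is exactly the JSON the client sent, nested parts included, which is the "as near as possible to the database data formats" idea.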

Communications

HTTPS
WebSockets
- Wrapped in socket.io: http://socket.io/
  - On Server in Tornado: https://github.com/MrJoes/tornadio2
  - On Client in Backbone.js: http://developer.teradata.com/blog/jasonstrimpel/2011/11/backbone-js-and-socket-io
- Provides asynchronous bi-directional communications between client and server over single HTTPS connection.
- Implement publish/subscribe + client cache system.
  - Transparent to rest of client code. Model just delivers data as requested.
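
A minimal sketch of the publish/subscribe + cache idea, written in Python only to keep one language across these notes (the real client would be CoffeeScript/Backbone.js). The CachingBus class and the topic names are hypothetical, and the socket.io transport is left out entirely; this only shows how a late subscriber can be served from the cache.

```python
from collections import defaultdict


class CachingBus:
    """Publish/subscribe hub that also remembers the latest message per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._cache = {}                       # topic -> latest data

    def subscribe(self, topic, callback):
        """Register a callback and replay the cached value if one exists."""
        self._subscribers[topic].append(callback)
        if topic in self._cache:
            callback(self._cache[topic])

    def publish(self, topic, data):
        """Store the newest data and notify every subscriber of the topic."""
        self._cache[topic] = data
        for callback in list(self._subscribers[topic]):
            callback(data)

    def get(self, topic, default=None):
        """Read the latest value from the cache without any network traffic."""
        return self._cache.get(topic, default)
```

The rest of the client would only ever call subscribe() or get(), so it would not need to know whether the data came from the cache or from a fresh server push.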

Server Side Technology

AndrewD: Still under investigation right now, but ...

Linux

Python 3.2
- http://www.python.org/

Web Frameworks:
- nginx - Web Server and Proxy (http://www.aosabook.org/en/nginx.html)
  - Download from: http://nginx.org/en/download.html
  - Configuring with Tornado: https://gist.github.com/802576
  - Looks to be higher performance than Apache, especially with lots of concurrent connections.
- Investigating: Tornado -> pymongo -> MongoDB
  - http://www.tornadoweb.org/
  - Facebook's "real-time" open source web server technology
    - Clearly designed for rapid response applications with a lot of concurrency and thousands of clients.
  - Simple model for coding asynchronous web application server logic
  - Ref: Motor extensions to pymongo to support easily working in this environment.
  - Useful set of OpenId authentication methods built-in - http://www.tornadoweb.org/documentation/auth.html
  - AndrewD: Needs prototyping
- Investigated (& rejected): django -> django-mongodb -> mongodb
  - Current thinking is that django is not a good fit to the emerging architecture of Polly.
  - We like MongoDB for its ability to directly store the JSON representation that we'd already be using in the client, but django imposes a lot of relational style expectations on the approach. django models can technically be extended to accommodate this, but ...
  - It doesn't seem like we actually want a server side MVC, given the MVC in the client and the REST-based messaging.
  - Non-flat data models don't work in django forms.
  - Not comfortable with the style of django - it seems to spread even the logically simplest of web application code across way too many files for easy comprehension.

MongoDB
- http://www.mongodb.org/ - 64-bit version only please.
- http://openmymind.net/mongodb.pdf - A nice kick-start intro.
- Recommended viewing: http://www.youtube.com/watch?v=PIWVFUtBV1Q
- Investigating: "Motor" = MongoDB Tornado. Ref: Tornado Web Framework above.
  - In a newer version of pymongo by 10gen (authors of MongoDB) at: https://github.com/ajdavis/mongo-python-driver/tree/motor/
  - This would allow very high performance asynchronous web request processing.
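
To show what "asynchronous web request processing" means in practice, here is a small sketch. Loud assumption: it uses Python's standard asyncio module rather than Tornado or Motor (neither has been prototyped yet), and asyncio needs a much newer Python than the 3.2 listed above. find_document() is a hypothetical stand-in for a non-blocking database call.

```python
import asyncio


async def find_document(doc_id):
    """Hypothetical stand-in for a non-blocking database lookup."""
    await asyncio.sleep(0.01)  # simulates waiting on the database
    return {"_id": doc_id, "title": "Document %d" % doc_id}


async def handle_request(doc_id):
    """One request: gives up the thread while the lookup is in flight."""
    doc = await find_document(doc_id)
    return doc["title"]


async def serve(doc_ids):
    """Handle many requests concurrently on a single thread."""
    return await asyncio.gather(*(handle_request(i) for i in doc_ids))
```

While one request waits on the database, the same thread gets on with the others, so the waits overlap instead of queueing up. That is the property we are hoping to get from Tornado + Motor.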

Database Comparisons

Thinking about NoSQL options.
ref: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

What's wrong with MySQL? - Chris

 AndrewD:
 The problem is not so much MySQL as SQL in general. An entire job description (DB Administrator) exists because of what is wrong with SQL.
 Coding to SQL is not hard and relational theory is all well and good, but the administration inherent in the whole approach just sucks.
 Why do they have to be so rigid? It's built on 1960s assumptions of waterfall style development practices, upgrade cycles and planned down-time.
 SQL started out with a goal of making databases accessible to non-programmers, but that never really happened and the programmers end up writing queries in a declarative language, then reading the query optimizer results to figure out why it's not implementing things the efficient way they imagined it would when they wrote it, instead of just describing what they want to happen. It's stupid.
 Then, when you want to roll out a new version of your application, you have to take it all down and convert everything in a carefully planned series of rehearsed activities.
 Then there's scalability. SQL just sucks at that. You have to design distribution in from the start, or else a successful application runs into a massive roadblock, but designing it in from the start just massively complicates your original development.
 ORM is just an attempt to reconcile the object based way we like to work in our applications, with an archaic old way of storing things.

Likely
- MongoDB
  - Python & JavaScript, JSON/BSON, django interface
  - Replication, HA, Sharding built-in
  - Map-Reduce
- Neo4j
  - Looks like a good fit to the style of data relationships we probably want.
  - Java & Ruby primarily, but has Python interface (neo4j-embedded)
  - Full ACID conformity!
Not Likely
- Riak - Multi-site requires commercial license. Crystal ball suggests this could be bad for us.
- CouchDB - Pre-defined queries and limitations on data format change are a show stopper for me.
- Redis - "Should fit mostly in memory". Scaling issue right there.
- HBase - Hadoop based (Java & Ruby). Sledgehammer - requires its own file system.
- Cassandra - Strongly Java centric, "Bloat and complexity". No thanks.
- Membase - Seems very performance centric, scaling and distribution, but doesn't look strong on data integrity (responds to client before writing)

Encryption

HTTPS for all requests - Brendan. +1 - Chris
I'm technically OK with HTTPS for everything, but it costs a lot of server processing time or front-end equipment. Maybe we can design in a configuration option that says whether to turn on HTTPS for reads, writes, or both. - AndrewD
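
A minimal sketch of what such a configuration switch might look like, in Python. The policy names ("read", "write", "both"), the requires_https() function and the rule that GET/HEAD/OPTIONS count as reads are all assumptions for illustration, not an agreed design.

```python
# Methods treated as reads; everything else counts as a write (an assumption).
READ_METHODS = frozenset(["GET", "HEAD", "OPTIONS"])
POLICIES = ("read", "write", "both")


def requires_https(method, policy="both"):
    """Return True if a request with this HTTP method must arrive over HTTPS."""
    if policy not in POLICIES:
        raise ValueError("unknown HTTPS policy: %r" % (policy,))
    if policy == "both":
        return True
    is_read = method.upper() in READ_METHODS
    return is_read if policy == "read" else not is_read
```

Defaulting to "both" keeps the HTTPS-for-all position as the out-of-the-box behaviour, and relaxing it becomes a deliberate choice by whoever runs the server.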

Authentication

Quoting: http://en.wikipedia.org/wiki/Multi-factor_authentication
"Existing authentication methodologies involve three basic “factors”:

Something the user knows (e.g., password, PIN);
Something the user has (e.g., ATM card, smart card); and
Something the user is (e.g., biometric characteristic, such as a fingerprint)."

I'm recommending a loose kind of 2-factor authentication.

Factor 1: UserId and Password (Something the user knows)
Factor 2: External email account (Something the user has)

The email address would be associated with their account, but not exposed to other people.
A daily (or other specified frequency) log of account activity would be sent to the user's email account.

This strategy keeps the simplicity of regular user logon for access, but applies our principle of "Self Evident Integrity" to ensure any breach is rapidly known and corrected.
The system becomes self-healing.
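
A rough sketch of how the activity log email could be assembled, in Python. The Activity record, its field names and the wording of the message are hypothetical, and actually sending the email is left out. It only shows selecting one user's events for the reporting period.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical activity record: who did what, and when.
Activity = namedtuple("Activity", "user_id when action")


def build_digest(user_id, events, now, period=timedelta(days=1)):
    """Return the digest text for one user, or None if there was no activity."""
    recent = sorted(
        (e for e in events
         if e.user_id == user_id and now - period <= e.when <= now),
        key=lambda e: e.when,
    )
    if not recent:
        return None
    lines = ["Account activity for %s:" % user_id]
    lines += ["%s  %s" % (e.when.strftime("%Y-%m-%d %H:%M"), e.action)
              for e in recent]
    lines.append("If you do not recognise any of this, please report it.")
    return "\n".join(lines)
```

Whether to stay silent on days with no activity (as here) or to send an explicit "no activity" message is an open choice. The period argument covers the "specified frequency" case.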