Almost live blogging – PHPDay 2011 in Verona – day 3
I don’t know who is responsible for having the first session start at 9h30 on a Saturday, after a GitHub night out, but I’m happy not to have to take that slot.
Today is the last day of PHPDay in Italy. Yesterday was great fun and today will be just as good I reckon.
I didn’t stay at the party until the bitter end so that I’d be able to see some sessions today. Also, it’s always nice to support good people by attending their talks.
Git for Subversion users
Stefan explained the basic principles of Subversion and how to work with it, and compared them to those of Git.
As Git repositories are located on your local machine, unlike Subversion repositories which live on a central server, you don’t need admin access to a central server to create a new repository.
Git makes it easier to selectively add changes, so you can make logical commits rather than committing everything that has changed. And because Git is local, your commits are not visible to anyone else: there is a difference between committing a change and pushing it to some central (or other) repository. This is especially useful as it allows you to maintain a local history without interfering with your co-workers’ code until you’re fully done.
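The commit-vs-push distinction can be sketched in a few commands; a minimal example (all file and repository names are made up — for hunk-level selection you’d use `git add -p` instead of per-file staging):

```shell
# Sketch: stage only part of the working tree, commit locally, push later.
mkdir demo-repo && cd demo-repo
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"

echo "feature work" > feature.txt
echo "unrelated tweak" > other.txt

# Stage only the change that belongs in this logical commit:
git add feature.txt
git commit -q -m "Add feature"

# The commit exists only locally; nothing is shared until you push:
git log --oneline          # shows "Add feature"
git status --short         # other.txt is still untracked (??)
# git push origin master   # would publish the commit to a remote
```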
With GitHub it becomes much easier for people not directly involved in a project to make changes: fork the repository, commit and push a change, and then send a pull request to the original repository’s maintainer, who can pull in your changes after review.
To maintain specific versions of your project you want to use tagging. Committing to a tag (which must be considered evil) is not possible in Git, which ensures that these snapshots stay stable.
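A minimal tagging sketch (repository and tag names are invented):

```shell
# Sketch: mark a release with an annotated tag as a stable snapshot.
mkdir tag-demo && cd tag-demo
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"
echo "v1 code" > app.txt
git add app.txt && git commit -q -m "Release 1.0"

# Create an annotated tag pointing at this commit; the tag never moves:
git tag -a v1.0 -m "Version 1.0"
git tag                    # lists: v1.0
```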
A special feature of Git is that you can rewrite history using git rebase. This might sound weird but can help to document distributed development done by many people who worked on different branches. You can also use an interactive rebase to remove things from a previous commit: if you have forgotten to remove a serial number or password from your code, rebasing lets you remove it from history before anyone can read your secrets.
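A sketch of dropping such a commit from local history. To keep the example reproducible, the interactive rebase is driven non-interactively via `GIT_SEQUENCE_EDITOR` (all names are made up); note this is only safe for history you haven’t pushed yet — an already-published secret should be rotated regardless:

```shell
# Sketch: remove an accidental commit from recent local history.
mkdir rebase-demo && cd rebase-demo
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"

echo "base" > base.txt    && git add . && git commit -q -m "Base"
echo "hunter2" > pass.txt && git add . && git commit -q -m "Oops: committed a password"
echo "more" > more.txt    && git add . && git commit -q -m "More work"

# Delete the first line of the rebase todo list, i.e. the "Oops" commit:
GIT_SEQUENCE_EDITOR="sed -i 1d" git rebase -q -i HEAD~2
git log --oneline          # the "Oops" commit is gone, pass.txt no longer exists
```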
Another cool Git feature is stashing, which lets you put your current changes aside to focus on a bugfix you stumbled upon, commit the fix, and then apply your stash again.
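That workflow in a few commands (file names are made up):

```shell
# Sketch: park half-done work, commit a bugfix, pick the work back up.
mkdir stash-demo && cd stash-demo
git init -q
git config user.email "dev@example.com" && git config user.name "Dev"
echo "stable" > app.txt && git add . && git commit -q -m "Stable state"

echo "half-done feature" >> app.txt   # work in progress...
git stash                             # working tree is clean again

echo "bugfix" > fix.txt && git add fix.txt && git commit -q -m "Fix the bug"

git stash pop                         # the in-progress change comes back
grep -q "half-done feature" app.txt && echo "WIP restored"
```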
Workflows in Subversion are limited: branching is often a nightmare and the central repository structure dictates how you work with it. With Git, various workflows can be implemented, such as a single person managing the master repository, or lieutenants who review changes from other developers before they reach the master, and many more.
Designing HTTP Interfaces and RESTful Web Services
After years of appending ugly query strings to our index.php files there had to be a way to make things nicer. SOAP was one attempt, which produced more pain than good. Other attempts like the joind.in API tried to be better but used POST all the time, which had serious consequences for load balancers and proxies, but also for the clients’ browsers.
REST (REpresentational State Transfer) comes to the rescue.
There are some constraints to REST: it is meant for client-server communication, it has to be stateless and cacheable, and it has to be transparent, as it works on a layered system with load balancers, proxies and so forth.
Most important though is the uniform interface which is again a set of rules.
- A URL identifies a resource
- The URLs have an implicit hierarchy
- Methods perform operations on resources
- The operation is not part of the URL (not always possible in practice, as browsers don’t support PUT and DELETE …)
- A hypermedia format represents the data
- Link relations are used to navigate the resources
A nice reminder: you should use query strings for filtering collections instead of making the filter part of the URL structure.
David continued by explaining CRUD (Create, Retrieve, Update, Delete), where GET is always used to retrieve something and POST for the rest (as PUT and DELETE are not supported by browsers). You can use HTTP status codes to give meaningful replies and the Accept header to decide on the response format.
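The CRUD-to-HTTP mapping, written out as a sketch of hypothetical exchanges (api.example.com, the /products resource and the status codes shown for it are my invented illustration, not from the talk):

```shell
# Hypothetical request -> response pairs for a /products resource.
http_sketch=$(cat <<'EOF'
GET /products/42 HTTP/1.1         -> 200 OK (the JSON representation)
Accept: application/json

POST /products HTTP/1.1           -> 201 Created, Location: /products/99
PUT /products/42 HTTP/1.1         -> 200 OK (resource replaced)
DELETE /products/42 HTTP/1.1      -> 204 No Content
GET /products?category=books      -> 200 OK (filtered collection)
EOF
)
printf '%s\n' "$http_sketch"
```

Note how the filter lives in the query string while the verbs, not the URL, express the operation.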
He examined the Twitter API to show its weaknesses and explained why things like not having a user’s ID in the URL when posting a status update are actually a bad idea.
David went on to show some examples of how you can use your own XML schemas to include links to other resources, so clients understand where to go from a given resource (e.g. from a product to its category).
All in all an interesting talk, and I recommend fetching the slides when they become available.
Testing untestable code
Stephan says there are only three kinds of untestable code:
- Wrong object construction (new is evil)
- Tight coupling
- Uncertainty (e.g. globals)
In legacy systems you will find a lot of these aspects, and the only way to make such code testable is to refactor. However, according to Martin Fowler you need test coverage before you can start refactoring, which is a chicken-and-egg problem.
Still, we can do something. But we don’t want to change existing code, since we couldn’t tell what bugs we might introduce.
Instead we can use PHP’s autoload facility or the include path to load our own implementations of hard-coded dependencies, and we can use stream wrappers to mock e.g. file access. We can also replace database servers with proxies, or change the local /etc/hosts file when access details are hard-coded. You can even use Reflection, or autoloading plus code rewriting, to make private functions public, but that should be a last resort. The whole idea is not to change the dependencies within the code at this point but to exchange the targets they point to.
This way you can remove side effects and isolate the code you want to test.
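The search-path trick translates directly to the shell’s PATH, which makes for a compact illustration: a test double placed earlier in the path shadows the real dependency while the calling code stays untouched. (All script and directory names below are made up.)

```shell
# Sketch: shadow a "dependency" via search-path ordering, not code changes.
mkdir -p real_bin test_doubles
printf '#!/bin/sh\necho "talking to the production service"\n' > real_bin/fetch-data
printf '#!/bin/sh\necho "canned test data"\n' > test_doubles/fetch-data
chmod +x real_bin/fetch-data test_doubles/fetch-data

env PATH="$PWD/real_bin:$PATH" fetch-data
# -> talking to the production service
env PATH="$PWD/test_doubles:$PWD/real_bin:$PATH" fetch-data
# -> canned test data
```

PHP’s include path and autoloader allow the same substitution for classes and files, which is what makes the technique non-invasive.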
Stephan also mentioned vfsStream, which mocks a filesystem, and the runkit extension from PECL, which you can use to override PHP-internal functions such as mail().
In his company Stephan uses Generative Programming to generate (testable) software from configuration and code provided by their customers. So they have an automated way of wrapping client code without changing it, removing dependencies and replacing them with mocks. This seems to be a very interesting approach that I will research a bit more myself, I guess.
NoSQL Databases: What, When and Why – NoSQL Databases demystified
NoSQL still has a great buzz, but only few people seem to be able to explain properly which use cases require it and when a traditional RDBMS is still the best option. Lorenzo tried to clarify this confusion a bit.
Nowadays we see much bigger data, much more concurrent access to it, and data that is much more connected but also more diverse. It is physically not possible to deal with these facts on single nodes, and RDBMS are just no longer able to cope with this and stay responsive at the same time.
But NoSQL databases are not the holy grail. You have to understand their strengths and their weaknesses. Instead of doing pointless benchmarks across different implementations, Lorenzo took a step back to the conceptual level. He started by explaining the ACID (Atomicity, Consistency, Isolation, Durability) and CAP (Consistency, Availability, Partition tolerance) principles that databases want to follow. But there are conflicts between these principles that have to be considered, e.g. Consistency as defined in ACID affects concurrency and therefore Availability. So you have to decide which of these properties are most important for your use cases. In RDBMS the focus has always been on Consistency.
To deal with this there are two strategies:
- CP – focusing on Consistency (but tolerating unavailability)
- AP – focusing on Availability (but tolerating inconsistency)
Lorenzo went on to explain in depth some partitioning strategies as described in Amazon’s Dynamo paper, and then covered some of the implementations that followed, like Voldemort, Membase, memcached and Riak, and then Google’s BigTable and Facebook’s Cassandra. He focused on the partitioning strategies, as they make all the difference between implementations and define whether an implementation suits your needs.
Finally he talked about today’s favorites, CouchDB and MongoDB, which focus on high availability for both writing and reading but do not resolve conflicts; those are up to you to deal with in the application. Simply put, CouchDB is optimized for recurring queries while MongoDB is more flexible but does not reliably persist your data.
Lorenzo showed in great detail the differences between these various implementations and made it clear that only a closer look helps to find the right solution for a particular use case. I was missing some example use cases, though.
Finally it all ended
Three days went very fast and now there is only one evening left for me before I will leave for home tomorrow.
Thanks a lot to all the fellow speakers, sponsors and foremost everyone involved in organizing this fantastic event!!!