Almost live blogging – PHPDay 2011 in Verona – day 2
Another night out with some great guys! Today is the second day of PHPDay!
And today I promise I will be attending more sessions!
Here’s an impression from the dinner.
How your business can benefit from Symfony2
No, I’m not going to review my own session. I only want to link to the event on joind.in, where attendees have hopefully left some feedback, and to SlideShare, where you can find my slides. I decided to put a lot of text on my slides so they make sense when read without listening to me. I hope this works out for you.
Large-scale data processing with Hadoop and PHP
David from Bitextender talks about MapReduce, using Hadoop as a technology to process large amounts of data. As today’s computers can process data faster than they can read it, you have to care about I/O.
In 2004 Google came up with a concept for distributing large data in smaller, digestible chunks across a network for processing. These chunks are replicated to other machines that can take over if the original machine fails. One replica is stored on the same rack to enable a fast takeover, and another is located as far away as possible in case of fires and such. Distribution is done via BitTorrent.
First a mapper chunks your data randomly. Then a reducer groups the result logically (e.g. by IP address in the case of an Apache log file) to forward logical groups of data to the processing units.
Hadoop is a framework that lets you concentrate on writing only the mapper and reducer parts and provides the magic around them. Hadoop is used by Facebook, for example, in combination with Hive, which provides an SQL-like interface to the processed data.
Although Hadoop is a Java technology, you can still use PHP with it, as it supports mapping and reducing via streams. You can use David’s HadooPHP for that.
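To make the stream-based mapping and reducing a bit more concrete, here is a minimal sketch of what the two halves of a streaming job could look like in PHP. This counts requests per IP in an Apache access log; the function names are mine for illustration, not HadooPHP’s actual API.

```php
<?php
// Sketch of a Hadoop Streaming job in PHP: count requests per IP in an
// Apache access log. Hadoop pipes log lines into the mapper on STDIN and
// the sorted mapper output into the reducer on STDIN. Function names are
// illustrative, not HadooPHP's actual API.

// Mapper: emit "ip<TAB>1" for every log line.
function map_line($line)
{
    $fields = preg_split('/\s+/', trim($line));
    if (empty($fields[0])) {
        return null; // skip empty or garbled lines
    }
    return $fields[0] . "\t1"; // first field of a combined log is the IP
}

// Reducer: sum the counts per key (input arrives sorted by key).
function reduce_lines(array $lines)
{
    $counts = array();
    foreach ($lines as $line) {
        list($ip, $count) = explode("\t", trim($line));
        $counts[$ip] = (isset($counts[$ip]) ? $counts[$ip] : 0) + (int) $count;
    }
    return $counts;
}
```

In a real job, two small CLI scripts would loop over `fgets(STDIN)` around these functions and be registered with Hadoop’s streaming jar as mapper and reducer.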
Lessons learned after 10 years with eZ Publish and the road into a mobile future
eZ Publish is now over 10 years old and has its roots in PHP 4. It is content-centric, follows a document-oriented storage strategy, and is mostly OOP code.
Paul said that eZ Publish regards the editor as a writer of content but not a designer of content. Content is saved as XML rather than HTML, and the editor is not meant to do any design work on their content. This strikes me as odd, as eZ is used first and foremost in the media business, full of print editors who are used to defining the look and feel of their content in print magazines…
Content types like articles can be defined without any programming involved, and a bunch of predefined content types is already available.
Content is automatically versioned, and depending on how you shape your workflows there can be intermediate states such as draft and awaiting approval.
There’s deep integration with Apache Solr for finding content, especially in the backend, and of course there are image editors and rich-text editors by now. The latest addition is eZ Flow, a kind of layout editor that allows you to build page layouts from content elements.
Paul went on with a demonstration of eZ in action, and it all seemed to make sense. But in the last few years I have had so many issues with this software that I simply don’t trust eZ. In my perception most of the backend workflows provide a bad user experience, making an editor’s life less efficient.
In addition, Paul explained that eZ has a lot of performance problems when growing to a huge amount of content, e.g. with user-generated content or high traffic. The common answer to that so far has been caching on many levels. It therefore has a maximum of about 1 million objects (this includes users). It’s just not built to scale.
Planned for the future are mobile integrations and a new content repository structure that actually does scale and supports different persistence layers such as RDBMS, NoSQL and much more (the slides mentioned Hadoop at this point, though I cannot see how that works for storage…).
Why MVC is not an application architecture
Stefan started with a little travel back in time, as far as 1979, when the Model View Controller pattern was invented, way before the web even existed. The point he wanted to make is that the inventor of the MVC pattern couldn’t take the web into account, as it simply didn’t exist back then: it was created 11 years later, in 1990.
Stefan went on to explain, with some demonstration code, that the whole point of MVC is having multiple views to one model.
The major difference on the web is that views are remote and we have no idea of changes in the view (user interaction). Of course you can do a lot of Ajax, but that cannot possibly cover all interaction. Additionally, the view is supposed to observe the model, but the rendered HTML view is out of reach and cannot be notified (unless you think of Comet, but again that cannot cover everything).
Starting from a classic MVC illustration, Stefan added common components found in most frameworks, like a FrontController, Routing, a ModelFinder, multiple Views and multiple Controllers. If you put it like that, it’s easy to see that the original MVC pattern simply does not exist in a web application.
Instead, everything apparently talks to everything else, and framework makers sell this as the MVC pattern, which it is not and cannot be.
It is important to distinguish between domain logic, application logic and presentation logic, and to understand that the model is not about data access and should not be. The model is about logic as well.
M is where the action is!
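To make that point concrete, here is a hypothetical PHP sketch (class and method names are mine, not Stefan’s): the model encapsulates domain rules instead of being a dumb record around data access.

```php
<?php
// Hypothetical sketch: the model carries domain logic; it is not a mere
// data-access record. Class and method names are illustrative.

class Account
{
    private $balance;

    public function __construct($balance = 0)
    {
        $this->balance = $balance;
    }

    // Domain logic lives in the model: the rule about what a valid
    // withdrawal is belongs here, not in a controller.
    public function withdraw($amount)
    {
        if ($amount <= 0 || $amount > $this->balance) {
            throw new InvalidArgumentException('Invalid withdrawal');
        }
        $this->balance -= $amount;
    }

    public function getBalance()
    {
        return $this->balance;
    }
}

// A controller would only translate user input into model calls and pick
// a view; it holds application logic, not domain rules.
```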
Varnish in action
Reverse proxies like Varnish are there to protect your servers.
Out of the box, Varnish only acts on caching headers: it fetches content from the server upon request, or, if it was able to cache the content before, returns it directly from the cache without talking to the server.
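So the simplest way to make a PHP backend cacheable is to send standard HTTP caching headers; a small sketch (the lifetimes here are illustrative):

```php
<?php
// Illustrative: tell Varnish (and any other cache) that this response
// may be cached for five minutes. Varnish honours standard caching
// headers out of the box unless your VCL overrides them.
header('Cache-Control: public, max-age=300, s-maxage=300');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 300) . ' GMT');

echo 'cacheable content';
```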
Thijs presented some useful tools like varnishlog, varnishtop, varnishncsa and varnishreplay that help you gather information about the traffic on your URLs. You can even connect via telnet and change settings, such as adding rulesets, on the fly without restarting the Varnish server.
Internally there is a simple workflow: receive the request, check whether it is cacheable, read it from the cache or fetch it from a server, and return the response. You can use rulesets to hook into each step of this flow. This enables you to manipulate headers, ignore cookies or trigger other actions. You can even do load balancing this way.
A good strategy is for example to use stale responses from cache when the backend replied with an error.
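One way to implement that is grace mode; a sketch in Varnish 3-style VCL (directives such as saint mode differ between Varnish versions, so treat this as illustrative):

```vcl
# Sketch in Varnish 3-style VCL; directives differ in other versions.
sub vcl_recv {
    # Accept objects up to one hour past their TTL when necessary.
    set req.grace = 1h;
}

sub vcl_fetch {
    # Keep expired objects around for an hour so they can serve as
    # stale fallbacks while the backend is in trouble.
    set beresp.grace = 1h;

    if (beresp.status >= 500) {
        # Don't cache the error: mark this backend response as bad for a
        # while (saint mode) and restart, letting a stale object be served.
        set beresp.saintmode = 10s;
        return (restart);
    }
}
```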
If you cache very aggressively, you will sometimes want to purge the cache so fresh content reaches users. You can do that from the backend server by sending a purge request to the Varnish instance; the next time a user requests that URL, Varnish will fetch it from the backend instead of serving it from the cache, even if the caching time has not yet expired.
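A purge hook might look like this (again a Varnish 3-style sketch; the ACL, status codes and messages are illustrative):

```vcl
# Sketch in Varnish 3-style VCL: accept PURGE requests from trusted hosts.
acl purgers {
    "127.0.0.1";  # e.g. the backend application server
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Not in cache";
    }
}
```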
Varnish scripts look really straightforward and easy to learn.