September 23, 2010
One of the inevitable and immensely positive side effects of being the Community Manager at Basho Technologies has been taking a keen interest in other open source projects championed by members of our community. One such member is Chris Villalobos.
I first had the pleasure of speaking with Chris some months back after he let it leak that he used Riak to build a distributed event registration system for his church (about which I quickly coerced him into writing a blog post). Chris has since changed jobs and is now an open source developer working at the University of Florida.
The University of Florida is one of eight universities that make up the Southeast Climate Consortium (SECC), whose mission "is to use advances in climate sciences...to provide scientifically sound information and decision support tools for agricultural ecosystems, forests and other terrestrial ecosystems, and coastal ecosystems of the Southeastern USA." Chris is now working on the Open AgroClimate Project which is an extension of the SECC. Open AgroClimate is helping farmers and other providers in the Southeast USA, South America, and soon the world, manage their farming resources more effectively given differing climate conditions using very specialized and soon-to-be open source software.
I interviewed Chris about Open AgroClimate and, more specifically, his role and how he is working to open source these valuable climate risk tools that have the potential to help farmers the world over.
My questions are in bold. Chris' responses are in standard text.
What is the Southeast Climate Consortium and how does it relate to Open AgroClimate?
The Southeast Climate Consortium has been around for about four years with the purpose of creating climate risk tools to help farmers local to the Southeast US. They started a website called AgroClimate which is a collection of software tools built to help local producers manage their resources, such as crops, and assess their risks. It's a project that started as an outreach from different agricultural providers in the Southeast wanting to know things like rainfall patterns and how crops were affected by the area's climate.
To go back a bit further, what they did was to take the work of professors and PhD students in the area of agriculture and wrap code around it to make it actually work. There had been much work in the field which aimed to promote better crop growth through the study of historical weather patterns. They are now building out more crop-specific tools that take into account, for example, how weather patterns will affect their upcoming crop. The result of this work is primarily the interactive tools accessible through the AgroClimate site.
What I'm working on is known as Open AgroClimate, which is an extension of the AgroClimate project with the emphasis on open sourcing these tools to expand their usage and development. The SECC made the decision to open source these tools last year. They want to get bigger than their current scope in the Southeast and they need more contributors to make it grow. They use R-Scripts extensively in the tools and had seen it succeed as an open source project and reasoned, "Why can't we do it?" Not much has been done outside of just saying, "Let's open source it." And, so, I was hired because I am passionate about open source. I've been focusing on that aspect for a few months now.
What exactly are these tools and how will they benefit from being open source?
The tools are different software programs - currently various, continuously-run algorithms in the form of PHP scripts - that use historical data from different weather stations to calculate weather patterns and then determine how a given weather pattern will affect a given crop in a given area. For example, the Drought Tools show you the risk of drought in your area based on various factors; the Climate Risk Tool shows you what your current climate situation is and how you should account for it moving forward; another example is the Strawberry Disease Tool. This will show farmers how likely their strawberries are to get a disease based on the area they are in and what pesticides they've already sprayed. It will also make recommendations on what to do next to hopefully ensure high strawberry yield.
At the moment, these tools (of which there are about ten) are focused strictly on the areas covered by the SECC. We want to expand to other areas. For example, a person in Texas wants to use it to help with cotton growth. By virtue of the tools being open source, he can take these algorithms and the different formulas we are using, plug in his weather data, and modify it for cotton.
We want to take it to other countries, too. Individuals in Brazil and Paraguay are currently interested in using the software. In order to do that, however, we needed a better development platform. So I put in a lot of time to switch us to the WordPress development platform, which makes it relatively easy for an entry level user to manage. And, by making the source open and accessible, the tools will be easier to adapt to their needs. This is what I'm spending the majority of my time on.
My long-term plan is to expand this to as many countries as possible. I see this having a huge impact in third world countries. Most of these countries have the data lying around but they have no way to apply it in a useful manner to help their farmers grow more efficiently. If, however, a local farmer in Ethiopia were able to access a municipal website and see when it was best to plant a given crop, this would make their practices much more sustainable. And obviously, the more open we make the tools, code, and project itself, the easier it is to spread the usage.
It's a large project and I don't know of any other agricultural climate risk type tools that are open source. This should be able to be implemented in any country, and we feel compelled to make it work as advertised, in an open nature, as there aren't many projects doing anything like this with as much potential and scope.
What language are the tools actually written in?
It's primarily in PHP. We are also using R scripts to do some of the data analysis and graphics on the backend, which, at the moment, are specific to what the SECC is doing. But we are going to use our R scripts to guide others and release this code, alongside the PHP code, entirely open source. I'm also hoping to standardize on jQuery and WordPress for platform development as it's easy for an end user to set up and configure.
What hurdles are you running into?
The bureaucratic nature of academia makes it hard to get decisions made in a timely manner on a project like this. The largest hurdle at the moment, for instance, is simply settling on which license to use. From what I understand, the question of which license to use has been in the doldrums for about a year now. The first thing I did when I arrived was to gather the appropriate approvals to expedite the process such that, at the moment, the license issue is being reviewed by the final committee. What this means is that we should have a decision within the coming weeks.
Intellectual Property is a problem, too. We are coming from the university/academic background, so IP issues are delicate. As I mentioned before, the tools are based on the thesis work of various PhD students. It can be difficult to get people to understand that we are taking what was once their thesis and simply wrapping the appropriate code around it to make it more functional.
Getting people used to open source tools for development has also been a hurdle. Something as simple as Git or Mercurial, which someone like me is quite accustomed to, is unfamiliar to a lot of people. I'm spending my time training a lot of people. Remember, the code is partly being written by scientists and students, not career developers. This also lends itself to code which could be a lot cleaner. As a result, I've been working with our developers to refactor our various code bases before they're released.
So, unfortunately, you're still working on actually open sourcing the code. What will the final license be?
I want to use the BSD License. The MIT License does offer the amount of openness we are looking for, but due to the academic nature of the tools and the code, I would really prefer the language used in the BSD's "Non-endorsement" clause. As for the GPL, for our purposes it's too limiting. I want people to be able to roll this into a commercial product if they desire to do so. Just as long as there is proper attribution, we are fine with it.
Where will the project be hosted?
As of right now, we are planning on using GitHub when the license issue has been ironed out. One of the goals is to raise the number of contributors and awareness, and GitHub has proven itself quite capable in that respect. We will also maintain our own repositories on our servers.
So Open AgroClimate will be providing the code to help farmers manage their crop risk. Where are these farmers getting the weather data?
Right now, the SECC is getting the data from organizations like the NOAA/NWS, and the University has connections with local weather extensions such as FAWN and AEMN. Suffice it to say, there are many data sources.
Unfortunately, at the moment the code base is tailored to the needs of the SECC. One of the tools I am currently working on will enable what I am calling "Plug and Play" data usage. When complete, you'll be able to point your data source at it, whatever form it may be in (CSV, Excel, flat file, JSON, etc.). And it's modular, so that if there isn't code already available that will connect your type of data, you (or another contributor) can write a connector, enabling you to parse the data and load it into the primary database where we can process and analyze it with the tools.
Flexibility with data sources is a large component of the project, and it becomes more critical as we move forward because we really will need to be able to parse anything. A while back someone said, "Let's just standardize on XML." You're going to tell me that some weather station in Paraguay is going to be storing their weather data in XML? Doubtful. So the ability to be flexible with data types will be essential to our expansion.
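[Editor's note] To make the "Plug and Play" connector idea described above more concrete, here is a minimal sketch of how a modular connector registry could look. It is not the Open AgroClimate code, which is written in PHP and had not been released at the time of this interview. Python is used purely for illustration, and every name in it (`register`, `load`, `CONNECTORS`, the `station`/`date`/`rain_mm` fields) is a hypothetical choice made for this example. The final step of loading records into the primary database is left out.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# A normalized weather record. The field names are assumptions for this sketch.
Record = Dict[str, object]
Connector = Callable[[str], List[Record]]

# Registry mapping a format name (e.g. "csv") to the function that parses it.
CONNECTORS: Dict[str, Connector] = {}


def register(fmt: str) -> Callable[[Connector], Connector]:
    """Decorator that adds a parsing function to the registry under `fmt`."""
    def decorator(func: Connector) -> Connector:
        CONNECTORS[fmt] = func
        return func
    return decorator


def normalize(row: Dict[str, object]) -> Record:
    """Coerce one raw row into the common record shape used downstream."""
    return {
        "station": str(row["station"]),
        "date": str(row["date"]),
        "rain_mm": float(row["rain_mm"]),
    }


@register("csv")
def parse_csv(text: str) -> List[Record]:
    """Connector for CSV text with a header row."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


@register("json")
def parse_json(text: str) -> List[Record]:
    """Connector for a JSON array of objects."""
    return [normalize(row) for row in json.loads(text)]


def load(fmt: str, text: str) -> List[Record]:
    """Look up the connector for `fmt` and return normalized records."""
    try:
        connector = CONNECTORS[fmt]
    except KeyError:
        raise ValueError(f"no connector registered for format {fmt!r}") from None
    return connector(text)


# Two sources in different formats yield identical normalized records.
csv_text = "station,date,rain_mm\nSTATION-1,2010-09-01,12.5\n"
json_text = '[{"station": "STATION-1", "date": "2010-09-01", "rain_mm": 12.5}]'
records_from_csv = load("csv", csv_text)
records_from_json = load("json", json_text)
```

Under this design, supporting a new data format means writing one more function and decorating it with `register`; the analysis tools only ever see the normalized records, so they do not need to change when a new data source appears.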
Are you actively recruiting data providers?
It's not a primary initiative. That will come as we start rolling out the infrastructure to places like Paraguay. They will be our first test. They have data sources and types that we've not yet encountered.
Besides an internet connection, what does a rural farmer in Paraguay need to know and do to use your tools?
Thankfully, the way it's set up right now, all the data that a farmer would need is coming from weather stations. End users need only specify where they are and they're off and running. This is how it works with the risk planning tools. In the case of the strawberry planning tool, you input what you've already done to the crop, and it shows you what your risk of disease is based on various factors. It will also make recommendations on what pesticides to use or not to use. So, in essence, they only need to input information that they are likely to already know about their land and farming practices.
For some of the tools we actually have a mobile application that works on various smart phone platforms. It's less detailed but it's still quite useful. And we already know of farmers who take their cell phones into the fields with them to use while they are farming.
I should also note that one of the primary tenets is "Don't Develop in Isolation." We encourage developers to work with the farmers (if possible) when developing these tools such that if the farmers can't figure out how to use them, it needs to be changed. That is where our heart is. Not all scientists or developers think like end users.
How many people are contributing at the moment?
At the University of Florida there are about six. We also have some in Brazil, and several throughout the universities that compose the SECC.
Once all the tools are officially open sourced and more accessible, what types of contributors are you looking for?
Let's see...basic tool developers would be one type. The front end interfaces need writing and converting into different crops like cotton and corn. Support for using different data calculation methods would be great, too. We also need database and backend developers. We are using MySQL currently to store the data but would love to have support for other DBs if they were needed and more useful. We also need people to write the code that makes it possible for any data source to be plugged into one of the tools for analysis. That's the "Plug and Play" code I mentioned before.
Graphics designers would also be a huge win for us, especially as we move towards other demographics. At the moment we are working with one template which has served us well but having more options would be optimal.
Finally, translators are needed. We would like people to be able to see these tools in their locale, and because of the internationalization support of WordPress, it is very possible. The tools currently under development and going forward will use .po/.mo files for translation purposes.
Needless to say, there is a lot of room for contribution, so we want to talk to anyone and everyone who is interested.
Speaking of contributors, what is the best way for a developer to get involved right now?
Right now, the best way for people to get involved is to stay informed of the progress via the mailing lists and forums on the Open AgroClimate site. They are admittedly sparse at the moment, but as we come closer to starting the engines for a release, they will be the hub of communication for the project.