Title: Data Mining, Big Data, and Data Ownership
Date: Tue Aug 2, 2016
Time: 8:00 AM - 9:40 AM
Moderator: N/A
'Spatial Discontinuity Analysis' a Novel Geostatistical Algorithm for On-farm Experimentation

Traditional agronomic experimentation is restricted to small plots. Under appropriate experimental designs the effects of uncontrolled environmental variables are minimized and the measured responses (e.g. in yields) are compared to controllable inputs (seed, tillage, fertilizer, pesticides) using well-trusted design-based statistical methods.

However, the implementation of such experiments can be complex and the application, management, and harvesting of treated areas might have to be done manually or with specialist equipment. Furthermore, these experiments only compare treatment performance over a relatively small area and the same relationships might not apply over larger management zones, or fields. In addition, the small area of the experiments might limit their precision.

These problems have motivated a number of researchers and farmers to consider field-scale experiments, which tend to be based on systematic rather than randomized designs. Systematic experimental designs enable different treatments to be applied relatively simply using farm equipment. However, design-based statistical methods are not applicable to systematic designs, and correlation amongst the observed responses can lead to exaggerated estimates of the significance of any observed treatment differences. Therefore, geostatistical or model-based methods must be used to quantify and account for this correlation.

In this study the statistical analysis of such systematic experiments is considered with reference to a trial where the fertiliser nitrogen rate was varied on an arable field. The response variable was a vegetation index derived from an aerial photograph. The magnitude and significance of the average treatment effects across the experiments can be calculated if the data are represented by a linear mixed model which includes spatial correlation. When the spatial correlation was neglected, the confidence intervals for the treatment effects were erroneously small. Furthermore, a novel analysis method referred to as spatial discontinuity analysis (SDA) is proposed. SDA was used to focus on the boundaries between different treatments and to test whether there was a significant jump in the response variable where the treatment changed. In this context, the spatial correlation was advantageous since, in the absence of treatment effects, the expected differences between adjacent observations were smaller than if they had been uncorrelated. Therefore, the local treatment effects could be more easily distinguished from the underlying variation of the response variable.
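
The two effects of spatial correlation described in this abstract can be illustrated with a minimal sketch. It assumes a second-order stationary response with variance sigma2 and an exponential correlation function; the abstract does not state which covariance model the authors fitted, so that choice and all function names here are hypothetical. Under these assumptions, the variance of the difference between two observations a distance h apart is 2 * sigma2 * (1 - rho(h)), which is smaller than the uncorrelated value 2 * sigma2, while the variance of a mean of positively correlated observations is larger than the naive sigma2 / n.

```python
import math


def exp_correlation(h, range_a):
    """Exponential correlation at separation h (an assumed model, not the authors')."""
    return math.exp(-h / range_a)


def variance_of_difference(sigma2, h, range_a):
    """Var(Z(s) - Z(s + h)) = 2 * sigma2 * (1 - rho(h)) for a stationary field."""
    return 2.0 * sigma2 * (1.0 - exp_correlation(h, range_a))


def variance_of_mean(sigma2, n, spacing, range_a):
    """Variance of the mean of n equally spaced observations along a transect.

    Sums the full n-by-n correlation matrix, so correlation between every
    pair of observations is accounted for.
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += exp_correlation(abs(i - j) * spacing, range_a)
    return sigma2 * total / n ** 2


if __name__ == "__main__":
    sigma2, spacing, range_a, n = 1.0, 1.0, 10.0, 20  # illustrative values only
    print("difference variance, correlated:  ", variance_of_difference(sigma2, spacing, range_a))
    print("difference variance, uncorrelated:", 2.0 * sigma2)
    print("variance of mean, correlated:     ", variance_of_mean(sigma2, n, spacing, range_a))
    print("variance of mean, naive sigma2/n: ", sigma2 / n)
```

The first comparison is why a jump at a treatment boundary is easier to detect when neighbouring observations are correlated; the second is why confidence intervals that ignore the correlation come out too narrow.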

Benjamin Marchant (speaker)
Sebastian Rudolph
Roger Sylvester-Bradley
Length (approx): 20 min
Translating Data into Knowledge - Precision Agriculture Database in a Sugarcane Production.

With the advent of information technology in agriculture, surveying and data collection became simple tasks, starting the era of "Big Data" in agricultural production. Currently, a large volume of data and information associated with the plant, soil and climate is collected quickly and easily. These factors influence productivity, operating costs, investments and environmental impacts. However, a major challenge for this area is the transformation of data and information (collected in the field) into applicable knowledge. Within the context of Precision Agriculture (PA), which comprises a set of tools and technologies for georeferenced data collection to understand and manage inherent spatial variability within crop fields, the Brazilian sugarcane industry lacks results to assist farmers. The hypothesis of this work is that knowledge of the spatial variability of soil fertility and crop productivity, obtained through the application of data mining techniques, makes it possible to assist sugarcane producers in the correct management of the crop. Two areas cultivated with sugarcane, of 10 and 30 ha, were monitored over the years 2012, 2013 and 2014. During this period, soil sampling was carried out annually (117 and 107 points, respectively) and yield maps were recorded using a yield monitor. Using a computational environment created to support sugarcane agricultural research, data acquisition, formatting, verification, storage, and analysis were performed, with principal component analysis (PCA) and decision trees used for knowledge extraction. The results show that variation in sugarcane yield is largely related to soil texture, the amount of available organic matter and soil pH. Where the levels of organic matter increased from one year to the next, there was an increase in cation exchange capacity (CTC) and greater availability of potassium and phosphorus.
Based on the rules derived from the decision tree analysis, it is possible to create specific management zones in the field that support the grower's decision making. With the expanded dataset, we expect to recognize relevant patterns that are reproduced consistently across distinct experiments, assisting producers in correct crop management and improving the profitability of production.
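
As a rough illustration of how a decision tree turns soil measurements into management rules, the sketch below finds the single threshold on one soil variable that best separates low-yield from high-yield sampling points. This is only the basic building block of a regression tree, not the authors' analysis: the abstract does not specify the tree algorithm, and the function names and the toy organic matter and yield values are made up for the example.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split on x that minimizes
    the within-group sum of squares of y, or None if no split is possible."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical sampling points: organic matter (%) and yield (t/ha).
organic_matter = [1.0, 1.2, 1.5, 2.4, 2.6, 3.0]
cane_yield = [60.0, 62.0, 61.0, 80.0, 82.0, 81.0]

if __name__ == "__main__":
    threshold, total = best_split(organic_matter, cane_yield)
    print(f"Rule: organic matter <= {threshold:.2f} % defines the lower-yield zone")
```

A full tree repeats this search over every soil variable and recursively within each resulting group; the thresholds along each branch are the rules that can then be mapped back onto the field as candidate management zones.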

Paulo Magalhaes (speaker)
Length (approx): 20 min
Surplus Science and a Non-linear Model for the Development of Precision Agriculture Technology

The advent of ‘big data technologies’ such as hyperspectral imaging means that Precision Agriculture (PA) developers now have access to superabundant and highly heterogeneous data. The authors explore the limitations of the classic science model in this situation and propose a new non-linear process that is not based on the premise of controlled data scarcity. The study followed a science team tasked with developing highly advanced hyperspectral techniques for a ‘low tech’ sector in which non-adoption by farmers is a significant risk. Hyperspectral imaging creates multi-layered, geo-referenced data early in the science process in superabundance. This data is created at high speed in near real-time and does not require expensive ground sampling. The data is extremely versatile and has the potential for many different measurements from one record. These data traits increase the likelihood of producing ‘surplus science’, that is, science that exceeds what was judged necessary to solve the problem as defined at project launch. The production of superabundant and highly versatile data early in the science process increases the possibility of discovering new forms of valuable knowledge (methods and solutions) during the course of an investigation. However, realizing the value of these opportunities requires a departure from the classic science model. Under data-scarcity conditions, such surplus science would be classified as undesirable ‘project creep’. In response we propose an alternative process based on a non-linear, iterative approach that utilizes heterogeneous actors to refine value from hyperspectral data. The paper documents how a ‘big-data’ setting generates surplus science and unexpected value possibilities. We outline the challenges that science teams face if they are to realize these possibilities.
These challenges include the linearity of project design and setup, which limits the ability to identify unexpected opportunities and re-organize in response. Moreover, the science team may not have either sufficient time or appropriate expertise to exploit an opportunity. In light of these findings, it is proposed that, for innovation in the PA sector to make the necessary rapid advances both technically and in terms of adoption, changes are needed in the way research projects are funded and structured. In addition, we suggest changes to the make-up of science teams and the inclusion of a variety of end-user perspectives during the research and development process.


Megan Cushnahan (speaker)
Russell Wilson
Length (approx): 20 min
Key Data Ownership, Privacy and Protection Issues and Strategies for the International Precision Agriculture Industry

Precision agriculture companies seek to leverage technology to process greater volumes and varieties of data at a velocity unfathomable to most. The promises of boundless benefits are coupled with risks associated with data ownership, stewardship and privacy. This paper presents some risks related to the management of farm data in general, as well as those unique to operating in the international arena. Examples of U.S. and international laws related to data protection are also provided. Finally, the paper examines best practices for drafting agreements that can help manage the risks relevant to companies that currently operate in, or plan to expand their business to, areas outside the United States.

Joan Archer (speaker)
Length (approx): 20 min
Ownership and Protections of Farm Data

Farm data has been a contentious point of debate with respect to ownership rights and the impacts when access rights are misappropriated. One of the leading questions farmers ask concerns the protections provided to farm data. Although no specific laws or precedents exist, the possibility of trade secret protection is examined and the ramifications for damages are discussed. Farm management examples are provided to emphasize the potential outcomes of each possible recourse for the misappropriation of farm data.

Paul Goeringer (speaker)
Ashley Ellixson
Length (approx): 20 min