Working with the Data

AP Harvester collects an ever-growing data entry log of form submissions. Every time a user submits a Harvester form, a new row is created in the form's entry sheet. This provides transparency and a clear log of your data collection process, but with certain schemas (particularly schemas with an index) it can make the entry sheet a bit more difficult to work with.

If you've set up a Harvester project and entered some data you'll see that your entry sheet contains two more columns than you may have expected. Harvester automatically injects a timestamp for the entry and a record of the user who entered the data (the latter is 0 if the user was not authenticated). After those first two columns, each column corresponds to the column entries in your schema in the same order that they appear in the schema tab.

If your dataset does not use an index then that's all great; you're keeping track of form entries and you will have one row for each entry. However, if your dataset does use an index then your entry sheet gets a little more complicated. If you enter data for the same index value twice then you'll see that you still end up with two rows: one for each form submission, like normal. But we know those two rows correspond to the same entity; that's what the index means after all!

This is where it can be helpful to export the data from Harvester rather than working with the entry sheet directly. You can export a dataset by clicking the "Download" button next to the dataset headline in the form. When Harvester exports a dataset it will only include the most recent entry for each index value, so you'll get the most up-to-date representation of your dataset.