28  Data Portal

Author

Nick Lyon

Modified

December 19, 2024

The data portal link is the preferred method for data submission for (at least phase 2). It is written in R Shiny and is built for the phase 2 template Excel file but will work (with some warnings) for the phase 1 Excel template. This is what you need to know to change and/or troubleshoot the app.

28.1 Your Job After Someone Submits Data via the Portal

The data portal puts submitted data in the “App Uploads - Phase 2” folder. The portal would be self-sufficient but I have added a step to require human involvement that I’ll describe here.

  1. You need to move all files from that folder to the “Phase II Raw Data” folder

  2. All phase 2 wrangling scripts will download raw data from that latter folder

    1. I’ve set up all of the wrangling scripts to download raw data in an if() else{} framework that will print a message reminding you to move the data out of the app upload folder if you ever forget to/don’t see that new data have been uploaded
  3. That’s it! The data wrangling scripts will work without issue now that you’ve moved the files to the correct folder

28.2 Updating the Portal

It may become necessary to edit the portal, especially if a user emails you indicating they had a problem and it seems like that problem is inside of the app rather than (not to be mean) user error.

  1. All of the portal code is in the “Data-Portal” GitHub repository

    1. The script from which the portal is created is called “app. R” and is the only file in the folder “Data Portal Actual”

    2. You will also need the “deployment-faq. R” script in the “Support Scripts” folder in order to deploy the app after you have made/tested any changes to the portal’s code. i. If it is of interest, the “Support Scripts” folder also includes my (Nick’s) incremental forays into the world of Shiny so you can see the first through eighth versions of the portal before getting to a version that was deployed.

  2. Before changing the portal I strongly recommend asking the user who pointed out the issue for a screenshot of how they’ve filled the app out immediately before the error

    1. The error is almost always (or at least has usually been) something to do with how the user filled out the app or attached their data. If that is the case, you may need only point that out to them (in a polite way) and go about your day

    2. Also, the only time the app can break is when they click the “Submit Data” button. Prior to that, the app is not actually trying to do anything, so any app-breaking user error will not be apparent to them until they click that button

    3. HOWEVER, some users who experience an issue actually create a larger error that will prevent them from uploading their data even after you point the app key issue out to them. To handle such cases, see the next subsection. The importance of this is also noted in part d of the next bullet 3. To change the data portal do the following:

    1. First, modify the app. R script as desired. Note that every Shiny App consists of three components: (1) the user interface, (2) the server that includes all the internal mechanisms for the portal, and (3) a ShinyApp() call that combines the UI and server.
    1. If the app is not working, it will likely be in the server component

    2. If the app doesn’t look right but does function appropriately, modify the UI

    3. If the app does not collect some information that it should, you will need to change both the UI and server

    4. The ShinyApp call at the end never needs to be modified so don’t worry about that bit

    1. Second, test the app on your computer by running the app
    1. In R Studio, the top right of the R script panel containing a ShinyApp has a “Run App” button to the right of a green ‘play’ button

    2. Pushing this button will create a local version of the portal that functions as the app will but does not deploy to the internet (yet).

    3. I recommend submitting a test data file (see the folder of the same name for pre-built phase 2 data that you can use) from start to finish to ensure that everything works as desired.

    1. Third, once you are satisfied with your changes, you can deploy the app to replace the old publicly-available version on the internet!
    1. In the “deployment-faq. R” script, you will load the “rsconnect” library (line 12) and then use it to redeploy the app (line 18)

    2. Running the deployApp function will prompt you in the console to type a “Y” if you’re sure that you want to re-deploy the app

    3. After you type “Y” and hit return in the console, it will build your new portal, terminate the old one, replace the old with the new, and then activate the new one for all users

    4. You’ll know this is done when R automatically kicks you to a new tab in your web browser with the new portal open

  3. Fourth, and this is crucial, if your changes to the app were because a user was having issues, you need to delete any files they successfully submitted

    1. See the next sub-section for information on how to do this/why it needs to be done
  4. Finally, notify the user that initially contacted you letting them know that you have resolved the issue on your end, thanking them for bringing it to your attention, and inviting them to reach out again if it still isn’t working for them

28.3 Deleting Old Data

The data portal will fail if it tries to create two files of the same name.

  1. The amount of information used to create the file name means that it is incredibly unlikely that two different users could accidentally create the same file name

  2. BUT, as mentioned in the “Updating the Portal” section, it is entirely possible (and has happened previously) for the same user to try to submit data more than once and inadvertently create two files of the same name

    1. This occurs when the following happens:

    2. First, the user tries to submit data using the portal but something goes wrong so their data aren’t actually submitted (but a blank GoogleSheet of the user-supplied name is created)

    1. Second, the user tries to re-submit data (possibly after you fix the issue in the portal and notify them) but the blank document they unknowingly created earlier now causes a different error (i. e. , that there are now two files of the same name)

    2. Unfortunately, because Shiny Apps are noninteractive (see the “Service Account FAQ” section) the user will never be provided with an informative error (neither will you) so you’ll need to diagnose this as part of your ‘fixing the app’ process

  3. To resolve this, I (Nick) have created a second Shiny App that is not deployed

    1. To be clear, it should never be deployed to prevent its accidental (mis)use by general HerbVar members
  4. Justification and location of the second app

    1. The second app is in the “Data-Portal-Maintenance” GitHub repository

    2. The “Service Account FAQ” section below gives more context but in brief: the ‘app key’ file that the data portal makes users attach is actually activating a sort of Google robot with the authority to create Google Sheets and move them i. This is necessary because an online portal cannot send an authorization request to each user in the way that R/R Studio does when such code is run on a local computer (again, see the “Service Account FAQ” section for more details)

    3. This ‘robot’ then is the true owner of all data files submitted through the app

    4. The portal cannot submit data to a Shared Drive (due to issues with the R packages that connect R and Google that are outside of our control) so this is an unavoidable state

    5. So, if a user accidentally creates a flawed data object of the same name as their real data they will be unable to submit their real data until the flawed one is deleted

    6. HOWEVER, because the ‘robot’ owns those files, you cannot actually delete any of its files (when you “delete” a file you don’t own you actually just remove yourself as a collaborator with no effect on the original file)

    7. Here is where we get to the need of a second app

    8. The robot’s GoogleDrive cannot be accessed via a Graphical User Interface in the way that you would access any other Google Drive

    1. So, to truly delete these files so a user can re-submit their data successfully, you will need to use this second app

    2. If you fail to delete the bad data, the user will never be able to successfully submit data of the same name 1. In theory, you could ask them to re-name their file in some slightly different way (i. e. , by changing their site name), but that would still have this flawed data floating in the ether which is not desirable

  5. Tutorial of the second app

    1. To reiterate, this Shiny app should never be deployed.

    2. You will see why, but for the moment, take my word on it that deploying this app has a non-zero potential of permanently deleting data files you actually want

    1. By keeping the script in GitHub and locking view access to only Will & the Data Scientist, we preserve its utility without opening Pandora’s box of deploying it and possibly having an HerbVar member use it improperly
    1. The second app is fully contained in the “check-service-acct-files. R” script (the only script in this project)

    2. Open that script and click the “Run App” button in the top right of the R script pane of R Studio

    3. As with updating the data submission portal, this will create a new tab in your web browser that contains a fully functional (but not available on the internet) version of the app

    4. The app is divided into three columns that you will proceed through from left to right

    5. First, download and attach the key for the service account that owns the files you want to look through (column 1)

    6. For now we only have one service account for phase

  6. I recommend creating a new service account for each subsequent phase to evade data storage limits and partition sources of error in a clean, behind-the-scenes sort of way

  1. See below for information on creating Service Accounts
  1. Second, click the “Authorize” button to notify the app that it should attach the app key (column 1)

  2. This may take a few seconds but should generate a full list of all files owned by the robot (i. e. , owned by the service account) g. Third, after looking at the list of files, click the “Extract File Names” button (column 2)

  3. This just populates the third column so don’t worry about the violence implied by the verb ‘extract’

  4. Fourth, scroll through the drop down list (column 3) and select the file you want to delete (could be a test data file or the product of a specific user’s failed attempt to upload their data)

  5. Fifth, above the dropdown menu, check the “Yes” option beneath “I am ready to delete a file”

  6. I recommend doing this after selecting a file to further mitigate the risk of deleting the wrong file j. Sixth, click the “Delete Selected File” button i. Because you attached the service account key in column 1, you are viewing and interacting with the robot’s files as the robot (rather than as yourself)

  1. This gives you access to actually delete files rather than just–as mentioned before–removing yourself from seeing the file
  1. Seventh, once a dialogue has popped up below the “Delete Selected File” button confirming the file has been deleted, click the “Update List of Drive Contents” button

  2. This will update the dropdown menu with the new file list now that the file you marked for deletion has been erased l. Finally, scroll through the dropdown menu (or look at the list of files in column 2) to ensure that all problem files have been deleted 6. After you’ve gone through that process to delete the flawed file, you can notify the user that it is safe for them to resubmit their data

  3. This is all likely too much information for the user though so I suggest that you just tell them you have fixed the data portal and leave it at that

  1. Also, I have written the app to work with any service account key that owns Google files so unless the structure of future phases’ data portals changes massively, this app should be sufficient for all issues involving service account-owned files in the future

28.4 Service Account FAQ Background Information

The data submission portal accepts uploaded data locally and then (1) creates a Google Sheet version of the data and (2) moves that sheet into the designated folder in the HerbVar Admin Drive. However, the Shiny app is “non-interactive” (see gargle’s vignette) which means that a user cannot input a gmail or access token to tell Google Drive/Sheets who is creating/moving files. A “service account” is necessary to get around this.

A service account is essentially a robot that we pre-approve to (1) create google sheets, (2) move files, and (3) have access to the folder(s) we want those sheets made in/moved to. ○ Side note: see the list of people with access to the folder the Shiny portal saves files to and you’ll see the service account I created in that list.

To create/manage a service account you need to use “Google Cloud Platform” as described below:

28.5 Tutorial

  1. Sign into the herbvar@gmail. com Google Account

  2. Visit the Google Cloud Platform

    1. If there is a pale red/pink error saying you don’t have sufficient permissions to view the page, select the herbvar@gmail. com account from the drop down in the top right of the screen

    2. The page should then re-load to the dashboard

  3. Don’t get overwhelmed by the level of detail on this page!

  4. In the left sidebar, click “APIs & Services” and within that menu click “Credentials”

  5. Click the service account name in the “Service Accounts” list at the bottom of the screen

  6. Keys can be managed in the “Keys” tab

  7. In the event of a security breach (not sure what that would look like but still good to have the contingency), delete the existing keys and create a new one

  8. Download that key and replace the one HerbVar members have access to with the new key file.

  9. If creating a new key or service account prompts you to add permissions to the account be sure that it includes BOTH the GoogleSheets API AND the GoogleDrive API

    1. Both are needed because the data portal uses the service account key to both create a GoogleSheet (using the eponymous API) and move that GoogleSheet (using the GoogleDrive API)