1. Case Study 1: Data Ingestion

ChatGPT 4

Using ChatGPT 4’s Advanced Data Analysis beta feature, the user can upload files in a variety of formats (including CSV) for the model to parse. This makes ChatGPT 4 the only one of the three models that can ingest full datasets, giving it access to all of the information needed to perform the data analysis.

At the same time, some pre-processing is needed before files are uploaded to ChatGPT 4. For example, ChatGPT 4 often has difficulty extracting information from Word and PDF files that combine text with charts, images, or graphs (see the Erwin Ephron Case Study below for further elaboration). One solution is to identify the headers, paragraphs, and general structure of the document; another is to parse the files into more manageable data. Finally, if nothing else works, a simple ‘copy’ and ‘paste’ will do.

ChatGPT 4 also had difficulty understanding datasets downloaded directly from Qualtrics. In the Privacy Report survey, for instance, ChatGPT 4 was unable to identify the data even after being told that there are three header rows and that the second row contains the actual question. The easiest fix was to delete every header row except the second one, which contains the question, and re-upload the data set.
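The header clean-up described above can also be scripted rather than done by hand. The following is a minimal sketch, assuming the export has exactly three leading header rows with the question text in the second one, as in the Privacy Report survey; the function name `keep_question_header` and the sample rows are hypothetical and used only for illustration.

```python
import csv
import io


def keep_question_header(src, dst, header_rows=3, keep_row=1):
    """Copy CSV rows from src to dst, keeping only one of the leading header rows.

    header_rows: number of header rows at the top of the export (assumed 3).
    keep_row:    zero-based index of the header row to keep (1 = second row).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for index, row in enumerate(reader):
        # Skip every header row except the one holding the question text.
        if index < header_rows and index != keep_row:
            continue
        writer.writerow(row)


# Hypothetical sample imitating a three-header-row export.
sample = io.StringIO(
    "Q1,Q2\n"
    "How old are you?,Do you read privacy policies?\n"
    "ImportId-QID1,ImportId-QID2\n"
    "34,Yes\n"
    "51,No\n"
)
cleaned = io.StringIO()
keep_question_header(sample, cleaned)
print(cleaned.getvalue())
```

For a real export, the same function can be called with two file objects opened with `newline=""`, and the resulting file uploaded in place of the original.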

Bard

Bard claims to be able to access publicly available documents via links, but in practice it could not.

Consequently, it is easier to input the data as raw text. Bard also claims to offer a faster route by linking directly to files stored in Google Drive or Dropbox, including CSV files. However, when data files were uploaded to Google Drive and Bard was asked to access them, its outputs did not match the documents’ actual content. In addition, when Bard attempted to analyze the unprocessed data file from Google Drive, it produced Python code for statistics that appear neither in the dataset nor in previous reports.

Claude AI

Claude AI has the most streamlined data ingestion process. It accepts file uploads in formats including PDF, TXT, and CSV, but its limits (a maximum of 5 documents, 10 MB each) prevent uploading the full data file.
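One possible workaround for the per-file limit, which was not part of the tests reported here, is to split a CSV into smaller parts that each repeat the header row. The sketch below is illustrative only: the function name `split_csv` is hypothetical, 10 MB is assumed to mean 10,000,000 bytes, and a data set that needs more than 5 parts would still not fit in a single upload.

```python
import csv
import io

MAX_FILES = 5              # upload limit quoted above
MAX_BYTES = 10 * 1000**2   # assumption: "10 MB" read as 10,000,000 bytes


def split_csv(text, max_bytes=MAX_BYTES):
    """Split CSV text into parts of at most max_bytes, each starting with the header.

    A single row larger than the limit still gets its own (oversized) part.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]

    def render(row):
        # Serialize one row back to a CSV line.
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(row)
        return buf.getvalue()

    header_line = render(header)
    header_size = len(header_line.encode("utf-8"))
    parts, current, size = [], [header_line], header_size
    for row in body:
        line = render(row)
        line_size = len(line.encode("utf-8"))
        # Start a new part when adding this row would exceed the limit.
        if len(current) > 1 and size + line_size > max_bytes:
            parts.append("".join(current))
            current, size = [header_line], header_size
        current.append(line)
        size += line_size
    if len(current) > 1:
        parts.append("".join(current))
    return parts


# Tiny demonstration with an artificially small limit.
parts = split_csv("a,b\n1,2\n3,4\n5,6\n", max_bytes=12)
print(len(parts), "parts; fits in one upload:", len(parts) <= MAX_FILES)
```

Splitting by rows keeps every part a valid CSV with the same columns, so each part can be described to the model in the same way.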

Claude AI did ingest the original files in CSV format but said that it could not reproduce any of the data within the file due to copyright constraints. As a result, it returned generic Python code of the kind one might use for a data analysis, which was not specific to the data file supplied (for example, all variable names and column names were invented). The code would therefore require significant changes before it could be used to analyze the actual data set.

With a free Claude AI account, the number of user interactions with the model is capped. As a result, a comprehensive analysis might need to be split across multiple sessions or days to reach the intended outcome incrementally.

Data ingestion capabilities
