How to Use Generative AI for Content Analysis

Generative AI has opened up a variety of possibilities for content analysis. The AI engine can take unstructured language, such as free-form text in a survey, and organize it into categories and themes. The AI engine can also take structured writing and score it against goal-oriented criteria. Both processes help content creators and analysts alike save time in crafting messages and in finding insights delivered through human language.

In “The Gartner Predictions for 2024: Data & Analytics,” Gartner predicts that “by 2027, 75% of new analytics content will be contextualized for intelligent applications through GenAI, enabling a composable connection between insights and actions.” This mouthful basically says that data analysts will soon focus their attention on providing AI platform inputs well-suited for generative AI processing. The benefit of this purposeful approach will be insights that directly motivate and justify decision-making.

In the case of content analysis, the data analyst can best leverage generative AI’s ability to craft insightful structures from human language by applying a systematic process of inquiry that starts with prompt engineering.

Generative AI for content analysis

Prompt Engineering

For educational purposes, I scored the marketing copy of a well-known high-tech company against typical SEO (search engine optimization) criteria. It is tempting to just throw the content at a generative AI engine and then observe the magic. I prefer instead to start from structured guidance for the generative AI and reduce the need for follow-up inquiry and refinement. This structured guidance comes in the form of prompt engineering.

Prompt engineering is the modern form of systematic inquiry. This methodology requires thinking of a question as a well-structured set of instructions. Prompts are essentially code for natural language processing. Here is the prompt I structured to analyze the marketing copy:

content scoring - prompt engineering

The color-coding reveals the structure of the prompt, the code for natural language processing. The prompt starts with a request. This request tells the generative AI the domain of inquiry. In this case, I want an analysis, a structured review of the content. The prompt next provides the context for the request: a blog post titled “85% of IT Leaders See AI Boosting Productivity, but Data Integration and Overwhelmed Teams Hinder Success.” I spelled out the title to provide full context and ensure the generative AI looks at the content with an accurate notion of the article’s theme or purpose.

Since I do not want just any old analysis, I next provided a set of requirements that constrain the search space of the analysis. The requirements all relate to SEO: readability, sentence length, passive voice, transition words, consecutive sentences, subheading distribution, and paragraph length. Finally, I need a way to judge the validity of the answers from the generative AI. This validator part of the prompt is particularly powerful for domain experts who will critique the results. People who are not domain experts in the area of inquiry may need to iterate before achieving a relevant set of validators. Either way, the core principle relies on reviewing a justification and explanation according to an understandable and interpretable standard.
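The four-part structure described above (request, context, requirements, validator) can be sketched as a reusable template. This is a minimal sketch: the section texts below are illustrative assumptions, not the exact wording of my prompt.

```python
# Sketch of the prompt structure: request, context, requirements, validator.
# The section texts are illustrative placeholders, not the original prompt.

REQUEST = "Analyze the following blog post against SEO criteria."
CONTEXT = (
    'The content is a blog post titled "85% of IT Leaders See AI Boosting '
    'Productivity, but Data Integration and Overwhelmed Teams Hinder Success."'
)
REQUIREMENTS = [
    "readability",
    "sentence length",
    "passive voice",
    "transition words",
    "consecutive sentences",
    "subheading distribution",
    "paragraph length",
]
VALIDATOR = (
    "For each criterion, give a score out of 10 with an explanation "
    "and justification for the score."
)

def build_prompt() -> str:
    """Combine the four structural parts into a single prompt string."""
    requirements = "\n".join(f"- {r}" for r in REQUIREMENTS)
    return f"{REQUEST}\n\n{CONTEXT}\n\nScore these criteria:\n{requirements}\n\n{VALIDATOR}"

print(build_prompt())
```

Treating the prompt as code in this way makes each part independently editable, which pays off later when a specific criterion or validator needs refinement.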

Content Analysis with Content Scoring

For this exercise, I used ChatGPT 4. I provided the prompt to ChatGPT 4 followed by the content of the article (I did not provide a link in the prompt, to ensure ChatGPT 4 analyzed the correct content). The result was quite elegant. ChatGPT 4 produced exactly what I wanted and more. The table below shows how ChatGPT created a 10-point score along with a detailed explanation and justification for its scoring. (Large caveat: ChatGPT can exhibit variability in its responses to the exact same questions, even in the same session and account. Sometimes certain libraries for analysis become unavailable or computing resources become too constrained. In these cases, refine the prompt further to meet the specific requirements, including splitting the content into smaller chunks. I do not expect such limitations in enterprise-level generative AI platforms.)

| Scoring (of 10) | Explanation and Justification |
| --- | --- |
| Readability: 8 | Relatively easy to read, using straightforward language and clear examples to explain complex ideas. However, the dense technical content and industry-specific terms may slightly hinder readability for laypersons. |
| Sentence Length: 7 | Maintains a balanced sentence length, mixing shorter sentences with longer, more complex ones to convey detailed information without overwhelming the reader. Still, some sections could benefit from shorter sentences to enhance clarity. |
| Passive Voice: 6 | Moderate use of passive voice, which can sometimes obscure the subject or make statements less direct. Reducing passive voice and using more active constructions could improve engagement and readability. |
| Transition Words: 7 | Uses transition words effectively in several places, guiding the reader through the narrative and arguments. However, more consistent use across all sections would help in making the content flow more smoothly. |
| Consecutive Sentences: 8 | Does well in varying the structure and beginning of consecutive sentences, which helps in keeping the reader’s attention. There are few instances of repetitive sentence starts, contributing positively to the reading experience. |
| Subheading Distribution: 9 | Subheadings are well-distributed, breaking up the text into manageable, thematic blocks. This structure aids in scanning and understanding the content, which is crucial for both reader engagement and SEO. |
| Paragraph Length: 8 | Paragraphs are generally concise, aiding in readability and making the content appear less daunting. Some sections could be broken down further to enhance skimmability, especially for readers on mobile devices. |
Content scoring: Inspect detailed analysis – are explanations compelling?
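The hand-off described above, prompt first and article text pasted directly after (no link), can be sketched as a chat-style request payload. This is a sketch under assumptions: the model identifier, variable contents, and separator are illustrative placeholders.

```python
# Sketch of a chat-style request payload: the engineered prompt first, then
# the full article text appended directly (no link) so the model analyzes
# exactly the intended content. Model name and texts are placeholders.

PROMPT = "Analyze the following blog post against SEO criteria ..."
ARTICLE_TEXT = (
    "85% of IT Leaders See AI Boosting Productivity, but Data Integration "
    "and Overwhelmed Teams Hinder Success ... (full article text pasted here)"
)

payload = {
    "model": "gpt-4",  # illustrative model identifier
    "messages": [
        {"role": "user", "content": f"{PROMPT}\n\n---\n\n{ARTICLE_TEXT}"},
    ],
}

# A client library would then submit the payload, e.g. (not executed here):
# response = client.chat.completions.create(**payload)
```

Pasting the content rather than linking it removes any ambiguity about what the model actually analyzed, which matters given the variability noted above.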

I next judged whether I found the results compelling. In more advanced scenarios, each justification and explanation could be a branch of further inquiry. For example, I might ask for specific examples from the content to get a fuller explanation of a given score. In this case, I was quite satisfied with the results!

ChatGPT 4 also provided a summary of the results which revealed an interesting structure that could also branch out to further inquiry.

| Scoring (of 10) | Explanation and Justification |
| --- | --- |
| Overall Score: 7.5 | This blog post effectively communicates its message, supported by research and statistics. To further improve, focusing on reducing passive voice, shortening some of the longer sentences, and enhancing the use of transition words across the entire text could make the content even more accessible and engaging. Additionally, a stronger focus on SEO techniques such as strategic keyword use, optimization of meta elements, and link strategy could increase visibility and reach. |
Content scoring: Inspect the summary for high-level validation that the correct problem is in play
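One quick validation of the summary: the overall 7.5 sits very close to the plain average of the seven category scores. Whether ChatGPT 4 actually averaged them is an assumption worth probing as its own branch of inquiry, but the arithmetic is easy to check:

```python
# Compare ChatGPT 4's reported overall score to the simple average of its
# seven category scores (values transcribed from the scoring table above).

scores = {
    "readability": 8,
    "sentence_length": 7,
    "passive_voice": 6,
    "transition_words": 7,
    "consecutive_sentences": 8,
    "subheading_distribution": 9,
    "paragraph_length": 8,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 7.57, close to the reported overall of 7.5
```

This kind of arithmetic spot check is a cheap first validator before digging into the qualitative justifications.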

In bold are key structural elements of the overall summary. The claim of effectiveness is the summary of all the scores. ChatGPT 4 then moves on to provide some specific recommendations based on the scoring. The AI specifically goes after the categories with the lowest scores. Finally, there is a component that may require further exploration. ChatGPT 4 makes reference to “SEO techniques” that I did not include in my prompt and some of which did not directly come from my request or the content. Depending on the use case, I can reject these bonus recommendations out of hand or start more branches of inquiry.

Calibrating Content Scoring

A domain expert can take the content scoring a step further by comparing the scores to established techniques. This calibration step provides directional validation of ChatGPT 4’s results. In some cases, the calibration step could even challenge the established technique!

Yoast provides a free plug-in for WordPress to help content creators optimize content for SEO. I examined the available scoring for the marketing copy and compared it to the ChatGPT 4 scores.

| Category | ChatGPT 4 | Yoast |
| --- | --- | --- |
| % of Long Sentences (>20 words) | 78% | 51% |
| % of Sentences in Passive Voice | 33% | 10% |
| % of Sentences with Transition Words | 22% | 23% |
Compare content scoring against established/known standards: ChatGPT vs Yoast

ChatGPT and Yoast agree on the percentage of sentences with transition words. The alignment is encouraging. The other two categories are far apart. I manually counted 59% of the sentences as being “long”. Given the gaps, I would look for differences in methodology (at the time of writing, ChatGPT 4 was not able to reprocess the count while excluding stop words).
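To probe such methodology gaps, a minimal counter for the “>20 words” metric can serve as a third opinion. This is a rough sketch: the naive sentence splitter below (splitting on periods, exclamation points, and question marks) is itself an assumption, and differences in splitting rules are one plausible source of disagreement between tools.

```python
# Minimal long-sentence counter: percentage of sentences over a word threshold.
# The naive sentence splitter is a deliberate simplification; real tools may
# split sentences differently, which can move the percentage substantially.
import re

def pct_long_sentences(text: str, threshold: int = 20) -> float:
    """Return the percentage of sentences with more than `threshold` words."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    long_count = sum(1 for s in sentences if len(s.split()) > threshold)
    return 100 * long_count / len(sentences)

sample = (
    "Short sentence. "
    "This is a much longer sentence that keeps going and going with many "
    "extra words to be sure it comfortably clears the twenty word threshold."
)
print(pct_long_sentences(sample))  # 50.0
```

Running a simple script like this over the actual copy would show which tool's count the raw text supports.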

The gap in scoring passive voice was most concerning since Yoast sets a 10% threshold for SEO-ready content. At the time of writing, ChatGPT 4 was unable to produce a list of the passive voice sentences from the entire document. As a workaround, I divided the document into smaller chunks and prompted ChatGPT 4 to identify the passive voice in each segment of content. Surprisingly, using this method, ChatGPT identified fewer passive voice sentences than Yoast. Among the 4 disagreements, I sided with ChatGPT just once. In one case, ChatGPT 4 seemingly got distracted by two uses of the verb “to be” in the same sentence. This close examination highlighted the ongoing need for human assessment and validation of generative AI results! Moreover, sampling may reveal surprising limits to the computing power the generative AI chose to apply to the content analysis.

For reference, ChatGPT provided the following definition of passive voice: “To identify passive sentences, we look for phrases that typically involve a form of the verb “to be” (is, are, was, were) followed by a past participle (often ending in -ed, though there are many irregular verbs)”.
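That stated heuristic can be expressed as a rough regex check. Note its blind spots: it only catches regular “-ed” participles, so irregular verbs like “written” slip through, and predicate adjectives (e.g., “was tired”) trigger false positives. These are exactly the kinds of gaps that demand human review.

```python
# Rough implementation of the stated heuristic: a form of "to be" followed
# by a word ending in -ed. Irregular participles (e.g., "written") are
# missed and adjectives like "tired" are false positives, so treat results
# as a starting point for human review, not a verdict.
import re

PASSIVE = re.compile(r"\b(is|are|was|were)\s+(\w+ed)\b", re.IGNORECASE)

def looks_passive(sentence: str) -> bool:
    """Flag a sentence that matches the simple 'to be' + -ed participle pattern."""
    return PASSIVE.search(sentence) is not None

print(looks_passive("The report was reviewed by the analyst."))  # True
print(looks_passive("The analyst reviewed the report."))         # False
```

Comparing a transparent rule like this against both tools' counts helps locate exactly where their methodologies diverge.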

Conclusion

Generative AI can be a powerful tool for content analysis. Combined with human domain expertise, generative AI can provide an assessment of content according to goal-oriented criteria. The data analyst remains at the center of the content analysis given the need to validate results, whether through AI-generated justifications and explanations or through manual sampling of content. (Note that in cases where ChatGPT 4 is unable to process the data, it can still provide Python code for analyzing the content offline on another platform.)

Content analysis through generative AI sets the stage for more intelligent, efficient, and effective content strategies. The capabilities for content analysis should improve over time by reducing the need for validation and expanding the breadth of content coverage.

Let me know in the comments your success and challenges with content analysis using generative AI!

Do you want help in setting up a systematic content analysis process? Contact Ahan Analytics, LLC for more information.