Please view our final project website, Digital History Methods, to see what we produced in the Spring 2014 semester!
As we continue to address the question of how similar Texas runaway ads were to those of other states, we have begun focusing on a handful of specific topics within that question. This week we have been working on answering some of the questions we had after the close-reading and initial digital reading of the ads. Daniel and I split up the labour by tackling different questions. Below, find results for the digital reading I conducted on my half of the questions.
One of the things I was interested in after completing the close-reading was whether Texas had a higher frequency of group runaway attempts. In TAPoR’s Comparator, a higher occurrence of the plural word “negros” suggests that Texas might have had more multi-person runaway attempts.
However, 19th-century ads seem to have more commonly spelled “negroes” with an “e.” (Just as a side note, this highlights the importance of checking for variations or abnormalities in spelling when conducting digital research.) Although the raw count for “negroes” is higher in Arkansas, the relative frequency based on document length is slightly higher for Texas.
For the word “runaways,” Texas and Arkansas are relatively equal. Based on these word counts alone, it is difficult to draw a conclusion about group runaways in Texas, but it appears that the rates at which group runaways were mentioned in Texas and Arkansas were roughly equivalent.
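The difference between a raw count and a relative frequency, and the effect of folding spelling variants together, can be illustrated with a short Python sketch. This is not how TAPoR computes its figures; it is a minimal illustration that assumes a simple "occurrences per 1,000 words" measure. The two miniature corpora and the function names are hypothetical stand-ins for the real ad collections.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(text, variants, per=1000):
    """Return (raw count, occurrences per `per` words) for a set of spelling variants."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    raw = sum(counts[v] for v in variants)
    rate = raw / len(tokens) * per if tokens else 0.0
    return raw, rate


# Hypothetical miniature corpora (assumption: invented for illustration only).
texas = "Ranaway two negroes named Sam and Joe. The negros took a horse."
arkansas = (
    "Ranaway from the subscriber three negroes. A reward will be paid for the "
    "negroes if the negroes are lodged in any jail in the state."
)

for name, corpus in [("Texas", texas), ("Arkansas", arkansas)]:
    raw, rate = relative_frequency(corpus, {"negros", "negroes"})
    print(f"{name}: raw count = {raw}, per 1,000 words = {rate:.1f}")
```

In this toy example the longer corpus has the higher raw count but the lower rate, which is the same kind of reversal described above for Arkansas and Texas.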
In comparison to the Mississippi corpus, Texas rates of “negros” and “negroes” are both higher.
However, use of “runaways” is significantly larger in Mississippi.
Curious about why this might be, I clicked through to the word in context section.
This snippet from the use of the word “runaways” in the Mississippi ads reveals that it was often used in relation to jailor’s notices. Even more so than other runaway ads, jailor’s notices tend to follow a very standard format, often following a word-for-word form. In Mississippi, many of them are titled “Runaways In Jail” and conclude with a note about the “law upon the subject of runaways,” as seen above. This is one example of how the presence of jailor’s notices in the mix of runaway ads can skew the trends in one direction or another.
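A word-in-context view like the one described here can be approximated in a few lines of Python. The sketch below is not TAPoR's implementation; the function name, the context width, and the sample notice (assembled around the two phrases quoted above) are assumptions made for illustration.

```python
import re


def keyword_in_context(text, keyword, width=40):
    """Return each occurrence of `keyword` with up to `width` characters of context on each side."""
    snippets = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end].replace("\n", " "))
    return snippets


# Hypothetical notice text, built around the phrases quoted in the post.
notice = (
    "RUNAWAYS IN JAIL. Was committed to the jail of this county a runaway, "
    "who will be dealt with according to the law upon the subject of runaways."
)

for snippet in keyword_in_context(notice, "runaways"):
    print("...", snippet, "...")
```

Reading the snippets side by side makes formulaic uses, such as a repeated title or closing legal phrase, easy to spot.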
Ultimately, the digital reading of the ads through TAPoR suggests to me that Texas did not have a significantly higher frequency of group runaways than either Arkansas or Mississippi. If we choose to pursue this question more closely, it will be necessary to combine digital and close-reading to confirm this one way or the other.
Note: For an in-depth explanation of how TAPoR calculates relative frequency ratios, see this earlier post explaining the calculations behind the figures.
Thieves or accomplices?
In my rough-draft of the close-reading of the ads, I noticed that many runaway ads suspect an accomplice of persuading the slave to run away, or a thief of stealing the slave for resale. These ads provide historical clues about some of the routes of aid that runaways might have had, or perhaps, instances in which slaves were made to “run away” against their will. In ads where the subscriber suspects a thief, it is not possible to know whether those suspicions are accurate or not, but it can give us a sense of the climate in that region. If subscribers are more suspicious of thieves or accomplices in one region vs. another, it may suggest a situation of lawlessness and slaveholder fear in that state.
In Texas compared to Mississippi, the word “stolen” appears relatively more frequently, but the word “thief” relatively less frequently.
Another word which appears frequently in this category of ad is “persuaded,” which appears more frequently in Mississippi. This screenshot shows snippets of how the word “persuaded” appears in the ads.
To me, these results suggest that the Mississippi ads may have had more instances of accomplice and/or thief suspicions than Texas. Texas and Arkansas had very similar rates for the words “stolen,” “thief,” and “persuaded”.
In Voyant Tools, this embedded word-trends graph from the Telegraph and Texas Register collection shows the correlation between the words “thief” and “stolen”. Both words follow similar patterns across the corpus.
One of the benefits of Voyant is that it allows users to copy an html link to embed Voyant tools into their own blog. Rather than a static screenshot, readers can change the settings on the embedded tool. Try playing around with it!
This tool also allows you to visualize where in the corpus peaks in slave theft occur. Patterns such as these raise questions about why theft might have spiked in the Houston area at that time. This is just one example of how a digital reading can reveal patterns that are difficult to discern through close-reading alone, and can inspire further close-reading through the questions it raises.
Racial Descriptors? Differences in Racial/Ethnic involvement?
In most runaway ads, the subscriber tends to give some description of the runaway’s complexion or racial status. We were interested in tracking variations in these terms across states. We were also interested in tracking runaway slaves’ involvement with various racial or ethnic groups in their geographical area.
This embedded graph from Voyant shows trends for the words “African” and “Africans” across the Telegraph and Texas Register. Over time, occurrences of these words go down until they eventually disappear. In class, we talked about potentially finding evidence of the illegal international slave trade continuing for a while in the early years of Texas. These trends would suggest that to be the case.
Additionally, “African” appears most frequently in Texas compared to the other states, and slightly more frequently in Mississippi than in Arkansas. This confirms my suspicions from the close-read that Texas had higher rates of Africans than the other states, as well as my hunch that Mississippi and Texas, with access to ocean ports, would have higher rates of African slaves than landlocked Arkansas.
One of the terms we both noticed in our close-readings was the French word Griff(e). “Griff” and “Griffe” occur much more frequently in Texas, followed by Mississippi, and not at all in Arkansas. Tracking the word “Griffe” alongside “French” and “Mulatto” reveals some interesting trends. While Texas and Mississippi have higher use of the word Griff(e), Arkansas has higher use of the word Mulatto. Additionally, Texas and Mississippi — the states where the French word “Griff” is used — also have higher occurrences of the word “French” suggesting a more significant presence of French people or the French language in these states. Possibly, in Texas and Mississippi, subscribers were more likely to prefer the term Griff(e) to refer to someone of part white, part black ancestry, whereas in Arkansas they were more likely to prefer the term Mulatto.
This final screenshot reveals the high relative frequency of “Mexico” and “Mexican” in Texas. In Arkansas and Mississippi, these words never occur at all. This screenshot also shows how the favorites function works in Voyant. To track several related words, use the search function in Words in the Entire Corpus, then select the word and hit the favorites heart in the bottom bar. You can then toggle between favorites and search, either to look at the list of words you want to track or to select more words. From here, you can select one or more words from the favorites list to chart their progress visually across the corpus. This is a very handy tool for tracking word correlation and relation.
If you are interested in looking into these runaway trends in Voyant more closely, follow this link to a saved Voyant skin containing all of the collected ads from Texas, Arkansas, and Mississippi.
(note: this is a draft of the essay, so please feel free to comment with suggestions for final revisions!)
How similar were Texas runaway slave advertisements to those of Arkansas and Mississippi? A collection of digitized runaway slave advertisements from a variety of newspapers spanning the years 1800 to 1865* can help answer this question. In the end, patterns of runaway advertisements in Texas, Arkansas, and Mississippi are on the whole very similar, with some notable distinctions.
In all three states, runaway advertisements follow a standard format, usually providing similar kinds of information. Most include a description of the runaway slave(s)’ name, age, physical characteristics (such as height and complexion), distinctive marks or injuries, and notable personality traits. The ads also provide information about slave ownership or origins, when the slave escaped (or date captured, in the case of found runaway notices), where the slave escaped from, and where they are believed to be headed. When relevant, the ads provide information about suspected accomplices or descriptions of horses used for running away. More detailed ads sometimes describe the slave’s clothing, familial relationships, hobbies, or skilled crafts. Most advertisements conclude with the subscriber’s name, and where and how they can be contacted. And as incentive, runaway notices almost always prominently advertise that a generous reward will be given for information about or capture of the runaway slave. At first glance, this consistently short, boilerplate format of runaway ads makes it difficult to really distinguish between them. The ads from Texas, Arkansas, and Mississippi all start to look practically indistinguishable, making it difficult for close-reading alone to recognize pattern breaks between the states without the assistance of computational data. However, there are certain distinctive details that appear more in one state than in another.
Before describing these differences, it is worth noting the similarities in runaway advertisements across state lines. In all three states, the “typical” runaway is a young male. Ads for female runaways occur disproportionately infrequently, and it is rare to see an advertisement for a child or an elderly slave. As a means of transportation, runaways often take from their masters a horse or a mule. All states feature ads which contain a “white man” suspected of stealing or persuading slaves to run away. These ads often refer to the man as a “white villain,” clearly angered over this blatant disrespect to their property, and offer a separate reward for the apprehension and conviction of the white thief. Some ads describe a specific white man who has been seen in the neighborhood, and are able to provide details of name and appearance. Others, on the other hand, treat this “white man” as a nebulous, unnamed threat. One slaveholder in Texas, for example, found it hard to believe that his slave would run away of his own volition, stating that: “It is believed that he was instigated to run away by white persons, as he has always been treated with great kindness”. For this slaveholder it was easier to blame the outside influence of a villainous white man than question a paternalistic belief in the slave’s happiness. Unlike the other two states, in Texas, the nefarious accomplice/thief is sometimes listed as a “Mexican” as well as a “white man”.
In general, Texas runaways appear to have had more interaction with Mexico and Mexicans than runaways in the other two states, which is not surprising considering the state’s shared border with Mexico. Slaveholders in Texas were conscious of the presence of Mexico, often speculating that runaways were headed to the nearby nation. Embodying both fears of white predation and of the looming Mexican border, one Austin Gazette subscriber speculated that his runaway slave was “in company with some rascally white person. It is my impression said boy is making his way west, and will, under the guidance of white men, and with the assistance of his free pass, endeavor to reach Mexico”. While law enforcement officials and slaveholders in nearby Southern states could be depended upon to support the institution of slavery and return runaway slaves, slavery in Mexico had been abolished. This made Mexico an appealing destination for runaway slaves, and a concerning one for slaveholder subscribers.
Arkansas and Mississippi runaway ads, on the other hand, contain more mentions of interaction with Native American tribes than Texas ads. These two states much more frequently mention slaves suspected to be fleeing towards Native American tribes, slaves who are part Native American, or slaves who can speak a Native American language. Whether slaves in Arkansas were described as part Cherokee, or slaves in Texas were described as being able to speak “Mexican” (Spanish), runaway ads from all of the states suggest the diversity of the United States during the 19th century. These ads create a picture of how slaves interacted with and often benefited from the diversity of cultures in the United States and bordering nations.
It is also important to remember that regions of Texas varied both culturally and geographically, and runaway patterns across the state are not homogeneous. While ads from all the states mention instances of slaves stealing and carrying weapons upon escape, these appear more frequently among Texas ads. In particular, notices of runaway slaves carrying guns and sometimes knives appear frequently in ads listed in the Austin Gazette. Significantly, this newspaper circulated in a central Texas area, closer to the Western frontier, compared to the Houston-area based Telegraph and Texas Register. Potentially, proximity to the Western frontier, and the dangers associated with that area, gave slaves more access to weapons, or made slaves feel that taking a weapon with them on their escape was more essential to their success. Similarly, the Austin Gazette more frequently mentions slaves running for Mexico, suggesting that slaves in the central region of Texas ran for Mexico more often than the gulf region (or at least their masters suspected they did).
In all of the states, one thing to keep in mind is that patterns of runaway slave advertisements may not necessarily be the same as actual runaway patterns. There were many reasons why a slaveholder may not have placed an ad for a runaway, or would have delayed placing the ad. Maybe they believed a missing slave was merely “lying out” and would return to the plantation soon on their own. Maybe the slaveholder was using personal means to pursue and recapture runaways, such as a search team of locals, and didn’t feel the need to make a public announcement. Maybe the runaway was not valuable enough to justify the cost of running an ad. Or, in some cases, the slaveholder may not have even noticed the missing slave, if the slave ran away at an opportune time when the plantation was in chaos, such as the death of a master. However, the differences that exist between Texas runaway ads and those from Arkansas and Mississippi are enough to suggest that runaway patterns in Texas were distinct from other U.S. Southern states. These differences appear to be related in part to Texas’s proximity to Mexico and to the Western frontier.
*These included ads from the Telegraph and Texas Register from the years 1836 to 1860, hosted by the Portal to Texas History and transcribed by students at Rice University and the University of North Texas; advertisements from the Austin Gazette from 1850 to 1860, transcribed by students at UNT; and a collection of ads from several newspapers from Arkansas, 1820 to 1865, and Mississippi, 1800 to 1860, transcribed and publicly available from the Documenting Runaway Slaves project at the University of Southern Mississippi.
We have been making progress as a whole, both in the close reading essay and the search and comparison of the ad texts using digital tools.
Daniel’s initial findings through Voyant:
Initial searches of the Arkansas ads did not yield huge amounts of information, but they yielded enough to demonstrate that Voyant can help answer questions about the data. Part of Voyant’s usefulness can simply be demonstrating the lack of a strong trend on a certain topic. The first question I used Voyant to answer was whether or not Texas slaves appeared to be receiving greater abuses or punishments than those in Arkansas. This required searching for sets of vocabulary related to injury and punishment. The close reading revealed that “scar”, “disfigure” and “lame” were used to describe slaves who seemed to have suffered injuries. While there are many specifically listed injuries, those are the most frequently used. By searching for these words in all the sets of ads, I found that Arkansas seemed to have proportionally more references to scars than Texas.
In our previous readings, we had talked about how slaves could have been more likely to have carried guns in dangerous Texas locations. Searching the text for references to armed runaways carrying rifles, shotguns, or knives, I found that there were proportionally fewer references to guns in Texas than in Arkansas.
Searching for references to Mexico and Mexicans, I found nothing in the Arkansas ads referencing Mexico. It does seem to be a Texas-specific location so far in terms of a destination for runaways.
There were proportionately more references to horses and mares in the Texas ads. This could tie into the sheer size of Texas for escaping across, a higher likelihood of property owners having horses, or perhaps that the acquisition of horses was necessary to try to make it all the way to Mexico.
From the first search through the ads, there were a few specific improvements I had in mind for future searches. One is to find a simpler way to compare numbers in data sets of different sizes. I was using rough proportions to compare the quantities of occurrences, but finding specific sets within the states that were the same size would make for a more straightforward process.
Another issue is the inclusion of jailor’s ads. References to weapons and means of transportation will not appear as frequently in ads for runaways who have already been captured. Thus, different proportions of jailor’s ads in the sets will further skew results.
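One simple way to compare sets of different sizes, without first cutting them down to equal size, is to report the share of ads in each set that mention at least one word from a vocabulary list. The Python sketch below assumes each ad is available as a separate string; the function name and the sample ads are hypothetical, and matching is on exact word forms, so variants such as “scars” have to be listed explicitly.

```python
import re


def share_of_ads_mentioning(ads, vocabulary):
    """Return the fraction of ads that contain at least one word from `vocabulary`."""
    if not ads:
        return 0.0
    vocab = {word.lower() for word in vocabulary}
    hits = 0
    for ad in ads:
        # Exact word-form matching: "scar" does not match "scars".
        words = set(re.findall(r"[a-z]+", ad.lower()))
        if words & vocab:
            hits += 1
    return hits / len(ads)


# Hypothetical ads, invented for illustration.
sample_ads = [
    "He has a scar over his left eye.",
    "Took with him a bay horse.",
    "Is lame in the right leg.",
    "Speaks quickly when spoken to.",
]

injury_terms = {"scar", "scars", "lame", "disfigured"}
print(f"{share_of_ads_mentioning(sample_ads, injury_terms):.0%} of ads mention an injury term")
```

Because the result is a proportion of ads rather than a raw count, the figure for a small set can be compared directly with the figure for a large one.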
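One possible way to gauge this skew is to separate jailor’s notices from owners’ ads before counting. The Python sketch below uses a crude phrase-matching heuristic; the marker phrases are assumptions (only “Runaways In Jail” comes from the Mississippi ads discussed on this blog), and the sample ads are invented. A real filter would need to be checked against a hand-labeled sample.

```python
# Assumed marker phrases for jailor's notices; adjust after inspecting real ads.
JAIL_MARKERS = ("committed to the jail", "runaways in jail", "jailor", "jailer")


def is_jailors_notice(ad_text, markers=JAIL_MARKERS):
    """Guess whether an ad is a jailor's notice by looking for marker phrases."""
    lowered = ad_text.lower()
    return any(marker in lowered for marker in markers)


def split_ads(ads):
    """Split ads into (owner ads, jailor's notices) using the heuristic above."""
    notices = [ad for ad in ads if is_jailors_notice(ad)]
    owner_ads = [ad for ad in ads if not is_jailors_notice(ad)]
    return owner_ads, notices


# Hypothetical ads, invented for illustration.
sample = [
    "RUNAWAYS IN JAIL. Was committed to the jail of this county a man who says his name is Henry.",
    "Ranaway from the subscriber, a man named Sam. He took with him a rifle and a bay horse.",
]

owner_ads, notices = split_ads(sample)
print(len(owner_ads), "owner ads,", len(notices), "jailor's notices")
```

The heuristic will misfile some ads, for example an owner's ad that asks for the runaway to be delivered to a named jailor, so the split is best treated as a first pass.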
Future searches include terms describing the intelligence of slaves and descriptions of their skilled labor, to be compared to negative terms, as well as searches for references to accomplices, thieves, or others who might have persuaded or forced slaves to escape.
Upon getting to work and trying to follow our schedule, we have realized that we planned to have too many things due in a very short period of time. We are in the process of adjusting the schedule and replanning what needs to be done. However, the three of us have been working on a few different tasks during the break.
Clare has been working on a draft of the close reading. Her rough draft includes an introduction, analysis of Arkansas, discussion of the advantages of the digital, and conclusion. The analysis of Arkansas will act as a prototype of what she plans to do with the conclusions she is reaching about Mississippi and Texas, although her data collection requires more time than previously realized. She plans on diversifying advertisement examples, as her current examples are from a few select years. We will discuss suggestions for the progress of the essay with Dr. McDaniel.
Aaron wrote a Python script called placetagger.py that tags locations in each advertisement. He ran the cleaned advertisements from the two Texas newspapers and the Mississippi and Arkansas corpora through the script and saved the results as JSON files. I then started to try to run the tagged locations through GeoNamesMatch, but I quickly ran into some difficulties. After discussing this with Aaron, we decided that the input and output of this particular program were inconvenient for what we are trying to do. Aaron played around with Google’s free geocoding API (using the Python library Geopy) and had some success with it, so we have decided to use that instead. Aaron and I then started cleaning up the pretty-printed JSON of the tagged locations, and we realized that even though we don’t have to correct spelling or expand state abbreviations, this task is going to take a very long time because of the large number of advertisements we have, especially in Mississippi. Our original plan was to compare the output of NER to the actual advertisement (essentially just using the NER results as a footing for the actual list of locations), but due to the large amount of data and the limited amount of time left in the semester, that might be infeasible.
Through cleaning the tagged locations, we noticed that the Python script has been separating locations that should be together. Some results come out as [Travis County], [Texas] instead of [Travis County Texas], or even as [County] instead of [Travis County]. Additionally, we noticed that NER misses county names when the word “County” appears lowercase, so before we run the script again we will fix the capitalization in our input files. It is unlikely that we will ever be able to write a script that catches every location with precision, but we would like to get as close as we can.
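A geocoding step of this kind might look roughly like the following Python sketch. It is not the project's actual script. To keep it runnable without a network connection or an API key, the lookup function is passed in as a parameter; with Geopy, that could be the `geocode` method of a geocoder object, which returns an object with `latitude` and `longitude` attributes or `None`. The stand-in lookup and its coordinates are placeholders, not real results.

```python
import time
from types import SimpleNamespace


def geocode_places(place_names, geocode, delay_seconds=0.0):
    """Look up each distinct place name once; return {name: (lat, lon) or None}."""
    coordinates = {}
    for name in place_names:
        if name in coordinates:
            continue  # already looked up; avoid repeating the request
        result = geocode(name)
        coordinates[name] = (result.latitude, result.longitude) if result else None
        if delay_seconds:
            time.sleep(delay_seconds)  # stay under the service's rate limit
    return coordinates


# With Geopy installed, the lookup could be supplied like this (key is a placeholder):
#   from geopy.geocoders import GoogleV3
#   geolocator = GoogleV3(api_key="YOUR_KEY")
#   coords = geocode_places(["Travis County, Texas"], geolocator.geocode)

def fake_geocode(name):
    """Stand-in lookup with placeholder coordinates, used only for this demonstration."""
    known = {"Placeholder Town": SimpleNamespace(latitude=1.0, longitude=2.0)}
    return known.get(name)


print(geocode_places(["Placeholder Town", "Unknown Place", "Placeholder Town"], fake_geocode))
```

Caching by place name matters here because the same towns and counties recur across many ads, and each repeated lookup would otherwise cost a request.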
Thus, Aaron is planning on revising placetagger.py so that it does not split up the county or city name from the state name, perhaps by setting a threshold for the gap between each match in the text for the results to be considered distinct entities. Once that is done, he will rerun the advertisement corpora through the new script, and then he and Kaitlyn will begin cleaning the results. We will need to come up with a few parameters or rules for cleaning the results so that there is consistency across the states. We will also need to decide if we should compare the results for each advertisement to the original text. That could be a very time consuming process, so we may choose to compare a subset of the entire results to the original advertisements, or reduce the number of advertisements overall for which we will produce data.
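The gap-threshold idea can be sketched as follows. This is a hypothetical illustration rather than the revised placetagger.py: it assumes the tagger can report each location match as a pair of character offsets into the ad text, which may not be how the actual tool returns its results. The sample sentence and helper names are invented.

```python
def merge_adjacent_places(matches, text, max_gap=2):
    """Merge location matches separated by at most `max_gap` characters.

    `matches` is a list of (start, end) character offsets, sorted by start.
    Returns the merged place strings as they appear in `text`.
    """
    merged = []
    for start, end in matches:
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous match: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [text[s:e] for s, e in merged]


def span(text, phrase):
    """Helper for the demonstration: offsets of the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase))


ad = "He was taken up in Travis County, Texas, and says he is from Yazoo county."
tagged = [span(ad, "Travis County"), span(ad, "Texas"), span(ad, "Yazoo")]

print(merge_adjacent_places(tagged, ad))
```

With a threshold of two characters, a county and state separated by a comma and a space are joined, while places mentioned further apart stay distinct. A larger threshold would merge more aggressively and risk joining unrelated places.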
Even though we will have to reclean the results, we can still use the current cleaned-up results from Texas to start thinking about how we want to visualize our results. Right now, we are planning on looking at Palladio to see if it will fit our needs. We also have been thinking about creating a map that shows how many times a state has been referenced in another state’s newspapers. Ideally, hovering over a state with the cursor would shade that state and the other states, with intensity determined by the number of mentions of places in each state in the origin state’s ads, but we are still figuring out how to do that. We can start to see how this would work by using the current Texas data in Google Fusion Tables to create a preliminary visualization. Aaron and Kaitlyn will also give feedback on the close reading essay to Clare as she continues to revise her draft.
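The counts behind such a map could be tallied along these lines. The Python sketch assumes the cleaned results have already been reduced to pairs of an origin state and the list of states mentioned in one ad; that data shape, the function name, and the sample records are all hypothetical. Mentions of the origin state itself are skipped, on the assumption that the map is only meant to show references to other states.

```python
from collections import Counter, defaultdict


def count_state_mentions(tagged_ads):
    """Tally how often each state is mentioned in ads from each origin state.

    `tagged_ads` is an iterable of (origin_state, [mentioned_state, ...]) pairs.
    """
    mentions = defaultdict(Counter)
    for origin, places in tagged_ads:
        for state in places:
            if state != origin:  # assumption: only cross-state references are of interest
                mentions[origin][state] += 1
    return mentions


# Hypothetical records, invented for illustration.
records = [
    ("Texas", ["Arkansas", "Texas"]),
    ("Texas", ["Arkansas", "Mississippi"]),
    ("Arkansas", ["Mississippi"]),
]

for origin, counter in count_state_mentions(records).items():
    print(origin, dict(counter))
```

Each origin state's counter can then drive the shading intensity for the other states when that origin is selected.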
The method of close-reading runaway slave advertisements between 1835 and 1865 allows for an exploration of whether patterns of listed locations differ between the states, specifically in relationship to how Texas trends might differ. Various newspapers from Mississippi, Texas, and Arkansas provide the data set for this analysis. Trends are most easily analyzed by individual state, followed by a conversation and comparison of these overall trends between the states.
Spanning the years 1835 to 1865, the pattern of Arkansas’s runaway slave ads shifts with its relative position to other states. A territory until it reached statehood in 1836, Arkansas was the borderland of the United States for the earlier years between 1835 and 1865. Texas declared independence in 1836 and maintained its autonomy from the United States until 1845. Arkansas, then, was essentially a western borderland. The passage of the Mississippi River through Arkansas also allowed slaves the opportunity to escape by boat, as the mulatto Billy attempted when fleeing from New Orleans (AR_18360526_Helena-Constitutional-Journal_18360526).
Many jailer notices in Arkansas advertise captured slaves from more eastern states, indicating that Arkansas was a popular destination or point on the route to freedom. For example, in 1836 two slaves Jacob and Jupiter say “that they Belong to H. B. JOHNSON, residing in Yazoo county, Mississippi” (AR_18360705_Arkansas-Gazette_18360409). Similarly, the captured “Negro man” Henry claims his home is in Memphis, Tennessee, with a Mr. Staples (AR_18551123_Democratic-Star_18551123). Numerous other examples also support this trend.
In addition, many slaveowners from other states advertised for their runaways in Arkansas, indicating that they considered Arkansas a likely location for their runaways. George and James of Mississippi are advertised for in the 1838 Arkansas Gazette, in addition to re-publication of the advertisement in the Memphis Enquirer and Little Rock Gazette (AR_18380314_Arkansas-Gazette_18371002).
Although the westward movement seemed to be generally assumed among slaveowners, a handful considered family ties stronger, such as Martin Miller of Fayetteville, Texas, who advertised for his slave in the Arkansas Gazette: “Said Negro was brought from Georgia, and is probably making his way back to that State” (AR_18360909_Arkansas-Gazette_18360727).
With the passage of time, these trends shifted. Arkansas lost its “borderland” status to Texas. With these changes came a change in the fugitive slave advertisements. The number of runaways from Arkansas increased, probably due to a rise in population. The number of jailer’s notices advertising slaves who claimed to be from other states also increased, however, suggesting that Arkansas still served as a way station for slaves on their journeys to Texas or Mexico.
Despite the projection of locations onto their runaways, slaveowners acknowledged that these assumptions were just that – merely assumptions. An 1836 ad from the Arkansas Gazette states “I have dreamed, with both eyes open, that he went toward the Spanish county; but as dreams are like some would be thought honest men―quite uncertain―he may have gone some other directions.” Although most fugitive slave advertisements were slightly less flowery in their language, the inaccuracies of projected direction were subtly acknowledged in the advertisements.
Mississippi ads tend to be both jailer’s notices and runaway ads of and for slaves from Mississippi. This trend suggests that Mississippi, unlike Arkansas, was a more stable slave economy and not as frequently a destination for slaves.
Texas, the focus of this research, offers data from the Texas Telegraph and the Texas Gazette. William Dean Carrigan, in his article “Slavery on the frontier: the peculiar institution in central Texas” sets Texas up as “a world torn in three directions by four different cultures.” The Native American tribes and the Mexican border both helped to define Texas as a borderland. How this exhibited itself through the runaways, however, is still contested. Campbell states that runaways tended to head toward either Mexico (for freedom) or toward the east (to rejoin relatives that they had been separated from) but does not indicate which was more prevalent.
The sheer size of the data set has implications for close reading, which is time-consuming and labor-intensive. When the data is analyzed by the human eye, preconceived assumptions come into play, and unexpected results are less likely to be found even if they are present. In digital analysis, however, unique results can be reached more easily through an unbiased re-organization of the data. Without digital tools to sift through the information and help identify patterns, human error in evaluating the advertisement trends is more likely, especially error shaped by expectations. Focusing on multiple elements or the connections between them is also more difficult. For example, perhaps there exists a correlation between the amount of the reward and the projected location of the slave, or the distance between the locations of the advertisement and the owner. Without the extremely labor-intensive process of creating a spreadsheet, this evidence is difficult to analyze. Specific locations (cities and plantations) fall to the generalization and recognizability of states and counties. With over 1000 advertisements in the Mississippi corpus alone, trends are very difficult to find in a short period of time.
Based on these observations, the borderland status of states does change the location trends present in runaway slave advertisements. The advantages of digital tools, however, will help us analyze these conclusions to evaluate the correlation between digital tools and close-reading, as well as possibly reveal unexpected patterns in the data set.
Our task is to compare the Texas ads to those from Arkansas and Mississippi. We will be utilizing a range of tools for this, many of which we have touched on in class. We are looking at our work as a series of progressive tasks mining the ad content for deeper trends and information.
Our first task is to complete a close reading of the ads in question. This will familiarize us with the material and provide grounds for categorizing key concepts, words, and phrases that we should search for. By Wednesday, April 2nd, Alyssa will have done a close reading of the Austin Gazette ads and Daniel will have read the Texas Register ads from the overlapping years. This will give a starting point of keywords and concepts we’re looking at to start the write up.
By April 7th, Daniel will have begun the text analysis of Texas and Arkansas ads with digital tools and the findings of the close readings. He will share these with Alyssa, who will work on the close reading write-up due that day.
Our second task will then consist of utilizing these keywords and concepts to search the text with Voyant and TF-IDF. We can look for trends and differences across states in the various results and visualizations we get from these. We can also verify some of the categories by running the text through topic modeling, and searching those results for trends in Voyant as well. Our future progress reports will include a commentary on how well these tools have worked for various purposes, for the benefit of future studies using these tools.
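For readers unfamiliar with TF-IDF, the idea is to score a word highly for a state when it is frequent in that state's ads but rare across the other states. The Python sketch below uses one common formulation (term frequency as a share of the document's words, multiplied by the natural log of the number of documents over the number of documents containing the term); other tools may weight or smooth the scores differently. The three miniature "corpora" are invented for illustration.

```python
import math
import re
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF scores. `documents` maps a name to its text; returns {name: {term: score}}."""
    tokenized = {
        name: re.findall(r"[a-z]+", text.lower()) for name, text in documents.items()
    }
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical miniature corpora, one per state.
corpora = {
    "texas": "reward for runaway headed to mexico",
    "arkansas": "reward for runaway headed to the cherokee nation",
    "mississippi": "reward for runaway lodged in jail",
}

scores = tf_idf(corpora)
top_terms = sorted(scores["texas"], key=scores["texas"].get, reverse=True)[:3]
print("Highest-scoring Texas terms:", top_terms)
```

Words that appear in every state's ads, such as “reward” in this example, score zero, which is what makes the measure useful for surfacing state-specific vocabulary.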
From this point, we plan to be flexible based on the findings of the text analysis. Specific results might encourage further mining of the text for trends, or else might require us to go back to some of our earlier readings for comparison or contrast of findings. We can also at this point see if filtering the jailors’ ads from the text will change the language trends of the ads significantly, or if in fact the language of the different types of ads is highly similar.
Monday, March 31st
Split up Arkansas and Mississippi corpus into individual ad files using drsparser.py – Aaron
Write python script placetagger.py to tag places using Pyner in a folder of text files and save the results – Aaron
Run placetagger.py on Arkansas, Mississippi, and Texas (Gazette and Telegraph) corpus – Aaron
Run placetagger.py on Mississippi corpus – Kaitlyn
Start looking over required readings from earlier in the semester for more information about trends in runaway destinations and connections among Texas, Mississippi, and Arkansas. Do additional readings if necessary – Clare
Clean up Named Entity Recognitions results – Aaron and Kaitlyn
- Remove false positives
- As thoroughly as possible given the magnitude of the collection, scan through tagged documents for any obvious false negatives
- Tag each tagged location as a to or from, projected or real
Look over our data and outline the essay – Clare
Test drive Palladio and research other mapping options for displaying our place connections results – All
Analyze Named Entity Recognition results – All
If a lot of NER results:
- Research geocoding APIs to parse our NER results and generate latitude/longitude coordinates for all named places – Aaron
- Write script to generate coordinates for tagged locations and execute on our data – Aaron
- Manually search for and store coordinates – Kaitlyn
Draft the close reading essay – Clare
Write progress update for course blog – Aaron and Kaitlyn
Decide on how we want to display our place connectedness results – All
- How to display our results? Lines connecting the “to” coordinates (e.g. projected destination) and the “from” coordinates (e.g. coordinates of Houston for the Texas Register)? Something more individualized, at the ad level? Collapse lines between cities or even states into single lines weighted by the number of times that connection occurs?
- Building on the first question, how to indicate direction: different line shapes/colors? For example, if there is a Texas ad that says the runaway probably went to his family in Arkansas, how do we differentiate that from an Arkansas jailor’s notice for a runaway slave saying he is from Texas? Is it important at all for us to make this distinction? If not, we might do better with a map in the form of an undirected graph.
- How to separate projected runaway “to’s” (and guessed “from’s” for jailor’s notices, if ads like that exist) from actual “to’s” and “from’s”? Do we have much more of one type (real vs. guessed)? Probably almost exclusively guessed locations.
Coordinate analysis and clean up – All
Re-assess rest of semester schedule in light of presentation format choices – All
Choose a mapping tool. Start building our map based on our decisions about to, from, projected, etc – All
Discuss our overall findings, and how our graphs and/or interactive tools share this information – All
Write and post progress report on course blog (by Monday) – All
Begin Methods page – Aaron and Kaitlyn
Begin Conclusions page, including followup questions and summary of findings – Clare
Finish our map and other graphics – All
Write and post progress report – All
Finish Methods page – Aaron and Kaitlyn
Finish Conclusions page – All
Finalize website pages – All
Throughout, Clare will re-work the essay in light of any new info.
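One of the display questions in the schedule above, collapsing repeated connections into single weighted lines and choosing between a directed and an undirected graph, can be sketched in a few lines of Python. This is only a minimal illustration: the `connections` list is made-up sample data rather than results from our ads, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (from, to) pairs, one per tagged ad. Not real results.
connections = [
    ("Houston", "Mexico"),
    ("Houston", "Mexico"),
    ("Little Rock", "Houston"),
    ("Houston", "Little Rock"),
]


def directed_weights(pairs):
    """Count each (from, to) pair, keeping direction."""
    return Counter(pairs)


def undirected_weights(pairs):
    """Count each connection regardless of direction."""
    return Counter(tuple(sorted(pair)) for pair in pairs)


for (place_from, place_to), weight in directed_weights(connections).items():
    print(f"{place_from} -> {place_to}: {weight}")

for (place_a, place_b), weight in undirected_weights(connections).items():
    print(f"{place_a} -- {place_b}: {weight}")
```

In the directed version, Houston to Little Rock and Little Rock to Houston stay separate lines; in the undirected version they merge into one line with a weight of 2.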
After completing last week’s progress report, one of the questions we were left with is how the TAPoR Comparator calculates relative ratio. The documentation page does not specify where the relative count or the relative ratio come from, but a few trial calculations were able to lead us down the right path. We tested out numbers for “negro,” the most frequently occurring word in the Arkansas document from the Documenting Runaway Slaves Project.
The results? The relative count equals the word count divided by the total number of words in the text: in this case, 920/80,690 for Arkansas and 2,688/235,602 for Mississippi. The relative ratio then equals the Text 1 relative count divided by the Text 2 relative count, here 0.0114/0.0114. Words that are relatively more frequent in Text 1 (AR) have a relative ratio above 1, words that are relatively more frequent in Text 2 (MS) have a relative ratio below 1, and words that are equally frequent have a value of 1. In other words, the relative ratio adjusts raw word counts for document length so that relative word frequencies can be compared. For example, even though “negro” has more than double the word count in Mississippi, the relative count for both AR and MS is ~0.0114, which places the relative ratio at 0.9994, almost 1. (The value is not exactly 1 because the displayed relative counts are rounded to four decimal places; the underlying relative counts for AR and MS are not precisely equal.)
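The arithmetic above can be checked with a short Python sketch. The function names are our own and are not part of TAPoR; the formulas are the ones we inferred from trial calculations, not ones confirmed by the Comparator documentation.

```python
def relative_count(word_count, total_words):
    """Share of all words in a text accounted for by one word."""
    return word_count / total_words


def relative_ratio(count_1, total_1, count_2, total_2):
    """Relative count in Text 1 divided by relative count in Text 2."""
    return relative_count(count_1, total_1) / relative_count(count_2, total_2)


# Figures for "negro" quoted above: Arkansas is Text 1, Mississippi is Text 2.
ar_rel = relative_count(920, 80690)
ms_rel = relative_count(2688, 235602)
ratio = relative_ratio(920, 80690, 2688, 235602)

print(f"AR relative count: {ar_rel:.4f}")  # 0.0114
print(f"MS relative count: {ms_rel:.4f}")  # 0.0114
print(f"Relative ratio:    {ratio:.4f}")   # 0.9994
```

Dividing the unrounded relative counts gives the 0.9994 that Comparator reports, which supports the guess that the tool rounds only for display.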
So, Comparator balances the differences in document length between AR and MS to reveal that, in relative terms, advertisements from the two states use the word “negro” with practically equal frequency. This sort of comparison could be useful for determining how the language used to refer to the race of slaves does (or doesn’t) change across states. Like TF-IDF, Comparator adjusts for how often a term occurs across documents in order to locate words that occur more often in one document than in the rest of the corpus.
Now that we know how they both work, it would be interesting to compare our documents using both TAPoR’s Comparator and TF-IDF to see how the results differ. Here are the results for the word “negro” in Voyant’s TF-IDF option, recently added by Stéfan Sinclair.
Again, AR and MS have very similar TF-IDF scores for the word “negro” despite MS’s raw word count being much higher.
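For readers who have not met TF-IDF before, here is a minimal sketch of the textbook version: term frequency within one document multiplied by the log of the inverse document frequency. We have not confirmed which weighting Voyant uses, so its scores may be computed differently. The two toy documents are invented for illustration and are not drawn from our corpora.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, all_docs):
    """Textbook TF-IDF: (term share of the document) * log(N / document frequency)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in all_docs if term in doc)
    if df == 0:
        return 0.0  # the term appears nowhere in the collection
    return tf * math.log(len(all_docs) / df)


# Invented toy documents, already split into lowercase tokens.
docs = {
    "AR": "ranaway negro man named tom".split(),
    "MS": "runaways in jail negro boy".split(),
}
all_docs = list(docs.values())

print(tf_idf("negro", docs["AR"], all_docs))  # appears in both documents
print(tf_idf("jail", docs["MS"], all_docs))   # appears in only one document
```

In this textbook form, a word that appears in every document gets a score of zero, because it does nothing to distinguish one document from another. That is the same intuition as a relative ratio close to 1.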
This week, we used the Stanford Named Entity Recognition program to find the newspapers in the Mississippi ads corpus. We had to break up the corpus into several different files because the original text was too long to run through the NER program in one pass. After breaking it up into 9 different files, we found 12 different newspapers that were tagged as organizations: the Vicksburg Register, the Natchez Courier and Journal, the Memphis Enquirer, the Louisville Journal, the Cincinnati Gazette, the Port Gibson Correspondent, the Southern Argus, the Alabama Journal, the Woodville Republican and Wilkinson Weekly Advertisor, the Southern Tribune, the Fayette Watch Tower, and the Mississippian State Gazette. We then searched for the number of occurrences of these newspaper titles in the original Mississippi ad corpus. The numbers below are inflated because we still have not figured out how to remove the footnotes. Most searches returned matches inside footnotes, and in most cases there were too many to count and remove by hand.
| Newspaper Title | Number of Occurrences |
| --- | --- |
| Natchez Courier and Journal | 70 |
| Port Gibson Correspondent | 158 |
| Woodville Republican and Wilkinson Weekly Advertisor | 51 |
| Fayette Watch Tower | 18 |
| Mississippian State Gazette | 26 |
The following newspapers were merely mentioned and not actual newspaper entries: the Memphis Enquirer, the Louisville Journal, the Cincinnati Gazette, and the Alabama Journal. It seems that other newspapers were reprinting advertisements from these newspapers. From these numbers, it also seems that the Port Gibson Correspondent has a similar number of advertisements as we have collected from the Texas Telegraph and Register from the same time period.
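The title search described in this post can be approximated with a few lines of Python. This is a sketch rather than the exact procedure we followed: `count_title` is a hypothetical helper, and `sample` is an invented snippet standing in for the Mississippi corpus. Like our manual searches, it counts every match, including any that fall inside footnotes.

```python
import re


def count_title(title, text):
    """Count case-insensitive matches of a title, allowing line breaks between words."""
    pattern = r"\s+".join(re.escape(word) for word in title.split())
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Invented snippet standing in for the corpus text.
sample = (
    "Port Gibson Correspondent, 1837.\n"
    "Reprinted in the Port Gibson\nCorrespondent. "
    "Fayette Watch Tower."
)

for title in ["Port Gibson Correspondent", "Fayette Watch Tower", "Southern Argus"]:
    print(title, count_title(title, sample))
```

Joining the words of the title with `\s+` lets a match span a line break, which matters when a title wraps across two lines in the source text.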
During our Wednesday discussion, it was brought up that there are about 150 ads in the Texas Telegraph. With 244 ads in the Arkansas Gazette, we have two sources relatively similar in size that we can compare. However, it was also pointed out in class that these sources contain different kinds of ads. Because they include notices by jailers or sheriffs, they mix different aspects of runaway advertising. This raises the analytical issue of whether the two kinds of ads should be treated separately. If the sheriff’s ads are considered distinctly different, then they should be filtered out by identifying keywords like “sheriff” in the text. However, if both types of ads contain sufficiently similar content for our purposes (descriptive words, location information, whatever we’re mining for), then they could be allowed to remain in the data set, leaving us with a larger sample size.
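A keyword filter of the kind described above might look like the following sketch. The keyword list and the two sample ads are assumptions made for illustration, and a simple substring match will misfire on owner ads that happen to mention a sheriff or a jail, so the output would still need a manual check.

```python
# Assumed keywords; a real list would come from close reading of the notices.
NOTICE_KEYWORDS = ("sheriff", "jailer", "jailor", "committed to the jail")


def is_jailer_notice(ad_text, keywords=NOTICE_KEYWORDS):
    """Return True if the ad contains any of the notice keywords."""
    lowered = ad_text.lower()
    return any(keyword in lowered for keyword in keywords)


def split_ads(ads):
    """Separate ads into (owner ads, jailer or sheriff notices)."""
    notices = [ad for ad in ads if is_jailer_notice(ad)]
    owner_ads = [ad for ad in ads if not is_jailer_notice(ad)]
    return owner_ads, notices


# Invented sample ads, not quotations from the corpus.
ads = [
    "Ranaway from the subscriber, a negro man named Tom.",
    "Was committed to the jail of this county, a negro boy. J. Smith, Sheriff.",
]

owner_ads, notices = split_ads(ads)
print(len(owner_ads), "owner ads;", len(notices), "notices")
```

Running the same word counts on `owner_ads` alone and on the full list would show how much the notices shift the results.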