EXPLORING LITERARY LANDSCAPES: FROM TEXTS TO SPATIOTEMPORAL ANALYSIS THROUGH COLLABORATIVE WORK AND GIS

This article argues that the study of literary representations of landscapes can be aided and enriched by the application of digital geographic technologies. As an example, the article focuses on the methods and preliminary findings of LITESCAPE.PT—Atlas of Literary Landscapes of Mainland Portugal, an on-going project that aims to study literary representations of mainland Portugal and to explore their connections with social and environmental realities both in the past and in the present. LITESCAPE.PT integrates traditional reading practices and ‘distant reading’ approaches, along with collaborative work, relational databases, and geographic information systems (GIS) in order to classify and analyse excerpts from 350 works of Portuguese literature according to a set of ecological, socioeconomic, temporal and cultural themes. As we argue herein this combination of qualitative and quantitative methods—itself a response to the difficulty of obtaining external funding—can lead to (a) increased productivity, (b) the pursuit of new research goals, and (c) the creation of new knowledge about natural and cultural history. As proof of concept, the article presents two initial outcomes of the LITESCAPE.PT project: a case study documenting the evolving literary geography of Lisbon and a case study exploring the representation of wolves in Portuguese literature.

Our aim in creating this corpus was not to explain the meaning and role of specific places used by writers in their creative activity (a recurrent subject within literary scholarship), but rather to use a large collection of texts to examine how literary representations of Portugal's landscapes have changed over time. In this way, our research can be viewed as embedded both within the specific framework of 'macroanalytic' literary history (as advocated by Matthew Jockers)7 and within the wider framework of geo-criticism,8 as well as 'literary cartography',9 'literary geography',10 and 'literary GIS'.11 In addition, as will be clarified below, our approach also takes a theoretical orientation from ecological criticism12 in that it draws on literature as a resource for studying environmental history.

Quantitative and qualitative analyses
Quantitative digital methods are useful for working with large amounts of data. They allow the researcher not only to observe and compare patterns, but also to define goals and to test hypotheses.13 Such methods are, of course, most commonly associated with research in the social sciences. However, they have more recently begun to be championed by scholars in the humanities. Franco Moretti, for one, has advocated that literary historians should move away from the 'close reading' of individual texts and instead engage in the 'distant reading' of large text corpora.14 According to Moretti, one of the main limitations of close reading is that it tends to create blind spots in literary history. This is because the reading of individual texts often leads scholars to focus solely on major works and, accordingly, to ignore lesser-known texts. In response to this problem, Moretti has proposed distant reading as a more scientific and operational method for literary studies.15
The use of such scientific approaches in the humanities can be seen to reflect a desire to follow the example of disciplines, such as physics and evolutionary biology, in using large amounts of data to guide critical inquiry. Not without causing controversy, this 'identification with new scientific methods gives the impression of a revolutionary new style of research emerging in the humanities', as Paul Gooding, Melissa Terras, and Claire Warwick have put it.16 Beyond the understandable refusal to change scholarly paradigms and procedures, criticism of the application of scientific methods in the humanities is also based on the problems raised by the manipulation of large volumes of digital data, including both technical issues (such as the mixed quality of optical-character-recognition digitisation) and epistemological concerns (such as bias generated by the automated creation of textual metadata).
Daniel Alves and Ana Isabel Queiroz

Largely on account of this, the current trend in the humanities is to strive for a better integration of both quantitative and qualitative approaches.17 Underlying this trend is a desire to make more efficient use of large volumes of texts that remain the fundamental sources for researchers in this field. Mass digitization efforts and the democratization of the World Wide Web have made digital texts more accessible than ever before. The tools needed to extract data and to perform spatial analyses of it are now also increasingly available through the capabilities of database management and GIS platforms, which are enabling researchers around the globe to generate new pathways for research and even, in some cases, to address old questions in new ways.
With the development of GIS technology, many areas of knowledge, the humanities included, have sought to use and to incorporate new methodologies. In the discipline of history, for example, the main methodological innovation has been the incorporation of time as a variable in historical geographic information systems (HGIS) research. As has been shown elsewhere, including temporal data enables researchers to join attributes commonly evaluated using GIS, such as location, extent, and volume.18 Researchers have also sought to consolidate temporal components by spatialising historical information and by analysing the evolution of geo-referenced datasets. Some of the more recent developments along these lines have focused on the exploration of literary texts and the construction of spatial narratives.19
As yet, however, the use of GIS tools is not sufficiently widespread in the humanities. As a result, their potential is not fully recognised. There are many plausible explanations for this, not least the difficulties involved in planning, creating, and managing a GIS project. Moreover, the application of GIS in the humanities is often less than straightforward. GIS were designed to handle large volumes of quantitative data and, in many cases, this is not the type of data that scholars in the humanities use. When joining information and geography to temporal attributes, for example, relations tend to multiply and generate datasets that contain more exceptions than rules, more arbitrariness than standards.20
Several models have been proposed to resolve these problems and to provide GIS with a better way of working with texts. Among the more recent is the proposal to integrate GIS with database management systems (DMS). This integration has only recently become possible, giving concrete form to Stephen Ramsay's 2004 thinking on the evolution of relational database design and its future impact on research. As Ramsay explains, a 'database . . . can be set up in such a way as to allow multiple users access', and data entry 'from a number of different sources'.21 In this way, he concludes, 'the logical statements that would flow from that ontology would necessarily exceed the knowledge of any one individual. The power of relational databases to enable the serendipitous apprehension of relationships would be that much more increased'.22 Alongside these innovations, the recent advent of 'cloud' databases offers increasing functionalities for structuring, searching, and accessing information, allowing scholars greater freedom to engage in collaborative research.
Embracing the idea that writers are also mapmakers,23 the project places the mapping of literary texts at its core. The identification of the geographical references within the corpus is therefore key to our research. In order to facilitate the identification of these references, each literary representation of mainland Portugal in our corpus has been digitized and then registered in a shared database as a discrete literary excerpt. These excerpts are passages that can be read and understood independently and that, moreover, give a clear sense of the aesthetic aspects of the works from which they derive. Once identified and extracted, these excerpts were classified into categories (to indicate whether they were concerned with geographic, ecological, socioeconomic, cultural, and/or temporal issues) and then geo-referenced. Ultimately, this will enable us to depict the spatial and thematic information the excerpts contain on an interactive map, called the 'Atlas of Literary Landscapes', that will serve as a source for the development of further interdisciplinary research.24
As the foregoing account suggests, the LITESCAPE.PT project uses a hybrid methodology. Specifically, it combines traditional close-reading methods with 'distant reading', collaborative work, a shared PostgreSQL database, GIS tools, and quantitative methods. The project, at this level, has similarities with other digital literary mapping projects around the globe.25 But in addition to being unique in the field of Portuguese literary studies, it also has some other defining features. For instance, it draws on a large, trans-historical corpus that includes not only contemporary works but also writings from the Romantic era, when the appreciation of landscape as an aesthetic apprehension of nature came to the fore. LITESCAPE.PT, moreover, focuses not on a particular region or location, but instead embraces the entire national mainland territory of Portugal with all its natural and cultural diversity. The project, furthermore, focuses on landscape changes by using a set of analytical descriptors relating to five main categories. These descriptors include relief forms, land use, natural heritage, cultural heritage, and human activities.
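The way excerpts, geographical units, and thematic descriptors relate to one another can be pictured as a small relational schema. What follows is only a minimal sketch under stated assumptions: the table and column names (`excerpt`, `descriptor`, `excerpt_descriptor`) are hypothetical, the sample rows are invented placeholders, and Python's built-in SQLite module stands in for the project's shared PostgreSQL database. It is not the project's actual data model.

```python
import sqlite3

# Descriptor groups listed in the text; using them as the allowed values
# of a "category" column is an assumption of this sketch.
CATEGORIES = ("relief forms", "land use", "natural heritage",
              "cultural heritage", "human activities")

# Hypothetical schema: one row per excerpt, one row per descriptor,
# and a link table so an excerpt can carry several descriptors.
SCHEMA = """
CREATE TABLE excerpt (
    id        INTEGER PRIMARY KEY,
    work      TEXT NOT NULL,
    pub_year  INTEGER NOT NULL,
    nuts3     TEXT NOT NULL,   -- mandatory geographical unit
    latitude  REAL,            -- optional precise location
    longitude REAL,
    body      TEXT NOT NULL    -- original passage, kept for close reading
);
CREATE TABLE descriptor (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    label    TEXT NOT NULL
);
CREATE TABLE excerpt_descriptor (
    excerpt_id    INTEGER NOT NULL REFERENCES excerpt(id),
    descriptor_id INTEGER NOT NULL REFERENCES descriptor(id),
    PRIMARY KEY (excerpt_id, descriptor_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO excerpt VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(1, "Hypothetical novel A", 1890, "Grande Lisboa", 38.71, -9.14,
          "placeholder passage"),
         (2, "Hypothetical novel B", 1950, "Douro", None, None,
          "placeholder passage")])
    descriptors = [(1, "land use", "vineyard"),
                   (2, "human activities", "fishing")]
    for _, category, _ in descriptors:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
    con.executemany("INSERT INTO descriptor VALUES (?, ?, ?)", descriptors)
    con.executemany("INSERT INTO excerpt_descriptor VALUES (?, ?)",
                    [(1, 2), (2, 1), (2, 2)])
    con.commit()
    return con


def excerpts_by_category(con, category):
    """Return the ids of excerpts tagged with any descriptor in a category."""
    rows = con.execute(
        """SELECT DISTINCT e.id
           FROM excerpt e
           JOIN excerpt_descriptor ed ON ed.excerpt_id = e.id
           JOIN descriptor d ON d.id = ed.descriptor_id
           WHERE d.category = ?
           ORDER BY e.id""",
        (category,))
    return [row[0] for row in rows]


con = build_demo_db()
print(excerpts_by_category(con, "human activities"))
```

Keeping the original passage alongside the structured fields mirrors the hybrid approach described above: the text stays available for close reading, while the link table is what makes counting and mapping possible.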

Classifying the excerpts in this way, and compiling a list of metadata to facilitate searching through and analysing them, is a time-consuming task, especially given that the project does not, as yet, have a fully funded research team. In order to address this, we invited fellow academics and graduate students in literary studies, as well as school teachers of Portuguese language, geography, history and science, to assist us in collecting, recording, and classifying the texts. To ensure that each of our participants (hereafter called 'readers') followed the same procedures, we used a standardised reading protocol (introduced through a short training session) and offered them continuous supervision and support.
The database was made accessible to the readers through ODBC (Open Database Connectivity). This allowed the database to be shared and the information recorded by every reader to become immediately available to the group. It also allowed readers who wished to find out more about a specific topic to explore the entire corpus. The subjects explored in these texts cover a wide range of topics, including (1) the identification of fictional and nonfictional place names and their relation to human occupation in the territory; (2) the characterisation of land uses; (3) the exploitation of natural resources; (4) landscape processes associated with human activities; (5) landscape changes observed over suitably organised time periods; and (6) the identification of plant and animal species mentioned in the literary scenarios. The goal, in this sense, is not to engage in a philological approach to a small set of literary works, but instead to provide for a thorough analysis, carried out by the team members, of a wide range of literary works. This approach has the virtue of retaining the advantages of traditional reading methods while, in the process, overcoming some of the pitfalls related to other 'distant reading' approaches, such as the need to disambiguate place names and proper nouns and to correct other errors that normally emerge from an automated computational process of text mining.26
Recognising that digital literary mapping is potentially an extremely broad field of research, LITESCAPE.PT defines its parameters by establishing the identification of a geographical unit as a minimum criterion for selecting and registering the literary excerpts. Three nested administrative divisions were considered. The largest of these (the so-called NUTS 3)27 is a cluster of twenty-eight municipalities in mainland Portugal. Whenever possible, municipalities or civil parishes have also been identified. In some cases (for instance in urban centres or in descriptive literary works) a precise location was registered by combining places mentioned in the texts with latitudinal and longitudinal coordinates extracted from Google Maps or other gazetteers.28 The resulting information can be read into a GIS application in order to analyse the different excerpts according to the five thematic categories mentioned above. The GIS can also facilitate spatiotemporal analysis of the excerpts and allow the researcher to integrate (and draw comparisons with) data from other sources.
One methodological challenge facing the project at present is the difficulty of using more advanced computational-linguistic techniques for exploring the Portuguese texts. Computational linguistic software is mainly available for English. Although a research team is building a version for Portuguese, to date only a very small sample of Portuguese literature has been scanned and digitized.29 Accordingly, it is difficult to take advantage of the automated text extraction tools that other mapping projects employ. Ultimately, the project aims to design a methodology that could overcome this and extract from the literary texts their 'absolute singularity, but with potential links to broader phenomena; [their] irreducible difference, but with similarities that may nonetheless be discerned'.30
Keeping up with all those links and similarities, the database will preserve the original text for subsequent interpretation while at the same time enabling a quantitative and spatial approach. In this process, a literary excerpt becomes part of the relevant material for analysis, together with a set of structured metadata. Relevant links, trends and patterns are then detectable between many excerpts, even from different works and writers. In this sense 'the corpus as entity shifts meaning away from the text and towards the network'.31

Landscape changes in Portuguese literature
The results of the collaborative work can be summarized in a few numbers gathered from the LITESCAPE.PT database. In July 2014, the database comprised 172 authors (mainly Portuguese), 350 literary works (published between 1843 and 2014), 6,082 literary excerpts, and almost 1,400,000 words. All the excerpts have one or more mandatory geographical descriptors. In addition to the twenty-eight NUTS 3 municipalities, readers associated the literary excerpts with more than 2,500 locations. Of these locations, 77.3% were assigned exact geographical coordinates. The rest were found to be fictional places or locations that no longer exist, or are still in the process of being identified. Of all the excerpts, 87.4% were found to have at least one thematic descriptor. On average, readers classified each literary excerpt within two categories and around five thematic descriptors. There are therefore more than 4,000 thematic descriptors in the database, all organised into the five aforementioned categories and twenty-seven subcategories.
Up to now, the project has involved thirty-six readers. Their overall contribution is displayed in Figure 1. Two of these readers have been responsible for over 40% of all the work in the database. The work of the five most active readers (all of whom have contributed more than 500 excerpts) accounts for more than two-thirds of the total. Most of the readers recorded excerpts from only one to three books, and only four failed to contribute a complete literary work. A very similar pattern of participation has been observed in comparable crowdsourcing projects that use multiple collaborators to foster digitization, despite their focus on different sources and different research questions.32
An overview of the entire corpus showed that landscape descriptions are spread across mainland Portugal, although some territorial units showed a higher concentration (Figure 2). Lisbon and its surroundings stood out with a maximum of 3,132 excerpts, followed by the Douro region with 1,116 excerpts. This uneven distribution results from the literary production itself (which privileges some regions) and from the readers' interest in certain regions, writers, or subjects. Understanding these two aspects helps avoid biased conclusions about the distribution of literary landscapes and about their meanings and scope throughout Portuguese literature.

Lisbon has been widely portrayed in art and literature and was even considered one of the three world literary cities, along with Rome and Constantinople.33 Additionally, it was the scene of major urban, political, and cultural transformations, which is a relevant research topic in the context of the project. The Douro region, in turn, stands out for the beauty and cultural appeal of its literary landscapes, which are depicted in many texts, including those of Aquilino Ribeiro and Miguel Torga, who were born in the Douro and portray the region in their works. This kind of spatiotemporal reading, which has lately come to the attention of several researchers,34 was until recently commonly overlooked in Portugal.
As for our project, several specific research projects have profited from the material stored in the database.35 Since it is not feasible to document all these projects in this article, we have decided to focus on two exemplary case studies: one concerning literary representations of Lisbon and one concerning literary representations of wolves. The spatiotemporal analysis applied in both studies is representative of the potential of the LITESCAPE.PT project.36
The Lisbon case study reveals the benefits of integrating methods from different fields. It assesses the development of the literary space of Lisbon over time and discusses how literary representations of the city resonate with Lisbon's evolving identity as a cultural and social space. Bringing together 35 novels published from the mid-nineteenth century onwards, the study relied on an interdisciplinary approach to identify and present literary geographical patterns and to combine these with other sources of information about Lisbon. The literary space of the city, which can be geo-referenced and drawn on a map, was assumed to be defined by the period setting or as evoked by the characters.37 All of the literary works analysed were chosen because they have Lisbon as the central stage of the narrative, and also according to the clearly identified historical period in which they were either written or published.38
Converting literary locations from points to spots, through density analysis, and then to polygons; developing the concepts of literary space, cumulative literary space, and common literary space; and using methods borrowed from animal ecology: taking each of these steps made it possible to introduce size calculations and to build on the findings of other researchers.39 From a comparative or evolutionary perspective, the polygons shown in Figure 3 enable sequential and overlapping visualizations, which in turn facilitate comparisons with demographic data from different sources that were spatially referenced using the same framework.
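To make the step from points to polygons and size calculations more concrete, the sketch below encloses a handful of located mentions in a minimum convex polygon and measures its area. The article does not name the estimator that was used, and its density-analysis step is not reproduced here; the minimum convex polygon is simply one basic home-range measure from animal ecology, swapped in for illustration. The coordinates are invented and assumed to be already projected to a planar grid in kilometres.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; the result is in squared coordinate units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Invented locations (km on a planar grid) for one hypothetical period.
mentions = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(mentions)
print(hull)                # [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(hull))  # 12.0
```

Computing one such polygon per period would give a series of areas that can be compared over time, which is the kind of size calculation the paragraph above refers to. A real analysis would work on projected geographic coordinates inside a GIS rather than on toy values.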

Lisbon as a literary space
The results of the study suggested that the literary space did not match the urban space and, furthermore, that it commonly took thirty to forty years for Lisbon's literary geography to catch up with the city's expanding urban landscape. Accordingly, whereas the old commercial and political centre of the city persists as its literary space in all the novels analysed, Lisbon's peripheries are either absent or underrepresented until 1974. This occurs in spite of the fact that many of these peripheral areas were included in Lisbon's administrative limits as far back as 1886 and were intensively urbanized during the 1950s and 1960s. In this way, the study showed that the mapping of an enlarged literary corpus, collected collaboratively and analysed by a combination of qualitative and quantitative approaches (and through a combination of traditional and digital techniques), can produce new insights into the cultural evolution of urban landscapes.
The methodology employed in this study is fully replicable; it could be applied to another literary corpus to study other cities in other times and to investigate the relationship between real and imagined geographies. Accordingly, it has the potential to lead to a better understanding of the process of 'how city boundaries shaped visions of the urban space as it was lived and experienced', and of how literature can elucidate 'the history of space becoming place'.40
2 Representations of wolves in Portuguese literature41
In this case study, the researchers created a 'lupine corpus' of literary representations of mainland Portugal that included representations of wolves. This corpus was then augmented by the work of seven other readers, who classified other works that had not been identified in the first stage. This common effort led to the creation of a corpus of 262 excerpts from 68 literary works by 29 writers, published between 1875 and 2010. A content analysis was performed using a grid with several categories. These categories encompassed the various forms that the relationship between humans and nature can take. All literary representations were spatially referenced to one or more NUTS 3 municipalities and were also associated with three time periods that applied to the first publication date and the time setting of the narrative. In order to study the literary representation of wolves, these timestamps were then compared with three other time periods extrapolated from historical knowledge about trends in the Iberian wolf's range across Portugal and its different conservation statuses (Figure 4).42

Quantitative analysis revealed that although wolves have been represented in literature since the late nineteenth century, the proportion of representations was not independent of the time period of publication. Notably, a strong decline occurred in the works published after 1980. Literary representations of wolves were found to be combined with a variety of topics, approaches, and perspectives, although they were generally found to become less rich and less diverse in their composition over time. The results also suggest that the literary representation of wolves is not homogeneous throughout mainland Portugal, and that the geographic distribution of these representations more or less matches the Iberian wolf's range and distribution over time.
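A statement that a proportion is 'not independent' of the publication period is typically backed by a test of independence on a contingency table. The article does not say which test was applied, so the sketch below assumes a Pearson chi-square test and uses counts that are entirely invented for illustration; they are not the study's data.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts. Rows: excerpts with / without wolves.
# Columns: three hypothetical publication periods.
counts = [[30, 25, 5],
          [70, 75, 95]]

statistic, dof = chi_square_statistic(counts)
print(statistic, dof)  # 21.875 2

# 5.991 is the standard chi-square critical value for 2 degrees of
# freedom at the 0.05 significance level.
CRITICAL_VALUE_DOF2_ALPHA05 = 5.991
print(statistic > CRITICAL_VALUE_DOF2_ALPHA05)  # True
```

With these toy counts the statistic exceeds the critical value, so independence would be rejected. The same calculation applied to real counts per period is one way to check a claim of the kind made above.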
The approach followed here was enhanced by teamwork, which facilitated the shared analytical effort of classifying and organising the contents of the database concerned with humans' relationships with wolves. By using quantitative and digital methods (namely, mapping with GIS), this explanatory analysis was able to highlight the structure and composition of literary representations of wolves across time and space. From an eco-critical perspective, this approach can be seen as 'an example of the advantages of researching into an enlarged sample of literary texts, producing accurate and comparable results and discussing them using current ecological knowledge'.43

Bridging the divide
The advance of digital methods is a challenge both for those who produce tools and for those who use them in the digital humanities. Despite technological advances, difficulties persist in applying these methods within the humanities. These difficulties can no longer be viewed exclusively as a consequence of the refusal to embrace new technologies, as was the case some years ago.44 Instead, they must be seen to arise because most of the sources and methodologies used by humanists are hard to fit into the structured data embedded in the operational model of databases and GIS. This gap results, in part, from the fact that narrative texts are the main sources and outputs of literary or historical analysis, and these cryptic or nuanced texts may be difficult to read and analyse with digital tools, largely because of the standardisation inherent in those tools, where everything has to fit into pre-formatted 'boxes'. But even if digital approaches may present some pitfalls (supposedly reductionist features that some authors associate with the use of these tools in humanities research),45 researchers should not refrain from using them.
In this context, landscape representations in literature are challenging objects of study, not only because of their inner complexity, but also because of their continuous changes over time and space. If an appropriate procedure is applied, it necessarily results in converting texts to comparable and measurable features through digital methods and technologies. As we have tried to show above, these results are becoming available and the effort made during the compilation process will probably be fully rewarded.
From an extensive archive of literary representations of landscapes, a varied range of inquiries with ambitious goals may be pursued. The studies of the literary space of Lisbon and of literary representations of wolves demonstrate, but do not exhaust, the full potential of the information we have compiled. The main value of these studies lies in combining traditional academic reading (first stage: identification and selection) with 'distant reading' strategies (second stage: analysis and outcomes), with the focus given to certain aspects of the text. These studies relied on a collaborative approach that improved the chances of analysing, on solid ground, the topics depicted in the literary texts, as well as increasing scientific productivity. Relying on a single reader would be unfeasible for exploring an enlarged literary corpus such as LITESCAPE.PT, and might cause one either to overlook influential, contemporary works or even to ignore 'forgotten titles'.46 Furthermore, collaborative work helps one to deal with large amounts of information and to overcome a lack of time and funding.47
From this perspective, participatory and collaborative research can be seen to have far more benefits than drawbacks for the digital humanities, since it allows scholars to engage with more information and promotes interdisciplinary research. There are, of course, concerns about projects based on an 'imperialistic division of labour among scholars'.48 But, as LITESCAPE.PT affirms, one can pursue collaborative work in a way that shares knowledge equally among participants and, in the process, addresses 'the common problem by giving other voices a chance to speak'.49
Academia would benefit from more research projects working from texts to spatiotemporal analysis using collaborative work and GIS. Technology-mediated approaches, such as the use of digital materials, methods, and perspectives, can become a fruitful trend. The methods and outcomes presented in this article contribute to that trend by fostering new insights and by modelling new interdisciplinary practices for bridging the divide between the sciences and the humanities.
end notes

Figure 1. Distribution of the excerpts registered in the LITESCAPE.PT database per reader. (Readers who recorded more than 50 records are identified by their initials.)