Detailed information about the flow of potential customers in a city is highly relevant to the strategic decisions of service providers such as taxi companies and advertising agencies. Knowledge of highly frequented regions and of peak times in specific areas provides a crucial competitive advantage. Today, business-relevant decisions about the positioning of service providers and advertising spaces, or about the balancing of capacity, are primarily based on experience. In this paper, we present a novel approach to gaining knowledge about the distribution of potential customers over time and space, based on data from taxi rides that were recorded for documentation purposes. By leveraging the performance of in-memory databases, we build an application that allows the user to analyze about 700 million taxi rides in real time. The application lets companies see in which areas and in which timeframes they can reach a large audience of potential customers. Additionally, we demonstrate that the developed visualization concept enables the comparison of different regions and the analysis of trends in customer flow over time.
The biomedical scientific literature is a rich source of information not only in English, in which it is most abundant, but also in other languages, such as Portuguese, Spanish and French. We present the first freely available parallel corpus of scientific publications for the biomedical domain. Documents from the "Biological Sciences" and "Health Sciences" categories were retrieved from the SciELO database, and parallel titles and abstracts are available for the following language pairs: Portuguese/English (about 86,000 documents in total), Spanish/English (about 95,000 documents) and French/English (about 2,000 documents). Additionally, monolingual data was collected for all four languages. Sentences in the parallel corpus were automatically aligned, and a manual analysis of 200 documents by native experts found that at least 79% of sentences were correctly aligned in every language pair. We demonstrate the utility of the corpus by running baseline machine translation experiments. We show that, for all language pairs, a statistical machine translation system trained on the parallel corpora achieves performance that rivals or exceeds the state of the art in the biomedical domain. Furthermore, the corpora are currently being used in the biomedical task at the First Conference on Machine Translation (WMT’16).
