API
When you install Human Ecosystems, you install more than a data capture, analysis and visualisation environment: you also install an API that exposes the data you captured in open formats. That data can be integrated into other applications, services, research and art projects, published as open data, and more.
Human Ecosystems provides several API endpoints each with JSON output.
To access the API you call a specific URL, composed of the root directory of your install, followed by api/ and the API endpoint name.
For example, if I installed my Human Ecosystems on a webserver at IP address 192.19.1.12, at the directory 'HEv3', the API endpoint 'getRelations' would be invoked as:
http://192.19.1.12/HEv3/api/getrelations
Each endpoint has a series of parameters, which are passed on the URL string.
For example, one such parameter is researches.
Data collection processes in Human Ecosystems are organised in researches. Each research is a collection of research elements, each of which can focus on a series of keywords, hashtags, social networking users, Facebook pages etcetera.
Each research is stored in the Human Ecosystems database with a certain ID (identifier). For example, I might have researches with ID 35 and 36.
To call the previous API endpoint to get data coming from both researches, for example, I would do:
http://192.19.1.12/HEv3/api/getrelations?researches=35,36
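The URL composition described above can be sketched in a few lines of Python. The helper name build_api_url is hypothetical (it is not part of Human Ecosystems); the root, endpoint and research IDs are the ones used in the example above.

```python
from urllib.parse import urlencode


def build_api_url(root, endpoint, researches, **params):
    """Compose an API URL from the install root, endpoint name and parameters.

    Hypothetical helper for illustration only.
    """
    query = {"researches": ",".join(str(r) for r in researches)}
    # Parameters left as None are skipped, so optional ones can be omitted.
    query.update({k: v for k, v in params.items() if v is not None})
    # safe="," keeps the comma-separated research list readable in the URL.
    return f"{root.rstrip('/')}/api/{endpoint}?{urlencode(query, safe=',')}"


url = build_api_url("http://192.19.1.12/HEv3", "getrelations", [35, 36])
limited_url = build_api_url("http://192.19.1.12/HEv3", "getrelations", [35, 36], limit=20)
```

Any further parameter accepted by an endpoint can be passed as a keyword argument in the same way.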
This feature lets you extract data from one or more researches at a time. This is very convenient: you can set up a series of different researches, each from a different perspective, and combine them in different ways to see what results come up.
Common parameters are also limit and language. Unless otherwise noted, all API endpoints accept limit and language parameters.
Limit, as the name implies, limits the number of results returned. When working with big datasets it is convenient to set boundaries on the volume of data your application has to handle. For example, in my API URL I could specify limit=20 to limit the number of elements handled by the routine to 20.
20 of what? It depends on the endpoint: it can be content elements, relations, hashtags or something else. Just don't expect limit=20 to give you 20 results back. The limit applies to the elements from which the data is derived: if the data you are trying to access comes from messages, then 20 messages will be analysed, and if this analysis generates 216 hashtags you'll get 216 elements back in the output.
If we limit, which elements are returned? The most recent ones, so that you can track changes. If you don't limit, all available content is returned (so be prepared to receive a big JSON output, or even to crash your machine: for large volume operations it is always better to limit, or to work directly on the database).
Language determines, where it makes sense, the language of the content you want to work with. For English specify en, for Italian it, for French fr and so on. You can find the full list of ISO language codes HERE.
This does not mean that you will only get (or work on) content in that language: language is evaluated on a best-effort basis.
Consider this message: ":) LOL BRB"
Which language is it in? We can't know without context, and this kind of ambiguity comes up all the time. Human Ecosystems makes several kinds of assumptions to work out the language used in messages, but it's not always right.
In many cases no language can be determined at all. In these cases Human Ecosystems uses the XXX value, which means "no language can be determined". When you specify languages in your researches, experiment both with and without the language parameter to see what changes, and work on the conditions of your data capture process to ensure that you're capturing the right data.
Latitude and longitude are provided in the fields (lat,lng). This is how Human Ecosystems handles geographic coordinates.
In some cases, geographic context cannot be determined. In these cases the value -999 is provided for both latitude and longitude. If you see this value you know it's not valid.
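A client will usually want to drop these placeholder coordinates before mapping anything. A minimal sketch, assuming each result is a dictionary with lat and lng fields as described above; the helper name and the sample records are made up for illustration.

```python
INVALID_COORDINATE = -999  # sentinel used when no geographic context is known


def has_valid_coordinates(item):
    """Return True when both lat and lng carry a real position."""
    lat, lng = item.get("lat"), item.get("lng")
    if lat is None or lng is None:
        return False
    # float() also copes with numbers that arrive as strings in the JSON.
    return float(lat) != INVALID_COORDINATE and float(lng) != INVALID_COORDINATE


# Made-up sample records, not real API output.
records = [
    {"content": "first", "lat": 41.9, "lng": 12.5},
    {"content": "second", "lat": -999, "lng": -999},
    {"content": "third", "lat": "-999", "lng": "-999"},
]
located = [r for r in records if has_valid_coordinates(r)]
```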
Human Ecosystems performs various forms of natural language analysis to classify emotions according to the Circumplex Model of Affect. In this scheme, emotions are classified in a value space with several dimensions/axes.
The most important dimensions are Pleasure/Comfort and Energy/Arousal. Each emotion can be interpreted as a pair of (Pleasure,Energy) values, that is, as a point in a two-dimensional reference system.
With each content item coming from Human Ecosystems you'll find these two parameters, named comfort and energy. When you find them in your results, it means that the content was interpreted as representing that emotional state.
Areas in the Comfort/Energy plane can be associated with different emotions such as love, hate, confusion, anxiety, stress etcetera. Many sources of information describe this mapping.
- getRelations
- getWordNetwork
- getEmotions
- getTimeline
- getEmotionsTimeline
- getWordCloud
- getEnergyComfortDistribution
- getGeoPoints
- getGeoEmotionPoints
- getHashtagNetwork
- getHashtagCloud
- getSentiment
- getContentMatch
- getImages
- getEmotionalBoundariesSeries
- getMultipleMentionsSeries
- getEmotionallyWeightedKeywordSeries
- getMultipleKeywordsTimeline
- getDesireTimeline
- getStatistics
- getSentimentSeries
- getEmotionsSeries
- getActivity
- getTopUsers
- getKeywordSeries
- getMessagesForTagAndDate
- getTopicTimeSeries
- getPostsPerUserID
- getTopSubjects
- getMultipleSubjects
- getSubjectsForGroups
- getMultipleKeywordStatistics
- getStatisticsOnResearches
- getSingleHashtagStatistics
- getSingleHashtagNetwork
- getMessagesFromTimeAgo
- getLanguageStatistics
- getTagsFromToDate
Format:
api/getRelations?researches=[comma separated value researches list]&limit=[number of relations]&sensibility=[min weight]
Get the relations defined among the subjects of the collected data. It's a nodes/links graph, provided in JSON format.
The sensibility parameter gets only the relations with the specified minimum weight, to limit the number of returned nodes (for example useful in visualizations).
Each node has [id,nick,pu] elements, mapping respectively the subject id, the nickname and the profile url (pu, as in profile url). Each link has [id,source,target,weight] elements, mapping respectively the id of the relation, the source and target nicknames and the weight of the relation.
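As an illustration of how the graph could be consumed, the sketch below turns the links into a weighted adjacency map keyed by nickname. It assumes the response is an object with top-level nodes and links arrays, as described for the network endpoints; the function name and the sample graph are invented for the example.

```python
from collections import defaultdict


def build_adjacency(graph, min_weight=0):
    """Map each source nickname to {target nickname: weight}."""
    adjacency = defaultdict(dict)
    for link in graph.get("links", []):
        weight = float(link["weight"])
        # Client-side equivalent of a minimum-weight filter.
        if weight < min_weight:
            continue
        adjacency[link["source"]][link["target"]] = weight
    return dict(adjacency)


# Made-up sample in the documented [id,nick,pu] / [id,source,target,weight] shape.
sample_graph = {
    "nodes": [
        {"id": 1, "nick": "alice", "pu": "http://example.org/alice"},
        {"id": 2, "nick": "bob", "pu": "http://example.org/bob"},
        {"id": 3, "nick": "carol", "pu": "http://example.org/carol"},
    ],
    "links": [
        {"id": 10, "source": "alice", "target": "bob", "weight": 5},
        {"id": 11, "source": "alice", "target": "carol", "weight": 1},
    ],
}
strong_relations = build_adjacency(sample_graph, min_weight=2)
```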
Format:
api/getWordNetwork?researches=[comma separated value researches list]&limit=[number of messages]
Get the relations defined among the words of the selected researches. It's a nodes/links graph, provided in JSON format.
Each node has [id,word,weight,comfort,energy] elements, mapping respectively the word id, the word and the weight in the context of the relation, and the comfort and energy of the word in this context. Each link has [id,source,target,sourceid,targetid,weight] elements, mapping respectively the id of the relation, the source and target words in text and id format and the weight of the relation.
Format:
api/getEmotions?researches=[comma separated value researches list]
Get the number of emotional expressions found in the selected research, divided per single emotion.
Returns an array of JSON objects each with [label,value]: the label of the emotion (anxiety, anger, confidence...) and the number of times it appeared in the research.
Format:
api/getTimeline?researches=[comma separated value researches list]
Get the time series of how many contents have been expressed each day. The dates are provided in ascending order, but may not be consecutive: if no content was produced on a certain day, that date will not appear.
Returns an array of JSON objects each with [date,close]: the date (expressed in dd-mm-yyyy format) and the number of contents expressed on that day.
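Because days without content are missing from the series, a chart that needs a continuous axis has to fill the gaps itself. A minimal sketch, assuming the [date,close] shape described above; the function name and the sample values are made up.

```python
from datetime import datetime, timedelta


def fill_missing_days(timeline):
    """Return (date, count) pairs for every day between the first and last entry."""
    counts = {
        datetime.strptime(entry["date"], "%d-%m-%Y").date(): int(entry["close"])
        for entry in timeline
    }
    if not counts:
        return []
    day, last = min(counts), max(counts)
    filled = []
    while day <= last:
        # Days absent from the API output get an explicit zero.
        filled.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return filled


# Made-up sample: 2 March is missing, as it would be if no content was produced.
sample_timeline = [
    {"date": "01-03-2016", "close": 4},
    {"date": "03-03-2016", "close": 7},
]
filled = fill_missing_days(sample_timeline)
```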
Format:
api/getEmotionsTimeline?researches=[comma separated value researches list]&limit=[number of messages]
Gets the time series of how many contents have been expressed each day, classified per emotion. The same considerations of getTimeline apply.
Returns an array of JSON objects of the type:
"emotion_name": [
{
"date": "dd-mm-yyyy",
"close": number
},
... other objects...
]
In which emotion_name is the label for the emotion and the [date,close] object is the same as in getTimeline.
Format:
api/getWordCloud?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO code of language]
Get the word recurrences for the selected researches. Limit operates on the number of messages: saying limit=20 means "get word recurrences for the latest 20 captured messages".
Returns a JSON array of [name,value], in which name is the word and value is the number of times it appears in the research.
Format:
api/getEnergyComfortDistribution?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO code of language]
Get the Energy/Comfort distribution for the selected researches, according to the classification using the Circumplex Model of Affect. Great for scatter plots, for example.
Returns an array of JSON objects of the type [comfort,energy,c] in which comfort and energy are the distinct values found in the researches and c is the number of times they appear in them.
For example, you may use these values to create a scatterplot in which (comfort,energy) are the coordinates and c controls the size of the circles or their color.
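Besides plotting, the c counts can serve as weights for simple summaries. The sketch below computes the weighted mean position in the Comfort/Energy plane; it assumes the [comfort,energy,c] shape described above, and the function name and sample values are invented for illustration.

```python
def weighted_centre(distribution):
    """Return the (comfort, energy) mean, weighting each point by its count c."""
    total = sum(float(p["c"]) for p in distribution)
    if total == 0:
        return None
    comfort = sum(float(p["comfort"]) * float(p["c"]) for p in distribution) / total
    energy = sum(float(p["energy"]) * float(p["c"]) for p in distribution) / total
    return comfort, energy


# Made-up sample values, not real API output.
sample_distribution = [
    {"comfort": 10, "energy": 20, "c": 3},
    {"comfort": -30, "energy": 0, "c": 1},
]
centre = weighted_centre(sample_distribution)
```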
Format:
api/getGeoPoints?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO code of language]&search=[filter by LIKE string, without wildcards, a %STRING% wild is added]
Get the geographical distribution of the content in the researches for which it has been possible to obtain the geographical context, under the form of weighted coordinate points.
If provided, limit and language select, respectively, the number of latest georeferenced messages on which to operate and their language. Saying "&limit=20&language=it" means "get the 20 most recent georeferenced messages in Italian".
Returns an array of JSON objects of the form [lat,lng,c] where (lat,lng) are the coordinates of a point in WGS84 coordinate system and c is the number of contents produced in that place.
Format:
api/getGeoEmotionPoints?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO code of language]&search=[filter by LIKE string, without wildcards, a %STRING% wild is added]
The same as in getGeoPoints, but for emotions. Returns the geographic distribution of the researches, classified per emotion.
Returns an array of JSON objects of the form: "emotion_name": [ { "c": number, "lat": latitude, "lng": longitude }, ... other objects... ]
Format:
api/getHashtagNetwork?researches=[comma separated value researches list]&limit=[number of messages]
Gets the hashtag network for the researches, showing which hashtags appear in them and how they relate to each other.
Like all the other network graph endpoints, it returns a [nodes,links] network description in JSON format: the top-level nodes property of the returned object describes the nodes of the network (the hashtags, in this case), and the links property describes the relations among them.
The returned JSON format is as follows:
{
  "nodes": [
    {
      "id": number,
      "label": "name of the node",
      "weight": number
    },
    ...other nodes...
  ],
  "links": [
    {
      "source": "name of node A",
      "target": "name of node B",
      "weight": number
    },
    ...other links...
  ]
}
Format:
api/getHashtagCloud?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO Code of language]&minComfort=[number]&maxComfort=[number]&minEnergy=[number]&maxEnergy=[number]
As per getWordCloud, but for hashtags. Returns the recurrencies of hashtags in the selected researches.
The {minComfort, maxComfort, minEnergy, maxEnergy} parameters are optional, but if they are present all of them must be specified: they help to establish the emotional boundaries for the hashtags that we want to extract.
Returns a JSON array of [name,value], in which name is the hashtag and value is the number of times it appears in the research.
Format:
api/getSentiment?researches=[comma separated value researches list]&limit=[number of messages]&language=[ISO Code of language]
The different emotional expressions can be classified and aggregated in order to describe the Sentiment of a certain message. Roughly, we can collect the more positive emotional expressions, the more negative ones, and those which tend to be neutral, count them, and obtain a measure of how positive/negative/neutral the expressed mood is for a certain research.
This endpoint returns a single JSON object with [positive,negative,neutral] properties, each containing how many messages of that type appear in the selected researches.
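Raw counts are often easier to compare across researches once they are converted to shares. A minimal sketch, assuming the [positive,negative,neutral] object described above; the function name and sample counts are made up.

```python
def sentiment_shares(sentiment):
    """Convert positive/negative/neutral counts into fractions of the total."""
    counts = {
        key: int(sentiment.get(key, 0))
        for key in ("positive", "negative", "neutral")
    }
    total = sum(counts.values())
    if total == 0:
        # No classified messages: report zero shares rather than dividing by zero.
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


# Made-up sample counts, not real API output.
shares = sentiment_shares({"positive": 2, "negative": 1, "neutral": 1})
```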
Format:
api/getContentMatch?researches=[comma separated value researches list]&limit=[number of results]&language=[ISO Code of language]&q=[query]
It allows searching for content within your researches.
The q parameter lets you specify a word pattern, as in the SQL LIKE operator. For example, you can specify " ?utomob%" for the q parameter, to search for something which starts with a space, then has a single character (the ?), then the string "utomob" and then a series of one or more other characters. This, for example, will match " automobile" and " automobiles", but not "smart-automobile" (because there is no space before the "a").
If no "?" and "%" signs are included in the q parameter, the content is matched as the beginning of a word or sentence, meaning the conditions:
% query%
query%
% ?query%
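Since "?" and "%" have special meanings inside URLs, a pattern passed in q needs to be percent-encoded by the client. The sketch below shows that encoding, together with a helper that mirrors the three default conditions listed above. Both function names are hypothetical, and expand_query only illustrates what the text describes; the actual matching happens on the server.

```python
from urllib.parse import urlencode


def expand_query(q):
    """Mirror the documented default conditions for a q without wildcards."""
    if "?" in q or "%" in q:
        return [q]
    return [f"% {q}%", f"{q}%", f"% ?{q}%"]


def content_match_query(researches, q, limit=None):
    """Build the query string for getContentMatch with q safely encoded."""
    params = {"researches": ",".join(str(r) for r in researches), "q": q}
    if limit is not None:
        params["limit"] = limit
    # urlencode turns the space into "+", "?" into %3F and "%" into %25.
    return urlencode(params, safe=",")


query_string = content_match_query([35, 36], " ?utomob%")
default_conditions = expand_query("auto")
```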
The endpoint returns an array of matches, each a JSON object with the format [link,content,created_at,lat,lng,comfort,energy], in which link is the link to the content, content is its text, created_at is the creation date in standard Unix format, (lat,lng) are the geographic coordinates where they are available (and (-999,-999) otherwise), and (comfort,energy) are the emotional parameters of the content.
Format:
api/getImages?researches=[comma separated value researches list]&limit=[number of results]
Gets the images related to your research.
Format:
api/getEmotionalBoundariesSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&language=[ISO language code]&emotion-condition=[condition on emotions]
Gets a time series for the specified emotional condition. Emotional condition is specified using the Energy and Comfort values, for example:
energy>20 AND comfort<0
would represent a condition to use only the content which satisfies those values for emotions. The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getMultipleMentionsSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&mentions=[comma separated list of mentions]&weightwith=[field to use to weight the mentions]&language=[ISO code for language]
Gets a time series for the specified mentions (terms or accounts). Mentions are specified as a comma separated list of accounts (with a '@' before them) or terms. The 'weightwith' parameter is optional and lets you specify a numeric field to be used to weight the mentions (for example, you might use the number of retweets as the weight, to get a sense of how relevant each mention has been). The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getEmotionallyWeightedKeywordSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&keyword=[single keyword specification]&language=[ISO code for language]
Gets a time series for the mentions of a single keyword, weighted (multiplied) by its comfort and energy (so that you get, for example, negative values for negative mentions). The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getMultipleKeywordsTimeline?researches=[comma separated value researches list]&mode=[day|week|month|all]&keywords=[comma separated list of keywords]
Gets a time series for the specified mention of multiple keywords (specified in the keywords parameter). Multiple values are used in OR (so keyword1 OR keyword2 OR ... keywordN must be present). The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getDesireTimeline?researches=[comma separated value researches list]&mode=[day|week|month|all]
Gets a time series for those contents emotionally expressing a sense of desire (a combination of energy/comfort values which express desire). The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getStatistics?researches=[comma separated value researches list]&mode=[day|week|month|all]
Gets simple statistics for the researches (number of contents and number of users collected through the researches). The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getSentimentSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&sentiment=[positive|neutral|negative]
Gets a time series for those contents which express a certain sentiment specified by the 'sentiment' parameter, with values 'positive', 'neutral' or 'negative'. The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getEmotionsSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&emotion=[one of the emotions defined in the Emotions database table]
Gets a time series for those contents which express a certain emotion, which can be specified using one of the labels found in the Emotions table of the database. The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getActivity?researches=[comma separated value researches list]&mode=[day|week|month|all]
Gets a distribution of the number of contents generated across the 24h of the day in the specified period. Helps to visualize when users are active on a certain topic. The mode parameter indicates whether you want to receive the distribution for the last day, week, month or the complete one.
Format:
api/getTopUsers?researches=[comma separated value researches list]&mode=[day|week|month|all]&language=[ISO code of language]
Gets a list of the top users active in the specified time frame. The 'language' parameter is optional, and allows receiving only the users for which that language has been identified. The mode parameter indicates whether you want to receive the list for the last day, week, month or the complete one.
Format:
api/getKeywordSeries?researches=[comma separated value researches list]&mode=[day|week|month|all]&keyword=[one keyword]&language=[ISO code of language]
Gets a time series for the recurrence of a certain keyword. The optional 'language' parameter lets you receive the series only for a certain language. The mode parameter indicates whether you want to receive the time series for the last day, week, month or the complete one.
Format:
api/getMessagesForTagAndDate?researches=[comma separated value researches list]&day=[day]&month=[month]&year=[year]&entity=[hashtag]
Gets the messages in a certain research or set of researches that contain the indicated hashtag and that were created on the specified date.
Returns a JSON array of message objects, composed from the fields: link, content, comfort, energy
Format:
api/getTopicTimeSeries?researches=[comma separated value researches list]&gt=[number]&mode=[day|week|month|all]
Gets a time series of the topics which were generated for a certain research. This would mean the list of topics and the quantities of messages for each day which were exchanged about them in the list of researches.
The optional parameter gt sets a sensibility threshold: a certain topic for a certain day is not returned unless it had at least that number of messages on that day. Useful for simplifying complex visualizations by only including the most popular topics.
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come under the form of a JSON array of objects containing the topic (entity field), the number of messages (value field) and the date (the date field).
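For charting, the flat list is usually easier to handle once it is grouped into one series per topic. A minimal sketch, assuming the entity, value and date fields described above; the date is treated as an opaque string because its format is not stated here, and the function name and sample rows are made up.

```python
from collections import defaultdict


def series_by_topic(rows):
    """Group flat rows into {topic: {date: number of messages}}."""
    series = defaultdict(dict)
    for row in rows:
        series[row["entity"]][row["date"]] = int(row["value"])
    return dict(series)


# Made-up sample rows; the date strings are placeholders, not a confirmed format.
sample_rows = [
    {"entity": "#rome", "value": 12, "date": "day-1"},
    {"entity": "#rome", "value": 7, "date": "day-2"},
    {"entity": "#art", "value": 3, "date": "day-1"},
]
topic_series = series_by_topic(sample_rows)
```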
Format:
api/getPostsPerUserID?researches=[comma separated value researches list]&subject_id=[number]&mode=[day|week|month|all]
Gets the posts for a certain user ID (subject_id) in the selected research.
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come under the form of a JSON array of objects containing the content of the message (text field) and the date (the created_at field).
Format:
api/getTopSubjects?researches=[comma separated value researches list]&mode=[day|week|month|all]
Gets the top users for a certain research, ordered by their number of followers.
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come in the form of a JSON array of objects containing the subject_id, the name, the nickname, the followers count, the friends count, purl (the profile URL), imageurl (the profile image), nposts (the number of posts), favorites (the number of likes/favorites received) and the shares received.
Format:
api/getMultipleSubjects?researches=[comma separated value researches list]&search=[comma separated list of search terms]&mentions=[comma separated list of mentions]&groups=[JSON array]&mode=[day|week|month|all]
Performs multiple searches and aggregations. This is meant to provide a single entry point to follow the social dynamics of different subjects (for example for brand comparison). The idea is to pass as input to this API endpoint everything that is required to cluster the research.
The search includes a comma separated value list of search terms to use as a filter for the output (include if content contains these terms).
The mentions include a comma separated value list of mentions (for example @nickname) to use as a filter for the output (include if content mentions).
The groups parameter is a JSON array which includes indications on how to do the clustering. It is a JSON array of JSON Objects composed of the fields:
- name, indicating the name of the cluster
- slug, indicating the slug of the cluster
- search, indicating a comma separated list of searches that describe the cluster (content that matches any of them belongs to the cluster)
The searches can include mentions (with the @), hashtags (with #), or wildcards (for example % and ?, like in SQL).
For example, if I was to create clusters for music genres I could have the following groups array:
var groups = [
{
name: "Rock",
slug: "Rock",
search: "%rock%",
hue: 0.1
},
{
name: "Blues",
slug: "Blues",
search: "%blues%",
hue: 0.2
},
....
];
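Since groups travels on the URL like every other parameter, the array has to be serialised to JSON and percent-encoded by the client. A minimal Python sketch of that step, reusing the two groups from the example above; the research IDs and the mode value are placeholders.

```python
import json
from urllib.parse import parse_qs, urlencode

groups = [
    {"name": "Rock", "slug": "Rock", "search": "%rock%", "hue": 0.1},
    {"name": "Blues", "slug": "Blues", "search": "%blues%", "hue": 0.2},
]

# json.dumps produces the JSON array; urlencode escapes quotes, braces
# and the SQL-style % wildcards so they survive inside a URL.
query = urlencode({
    "researches": "35,36",      # placeholder research IDs
    "groups": json.dumps(groups),
    "mode": "week",             # placeholder mode
})

# Round trip: decoding the query string gives back the original array.
decoded_groups = json.loads(parse_qs(query)["groups"][0])
```

The resulting query string can be appended after api/getMultipleSubjects? on your own install.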
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come under the form of a JSON array. The array is associative and has the following keys:
- emotions: including an array of {comfort, energy} objects, one for each content which passed the filter
- groups: the resulting clusters, including the same input JSON array and an additional n field of the number of posts
- daysofweek: an array of days with keys "Mon" through "Sat" including the number of posts each day for the results of the filter
- stats: including an object of statistics with the keys comfort-avg with the average comfort of the posts, energy-avg with an energy average of the posts, mentions with the total number of mentions, and engagements with the total number of engagements, as a result of favourites and shares.
Format:
api/getSubjectsForGroups?researches=[comma separated value researches list]&groups=[JSON array]&mode=[day|week|month|all]
The idea is that this endpoint checks out which subjects in the research are simultaneously interested in something.
The groups parameter is a JSON array which includes indications on how to do the clustering. It is a JSON array of JSON Objects composed of the fields:
- name, indicating the name of the cluster
- slug, indicating the slug of the cluster
- search, indicating a comma separated list of searches that describe the cluster (content that matches any of them belongs to the cluster)
The searches can include mentions (with the @), hashtags (with #), or wildcards (for example % and ?, like in SQL).
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come in the form of a JSON array of objects. Each object has the following keys:
- s1id and s2id the IDs of two users
- then as many keys as there are names in the groups array, one per name, indicating how many times the two subjects have related on each topic
Format:
api/getMultipleKeywordStatistics?researches=[comma separated value researches list]&keywords=[comma separated list of keywords]
Get statistics on multiple keywords at the same time, indicated in the keywords parameter.
The results come in the form of a JSON array of objects. Each object has the following keys:
- c is the count for each keyword
- comfort is the average comfort
- energy is the average energy
Format:
api/getStatisticsOnResearches?researches=[comma separated value researches list]
Get statistics on multiple researches at the same time.
The results come in the form of a JSON array of objects. Each object has the following keys:
- research_element_id is the id of the research element
- research_id is the id of the research
- research_name is the name of the research
- research_element is the content of the research element
- research_element_type is the type of research
- number is the number of contents
- num_subjects is the number of subjects
- from_date is the minimum date
- to_date is the maximum date
- comfort is the average comfort
- energy is the average energy
Format:
api/getSingleHashtagStatistics?researches=[comma separated value researches list]&topic=[hashtag]&mode=[day|week|month|all]
Get statistics on a single hashtag, indicated in topic.
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come in the form of a JSON array of objects. Each object has the following keys:
- Number is the number of contents
- Comfort is the average comfort
- Energy is the average energy
- Language is the language (there may be multiple entries, each on a different language)
Format:
api/getSingleHashtagNetwork?researches=[comma separated value researches list]&topic=[hashtag]&mode=[day|week|month|all]
Get the network (with other hashtags) on a single hashtag, indicated in topic.
The optional parameter mode acts as in the previous endpoints, focusing on messages generated in the last day, week, month or from the beginning.
The results come in the form of a JSON object containing the nodes and links JSON arrays.
- the nodes array includes objects with fields id, label, weight
- the links array includes objects with fields source, target, weight of the links
Format:
api/getMessagesFromTimeAgo?researches=[comma separated value researches list]&number=[number]&unit=[second|minute|hour|day|week|month|year]
Get the contents for the list of researches, from earlier than number unit ago.
The results come under the form of an array of JSON objects. The objects contain the following fields for each content:
- id
- link
- content
- created_at
- lat
- lng
- comfort
- energy
Format:
api/getLanguageStatistics?researches=[comma separated value researches list]&unit=[second|minute|hour|day|week|month|year]
Get statistics of how languages are used in the selected researches, from earlier than 1 unit ago.
The results come in the form of an array of JSON objects. Each object contains the following fields for one language:
- language (2 letter ISO code)
- n
Format:
api/getTagsFromToDate?researches=[comma separated value researches list]&fromdate=[date and time in PHP format Y-m-d H:i:s]&todate=[date and time in PHP format Y-m-d H:i:s]
Get the hashtags from the specified timeframe (>=fromdate , <todate)
The results come under the form of a JSON object with a 'children' attribute containing an array of JSON objects. The objects contain the following fields for each content:
- label the hashtag
- c how many times it appears
- energy average energy
- comfort average comfort
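The fromdate and todate values use PHP's Y-m-d H:i:s format, which corresponds to %Y-%m-%d %H:%M:%S in Python's strftime. Because the interval is half-open (>= fromdate, < todate), a whole day can be covered by using the next midnight as todate. A minimal sketch; the helper name day_window is hypothetical.

```python
from datetime import datetime, timedelta

# strftime equivalent of PHP's "Y-m-d H:i:s"
API_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def day_window(moment):
    """Return (fromdate, todate) strings covering the whole day of `moment`."""
    start = datetime(moment.year, moment.month, moment.day)
    # todate is exclusive, so the following midnight closes the day exactly.
    end = start + timedelta(days=1)
    return start.strftime(API_DATE_FORMAT), end.strftime(API_DATE_FORMAT)


fromdate, todate = day_window(datetime(2016, 3, 1, 15, 30))
```

Consecutive windows built this way neither overlap nor leave gaps, which is convenient when polling the endpoint day by day.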