Indexing Tweets with Logstash
Fun
Sentiment Analysis
- What do people think about a brand?
- How do people like my new product?
- How effective is my current ad?
Twitter River
Rivers are deprecated
- Cluster stability
- Scalability
- Fault tolerance
Logstash to the rescue
Logstash for logfiles
Logstash for Tweets
Input
input {
  twitter {
    consumer_key => "..."
    consumer_secret => "..."
    oauth_token => "..."
    oauth_token_secret => "..."
    keywords => [ "logstash", "elasticsearch" ]
    full_tweet => true
  }
}
Output
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    protocol => "http"
    host => "localhost"
    index => "twitter"
    document_type => "tweet"
  }
}
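The elasticsearch output does not send one request per tweet. It buffers events and sends them in batches through the Bulk API. Below is a minimal Python sketch of what such a bulk body looks like for the index and document type configured above. It assumes the Elasticsearch 1.x action format: newline-delimited JSON, with one action line followed by one source line per document. The function name and sample event are hypothetical.

```python
import json

def build_bulk_body(events, index="twitter", doc_type="tweet"):
    """Serialize events as a Bulk API body (newline-delimited JSON).

    Each document is preceded by an action line that names its target
    index and type; the whole body must end with a newline.
    """
    lines = []
    for event in events:
        action = {"index": {"_index": index, "_type": doc_type}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"

# Hypothetical sample event, shaped like a (heavily trimmed) tweet.
print(build_bulk_body([{"text": "...", "user": {"name": "Florian Hopf"}}]))
```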
Tweets are arriving
"created_at": "Wed Aug 26 11:45:59 +0000 2015",
"id": 636504862134521900,
"text": "Looking forward to be at #elasticsearch FFM this evening. I'll be giving a short talk on how to index tweets with #logstash",
[...]
"user": {
  "id": 313122677,
  "name": "Florian Hopf",
  "screen_name": "fhopf",
  "location": "Karlsruhe",
  [...]
Aggregate on username?
curl -XPOST "http://localhost:9200/twitter/_search" -d'
{
  "aggs": {
    "users": {
      "terms": {
        "field": "user.name"
      }
    }
  }
}'
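The same search request can also be built and sent programmatically. Below is a minimal Python sketch that uses only the standard library. The helper names are hypothetical, and the URL assumes the local node from the output configuration. Only the request body is built when the script runs; the actual call is commented out because it needs a running node.

```python
import json
import urllib.request

def terms_agg_query(field, name="users"):
    """Build a search body with a single terms aggregation on `field`."""
    return {"aggs": {name: {"terms": {"field": field}}}}

def search(body, url="http://localhost:9200/twitter/_search"):
    """POST the body to the _search endpoint and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

query = terms_agg_query("user.name")
# Against a running node:
# buckets = search(query)["aggregations"]["users"]["buckets"]
```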
Aggregate on username?
"buckets": [
  {
    "key": "florian",
    "doc_count": 1
  },
  {
    "key": "hopf",
    "doc_count": 1
  }
]
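The two buckets appear because user.name is an analyzed string field. At index time, the standard analyzer splits "Florian Hopf" into the lowercased tokens florian and hopf. The terms aggregation then counts those indexed tokens, not the original value. Below is a rough Python approximation of that effect. The regex tokenizer is a simplifying assumption, not Lucene's actual standard tokenizer.

```python
import re
from collections import Counter

def approx_standard_analyzer(value):
    """Crude stand-in for the standard analyzer: split into word
    characters and lowercase each token (illustration only)."""
    return [token.lower() for token in re.findall(r"\w+", value)]

def terms_agg(values, analyze):
    """Count documents per indexed term; a term counts once per document."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    # Sorted by count, then key, just to get a stable listing.
    return [{"key": k, "doc_count": n}
            for k, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

print(terms_agg(["Florian Hopf"], approx_standard_analyzer))
# [{'key': 'florian', 'doc_count': 1}, {'key': 'hopf', 'doc_count': 1}]
```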
The Elasticsearch output accepts an index template
elasticsearch {
  protocol => "http"
  host => "localhost"
  index => "twitter"
  document_type => "tweet"
  template => "twitter_template.json"
  template_name => "twitter"
}
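The template option points at a JSON file. The template_name option is the name under which that template is stored in Elasticsearch. As a rough illustration of the installation step, here is a Python sketch that prepares the equivalent PUT /_template request by hand. The helper names are hypothetical. Logstash's own upload logic, such as whether it overwrites an existing template, is not reproduced.

```python
import json
import urllib.request

def load_template(path):
    """Read the template file; json.load raises ValueError if it is not valid JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def template_request(template, name="twitter", base_url="http://localhost:9200"):
    """Prepare (but do not send) a PUT /_template/<name> request."""
    return urllib.request.Request(
        f"{base_url}/_template/{name}",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Usage against a running node (not executed here):
# request = template_request(load_template("twitter_template.json"))
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```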
Dynamic template
{
  "template": "twitter",
  [...]
  "mappings": {
    "tweet": {
      "dynamic_templates" : [ {
        [...]
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fields" : {
              "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
            }
          }
        }
      } ]
      [...]
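The string_fields rule applies to every new field whose value is detected as a string. Such a field is mapped as an analyzed string and also gets an unanalyzed raw subfield. Below is a simplified Python sketch of which mapping each string field in a tweet would receive. It models only this one rule, not the elided rules of the template. It also uses fnmatch as a stand-in for Elasticsearch's wildcard matching, which behaves the same for the "*" pattern. As with match in the template, the pattern is tested against the field name rather than the full path.

```python
import fnmatch

# Copy of the "mapping" section of the string_fields rule above.
STRING_FIELDS_MAPPING = {
    "type": "string", "index": "analyzed", "omit_norms": True,
    "fields": {"raw": {"type": "string", "index": "not_analyzed", "ignore_above": 256}},
}

def dynamic_string_mappings(doc, match="*", prefix=""):
    """Return {dotted_path: mapping} for string fields whose name matches `match`.

    Nested objects are walked recursively; non-string values are skipped."""
    result = {}
    for name, value in doc.items():
        path = prefix + name
        if isinstance(value, dict):
            result.update(dynamic_string_mappings(value, match, path + "."))
        elif isinstance(value, str) and fnmatch.fnmatchcase(name, match):
            result[path] = STRING_FIELDS_MAPPING
    return result

tweet = {"text": "...", "id": 1, "user": {"name": "Florian Hopf", "id": 2}}
print(sorted(dynamic_string_mappings(tweet)))
# ['text', 'user.name']
```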
Aggregate on username?
curl -XPOST "http://localhost:9200/twitter/_search" -d'
{
  "aggs": {
    "users": {
      "terms": {
        "field": "user.name.raw"
      }
    }
  }
}'
Aggregate on username?
"buckets": [
  {
    "key": "Florian Hopf",
    "doc_count": 1
  }
]
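Because raw is not_analyzed, the whole value is indexed as a single term, so the bucket key is now the full name. With ignore_above set to 256, longer values are not indexed in raw at all and never appear in its buckets. Below is a Python sketch of that behavior. It assumes ignore_above is compared against the string length, and the helper names are hypothetical.

```python
from collections import Counter

def raw_terms(value, ignore_above=256):
    """not_analyzed: the whole value is one term, unless it is too long."""
    return [] if len(value) > ignore_above else [value]

def terms_agg(values, analyze):
    """Count documents per indexed term; a term counts once per document."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    # Sorted by count, then key, just to get a stable listing.
    return [{"key": k, "doc_count": n}
            for k, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

print(terms_agg(["Florian Hopf"], raw_terms))
# [{'key': 'Florian Hopf', 'doc_count': 1}]
```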
- 29.09. - 01.10.2015
- 30.09.: Search Driven Applications
- 01.10.: Introduction to Elasticsearch